This post has been co-written with Seunghyun Jeong, Sunwoo Lee, and Eric Davis from SK Telecom.
SK Telecom (SKT), South Korea’s leading telecommunications company serving 30 million customers, is at the forefront of AI innovation. In line with its AI Pyramid Strategy, which aims to unlock AI’s potential for anyone, anywhere, anytime, SKT has collaborated with the AWS Generative AI Innovation Center (GenAIIC) Custom Model Program to explore domain-trained models using Amazon Bedrock for telco-specific use cases.
This collaboration aligns with SKT’s vision of using AI expertise and strategic partnerships to develop innovative AI-based products and services. One such initiative focused on developing a custom solution for grounded question answering (Q&A) based on reference documents.
Retrieval Augmented Generation (RAG) is a popular technique for Q&A tasks, offering improved factual accuracy and knowledge grounding. However, RAG faces challenges with generating responses that don’t match the preferred tone, style, and manners for telco use cases, as well as retrieving irrelevant documents, potentially leading to inaccurate responses. To address this, SKT and AWS GenAIIC aimed to use model customization to improve Anthropic Claude models on Amazon Bedrock in three key areas:
- Providing concise and informative answers
- Correctly referencing links from retrieved documents
- Answering in a tone and style consistent with SKT and similar to ground truth answers
Additionally, the team explored boosting smaller model performance using synthetic data generated by larger large language models (LLMs) for knowledge distillation and for scenarios with limited labeled training data.
Amazon Bedrock is a fully managed service that offers a variety of LLMs and foundation models (FMs), along with capabilities such as Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and Amazon Bedrock Guardrails that can expedite many generative AI use cases. Amazon Bedrock is the only fully managed service that gives you the ability to fine-tune Claude models, and it offers an intuitive and secure way of fine-tuning Anthropic’s Claude models and more. The fine-tuned Claude model can be deployed using Amazon Bedrock and can seamlessly use other Amazon Bedrock capabilities, for example, Amazon Bedrock Knowledge Bases for telco domain-specific RAG or Amazon Bedrock Agents for agentic usage.
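As a rough illustration of how such a fine-tuning (model customization) job can be launched programmatically, the following minimal Python sketch uses the boto3 Bedrock client. The bucket paths, job and model names, role ARN, base model identifier, and hyperparameter values are placeholders and assumptions, not SKT’s actual configuration; check the Amazon Bedrock documentation for the fine-tunable model IDs and supported hyperparameters in your Region.

```python
import boto3

# Control-plane client for Amazon Bedrock (model customization lives here;
# the separate bedrock-runtime client is used for inference).
bedrock = boto3.client("bedrock", region_name="us-east-1")

# All names, ARNs, and S3 URIs below are placeholders.
response = bedrock.create_model_customization_job(
    jobName="telco-qa-finetune-job",
    customModelName="telco-qa-claude-custom",
    roleArn="arn:aws:iam::123456789012:role/BedrockFineTuneRole",
    baseModelIdentifier="anthropic.claude-3-haiku-20240307-v1:0",  # example; verify fine-tunable IDs
    customizationType="FINE_TUNING",
    trainingDataConfig={"s3Uri": "s3://my-bucket/train/telco_qa.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/output/"},
    hyperParameters={  # illustrative values; supported keys vary by base model
        "epochCount": "2",
        "batchSize": "8",
        "learningRateMultiplier": "1.0",
    },
)

# Poll the job status; the custom model can be deployed for inference once it completes.
job_arn = response["jobArn"]
status = bedrock.get_model_customization_job(jobIdentifier=job_arn)["status"]
print(f"Customization job {job_arn} is {status}")
```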
In this post, we share how SKT customizes Anthropic Claude models on Amazon Bedrock for telco-specific Q&A over SKT’s technical telecommunication documents.
Solution overview
The team explored combinations of prompt optimization, customization (fine-tuning), and data augmentation with synthetic data. This multifaceted approach aimed to maximize the benefits of each technique for the grounded Q&A generation task.
In the following sections, we explore these methods in more detail.
Anthropic’s Claude customization with prompt optimization
Fine-tuning, which is available through Amazon Bedrock for various FMs, including Anthropic’s Claude, allows adaptation of pre-trained language models for specific use cases. It’s particularly effective for tailoring response style and format adherence.
The team first optimized the system prompt, implementing standardized guidelines for answer formatting and document citation based on Anthropic model prompting best practices. Key focus areas included the following (a sketch of the resulting training record format follows this list):
- Clear presentation of system instructions
- Consistent use of code block formatting
- Context-based tailored responses
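To make this concrete, the sketch below writes one chat-style training record combining a system prompt with a grounded Q&A pair, in the JSONL form used for Claude fine-tuning on Amazon Bedrock. The prompt wording, document, and answer are purely illustrative (not SKT’s actual guidelines or data), and field names should be verified against the current Bedrock documentation.

```python
import json

# Illustrative system prompt encoding answer-formatting and citation guidelines.
SYSTEM_PROMPT = (
    "You are a telco customer-support assistant. Answer concisely using only the "
    "provided documents, and cite the reference link of every document you use."
)

# One training record: system prompt, user turn with retrieved documents, and the
# desired grounded answer. Bedrock expects one JSON object per line in the JSONL file.
record = {
    "system": SYSTEM_PROMPT,
    "messages": [
        {
            "role": "user",
            "content": "Documents:\n[1] https://example.com/5g-roaming ...\n\n"
                       "Question: How do I enable 5G roaming?",
        },
        {
            "role": "assistant",
            "content": "You can enable 5G roaming in the app under Roaming Settings. "
                       "Reference: [1] https://example.com/5g-roaming",
        },
    ],
}

# ensure_ascii=False preserves Korean text in the training file.
with open("telco_qa.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```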
This prompt engineering, combined with fine-tuning, yielded substantial improvements:
- Over 50% increase in ROUGE-3 score
- Over 25% improvement in ROUGE-L score
- Over 4% increase in embedding similarity score
- Significant progress in accurate reference citation
The iterative enhancement process demonstrated cumulative benefits, with prompt updates alone showing 35–40% improvements in key metrics, and the final customized model achieving 50–60% gains in some metrics.
This progression clearly illustrates the cumulative benefits of model customization through RAG, prompt engineering, and fine-tuning, resulting in a model that significantly outperformed both the baseline and the prompt-updated versions in terms of ROUGE scores and citation accuracy. The ROUGE score measures the similarity between ground truths and generated results by computing N-gram word overlaps. The following table summarizes these improvements.
| LLM | Prompt update | Fine-tuning | ROUGE-3 (vs. baseline) | ROUGE-L (vs. baseline) | Citation accuracy (vs. baseline) |
| --- | --- | --- | --- | --- | --- |
| Anthropic’s Claude 3 Sonnet | – | – | Baseline | Baseline | Baseline |
| Anthropic’s Claude 3 Sonnet | ✅ | – | +38.30% | +13.4% | +52.94% |
| Anthropic’s Claude 3 Sonnet | ✅ | ✅ | +58.1% | +26.8% | +70.59% |
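For readers unfamiliar with the metric, the following minimal sketch shows how ROUGE-3 and ROUGE-L can be computed for a single ground-truth/generated answer pair using the open source rouge-score package (pip install rouge-score). The example strings are illustrative, not SKT data, and the evaluation pipeline SKT used is not shown here.

```python
# pip install rouge-score
from rouge_score import rouge_scorer

# Illustrative strings only.
ground_truth = "You can enable 5G roaming in the app under Roaming Settings."
generated = "Enable 5G roaming from the Roaming Settings menu in the app."

# ROUGE-3 counts overlapping word trigrams; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge3", "rougeL"], use_stemmer=True)
scores = scorer.score(ground_truth, generated)

for name, result in scores.items():
    print(f"{name}: precision={result.precision:.3f} "
          f"recall={result.recall:.3f} f1={result.fmeasure:.3f}")
```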
Synthetic data for fine-tuning
To address the challenge of limited high-quality labeled training data, the team explored synthetic data generation strategies. This approach also facilitates knowledge distillation from larger LLMs to smaller, more targeted models, offering benefits such as lower latency and cost.
The team conducted controlled experiments using:
- A baseline set of 500 ground truth samples
- An augmented set with 500 original plus 1,500 synthetic samples
- A larger original set of 2,000 samples
Synthetic data was generated using Anthropic’s Claude 3 Sonnet, creating new question-answer pairs over the same retrieved documents used in the ground truth examples.
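As an illustration of this step, the sketch below asks a larger Claude model on Amazon Bedrock to draft a new question-answer pair grounded in an already-retrieved document, using the Converse API. The model ID, prompt wording, and document text are assumptions for the sketch, not SKT’s actual generation pipeline.

```python
import boto3

# Runtime client for inference via the Converse API.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Illustrative retrieved document; in practice this comes from the same
# retrieval results used for the ground truth examples.
document = "Doc [1] (https://example.com/esim): eSIM activation requires ..."

prompt = (
    "Using only the document below, write one new customer question and a concise, "
    "grounded answer that cites the document's link.\n\n"
    f"{document}\n\n"
    "Return the result as:\nQ: ...\nA: ..."
)

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
    messages=[{"role": "user", "content": [{"text": prompt}]}],
    inferenceConfig={"maxTokens": 512, "temperature": 0.7},
)

# Extract the generated synthetic question-answer pair.
synthetic_pair = response["output"]["message"]["content"][0]["text"]
print(synthetic_pair)
```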
The results were evaluated using both LLM-based comparison and human preference evaluation. Human evaluators blindly ranked the model outputs, with scores assigned based on preference (Best: 4, Second: 3, Third: 2, Worst: 1). The following table shows the human preference evaluation scores.
| Rank | Model | Cumulative score (best possible: 160) |
| --- | --- | --- |
| 1 | Fine-tuned with 2,000 original samples | 114 |
| 2 | Fine-tuned with 500 original and 1,500 synthetic samples | 112 |
| 3 | Fine-tuned with 500 original samples | 85 |
| 4 | No fine-tuning (baseline) | 84 |
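The cumulative scores above are simply each model’s preference points summed across all evaluation items. The sketch below shows one way to tally them; the per-item rankings are made up, and the assumption of 40 items (40 × 4 = 160) is inferred from the stated maximum rather than confirmed by the post.

```python
from collections import defaultdict

# Points awarded per rank position in the blind evaluation.
POINTS = {1: 4, 2: 3, 3: 2, 4: 1}

# Hypothetical per-item rankings: each dict maps model name -> rank for one item.
rankings = [
    {"2000-original": 1, "500+1500-synthetic": 2, "500-original": 3, "baseline": 4},
    {"2000-original": 2, "500+1500-synthetic": 1, "500-original": 4, "baseline": 3},
    # ... one dict per evaluation item
]

# Sum preference points per model to get the cumulative score.
totals = defaultdict(int)
for item in rankings:
    for model, rank in item.items():
        totals[model] += POINTS[rank]

for model, score in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {score}")
```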
Some key findings include:
- Small training sets (500 samples) showed minimal improvement over the baseline
- Larger training sets (2,000 samples) scored considerably higher
- Synthetically augmented data performed similarly to equivalent-sized original data
Although having a large amount of domain-specific training data is always ideal, many businesses have limited datasets available. In such scenarios, synthetic data can play a crucial role in place of original data. This demonstrates the potential of synthetic data for model customization.
Conclusion
SK Telecom’s collaboration with AWS GenAIIC showcases the company’s commitment to developing innovative AI solutions for telco challenges. By using Amazon Bedrock to customize Anthropic’s Claude models, SKT has achieved significant performance improvements for telco-specific, Korean-language use cases without the need to build models from scratch. The proof of concept demonstrated the following improvements:
- ~58% increase in ROUGE-3 score
- ~27% increase in ROUGE-L score
- Substantial improvement in returning correct reference links
This approach, combined with synthetic data generation strategies, aligns with SKT’s AI Pyramid Strategy, enabling faster testing and development of new approaches. As SKT continues to focus on key areas such as personal AI assistants, AI healthcare, and AI data centers, this collaboration with AWS represents a significant step in its AI evolution and long-term competitiveness in the global AI landscape.
For those interested in working with AWS on similar projects, visit the Generative AI Innovation Center.
About the Authors
Sungmin Hong is a Senior Applied Scientist at the AWS Generative AI Innovation Center, where he helps expedite a variety of use cases for AWS customers. Before joining Amazon, Sungmin was a postdoctoral research fellow at Harvard Medical School. He holds a Ph.D. in Computer Science from New York University. Outside of work, Sungmin enjoys hiking, reading, and cooking.
Sujeong Cha is a Deep Learning Architect at the AWS Generative AI Innovation Center, where she specializes in model customization and optimization. She has extensive hands-on experience in solving customers’ business use cases with generative AI as well as traditional AI/ML solutions. Sujeong holds an M.S. degree in Data Science from New York University.
Arijit Ghosh Chowdhury is a Scientist with the AWS Generative AI Innovation Center, where he works on model customization and optimization. In his role, he conducts applied research in fine-tuning and model evaluation to enable generative AI for various industries. He has a Master’s degree in Computer Science from the University of Illinois at Urbana-Champaign, where his research focused on question answering, search, and domain adaptation.
Yiyue Qian is an Applied Scientist II at the AWS Generative AI Innovation Center, where she helps deliver generative AI solutions to AWS customers. In this role, she collaborates with a team of experts to develop innovative AI-driven models for AWS customers across various industries. Yiyue holds a Ph.D. in Computer Science from the University of Notre Dame, where her research focused on advanced machine learning and deep learning techniques.
Wei-Chih Chen is a Machine Learning Engineer at the AWS Generative AI Innovation Center, where he works on model customization and optimization for LLMs. He also builds tools to help his team tackle various aspects of the LLM development life cycle, including fine-tuning, benchmarking, and load testing, accelerating the adoption of diverse use cases for AWS customers. He holds an M.S. degree in Computer Science from UC Davis.
Hannah Marlowe is a Senior Manager of Model Customization at the AWS Generative AI Innovation Center. Her team specializes in helping customers develop differentiating generative AI solutions using their unique and proprietary data to achieve key business outcomes. She holds a Ph.D. in Physics from the University of Iowa, with a focus on astronomical X-ray analysis and instrumentation development. Outside of work, she can be found hiking, mountain biking, and skiing in the mountains of Colorado.
Seunghyun Jeong (Steve) is a team leader of the Platform Application team at SKT. He is responsible for commercializing the Global Intelligence Platform (GIP), which provides AI models and tools. His team is expanding the availability of models and features to make it easier for internal teams to apply AI, contributing to SKT’s AI transformation. Before entering the AI space, he spent most of his career as a Product Manager, developing and operating various mobile services such as mobile wallet, fashion streaming, and unified login services for the US and Korea.
Sunwoo Lee (Lois) is the team leader of the Data Construction and Evaluation team within SK Telecom’s Global AI Tech division. She oversees the design and construction of training data for language models, the model performance evaluation process, and its application to services. Her career has focused on NLP within IT, a great fit with her background in Linguistics and Korean language education. Alongside her world-class team, she continues to explore and solve interesting problems, such as how to optimize the design of data for language model training, which tasks and methods to use for validating language model performance, and the best design of AI-human conversations.
Eric Davis is the Vice President of the AI Tech Collaboration Group at SKT. Eric oversees tech collaborations with international tech partners to customize large language models (LLMs) for the telecommunications domain. His teams are responsible for designing and building the datasets to tune LLMs, as well as benchmarking LLMs both in general and for the telecommunications domain. Eric holds a Master of Science degree in Computer Science from Carnegie Mellon University’s Language Technologies Institute and a Bachelor of Arts in Linguistics and Psychology from the University of California, Los Angeles.