This post is co-authored with Travis Mehlinger and Karthik Raghunathan from Cisco.
Webex by Cisco is a leading provider of cloud-based collaboration solutions, including video meetings, calling, messaging, events, polling, asynchronous video, and customer experience solutions like contact center and purpose-built collaboration devices. Webex's focus on delivering inclusive collaboration experiences fuels its innovation, which uses artificial intelligence (AI) and machine learning (ML) to remove the barriers of geography, language, personality, and familiarity with technology. Its solutions are underpinned with security and privacy by design. Webex works with the world's leading business and productivity apps, including AWS.
Cisco's Webex AI (WxAI) team plays a vital role in enhancing these products with AI-driven features and functionality, using large language models (LLMs) to improve user productivity and experiences. In the past year, the team has increasingly focused on building AI capabilities powered by LLMs to improve productivity and experience for users. Notably, the team's work extends to Webex Contact Center, a cloud-based omni-channel contact center solution that empowers organizations to deliver exceptional customer experiences. By integrating LLMs, the WxAI team enables advanced capabilities such as intelligent virtual assistants, natural language processing (NLP), and sentiment analysis, allowing Webex Contact Center to provide more personalized and efficient customer support. However, as these LLMs grew to contain hundreds of gigabytes of data, the WxAI team faced challenges in efficiently allocating resources and starting applications with the embedded models. To optimize its AI/ML infrastructure, Cisco migrated its LLMs to Amazon SageMaker Inference, improving speed, scalability, and price-performance.
This post highlights how Cisco implemented new functionality and migrated existing workloads to Amazon SageMaker inference components for their industry-specific contact center use cases. By integrating generative AI, they can now analyze call transcripts to better understand customer pain points and improve agent productivity. Cisco has also implemented conversational AI experiences, including chatbots and virtual agents that can generate human-like responses, to automate personalized communications based on customer context. Additionally, they are using generative AI to extract key call drivers, optimize agent workflows, and gain deeper insights into customer sentiment. Cisco's adoption of SageMaker Inference has enabled them to streamline their contact center operations and provide more satisfying, personalized interactions that address customer needs.
In this post, we discuss the following:
- Cisco's business use cases and outcomes
- How Cisco accelerated the use of generative AI powered by LLMs for their contact center use cases with the help of SageMaker Inference
- Cisco's generative AI inference architecture, which is built as a robust and secure foundation, using various services and tools such as SageMaker Inference, Amazon Bedrock, Kubernetes, Prometheus, Grafana, and more
- How Cisco uses an LLM router and auto scaling to route requests to appropriate LLMs for different tasks while simultaneously scaling their models for resiliency and performance efficiency
- How the solutions in this post impacted Cisco's business roadmap and strategic partnership with AWS
- How Cisco helped SageMaker Inference build new capabilities to deploy generative AI applications at scale
Enhancing collaboration and customer engagement with generative AI: Webex's AI-powered solutions
In this section, we discuss Cisco's AI-powered use cases.
Meeting summaries and insights
For Webex Meetings, the platform uses generative AI to automatically summarize meeting recordings and transcripts. This extracts the key takeaways and action items, helping distributed teams stay informed even if they missed a live session. The AI-generated summaries provide a concise overview of important discussions and decisions, allowing employees to quickly get up to speed. Beyond summaries, Webex's generative AI capabilities also surface intelligent insights from meeting content. This includes identifying action items, highlighting critical decisions, and generating personalized meeting notes and to-do lists for each participant. These insights help make meetings more productive and hold attendees accountable.
Improving contact center experiences
Webex is also applying generative AI to its contact center solutions, enabling more natural, human-like conversations between customers and agents. The AI can generate contextual, empathetic responses to customer inquiries, as well as automatically draft personalized emails and chat messages. This helps contact center agents work more efficiently while maintaining a high level of customer service.
Webex customers realize positive outcomes with generative AI
Webex's adoption of generative AI is driving tangible benefits for customers. Clients using the platform's AI-powered meeting summaries and insights have reported productivity gains. Webex customers using the platform's generative AI for contact centers have handled hundreds of thousands of calls with improved customer satisfaction and reduced handle times, enabling more natural, empathetic conversations between agents and customers. Webex's strategic integration of generative AI is empowering users to work smarter and deliver exceptional experiences.
For more details on how Webex is harnessing generative AI to enhance collaboration and customer engagement, see Webex | Exceptional Experiences for Every Interaction on the Webex blog.
Using SageMaker Inference to optimize resources for Cisco
Cisco's WxAI team is dedicated to delivering advanced collaboration experiences powered by cutting-edge ML. The team develops a comprehensive suite of AI and ML features for the Webex ecosystem, including audio intelligence capabilities like noise removal and speaker voice optimization, language intelligence for transcription and translation, and video intelligence features like virtual backgrounds. At the forefront of WxAI's innovations is the AI-powered Webex Assistant, a virtual assistant that provides voice-activated control and seamless meeting support in multiple languages. To build these sophisticated capabilities, WxAI uses LLMs, which can contain up to hundreds of gigabytes of training data.
Initially, WxAI embedded LLMs directly into the application container images running on Amazon Elastic Kubernetes Service (Amazon EKS). However, as the models grew larger and more complex, this approach faced significant scalability and resource utilization challenges. Running the resource-intensive LLMs through the applications required provisioning substantial compute resources, which slowed down processes like allocating resources and starting applications. This inefficiency hampered WxAI's ability to rapidly develop, test, and deploy new AI-powered features for the Webex portfolio. To address these challenges, the WxAI team turned to SageMaker Inference, a fully managed AI inference service that allows seamless deployment and scaling of models independently from the applications that use them. By decoupling the LLM hosting from the Webex applications, WxAI could provision the necessary compute resources for the models without impacting the core collaboration and communication features.
"The applications and the models work and scale fundamentally differently, with entirely different cost considerations; by separating them rather than lumping them together, it's much simpler to solve issues independently."
– Travis Mehlinger, Principal Engineer at Cisco
This architectural shift has enabled Webex to harness the power of generative AI across its suite of collaboration and customer engagement solutions.
Solution overview: Improving efficiency and reducing costs by migrating to SageMaker Inference
To address the scalability and resource utilization challenges they faced when embedding LLMs directly into their applications, the WxAI team migrated to SageMaker Inference. By taking advantage of this fully managed service for deploying LLMs, Cisco unlocked significant performance and cost-optimization opportunities. Key benefits include the ability to deploy multiple LLMs behind a single endpoint for faster scaling and improved response latencies, as well as cost savings. Additionally, the WxAI team implemented an LLM proxy to simplify access to LLMs for Webex teams, enable centralized data collection, and reduce operational overhead. With SageMaker Inference, Cisco can efficiently manage and scale their LLM deployments, harnessing the power of generative AI across the Webex portfolio while maintaining optimal performance, scalability, and cost-effectiveness.
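To illustrate the "multiple LLMs behind a single endpoint" pattern, the following is a minimal sketch using the SageMaker inference components API. The endpoint, model, and component names, instance type, and resource figures are hypothetical placeholders for illustration, not Cisco's actual configuration.

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-2")

# The endpoint config and endpoint define the shared compute fleet;
# individual models are attached afterward as inference components.
sm.create_endpoint_config(
    EndpointConfigName="wxai-llm-endpoint-config",  # hypothetical name
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "InstanceType": "ml.g5.12xlarge",
        "InitialInstanceCount": 1,
    }],
)
sm.create_endpoint(
    EndpointName="wxai-llm-endpoint",
    EndpointConfigName="wxai-llm-endpoint-config",
)

# Attach a model (previously registered with create_model) as an
# inference component with its own compute reservation and copy count.
# Additional models can be attached to the same endpoint the same way.
sm.create_inference_component(
    InferenceComponentName="call-driver-flan-t5",  # hypothetical name
    EndpointName="wxai-llm-endpoint",
    VariantName="AllTraffic",
    Specification={
        "ModelName": "flan-t5-call-driver",  # hypothetical model name
        "ComputeResourceRequirements": {
            "NumberOfAcceleratorDevicesRequired": 1,
            "MinMemoryRequiredInMb": 16384,
        },
    },
    RuntimeConfig={"CopyCount": 1},
)
```

Because each inference component carries its own resource requirements and copy count, several models can share one endpoint's fleet while scaling independently.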
The following diagram illustrates the WxAI architecture on AWS.
The architecture is built on a robust and secure AWS foundation:
- The architecture uses AWS services like Application Load Balancer, AWS WAF, and EKS clusters for seamless ingress, threat mitigation, and containerized workload management.
- The LLM proxy (a microservice deployed on an EKS pod as part of the Service VPC) simplifies the integration of LLMs for Webex teams, providing a streamlined interface and reducing operational overhead. The LLM proxy supports LLM deployments on SageMaker Inference, Amazon Bedrock, or other LLM providers for Webex teams (see the routing sketch after this list).
- The architecture uses SageMaker Inference for optimized model deployment, auto scaling, and routing mechanisms.
- The system integrates Loki for logging, Amazon Managed Service for Prometheus for metrics, and Grafana for unified visualization, seamlessly integrated with Cisco SSO.
- The Data VPC houses the data layer components, including Amazon ElastiCache for caching and Amazon Relational Database Service (Amazon RDS) for database services, providing efficient data access and management.
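As a rough illustration of the proxy's routing role, this minimal sketch dispatches a request to either a SageMaker endpoint or Amazon Bedrock based on a model registry. The registry contents, model names, and endpoint names are assumptions for illustration; Cisco's actual proxy implementation is not public.

```python
import json
import boto3

smr = boto3.client("sagemaker-runtime")
bedrock = boto3.client("bedrock-runtime")

# Hypothetical registry mapping logical model names to providers.
MODEL_REGISTRY = {
    "call-driver": {
        "provider": "sagemaker",
        "endpoint": "wxai-llm-endpoint",
        "component": "call-driver-flan-t5",
    },
    "general-chat": {
        "provider": "bedrock",
        "model_id": "anthropic.claude-3-sonnet-20240229-v1:0",
    },
}

def invoke_llm(model_name: str, prompt: str) -> str:
    """Route a prompt to whichever provider hosts the requested model."""
    target = MODEL_REGISTRY[model_name]
    if target["provider"] == "sagemaker":
        # Invoke a specific inference component on a shared endpoint.
        resp = smr.invoke_endpoint(
            EndpointName=target["endpoint"],
            InferenceComponentName=target["component"],
            ContentType="application/json",
            Body=json.dumps({"inputs": prompt}),
        )
        return resp["Body"].read().decode("utf-8")
    # Otherwise, call an externally hosted model on Amazon Bedrock.
    resp = bedrock.invoke_model(
        modelId=target["model_id"],
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 512,
            "messages": [{"role": "user", "content": prompt}],
        }),
    )
    return json.loads(resp["body"].read())["content"][0]["text"]
```

Centralizing this dispatch in one service is also what enables the centralized logging and data collection mentioned above, since every LLM call flows through a single choke point.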
Use case overview: Contact center Topic Analytics
A key focus area for the WxAI team is to enhance the capabilities of the Webex Contact Center platform. A typical Webex Contact Center installation has hundreds of agents handling many interactions through various channels like phone calls and digital channels. Webex's AI-powered Topic Analytics feature extracts the key reasons customers are calling by analyzing aggregated historical interactions and clustering them into meaningful topic categories, as shown in the following screenshot. The contact center administrator can then use these insights to optimize operations, enhance agent performance, and ultimately deliver a more satisfying customer experience.
The Topic Analytics feature is powered by a pipeline of three models: a call driver extraction model, a topic clustering model, and a topic labeling model, as illustrated in the following diagram.
The model details are as follows:
- Call driver extraction – This generative model summarizes the primary reason or intent (referred to as the call driver) behind a customer's call. Accurate automatic tagging of calls with call drivers helps contact center supervisors and administrators quickly understand the primary reason for any historical call. One of the key considerations when solving this problem was selecting the right model to balance quality and operational costs. The WxAI team chose the FLAN-T5 model on SageMaker Inference and instruction fine-tuned it for extracting call drivers from call transcripts. FLAN-T5 is a powerful text-to-text transfer transformer model that performs various natural language understanding and generation tasks. This workload had a global footprint, deployed in the us-east-2, eu-west-2, eu-central-1, ap-southeast-1, ap-southeast-2, ap-northeast-1, and ca-central-1 AWS Regions.
- Topic clustering – Although automatically tagging every contact center interaction with its call driver is a useful feature in itself, analyzing these call drivers in an aggregated fashion over a large batch of calls can uncover even more interesting trends and insights. The topic clustering model achieves this by clustering all the individually extracted call drivers from a large batch of calls into different topic clusters. It does this by creating a semantic embedding for each call driver and employing an unsupervised hierarchical clustering technique that operates on the vector embeddings (see the sketch after this list). This results in distinct and coherent topic clusters where semantically similar call drivers are grouped together.
- Topic labeling – The topic labeling model is a generative model that creates a descriptive name to serve as the label for each topic cluster. Several LLMs were prompt-tuned and evaluated in a few-shot setting to choose the best model for the label generation task. Ultimately, Llama2-13b-chat, with its ability to better capture contextual nuances and semantics of natural language conversation, was chosen for its accuracy, performance, and cost-effectiveness. Additionally, Llama2-13b-chat was deployed and used on SageMaker inference components while maintaining relatively low operating costs compared to other LLMs, by using specific hardware like g4dn and g5 instances.
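To make the clustering stage concrete, here is a minimal sketch of embedding call drivers and grouping them hierarchically. It assumes the sentence-transformers and scikit-learn libraries, a toy list of call drivers, and an arbitrary embedding model and distance threshold; the actual WxAI model and parameters are not public.

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import AgglomerativeClustering

# Toy call drivers; in production these come from the extraction model.
call_drivers = [
    "cancel my subscription",
    "close my account",
    "update billing address",
    "change payment method",
    "internet connection keeps dropping",
    "wifi is unstable",
]

# Create a semantic embedding for each call driver.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = embedder.encode(call_drivers)

# Unsupervised hierarchical clustering on the embeddings; a distance
# threshold (rather than a fixed cluster count) controls granularity.
clusterer = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.7,  # assumed value; tune per dataset
    metric="cosine",
    linkage="average",
)
labels = clusterer.fit_predict(embeddings)

for driver, label in zip(call_drivers, labels):
    print(f"cluster {label}: {driver}")
```

Running this groups the account-related and connectivity-related drivers into separate clusters, which the topic labeling model would then name.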
This solution also used the auto scaling capabilities of SageMaker to dynamically adjust the number of instances per endpoint, based on a desired minimum of 1 and maximum of 30. This approach provides efficient resource utilization while maintaining high throughput, allowing the WxAI platform to handle batch jobs overnight and scale to hundreds of inferences per minute during peak hours. By deploying the model on SageMaker Inference with auto scaling, the WxAI team was able to deliver reliable and accurate responses to customer interactions for their Topic Analytics use case.
By accurately pinpointing the call driver, the system can suggest appropriate actions, resources, and next steps to the agent, streamlining the customer support process and leading to more personalized and accurate responses to customer questions.
To handle fluctuating demand and optimize resource utilization, the WxAI team implemented auto scaling for their SageMaker Inference endpoints. They configured the endpoints to scale from a minimum to a maximum instance count based on GPU utilization. Additionally, the LLM proxy routed requests between the different LLMs deployed on SageMaker Inference. This proxy abstracts the complexities of communicating with various LLM providers and enables centralized data collection and analysis. This led to enhanced generative AI workflows, optimized latency, and personalized use case implementations.
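The following is a minimal sketch of how such a policy could be configured with Application Auto Scaling, target-tracking a GPU utilization metric from CloudWatch. The endpoint and variant names, target value, and cooldowns are illustrative assumptions; only the 1-to-30 instance range comes from the figures above.

```python
import boto3

aas = boto3.client("application-autoscaling")

# Scalable dimension: the instance count of a SageMaker endpoint variant.
resource_id = "endpoint/wxai-llm-endpoint/variant/AllTraffic"  # hypothetical

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=30,
)

# Target-tracking policy on GPU utilization, which SageMaker endpoints
# publish to the /aws/sagemaker/Endpoints CloudWatch namespace.
aas.put_scaling_policy(
    PolicyName="gpu-utilization-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 60.0,  # assumed target GPU utilization (%)
        "CustomizedMetricSpecification": {
            "MetricName": "GPUUtilization",
            "Namespace": "/aws/sagemaker/Endpoints",
            "Dimensions": [
                {"Name": "EndpointName", "Value": "wxai-llm-endpoint"},
                {"Name": "VariantName", "Value": "AllTraffic"},
            ],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 300,  # scale in slowly to ride out lulls
        "ScaleOutCooldown": 60,  # scale out quickly to absorb spikes
    },
)
```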
Benefits
Through the strategic adoption of AWS AI services, Cisco's WxAI team has realized significant benefits, enabling them to build cutting-edge, AI-powered collaboration capabilities more rapidly and cost-effectively:
- Improved development and deployment cycle time – By decoupling models from applications, the team has streamlined processes like bug fixes, integration testing, and feature rollouts across environments, accelerating their overall development velocity.
- Simplified engineering and delivery – The clear separation of concerns between the lean application layer and the resource-intensive model layer has simplified engineering efforts and delivery, allowing the team to focus on innovation rather than infrastructure complexities.
- Reduced costs – By using fully managed services like SageMaker Inference, the team has offloaded infrastructure management overhead. Additionally, capabilities like asynchronous inference and multi-model endpoints have enabled significant cost optimization without compromising performance or availability.
- Scalability and performance – Services like SageMaker Inference and Amazon Bedrock, combined with technologies like NVIDIA Triton Inference Server on SageMaker, have empowered the WxAI team to scale their AI/ML workloads reliably and deliver high-performance inference for demanding use cases.
- Accelerated innovation – The partnership with AWS has given the WxAI team access to cutting-edge AI services and expertise, enabling them to rapidly prototype and deploy innovative capabilities like the AI-powered Webex Assistant and advanced contact center AI features.
Cisco's contributions to SageMaker Inference: Enhancing generative AI inference capabilities
Building on the success of their strategic migration to SageMaker Inference, Cisco has been instrumental in partnering with the SageMaker Inference team to build and enhance key generative AI capabilities within the SageMaker platform. Since the early days of generative AI, Cisco has provided the SageMaker Inference team with valuable input and expertise, enabling the introduction of several new features and optimizations:
- Cost and performance optimizations for generative AI inference – Cisco helped the SageMaker Inference team develop innovative techniques to optimize the use of accelerators, enabling SageMaker Inference to reduce foundation model (FM) deployment costs by 50% on average and latency by 20% on average with inference components. This breakthrough delivers significant cost savings and performance improvements for customers running generative AI workloads on SageMaker.
- Scaling improvements for generative AI inference – Cisco's expertise in distributed systems and auto scaling has also helped the SageMaker team develop advanced capabilities to better handle the scaling requirements of generative AI models. These improvements reduce auto scaling times by up to 40% and speed up auto scaling detection by six times, so customers can rapidly scale their generative AI workloads on SageMaker to meet spikes in demand without compromising performance.
- Streamlined generative AI model deployment for inference – Recognizing the need for simplified generative AI model deployment, Cisco collaborated with AWS to introduce the ability to deploy open source LLMs and FMs with just a few clicks. This user-friendly functionality removes the complexity traditionally associated with deploying these advanced models, empowering more customers to harness the power of generative AI.
- Simplified inference deployment for Kubernetes customers – Cisco's deep expertise in Kubernetes and container technologies helped the SageMaker team develop new Kubernetes Operator-based inference capabilities. These innovations make it straightforward for customers running applications on Kubernetes to deploy and manage generative AI models, reducing LLM deployment costs by 50% on average.
- Using NVIDIA Triton Inference Server for generative AI – Cisco worked with AWS to integrate the NVIDIA Triton Inference Server, a high-performance model serving container managed by SageMaker, to power generative AI inference on SageMaker Inference. This enabled the WxAI team to scale their AI/ML workloads reliably and deliver high-performance inference for demanding generative AI use cases.
- Packaging generative AI models more efficiently – To further simplify the generative AI model lifecycle, Cisco worked with AWS to enhance SageMaker's capabilities for packaging LLMs and FMs for deployment. These enhancements make it straightforward to prepare and deploy these generative AI models, accelerating their adoption and integration.
- Improved documentation for generative AI – Recognizing the importance of comprehensive documentation to support the growing generative AI ecosystem, Cisco collaborated with the AWS team to enhance the SageMaker documentation. This includes detailed guides, best practices, and reference materials tailored specifically to generative AI use cases, helping customers quickly ramp up their generative AI initiatives on the SageMaker platform.
By closely partnering with the SageMaker Inference team, Cisco has played a pivotal role in driving the rapid evolution of generative AI inference capabilities in SageMaker. The features and optimizations introduced through this collaboration are empowering AWS customers to unlock the transformative potential of generative AI with greater ease, cost-effectiveness, and performance.
"Our partnership with the SageMaker Inference product team goes back to the early days of generative AI, and we believe the features we've built in collaboration, from cost optimizations to high-performance model deployment, will broadly help other enterprises rapidly adopt and scale generative AI workloads on SageMaker, unlocking new frontiers of innovation and business transformation."
– Travis Mehlinger, Principal Engineer at Cisco
Conclusion
By using AWS services like SageMaker Inference and Amazon Bedrock for generative AI, Cisco's WxAI team has been able to optimize their AI/ML infrastructure, enabling them to build and deploy AI-powered solutions more efficiently, reliably, and cost-effectively. This strategic approach has unlocked significant benefits for Cisco in deploying and scaling its generative AI capabilities for the Webex platform. Cisco's own journey with generative AI, as showcased in this post, offers valuable lessons and insights for other users of SageMaker Inference.
Recognizing the impact of generative AI, Cisco has played a crucial role in shaping the future of these capabilities within SageMaker Inference. By providing valuable insights and hands-on collaboration, Cisco has helped AWS develop a range of powerful features that are making generative AI more accessible and scalable for organizations. From optimizing infrastructure costs and performance to streamlining model deployment and scaling, Cisco's contributions have been instrumental in enhancing the SageMaker Inference service.
Moving forward, the Cisco-AWS partnership aims to drive further advancements in areas like conversational and generative AI inference. As generative AI adoption accelerates across industries, Cisco's Webex platform is designed to scale and streamline user experiences through the various use cases discussed in this post and beyond. You can expect to see ongoing innovation from this collaboration in SageMaker Inference capabilities, as Cisco and SageMaker Inference continue to push the boundaries of what's possible in the world of AI.
For more information on Webex Contact Center's Topic Analytics feature and related AI capabilities, refer to The Webex Advantage: Navigating Customer Experience in the Age of AI on the Webex blog.
About the Authors
Travis Mehlinger is a Principal Software Engineer in the Webex Collaboration AI team, where he helps teams develop and operate cloud-centered AI and ML capabilities to support Webex AI features for customers around the world. In his spare time, Travis enjoys cooking barbecue, playing video games, and traveling around the US and UK to race go-karts.
Karthik Raghunathan is the Senior Director for Speech, Language, and Video AI in the Webex Collaboration AI Group. He leads a multidisciplinary team of software engineers, machine learning engineers, data scientists, computational linguists, and designers who develop advanced AI-driven features for the Webex collaboration portfolio. Prior to Cisco, Karthik held research positions at MindMeld (acquired by Cisco), Microsoft, and Stanford University.
Saurabh Trikande is a Senior Product Manager for Amazon SageMaker Inference. He is passionate about working with customers and is motivated by the goal of democratizing machine learning. He focuses on core challenges related to deploying complex ML applications, multi-tenant ML models, cost optimizations, and making deployment of deep learning models more accessible. In his spare time, Saurabh enjoys hiking, learning about innovative technologies, following TechCrunch, and spending time with his family.
Ravi Thakur is a Senior Solutions Architect at AWS, based in Charlotte, NC. He specializes in solving complex business challenges using distributed, cloud-centered, and well-architected patterns. Ravi's expertise includes microservices, containerization, AI/ML, and generative AI. He empowers AWS strategic customers on their digital transformation journeys, delivering bottom-line benefits. In his spare time, Ravi enjoys motorcycle rides, family time, reading, movies, and traveling.
Amit Arora is an AI and ML Specialist Architect at Amazon Web Services, helping enterprise customers use cloud-based machine learning services to rapidly scale their innovations. He is also an adjunct lecturer in the MS data science and analytics program at Georgetown University in Washington, D.C.
Madhur Prashant is an AI and ML Solutions Architect at Amazon Web Services. He is passionate about the intersection of human thinking and generative AI. His interests lie in generative AI, specifically building solutions that are helpful and harmless, and above all optimal for customers. Outside of work, he loves doing yoga, hiking, spending time with his twin, and playing the guitar.