This post is a collaboration between AWS and Pipecat.
Deploying intelligent voice agents that maintain natural, human-like conversations requires streaming to users where they are, across web, mobile, and phone channels, even under heavy traffic and unreliable network conditions. Even small delays can break the conversational flow, causing users to perceive the agent as unresponsive or unreliable. For use cases such as customer support, virtual assistants, and outbound campaigns, a natural flow is critical for user experience. In this series of posts, you'll learn how streaming architectures help address these challenges using Pipecat voice agents on Amazon Bedrock AgentCore Runtime.
In Part 1, you'll learn how to deploy Pipecat voice agents on AgentCore Runtime using different network transport approaches including WebSockets, WebRTC, and telephony integration, with practical deployment guidance and code samples.
Benefits of AgentCore Runtime for voice agents
Deploying real-time voice agents is challenging: you need low-latency streaming, strict isolation for security, and the ability to scale dynamically with unpredictable conversation volume. Without an appropriately designed architecture, you can experience audio jitter, scalability constraints, inflated costs due to over-provisioning, and increased complexity. For a deeper dive into voice agent architectures, including cascaded (STT → LLM → TTS) and speech-to-speech approaches, refer to our earlier post, Building real-time voice assistants with Amazon Nova Sonic compared to cascading architectures.
Amazon Bedrock AgentCore Runtime addresses these challenges by providing a secure, serverless environment for scaling dynamic AI agents. Each conversation session runs in an isolated microVM for security. It auto-scales for traffic spikes and handles continuous sessions of up to 8 hours, making it ideal for long, multi-turn voice interactions. It charges only for resources actively used, helping to minimize costs associated with idle infrastructure.
Pipecat, an agentic framework for building real-time voice AI pipelines, runs on AgentCore Runtime with minimal setup. Package your Pipecat voice pipeline as a container and deploy it directly to AgentCore Runtime. The runtime supports bidirectional streaming for real-time audio, and built-in observability to trace agent reasoning and tool calls.
AgentCore Runtime requires ARM64 (Graviton) containers, so make sure your Docker images are built for the linux/arm64 platform.
Streaming architectures for voice agents on AgentCore Runtime
This post assumes familiarity with common voice agent architectures: specifically the cascaded model approach, where you connect speech-to-text (STT) and text-to-speech (TTS) models in a pipeline, and the speech-to-speech model approach, like Amazon Nova Sonic. If you are new to these concepts, start with our earlier blog posts on the two foundational approaches, cascaded and speech-to-speech, before continuing.
When building voice agents, latency is a critical consideration, determining how natural and reliable a voice conversation feels. Conversations require near-instant responses, typically under one second end-to-end, to maintain a fluid, human-like rhythm.
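To make that one-second target concrete, you can sanity-check a design by summing per-stage latency budgets. The stage names and millisecond figures below are illustrative assumptions for a cascaded pipeline, not measured benchmarks:

```python
# Illustrative end-to-end latency budget for a cascaded voice pipeline.
# All numbers are assumed round figures, not benchmarks.
BUDGET_MS = {
    "client_to_agent": 80,   # first-hop network transport
    "stt_final": 200,        # speech-to-text finalization
    "llm_ttft": 350,         # time to first token from the LLM
    "tts_first_audio": 150,  # first synthesized audio chunk
    "agent_to_client": 80,   # return network hop
}

def total_latency_ms(budget: dict) -> int:
    """Sum the per-stage latencies to check against the ~1s target."""
    return sum(budget.values())

print(total_latency_ms(BUDGET_MS))  # 860, under the one-second target
```

If the total exceeds your target, the largest line items (typically LLM time-to-first-token) are the first candidates for optimization.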

To achieve low latency, you should consider bidirectional streaming on multiple paths, including:
- Client to Agent: Your voice agents will run on devices and applications, from web browsers and mobile apps to edge hardware, each with unique network conditions.
- Agent to Model: Your voice agents rely on bidirectional streaming to interact with speech models. Most speech models expose real-time WebSocket APIs, which your agent runtime or orchestration framework can consume for audio input and text or speech output. Model selection plays a key role in achieving natural responsiveness. Select models like Amazon Nova Sonic (or Amazon Nova Lite in a cascaded pipeline approach) that are optimized for latency and provide a fast Time-to-First-Token (TTFT).
- Telephony: For traditional inbound or outbound calls handled by contact centers or telephony systems, your voice agent must also integrate with a telephony provider. This is typically achieved through a handoff and/or Session Initiation Protocol (SIP) transfer, where the live audio stream is transferred from the telephony system to your agent runtime for processing.
In Part 1 of this series, we'll focus on the Client to Agent connection and how you can minimize the first-hop network latency from your edge device to your voice agent, and explore additional considerations for the other components of a voice agent architecture.
To illustrate these concepts, we'll explore four network transport approaches with considerations for:
- How users interface with your voice agents (web/mobile applications or phone calls)
- Performance consistency and resilience across variable network conditions
- Ease of implementation
| Approach | Description | Performance consistency | Ease of implementation | Suitable for |
|---|---|---|---|---|
| WebSockets | Web and mobile applications connect directly to your voice agents via WebSockets. | Good | Simple | Prototyping and lightweight use cases. |
| WebRTC (TURN-assisted) | Web and mobile applications connect directly to your voice agents via WebRTC. | Excellent | Medium | Production use cases, with latency derived from direct connection of the client to the runtime environment relayed via Traversal Using Relays around NAT (TURN) servers. |
| WebRTC (managed) | Web and mobile applications connect to your voice agents through sophisticated, globally distributed infrastructure via WebRTC. | Excellent (global distribution) | Simple | Production use cases, with latency optimization offloaded to specialized providers with globally distributed networks and media relays. Offers additional capabilities such as observability and multi-participant calls. |
| Telephony | Voice agents are accessed through traditional phone calls. | Excellent | Medium | Contact center and telephony use cases. Latency may depend on the telephony provider. |
Example approach: Using WebSockets bidirectional streaming
You can start with WebSockets as the simplest approach: it is natively supported by most clients and by AgentCore Runtime. Deploy Pipecat voice agents on AgentCore Runtime using persistent, bidirectional WebSocket connections for audio streaming between client devices and your agent logic.

The connection follows a straightforward three-step flow:
- Client requests a WebSocket endpoint: The client first sends a POST request to an intermediary server (/server) to obtain a secure WebSocket connection endpoint.
- Intermediary server handles AWS authentication: The intermediary server in the Pipecat pre-built frontend uses the AWS SDK to generate an AWS SigV4 pre-signed URL with embedded credentials as query parameters. For example: ?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=
- Client establishes direct connection: Using the authenticated pre-signed URL, the client connects directly to the agent on AgentCore Runtime and streams bidirectional audio, bypassing the intermediary server for subsequent communications.
You use Pipecat's WebSocket transport to expose an endpoint on the /ws path as required by AgentCore Runtime. The architecture separates credential management from agent logic, allowing secure client access without exposing AWS credentials directly to browser applications.
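For illustration, the pre-signing step can be sketched with only the Python standard library. The host, path, service name, and credentials below are placeholder assumptions; the Pipecat sample itself uses the AWS SDK for this:

```python
import datetime
import hashlib
import hmac
import urllib.parse

def presign_ws_url(host, path, region, access_key, secret_key, expires=300):
    """Build a SigV4 query-string pre-signed URL for a wss:// endpoint.

    Stdlib-only sketch: the service name below is an assumption for
    illustration; in practice, generate this with the AWS SDK.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    datestamp = now.strftime("%Y%m%d")
    service = "bedrock-agentcore"  # assumed service name
    scope = f"{datestamp}/{region}/{service}/aws4_request"
    params = {
        "X-Amz-Algorithm": "AWS4-HMAC-SHA256",
        "X-Amz-Credential": f"{access_key}/{scope}",
        "X-Amz-Date": amz_date,
        "X-Amz-Expires": str(expires),
        "X-Amz-SignedHeaders": "host",
    }
    canonical_qs = "&".join(
        f"{urllib.parse.quote(k, safe='')}={urllib.parse.quote(v, safe='')}"
        for k, v in sorted(params.items())
    )
    empty_payload_hash = hashlib.sha256(b"").hexdigest()
    canonical_request = "\n".join(
        ["GET", path, canonical_qs, f"host:{host}\n", "host", empty_payload_hash]
    )
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256", amz_date, scope,
        hashlib.sha256(canonical_request.encode()).hexdigest(),
    ])
    def _hmac(key, msg):
        return hmac.new(key, msg.encode(), hashlib.sha256).digest()
    key = _hmac(("AWS4" + secret_key).encode(), datestamp)
    for part in (region, service, "aws4_request"):
        key = _hmac(key, part)
    signature = hmac.new(key, string_to_sign.encode(), hashlib.sha256).hexdigest()
    return f"wss://{host}{path}?{canonical_qs}&X-Amz-Signature={signature}"
```

The important design point survives any SDK choice: only the intermediary server holds the secret key, and the browser receives a short-lived URL it can use directly.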
To learn more, try the Pipecat on AgentCore code sample using WebSockets transport.
Example approach: Using WebRTC bidirectional streaming with TURN support
While WebSockets works for simple deployments, WebRTC can offer improved performance. It's designed to deliver audio over a fast, lightweight network path that minimizes delay. It typically uses UDP for its low latency and smoother real-time experience, and provides improved resilience across variable network conditions. If UDP is not available, WebRTC automatically falls back to TCP, which is more reliable but can introduce slight delays: less ideal for voice, but useful when connectivity is limited. This reliability comes from Interactive Connectivity Establishment (ICE), which negotiates direct peer-to-peer paths through NATs and firewalls, falling back to streaming media relayed via Traversal Using Relays around NAT (TURN) servers when direct connections can't be made.
Pipecat supports SmallWebRTCTransport for direct peer-to-peer WebRTC connections between clients and agents on AgentCore Runtime. Compared to comprehensive WebRTC architectures requiring dedicated media servers (such as Selective Forwarding Units, or SFUs), this lightweight transport can run directly inside AgentCore Runtime, removing the need for complex media server management.

In this scenario, the connection flow operates as follows:
- Signaling: The client sends a Session Description Protocol (SDP) offer to the intermediary server, which forwards it to the /invoke/ endpoint in AgentCore Runtime. The agent's @app.entrypoint handler processes the offer and returns an SDP answer containing media capabilities and network candidates.
- Connectivity establishment: To establish a direct connection, both the client and the agent use the Interactive Connectivity Establishment (ICE) protocol to discover the optimal network path. AgentCore Runtime supports Traversal Using Relays around NAT (TURN) relayed connections. The protocol attempts connectivity in this order:
- Direct connection: Connect peer-to-peer using local network addresses. This path is not supported on AgentCore Runtime, because the runtime environment can't be assigned a public IP address.
- Session Traversal Utilities for NAT (STUN) assisted connection: Use a STUN server to discover the public IP/port behind Network Address Translation (NAT) and attempt direct connectivity. This path requires both inbound and outbound UDP traffic, which isn't currently supported: AWS NAT Gateways use symmetric NAT, which prevents STUN-based direct connectivity from succeeding.
- Traversal Using Relays around NAT (TURN) relayed connection: Route media through a TURN relay server. Configure TURN using managed services (such as Cloudflare or Twilio), Amazon Kinesis Video Streams (KVS), or self-hosted solutions (such as coturn in your VPC). This path is recommended on AgentCore Runtime configured with a VPC (see details below).
- Connection through VPC: Once connectivity is established, traffic routes from the client to the runtime environment via the VPC (more details in the following section).
To learn more, try the Pipecat on AgentCore code sample using WebRTC transport.
Configuring AgentCore Runtime on VPC for WebRTC connectivity
The code sample demonstrates a simple voice agent using WebRTC. First, you configure the ICE_SERVER_URLS environment variable in both: 1) the intermediary server in the Pipecat pre-built frontend (/server) and 2) the runtime environment (/agent). This allows bidirectional traffic between them.
Next, you deploy your agents to AgentCore Runtime with VPC networking configured to allow UDP transport to TURN servers. For security, you expose the runtime in a private VPC subnet, with a NAT Gateway in the public subnet to route internet access, as illustrated below.

With this approach, you can configure ICE servers for full WebRTC connectivity, with STUN and TURN over UDP plus TCP fallback. For example, you can configure Cloudflare managed TURN as follows:
# Configure agent/.env and server/.env
ICE_SERVER_URLS=stun:stun.cloudflare.com,turn:turn.cloudflare.com:53,turn:turn.cloudflare.com:3478,turn:turn.cloudflare.com:5349
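A small helper can expand an ICE_SERVER_URLS value like the one above into the iceServers structure WebRTC clients expect. This is a stdlib-only sketch; the TURN_USERNAME and TURN_PASSWORD variable names are assumptions for illustration, since TURN servers typically also require credentials:

```python
import os

def parse_ice_servers(raw=None):
    """Turn a comma-separated ICE_SERVER_URLS value into a list of
    WebRTC-style iceServers entries.

    For turn:/turns: URLs, credentials are attached from the assumed
    TURN_USERNAME / TURN_PASSWORD environment variables when present.
    """
    raw = raw if raw is not None else os.environ.get("ICE_SERVER_URLS", "")
    servers = []
    for url in filter(None, (u.strip() for u in raw.split(","))):
        entry = {"urls": url}
        if url.startswith(("turn:", "turns:")):
            user = os.environ.get("TURN_USERNAME")
            password = os.environ.get("TURN_PASSWORD")
            if user and password:
                entry.update(username=user, credential=password)
        servers.append(entry)
    return servers
```

Running the same parsing logic in both the intermediary server and the agent keeps the two ends of the connection negotiating against an identical ICE server list.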
Using AWS-native TURN with Amazon Kinesis Video Streams (KVS)
For a fully AWS-native alternative to managed TURN services, Amazon Kinesis Video Streams (KVS) handles TURN infrastructure without third-party dependencies. It provides temporary, auto-rotating TURN credentials via the GetIceServerConfig API. The flow works as follows:
- One-time setup: Create a KVS signaling channel. The channel is used only for TURN credential provisioning; your agent continues to use Pipecat's WebRTC transport for signaling and media.
- At connection time: Your agent calls GetSignalingChannelEndpoint to get the HTTPS endpoint, then calls GetIceServerConfig to retrieve temporary TURN credentials (URIs, username, password).
- Configure the peer connection: Pass the returned credentials to your RTCPeerConnection as ICE servers. TURN traffic then flows through KVS-managed infrastructure.
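A sketch of that flow using boto3 is shown below. GetSignalingChannelEndpoint and GetIceServerConfig are the real KVS operations; the helper structure and the peer role choice are assumptions for illustration:

```python
def fetch_kvs_ice_servers(channel_arn, client_id, region):
    """Sketch: retrieve temporary TURN credentials from Amazon KVS.

    Requires boto3 and AWS credentials at call time; the import is
    deferred so the pure helper below stays stdlib-only.
    """
    import boto3
    kvs = boto3.client("kinesisvideo", region_name=region)
    endpoints = kvs.get_signaling_channel_endpoint(
        ChannelARN=channel_arn,
        SingleMasterChannelEndpointConfiguration={
            "Protocols": ["HTTPS"],
            "Role": "VIEWER",  # assumed role for the connecting peer
        },
    )["ResourceEndpointList"]
    https_endpoint = next(
        e["ResourceEndpoint"] for e in endpoints if e["Protocol"] == "HTTPS"
    )
    signaling = boto3.client(
        "kinesis-video-signaling", endpoint_url=https_endpoint, region_name=region
    )
    response = signaling.get_ice_server_config(
        ChannelARN=channel_arn, ClientId=client_id
    )
    return to_rtc_ice_servers(response)

def to_rtc_ice_servers(response):
    """Convert a GetIceServerConfig response into WebRTC iceServers entries."""
    return [
        {"urls": s["Uris"], "username": s["Username"], "credential": s["Password"]}
        for s in response["IceServerList"]
    ]
```

Because the credentials are short-lived, fetch them per connection rather than caching them across sessions.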
Considerations when using KVS managed TURN
| Factor | KVS managed TURN | Third-party TURN |
|---|---|---|
| AWS native | Yes (no external dependency) | No (requires external account) |
| Credential management | Automatic rotation | Manual or provider-managed |
| Setup | Create signaling channel + API calls | Configure environment variables |
| Best for | AWS-centric deployments | Simplicity or existing provider relationships |
Additional considerations:
- Cost: Each active signaling channel costs $0.03/month. At low to moderate volume, this is negligible.
- Rate limit: GetIceServerConfig is limited to 5 transactions per second (TPS) per channel. For high-volume deployments exceeding 100,000 sessions per month, implement a channel pooling strategy where you distribute requests across multiple channels: channels_needed = ceil(peak_new_sessions_per_second / 5).
- No PrivateLink: The VPC still requires internet egress (via NAT Gateway) to reach KVS TURN endpoints.
- Credential lifetime: KVS TURN credentials are temporary and auto-rotated, so you don't need to manage credential rotation.
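The pooling formula in the rate-limit bullet translates directly to code:

```python
import math

def channels_needed(peak_new_sessions_per_second: float, tps_limit: int = 5) -> int:
    """Number of KVS signaling channels to pool so that credential
    requests stay under the per-channel GetIceServerConfig rate limit."""
    return math.ceil(peak_new_sessions_per_second / tps_limit)

print(channels_needed(12))  # 3 channels cover a peak of 12 new sessions/second
```

At connection time, pick a channel round-robin (or at random) from the pool so no single channel absorbs the full request rate.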
To learn more, try the code sample using KVS managed TURN.
Example approach: Using managed WebRTC on AWS Marketplace

While direct WebRTC offers control, managed WebRTC providers commonly provide TURN servers and globally distributed SFUs to facilitate reliable connectivity and low-latency media routing. They also provide additional features such as built-in analytics and observability, and support for multi-participant rooms beyond 1:1 agent conversations. For production voice agents at scale, consider managed providers available on AWS Marketplace, such as Daily. Daily runs its globally distributed WebRTC infrastructure on AWS, offering multiple deployment models:
- Fully managed SaaS: You connect to Daily's hosted infrastructure via public API endpoints. This is ideal for rapid deployment and environments where operational simplicity is prioritized. In this scenario, your agent in AgentCore Runtime can simply connect to the managed WebRTC infrastructure via the public internet.
- Customer VPC deployment: You deploy Daily's media servers directly into your VPC for full network control and compliance with strict data residency requirements. In this scenario, you configure AgentCore Runtime for VPC as outlined above.
- SaaS with AWS PrivateLink: You connect to Daily's hosted infrastructure and configure AWS PrivateLink so that traffic flows through VPC endpoints directly to Daily's managed infrastructure without traversing the public internet, reducing latency while keeping traffic isolated on the AWS backbone network. In this scenario, you configure AgentCore Runtime for VPC as outlined above.
To learn more, contact your AWS account team to explore Daily on AWS Marketplace, or try the code sample using the Daily transport and its DAILY_API_KEY with the fully managed SaaS option.
Example approach: Using a telephony provider

While WebRTC excels for web and mobile channels, telephony handoff enables traditional Public Switched Telephone Network (PSTN) integration for contact centers, IVR replacement, and outbound campaigns. For real-time conversation, your agent runtime must maintain a persistent, bidirectional audio stream with your speech models, business logic, and telephony provider. These providers offer managed voice services that handle the complexity of traditional telephony infrastructure through simple APIs. Depending on the capabilities of the telephony provider, you integrate with them using either Session Initiation Protocol (SIP) or streaming WebSocket or WebRTC protocols. Pipecat transports and serializers provide connectors for the implementation.
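As one concrete illustration of the WebSocket streaming path, a provider such as Twilio delivers call audio as base64-encoded payloads inside JSON frames on its Media Streams WebSocket. The helper below is a hypothetical stdlib-only decoder for such frames, not part of Pipecat's serializers:

```python
import base64
import json
from typing import Optional

def decode_media_frame(message: str) -> Optional[bytes]:
    """Decode one Twilio Media Streams WebSocket frame into raw audio.

    Media frames carry 8 kHz mu-law audio, base64-encoded in
    frame["media"]["payload"]; other events ("start", "stop", "mark")
    carry no audio and return None here.
    """
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None
    return base64.b64decode(frame["media"]["payload"])
```

In practice, Pipecat's telephony serializers handle this decoding (and the reverse encoding for outbound audio) so that your pipeline only sees raw audio frames.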
To learn more, see the Pipecat Guide on Telephony and Building AI-Powered Voice Applications: Telephony Integration Guide.
Conclusion
AgentCore Runtime provides secure, serverless infrastructure to scale voice agents reliably. In this post, you learned how low latency is critical for natural conversations, and key considerations for different transport modes: WebSockets, TURN-assisted WebRTC, managed WebRTC, and telephony integrations, based on your latency, reliability, and usage requirements. When evaluating transport options, start simple with WebSockets for rapid prototyping, then consider WebRTC with AgentCore in VPC mode or managed providers for production deployments. If your voice agents will handle telephony or contact center use cases, consider the available integrations with telephony providers for your implementation.
In Part 2 of this series, you'll explore additional considerations beyond network transport: covering streaming strategies across agent-to-model communication, tool execution, memory, and retrieval to achieve optimal end-to-end latency.
Get started with the Pipecat on AgentCore code samples and hands-on workshop, and select the transport layer that fits your use case.
For teams that prefer more infrastructure control, the Guidance for Building Voice Agents on AWS on Amazon ECS is also available as a containerized deployment option.
Additional resources
About the authors
Kwindla Hultman Kramer is the Co-founder and CEO at Daily, pioneering low-latency real-time voice, video, and multimodal AI infrastructure. A leading voice AI thought leader, he created the open-source Pipecat framework for production voice agents and shares insights at voice AI meetups and on his X account (@kwindla).
Paul Kompfner is a Member of Technical Staff at Daily, where he is on the team that maintains the Pipecat open source framework. He is an expert in streaming infrastructure and voice-based agentic systems. He frequently collaborates with AWS and the voice AI ecosystem to deliver first-class support for voice models and hosting platforms, enabling scalable real-time voice AI on Pipecat.
Kosti Vasilakakis is a Principal PM at AWS on the Agentic AI team, where he has led the design and development of several Bedrock AgentCore services from the ground up, including Runtime. He previously worked on Amazon SageMaker since its early days, launching AI/ML capabilities now used by thousands of companies worldwide. Earlier in his career, Kosti was a data scientist. Outside of work, he builds personal productivity automations, plays tennis, and explores the wilderness with his family.
Lana Zhang is a Senior Solutions Architect on the AWS World Wide Specialist Organization AI Services team, specializing in AI and generative AI with a focus on use cases including content moderation and media analysis. She is dedicated to promoting AWS AI and generative AI solutions, demonstrating how generative AI can transform classic use cases by adding business value. She assists customers in transforming their business solutions across diverse industries, including social media, gaming, ecommerce, media, advertising, and marketing.
Sundar Raghavan is a Solutions Architect at AWS on the Agentic AI team. He shaped the developer experience for Amazon Bedrock AgentCore, contributing to the SDK, CLI, and starter toolkit, and now focuses on integrations with AI agent frameworks. Previously, Sundar worked as a Generative AI Specialist, helping customers design AI applications on Amazon Bedrock. In his free time, he loves exploring new places, sampling local eateries, and embracing the great outdoors.
Daniel Wirjo is a Solutions Architect at AWS, focused on AI and SaaS startups. As a former startup CTO, he enjoys collaborating with founders and engineering leaders to drive growth and innovation on AWS. Outside of work, Daniel enjoys taking walks with a coffee in hand, appreciating nature, and learning new ideas.

