
Claude Code deployment patterns and best practices with Amazon Bedrock

by admin
November 20, 2025
in Artificial Intelligence


Claude Code is an AI-powered coding assistant from Anthropic that helps developers write, review, and modify code through natural language interactions. Amazon Bedrock is a fully managed service that provides access to foundation models from leading AI companies through a single API. This post shows you how to deploy Claude Code with Amazon Bedrock. You'll learn authentication methods, infrastructure options, and monitoring strategies to deploy securely at enterprise scale.

Recommendations for most enterprises

We recommend the Guidance for Claude Code with Amazon Bedrock, which implements proven patterns that can be deployed in hours.

Deploy Claude Code with this proven stack:

This architecture provides secure access with user attribution, capacity management, and visibility into costs and developer productivity.

Authentication methods

Claude Code deployments begin with authenticating to Amazon Bedrock. The authentication decision affects downstream security, monitoring, operations, and developer experience.

Authentication methods comparison

| Feature | API keys | AWS login | SSO with IAM Identity Center | Direct IdP integration |
|---|---|---|---|---|
| Session duration | Indefinite | Configurable (up to 12 hours) | Configurable (up to 12 hours) | Configurable (up to 12 hours) |
| Setup time | Minutes | Minutes | Hours | Hours |
| Security risk | High | Low | Low | Low |
| User attribution | None | Basic | Basic | Full |
| MFA support | No | Yes | Yes | Yes |
| OpenTelemetry integration | None | Limited | Limited | Full |
| Cost allocation | None | Limited | Limited | Full |
| Operational overhead | High | Medium | Medium | Low |
| Use case | Short-term testing | Testing and limited deployments | Rapid SSO deployment | Production deployment |

The following sections discuss the trade-offs and implementation considerations laid out in the table above.

API keys

Amazon Bedrock supports API keys as the quickest path to proof-of-concept. Both short-term (12-hour) and long-term (indefinite) keys can be generated through the AWS Management Console, AWS CLI, or SDKs.

However, API keys create security vulnerabilities through persistent access without MFA, manual distribution requirements, and the risk of repository commits. They provide no user attribution for cost allocation or monitoring. Use them only for short-term testing (under 1 week, with 12-hour expiration).
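For a quick proof-of-concept, a key is typically supplied through an environment variable. A minimal sketch follows; the key value is a placeholder, and the `AWS_BEARER_TOKEN_BEDROCK` variable is the mechanism Amazon Bedrock API keys use:

```shell
# Proof-of-concept only: supply a short-term Bedrock API key via the
# environment (placeholder value shown), then point Claude Code at Bedrock.
export AWS_BEARER_TOKEN_BEDROCK=bedrock-api-key-...   # key generated in the console
export CLAUDE_CODE_USE_BEDROCK=1
export AWS_REGION=us-west-2
```

Because the key grants persistent access until it expires, treat this strictly as a throwaway test setup, never as a distribution mechanism.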

AWS login

The aws login command uses your AWS Management Console credentials for Amazon Bedrock access through a browser-based authentication flow. It supports quick setup without API keys and is recommended for testing and small deployments.

Single Sign-On (SSO)

AWS IAM Identity Center integrates with existing enterprise identity providers through OpenID Connect (OIDC), an authentication protocol that enables single sign-on by allowing identity providers to verify user identities and share authentication information with applications. This integration lets developers use corporate credentials to access Amazon Bedrock without distributing API keys.

Developers authenticate with AWS IAM Identity Center using the aws sso login command, which generates temporary credentials with configurable session durations. These credentials refresh automatically, reducing the operational overhead of credential management while improving security through temporary, time-limited access.

aws sso login --profile=your-profile-name 
export CLAUDE_CODE_USE_BEDROCK=1 
export AWS_PROFILE=your-profile-name

Organizations using IAM Identity Center for AWS access can extend this pattern to Claude Code. However, it limits detailed user-level monitoring because it does not expose OIDC JWT tokens for OpenTelemetry attribute extraction.

This authentication method suits organizations that prioritize rapid SSO deployment over detailed monitoring, or initial rollouts where comprehensive metrics aren't yet required.

Direct IdP integration

Direct OIDC federation with your identity provider (Okta, Azure AD, Auth0, or Amazon Cognito user pools) is recommended for production Claude Code deployments. This approach connects your enterprise identity provider directly to AWS IAM to generate temporary credentials with full user context for monitoring.

The process credential provider orchestrates the OAuth2 authentication with PKCE, a security extension that helps prevent authorization code interception. Developers authenticate in their browser, exchanging OIDC tokens for AWS temporary credentials.

A helper script uses AWS Security Token Service (STS) AssumeRoleWithWebIdentity to assume a role with credentials for InvokeModel and InvokeModelWithResponseStream to use Amazon Bedrock. Direct IAM federation supports session durations up to 12 hours, and the JWT token remains accessible throughout the session, enabling OpenTelemetry monitoring to track user attributes such as email, department, and team.
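The assumed role's permissions policy can then be scoped down to just those two invocation actions. A minimal sketch follows; the resource scope shown (all Anthropic foundation models) is illustrative and would be tightened to the specific models and inference profiles you allow:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ClaudeCodeInvokeOnly",
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": "arn:aws:bedrock:*::foundation-model/anthropic.*"
    }
  ]
}
```

Keeping the role this narrow means a leaked session credential can invoke models but cannot touch any other AWS resources.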

The Guidance for Claude Code with Amazon Bedrock implements both Amazon Cognito identity pool and direct IAM federation patterns, but recommends direct IAM for simplicity. The solution provides an interactive setup wizard that configures your OIDC provider integration, deploys the required IAM infrastructure, and builds distribution packages for Windows, macOS, and Linux.

Developers receive installation packages that configure their AWS CLI profile to use the credential process. Authentication occurs through corporate credentials, with the browser opening automatically to refresh credentials. The credential process handles token caching, credential refresh, and error recovery.
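The resulting AWS CLI profile looks something like the following sketch; the profile name, helper binary path, and Region are illustrative, and the actual distribution package wires these up for you:

```ini
# ~/.aws/config (illustrative)
[profile claude-code]
credential_process = /usr/local/bin/claude-code-credential-process
region = us-west-2
```

With `AWS_PROFILE=claude-code` set, every AWS SDK call Claude Code makes transparently invokes the credential process, which returns cached temporary credentials or opens the browser to refresh them.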

For organizations requiring detailed usage tracking, per-developer cost attribution, and comprehensive audit trails, direct IdP integration through IAM federation provides the foundation for the advanced monitoring capabilities discussed later in this post.

Organizational decisions

Beyond authentication, architectural decisions shape how Claude Code integrates with your AWS infrastructure. These choices affect operational complexity, cost management, and enforcement of usage policies.

Public endpoints

Amazon Bedrock provides managed, public API endpoints in multiple AWS Regions with minimal operational overhead. AWS manages infrastructure, scaling, availability, and security patching. Developers use standard AWS credentials through AWS CLI profiles or environment variables. Combined with OpenTelemetry metrics from direct IdP integration, you can track usage through public endpoints by individual developer, department, or cost center, and controls can be enforced at the AWS IAM level. For example, implementing per-developer rate limiting requires infrastructure that observes CloudWatch metrics or CloudTrail logs and takes automated action. Organizations requiring immediate, request-level blocking based on custom business logic may need additional components such as a large language model (LLM) gateway pattern. Public Amazon Bedrock endpoints are sufficient for most organizations because they provide a balance of simplicity, AWS-managed reliability, cost alerting, and appropriate control mechanisms.

LLM gateway

An LLM gateway introduces an intermediary application layer between developers and Amazon Bedrock, routing requests through custom infrastructure. The Guidance for Multi-Provider Generative AI Gateway on AWS describes this pattern, deploying a containerized proxy service with load balancing and centralized credential management.

This architecture is best for:

  • Multi-provider support: Routing between Amazon Bedrock, OpenAI, and Azure OpenAI based on availability, cost, or capability
  • Custom middleware: Proprietary prompt engineering, content filtering, or prompt injection detection at the request level
  • Request-level policy enforcement: Immediate blocking of requests that exceed custom business logic beyond IAM capabilities

Gateways provide unified APIs and real-time monitoring but add operational overhead: Amazon Elastic Container Service (Amazon ECS)/Amazon Elastic Kubernetes Service (Amazon EKS) infrastructure, Elastic Load Balancing (ELB) Application Load Balancers, Amazon ElastiCache, Amazon Relational Database Service (Amazon RDS) management, increased latency, and a new failure mode where gateway issues block Claude Code usage. LLM gateways excel for applications making programmatic calls to LLMs, providing centralized monitoring, per-user visibility, and unified control across providers.

For traditional API access scenarios, organizations can deploy gateways to gain monitoring and attribution capabilities. The Claude Code guidance solution already includes monitoring and attribution through direct IdP authentication, OpenTelemetry metrics, IAM policies, and CloudWatch dashboards, so adding an LLM gateway duplicates existing functionality. Consider gateways only for multi-provider support, custom middleware, or request-level policy enforcement beyond IAM.

Single account implementation

We recommend consolidating coding assistant inference in a single dedicated account, separate from your development and production workloads. This approach provides five key benefits:

  1. Simplified operations: Manage quotas and track usage through unified dashboards instead of monitoring across multiple accounts. Request quota increases once rather than per account.
  2. Clear cost visibility: AWS Cost Explorer and Cost and Usage Reports show Claude Code costs directly without complex tagging. OpenTelemetry metrics enable department- and team-level allocation.
  3. Centralized security: CloudTrail logs flow to one location for monitoring and compliance. Deploy the monitoring stack once to collect metrics from all developers.
  4. Production protection: Account-level isolation helps prevent Claude Code usage from exhausting quotas and throttling production applications. Production traffic spikes don't affect developer productivity.
  5. Straightforward implementation: Cross-account IAM configuration lets developers authenticate through identity providers that federate to restricted roles, granting only model invocation permissions with appropriate guardrails.

This strategy integrates with direct IdP authentication and OpenTelemetry monitoring. Identity providers handle authentication, the dedicated account handles inference, and development accounts handle applications.

Inference profiles

Amazon Bedrock inference profiles provide cost tracking through resource tagging, but don't scale to per-developer granularity. While you can create application profiles for cost allocation, managing profiles for 1,000+ individual developers becomes operationally burdensome. Inference profiles work best for organizations with 10-50 distinct teams requiring isolated cost tracking, or when using cross-Region inference, where managed routing distributes requests across AWS Regions. They're ideal for scenarios requiring basic cost allocation rather than comprehensive monitoring.

System-defined cross-Region inference profiles automatically route requests across multiple AWS Regions, distributing load for higher throughput and availability. When you invoke a cross-Region profile (for example, us.anthropic.claude-sonnet-4), Amazon Bedrock selects an available Region to process your request.
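As a hedged illustration, a single call through the system-defined US cross-Region profile might look like the following; the exact profile ID depends on the model versions available in your account:

```shell
# Invoke Claude through a cross-Region inference profile;
# Bedrock picks an available Region to serve the request.
aws bedrock-runtime converse \
  --model-id us.anthropic.claude-sonnet-4-20250514-v1:0 \
  --messages '[{"role": "user", "content": [{"text": "Hello, Claude"}]}]'
```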

Application inference profiles are profiles you create explicitly in your account, typically wrapped around a system-defined profile or a specific model in a Region. You can tag application profiles with custom key-value pairs like team:data-science or project:fraud-detection that flow to AWS Cost and Usage Reports for cost allocation analysis. To create an application profile:

aws bedrock create-inference-profile \
   --inference-profile-name team-data-science \
   --model-source copyFrom=arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-sonnet-4 \
   --tags key=team,value=data-science key=costcenter,value=engineering

Tags appear in AWS Cost and Usage Reports, so you can answer questions like:

"What did the data science team spend on Amazon Bedrock last month?"

Each profile must be referenced explicitly in API calls, meaning each developer's credential configuration must specify their unique profile rather than a shared endpoint.
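For Claude Code, this typically means supplying the profile ARN as the model identifier. A sketch follows; the account ID and profile ID in the ARN are placeholders:

```shell
export CLAUDE_CODE_USE_BEDROCK=1
# Route this developer's traffic through their assigned application
# inference profile rather than a shared model ID (placeholder ARN).
export ANTHROPIC_MODEL='arn:aws:bedrock:us-west-2:111122223333:application-inference-profile/abcd1234'
```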

For more on inference profiles, see the Amazon Bedrock inference profiles documentation.

Monitoring

An effective monitoring strategy transforms Claude Code from a productivity tool into a measurable investment by tracking usage, costs, and impact.

Progressive enhancement path

Monitoring layers are complementary. Organizations typically start with basic visibility and add capabilities as ROI requirements justify additional infrastructure.

Let's explore each level and when it makes sense for your deployment.

Note: Infrastructure costs grow incrementally; each level retains the previous layers while adding new components.

CloudWatch

Amazon Bedrock publishes metrics to Amazon CloudWatch automatically, tracking invocation counts, throttling errors, and latency. CloudWatch graphs show aggregate trends such as total requests, average latency, and quota utilization. This baseline monitoring is included in standard CloudWatch pricing and requires minimal deployment effort. You can create CloudWatch alarms that notify you when invocation rates spike, error rates exceed thresholds, or latency degrades.
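As one hedged example, an alarm on aggregate invocation volume in the AWS/Bedrock namespace could look like the following; the threshold and SNS topic ARN are placeholders you would tune to your deployment:

```shell
# Alert when Bedrock invocations exceed 10,000 in a 5-minute window.
aws cloudwatch put-metric-alarm \
  --alarm-name bedrock-invocation-spike \
  --namespace AWS/Bedrock \
  --metric-name Invocations \
  --statistic Sum \
  --period 300 \
  --evaluation-periods 1 \
  --threshold 10000 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-west-2:111122223333:bedrock-alerts
```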

Invocation logging

Amazon Bedrock invocation logging captures detailed information about each API call to Amazon S3 or CloudWatch Logs, preserving individual request records including invocation metadata and full request/response data. Process the logs with Amazon Athena, load them into data warehouses, or analyze them with custom tools. The logs reveal usage patterns, invocations by model, peak usage times, and an audit trail of Amazon Bedrock access.
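Invocation logging is enabled account-wide with a single call. A minimal sketch targeting S3 follows; the bucket name and prefix are placeholders, and the bucket policy must allow Amazon Bedrock to write to it:

```shell
# Turn on invocation logging, delivering request/response text to S3.
aws bedrock put-model-invocation-logging-configuration \
  --logging-config '{
    "s3Config": {"bucketName": "my-bedrock-invocation-logs", "keyPrefix": "claude-code/"},
    "textDataDeliveryEnabled": true
  }'
```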

OpenTelemetry

Claude Code includes support for OpenTelemetry, an open source observability framework for collecting application telemetry data. When configured with an OpenTelemetry collector endpoint, Claude Code emits detailed metrics about its operations, covering both Amazon Bedrock API calls and higher-level development activities.

The telemetry captures code-level metrics not included in Amazon Bedrock's default logging, such as lines of code added and deleted, files modified, programming languages used, and developers' acceptance rates of Claude's suggestions. It also tracks key operations including file edits, code searches, documentation requests, and refactoring tasks.

The guidance solution deploys OpenTelemetry infrastructure on Amazon ECS Fargate. An Application Load Balancer receives telemetry over HTTP(S) and forwards metrics to an OpenTelemetry Collector, which exports the data to Amazon CloudWatch and Amazon S3.
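On the developer side, telemetry export is controlled through standard OpenTelemetry environment variables; a sketch follows, with the collector endpoint as a placeholder for the ALB deployed by the guidance solution:

```shell
# Enable Claude Code telemetry and point it at the collector endpoint.
export CLAUDE_CODE_ENABLE_TELEMETRY=1
export OTEL_METRICS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_ENDPOINT=https://otel-collector.example.com
```

The distribution packages built by the guidance solution set these for developers automatically.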

Dashboard

The guidance solution includes a CloudWatch dashboard that displays key metrics continuously, tracking active users by hour, day, or week to reveal adoption and usage trends and enable per-user cost calculation. Token consumption breaks down by input, output, and cached tokens, with high cache hit rates indicating efficient context reuse and per-user views identifying heavy users. Code activity metrics track lines added and deleted, correlating with token usage to show efficiency and usage patterns.

The operations breakdown shows the distribution of file edits, code searches, and documentation requests, while user leaderboards show top users by tokens, lines of code, or session duration.

The dashboard updates in near-real time and integrates with CloudWatch alarms to trigger notifications when metrics exceed thresholds. The guidance solution deploys through CloudFormation with custom Lambda functions for complex aggregations.

Analytics

While dashboards excel at real-time monitoring, long-term trends and complex user behavior analysis require analytical tools. The guidance solution's optional analytics stack streams metrics to Amazon S3 using Amazon Data Firehose. The AWS Glue Data Catalog defines the schema, making the data queryable through Amazon Athena.

The analytics layer supports queries such as monthly token consumption by department, code acceptance rates by programming language, and token efficiency variations across teams. Cost analysis becomes more sophisticated by joining token metrics with Amazon Bedrock pricing to calculate actual costs by user, then aggregating for department-level chargeback. Time-series analysis reveals how costs scale with team growth for budget forecasting. The SQL interface integrates with business intelligence tools, enabling exports to spreadsheets, machine learning models, or project management systems.

For example, to see the monthly cost analysis by department:

SELECT department,
       SUM(input_tokens) * 0.003 / 1000 AS input_cost,
       SUM(output_tokens) * 0.015 / 1000 AS output_cost,
       COUNT(DISTINCT user_email) AS active_users
FROM claude_code_metrics
WHERE year = 2024 AND month = 1
GROUP BY department
ORDER BY (input_cost + output_cost) DESC;

The infrastructure adds moderate cost: Data Firehose charges for ingestion, S3 for retention, and Athena per query based on data scanned.

Enable analytics when you need historical analysis, complex queries, or integration with business intelligence tools. While the dashboard alone may suffice for small deployments or organizations focused primarily on real-time monitoring, enterprises making significant investments in Claude Code should implement the analytics layer. It provides the visibility needed to demonstrate return on investment and optimize usage over time.

Quotas

Quotas let organizations control and manage token consumption by setting usage limits for individual developers or teams. Before implementing quotas, we recommend first enabling monitoring to understand natural usage patterns. Usage data typically reveals that high token consumption correlates with high productivity, indicating that heavy users deliver proportional value.

The quota system stores limits in DynamoDB with entries like:

{
  "userId": "jane@example.com",
  "monthlyLimit": 1000000,
  "currentUsage": 750000,
  "resetDate": "2025-02-01"
}

A Lambda function triggered by CloudWatch Events aggregates token consumption every 15 minutes, updating DynamoDB and publishing to SNS when thresholds are crossed.
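The threshold logic itself is simple; a minimal sketch follows, with the 80% warning level chosen purely for illustration:

```shell
# Classify a user's token consumption against their monthly limit.
quota_status() {
  local usage=$1 limit=$2
  if [ "$usage" -ge "$limit" ]; then
    echo "exceeded"                              # block or alert
  elif [ "$usage" -ge $(( limit * 80 / 100 )) ]; then
    echo "warning"                               # publish to SNS
  else
    echo "ok"
  fi
}

quota_status 750000 1000000   # the example record above: prints "ok"
```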

Monitoring comparison

The following table summarizes the trade-offs across monitoring approaches:

| Capability | CloudWatch | Invocation logging | OpenTelemetry | Dashboard and analytics |
|---|---|---|---|---|
| Setup complexity | None | Low | Medium | Medium |
| User attribution | None | IAM identity | Full | Full |
| Real-time metrics | Yes | No | Yes | Yes |
| Code-level metrics | No | No | Yes | Yes |
| Historical analysis | Limited | Yes | Yes | Yes |
| Cost allocation | Account level | Account level | User, team, department | User, team, department |
| Token tracking | Aggregate | Per-request | Per-user | Per-user with trends |
| Quota enforcement | Manual | Manual | Possible | Possible |
| Operational overhead | Minimal | Low | Medium | Medium |
| Cost | Minimal | Low | Medium | Medium |
| Use case | POC | Basic auditing | Production | Enterprise with ROI |

Putting it together

This section synthesizes authentication methods, organizational architecture, and monitoring strategies into a recommended deployment pattern, with guidance on implementation priorities as your deployment matures. This architecture balances security, operational simplicity, and comprehensive visibility. Developers authenticate once per day with corporate credentials, administrators see real-time usage in dashboards, and security teams have CloudTrail audit logs and comprehensive user-attributed metrics through OpenTelemetry.

Implementation path

The guidance solution supports rapid deployment through an interactive setup process, with authentication and monitoring running within hours. Deploy the full stack to a pilot group first, gather real usage data, then expand based on validated patterns.

  1. Deployment: Clone the Guidance for Claude Code with Amazon Bedrock repository and run the interactive poetry run ccwb init wizard. The wizard configures your identity provider, federation type, AWS Regions, and optional monitoring. Deploy the CloudFormation stacks (typically 15-30 minutes), build distribution packages, and test authentication locally before distributing to users.
  2. Distribution: Identify a pilot group of 5-20 developers from different teams. This group will validate authentication and monitoring, and provide usage data for full rollout planning. If you enabled monitoring, the CloudWatch dashboard shows activity immediately. You can track token consumption, code acceptance rates, and operation types to estimate capacity requirements, identify training needs, and demonstrate value for a broader rollout.
  3. Expansion: Once Claude Code is validated, expand adoption by team or department. Add the analytics stack (typically 1-2 hours) for historical trend analysis to see adoption rates, high-performing teams, and cost forecasts.
  4. Optimization: Use monitoring data for continuous improvement through regular review cycles with development leadership. The monitoring data can demonstrate value, identify training needs, and guide capacity adjustments.

When to deviate from the recommended pattern

While the architecture above suits most enterprise deployments, specific circumstances might justify different approaches.

  1. Consider an LLM gateway if you need multiple LLM providers beyond Amazon Bedrock, need custom middleware for prompt processing or response filtering, or operate in a regulatory environment requiring request-level policy enforcement beyond AWS IAM capabilities.
  2. Consider inference profiles if you have under 50 teams requiring separate cost tracking and prefer AWS-native billing allocation over telemetry metrics. Inference profiles work well for project-based cost allocation but don't scale to per-developer tracking.
  3. Consider starting without monitoring for time-limited pilots with under 10 developers where basic CloudWatch metrics suffice. Plan to add monitoring before scaling, as retrofitting requires redistributing packages to developers.
  4. Consider API keys only for time-boxed testing (under one week) where security risks are acceptable.

Conclusion

Deploying Claude Code with Amazon Bedrock at enterprise scale requires thoughtful authentication, architecture, and monitoring decisions. Production-ready deployments follow a clear pattern: direct IdP integration provides secure, user-attributed access, a dedicated AWS account simplifies capacity management, and OpenTelemetry monitoring provides visibility into costs and developer productivity. The Guidance for Claude Code with Amazon Bedrock implements these patterns in a deployable solution. Start with authentication and basic monitoring, then progressively add capabilities as you scale.

As AI-powered development tools become the industry standard, organizations that prioritize security, monitoring, and operational excellence in their deployments will gain lasting advantages. This guide provides a comprehensive framework to help you maximize Claude Code's potential across your enterprise.

To get started, visit the Guidance for Claude Code with Amazon Bedrock repository.


About the authors

Court Schuett is a Principal Specialist Solutions Architect – GenAI who spends his days working with AI coding assistants to help others get the most out of them. Outside of work, Court enjoys traveling, listening to music, and woodworking.

Jawhny Cooke is the Global Tech Lead for Anthropic's Claude Code at AWS, where he focuses on helping enterprises operationalize agentic coding at scale. He partners with customers and partners to solve the complex production challenges of AI-assisted development, from designing autonomous coding workflows and orchestrating multi-agent systems to operational optimization on AWS infrastructure. His work bridges cutting-edge AI capabilities with enterprise-grade reliability to help organizations confidently adopt Claude Code in production environments.

Karan Lakhwani is a Sr. Customer Solutions Manager at Amazon Web Services. He specializes in generative AI technologies and is an AWS Golden Jacket recipient. Outside of work, Karan enjoys discovering new restaurants and snowboarding.

Gabe Levy is an Associate Delivery Consultant at AWS based out of New York, primarily focused on application development in the cloud. Gabe has a sub-specialization in artificial intelligence and machine learning. When not working with AWS customers, he enjoys exercising, reading, and spending time with family and friends.

Gabriel Velazquez Lopez is a GenAI Product Leader at AWS, where he leads strategy, go-to-market, and product launches for Claude on AWS in partnership with Anthropic.

© 2024 automationscribe.com. All rights reserved.
