A year ago, Simon Willison wrote one of the cleanest definitions of an agent, and it has stuck around:
An LLM agent runs tools in a loop to achieve a goal.
That definition stuck because it describes what every production agent actually does. Kiro, Amazon Q Developer, Quick Agents, Codex, Claude Code: under the hood, they all run the same shape. The agent loop is the common denominator.
But the loop was never the hard part. The hard part was everything around it.
Pick a framework. Wire up tools. Provision sandboxed compute. Configure storage, secrets, and networking. Decide where memory lives. Bolt on observability. Get the right dependencies into the right container. Local prototyping tends to be the easy part: a single developer can stand up an agent on their laptop in a day. Getting it into production is where the work explodes, and the moment it has to serve more than one user, a whole new layer of work shows up: concurrency, isolation, identity, state, and scaling.
Worse, that overhead multiplied with every new use case. Teams that wanted to experiment (try a different model, swap a tool, point the agent at a new domain) found themselves repeating the same plumbing. The bottleneck wasn’t intelligence. It was orchestration and infrastructure.
When we launched the AgentCore harness in preview in April, we made a bet: the AgentCore primitives (Runtime, Memory, Gateway, Browser, Identity, Observability) already give teams everything they need to run agents in production, but teams shouldn’t have to wire them up by hand every time. The harness handles that wiring as a managed abstraction, so it becomes something you configure rather than something you build.
Today, Amazon Bedrock AgentCore harness is generally available. With two API calls (CreateHarness to define an agent, InvokeHarness to run it), a quick walkthrough in the AgentCore CLI (shown in the GIF below), or a few clicks in the console, you have an agent running in minutes. It runs in its own isolated environment with a filesystem and shell, so it can read files, run commands, and write code safely. It remembers users and conversations across sessions, picks up skills you point it at (including the AWS-curated catalog), browses the web, calls your tools through Gateway or MCP, and switches model providers mid-session without losing context. Every step streams back to you in real time and is automatically traced to CloudWatch. You don’t need to write orchestration code or build a container, unless you want to.

What the harness gives you
A harness is everything an agent needs to run in production, wrapped behind two API calls. You point to the model, tools, skills, and instructions you want. AgentCore handles the sandboxed environment, the memory, the storage, the identity, and the observability that ties it all together. Capabilities new at GA are marked with * in the diagram below.

Any model: Use the right model for the job, switch when you need to
Different tasks need different models. Customers told us they want to plan with one model and execute with another, swap a provider for a price-performance test, or move off a model that just shipped a regression, all without losing the conversation. Pick a default model on CreateHarness, then override it on any single InvokeHarness call when you need to. The default stays in place for every other invocation. Set the matching field on model for the provider you want:
- bedrock for any model served on Amazon Bedrock, including Anthropic Claude, Amazon Nova, Meta Llama, DeepSeek, Qwen, Kimi, MiniMax, Cohere, Mistral, and, most recently, OpenAI GPT-5.5 and GPT-5.4
- openAi for direct access to OpenAI’s API (api.openai.com)
- gemini for Google Gemini
- liteLlm for any third-party provider supported by LiteLLM, including Anthropic direct, Cohere, Mistral, Vertex, Azure OpenAI, and others
And here is the part customers told us mattered most: you can switch providers at any point, even mid-session, and keep context. For example, you can use Claude Opus to plan, switch to GPT-5.5 to write code, and switch to Gemini to summarize. The conversation continues, and the harness handles the transition.
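The following minimal sketch shows the default-plus-override pattern. It assumes the model block is a dict with exactly one provider field set. The provider names (bedrock, openAi, gemini, liteLlm) come from the list above. The inner modelId key, the model IDs, and both helper functions are hypothetical, not the documented CreateHarness schema.

```python
# Hypothetical sketch: provider field names come from the post; the "modelId"
# key, the model IDs, and both helpers are assumptions for illustration.
from copy import deepcopy
from typing import Optional

PROVIDERS = {"bedrock", "openAi", "gemini", "liteLlm"}


def model_config(provider: str, model_id: str) -> dict:
    """Build a model block with exactly one provider field set."""
    if provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    return {provider: {"modelId": model_id}}


def effective_model(default: dict, override: Optional[dict] = None) -> dict:
    """A per-call override applies to that call only; the default is never mutated."""
    return deepcopy(override if override is not None else default)


# Default chosen at create time (placeholder model ID).
default = model_config("bedrock", "example-claude-opus-id")

# Plan with the default, write code with GPT-5.5 via OpenAI direct, summarize with Gemini.
calls = [
    effective_model(default),
    effective_model(default, model_config("openAi", "gpt-5.5")),
    effective_model(default, model_config("gemini", "example-gemini-id")),
]
print([next(iter(c)) for c in calls])  # ['bedrock', 'openAi', 'gemini']
```

Because the override lives on the call rather than rewriting the default, every other invocation keeps using the default model.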

If you use API keys to access any of the underlying model providers, they’re stored securely in the AgentCore Identity token vault. The agent never sees raw credentials.
Tools as config: Connect your agent to the world without writing glue code
Tools are how the agent affects anything outside its own reasoning, and wiring them up is the part most teams quietly hate. Customers told us they don’t want to write per-API adapter code, manage MCP server lifecycles, or build their own browser sandbox. They want to declare what the agent can use and let the harness handle the connection, the auth, and the execution.
The tools field on CreateHarness is a list. Each entry has a type and a config block, and the harness wires them in:
- agentcore_gateway: reference an AgentCore Gateway by ARN. Every target the gateway exposes (OpenAPI, Smithy, Lambda, MCP) shows up as a tool, with IAM/JWT auth, per-tool authorization, and outbound credential brokering handled for you.
- remote_mcp: connect directly to any MCP server by URL. This works well when the server is already secured and you don’t need Gateway’s governance layer in front of it.
- agentcore_browser: a full browser sandbox as a one-line reference. Click, type, navigate, screenshot.
- agentcore_code_interpreter: sandboxed Python and Node execution, using the same one-line pattern.
- inline_function: a tool schema that the harness emits as a tool-use event in the stream, then waits for you to respond to. Use it for human-in-the-loop approvals or for tools that need to run on your side.
Every session also gets the built-in shell (run commands inside the microVM) and file_operations (read and write on the agent’s filesystem) tools without you listing them. They’re what make the stateful filesystem and shell usable from the model.
You have the same options on InvokeHarness for per-call edits: pass new tools to change the tool set for a single call, or narrow the list to a focused set for that invocation with the allowed_tools parameter. Defaults are set at create time, and you can override them at invoke time.
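The per-call rules can be modeled in a few lines. This is a toy model of the semantics as described above. It assumes that tools passed on a call replace the defaults for that call and that allowed_tools then filters by name. The config keys (gatewayArn, url, name, inputSchema), the ARN, the URL, and the name-derivation rule are all hypothetical. In the real service a gateway expands into every target it exposes; here it counts as one entry.

```python
# Toy model of tool selection for one call; config keys, ARN, and URL are
# hypothetical placeholders, and the merge rules are assumed from the post.
from typing import List, Optional

default_tools = [
    {"type": "agentcore_gateway",
     "config": {"gatewayArn": "arn:aws:bedrock-agentcore:us-east-1:111122223333:gateway/example"}},
    {"type": "remote_mcp", "config": {"name": "docs", "url": "https://mcp.example.com/mcp"}},
    {"type": "agentcore_browser", "config": {}},
    {"type": "agentcore_code_interpreter", "config": {}},
    {"type": "inline_function",  # answered by your own code, e.g. a human approval step
     "config": {"name": "approve_refund",
                "inputSchema": {"type": "object",
                                "properties": {"amount": {"type": "number"}}}}},
]


def tool_name(tool: dict) -> str:
    """Filter key: an explicit config name if present, otherwise the tool type."""
    return tool["config"].get("name", tool["type"])


def tools_for_call(defaults: List[dict], tools: Optional[List[dict]] = None,
                   allowed_tools: Optional[List[str]] = None) -> List[str]:
    """Per-call tools replace the defaults; allowed_tools then narrows the set."""
    selected = tools if tools is not None else defaults
    names = [tool_name(t) for t in selected]
    if allowed_tools is not None:
        names = [n for n in names if n in allowed_tools]
    return names


print(tools_for_call(default_tools, allowed_tools=["docs", "approve_refund"]))
# ['docs', 'approve_refund']
```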
Built-in memory: Your harness remembers users and conversations
Customers want their agent to recognize a returning user, pick up where the last conversation left off, and remember preferences without anyone replaying message history. In preview, you had to provision an AgentCore Memory resource separately and pass its ARN. That worked, but it was a second API call and an easy thing to forget on the way to production.
At GA, omitting memory on CreateHarness provisions a managed memory automatically, with sensible defaults: SEMANTIC + SUMMARIZATION strategies, 30-day event expiry, AWS-owned encryption, and multi-tenant isolation by default through namespace templates that key on actorId. It’s a real, customer-owned Memory resource, provisioned for you. Memory isn’t mandatory. If your agent is stateless, set memory: { disabled: {} } and the harness skips memory entirely. If you’d rather attach an AgentCore Memory resource you already own, pass agentCoreMemoryConfiguration with its ARN. That gives you three paths: managed (the default), disabled, or bring your own.
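The three memory paths can be sketched as request fragments. This sketch assumes memory is a union field whose members are disabled and agentCoreMemoryConfiguration, with the ARN stored under a key named arn. The nesting, that key, the ARN, and the helper are hypothetical, so check the CreateHarness API reference for the real shape.

```python
# Hypothetical CreateHarness fragments for the three memory paths.
# "memory", "disabled", and "agentCoreMemoryConfiguration" are named in the post;
# the nesting and the "arn" key are assumptions.
from typing import Optional


def memory_fields(mode: str, memory_arn: Optional[str] = None) -> dict:
    if mode == "managed":
        # Omit memory entirely; the harness provisions a managed Memory resource
        # (SEMANTIC + SUMMARIZATION, 30-day event expiry, actorId namespaces).
        return {}
    if mode == "disabled":
        return {"memory": {"disabled": {}}}  # stateless agent, no memory at all
    if mode == "bring_your_own":
        if not memory_arn:
            raise ValueError("bring_your_own needs an existing AgentCore Memory ARN")
        return {"memory": {"agentCoreMemoryConfiguration": {"arn": memory_arn}}}
    raise ValueError(f"unknown memory mode: {mode}")


existing = "arn:aws:bedrock-agentcore:us-east-1:111122223333:memory/example"  # placeholder
for mode in ("managed", "disabled", "bring_your_own"):
    print(mode, memory_fields(mode, existing))
```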
Switching to your own memory takes one UpdateHarness call. Pass agentCoreMemoryConfiguration with your memory ARN, and the previously managed memory is disassociated immediately. It’s still a regular AgentCore Memory resource in your account, so you can keep using it anywhere: attach it to another harness, query it directly, or delete it on your own terms. When you delete the harness, the managed memory is cascade-deleted by default (deleteManagedMemory: true). Pass deleteManagedMemory: false if you want to keep it.
The managed memory is automatic but not opaque. It’s a real, addressable AWS resource that you can query, attach to a different agent, audit, or hand to an analytics pipeline.
Skills: Give your agent the right expertise for the right task
Customers want their agent to know how to handle a specific task before it tries it: for example, how to format an Excel report, how to file a JIRA ticket the way their team files them, or how to follow AWS-recommended procedures for accessing their data on AWS. Skills are how you give the agent that knowledge on demand. They’re bundles of files, scripts, and instructions. The harness loads skill metadata and pulls the full content into context only when the task actually requires it.
At GA, HarnessSkill is a union with four sources, so you can attach skills declaratively without baking them into a container or shelling in:
- awsSkills: turn on the AWS-curated skill bundle.
- git: clone a public or private repo over HTTPS, pinned to a commit or a branch.
- s3: pull a skill bundle from your own Amazon Simple Storage Service (Amazon S3) bucket.
- path: reference a path that already exists in the container you brought.
The same shape works on InvokeHarness for per-call layering. The harness materializes each skill onto the session filesystem at session start, or during a new invocation if the skills configuration changes.
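Here is a minimal sketch of what "a union with four sources" implies: each skill entry sets exactly one source. The source names come from the list above. The inner keys (url, branch, commit, uri, path), the example values, and the rule against setting both commit and branch are assumptions for illustration.

```python
# Hypothetical HarnessSkill values; the four source names come from the post,
# while the inner keys (url, branch, commit, uri, path) and values are assumptions.
SOURCES = {"awsSkills", "git", "s3", "path"}


def skill_source(skill: dict) -> str:
    """Check that a union value sets exactly one known source and return it."""
    if len(skill) != 1 or not set(skill) <= SOURCES:
        raise ValueError(f"expected exactly one of {sorted(SOURCES)}, got {sorted(skill)}")
    source, spec = next(iter(skill.items()))
    if source == "git":
        if not spec.get("url", "").startswith("https://"):
            raise ValueError("git skills are cloned over HTTPS")
        if "commit" in spec and "branch" in spec:
            raise ValueError("pin to a commit or a branch, not both")  # assumed rule
    return source


skills = [
    {"awsSkills": {}},
    {"git": {"url": "https://github.com/anthropics/skills", "branch": "main"}},
    {"s3": {"uri": "s3://example-bucket/skills/excel-report/"}},
    {"path": {"path": "/opt/skills/jira-ticket"}},
]
print([skill_source(s) for s in skills])  # ['awsSkills', 'git', 's3', 'path']
```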
The big unlock for AWS builders: the AWS skills repository ships curated skills covering the AWS surface area. These range from core skills (SDK usage, infrastructure as code (IaC), AWS Identity and Access Management (IAM), Amazon CloudWatch, and Amazon Bedrock) to deep, service-specific workflows for analytics, databases, Amazon Elastic Compute Cloud (Amazon EC2), networking, security, serverless, and storage.
To make this even simpler, GA introduces a first-class awsSkills toggle. It turns on the AWS skill bundle with zero plumbing: no URL and no network fetch, because the skills ship with the harness’s underlying runtime and are available whenever you need them.
Environment and filesystem: Run your agent in the environment it needs
Most agents run fine on the harness’s default environment, which includes Python and bash. When you need more (a private dependency, a specific runtime version, a CLI tool, or persistence across sessions), two knobs let you shape the agent’s runtime to match your stack: the container image and the filesystem.
Container image. If Python and bash aren’t enough, you can package your source code, dependencies, runtimes, and tools into a custom container, push it to Amazon Elastic Container Registry (Amazon ECR), and reference it in CreateHarness. The agent then uses that exact environment. You can also pair it with InvokeAgentRuntimeCommand, an API that runs a shell command directly inside the agent’s microVM session, for session-specific setup that varies per invocation (clone a particular branch, seed test data, or pull credentials). It’s deterministic, doesn’t go through the model, and doesn’t burn tokens.
Filesystem. Agents often need files to outlive a single response: a shared knowledge base, a working directory that spans sessions, or a place to drop finished documents back into your bucket. The harness gives you three filesystem options, each with different reach and persistence characteristics.
| Type | Managed | Virtual private cloud (VPC) required | Persistence |
|---|---|---|---|
| Managed session storage | Yes | No | Across stop/resume cycles of the same runtimeSessionId. |
| Amazon Elastic File System (Amazon EFS) access point | BYO | Yes | Across all sessions, shareable across harnesses. |
| Amazon Simple Storage Service (Amazon S3) Files access point | BYO | Yes | Across all sessions and harnesses, with full Amazon S3 durability, versioning, and history. |
Use managed session storage for working files that need to survive microVM restarts within a session. Use EFS when multiple harnesses or sessions need to share reference data, prompts, or skill bundles. Use S3 Files when you want the agent to read and write through standard file operations while changes are automatically synchronized with the backing S3 bucket (the agent writes a report, and the report appears in your S3 bucket as it goes).
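The table and the guidance above can be encoded as data plus a small chooser. The chooser is one reading of that guidance, not an AgentCore API. The option labels come from the table, and the two boolean requirements are illustrative.

```python
# Encodes the filesystem table as data; the chooser is an illustrative reading
# of the guidance, not an AgentCore API.
from dataclasses import dataclass


@dataclass(frozen=True)
class FsOption:
    name: str
    managed: bool
    vpc_required: bool
    persists_across_sessions: bool


SESSION = FsOption("managed session storage", True, False, False)
EFS = FsOption("Amazon EFS access point", False, True, True)
S3_FILES = FsOption("Amazon S3 Files access point", False, True, True)


def choose_filesystem(share_across_sessions: bool, sync_to_s3: bool) -> FsOption:
    if sync_to_s3:
        return S3_FILES  # files the agent writes show up in the backing S3 bucket
    if share_across_sessions:
        return EFS       # shared reference data, prompts, or skill bundles
    return SESSION       # scratch files that survive microVM restarts in one session


choice = choose_filesystem(share_across_sessions=True, sync_to_s3=False)
print(choice.name, "needs a VPC" if choice.vpc_required else "no VPC needed")
```

Both bring-your-own options require a VPC, so that requirement usually decides the question before the persistence needs do.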
Unified observability: See what your agent did, in one place
When something goes wrong, customers want to see, in one place, what the agent ran, what it called, where it slowed down, and where it failed. A typical harness invocation crosses runtime, memory, gateway, and a built-in tool or two, and stitching that picture together used to mean opening five tabs.
At GA, every harness page in the AgentCore console shows a single observability widget: an aggregate row that summarizes the harness across every primitive it touched, plus per-primitive sections that appear only for the primitives the harness is configured with or has used.

For deeper analysis, CloudWatch GenAI Observability has a new Harnesses tab alongside Runtime and the other primitives. Drill from a harness into a session, then into a single trace, and see exactly what the agent did, in what order, how long each step took, and where it failed. Logs from every primitive (memory, gateway, browser, code interpreter) surface inline on the right span, so you no longer hop between log groups to piece together what happened.

Evaluate and optimize: Keep improving your agent in production
Once your agent is in production, the question shifts from “does it work?” to “is it improving?” Customers want a way to score how their agent is actually doing on real traffic, get suggestions on what to change, and validate those changes before rolling them out. GA brings two pieces that close that loop:
- AgentCore Evaluations scores harness traces with built-in large language model (LLM)-as-a-judge evaluators (helpfulness, faithfulness, safety) or with custom evaluators you author. Run them online (scoring every session as it happens), on demand for a single trace, in batch over historical traces, against a fixed test dataset, or as a simulation with synthetic users to stress-test before going live.
- AgentCore optimization reads those evaluator scores and generates prompt and tool-description recommendations. It then validates them by routing live traffic between two variants through AgentCore Gateway, with online evaluation scoring per session and statistical significance reporting. Variants can be different versions of an optional configuration bundle on the same runtime, or different versions pointing at different endpoints, so you can A/B-test prompt and tool-description changes without redeploying code, simply by pointing at a different endpoint.
Run your harness, capture traces, get scores, get recommendations, A/B-test the recommended configuration against the current one, then ship the winner.
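The post does not say which statistical test AgentCore optimization uses. As one common example of what "statistical significance" can mean for per-session scores, here is Welch's two-sample t statistic with a normal approximation to the two-sided p-value. That approximation is only reasonable when each variant has many sessions. The scores are made-up illustrative numbers.

```python
# Welch's t statistic with a normal-approximation p-value (a swapped-in,
# illustrative technique; not necessarily what AgentCore computes).
import math
from statistics import mean, variance
from typing import List, Tuple


def ab_significance(current: List[float], candidate: List[float]) -> Tuple[float, float]:
    """Return (t, approximate two-sided p) for mean(candidate) - mean(current)."""
    se = math.sqrt(variance(current) / len(current) + variance(candidate) / len(candidate))
    if se == 0:
        raise ValueError("both variants have zero variance; the test is undefined")
    t = (mean(candidate) - mean(current)) / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


# Made-up per-session evaluator scores, 100 sessions per variant.
current = [0.70, 0.72, 0.68, 0.71, 0.69] * 20
candidate = [0.74, 0.77, 0.73, 0.76, 0.75] * 20
t, p = ab_significance(current, candidate)
print(f"t={t:.1f}, p={p:.2g}")
```

With small samples, a t-distribution p-value (for example, from SciPy) would be the more appropriate choice.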
Version and roll back: Roll out changes safely, roll back instantly
Customers want to update prompts, swap a tool, or try a new model on a subset of traffic without putting the whole agent at risk. Versioning and endpoints on the harness mirror what AgentCore Runtime already offers. Every UpdateHarness call creates an immutable version capturing the full configuration (model, system prompt, tools, memory config, skills, environment, truncation, execution limits), and rollback is “point the endpoint at an earlier version.”
The DEFAULT endpoint auto-advances on every update. Named endpoints (PROD, STAGING) stay pinned until you explicitly promote them.
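To make the endpoint rules concrete, here is a toy in-memory model. It is for illustration only and is not an AgentCore client. Each update appends an immutable (shallowly frozen) version, DEFAULT always tracks the newest one, and named endpoints such as PROD and STAGING move only when you point them. The class and method names are hypothetical.

```python
# Toy model of harness versioning; not an AgentCore client.
from types import MappingProxyType


class HarnessVersions:
    def __init__(self, config: dict):
        # Version numbers start at 1; MappingProxyType makes each snapshot read-only
        # (shallowly: nested dicts are not frozen).
        self.versions = [MappingProxyType(dict(config))]
        self.endpoints = {"DEFAULT": 1}

    def update(self, **changes) -> int:
        """Like UpdateHarness: snapshot a new version; DEFAULT auto-advances."""
        new_config = {**self.versions[-1], **changes}
        self.versions.append(MappingProxyType(new_config))
        self.endpoints["DEFAULT"] = len(self.versions)
        return len(self.versions)

    def point(self, endpoint: str, version: int) -> None:
        """Promote or roll back a named endpoint by pointing it at an existing version."""
        if not 1 <= version <= len(self.versions):
            raise ValueError(f"no such version: {version}")
        self.endpoints[endpoint] = version

    def config(self, endpoint: str):
        return self.versions[self.endpoints[endpoint] - 1]


h = HarnessVersions({"model": "model-a", "systemPrompt": "v1"})
h.point("PROD", 1)
h.update(model="model-b")  # version 2: DEFAULT moves, PROD stays on 1
print(h.config("DEFAULT")["model"], h.config("PROD")["model"])  # model-b model-a
h.point("PROD", 2)         # promote
h.point("PROD", 1)         # instant rollback
```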
Export to code: Graduate when configuration isn’t enough
When a use case outgrows configuration (custom orchestration, multi-agent coordination, deep instrumentation), customers want to take the agent further without rebuilding it from scratch. A single AgentCore CLI command exports the harness as Strands-based code that you can host on AgentCore Runtime or anywhere else.
The exported project preserves your model, prompt, tools, memory wiring, skills, and container environment. It uses the same compute path, the same observability, and the same identity primitives. The graduation is a config-to-code translation, not an architecture change.
Strands is the first export target. Claude Agent SDK is coming soon, so customers who prefer that framework can graduate the same way.
This is the part of the harness story we care about most. When configuration stops being enough, you graduate to the same compute and the same primitives, with code you can read and modify, instead of starting over from scratch.
Other notable additions
We also added the following:
Step Functions integration. A harness invocation is now a first-class state in AWS Step Functions. In Workflow Studio, search for AgentCore InvokeHarness and drag it into your workflow. Use Quick Create Harness to scaffold a new harness and execution role from within Step Functions, or point at an existing harness and override settings per call. The same InvokeHarness semantics apply, with defaults on the harness and overrides on the Task state.
Web Search on AgentCore. The new Web Search on AgentCore (also launched at the NY Summit) is available to harness agents through AgentCore Gateway: expose Web Search as a Gateway target, reference the Gateway from the harness, and the agent has search. A first-party agentcore_web_search tool type is coming soon, matching the one-line pattern of agentcore_browser and agentcore_code_interpreter.
What you can do with all of this
The harness supports many use cases across industries and agent types. To give you a sense of the range, here are three concrete examples. Teams told us they used to piece each of them together by hand.
A research and writing agent. This agent can search the web, browse sources, draft a document, and hand you back a real xlsx or pptx file, with memory carrying across sessions so the next question doesn’t replay everything. The minimum to stand it up is one CreateHarness call:
- tools: agentcore_browser, plus a Gateway target that exposes Web Search on AgentCore.
- skills: a git source pointing at anthropics/skills for the document-skills bundle.
Memory is on by default, so you don’t configure it explicitly. That’s it.
An AWS data and analytics agent for your team. This agent can pull data from your AWS account (Amazon Athena, AWS Glue, Amazon S3, Amazon Redshift, Amazon CloudWatch), run an analysis, and hand back a summary, a chart, or a finding. Along the way, it follows AWS-recommended procedures for accessing each service step by step instead of improvising. The minimum to stand it up is one CreateHarness call:
- skills: [{"awsSkills": {}}] to turn on the curated AWS catalog (analytics, database, Amazon EC2, networking, security, serverless, and storage).
- executionRoleArn: an IAM role scoped to whichever AWS APIs you want the agent to read from.
Add agentcore_code_interpreter if you also want the agent to run Python in a sandbox to slice and visualize the data it pulls.
A coding agent. This agent can read your code base, plan a change, write it, run the tests, and open a pull request (PR). It can also switch models mid-session between design and implementation without losing context. The minimum to stand it up is two steps:
- Push a custom container with your repo and toolchain to Amazon ECR.
- Call CreateHarness with environmentArtifact pointing at that image, plus a Gateway target wired to GitHub (or your internal GitLab or Bitbucket equivalent) so the agent can work with branches, PRs, and reviews.
For deterministic git operations like clone, commit, push, and opening a PR, call InvokeAgentRuntimeCommand directly so you don’t pay the model to reason through them.
These are three different agents built on exactly the same harness. The API configuration is the only thing that changes.
Pay only for what you use
There is no additional harness fee. You pay for the underlying capabilities based on actual consumption.
- Runtime compute (where the harness session runs): active-consumption pricing, metered per second, at $0.0895 per vCPU-hour and $0.00945 per GB-hour. Agentic workloads spend significant time waiting on model and tool I/O, and Runtime bills only when CPU is actually consumed.
- Browser and Code Interpreter: the same active-consumption model.
- Gateway: per 1,000 invocations and per 1,000 search queries.
- Memory: per 1,000 short-term events, per 1,000 long-term records per month, and per 1,000 retrievals.
- Observability: standard Amazon CloudWatch pricing for spans, logs, and metrics.
- Model inference: charged by Amazon Bedrock or the third-party provider at their standard rates.
Each is independent. Use one, or use all. An agent that runs for 60 seconds and calls two tools costs accordingly, and so does an agent that runs for an hour with heavy compute. You pay in proportion to what your agent actually computes.
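Here is a quick worked example using the Runtime compute rates quoted above ($0.0895 per vCPU-hour and $0.00945 per GB-hour). The usage figures are hypothetical. Confirm on the AgentCore pricing page which seconds count toward each meter (for example, whether memory is billed while the agent waits on I/O). Model inference, Gateway, Memory, and observability charges are separate and not included.

```python
# Runtime compute cost from metered usage; rates from the post, usage hypothetical.
VCPU_HOUR_USD = 0.0895
GB_HOUR_USD = 0.00945


def runtime_compute_cost(vcpu_seconds: float, gb_seconds: float) -> float:
    """vcpu_seconds = vCPUs x seconds of active CPU; gb_seconds = GB x seconds billed."""
    return vcpu_seconds / 3600 * VCPU_HOUR_USD + gb_seconds / 3600 * GB_HOUR_USD


# A 60-second session: 1 vCPU busy for 12 s (the rest is waiting on the model),
# 2 GB of memory for the full 60 s.
short = runtime_compute_cost(vcpu_seconds=1 * 12, gb_seconds=2 * 60)
# An hour of heavy compute: 2 vCPUs busy throughout, 4 GB for the hour.
heavy = runtime_compute_cost(vcpu_seconds=2 * 3600, gb_seconds=4 * 3600)
print(f"${short:.6f}  ${heavy:.4f}")  # $0.000613  $0.2168
```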
For full pricing details, see the AgentCore pricing page.
What some of our customers are excited about with harness
Omar Paul, VP of Product at Twilio, said: “Twilio’s customers are building AI agents that work across voice, messaging, and digital channels, with real-time intelligence and persistent memory that make every interaction feel like a conversation. By combining AgentCore harness with Twilio Conversations, developers can go from idea to live agent without rewiring infrastructure. The best customer experiences happen when great AI and great communications infrastructure are built together.”
Dr. Lukas Schack, Principal Machine Learning Engineer at TUI GROUP, told us: “Amazon Bedrock AgentCore has become a core building block at TUI. We use Runtime to host agents across frameworks and Memory to share context between them, in production and in workshops with over 500 employees, sometimes with more than 130 people building at the same time. With AgentCore harness, what used to take weeks from idea to working product now takes minutes, and customer-facing use cases are next.”
Rodrigo Moreira, VP of Engineering at VTEX, said: “We’re building AI agents that will revolutionize ecommerce. Previously, prototyping each new agent required days of orchestration code and infrastructure setup before we could validate an idea. AgentCore harness has changed that: swapping a model, adding a tool, replacing a skill, or refining an agent’s instructions is now a configuration change, not a rebuild. We can now validate agent ideas in minutes instead of days, and we’re looking forward to accelerating agent development further with these new capabilities.”
Kazumi Matsuda, Senior Manager, AI Promotion Division at FUJISOFT, noted: “At FUJISOFT, we’re building AI agents to accelerate software development and operations across our teams. Our framework, Character Capsule, packages agent roles, skills, and execution procedures as reusable capsules that scale to multi-agent orchestration on AgentCore. With AgentCore harness, we deploy new agents in minutes and version every change. Once in production, Evaluations scores how our agents perform using execution logs, and AgentCore’s optimization capabilities generate prompt and tool suggestions based on those scores. We A/B test these recommendations on live traffic before rolling them out, so improvement is continuous, not guesswork. Together, these capabilities let us stand up new agents quickly and keep improving them with confidence, catching quality regressions before they reach production and rolling out only the changes we’ve validated across our multi-agent patterns.”
Get started
Amazon Bedrock AgentCore harness is available today in all AWS Regions where AgentCore is generally available.
The faster a team can get from idea to working agent, the more ideas it can afford to test. The harness collapses that loop from days to minutes. We’re excited to see what you build.
Additional resources
For more information, see the following:
About the authors

