Manufacturing-grade AI brokers for monetary compliance: Classes from Stripe

This put up is co-written by Christopher Phillippi and Chrissie Cui from Stripe.

Stripe processes $1.4 trillion in annual fee quantity throughout 50 nations, requiring compliance groups to assessment 1000’s of transactions day by day. This put up explores how Stripe constructed a production-grade AI agent system on AWS utilizing Amazon Bedrock that lowered assessment dealing with time by 26 p.c whereas sustaining human oversight. The put up covers the technical structure, infrastructure selections, and classes discovered from deploying agentic AI that achieved over 96 p.c helpfulness scores, with human consultants firmly in command of remaining selections.

On this put up, you learn the way Stripe constructed a production-grade AI agent system for monetary compliance. We cowl the technical structure of Stripe’s ReAct agent framework and the infrastructure selections behind a devoted agent service. We additionally focus on the position of human oversight in sustaining accountability, and key classes about activity decomposition, orchestration patterns, and price optimization by means of immediate caching. By the tip, you’ll perceive how you can design agentic methods that scale compliance operations with out compromising high quality or auditability.

Stripe’s scale and compliance problem

The foundational mission of Stripe is to develop the gross home product (GDP) of the web. That pursuit requires programmable monetary infrastructure designed to help easy transactions and operational administration for companies of all scales. As of early 2026, Stripe has grown past its origins as a developer-centric fee API to change into a systemic pillar of the worldwide economic system. The corporate helps thousands and thousands of firms throughout 50 nations, from early-stage startups to 62 p.c of the Fortune 500, and processes roughly $1.4 trillion in annual fee quantity. This scale represents roughly 1.3 p.c of the whole international GDP, positioning Stripe on the crucial nexus of technological innovation and powerful regulatory frameworks.

The compliance scaling drawback

As Stripe’s international footprint expanded throughout 50 nations, the group confronted a crucial problem: how you can scale compliance operations with out proportional headcount will increase whereas sustaining regulatory high quality requirements. On daily basis, compliance groups conduct detailed critiques to establish and mitigate monetary crime dangers. Nonetheless, expert analysts have been spending as much as 80% of their time navigating fragmented methods to collect documentation moderately than performing high-value danger assessments. Stripe’s resolution integrates AI brokers with automated orchestration, remodeling compliance from a resource-intensive course of right into a scalable engine. This strategy addresses the $206 billion international compliance burden by serving to organizations establish 95% of card-testing assaults in actual time and scale back pointless buyer friction by 20%. The strategy additionally maintains the auditability and precision required by regulators.

Why agentic AI for compliance?

The constraints of conventional automation for complicated, judgment-based compliance work imply AI brokers are wanted to deal with assisted investigations with scale, constant high quality, and full auditability whereas maintaining people in management.

Three pillars

Oversight and accountability – Human-centered validation with configurable approval workflows and multi-layered resolution checkpoints. People keep within the driver’s seat, supported by brokers.
Transparency – Full audit path with immutable documentation of each motion, resolution, and rationale.
Effectivity – Pre-investigation and dynamic evaluation enable deeper critiques at sooner tempo.

Technical structure

The technical implementation of Stripe’s agentic compliance system consists of three key parts: activity decomposition and orchestration, the ReAct agent framework, and supporting infrastructure companies. Every part performs a crucial position in reaching scalable, auditable compliance automation.

Job decomposition and assessment orchestration

Assigning a single agent to deal with this lengthy, sophisticated assessment in a single go wouldn’t have labored. A single, unconstrained agent would have targeted an excessive amount of on the improper issues and never sufficient on what was really wanted. As a substitute, Stripe made the answer tractable by breaking the sophisticated assessment into composable, bite-sized sub-tasks. Every sub-task might doubtlessly rely upon the outcomes of different sub-tasks as a directed acyclic graph (DAG). These rails assist confirm every agentic course of is barely run on vetted questions the place high quality has been measured by means of high quality testing. In addition they assist verify the investigation covers the required bases, and supply the agent enough context and focus to ship high quality outcomes.

Regardless of rigorous high quality testing of the agent responses in every sub-task, Stripe’s implementation doesn’t rely outright on the response of an agent. As a substitute, the responses are supplied as supplementary info to the human reviewer, who should finally reply every sub-task of the assessment. This solves for oversight and accountability whereas nonetheless capturing the effectivity advantages. The high-level assessment stream is proven within the following diagram.

Reviewers work together with the assessment tooling, which is conscious of the present query and which subsequent questions require that reply as context. The tooling capabilities because the orchestrator, piping human-reviewed solutions as context for additional questions.

ReAct agent framework implementation

To fetch analysis for every sub-question, Stripe constructed a compliance agent utilizing a type of the ReAct (reasoning and performing) agent framework. Past utilizing a big language mannequin (LLM), a kind of basis mannequin (FM) on Amazon Bedrock for reasoning, the agentic side dynamically gathers related alerts by means of device calls. Stripe selected this agent framework to unravel the issue of a near-infinite variety of alerts that will or is probably not related for a given topic. Brokers decide which alerts are related and suggest follow-ups till they’re sufficiently assured to supply a remaining reply. The high-level agent logic is proven within the following diagram.

To stroll by means of this stream, think about being requested the question: “what’s the reply to 10 divided by the quantity π?”

For those who have been a ReAct agent, your first thought could be to contemplate whether or not you have already got the reply. You don’t, so you’d suggest an motion of taking out a calculator and inputting 10/π. The calculator would then return an statement. Your subsequent thought could be to find out whether or not you’ve got a solution, and also you would offer that calculation as your remaining reply. You’ll be able to think about one thing tougher, resembling “produce an evaluation forecasting subsequent yr’s firm income”, taking many cycles of database querying (Device) and interpretation (Thought) iterations.

Within the ReAct cycle, each time a device is requested within the Thought block, the agent framework stops the LLM execution and as a substitute programmatically runs that device. It then forces that output as an statement again to the agent earlier than permitting it to proceed. This injection sample implements a closed-loop management mechanism that:

Grounds agent reasoning in precise knowledge – By mandating that each device output have to be processed as an statement, this prevents the agent from hallucinating or fabricating device outcomes.
Maintains context coherence – Forces the agent to explicitly acknowledge and motive about each bit of retrieved info earlier than continuing.
Prevents reasoning drift – The statement step acts as a checkpoint, serving to confirm the agent’s thought course of stays anchored to factual device outputs moderately than speculative reasoning.
Helps auditability – Creates an express hint of device invocation → statement → reasoning that may be logged for compliance assessment.

That is analogous to a suggestions management system in engineering. The agent can’t proceed to the subsequent motion with out first processing the suggestions (statement) from its earlier motion, stopping open-loop habits that would result in hallucinations or off-track reasoning.

A problem with this strategy is that when a activity is so sophisticated that it wants many turns and observations, the immediate can get very lengthy within the later turns, significantly with verbose observations. The sub-task decomposition limits the scope of every query to maintain the variety of turns smaller. Immediate caching additionally helps with the price of enter tokens, which is the first value driver right here. With immediate caching, you solely pay for the brand new observations and ideas which might be appended to the earlier messages at every flip. Amazon Bedrock offers this functionality.

Full agentic assessment structure and infrastructure

Stripe relied on a big quantity of infrastructure to help the precise agentic execution. The next diagram exhibits the total structure.

The total structure consists of the assessment interface and orchestrator coated earlier and an agent service that hosts the agent logic and facilitates execution. The agent service is supported by Stripe’s LLM Proxy service and linked to inner alerts by means of out there agent instruments.

Constructing a devoted agent service

Earlier than this undertaking, Stripe’s agent service didn’t exist, and this undertaking resulted in Stripe requesting it. Initially, Stripe tried to suit an agent into a standard ML inference engine. This strategy was rejected rapidly for the next causes:

Compute profiles – Conventional ML is compute sure, requiring costly {hardware} resembling GPUs, quick multi-threaded CPUs, or massive reminiscence allocations. In distinction, agentic purposes are largely community sure, ready on basis fashions to complete or device calls to run.
Latency – Referencing the ReAct stream described beforehand, an agent can take an indeterminate period of time to complete, relying on what number of rounds of device calls it wants. A protracted agent question or a database device name might trigger a thread to sit down idle for minutes, in comparison with an XGBoost mannequin that will end in milliseconds.
Completely different API – In distinction to conventional ML that tends to output fundamental sorts (floats, Booleans, and others), brokers want extra flexibility of their schema to annotate their outcomes. Some brokers want to keep up stateful dialog states.

Consequently, Stripe stood up its personal agent service, initially resembling a stateless, synchronous inference endpoint. At the moment it additionally handles stateful, multi-turn conversational brokers. It has grown from a number of brokers at launch to properly over 100 brokers in lower than a yr.

LLM proxy structure

Stripe’s ReAct agent doesn’t name Amazon Bedrock straight. As a substitute, Stripe makes use of an LLM Proxy microservice as its customary technique for LLM entry. The next diagram exhibits the LLM Proxy structure.

Stripe makes use of an LLM Proxy service for the next causes:

Noisy neighbors – Stripe has many groups utilizing LLMs for numerous purposes. The LLM Proxy offers safeguards from different groups hogging the LLM bandwidth for a selected mannequin, stopping useful resource competition.
One API, many fashions – The one endpoint simplifies specifying capabilities resembling immediate caching or device calling throughout basis fashions from Amazon and main AI firms. Altering fashions requires solely altering the mannequin sort as an argument, as a substitute of every use case managing many alternative purchasers.
Mannequin fallbacks – This offers the power to mechanically specify default fashions within the case of useful resource constraints or outright failure.
Monitoring – By requiring authentication, the service can observe mannequin utilization to assist forecast future useful resource demand and ensure the suitable fashions are getting used relying on the privateness of the appliance.

How architectural parts work collectively

Human reviewers drive the assessment, utilizing agentic responses as pre-fetched analysis. As they reply, these responses can be utilized within the prompts for deeper questions throughout the identical assessment, orchestrating assessment questions as a directed acyclic graph (DAG).

For a given query, the agent can name instruments to dynamically entry inner knowledge or companies as wanted. This strategy is used as a result of the potential related alerts that might be examined are usually a lot bigger than what may be included in a immediate. The tool-calling side of the agent means the thought log consists of solely the related knowledge to reply the present query, with out extra irrelevant info, inducing focus.

The agent itself is pushed by basis fashions from Amazon and main AI firms, that are liable for considering and figuring out which device calls are wanted. The agent software accesses the LLM by means of the LLM Shopper, which abstracts away options resembling immediate caching and mannequin fallbacks.

Amazon Bedrock integration advantages

Stripe makes use of Amazon Bedrock inside its LLM Proxy. Amazon Bedrock offers the next additional advantages:

Standardized privateness and safety – As a fee processor, Stripe have to be additional cautious round privateness and safety. Amazon Bedrock helps confirm that basis fashions from Amazon and main AI firms match inside current safety and privateness constraints, with out extra assessment overhead for every mannequin.
Characteristic wealthy – As described earlier, Amazon Bedrock permits for immediate caching on supported fashions. Moreover, Amazon Bedrock permits for fine-tuning and serving customized fashions, which Stripe expects to concentrate on within the coming yr.
One API, many fashions – Integration is simple as a result of fashions fall inside the similar API. Altering fashions requires utilizing a unique mannequin identify. Amazon Bedrock additionally helps many alternative basis fashions from Amazon and main AI firms, offering industry-standard efficiency for Stripe.

Audit path implementation for regulatory compliance

Regardless that Stripe finally makes use of human reviewers to make judgments and selections, the system nonetheless should confirm it stands as much as regulatory scrutiny. Consequently, Stripe carried out logging so all the agent log is retrievable for every run traditionally. Each agent motion, resolution, and rationale is documented.

Outcomes and influence: 26 p.c sooner critiques with over 96 p.c helpfulness

Stripe achieved a 26 p.c discount in median assessment dealing with time by means of agentic automation, with over 96 p.c helpfulness scores maintained from reviewers, and human reviewers in command of selections. This was completed whereas offering full audit trails assembly examination requirements.

As Stripe continues to develop, the group will be capable to sustain with proportional demand for danger administration. Human reviewers can focus their time on harder issues or new investigation alternatives, resulting in an improved compliance program.

Key classes discovered from manufacturing deployment

By means of the method of constructing and deploying this manufacturing agentic AI system, Stripe distilled a number of insights that formed the undertaking’s success and may inform comparable implementations.

Chunk-sized duties – Preserve agent duties sufficiently small for working reminiscence. Take a look at high quality incrementally moderately than diving straight into full automation.

Orchestration – Async workflow structure with DAG help is important for complicated agent interactions whereas sustaining auditability and human oversight at scale.

Infrastructure – Devoted microservice structure issues as a result of brokers have basically totally different useful resource profiles than conventional ML fashions. Conventional inference methods are compute-bound and optimized for millisecond responses on costly GPU {hardware}. Brokers are network-bound, spending minutes ready on LLM calls and gear executions with unpredictable latency patterns. A devoted agent service handles these long-running, stateful interactions by means of async execution patterns. This enables threads to effectively handle a number of concurrent agent periods with out blocking on exterior calls. Token caching reduces prices by 60% by reusing widespread immediate prefixes throughout agent turns moderately than reprocessing all the dialog historical past on every step. Price instrumentation tracks token utilization per agent invocation, serving to groups forecast spend as workloads scale and establish optimization alternatives earlier than they influence budgets. This infrastructure-first strategy remodeled brokers from an experimental prototype right into a manufacturing service supporting greater than 100 brokers throughout Stripe.

Preserve people in management – Brokers help, however skilled reviewers preserve remaining resolution authority. Constrain brokers with rails to sure context.

What’s subsequent

Initially, Stripe targeted on questions that may be answered earlier than the assessment even begins. Remaining questions possible require upstream context identified and validated in the course of the assessment. This can result in extra complicated, multi-step investigations that orchestrate real-time solutions as context in the course of the assessment, supporting deeper effectivity enhancements. The present 26 p.c discount represents early progress.

As a result of Stripe isn’t keen to simply accept a rise in danger tolerance through the use of this know-how, the workforce assessments the agentic investigation part towards human high quality requirements. The workforce validates with precise people earlier than permitting the part to tell reviewers in manufacturing. The workforce can be exploring methods to make use of LLMs to rapidly choose and get rid of subpar approaches.

Amazon Bedrock offers customization capabilities that Stripe is exploring to additional improve its compliance system. At present, Stripe makes use of Retrieval Augmented Technology (RAG) for dynamic information injection by means of device calls, which supplies its brokers entry to real-time compliance knowledge. Wanting forward, Stripe is contemplating utilizing the fine-tuning capabilities of Amazon Bedrock to adapt mannequin habits particularly for monetary compliance duties. This could assist lock in mannequin high quality and scale back re-evaluation overhead as fashions evolve. Moreover, Amazon Bedrock offers continued pre-training choices for incorporating domain-specific information, which might assist construct extra specialised compliance experience into agent reasoning. The mannequin versioning and 6-month deprecation discover window in Amazon Bedrock helps plan these customization efforts strategically, permitting mannequin upgrades solely after they meaningfully enhance investigative capabilities. These complementary methods work collectively to steadiness efficiency, stability, and adaptableness as compliance operations scale.

Conclusion

Stripe has demonstrated that brokers can pace up guide assessment processes, reaching a 26 p.c discount in assessment dealing with time whereas sustaining over 96 p.c helpfulness scores, even with people sustaining resolution authority moderately than full automation. As a substitute of counting on the facility of brokers alone, Stripe completed this by constructing rails to constrain brokers to the bite-sized assessment areas the place they are often profitable. To realize this, Stripe wanted new agentic serving infrastructure, impressed by however distinct from the machine studying inference methods which have traditionally existed.

This grew to become attainable with Amazon Bedrock, which supplied Stripe with the privateness protections and mannequin choice that supported this soar in assessment effectivity, and these capabilities are anticipated to increase into many different domains.

To be taught extra about how you can construct comparable agentic methods on Amazon Bedrock, see the Amazon Bedrock Person Information and the Amazon Bedrock immediate caching documentation. To get began, go to the Amazon Bedrock console.