A Harness for Every Task: Putting a Team of Claudes on One Job

1.

For most of 2024 and 2025, the default answer was simple: give the task to one agent, use the biggest context window available, and wait. Sometimes it worked. Often, the model quietly lost the thread partway through.

Anthropic described the problem directly: long-horizon tasks require agents to stay coherent across many steps, often beyond what a context window can reliably support. Bigger windows helped, but they didn’t solve it.

Anthropic had already shipped tools to help. Subagents let the main agent delegate side tasks to isolated workers, each with its own fresh context, collecting summaries back into the main conversation. Skills packaged repeatable workflows into Markdown files, a recipe Claude could follow on demand. Agent teams went further still: multiple independent Claude sessions, each with its own context window, coordinating through a shared task list and messaging each other directly.

All of this was real progress. But each tool still had the same structural ceiling.

With subagents, the orchestrating Claude session still holds the plan. Every result that comes back from a worker lands in the main conversation’s context window. With subagents, skills, and agent teams, Claude is the orchestrator: it decides turn by turn what to spawn or assign next, and all the results accumulate in its context. The orchestrating context therefore grows as the number of agents increases and eventually reaches its limits. At that point the orchestration degrades, and the same failure modes appear.

Anthropic identified three failure modes that appear consistently when one context window, whether it belongs to a single agent or to a lead orchestrating a small team, is responsible for a task too large to track cleanly (Figure 1).

Figure 1. One mind, one context window, and the three ways it quietly fails on a big job. Image by author with help from ChatGPT

First, agentic laziness. The agent starts the task but doesn’t fully finish. It may stop early, skip some files, or assume the remaining work is similar enough. Then it confidently says the whole task is done. This is like a person checking only part of a long spreadsheet but marking the entire spreadsheet as reviewed.

Second, self-preferential bias. The AI is not very strict when judging its own output. If you ask it, “Did you follow the instructions?” it often says yes, because it tends to give itself the benefit of the doubt. It may miss its own errors or overrate the quality of its answer.

Third, goal drift. Over a long task, the AI slowly loses track of the original goal. It may remember the main task, but forget important details like “don’t include X”, “don’t skip any file”, or “only use this format”. The longer the conversation or task becomes, the more likely this drift is.

These are not bugs. They’re what happens when the plan is a thought, and thoughts degrade.

The cost became hard to ignore in early 2026, when Jarred Sumner, creator of Bun, needed to port about 750,000 lines of Zig to Rust, file by file. In the past, a task like this would have taken a team months. Sumner’s pattern was simple: do one unit of work, run an adversarial review, then apply the changes. He later called Dynamic Workflows “the state-of-the-art today for reliably using agents to complete medium-to-large projects.” The result: 750,000 lines of Rust, 99.8% of the existing test suite passing, and only 11 days from first commit to merge.

The key idea is that Claude doesn’t have to keep the whole plan in its head. The workflow moves the plan into code. The script holds the loop, the branches, and the intermediate results. Claude only has to handle the current step and the final synthesis. The plan becomes a JavaScript file. It doesn’t forget, drift, or stop halfway and call the job done.
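To make "the plan lives in code" concrete, here is a minimal sketch of the unit-of-work, adversarial-review, apply loop described above. It is not Sumner's script or anything Claude generated: `portUnit` and `adversarialReview` are hypothetical stubs standing in for fresh-context agent calls, and the file names are made up.

```javascript
// Hypothetical stub: a real workflow would ask a fresh-context agent to port one file.
async function portUnit(file) {
  return { file, patch: `ported ${file}` };
}

// Hypothetical stub: a real workflow would ask a *different* agent to attack the draft.
// Here a file name ending in "_broken.zig" simulates a rejected draft.
async function adversarialReview(draft) {
  return { approved: !draft.file.endsWith("_broken.zig") };
}

// The plan itself: the loop, the branch, and the intermediate results
// all live in the script, not in any agent's context window.
async function runPlan(files) {
  const applied = [];
  const rejected = [];
  for (const file of files) {
    const draft = await portUnit(file);            // one unit of work
    const review = await adversarialReview(draft); // reviewed by someone else
    if (review.approved) {
      applied.push(draft.patch);                   // apply the change
    } else {
      rejected.push(file);                         // keep a record for a retry pass
    }
  }
  return { applied, rejected };
}

runPlan(["a.zig", "b_broken.zig", "c.zig"]).then((result) => {
  console.log(result);
});
```

Because the `for` loop visits every entry in `files`, no unit can be silently skipped, and nothing is reported as applied without passing through the review branch.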

That’s the problem Dynamic Workflows were built to solve. And that’s what this article covers.

By the end, you’ll understand exactly where subagents, skills, and agent teams reach their limits and why: not as a vague intuition, but as a structural argument you can apply to your own tasks. You’ll know the six composition patterns that cover the majority of real-world workflow problems, how to write a workflow prompt that actually produces a useful harness, and how to avoid the two most expensive mistakes people make when starting out. You will also know when a workflow is the wrong tool, because Dynamic Workflows consume significantly more tokens than a standard session, and reaching for them on the wrong task is its own kind of failure.

2. What a dynamic workflow is

A dynamic workflow is like replacing one exhausted person with a small, focused team.

Instead of asking one AI to carry the whole project from start to finish, you split the work into clear pieces. One agent handles one task. Another checks the result. Another moves the work forward. As a result, no one gets tired in the middle and starts cutting corners. No one gives themselves a perfect score just because they wrote the answer. And no one forgets the original brief, because each agent only has to hold one clear piece of the job.

Claude’s dynamic workflow helps you do this. It splits the job across a team of fresh-context Claudes. Each one handles a smaller piece, another layer checks the work, and the results are merged back into one answer for you.

The keyword here is harness. A harness is the scaffolding around the model: the part that decides how a task is planned, divided, checked, and executed. The default Claude Code harness is built mainly for coding tasks. Anthropic’s team found that these dynamic harnesses are “generally much more helpful for non-technical work.” They are created on the spot, shaped around the task you give them.

Before going further, it helps to separate a workflow from a few other terms that often get mixed together. Tools, agents, harnesses, and workflows are often used as if they mean the same thing. They don’t. The cleanest way to separate them (I’m borrowing this framing from AlphaSignal) is to ask one question: who holds the plan? (Figure 2)

Figure 2. Subagents vs. agent teams vs. dynamic workflows. One question: *who holds the plan?* *Image by author with help from ChatGPT*

A subagent is a helper the main Claude sends out for one specific job. The plan still stays with the main Claude. The subagent does its part, sends the result back, and that result appears in your chat. It’s basically fire-and-forget. As the table below shows, a subagent can’t create its own helpers or talk to other subagents.
An agent team is different. It’s a group of Claudes working side by side, coordinating as peers. The plan doesn’t sit inside one Claude. It lives between them. They can message each other, adjust as the work unfolds, and continue across one larger shared task. It’s more like giving a project to a small team.
A dynamic workflow is different again. Claude writes a small JavaScript program for the task itself. In this case, the plan lives in code. The agents do their work off to the side, their outputs are stored in variables, and only the final merged answer comes back to you.

An agent team and a dynamic workflow look alike at first glance, but they are quite distinct. The table below shows how.

	Subagent	Agent team	Dynamic workflow
Who holds the plan	the main Claude (orchestrator), in its head	the peers, between them	a JavaScript program
Lifecycle	fire-and-forget, one job	long-running, ongoing	runs once, returns one answer
Talk to each other?	no: the orchestrator routes everything, and a subagent can’t even spawn its own subagents	yes: they coordinate as peers over time	no: agents work off to the side in script variables; only the final result comes back
Feels like	an intern you hand one task	colleagues on a shared project	an assembly line you designed

You might ask another question: what makes it dynamic? How does a dynamic harness differ from a static one?

You can always build a harness yourself. You can wire up the Agent SDK, or run claude -p in a loop, and create a fixed system that you use over and over. That is a static harness: useful, repeatable, but designed upfront.

A dynamic harness is the reverse. Claude writes the harness in the moment, shaped around the task you just gave it. It plans the structure, splits the work, runs the agents, checks the outputs, and then throws the harness away when the job is done, unless you press s to save it.

Static harnesses are general-purpose; dynamic ones are tailored and disposable.

Claude can now build dynamic workflows because Opus 4.8 is capable enough to build the right harness on the fly. As the Anthropic team said, it is “intelligent enough to write a custom harness tailored for your use case.”

3. The real test

3.1 Patterns that make dynamic workflows useful

Anthropic introduces six workflow patterns, and I ran some tests with them to show you intuitively how they work. They are:

Fan-out-and-synthesize: split the work, then merge the results. Each piece gets its own agent and clean context; a final synthesizer waits for all of them before combining results.
Adversarial verification: for every finding, spawn a separate agent whose only job is to disprove it. A skeptic checking the optimist.
Classify-and-act: use a classifier agent to sort each item first, then route it to the right handler. A front desk.
Generate-and-filter: brainstorm wide, then filter by a rubric (dedupe, verify, keep only what survives scrutiny).
Tournament: spawn N agents that each attempt the same task differently, then have a judge agent compare them in pairs until one wins. Good for taste and naming.
Loop-until-done: for jobs of unknown size, keep spawning agents until a stop condition is met (no new findings, no more errors) rather than running a fixed number of passes.

Fan-out-and-synthesize is probably one of the most visible patterns. One task splits into several agents, each with its own clean context so they can’t contaminate each other, and then a synthesize step (a barrier that waits for everyone) merges their work into one result (Figure 3).
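The shape of fan-out-and-synthesize can be sketched in a few lines. This is an illustration under stated assumptions, not the script Claude generates: `runAgent` is a hypothetical stub in place of a real sub-agent call, and the synthesizer here is a plain string join rather than another model.

```javascript
// Hypothetical stand-in for a fresh-context agent call. A real workflow
// would dispatch to a sub-agent here instead of returning a canned string.
async function runAgent(role, task) {
  return `[${role}] view on: ${task}`;
}

// Fan out: every agent is started before any of them is awaited.
// Barrier: Promise.all resolves only once all agents have finished.
// Synthesize: the merge step sees final outputs only, never the agents' reasoning.
async function fanOutAndSynthesize(task, roles, synthesize) {
  const outputs = await Promise.all(roles.map((role) => runAgent(role, task)));
  return synthesize(outputs);
}

const report = fanOutAndSynthesize(
  "subscription business plan",
  ["investor", "customer", "competitor"],
  (outputs) => outputs.join("\n")
);

report.then((text) => console.log(text));
```

`Promise.all` preserves the order of `roles` in its result array, so the synthesizer can rely on which output came from which agent even though the calls overlap in time.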

Figure 3. Fan-out-and-synthesize: split into clean-context agents, then a barrier merges everyone’s work into one result. Image by author with help from ChatGPT

Adversarial verification is another common pattern (Figure 4).

Figure 4. Adversarial verification: a finding faces a panel of refuters; majority-refute kills it, the rest survive. Image by author with help from ChatGPT
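The majority-refute rule from Figure 4 is simple enough to write down. The sketch below assumes each refuter returns either "refuted" or "upheld", and that a tie does not count as a majority; the finding IDs and verdicts are invented for illustration.

```javascript
// A finding is dropped only when a strict majority of the panel refutes it.
// Assumption: a tie (for example 2 of 4) is not a majority, so the finding survives.
function survivesPanel(verdicts) {
  const refutedCount = verdicts.filter((v) => v === "refuted").length;
  return refutedCount * 2 <= verdicts.length;
}

// Keep only the findings whose panel failed to refute them.
function filterFindings(findings, verdictsById) {
  return findings.filter((finding) => survivesPanel(verdictsById[finding.id]));
}

// Hypothetical data: two candidate findings, three refuters each.
const findings = [
  { id: "F1", title: "token written to localStorage in plain text" },
  { id: "F2", title: "off-by-one in history window" },
];
const verdictsById = {
  F1: ["refuted", "refuted", "upheld"], // 2 of 3 refute: dropped
  F2: ["upheld", "refuted", "upheld"],  // 1 of 3 refute: survives
};

const confirmed = filterFindings(findings, verdictsById);
console.log(confirmed.map((f) => f.id)); // [ 'F2' ]
```

In a real workflow each verdict would come from a separate agent that never saw the finder's reasoning, which is what keeps the check adversarial rather than self-congratulatory.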

3.2 Dynamic workflow on a non-technical problem

The fastest way to understand dynamic workflows is to use one on a problem that has nothing to do with code.

So I gave Claude a plain business plan for a restaurant subscription model and asked it to tear the idea apart from three hostile angles at once: a risk-averse investor, a demanding customer, and an incumbent competitor. Each agent worked independently. Then a final synthesizer pulled the results together and returned the three strongest objections, plus how I could answer them.

Here’s that run (Figure 5), sped up:

Figure 5. Fan-out-and-synthesize: three critics attack a business plan in parallel; one Opus agent synthesizes. 4 agents, ~262k tokens, ~13s. *Image by author*

This is the fan-out-and-synthesize pattern: three agents fan out across the same problem from different viewpoints, then one agent synthesizes the results. The whole run took about 13 seconds.

The important part was not the speed. It was the separation. Because the agents didn’t share the same context window, they didn’t quietly influence each other or soften each other’s conclusions. Each came back with a different kind of view.

Here are the answers:

The investor attacked the math: The economics are too thin to survive churn. At $29/month and roughly 40% margin, the product makes only about $11.60 in gross profit per customer per month. With a $35 customer acquisition cost, the business needs customers to stay long enough for lifetime value to clearly beat acquisition cost. But food subscriptions usually face churn, and one weak retention month can push the model underwater. Answer: fix the unit economics before scaling: increase revenue per user through annual plans or add-ons, prove low cohort churn, and model LTV-to-CAC explicitly.
The customer attacked the value: The pitch leaned too hard on ideas like rotating menus and carbon-neutral delivery. These may sound good in a deck, but they may not be what customers care about most when choosing dinner. Most customers want speed, flexibility, and less daily decision-making. Answer: make the value more practical: lead with time saved, convenience, and how the service makes weeknight meals easier.
The competitor attacked the moat: A rotating menu and carbon-neutral delivery can be copied quickly. Neither creates much switching cost. A larger competitor could imitate the surface-level features, undercut the price, or bundle the offer into an existing delivery network. Answer: build a stronger moat: per-city logistics density, personalization, switching credits, or habits that make the service harder to replace.
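The investor's objection is the one that can be checked with arithmetic. The sketch below uses the three numbers from the plan ($29/month, 40% margin, $35 acquisition cost) and adds one assumption that is not in the plan: churn is a constant monthly rate, so the expected customer lifetime is 1 / churn months. The churn values in the loop are illustrative, not measured.

```javascript
const pricePerMonth = 29;   // from the plan
const grossMargin = 0.4;    // from the plan
const acquisitionCost = 35; // from the plan

// About $11.60 of gross profit per customer per month.
const grossProfitPerMonth = pricePerMonth * grossMargin;

// Roughly three months of retention are needed just to earn back the $35.
const paybackMonths = acquisitionCost / grossProfitPerMonth;

// Assumption: constant monthly churn, so expected lifetime = 1 / churn months.
function ltvToCac(monthlyChurn) {
  const lifetimeValue = grossProfitPerMonth / monthlyChurn;
  return lifetimeValue / acquisitionCost;
}

console.log(`gross profit/month: $${grossProfitPerMonth.toFixed(2)}`);
console.log(`payback: ${paybackMonths.toFixed(1)} months`);
for (const churn of [0.05, 0.1, 0.2]) {
  console.log(`churn ${churn * 100}%/month -> LTV:CAC ${ltvToCac(churn).toFixed(2)}`);
}
```

Under this simple model the ratio falls below 1 once monthly churn passes about one third (11.60 / 35), which is the "one weak retention month can push the model underwater" point expressed as a number.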

That’s what made the workflow useful. It didn’t just give me “feedback on the business plan.” It gave me three different objections from three different pressure points: economics, customer value, and defensibility. A single chat would probably have blended these into one polite, mildly useful critique. The workflow made the disagreement sharper. And the nicest part: I didn’t write a line of code.

3.3 Enable dynamic workflows

The setup is small. You switch the model to Opus 4.8 (I’ll explain why below), and you trigger the workflow in one of three ways. The reliable way is to just put the word workflow in your prompt. The second way is to set effort to ultracode, which turns on extra-high reasoning and lets Claude decide for itself whether to build a workflow. Be careful with ultracode, however: it costs more tokens, so reach for it only when you want auto-orchestration.

The third way applies if you have already saved a good workflow: you can trigger it again through /. There are two save locations:

.claude/workflows/ (project-shared; available to everyone who clones the repository)
~/.claude/workflows/ (personal; available in all your projects, but only to you)

The reason Opus 4.8 matters is that the orchestrator has the hardest job. It’s not just answering the question. It’s deciding how to split the task, writing the workflow script, assigning work to sub-agents, choosing tools, tracking outputs, and synthesizing the final result. So the pattern is: use the smartest model for orchestration, then use smaller or cheaper models for the worker agents when the sub-tasks are narrower.
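One way to picture that split is a small role-to-model table that the script consults whenever it builds an agent. Everything named here is hypothetical: the role names, the tier strings, and the shape of the agent spec are placeholders for illustration, not the format Claude Code actually emits.

```javascript
// Hypothetical routing table: strongest tier where planning and merging happen,
// a cheaper tier for the narrow, repeated worker tasks.
const MODEL_BY_ROLE = {
  orchestrator: "opus",
  synthesizer: "opus",
  finder: "sonnet",
  verifier: "sonnet",
};

// Unknown roles fall back to the strongest tier rather than the cheapest,
// so a typo in a role name costs tokens instead of quality.
function modelFor(role) {
  return MODEL_BY_ROLE[role] ?? "opus";
}

// Build one verifier spec per candidate finding.
function verifierSpecs(candidateFindings) {
  return candidateFindings.map((finding) => ({
    role: "verifier",
    model: modelFor("verifier"),
    prompt: `Try to disprove this finding: ${finding}`,
  }));
}

console.log(verifierSpecs(["unvalidated redirect in login handler"]));
```

Keeping the routing in one table means switching every worker to a different tier is a one-line change, which is convenient when comparing runs across models.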

3.4 Let’s test them out

3.4.1 Default approach

The goal: I take a multi-file repo and ask Claude to run a workflow that audits it using fan-out-and-synthesize and adversarial verification.

Prompt: audit the repo with a workflow: fan out finders and verify each finding, synthesize a severity-ranked report. use 200k token

Figure 6: Claude Code’s response when creating the workflow. *Image by author*

As shown in Figure 6, Claude creates a workflow with 3 phases (Find → Verify → Synthesize) and uses 6 finders for six dimensions: security, correctness, data integrity, accessibility, code quality, and repo hygiene. Because I didn’t specify which aspects Claude should look into, it suggested these 6 on its own.

It then started to run the workflow. To check the progress, use the /workflows command.

Figure 7: Workflow progress. *Image by author*

Inside /workflows (Figure 7), 6 agents are running, and the bad news is that they’re all Opus 4.8 and they’re consuming ~50k tokens each. My wallet will run out soon.

After 2 minutes, the finders are all done and have found 50 candidate issues (Figure 8). As a result, 50 verifying agents are launched, one per issue, to check whether each issue is real or just a false positive. And all of them use Opus 4.8.

That’s usually unnecessary. The orchestrator benefits from the strongest model because it has to design the workflow, split the task, manage the agents, and synthesize the result. But many verification tasks are narrower: check this one issue, examine the evidence, and decide whether it holds up. For that kind of focused work, a cheaper model is usually enough.

Therefore, in the next test, I switched the worker agents to Sonnet. The goal was not to make the workflow weaker. It was to keep Opus where it mattered most (orchestration and synthesis) while using a cheaper model for the repeated verification work.

Figure 8: Finder agents’ result. *Image by author*

3.4.2 Cheaper model for agents

Another try, this time with Sonnet for the agents and Opus as orchestrator and synthesizer.

In Figure 9, Claude launched 7 finder agents on Sonnet 4.6. They used 254k tokens and took about 5 minutes 17 seconds to find 71 candidate issues. In this run, Sonnet took noticeably longer than Opus.

Figure 9: Finder agents with Sonnet. *Image by author*

You can check the verification details of each issue in the workflows window, as in Figure 10.

Figure 10: Verification window. *Image by author*

Verifying the 71 issues consumed almost 1.5 million tokens. That costs much less than Opus, but the running time for the finder agents is significantly longer.

Here is the result of the synthesizer (Opus 4.8) in Figure 11.

Figure 11: Synthesizer result. *Image by author*

The important thing is that you should read the report it produced, then review and revise it, before putting Claude to work on revising the code.

The finder agents still detected a number of issues that the verifying agents later confirmed as valid. However, these issues are inherent to the app: it is supposed to behave that way, so flagging them only creates more checking work for us. Hence, I want to add some constraints to the workflow before it runs so that these issues are not picked up during scanning.

3.4.3 Revise the workflow before running

Prompt: audit the repo with a workflow: fanout finders and verify each finding, synthesize a severity-ranked report. use 200k token. Use Sonnet for all agents and Opus as orchestrator and synthesizer. Write the workflow and give me the link to access and revise it before running.

Figure 12: Claude stopped after providing the workflow script to amend. *Image by author*

Good. Claude gives me the workflow script to review and revise before I tell it to run (simply by typing run the workflow) (Figure 12).

I used a shorter codebase and a simpler prompt to demonstrate the components of the JavaScript workflow file in Figure 13.

Figure 13. Fan-out-and-synthesize, walked through line by line, then run (4 agents, ~262k tokens, ~13s). *Image by author*

For my test codebase, here is the scope that I want to revise:

{
    key: 'correctness',
    prompt: `Audit for CORRECTNESS / LOGIC bugs. Focus: the deterministic date-based daily pick, shuffle behavior, the "last 5 worn excluded" history logic (off-by-one, wraparound, per-wardrobe isolation), wardrobe-gender switching, 2-piece/3-piece filter, theme auto-switch by hour (6am-6pm boundaries), localStorage key handling. Trace edge cases (empty male wardrobe, all outfits recently worn). Read app.js and assortment.js.`,
},
{
    key: 'docs-accuracy',
    prompt: `Audit DOCUMENTATION ACCURACY. Compare README.md and docs/*.md claims against actual code behavior. Focus: features described that don't match implementation, wrong localStorage keys, stale config, deployment steps that won't work, outdated counts ("all 40 outfits"). Read README.md, docs/codebase-summary.md, docs/deployment-guide.md, then verify against the code.`,
},

I removed: shuffle behavior; theme auto-switch by hour (6am-6pm boundaries); Trace edge cases (empty male wardrobe, all outfits recently worn); and the entire 'docs-accuracy' finder. I also checked the other places in the js file to make sure those points were removed there too.

You can also ask Claude to exclude them, but this edit is simple, so I prefer to do it myself.

So the 7 aspects that the finder agents look for are reduced to 6, and one aspect has a smaller scope (Figure 14).

Figure 14: Workflow running process. *Image by author*

Six finder agents found 44 distinct candidate issues, of which 40 were confirmed. The whole process called 51 agents, took 9 minutes and 52 seconds, and consumed ~1.66 million tokens.

3.4.4 Compare to a single agent run

I ran the same codebase with a single agent in a single pass: no team, no verification. It found 47 issues, more than the workflow’s 44, in a third of the tokens. However, because it didn’t run verification, those 47 include the same 2 wrong findings that the verifier agents in the workflow had caught and removed. I show the differences in the chart below for easier comparison (Figure 15).

Figure 15: Comparison of single agent and workflow. *Image by author with help from ChatGPT*

If you care mostly about raw coverage and don’t mind reviewing the findings yourself, the single agent is the more economical choice, with a trade-off in quality.

4. When to use a workflow

Dynamic workflows use many more tokens than a normal Claude Code session. That’s because they run multiple sub-agents in the background, and each one works in its own separate context window. So you shouldn’t use them for every task. If you do, you can burn through your plan in just a few hours. The better approach is to use them only when the task actually needs multiple agents working in parallel. A few key signals, summarized in Figure 16, can help you decide when a workflow is worth using.

Figure 16: When to use Dynamic Workflow. *Image by author with help from ChatGPT*

The first is that the task can be split into independent pieces. If each agent depends on another agent’s output, they mostly end up waiting for each other. At that point, there is not much value in starting a workflow, because you lose the main benefit: parallel work. The less the tasks depend on one another, the more useful the workflow becomes. You get better parallelism, and the results come back faster.

The second signal is whether the task is large enough to need more than one context window. Workflows run multiple sub-agents, and each sub-agent has its own fresh context window. That only makes sense when the task is big enough to benefit from being divided into chunks. Otherwise, you are just spending extra time and tokens for no real gain. The split is also useful because each sub-agent returns only its final result. Its detailed reasoning stays inside its own working file and doesn’t enter the main context window unless you ask for it. That keeps the main conversation cleaner and leaves more room for the final synthesis.

The next signal is whether the task needs verification. In some cases, a wrong answer is expensive. You don’t want to move forward based on a weak security finding, a false bug report, or a risky migration plan. For tasks like that, it can be worth using extra agents to cross-check the result before you trust it. But verification is not free. More agents mean more tokens and more time. So the task should actually deserve that level of checking. Don’t spawn 5 agents just because you recently heard an AI tech CEO say that more tokens means more money.

The last signal is whether the task is deterministic. A workflow uses code to call agents in a fixed structure. So if the task has a clear shape and can be broken into known steps, a workflow works well. But if the task needs an agent to decide what to do next at runtime, then a workflow is probably not the right tool. A useful way to think about this is whether the task is wide or deep. A wide task can be split into many smaller tasks that run at the same time. That’s where workflows shine. They call multiple agents in parallel, let each one work on its own part, and then bring the results together. A deep task moves step by step. Each step depends on what happened before it. For that kind of task, the /goal command is usually a better fit. It takes one task at a time and keeps moving forward, instead of trying to run many things in parallel.

5. Can we use Dynamic Workflows economically?

Dynamic Workflows are expensive, but I want to test whether the cheapest model, Haiku, can save us tokens and cost. We can’t change the orchestrator and synthesizer; they have to be Opus, and that’s non-negotiable. So let’s try switching the subagents to Haiku.

Surprisingly, the workflow finished in ~7.5 minutes, using 37 agents and 1.35 million tokens. It found 23 candidate issues, far fewer than the Sonnet run above, and all 23 survived verification.

But the cost story was not as simple as “cheaper model, cheaper workflow.” Haiku found only 23 issues with 1.35 million tokens. The Sonnet version found 40 issues with 1.66 million tokens. So although Haiku is cheaper per token, the token efficiency was worse. It needed more turns to do the same kind of analytical work, and every extra turn meant re-reading more context. The lesson is simple: a smaller model is not automatically cheaper in practice. If it takes more steps to think through the task, it can burn through its price advantage very quickly.

Haiku costs roughly one-third as much as Sonnet per token. On paper, that looks like an easy win. But in this test, Haiku used about 1.5 times more tokens. These two numbers almost cancel each other out. In the end, the Haiku fan-out was roughly the same cost as Sonnet, maybe around 10% cheaper, and only slightly faster in real time. So “just route everything to the smallest model” is not a reliable rule. A smaller model can lose its price advantage if it needs more tokens to get the job done.
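The token-efficiency point can be checked from the two runs' reported totals. This sketch only divides total tokens by confirmed issues; it does not model dollar cost, because the article does not break out how many of those tokens belong to the Opus orchestrator and synthesizer versus the workers.

```javascript
// Reported totals from the two broad-audit runs above.
const haikuRun = { totalTokens: 1_350_000, confirmedIssues: 23 };
const sonnetRun = { totalTokens: 1_660_000, confirmedIssues: 40 };

function tokensPerConfirmedIssue(run) {
  return run.totalTokens / run.confirmedIssues;
}

const haikuPerIssue = tokensPerConfirmedIssue(haikuRun);   // about 58,700
const sonnetPerIssue = tokensPerConfirmedIssue(sonnetRun); // 41,500
const tokenRatio = haikuPerIssue / sonnetPerIssue;         // about 1.4

console.log(`Haiku:  ${Math.round(haikuPerIssue)} tokens per confirmed issue`);
console.log(`Sonnet: ${Math.round(sonnetPerIssue)} tokens per confirmed issue`);
console.log(`Haiku needs ${tokenRatio.toFixed(2)}x the tokens per confirmed issue`);
```

Measured this way, the Haiku run spent roughly 1.4 times as many tokens for each issue it confirmed, even though its total token count was lower.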

One more note about quality, which I think is quite important. There were 14 issues that appeared in both versions. That was quite surprising, and it suggests that the agents were actually doing useful work when they were isolated from each other. However, there were also 2 issues where the two versions disagreed. Surprisingly, Haiku was right on both, while Sonnet was wrong. This doesn’t show which one is the better model; it shows that a model doesn’t perform 100% consistently from run to run. One of the reasons is that I gave Claude a vague and broad prompt. So next, I’ll test with a more specific aspect.

New prompt: audit the repo in terms of security vulnerabilities, including secrets, auth, injection, dependencies, data handling, with a workflow: fanout finders and verify each finding, synthesize a severity-ranked report. use 200k token. Use Haiku for all agents and Opus as orchestrator and synthesizer. Write the workflow and give me the link to access and revise it before running.

How the Haiku run went:

15 agents, ~572k subagent tokens, ~3.5 min wall-clock
5 Haiku finders → Haiku adversarial verifiers → Opus synthesizer
9 raw findings → 3 confirmed, 6 refuted. All “high” ratings were removed.

And for the Sonnet agents:

23 agents, ~1.3M tokens across both passes, ~2.51 min
5 Sonnet finders → Sonnet adversarial verifiers → Opus synthesizer
18 raw findings → 13 confirmed, 5 rejected → deduped to 8 distinct issues. No critical/high survived adversarial verification.
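A quick way to compare the two security runs is the share of raw findings that survived adversarial verification. The counts below are taken from the two summaries above; the token figures are left out because the two runs report them on different bases (subagent tokens versus tokens across both passes).

```javascript
// Counts copied from the two run summaries above.
const securityRuns = [
  { model: "haiku", raw: 9, confirmed: 3, refuted: 6 },
  { model: "sonnet", raw: 18, confirmed: 13, refuted: 5 },
];

// Fraction of raw findings that the verifiers failed to refute.
function confirmationRate(run) {
  return run.confirmed / run.raw;
}

for (const run of securityRuns) {
  // Sanity check: every raw finding is either confirmed or refuted.
  if (run.confirmed + run.refuted !== run.raw) {
    throw new Error(`${run.model}: counts do not add up`);
  }
  const percent = (confirmationRate(run) * 100).toFixed(1);
  console.log(`${run.model}: ${run.confirmed}/${run.raw} confirmed (${percent}%)`);
}
```

About a third of Haiku's raw findings survived, against roughly 72% of Sonnet's, so the verification step was doing far more filtering on the cheaper model's output.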

One important detail: all 3 issues confirmed in the Haiku run were also found in the Sonnet run. That’s more consistent than the previous run. One possible reason is that this time the prompt gave the agents a specific angle to investigate, instead of asking them to look at the whole system from a broad view. That makes sense. The workflow used 5 finder agents, and each focused on only one aspect of security. Because the scope was narrower, the agents could dig deeper into the same type of problem instead of spreading their attention across too many possible issue categories. When an agent isn’t forced to prioritize across a wide surface area, it naturally spends more of its reasoning budget on the specific problem it was handed, and that leads to more thorough, reproducible findings.

Hence, even if you’re using Dynamic Workflows with isolated subagents, your prompt still needs to be as specific as possible. Narrower prompts reduce variance and push agents toward the same conclusions, which is exactly what you want when consistency and reliability matter.

6. Keep the good one after the run

A useful saved workflow should feel like project automation, not like a transcript of one lucky run. It should be clear enough that another teammate can open it and quickly understand: who owns it, what inputs it expects, which tools it’s allowed to use, what each sub-agent is responsible for, and what level of evidence is required before the workflow can call the task done.

If the workflow worked well and you want to reuse it, press s in the workflow menu to save it to ~/.claude/workflows. You can also move the script into a skill if the goal is to share the method with your team and make it easier to reuse across similar tasks.

But don’t save a workflow just because the first run succeeded. A successful run only proves that it worked once. Save it when the orchestration itself is valuable: when the script is easier to inspect, reuse, and improve than writing a normal Claude Code prompt again from scratch.

Below are some suggested prompts for your reference. Add your own details when you want to use one of them:

Stress-test a plan: “Take the plan below and run a workflow where separate agents tear it apart (a skeptical investor, a hard-to-please customer, an incumbent competitor), each independent. Then synthesize the three sharpest objections and the strongest reply to each.”

Audit a repo: “Run a workflow to audit this repository. Fan out agents for logic bugs, unsafe routes, weak auth, missing authorization, exposed secrets, risky dependencies, and data leaks. For each finding, spawn a separate agent to adversarially verify it: try to prove it’s not real. Synthesize a severity-ranked report with file paths and fixes. Use 200k tokens.”

Make it cheap: “Build it so the finder agents run on model: 'haiku' while the orchestrator stays on Opus 4.8 and does the final synthesis. Report tokens and wall-clock time.”

Reproduce a flaky test: “This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it: form theories and adversarially test them in worktrees. /goal don’t stop until one theory works.”

Verify a draft: “Go through this draft and use a workflow to verify every technical claim against the codebase and sources. I don’t want to ship anything wrong.”

Rank by real priority (tournament): “I have a list of findings/options. Use a workflow to rank them by [real exploitability / impact / whatever matters], but instead of scoring each one, run a pairwise tournament and rank by who wins. Then show me the top three and why.”

Root-cause a heisenbug: “This bug is intermittent and the obvious cause looks wrong. Use a workflow: split the investigation by evidence (one agent on the symptoms, one on the code, one on the data/logs), then have separate agents try to refute each theory, and synthesize the cause that survives.”

Triage a backlog safely: “Use a workflow to triage this backlog: classify each item (fix-now / escalate / needs-a-decision), dedupe into families, and route. Anything that reads untrusted input must be read-only; keep it separate from whatever proposes changes.”

Route by task shape: “Use a workflow with a classifier that looks at each task and routes it to the cheapest capable model (small models for mechanical work, Opus for the ambiguous, security-critical reasoning), then runs each on its chosen model.”

Check house rules: “Use a workflow to check this code against our rules in CLAUDE.md: one verifier per rule, plus a skeptic that hunts for false positives. I care more about not crying wolf than about catching every nit.”
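To make the tournament prompt above concrete, here is a minimal round-robin sketch of ranking by pairwise wins instead of absolute scores. It is an illustration under stated assumptions, not how Claude Code builds the workflow: `tournament_rank` and `prefer` are hypothetical names, and the `exploitability` table is invented data standing in for a judge agent that would compare two findings at a time.

```python
from collections import Counter
from itertools import combinations


def tournament_rank(items, prefer):
    """Rank items by round-robin wins; prefer(a, b) returns the winner."""
    wins = Counter({item: 0 for item in items})
    for a, b in combinations(items, 2):
        wins[prefer(a, b)] += 1
    # Most wins first; ties keep their input order because sorted() is stable.
    return sorted(items, key=lambda item: -wins[item])


# Stand-in judge. In a real workflow a separate agent would decide each
# pairing; this lookup table is invented so the example can run.
exploitability = {
    "open redirect": 2,
    "IDOR on /orders": 9,
    "verbose errors": 1,
    "weak JWT secret": 7,
}


def prefer(a, b):
    return a if exploitability[a] >= exploitability[b] else b


print(tournament_rank(list(exploitability), prefer)[:3])
# ['IDOR on /orders', 'weak JWT secret', 'open redirect']
```

A full round robin needs n(n-1)/2 comparisons, so for long lists a bracket or a sampled subset of pairings would cost fewer agent calls at the price of a noisier ranking.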

Sources

Thariq Shihipar & Sid Bidasaria (Anthropic), “A harness for every task: dynamic workflows in Claude Code”: the why, the patterns, the prompting tips, save/share.
Factory.ai, “The Context Window Problem: Scaling Agents Beyond Token Limits”.
Engineering at Anthropic, “Effective context engineering for AI agents”.
Chroma Technical Report, “Context Rot: How Increasing Input Tokens Impacts LLM Performance”.
Anthropic, “Building effective agents”: background on the underlying orchestration patterns.
Anthropic, “Introducing dynamic workflows in Claude Code”.
