Stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime now enable interactive, multi-turn agent workflows that were previously unattainable with stateless implementations. Developers building AI agents often struggle when their workflows must pause mid-execution to ask users for clarification, request large language model (LLM)-generated content, or provide real-time progress updates during long-running operations; stateless MCP servers can't handle these scenarios. This release removes those limitations by introducing three client capabilities from the MCP specification:
- Elicitation (request user input mid-execution)
- Sampling (request LLM-generated content from the client)
- Progress notifications (stream real-time updates)
These capabilities transform one-way tool execution into bidirectional conversations between your MCP server and clients.
Model Context Protocol (MCP) is an open standard defining how LLM applications connect with external tools and data sources. The specification defines server capabilities (tools, prompts, and resources that servers expose) and client capabilities (features clients offer back to servers). Whereas our earlier launch focused on hosting stateless MCP servers on AgentCore Runtime, this new capability completes the bidirectional protocol implementation. Clients connecting to AgentCore-hosted MCP servers can now respond to server-initiated requests. In this post, you'll learn how to build stateful MCP servers that request user input during execution, invoke LLM sampling for dynamic content generation, and stream progress updates for long-running tasks. You will see code examples for each capability and deploy a working stateful MCP server to Amazon Bedrock AgentCore Runtime.
From stateless to stateful MCP
The original MCP server support on AgentCore used stateless mode: each incoming HTTP request was independent, with no shared context between calls. This model is simple to deploy and reason about, and it works well for tool servers that receive inputs and return outputs. However, it has a fundamental constraint: the server can't maintain a conversation thread across requests, ask the user for clarification in the middle of a tool call, or report progress back to the client as work happens.
Stateful mode removes that constraint. When you run your MCP server with stateless_http=False, AgentCore Runtime provisions a dedicated microVM for each user session. The microVM persists for the session's lifetime (up to 8 hours, or 15 minutes of inactivity per the idleRuntimeSessionTimeout setting), with CPU, memory, and filesystem isolation between sessions. The protocol maintains continuity through an Mcp-Session-Id header: the server returns this identifier during the initialize handshake, and the client includes it in every subsequent request to route back to the same session.
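The session-routing contract described above can be sketched in plain Python. This is an illustrative model only (AgentCore's actual implementation is managed infrastructure, and the state shape here is invented): initialize yields a session ID, later requests carry it in the Mcp-Session-Id header, and an expired or unknown ID yields a 404 that tells the client to re-initialize.

```python
import uuid

class SessionRouter:
    """Toy model of stateful MCP session routing (not AgentCore's code)."""

    def __init__(self):
        self._sessions = {}  # session id -> per-session state

    def initialize(self):
        """Handle the initialize handshake: create a session and return
        its id, which the server echoes in the Mcp-Session-Id header."""
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"history": []}
        return session_id

    def handle(self, headers, payload):
        """Route a follow-up request back to its session, or return 404
        if the session has expired (or the server restarted)."""
        session_id = headers.get("Mcp-Session-Id")
        state = self._sessions.get(session_id)
        if state is None:
            return 404, {"error": "session not found; re-initialize"}
        state["history"].append(payload)
        return 200, {"turns_so_far": len(state["history"])}

    def expire(self, session_id):
        # e.g. after 8 hours total, or 15 minutes of inactivity
        self._sessions.pop(session_id, None)
```

A client that receives the 404 simply re-runs the initialize handshake to obtain a fresh session ID.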
The following table summarizes the key differences:
| | Stateless mode | Stateful mode |
| --- | --- | --- |
| stateless_http setting | True | False |
| Session isolation | None; each request is independent | Dedicated microVM per session |
| Session lifetime | Per-request | Up to 8 hours; 15-min idle timeout |
| Client capabilities | Not supported | Elicitation, sampling, progress notifications |
| Recommended for | Simple tool serving | Interactive, multi-turn workflows |
When a session expires or the server is restarted, subsequent requests with the old session ID return a 404. At that point, clients must re-initialize the connection to obtain a new session ID and start a fresh session. The configuration change to enable stateful mode is a single flag in your server startup:
```python
mcp.run(
    transport="streamable-http",
    host="0.0.0.0",
    port=8000,
    stateless_http=False  # Enable stateful mode
)
```
Beyond this flag, the three client capabilities become available automatically once the MCP client declares support for them during the initialization handshake.
The three new client capabilities
Stateful mode brings three client capabilities from the MCP specification. Each addresses a different interaction pattern that agents encounter in production workflows.
Elicitation allows a server to pause execution and request structured input from the user through the client. The tool can ask targeted questions at the right moment in its workflow: gathering a preference, confirming a decision, or collecting a value that depends on earlier results. The server sends an elicitation/create request with a message and an optional JSON schema describing the expected response structure. The client renders an appropriate input interface, and the user can accept (providing the data), decline, or cancel.
Sampling allows a server to request an LLM-generated completion from the client via sampling/createMessage. This is the mechanism that makes it possible for tool logic on the server to use language model capabilities without holding its own model credentials. The server provides a prompt and optional model preferences; the client forwards the request to its associated LLM and returns the generated response. Practical uses include producing personalized summaries, creating natural-language explanations of structured data, or generating recommendations based on earlier conversation context.
Progress notifications allow a server to report incremental progress during long-running operations. Using ctx.report_progress(progress, total), the server emits updates that clients can display as a progress bar or status indicator. For operations that span multiple steps (for example, searching across data sources), this keeps users informed rather than watching a blank screen.
All three capabilities are opt-in at the client level: a client declares which capabilities it supports during initialization, and the server must only use capabilities the client has advertised.
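Per the MCP specification, elicitation and sampling are declared in the capabilities object of the client's initialize request, while progress updates are requested per call via a progressToken in _meta. A sketch of that declaration, and the kind of guard a server could apply; the client name, version, and protocol revision below are illustrative assumptions:

```python
# Sketch of a client's `initialize` request declaring elicitation and
# sampling support. Field names follow the MCP specification; the
# clientInfo and protocolVersion values are illustrative assumptions.
initialize_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-06-18",
        "capabilities": {
            "elicitation": {},  # client will answer elicitation/create
            "sampling": {},     # client will answer sampling/createMessage
        },
        "clientInfo": {"name": "finance-client", "version": "1.0.0"},
    },
}

def supports(capabilities: dict, feature: str) -> bool:
    """Server-side guard: only use a capability the client advertised."""
    return feature in capabilities

caps = initialize_request["params"]["capabilities"]
print(supports(caps, "sampling"))  # True: safe to call ctx.sample()
```

With FastMCP, you normally don't build this payload yourself: registering an elicitation_handler or sampling_handler on the client produces the matching declaration automatically.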
Elicitation: server-initiated user input
Elicitation is the mechanism by which an MCP server pauses mid-execution and asks the client to collect specific information from the user. The server sends an elicitation/create JSON-RPC request containing a human-readable message and a requestedSchema that describes the expected response. The client presents this as a form or prompt, and the user's response (or explicit decline) is returned to the server so execution can proceed. The MCP specification supports two elicitation modes:
- Form mode: structured data collection directly through the MCP client. Suitable for preferences, configuration inputs, and confirmations that don't involve sensitive data.
- URL mode: directs the user to an external URL for interactions that should not pass through the MCP client, such as OAuth flows, payment processing, or credential entry.
The response uses a three-action model: accept (user provided data), decline (user explicitly rejected the request), or cancel (user dismissed without choosing). Servers should handle each case appropriately. The following example implements an add_expense_interactive tool that collects a new expense through four sequential elicitation steps: amount, description, category, and a final confirmation before writing to DynamoDB. Each step defines its expected input as a Pydantic model, which FastMCP converts to the JSON Schema sent in the elicitation/create request.
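On the wire, the request and response shapes described above look roughly like the following. Field names follow the MCP specification; the id and example values are invented for illustration:

```python
# Illustrative form-mode elicitation/create request and the three possible
# response actions. Shapes follow the MCP specification; the id and the
# example values are invented.
elicit_request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "elicitation/create",
    "params": {
        "message": "How much did you spend?",
        "requestedSchema": {  # form mode: a flat object schema
            "type": "object",
            "properties": {"amount": {"type": "number"}},
            "required": ["amount"],
        },
    },
}

# The client replies with exactly one of the three actions:
accept_response = {"action": "accept", "content": {"amount": 45.50}}
decline_response = {"action": "decline"}  # user explicitly refused
cancel_response = {"action": "cancel"}    # user dismissed without choosing
```

FastMCP generates the requestedSchema from the Pydantic model you pass to ctx.elicit, so the server code below never constructs these payloads by hand.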
Server
The add_expense_interactive tool walks a user through four sequential questions before writing to Amazon DynamoDB. Each step defines its expected input as a separate Pydantic model, because the form-mode schema must be a flat object. You could collect all four fields in a single model with four properties, but splitting them here gives the user one focused question at a time, which is the interactive pattern elicitation is designed for.
agents/mcp_client_features.py
```python
import os
from pydantic import BaseModel
from fastmcp import FastMCP, Context
from fastmcp.server.elicitation import AcceptedElicitation
from dynamo_utils import FinanceDB

mcp = FastMCP(name="ElicitationMCP")
_region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-east-1'
db = FinanceDB(region_name=_region)

class AmountInput(BaseModel):
    amount: float

class DescriptionInput(BaseModel):
    description: str

class CategoryInput(BaseModel):
    category: str  # one of: food, transport, bills, entertainment, other

class ConfirmInput(BaseModel):
    confirm: str  # Yes or No

@mcp.tool()
async def add_expense_interactive(user_alias: str, ctx: Context) -> str:
    """Interactively add a new expense using elicitation.

    Args:
        user_alias: User identifier
    """
    # Step 1: Ask for the amount
    result = await ctx.elicit('How much did you spend?', AmountInput)
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    amount = result.data.amount

    # Step 2: Ask for a description
    result = await ctx.elicit('What was it for?', DescriptionInput)
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    description = result.data.description

    # Step 3: Pick a category
    result = await ctx.elicit(
        'Pick a category (food, transport, bills, entertainment, other):',
        CategoryInput
    )
    if not isinstance(result, AcceptedElicitation):
        return 'Expense entry cancelled.'
    category = result.data.category

    # Step 4: Confirm before saving
    confirm_msg = (
        f'Confirm: add expense of ${amount:.2f} for {description}'
        f' (category: {category})? Reply Yes or No'
    )
    result = await ctx.elicit(confirm_msg, ConfirmInput)
    if not isinstance(result, AcceptedElicitation) or result.data.confirm != 'Yes':
        return 'Expense entry cancelled.'

    return db.add_transaction(user_alias, 'expense', -abs(amount), description, category)

if __name__ == '__main__':
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=8000,
        stateless_http=False
    )
```
Each await ctx.elicit() suspends the tool and sends an elicitation/create request over the active session. The isinstance(result, AcceptedElicitation) check handles decline and cancel uniformly at every step.
Client
Registering an elicitation_handler on fastmcp.Client is both how the handler is wired in and how the client advertises elicitation support to the server during initialization.
```python
import asyncio
from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

# Pre-loaded responses simulate the user answering each question in sequence
_responses = iter([
    {'amount': 45.50},
    {'description': 'Lunch at the office'},
    {'category': 'food'},
    {'confirm': 'Yes'},
])

async def elicit_handler(message, response_type, params, context):
    # In production: render a form and return the user's input
    response = next(_responses)
    print(f'  Server asks: {message}')
    print(f'  Responding: {response}\n')
    return response

transport = StreamableHttpTransport(url=mcp_url, headers=headers)
async with Client(transport, elicitation_handler=elicit_handler) as client:
    await asyncio.sleep(2)  # allow session initialization
    result = await client.call_tool('add_expense_interactive', {'user_alias': 'me'})
    print(result.content[0].text)
```
Running this against the deployed server, the client prints each question and the simulated answer in turn.
The complete working example, including DynamoDB setup and AgentCore deployment, is available in the GitHub sample repository.
Use elicitation when your tool needs information that depends on earlier results, is better collected interactively than upfront, or varies across users in ways that can't be parameterized in advance. A travel booking tool that first searches destinations and then asks the user to choose among them is a natural fit. A financial workflow that confirms a transaction amount before submitting is another. Elicitation isn't appropriate for sensitive inputs like passwords or API keys; use URL mode or a secure out-of-band channel for those.
Sampling: server-initiated LLM generation
Sampling is the mechanism by which an MCP server requests an LLM completion from the client. The server sends a sampling/createMessage request containing a list of conversation messages, a system prompt, and optional model preferences. The client forwards the request to its associated language model (subject to user approval) and returns the generated response. The server receives a structured result containing the generated text, the model used, and the stop reason.
This capability inverts the usual flow: instead of the client asking the server for tool results, the server asks the client for model output. The benefit is that the server doesn't need API keys or a direct model integration. The client retains full control over which model is used, and the MCP specification requires a human-in-the-loop step where users can review and approve sampling requests before they're forwarded.
Servers can express model preferences using capability priorities (costPriority, speedPriority, intelligencePriority) and optional model hints. These are advisory; the client makes the final decision based on what models it has access to.
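A sketch of what such a preferences block might look like inside sampling/createMessage params. Field names follow the MCP specification; the priority values and hint are invented for illustration:

```python
# Illustrative params for sampling/createMessage with model preferences.
# Priorities are 0-1 floats and hints are advisory name fragments; all
# values here are invented.
sampling_params = {
    "messages": [
        {"role": "user",
         "content": {"type": "text", "text": "Summarize my spending."}},
    ],
    "modelPreferences": {
        "hints": [{"name": "claude-haiku"}],  # advisory, not binding
        "costPriority": 0.8,          # cheaper models preferred
        "speedPriority": 0.6,         # latency matters somewhat
        "intelligencePriority": 0.3,  # raw capability matters least here
    },
    "maxTokens": 300,
}
```

A client receiving this might match the hint against the models it can reach, then weigh cost against capability; it is free to ignore the preferences entirely.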
Server
The analyze_spending tool fetches transactions from DynamoDB, builds a prompt from the structured data, and delegates the analysis to the client's LLM via ctx.sample().
agents/mcp_client_features.py (added tool, same file as elicitation)
```python
@mcp.tool()
async def analyze_spending(user_alias: str, ctx: Context) -> str:
    """Fetch expenses from DynamoDB and ask the client's LLM to analyze them.

    Args:
        user_alias: User identifier
    """
    transactions = db.get_transactions(user_alias)
    if not transactions:
        return f'No transactions found for {user_alias}.'

    lines = '\n'.join(
        f"- {t['description']} (${abs(float(t['amount'])):.2f}, {t['category']})"
        for t in transactions
    )
    prompt = (
        f'Here are the recent expenses for a user:\n{lines}\n\n'
        f'Please analyze the spending patterns and give 3 concise, '
        f'actionable recommendations to improve their finances. '
        f'Keep the response under 120 words.'
    )

    ai_analysis = 'Analysis unavailable.'
    try:
        response = await ctx.sample(messages=prompt, max_tokens=300)
        if hasattr(response, 'text') and response.text:
            ai_analysis = response.text
    except Exception:
        pass

    return f'Spending Analysis for {user_alias}:\n\n{ai_analysis}'
```
The tool calls await ctx.sample() and suspends. The server sends a sampling/createMessage request to the client over the open session. When the client returns the LLM response, execution resumes.
Client
The sampling_handler receives the prompt from the server and forwards it to a language model; in this example, that's Claude Haiku on Amazon Bedrock. Registering the handler is also how the client declares sampling support to the server during initialization.
```python
import json
import asyncio
import boto3
from mcp.types import CreateMessageResult, TextContent
from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

MODEL_ID = 'us.anthropic.claude-haiku-4-5-20251001-v1:0'
bedrock = boto3.client('bedrock-runtime', region_name=region)

def _invoke_bedrock(prompt: str, max_tokens: int) -> str:
    body = json.dumps({
        'anthropic_version': 'bedrock-2023-05-31',
        'max_tokens': max_tokens,
        'messages': [{'role': 'user', 'content': prompt}]
    })
    resp = bedrock.invoke_model(modelId=MODEL_ID, body=body)
    return json.loads(resp['body'].read())['content'][0]['text']

async def sampling_handler(messages, params, ctx):
    """Called by fastmcp.Client when the server issues ctx.sample()."""
    prompt = messages if isinstance(messages, str) else ' '.join(
        m.content.text for m in messages if hasattr(m.content, 'text')
    )
    max_tokens = params.maxTokens if params and hasattr(params, 'maxTokens') and params.maxTokens else 300
    text = await asyncio.to_thread(_invoke_bedrock, prompt, max_tokens)
    return CreateMessageResult(
        role='assistant',
        content=TextContent(type='text', text=text),
        model=MODEL_ID,
        stopReason='endTurn'
    )

transport = StreamableHttpTransport(url=mcp_url, headers=headers)
async with Client(transport, sampling_handler=sampling_handler) as client:
    result = await client.call_tool('analyze_spending', {'user_alias': 'me'})
    print(result.content[0].text)
```
Running this against a user with four seeded expenses returns the generated analysis.
Use sampling when your tool must produce natural-language output that benefits from a language model's capabilities. A tool that has collected a user's travel preferences and needs to generate a tailored trip itinerary narrative is a good example. Sampling isn't appropriate for deterministic operations like database queries, calculations, or API calls with well-defined outputs; we recommend plain tool logic for those.
Progress notifications: real-time operation feedback
Progress notifications are events that a server sends during long-running operations to keep the client and the user informed about how much work has been completed. await ctx.report_progress(progress, total) emits a notifications/progress message and returns immediately. The server doesn't wait for a response; it's fire-and-forget in both directions. The client receives the notification asynchronously and can render a progress bar, log a status line, or use it to keep the user from assuming the connection has stalled. The pattern is to call report_progress at each logical step of a multi-stage operation, with progress incrementing toward total.
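Under the hood, the exchange looks roughly like this (shapes per the MCP specification; the token and id values are invented): the client attaches a progressToken to its tools/call request, and each update comes back as a notification with no id, so no reply is expected.

```python
# Illustrative wire shapes for progress reporting; token and id invented.
call_request = {
    "jsonrpc": "2.0",
    "id": 12,
    "method": "tools/call",
    "params": {
        "name": "generate_report",
        "arguments": {"user_alias": "me"},
        "_meta": {"progressToken": "report-1"},
    },
}

def progress_notification(token, progress, total):
    """Build the fire-and-forget notifications/progress message."""
    return {
        "jsonrpc": "2.0",
        "method": "notifications/progress",
        "params": {"progressToken": token, "progress": progress, "total": total},
    }

note = progress_notification("report-1", 2, 5)  # step 2 of 5
```

Because the notification carries no id, the server never blocks on it; the client correlates updates to the original call via the token.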
Server
The generate_report tool builds a monthly financial report in five steps, emitting a progress notification at the start of each one.
agents/mcp_progress_server.py
```python
import os
from fastmcp import FastMCP, Context
from dynamo_utils import FinanceDB

mcp = FastMCP(name="Progress-MCP-Server")
_region = os.environ.get('AWS_REGION') or os.environ.get('AWS_DEFAULT_REGION') or 'us-east-1'
db = FinanceDB(region_name=_region)

@mcp.tool()
async def generate_report(user_alias: str, ctx: Context) -> str:
    """Generate a monthly financial report, streaming progress at each stage.

    Args:
        user_alias: User identifier
    """
    total = 5

    # Step 1: Fetch transactions
    await ctx.report_progress(progress=1, total=total)
    transactions = db.get_transactions(user_alias)

    # Step 2: Group by category
    await ctx.report_progress(progress=2, total=total)
    by_category = {}
    for t in transactions:
        cat = t['category']
        by_category[cat] = by_category.get(cat, 0) + abs(float(t['amount']))

    # Step 3: Fetch budgets
    await ctx.report_progress(progress=3, total=total)
    budgets = {b['category']: float(b['monthly_limit']) for b in db.get_budgets(user_alias)}

    # Step 4: Compare spending vs budgets
    await ctx.report_progress(progress=4, total=total)
    lines = []
    for cat, spent in sorted(by_category.items(), key=lambda x: -x[1]):
        limit = budgets.get(cat)
        if limit:
            pct = (spent / limit) * 100
            status = 'OVER' if spent > limit else 'OK'
            lines.append(f'  {cat:<15} ${spent:>8.2f} / ${limit:.2f} [{pct:.0f}%] {status}')
        else:
            lines.append(f'  {cat:<15} ${spent:>8.2f} (no budget set)')

    # Step 5: Format and return
    await ctx.report_progress(progress=5, total=total)
    total_spent = sum(by_category.values())
    return (
        f'Monthly Report for {user_alias}\n'
        f'{"=" * 50}\n'
        f'  {"Category":<15} {"Spent":>10} {"Budget":>8} Status\n'
        f'{"-" * 50}\n'
        + '\n'.join(lines)
        + f'\n{"-" * 50}\n'
        f'  {"TOTAL":<15} ${total_spent:>8.2f}\n'
    )

if __name__ == '__main__':
    mcp.run(
        transport="streamable-http",
        host="0.0.0.0",
        port=8000,
        stateless_http=False
    )
```
Each await ctx.report_progress() is fire-and-forget: the notification is sent and execution moves immediately to the next step.
Client
The progress_handler receives progress, total, and an optional message each time the server emits a notification. Registering the handler is how the client opts in to progress updates during initialization.
```python
import logging
logging.getLogger('mcp.client.streamable_http').setLevel(logging.ERROR)

from fastmcp import Client
from fastmcp.client.transports import StreamableHttpTransport

async def progress_handler(progress: float, total: float | None, message: str | None):
    pct = int((progress / total) * 100) if total else 0
    filled = pct // 5
    bar = '#' * filled + '-' * (20 - filled)
    print(f'\r  Progress: [{bar}] {pct}% ({int(progress)}/{int(total or 0)})',
          end='', flush=True)
    if total and progress >= total:
        print('  Done!')

transport = StreamableHttpTransport(url=mcp_url, headers=headers)
async with Client(transport, progress_handler=progress_handler) as client:
    result = await client.call_tool('generate_report', {'user_alias': 'me'})
    print(result.content[0].text)
```
As the server moves through its five stages, the client renders the bar in place.
Use progress notifications for any tool call that takes more than a few seconds and involves discrete, measurable steps. Operations like searching multiple data sources, running a sequence of API calls, processing a batch of records, or running a multi-step booking workflow are all good candidates. A tool that completes in under a second generally doesn't need progress reporting; the overhead of emitting events isn't worthwhile for fast operations.
Conclusion
In this post, you were introduced to stateful MCP client capabilities on Amazon Bedrock AgentCore Runtime. We explained the difference between stateless and stateful MCP deployments, walked through elicitation, sampling, and progress notifications with code examples, and showed how to deploy a stateful MCP server onto AgentCore Runtime. With these capabilities, you can build MCP servers that engage users in structured conversations, use the client's LLM for content generation, and provide real-time visibility into long-running operations, all hosted on managed, isolated infrastructure powered by AgentCore Runtime. We encourage you to explore the following resources to get started:
About the Authors

