The rise of artificial intelligence (AI) agents marks a shift in software development and in how applications make decisions and interact with users. While traditional systems follow predictable paths, AI agents engage in complex reasoning that remains hidden from view. This invisibility creates a challenge for organizations: how can they trust what they can't see? This is where agent observability enters the picture, offering deep insights into how agentic applications perform, interact, and execute tasks.
In this post, we explain how to integrate Langfuse observability with Amazon Bedrock AgentCore to gain deep visibility into an AI agent's performance, debug issues faster, and optimize costs. We walk through a complete implementation using Strands agents deployed on AgentCore Runtime, followed by step-by-step code examples.
Amazon Bedrock AgentCore is a comprehensive agentic platform for deploying and operating highly capable AI agents securely, at scale. It offers purpose-built infrastructure for dynamic agent workloads, powerful tools to enhance agents, and essential controls for real-world deployment. AgentCore comprises fully managed services that can be used together or independently. These services work with any framework, including CrewAI, LangGraph, LlamaIndex, and Strands Agents, and with any foundation model in or outside of Amazon Bedrock, offering flexibility and reliability. AgentCore emits telemetry data in a standardized OpenTelemetry (OTEL)-compatible format, enabling straightforward integration with an existing monitoring and observability stack. It offers detailed visualizations of each step in the agent workflow, so you can inspect an agent's execution path, audit intermediate outputs, and debug performance bottlenecks and failures.
How Langfuse tracing works
Langfuse uses OpenTelemetry to trace and monitor agents deployed on Amazon Bedrock AgentCore. OpenTelemetry is a Cloud Native Computing Foundation (CNCF) project that provides a set of specifications, APIs, and libraries defining a standard way to collect distributed traces and metrics from an application. You can track performance metrics including token usage, latency, and execution durations across different processing phases. The system creates hierarchical trace structures that capture both streaming and non-streaming responses, with detailed operation attributes and error states.
Through the /api/public/otel endpoint, Langfuse functions as an OpenTelemetry backend, mapping traces to its data model using generative AI semantic conventions. This is particularly valuable for complex large language model (LLM) applications using chains and agents with tools, where nested traces help developers quickly identify and resolve issues. The integration supports systematic debugging, performance monitoring, and audit trail maintenance, making it easier for teams to build and maintain reliable AI applications on Amazon Bedrock AgentCore.
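As a minimal sketch, assuming Langfuse Cloud and placeholder project keys, an application can point a standard OTEL exporter at this endpoint using HTTP Basic authentication:

```python
import base64
import os

# Placeholder keys; substitute your own Langfuse project API keys.
LANGFUSE_PUBLIC_KEY = "pk-lf-..."
LANGFUSE_SECRET_KEY = "sk-lf-..."

# Langfuse authenticates OTLP requests with HTTP Basic auth over
# base64("<public_key>:<secret_key>").
auth_token = base64.b64encode(
    f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}".encode()
).decode()

# Standard OTEL exporter variables, honored by most OpenTelemetry SDKs.
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth_token}"
```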
In addition to agent observability, Langfuse offers a suite of built-in tools covering the full LLM application development lifecycle. This includes running automated LLM-as-a-judge evaluators (online and offline), organizing data labeling for root cause analysis and evaluator alignment, tracking experiments (locally and in CI), iterating on prompts interactively in a playground, and version-controlling prompts in the UI using prompt management.
Solution overview
This post shows how to deploy a Strands agent on Amazon Bedrock AgentCore Runtime with Langfuse observability. The implementation uses Anthropic Claude models through Amazon Bedrock. Telemetry data flows from the Strands agent through OTEL exporters to Langfuse for monitoring and debugging. To use Langfuse, set disable_otel=True in the AgentCore Runtime deployment configuration; this turns off AgentCore's default observability.
Figure 1: Architecture overview
Key components used in the solution are:
- Strands Agents: Python framework for building LLM-powered agents with built-in telemetry support
- Amazon Bedrock AgentCore Runtime: Managed runtime service for hosting and scaling agents on Amazon Web Services (AWS)
- Langfuse: Open source observability and evaluation platform for LLM applications that receives traces via OTEL
- OpenTelemetry: Industry-standard protocol for collecting and exporting telemetry data
Technical implementation guide
Now that we have covered how Langfuse tracing works, we can walk through how to implement it with Amazon Bedrock AgentCore.
Prerequisites
- An AWS account
- AWS credentials configured correctly before using Amazon Bedrock. They can be set up using the AWS CLI or by setting environment variables; for this notebook, we assume the credentials are already configured.
- Amazon Bedrock model access for Anthropic Claude 3.7 Sonnet in the us-west-2 Region
- Amazon Bedrock AgentCore permissions
- Python 3.10+
- Docker installed locally
- A Langfuse account, which is required to create a Langfuse API key.
- Sign up for Langfuse Cloud, create a project, and get the API keys
- Alternatively, you can self-host Langfuse in your own AWS account using the Terraform module.
Walkthrough
The following steps walk through how to use Langfuse to collect traces from agents created with the Strands SDK in AgentCore Runtime. You can also refer to this notebook on GitHub to get started immediately.
Clone this GitHub repo:
Once the repo is cloned, go to the Amazon Bedrock AgentCore samples directory, find the notebook runtime_with_strands_and_langfuse.ipynb, and run each cell.
Step 1: Install Python dependencies and required packages for our Strands agent
Execute the cell below to install the dependencies, which are defined in the requirements.txt file.
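In the notebook, this is a single cell along the following lines (a minimal sketch; the pinned versions live in the repo's requirements.txt):

```python
# Notebook cell: install the project dependencies
# (equivalent to `pip install -r requirements.txt` in a terminal).
%pip install -r requirements.txt
```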
Step 2: Agent implementation
The agent file (strands_claude.py) implements a travel agent with web search capabilities.
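The full agent code lives in the repo; the following is a minimal sketch of the pattern, assuming the Strands Agents SDK and the bedrock-agentcore package, with a hypothetical stub for the web_search tool body:

```python
from strands import Agent, tool
from strands.models import BedrockModel
from bedrock_agentcore.runtime import BedrockAgentCoreApp

app = BedrockAgentCoreApp()

@tool
def web_search(query: str) -> str:
    """Search the web for up-to-date travel information."""
    # Hypothetical stub; the sample notebook wires this to a real search API.
    return f"Search results for: {query}"

model = BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0")
agent = Agent(
    model=model,
    tools=[web_search],
    system_prompt="You are a helpful travel assistant.",
)

@app.entrypoint
def invoke(payload):
    # AgentCore Runtime passes the request payload as a dict.
    result = agent(payload.get("prompt", ""))
    return {"result": str(result)}

if __name__ == "__main__":
    app.run()
```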
Step 3: Configure AgentCore Runtime deployment
Next, use our starter toolkit to configure the AgentCore Runtime deployment with an entry point, the execution role we created, and a requirements file. Additionally, configure the starter kit to auto-create the Amazon Elastic Container Registry (Amazon ECR) repository on launch.
During the configure step, a Dockerfile is generated based on the application code. When you use the bedrock_agentcore_starter_toolkit to configure the agent, it enables AgentCore observability by default. To use Langfuse instead, disable the default OTEL integration by setting the configuration flag to True, as shown in the following code block.
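A minimal sketch of the configure call, assuming the starter toolkit's Runtime helper; the execution role ARN and agent name below are placeholders:

```python
from bedrock_agentcore_starter_toolkit import Runtime

agentcore_runtime = Runtime()

response = agentcore_runtime.configure(
    entrypoint="strands_claude.py",
    execution_role="arn:aws:iam::123456789012:role/AgentCoreExecutionRole",  # placeholder
    auto_create_ecr=True,                  # create the Amazon ECR repository on launch
    requirements_file="requirements.txt",
    region="us-west-2",
    agent_name="strands_claude_langfuse",  # hypothetical name
    disable_otel=True,                     # turn off AgentCore's default observability
)
```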
Figure 2: Configure AgentCore Runtime
Step 4: Deploy to AgentCore Runtime
Now that a Dockerfile has been generated, launch the agent to AgentCore Runtime to create the Amazon ECR repository and the AgentCore Runtime.
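With the starter toolkit, this is a single call (a sketch, continuing the assumptions above):

```python
# Build the container image, push it to Amazon ECR, and create the
# AgentCore Runtime from the generated Dockerfile.
launch_result = agentcore_runtime.launch()
print(launch_result.agent_arn)  # ARN of the deployed agent runtime
```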
Now configure the Langfuse secret key, public key, and OTEL endpoint in AWS Systems Manager Parameter Store, which provides secure, hierarchical storage for configuration data and secrets management.
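A sketch using boto3 follows; the parameter names are hypothetical, so align them with what your notebook reads:

```python
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

# SecureString keeps the Langfuse keys encrypted at rest.
ssm.put_parameter(
    Name="/app/langfuse/public_key",
    Value="pk-lf-...",  # your Langfuse public key
    Type="SecureString",
    Overwrite=True,
)
ssm.put_parameter(
    Name="/app/langfuse/secret_key",
    Value="sk-lf-...",  # your Langfuse secret key
    Type="SecureString",
    Overwrite=True,
)
ssm.put_parameter(
    Name="/app/langfuse/otel_endpoint",
    Value="https://cloud.langfuse.com/api/public/otel/v1/traces",
    Type="String",
    Overwrite=True,
)
```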
The following table describes the configuration parameters being used.
| Parameter | Description | Default |
|---|---|---|
| `langfuse_public_key` | API key for the OTEL endpoint | Environment variable |
| `langfuse_secret_key` | Secret key for the OTEL endpoint | Environment variable |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | Trace endpoint | https://cloud.langfuse.com/api/public/otel/v1/traces |
| `OTEL_EXPORTER_OTLP_HEADERS` | Authentication type | Basic |
| `DISABLE_ADOT_OBSERVABILITY` | Disables AgentCore's default AWS Distro for OpenTelemetry (ADOT) observability so Langfuse is used instead | True |
| `BEDROCK_MODEL_ID` | Amazon Bedrock model ID | us.anthropic.claude-3-7-sonnet-20250219-v1:0 |
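At runtime, the agent can read these parameters back and assemble the OTLP settings; a sketch, assuming the hypothetical parameter names used above:

```python
import base64
import os
import boto3

ssm = boto3.client("ssm", region_name="us-west-2")

def get_param(name: str) -> str:
    # WithDecryption is required for SecureString parameters.
    return ssm.get_parameter(Name=name, WithDecryption=True)["Parameter"]["Value"]

public_key = get_param("/app/langfuse/public_key")
secret_key = get_param("/app/langfuse/secret_key")

auth_token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = get_param("/app/langfuse/otel_endpoint")
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth_token}"
os.environ["DISABLE_ADOT_OBSERVABILITY"] = "true"
```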
Step 5: Check deployment status
Wait for the runtime to be ready before invoking:
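A sketch of the polling loop, assuming the starter toolkit's status() helper exposes the endpoint status as in the AgentCore samples:

```python
import time

# Poll the runtime endpoint until it reaches a terminal state.
status = agentcore_runtime.status().endpoint["status"]
while status not in ("READY", "CREATE_FAILED", "UPDATE_FAILED"):
    time.sleep(10)
    status = agentcore_runtime.status().endpoint["status"]
print(status)  # "READY" on success
```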
A successful deployment shows a "Ready" state for the agent runtime.
Step 6: Invoke the AgentCore Runtime
Finally, invoke our AgentCore Runtime with a payload.
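With the starter toolkit, a test invocation looks like the following sketch (the prompt and payload shape are illustrative):

```python
# Invoke the deployed agent with a JSON payload.
invoke_response = agentcore_runtime.invoke(
    {"prompt": "Plan a weekend trip to Seattle, including food recommendations."}
)
print(invoke_response)
```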
Once the AgentCore Runtime has been invoked, you should be able to see the traces in the Langfuse dashboard.
Step 7: View traces in Langfuse
After running the agent, go to the Langfuse project to view the detailed traces. The traces include:
- Agent invocation details
- Tool calls (web search)
- Model interactions with latency and token usage
- Request/response payloads
Traces and hierarchy
Langfuse captures all interactions, from user requests down to individual model calls. Each trace captures the complete execution path, including API calls, function invocations, and model responses, creating a comprehensive timeline of agent actions. The nested structure of traces enables developers to drill down into specific interactions and identify performance bottlenecks or error patterns at any level of the execution chain. To further enhance observability, Langfuse provides tagging mechanisms that can be implemented in agent workflows, as sketched below.
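For example, Strands agents accept trace attributes that Langfuse maps to its session, user, and tag fields; the following is a minimal sketch with placeholder values:

```python
from strands import Agent
from strands.models import BedrockModel

agent = Agent(
    model=BedrockModel(model_id="us.anthropic.claude-3-7-sonnet-20250219-v1:0"),
    trace_attributes={
        # Langfuse-specific attributes for grouping and filtering traces.
        "session.id": "demo-session-001",   # placeholder session ID
        "user.id": "user@example.com",      # placeholder user ID
        "langfuse.tags": ["travel-agent", "bedrock-agentcore"],
    },
)
```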
Figure 3: Traces in Langfuse
Combining hierarchical traces with strategic tagging provides insight into agent operations, enabling data-driven optimization and better user experiences. As shown in the following image, developers can drill down into the precise timing of each operation within the agent's execution flow. In this example, the total request took 26.57s, with individual breakdowns for the event loop cycle, tool calls, and other components. Use this timing information to find performance bottlenecks and reduce response times. For instance, certain LLM operations might take longer than expected, or there may be opportunities to parallelize specific actions to reduce overall latency. These insights support data-driven decisions that improve agent performance and deliver a better customer experience.
Figure 4: Detailed trace hierarchy
Langfuse dashboard
The Langfuse dashboard features three different dashboards for monitoring: cost, latency, and usage management.
Figure 5: Langfuse dashboard
Cost tracking
Cost tracking helps monitor expenses at both the aggregate and individual request levels to maintain control over AI infrastructure spend. The platform provides detailed cost breakdowns per model, user, and function call, enabling teams to identify cost-intensive operations and optimize their implementation. This granular cost visibility supports data-driven decisions about model selection, prompt engineering, and resource allocation while staying within budget constraints. Dashboard cost data is provided for estimation purposes; actual charges should be verified through official billing statements.
Figure 6: Cost dashboard
Langfuse latency dashboard
Latency metrics can be monitored across traces and generations for performance optimization. The dashboard shows the following metrics by default, and you can create custom charts and dashboards depending on your needs:
- P95 Latency by Level (Observations)
- P95 Latency by Use Case
- Max Latency by User ID (Traces)
- Avg Time to First Token by Prompt Name (Observations)
- P95 Time to First Token by Model
- P95 Latency by Model
- Avg Output Tokens per Second by Model
Figure 7: Latency dashboard
Langfuse usage management
This dashboard shows metrics across traces, observations, and scores to help manage resource allocation.
Figure 8: Usage management dashboard
Conclusion
This post demonstrated how to integrate Langfuse with AgentCore for comprehensive observability of AI agents. You can now track performance, debug interactions, and optimize costs across workflows. We expect more Langfuse observability features and integration options in the future to help scale AI applications.
Start implementing Langfuse with AgentCore today to gain deeper insights into your agents' performance, track conversation flows, and optimize your AI applications. For more information, visit the following resources:
About the authors
Richa Gupta is a Senior Solutions Architect at Amazon Web Services, specializing in AI/ML, generative AI, and agentic AI. She is passionate about helping customers on their AI transformation journey, architecting end-to-end solutions from proof of concept to production deployment that drive business revenue. Beyond her professional pursuits, Richa likes to make latte art and is an adventure enthusiast.
Ishan Singh is a Senior Generative AI Data Scientist at Amazon Web Services, where he partners with customers to architect innovative and responsible generative AI solutions. With deep expertise in AI and machine learning, Ishan leads the development of production generative AI solutions at scale, with a focus on evaluations and observability. Outside of work, he enjoys playing volleyball, exploring local bike trails, and spending time with his wife, kid, and dog, Beau.
Yanyan Zhang is a Senior Generative AI Data Scientist at Amazon Web Services, where she has been working on cutting-edge AI/ML technologies as a generative AI specialist, helping customers use generative AI to achieve their desired outcomes. Yanyan graduated from Texas A&M University with a PhD in Electrical Engineering. Outside of work, she loves traveling, working out, and exploring new things.
Madhu Samhitha is a Specialist Solutions Architect at Amazon Web Services, focused on helping customers implement generative AI solutions. She combines her knowledge of large language models with strategic innovation to deliver business value. She has a master's in computer science from the University of Massachusetts Amherst and has worked in various industries. Beyond her technical role, Madhu is a trained classical dancer, an art enthusiast, and enjoys exploring national parks.
Marc Klingen is the co-founder and CEO of Langfuse, the open source LLM engineering platform. After building LLM agents in 2023 together with his co-founders, Marc and the team realized that new tooling is needed to bring agents into production and scale them reliably. With Langfuse, they have built the leading open source LLM engineering platform (observability, evaluation, prompt management), with over 18,000 GitHub stars, 14.8M+ SDK installs per month, and 6M+ Docker pulls. Langfuse is used by top engineering teams such as Khan Academy, Samsara, Twilio, and Merck.