
Accelerate Enterprise AI Development using Weights & Biases and Amazon Bedrock AgentCore



This post is co-written by Thomas Capelle and Ray Strickland from Weights & Biases (W&B).

Generative artificial intelligence (AI) adoption is accelerating across enterprises, evolving from simple foundation model interactions to sophisticated agentic workflows. As organizations transition from proofs of concept to production deployments, they require robust tools for development, evaluation, and monitoring of AI applications at scale.

In this post, we demonstrate how to use Foundation Models (FMs) from Amazon Bedrock and the newly launched Amazon Bedrock AgentCore alongside W&B Weave to help build, evaluate, and monitor enterprise AI solutions. We cover the complete development lifecycle, from tracking individual FM calls to monitoring complex agent workflows in production.

Overview of W&B Weave

Weights & Biases (W&B) is an AI developer platform that provides comprehensive tools for training models, fine-tuning, and leveraging foundation models for enterprises of all sizes across various industries.

W&B Weave offers a unified suite of developer tools to support every stage of your agentic AI workflows. It enables:

  • Tracing & monitoring: Track large language model (LLM) calls and application logic to debug and analyze production systems.
  • Systematic iteration: Refine and iterate on prompts, datasets, and models.
  • Experimentation: Experiment with different models and prompts in the LLM Playground.
  • Evaluation: Use custom or pre-built scorers alongside our comparison tools to systematically assess and enhance application performance. Collect user and expert feedback for real-world testing and evaluation.
  • Guardrails: Help protect your application with safeguards for content moderation, prompt safety, and more. Use custom or third-party guardrails (including Amazon Bedrock Guardrails) or W&B Weave's native guardrails.

W&B Weave can be fully managed by Weights & Biases in a multi-tenant or single-tenant environment, or it can be deployed directly in a customer's Amazon Virtual Private Cloud (VPC). In addition, W&B Weave's integration into the W&B Development Platform gives organizations a seamlessly integrated experience between the model training/fine-tuning workflow and the agentic AI workflow.

To get started, subscribe to the Weights & Biases AI Development Platform through AWS Marketplace. Individuals and academic teams can subscribe to W&B at no additional cost.

Tracking Amazon Bedrock FMs with the W&B Weave SDK

W&B Weave integrates seamlessly with Amazon Bedrock through its Python and TypeScript SDKs. After installing the library and patching your Bedrock client, W&B Weave automatically tracks the LLM calls:

!pip install weave
import weave
import boto3
import json
from weave.integrations.bedrock.bedrock_sdk import patch_client

weave.init("my_bedrock_app")

# Create and patch the Bedrock client
client = boto3.client("bedrock-runtime")
patch_client(client)

# Use the client as usual
response = client.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 100,
        "messages": [
            {"role": "user", "content": "What is the capital of France?"}
        ]
    }),
    contentType="application/json",
    accept="application/json"
)
response_dict = json.loads(response.get('body').read())
print(response_dict["content"][0]["text"])

This integration automatically versions experiments and tracks configurations, providing full visibility into your Amazon Bedrock applications without modifying core logic.
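You can also wrap your own application logic in @weave.op so that the patched client's LLM calls are nested under a single trace. The following is a minimal sketch under stated assumptions: it reuses the patched client from the example above, and ask_bedrock is a hypothetical helper name, not part of the Weave SDK:

import weave
import boto3
import json
from weave.integrations.bedrock.bedrock_sdk import patch_client

weave.init("my_bedrock_app")
client = boto3.client("bedrock-runtime")
patch_client(client)

# Hypothetical application-level helper; @weave.op records it as a trace,
# and the patched client's Bedrock call appears nested underneath it.
@weave.op()
def ask_bedrock(question: str) -> str:
    response = client.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps({
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": [{"role": "user", "content": question}],
        }),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["content"][0]["text"]

print(ask_bedrock("What is the capital of France?"))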

Experimenting with Amazon Bedrock FMs in the W&B Weave Playground

The W&B Weave Playground accelerates prompt engineering with an intuitive interface for testing and comparing Bedrock models. Key features include:

  • Direct prompt editing and message retrying
  • Side-by-side model comparison
  • Access from trace views for quick iteration

To begin, add your AWS credentials in the Playground settings, select your preferred Amazon Bedrock FMs, and start experimenting. The interface enables rapid iteration on prompts while maintaining full traceability of experiments.

Evaluating Amazon Bedrock FMs with W&B Weave Evaluations

W&B Weave Evaluations provides dedicated tools for evaluating generative AI models effectively. By using W&B Weave Evaluations alongside Amazon Bedrock, users can efficiently evaluate these models, analyze outputs, and visualize performance across key metrics. Users can apply built-in scorers from W&B Weave, third-party or custom scorers, and human/expert feedback as well. This combination allows for a deeper understanding of the tradeoffs between models, such as differences in cost, accuracy, speed, and output quality.

W&B Weave takes a first-class approach to tracking evaluations with its Model & Evaluation classes. To set up an evaluation job, customers can:

  • Define a dataset or list of dictionaries with a set of examples to be evaluated
  • Create a list of scoring functions. Each function should take the model output and, optionally, other inputs from your examples, and return a dictionary with the scores
  • Define an Amazon Bedrock model by using the Model class
  • Evaluate this model by calling Evaluation

Here's an example of setting up an evaluation job:

import weave
from weave import Evaluation
import asyncio

# Collect your examples
examples = [
    {"question": "What is the capital of France?", "expected": "Paris"},
    {"question": "Who wrote 'To Kill a Mockingbird'?", "expected": "Harper Lee"},
    {"question": "What is the square root of 64?", "expected": "8"},
]

# Define any custom scoring function
@weave.op()
def match_score1(expected: str, output: dict) -> dict:
    # Here is where you would define the logic to score the model output
    return {'match': expected == output['generated_text']}

@weave.op()
def function_to_evaluate(question: str):
    # Here is where you would add your LLM call and return the output
    return {'generated_text': 'Paris'}

# Score your examples using scoring functions
evaluation = Evaluation(
    dataset=examples, scorers=[match_score1]
)

# Start tracking the evaluation
weave.init('intro-example')
# Run the evaluation
asyncio.run(evaluation.evaluate(function_to_evaluate))
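The steps above also mention defining the model with the Model class instead of a plain function. The following is a minimal sketch of that variant, assuming the evaluation object defined above; BedrockClaude and its model_id attribute are illustrative names, and predict follows Weave's Model convention:

import asyncio
import json
import boto3
import weave

# Illustrative Weave Model wrapping an Amazon Bedrock FM; Weave versions
# the model_id attribute alongside the evaluation results.
class BedrockClaude(weave.Model):
    model_id: str

    @weave.op()
    def predict(self, question: str) -> dict:
        client = boto3.client("bedrock-runtime")
        response = client.invoke_model(
            modelId=self.model_id,
            body=json.dumps({
                "anthropic_version": "bedrock-2023-05-31",
                "max_tokens": 100,
                "messages": [{"role": "user", "content": question}],
            }),
        )
        text = json.loads(response["body"].read())["content"][0]["text"]
        return {"generated_text": text}

model = BedrockClaude(model_id="anthropic.claude-3-5-sonnet-20240620-v1:0")
asyncio.run(evaluation.evaluate(model))

Running the same Evaluation against two Model instances with different model_id values gives a side-by-side view of the cost, accuracy, and quality tradeoffs described above.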

The evaluation dashboard visualizes performance metrics, enabling informed decisions about model selection and configuration. For detailed guidance, see our earlier post on evaluating LLM summarization with Amazon Bedrock and Weave.

Enhancing Amazon Bedrock AgentCore Observability with W&B Weave

Amazon Bedrock AgentCore is a complete set of services for deploying and operating highly capable agents more securely at enterprise scale. It provides more secure runtime environments, workflow execution tools, and operational controls that work with popular frameworks like Strands Agents, CrewAI, LangGraph, and LlamaIndex, as well as many LLM models, whether from Amazon Bedrock or external sources.

AgentCore includes built-in observability through Amazon CloudWatch dashboards that monitor key metrics like token usage, latency, session duration, and error rates. It also traces workflow steps, showing which tools were invoked and how the model responded, providing essential visibility for debugging and quality assurance in production.

When working with AgentCore and W&B Weave together, teams can rely on AgentCore's built-in operational monitoring and security foundations while also using W&B Weave if it aligns with their existing development workflows. Organizations already invested in the W&B environment may choose to incorporate W&B Weave's visualization tools alongside AgentCore's native capabilities. This approach gives teams the flexibility to use the observability solution that best fits their established processes and preferences when developing complex agents that chain multiple tools and reasoning steps.

There are two main approaches to add W&B Weave observability to your AgentCore agents: using the native W&B Weave SDK or integrating through OpenTelemetry.

Native W&B Weave SDK

The simplest approach is to use W&B Weave's @weave.op decorator to automatically track function calls. Initialize W&B Weave with your project name and wrap the functions you want to track:

import weave
import os
from typing import Any, Dict
from strands import Agent  # or the Agent class from your framework

os.environ["WANDB_API_KEY"] = "your_api_key"
weave.init("your_project_name")

@weave.op()
def word_count_op(text: str) -> int:
    return len(text.split())

@weave.op()
def run_agent(agent: Agent, user_message: str) -> Dict[str, Any]:
    result = agent(user_message)
    return {"message": result.message, "model": agent.model.config["model_id"]}

Since AgentCore runs as a Docker container, add W&B Weave to your dependencies (for example, uv add weave) to include it in your container image.
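To illustrate where these traced functions sit in a deployed agent, here is a minimal sketch under stated assumptions: it uses the BedrockAgentCoreApp entrypoint pattern from the bedrock-agentcore Python SDK and a Strands Agents Agent; adapt the imports and payload handling to your framework:

import weave
from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent  # assumption: Strands Agents as the framework

weave.init("your_project_name")
app = BedrockAgentCoreApp()
agent = Agent()

@app.entrypoint
@weave.op()  # each invocation is recorded as a trace in W&B Weave
def invoke(payload: dict) -> dict:
    result = agent(payload.get("prompt", ""))
    return {"message": result.message}

if __name__ == "__main__":
    app.run()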

OpenTelemetry Integration

For teams already using OpenTelemetry or wanting vendor-neutral instrumentation, W&B Weave supports OTLP (the OpenTelemetry Protocol) directly:

import base64
import json
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Authenticate against W&B with a Basic auth header
auth_b64 = base64.b64encode(f"api:{WANDB_API_KEY}".encode()).decode()
exporter = OTLPSpanExporter(
    endpoint="https://trace.wandb.ai/otel/v1/traces",
    headers={"Authorization": f"Basic {auth_b64}", "project_id": WEAVE_PROJECT}
)

# Register the exporter with a tracer provider
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)

# Create spans to track execution
with tracer.start_as_current_span("invoke_agent") as span:
    span.set_attribute("input.value", json.dumps({"prompt": user_message}))
    result = agent(user_message)
    span.set_attribute("output.value", json.dumps({"message": result.message}))

This approach maintains compatibility with AgentCore's existing OpenTelemetry infrastructure while routing traces to W&B Weave for visualization.

When using both AgentCore and W&B Weave together, teams have multiple options for observability. AgentCore's CloudWatch integration monitors system health, resource utilization, and error rates while providing tracing for agent reasoning and tool selection. W&B Weave adds visualization capabilities that present execution data in formats familiar to teams already using the W&B environment. Both solutions provide visibility into how agents process information and make decisions, allowing organizations to choose the observability approach that best aligns with their existing workflows and preferences.

This dual-layer approach means users can:

  • Monitor production service level agreements (SLAs) through CloudWatch alerts
  • Debug complex agent behaviors in W&B Weave's trace explorer
  • Optimize token usage and latency with detailed execution breakdowns
  • Compare agent performance across different prompts and configurations

The integration requires minimal code changes, preserves your existing AgentCore deployment, and scales with your agent complexity. Whether you're building simple tool-calling agents or orchestrating multi-step workflows, this observability stack provides the insights needed to iterate quickly and deploy confidently.

For implementation details and complete code examples, refer to our earlier post.

Conclusion

In this post, we demonstrated how to build and optimize enterprise-grade agentic AI solutions by combining Amazon Bedrock's FMs and AgentCore with W&B Weave's comprehensive observability toolkit. We explored how W&B Weave can enhance every stage of the LLM development lifecycle, from initial experimentation in the Playground to systematic evaluation of model performance, and finally to production monitoring of complex agent workflows.

The integration between Amazon Bedrock and W&B Weave provides several key capabilities:

  • Automatic tracking of Amazon Bedrock FM calls with minimal code changes using the W&B Weave SDK
  • Rapid experimentation through the W&B Weave Playground's intuitive interface for testing prompts and comparing models
  • Systematic evaluation with custom scoring functions to assess different Amazon Bedrock models
  • Comprehensive observability for AgentCore deployments, with CloudWatch metrics providing robust operational monitoring supplemented by detailed execution traces

To get began:

  • Request a free trial or subscribe to the Weights & Biases AI Development Platform through AWS Marketplace
  • Install the W&B Weave SDK and follow our code examples to start tracking your Bedrock FM calls
  • Experiment with different models in the W&B Weave Playground by adding your AWS credentials and testing various Amazon Bedrock FMs
  • Set up evaluations using the W&B Weave Evaluation framework to systematically compare model performance on your use cases
  • Enhance your AgentCore agents by adding W&B Weave observability using either the native SDK or the OpenTelemetry integration

Start with a simple integration to track your Amazon Bedrock calls, then progressively adopt more advanced features as your AI applications grow in complexity. The combination of Amazon Bedrock and W&B Weave's comprehensive development tools provides the foundation needed to build, evaluate, and maintain production-ready AI solutions at scale.


About the authors

James Yi is a Senior AI/ML Partner Solutions Architect at AWS. He spearheads AWS's strategic partnerships in Emerging Technologies, guiding engineering teams to design and develop cutting-edge joint solutions in generative AI. He enables field and technical teams to seamlessly deploy, operate, secure, and integrate partner solutions on AWS. James collaborates closely with business leaders to define and execute joint go-to-market strategies, driving cloud-based business growth. Outside of work, he enjoys playing soccer, traveling, and spending time with his family.

Ray Strickland is a Senior Partner Solutions Architect at AWS specializing in AI/ML, agentic AI, and intelligent document processing. He enables partners to deploy scalable generative AI solutions using AWS best practices and drives innovation through strategic partner enablement programs. Ray collaborates across multiple AWS teams to accelerate AI adoption and has extensive experience in partner evaluation and enablement.

Thomas Capelle is a Machine Learning Engineer at Weights & Biases. He is responsible for keeping the www.github.com/wandb/examples repository live and up to date. He also builds content on MLOps, applications of W&B to industries, and fun deep learning in general. Previously he was using deep learning to solve short-term forecasting for solar energy. He has a background in Urban Planning, Combinatorial Optimization, Transportation Economics, and Applied Math.

Scott Juang is the Director of Alliances at Weights & Biases. Prior to W&B, he led a number of strategic alliances at AWS and Cloudera. Scott studied Materials Engineering and has a passion for renewable energy.
