
Build Agentic Workflows with OpenAI GPT OSS on Amazon SageMaker AI and Amazon Bedrock AgentCore

September 18, 2025
in Artificial Intelligence


OpenAI has released two open-weight models, gpt-oss-120b (117 billion parameters) and gpt-oss-20b (21 billion parameters), both built with a Mixture of Experts (MoE) design and a 128K context window. These models are the leading open source models, according to Artificial Analysis benchmarks, and excel at reasoning and agentic workflows. With Amazon SageMaker AI, you can fine-tune or customize models and deploy them with your choice of framework through a fully managed service. Amazon SageMaker Inference gives you the flexibility to bring your own inference code and framework without having to build and maintain your own clusters.

Although large language models (LLMs) excel at understanding language and generating content, building real-world agentic applications requires complex workflow management, tool calling capabilities, and context management. Multi-agent architectures address these challenges by breaking complex systems down into specialized components, but they introduce new complexities in agent coordination, memory management, and workflow orchestration.

In this post, we show how to deploy the gpt-oss-20b model to SageMaker managed endpoints and demonstrate a practical stock analyzer agent assistant example with LangGraph, a powerful graph-based framework that handles state management, coordinated workflows, and persistent memory systems. We then deploy our agents to Amazon Bedrock AgentCore, a unified orchestration layer that abstracts away infrastructure and allows you to securely deploy and operate AI agents at scale.

Solution overview

In this solution, we build an agentic stock analyzer with the following key components:

  • The GPT OSS 20B model deployed to a SageMaker endpoint using vLLM, an open source serving framework for LLMs
  • LangGraph to build a multi-agent orchestration framework
  • Amazon Bedrock AgentCore to deploy the agents

The following diagram illustrates the solution architecture.

This architecture illustrates a multi-agent workflow hosted on Amazon Bedrock AgentCore Runtime running on AWS. A user submits a query, which is handled by a pipeline of specialized agents—Data Gathering Agent, Stock Performance Analyzer Agent, and Stock Report Generation Agent—that are each responsible for a distinct part of the stock analysis process.

These agents collaborate within Amazon Bedrock AgentCore Runtime, and when language understanding or generation is required, they invoke a GPT OSS model hosted on SageMaker AI. The model processes the input and returns structured outputs that inform agent actions, enabling a fully serverless, modular, and scalable agentic system built on open source models.

Prerequisites

  1. Make sure that you have the required quota for G6e instances to deploy the model. If you don't, request a quota increase.
  2. If this is your first time working with Amazon SageMaker Studio, you first need to create a SageMaker domain.
  3. Make sure your IAM role has the required permissions to deploy SageMaker models and endpoints. For more information, see How Amazon SageMaker AI works with IAM in the SageMaker Developer Guide.

Deploy GPT-OSS models to SageMaker Inference

Customers who want to customize their models and frameworks can deploy using serverful deployments, but this requires access to GPUs, serving frameworks, load balancers, and infrastructure setup. SageMaker AI provides a fully managed hosting platform that takes care of provisioning the infrastructure with the necessary drivers, downloads the models, and deploys them. OpenAI's GPT-OSS models are released with a 4-bit quantization scheme (MXFP4), enabling fast inference while keeping resource usage low. These models can run on P5 (H100), P6 (H200), P4 (A100), and G6e (L40S) instances. The GPT-OSS models are sparse MoE architectures with 128 experts (120B) or 32 experts (20B), where each token is routed to 4 experts with no shared expert. Using MXFP4 for the MoE weights alone reduces the model sizes to 63 GB (120B) and 14 GB (20B), making them runnable on a single H100 GPU.
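
As a rough sanity check on those sizes: MXFP4 packs 4-bit values with a shared 8-bit scale per 32-element block, about 4.25 bits per weight. The following back-of-the-envelope sketch covers the MoE weights only; attention and embedding weights kept in higher precision account for the remainder of the published figures.

def approx_mxfp4_gb(params_billions: float) -> float:
    bits_per_weight = 4 + 8 / 32  # 4-bit values + shared 8-bit scale per 32-element block
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

print(f"gpt-oss-120b MoE weights: ~{approx_mxfp4_gb(117):.0f} GB")  # ~62 GB
print(f"gpt-oss-20b  MoE weights: ~{approx_mxfp4_gb(21):.0f} GB")   # ~11 GB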

To deploy these models effectively, you need a robust serving framework like vLLM. To deploy the model, we build a vLLM container with the latest version that supports GPT OSS models on SageMaker AI.

You can use the following Dockerfile and script to build the container and push it to a private Amazon Elastic Container Registry (Amazon ECR) repository. The recommended approach is to do this directly from Amazon SageMaker Studio, which provides a managed JupyterLab environment with AWS CLI access where you can build and push images to Amazon ECR as part of your SageMaker workflow. Alternatively, you can perform the same steps on an Amazon Elastic Compute Cloud (Amazon EC2) instance with Docker installed.
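
The exact files ship with the sample repository; a minimal sketch of the idea looks like the following, assuming a thin entrypoint script that translates the OPTION_* environment variables (set in the SageMaker model config below) into vLLM CLI flags. The base image tag and script name are illustrative.

# Illustrative Dockerfile sketch, not the repository's exact file
FROM vllm/vllm-openai:v0.10.0

# Thin wrapper that maps OPTION_* environment variables to vLLM CLI flags
COPY sagemaker-entrypoint.sh /usr/local/bin/sagemaker-entrypoint.sh
RUN chmod +x /usr/local/bin/sagemaker-entrypoint.sh

# SageMaker routes traffic to port 8080 and probes /ping and /invocations
EXPOSE 8080
ENTRYPOINT ["/usr/local/bin/sagemaker-entrypoint.sh"]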

After you have built and pushed the container to Amazon ECR, you can open Amazon SageMaker Studio by going to the SageMaker AI console, as shown in the following screenshot.

You can then create a Jupyter space, or use an existing one, to launch JupyterLab and run notebooks.

Clone the following notebook and run "Option 3: Deploying from HF using BYOC." Update the required parameters, such as the inference image in the notebook, with your container image. We also provide the necessary environment variables, as shown in the following code.

import json

import sagemaker

# Container image pushed to Amazon ECR in the previous step
inference_image = f"{account_id}.dkr.ecr.{region}.amazonaws.com/vllm:v0.10.0-gpt-oss"
instance_type = "ml.g6e.4xlarge"
num_gpu = 1

model_name = sagemaker.utils.name_from_base("model-byoc")
endpoint_name = model_name
inference_component_name = f"ic-{model_name}"

config = {
    "OPTION_MODEL": "openai/gpt-oss-20b",
    "OPTION_SERVED_MODEL_NAME": "model",
    "OPTION_TENSOR_PARALLEL_SIZE": json.dumps(num_gpu),
    "OPTION_ASYNC_SCHEDULING": "true",
}

After you set up the deployment configuration, you can deploy to SageMaker AI using the following code:

from sagemaker.compute_resource_requirements.resource_requirements import ResourceRequirements

lmi_model = sagemaker.Model(
    image_uri=inference_image,
    env=config,
    role=role,
    name=model_name,
)

# deploy() returns a Predictor, which we use for inference below
llm = lmi_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=600,
    endpoint_name=endpoint_name,
    endpoint_type=sagemaker.enums.EndpointType.INFERENCE_COMPONENT_BASED,
    inference_component_name=inference_component_name,
    resources=ResourceRequirements(requests={"num_accelerators": num_gpu, "memory": 1024*5, "copies": 1}),
)

Now you can run an inference example:

from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Configure the predictor for JSON in/out so dict payloads round-trip cleanly
llm.serializer = JSONSerializer()
llm.deserializer = JSONDeserializer()

payload = {
    "messages": [
        {"role": "user", "content": "Name popular places to visit in London?"}
    ],
}
res = llm.predict(payload)
print("-----\n" + res["choices"][0]["message"]["content"] + "\n-----\n")
print(res["usage"])

-----
Here are some of the must‑see spots in London — a mix of iconic landmarks, world‑class museums, and vibrant neighborhoods:

| # | Place | Why It’s Popular |
|---|-------|------------------|
| 1 | **Buckingham Palace** | The Queen’s official London residence – watch the Changing of the Guard. |
| 2 | **The Tower of London & Tower Bridge** | Historic castle, Crown Jewels, and the iconic bridge with glass floors. |
| 3 | **The British Museum** | World‑famous collection from the Rosetta Stone to Egyptian mummies (free entry). |
| 4 | **The Houses of Parliament & Big Ben** | The classic symbol of London’s politics and architecture. |
| 5 | **The National Gallery (Tate Britain)** | Home to masterpieces from Van Gogh to Turner. |
| 6 | **Buckinghamshire Gardens (Kew Gardens)** | Stunning botanical gardens with a glasshouse and the Horniman Insect Zoo. |
| 7 | **Camden Market** | Eclectic stalls, street food, music and vintage fashion. |
| 8 | **Covent Garden** | Lively piazza with street performers, boutique shops, and the Royal Opera House. |
| 9 | **West End Theatres** | Theatre district famous for grand productions (musicals, dramas). |
|10 | **The Shard** | Skyscraper with panoramic 360° views of London. |
|11 | **St. Paul’s Cathedral** | Massive dome, stunning interior and a climb up the Whispering Gallery. |
|12 | **The Tate Modern** | Contemporary art museum set in a former power station. |
|13 | **The Victoria & Albert Museum** | Design and fashion, costume, and jewellery collections. |
|14 | **Hyde Park & Kensington Gardens** | Huge green spaces with Serpentine Lake, Speaker’s Corner and Speakers' Corner. |
|15 | **Oxford Street & Regent Street** | Prime shopping streets for fashion, flagship stores, and historic architecture. |

These spots cover history, culture, shopping, and entertainment—perfect for a first visit or a weekend escape in London!
-----

Use LangGraph to build a stock analyzer agent

For our stock-analyzing multi-agent system, we use LangGraph to orchestrate the workflow. The Jupyter notebook for the code is located in this GitHub repository. The system includes three specialized tools that work together to analyze stocks comprehensively (a condensed sketch of how they are wired together follows the list):

  • The gather_stock_data tool collects comprehensive stock data for a given ticker symbol, including current price, historical performance, financial metrics, and market news. It returns formatted information covering price history, company fundamentals, trading metrics, and recent news headlines.
  • The analyze_stock_performance tool performs detailed technical and fundamental analysis of stock data, calculating metrics like price trends, volatility, and overall investment scores. It evaluates multiple factors, including P/E ratios, profit margins, and dividend yields, to provide a comprehensive performance assessment.
  • The generate_stock_report tool creates professional PDF reports from the gathered stock data and analysis, automatically uploading them to Amazon S3 in organized date-based folders.
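
The following condensed sketch shows one way to wire these tools into a LangGraph agent. The repository notebook builds its own multi-agent graph; this sketch uses LangGraph's prebuilt ReAct agent for brevity, tool bodies are stubbed, and sagemaker_llm stands in for the LangChain-compatible chat model that wraps the GPT OSS endpoint.

from langchain_core.tools import tool
from langgraph.prebuilt import create_react_agent

@tool
def gather_stock_data(ticker: str) -> str:
    """Collect price history, fundamentals, and recent news for a ticker."""
    return f"Stock data for {ticker}: ..."  # stub; the real tool queries market data

@tool
def analyze_stock_performance(stock_data: str) -> str:
    """Score price trends, volatility, and fundamentals from gathered data."""
    return "Technical score: 3/5 ..."  # stub

@tool
def generate_stock_report(analysis: str) -> str:
    """Render a PDF report and upload it to Amazon S3."""
    return "Report uploaded to s3://..."  # stub

# sagemaker_llm: a LangChain chat model pointing at the OpenAI-compatible
# GPT OSS endpoint deployed earlier (defined in the notebook)
agent = create_react_agent(
    sagemaker_llm,
    tools=[gather_stock_data, analyze_stock_performance, generate_stock_report],
)
result = agent.invoke(
    {"messages": [("user", "Analyze SIM_STOCK Stock for Investment purposes.")]}
)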

For local testing, you can use a simplified version of the system by importing the necessary functions from your local script. For example:

from langgraph_stock_local import langgraph_stock_sagemaker

# Test the agent locally
result = langgraph_stock_sagemaker({
    "prompt": "Analyze SIM_STOCK Stock for Investment purposes."
})
print(result)

This way, you can iterate quickly on your agent's logic before deploying it to a scalable platform, making sure each component functions correctly and the overall workflow produces the expected results for different types of stocks.

Deploy to Amazon Bedrock AgentCore

After you have developed and tested your LangGraph workflow locally, you can deploy it to Amazon Bedrock AgentCore Runtime. Amazon Bedrock AgentCore handles the heavy lifting of container orchestration, session management, and scalability, abstracting away infrastructure management. It provides persistent execution environments that can maintain an agent's state across multiple invocations.

Before deploying our stock analyzer agent to Amazon Bedrock AgentCore Runtime, we need to create an AWS Identity and Access Management (IAM) role with the appropriate permissions. This role allows Amazon Bedrock AgentCore to invoke your SageMaker endpoint for GPT-OSS model inference, manage ECR repositories for storing container images, write Amazon CloudWatch logs for monitoring and debugging, access Amazon Bedrock AgentCore workload services for runtime operations, and send telemetry data to AWS X-Ray and CloudWatch for observability. See the following code:

from create_agentcore_role import create_bedrock_agentcore_role

role_arn = create_bedrock_agentcore_role(
    role_name="MyStockAnalyzerRole",
    sagemaker_endpoint_name="your-endpoint-name",
    region="us-west-2"
)

After creating the role, you can use the Amazon Bedrock AgentCore Starter Toolkit to deploy your agent. The toolkit simplifies the deployment process by packaging your code, creating the necessary container image, and configuring the runtime environment:

from bedrock_agentcore_starter_toolkit import Runtime

agentcore_runtime = Runtime()

# Configure the agent
response = agentcore_runtime.configure(
    entrypoint="langgraph_stock_sagemaker_gpt_oss.py",
    execution_role=role_arn,
    auto_create_ecr=True,
    requirements_file="requirements.txt",
    region="us-west-2",
    agent_name="stock_analyzer_agent"
)

# Deploy to the cloud
launch_result = agentcore_runtime.launch(local=False, local_build=False)

If you are using BedrockAgentCoreApp, it automatically creates an HTTP server that listens on port 8080, implements the required /invocations endpoint for processing the agent's requests, implements the /ping endpoint for health checks (which is crucial for asynchronous agents), handles proper content types and response formats, and manages error handling according to AWS standards.
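
For reference, a minimal entry point file built on BedrockAgentCoreApp looks like the following sketch; the handler body is illustrative and the imported workflow function name is assumed from the local-testing example above.

from bedrock_agentcore.runtime import BedrockAgentCoreApp
from langgraph_stock_local import langgraph_stock_sagemaker  # assumed workflow helper

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    # `payload` is the JSON body POSTed to /invocations
    result = langgraph_stock_sagemaker({"prompt": payload.get("prompt", "")})
    return {"result": result}

if __name__ == "__main__":
    app.run()  # serves /invocations and /ping on port 8080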

After you deploy to Amazon Bedrock AgentCore Runtime, the status shows as Ready on the Amazon Bedrock AgentCore console.
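
You can also poll the deployment status programmatically with the starter toolkit's status() helper, for example:

import time

# Poll until the runtime reaches a terminal state
status = agentcore_runtime.status().endpoint["status"]
while status not in ("READY", "CREATE_FAILED", "UPDATE_FAILED"):
    time.sleep(10)
    status = agentcore_runtime.status().endpoint["status"]
print(status)  # expect READY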

Invoke the agent

After you create the agent, you need to set up the agent invocation entry point. With Amazon Bedrock AgentCore Runtime, we decorate the invocation part of our agent with the @app.entrypoint decorator and use it as the entry point for our runtime. After you deploy the agent to Amazon Bedrock AgentCore Runtime, you can invoke it using the AWS SDK:

import boto3
import json

agentcore_client = boto3.client('bedrock-agentcore', region_name="us-west-2")

invoke_response = agentcore_client.invoke_agent_runtime(
    agentRuntimeArn=launch_result.agent_arn,
    qualifier="DEFAULT",
    payload=json.dumps({
        "prompt": "Analyze SIM_STOCK for investment purposes"
    })
)

After invoking the stock analyzer agent through Amazon Bedrock AgentCore Runtime, you need to parse and format the response for clean presentation. The response processing involves the following steps (a simplified sketch of the parser follows the list):

  1. Decode the byte stream from Amazon Bedrock AgentCore into readable text.
  2. Parse the JSON response containing the complete stock analysis.
  3. Extract three main sections using regex pattern matching:
    1. Stock Data Gathering section: Extracts core stock information, including symbol, company details, current pricing, market metrics, financial ratios, trading data, and recent news headlines.
    2. Performance Analysis section: Analyzes technical indicators, fundamental metrics, and volatility measures to generate a comprehensive stock assessment.
    3. Stock Report Generation section: Generates a detailed PDF report with the full stock technical analysis.
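
The following is a simplified sketch of such a parser; the regex patterns and section names are illustrative, and the notebook's version is more thorough.

import json
import re

def parse_bedrock_agentcore_stock_response(invoke_response):
    # 1. Decode the byte stream returned by AgentCore into text
    raw = invoke_response["response"].read().decode("utf-8")
    # 2. Parse the JSON body containing the full analysis
    try:
        body = json.loads(raw)
        text = body.get("result", raw) if isinstance(body, dict) else str(body)
    except json.JSONDecodeError:
        text = raw  # fall back to plain text if structured parsing fails
    # 3. Pull out the three report sections with regex pattern matching
    sections = {}
    headers = ("STOCK DATA GATHERING REPORT",
               "STOCK PERFORMANCE ANALYSIS",
               "STOCK REPORT GENERATION")
    for name in headers:
        match = re.search(
            rf"{re.escape(name)}:.*?(?=STOCK [A-Z ]+:|\Z)", text, re.DOTALL
        )
        if match:
            sections[name] = match.group(0).strip()
    return sections if sections else {"RAW_TEXT": text}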

The system also includes error handling that gracefully handles JSON parsing errors, falls back to plain text display if structured parsing fails, and provides debugging information for troubleshooting parsing issues with the stock analysis response.

stock_analysis = parse_bedrock_agentcore_stock_response(invoke_response)

This formatted output makes it easy to review the agent's decision-making process and present professional stock analysis results to stakeholders, completing the end-to-end workflow from model deployment to meaningful business output:

STOCK DATA GATHERING REPORT:
================================
Stock Symbol: SIM_STOCK
Company Name: Simulated Stock Inc.
Sector: SIM_SECTOR
Industry: SIM INDUSTRY
CURRENT MARKET DATA:
- Current Price: $29.31
- Market Cap: $3,958
- 52-Week High: $29.18
- 52-Week Low: $16.80
- YTD Return: 1.30%
- Volatility (Annualized): 32.22%
FINANCIAL METRICS:
- P/E Ratio: 44.80
- Forward P/E: 47.59
- Price-to-Book: 11.75
- Dividend Yield: 0.46%
- Revenue (TTM): $4,988
- Profit Margin: 24.30%

STOCK PERFORMANCE ANALYSIS:
===============================
Stock: SIM_STOCK | Current Price: $29.31
TECHNICAL ANALYSIS:
- Price Trend: SLIGHT UPTREND
- YTD Performance: 1.03%
- Technical Score: 3/5
FUNDAMENTAL ANALYSIS:
- P/E Ratio: 34.80
- Profit Margin: 24.30%
- Dividend Yield: 0.46%
- Beta: 1.165
- Fundamental Score: 3/5
STOCK REPORT GENERATION:
===============================
Stock: SIM_STOCK
Sector: SIM_INDUSTRY
Current Price: $29.78
REPORT SUMMARY:
- Technical Analysis: 8.33% YTD performance
- Report Type: Comprehensive stock analysis for informational purposes
- Generated: 2025-09-04 23:11:55
PDF report uploaded to S3: s3://amzn-s3-demo-bucket/2025/09/04/SIM_STOCK_Stock_Report_20250904_231155.pdf
REPORT CONTENTS:
• Executive Summary with key metrics
• Detailed market data and financial metrics
• Technical and fundamental analysis
• Professional formatting for documentation

Clean up

To avoid incurring costs after your testing, you can delete the SageMaker endpoint by running the following cells in the same notebook:

# `sess` is the sagemaker.Session used earlier in the notebook
sess.delete_inference_component(inference_component_name)
sess.delete_endpoint(endpoint_name)
sess.delete_endpoint_config(endpoint_name)
sess.delete_model(model_name)

You can also delete the Amazon Bedrock AgentCore resources using the following commands:

# Assumes boto3 clients created earlier, for example:
# agentcore_control_client = boto3.client('bedrock-agentcore-control')
# ecr_client = boto3.client('ecr')
runtime_delete_response = agentcore_control_client.delete_agent_runtime(
    agentRuntimeId=launch_result.agent_id
)

response = ecr_client.delete_repository(
    repositoryName=launch_result.ecr_uri.split('/')[1],
    force=True
)

Conclusion

In this post, we built an end-to-end solution for deploying OpenAI's open-weight models on a single G6e (L40S) GPU, creating a multi-agent stock analysis system with LangGraph and deploying it seamlessly with Amazon Bedrock AgentCore. This implementation demonstrates how organizations can now use powerful open source LLMs cost-effectively with efficient serving frameworks such as vLLM. Beyond the technical implementation, enhancing this workflow can provide significant business value, such as reduced stock analysis processing time and increased analyst productivity through automation of routine stock assessments. Moreover, by freeing analysts from repetitive tasks, organizations can redirect skilled professionals toward complex cases and relationship-building activities that drive business growth.

We invite you to try out our code samples and iterate on your agentic workflows to meet your use cases.


About the authors

Vivek Gangasani is a Worldwide Lead GenAI Specialist Solutions Architect for SageMaker Inference. He drives Go-to-Market (GTM) and outbound product strategy for SageMaker Inference. He also helps enterprises and startups deploy, manage, and scale their GenAI models with SageMaker and GPUs. Currently, he is focused on developing strategies and solutions for optimizing inference performance and GPU efficiency for hosting large language models. In his free time, Vivek enjoys hiking, watching movies, and trying different cuisines.

Surya Kari is a Senior Generative AI Data Scientist at AWS, specializing in developing solutions that use state-of-the-art foundation models. He has extensive experience working with advanced language models, including DeepSeek-R1, the Llama family, and Qwen, focusing on their fine-tuning and optimization for specific scientific applications. His expertise extends to implementing efficient training pipelines and deployment strategies using Amazon SageMaker, enabling the scaling of foundation models from development to production. He collaborates with customers to design and implement generative AI solutions, helping them navigate model selection, fine-tuning approaches, and deployment strategies to achieve optimal performance for their specific use cases.
