
Run NVIDIA Nemotron 3 Nano as a fully managed serverless model on Amazon Bedrock

March 10, 2026
in Artificial Intelligence


This post is cowritten with Abdullahi Olaoye, Curtice Lockhart, and Nirmal Kumar Juluru from NVIDIA.

We’re excited to announce that NVIDIA’s Nemotron 3 Nano is now available as a fully managed and serverless model in Amazon Bedrock. This follows our earlier announcement at AWS re:Invent supporting the NVIDIA Nemotron 2 Nano 9B and NVIDIA Nemotron 2 Nano VL 12B models.

With NVIDIA Nemotron open models on Amazon Bedrock, you can accelerate innovation and deliver tangible business value without having to manage infrastructure complexities. You can power your generative AI applications with Nemotron’s capabilities through the inference capabilities of Amazon Bedrock and benefit from its extensive features and tooling.

This post explores the technical characteristics of the NVIDIA Nemotron 3 Nano model and discusses potential application use cases. Additionally, it provides technical guidance to help you get started using this model for your generative AI applications within the Amazon Bedrock environment.

About Nemotron 3 Nano

NVIDIA Nemotron 3 Nano is a small language model (SLM) with a hybrid Mixture-of-Experts (MoE) architecture that delivers high compute efficiency and accuracy, which developers can use to build specialized agentic AI systems. The model is fully open, with open weights, datasets, and recipes facilitating transparency and confidence for developers and enterprises. Compared to other similarly sized models, Nemotron 3 Nano excels in coding and reasoning tasks, taking the lead on benchmarks such as SWE-Bench Verified, AIME 2025, Arena Hard v2, and IFBench.

Model overview:

  • Architecture:
    • Mixture-of-Experts (MoE) with hybrid Transformer-Mamba architecture
    • Supports a token budget, providing accuracy while avoiding overthinking
  • Accuracy:
    • Leading accuracy on coding, scientific reasoning, math, tool calling, instruction following, and chat
    • Nemotron 3 Nano leads on benchmarks such as SWE-Bench, AIME 2025, Humanity's Last Exam, IFBench, RULER, and Arena Hard (compared to other open MoE language models with 30 billion or fewer parameters)
  • Model size: 30B total with 3B active parameters
  • Context length: 256K
  • Model input: Text
  • Model output: Text

Nemotron 3 Nano combines Mamba, Transformer, and Mixture-of-Experts layers into a single backbone to help balance efficiency, reasoning accuracy, and scale. Mamba enables long-range sequence modeling with low memory overhead, while Transformer layers add precise attention for structured reasoning tasks like code, math, and planning. MoE routing further boosts scalability by activating only a subset of experts per token, helping to improve latency and throughput. This makes Nemotron 3 Nano especially well-suited for agent clusters running many concurrent, lightweight workflows.
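The MoE routing idea described above — only a few experts active per token — can be sketched in a few lines of Python. This is a toy illustration of the routing math (top-k selection plus a softmax over the chosen experts), not the model's actual router:

```python
import math

def moe_route(scores, top_k=2):
    """Given router scores (one logit per expert) for a token, pick the
    top-k experts and compute softmax gate weights over just those experts.
    This is the core of MoE routing: compute scales with active, not
    total, parameters."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:top_k]
    m = max(scores[i] for i in chosen)
    exps = [math.exp(scores[i] - m) for i in chosen]
    total = sum(exps)
    gates = [e / total for e in exps]
    return chosen, gates

# Hypothetical router logits for one token over 16 experts; only 2 will run.
logits = [0.1, 1.2, -0.3, 0.9, 2.0, 0.0, -1.1, 0.4,
          0.7, 1.5, -0.2, 0.3, 0.8, -0.5, 1.0, 0.2]
chosen, gates = moe_route(logits, top_k=2)
print(chosen)               # [4, 9] — the two highest-scoring experts
print(round(sum(gates), 6)) # 1.0 — gates renormalized over active experts
```

Because only 2 of 16 experts fire per token, the per-token compute tracks the 3B active parameters rather than the 30B total, which is what drives the model's throughput advantage.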

To learn more about Nemotron 3 Nano’s architecture and how it’s trained, see Inside NVIDIA Nemotron 3: Techniques, Tools, and Data That Make It Efficient and Accurate.

Model benchmarks

The following image shows that Nemotron 3 Nano leads in the most attractive quadrant of the Artificial Analysis Openness Index vs. Intelligence Index. Why openness matters: it builds trust through transparency. Developers and enterprises can confidently build on Nemotron with clear visibility into the model, data pipeline, and data characteristics, enabling easy auditing and governance.

Chart showing Nemotron 3 Nano in the most attractive quadrant of the Artificial Analysis Openness vs. Intelligence Index (Source: Artificial Analysis)

As shown in the following image, Nemotron 3 Nano provides leading accuracy with the highest efficiency among open models and scores an impressive 52 points, a significant jump over the previous Nemotron 2 Nano model. Token demand is growing because of agentic AI, so the ability to "think fast" (arrive at the correct answer quickly while using fewer tokens) is important. Nemotron 3 Nano delivers high throughput with its efficient hybrid Transformer-Mamba and MoE architecture.

NVIDIA Nemotron 3 Nano provides the highest efficiency with leading accuracy among open models, with an impressive 52-point score on the Artificial Analysis Intelligence vs. Output Speed Index (Source: Artificial Analysis)

NVIDIA Nemotron 3 Nano use cases

Nemotron 3 Nano helps power various use cases across different industries. Some of the use cases include:

  • Finance – Accelerate loan processing by extracting data, analyzing income patterns, and detecting fraudulent operations, reducing cycle times and risk.
  • Cybersecurity – Automatically triage vulnerabilities, perform in-depth malware analysis, and proactively hunt for security threats.
  • Software development – Assist with tasks like code summarization.
  • Retail – Optimize inventory management and help improve in-store service with real-time, personalized product recommendations and support.

Get started with NVIDIA Nemotron 3 Nano in Amazon Bedrock

To test NVIDIA Nemotron 3 Nano in Amazon Bedrock, complete the following steps:

  1. Navigate to the Amazon Bedrock console and choose Chat/Text playground in the navigation pane (under the Test section).
  2. Choose Select model in the upper-left corner of the playground.
  3. Choose NVIDIA from the category list, then select NVIDIA Nemotron 3 Nano.
  4. Choose Apply to load the model.

After selection, you can test the model directly. Let’s use the following prompt to generate a unit test in Python using the pytest framework:

Write a pytest unit test suite for a Python function called calculate_mortgage(principal, rate, years). Include test cases for: 1) A typical 30-year fixed mortgage 2) An edge case with 0% interest 3) Error handling for negative input values.

Complex tasks like this prompt can benefit from a chain-of-thought approach, which helps produce a precise result based on the reasoning capabilities built natively into the model.
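For reference when reviewing the model's output, here is a hypothetical calculate_mortgage using the standard amortization formula, with plain-assert checks mirroring the three requested cases (a generated pytest suite would express these as test functions):

```python
def calculate_mortgage(principal, rate, years):
    """Monthly payment for a fixed-rate mortgage.

    Standard amortization formula: M = P * r(1+r)^n / ((1+r)^n - 1),
    with monthly rate r and n total payments. A hypothetical reference
    implementation, not code from the model.
    """
    if principal < 0 or rate < 0 or years <= 0:
        raise ValueError("principal and rate must be non-negative, years positive")
    n = years * 12
    if rate == 0:
        return principal / n          # 0% interest: principal split evenly
    r = rate / 12
    factor = (1 + r) ** n
    return principal * r * factor / (factor - 1)

# 1) Typical 30-year fixed mortgage: $300,000 at 6% is about $1,798.65/month
payment = calculate_mortgage(300_000, 0.06, 30)
assert abs(payment - 1798.65) < 0.05

# 2) Edge case with 0% interest: payments are just principal / months
assert calculate_mortgage(120_000, 0.0, 10) == 1000.0

# 3) Error handling for negative input values
try:
    calculate_mortgage(-1, 0.05, 30)
except ValueError:
    pass
else:
    raise AssertionError("negative principal should raise ValueError")

print("all checks passed")
```

Comparing the model's generated tests against a known-good formula like this is a quick way to judge whether the reasoning held up.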

Using the AWS CLI and SDKs

You can access the model programmatically using the model ID nvidia.nemotron-nano-3-30b. The model supports both the InvokeModel and Converse APIs through the AWS Command Line Interface (AWS CLI) and AWS SDKs. It also supports the Amazon Bedrock OpenAI-compatible API.

Run the following command to invoke the model directly from your terminal using the AWS CLI and the InvokeModel API:

aws bedrock-runtime invoke-model \
  --model-id nvidia.nemotron-nano-3-30b \
  --region us-west-2 \
  --body '{"messages": [{"role": "user", "content": "Type_Your_Prompt_Here"}], "max_tokens": 512, "temperature": 0.5, "top_p": 0.9}' \
  --cli-binary-format raw-in-base64-out \
  invoke-model-output.txt

To invoke the model through the AWS SDK for Python (Boto3), use the following script to send a prompt to the model, in this case using the Converse API:

import boto3
from botocore.exceptions import ClientError

# Create a Bedrock Runtime client in the AWS Region you want to use.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

# Set the model ID.
model_id = "nvidia.nemotron-nano-3-30b"

# Start a conversation with the user message.
user_message = "Type_Your_Prompt_Here"
conversation = [
    {
        "role": "user",
        "content": [{"text": user_message}],
    }
]

try:
    # Send the message to the model using a basic inference configuration.
    response = client.converse(
        modelId=model_id,
        messages=conversation,
        inferenceConfig={"maxTokens": 512, "temperature": 0.5, "topP": 0.9},
    )

    # Extract and print the response text.
    response_text = response["output"]["message"]["content"][0]["text"]
    print(response_text)

except ClientError as e:
    print(f"ERROR: Can't invoke '{model_id}'. Reason: {e}")
    exit(1)

To invoke the model through the Amazon Bedrock OpenAI-compatible Chat Completions endpoint, you can use the OpenAI SDK:

# Import the OpenAI SDK.
import os
from openai import OpenAI

# Set environment variables (use your Amazon Bedrock API key and Region).
os.environ["OPENAI_API_KEY"] = ""
os.environ["OPENAI_BASE_URL"] = "https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1"

# Create the client and set the model ID.
client = OpenAI()
model_id = "nvidia.nemotron-nano-3-30b"

# Set prompts.
system_prompt = "Type_Your_System_Prompt_Here"
user_message = "Type_Your_User_Prompt_Here"

# Use the Chat Completions API.
response = client.chat.completions.create(
    model=model_id,
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message}
    ],
    temperature=0,
    max_completion_tokens=1000
)

# Extract and print the response text.
print(response.choices[0].message.content)

Use NVIDIA Nemotron 3 Nano with Amazon Bedrock features

You can enhance your generative AI applications by combining Nemotron 3 Nano with Amazon Bedrock managed tools. Use Amazon Bedrock Guardrails to implement safeguards and Amazon Bedrock Knowledge Bases to create robust Retrieval Augmented Generation (RAG) workflows.

Amazon Bedrock Guardrails

Guardrails is a managed safety layer that helps implement responsible AI by filtering harmful content, redacting sensitive information (PII), and blocking specific topics across prompts and responses. It works across multiple models to help detect prompt injection attacks and hallucinations.

Example use case: If you’re building a mortgage assistant, you can help prevent it from offering general investment advice. By configuring a filter for the word “stocks”, user prompts containing that term can be immediately blocked and receive a customized message.

To set up a guardrail, complete the following steps:

  1. In the Amazon Bedrock console, navigate to the Build section in the navigation pane and choose Guardrails.
  2. Create a new guardrail and configure the necessary filters for your use case.

After it’s configured, test the guardrail with various prompts to verify its performance. You can then fine-tune settings, such as denied topics, word filters, and PII redaction, to match your specific safety requirements. For a deep dive, see Create your guardrail.
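A configured guardrail can also be enforced programmatically: the Converse API accepts a guardrailConfig block so the guardrail screens both the prompt and the response at inference time. A sketch with a hypothetical guardrail ID:

```python
def guardrail_config(guardrail_id, version="1", trace=False):
    """Build the guardrailConfig block accepted by the Bedrock Converse API.
    Passing it alongside messages/inferenceConfig makes the guardrail screen
    both the incoming prompt and the model's response."""
    cfg = {"guardrailIdentifier": guardrail_id, "guardrailVersion": version}
    if trace:
        cfg["trace"] = "enabled"  # include guardrail evaluation details in the response
    return cfg

# Hypothetical ID and version; in practice, passed to converse(), e.g.:
# client.converse(modelId=model_id, messages=conversation,
#                 guardrailConfig=guardrail_config("gr-abc123", "2"))
print(guardrail_config("gr-abc123", "2", trace=True))
```

This keeps the safety policy in one place: the same guardrail you tested in the console applies to every programmatic call that includes the config.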

Amazon Bedrock Knowledge Bases

Amazon Bedrock Knowledge Bases automates the entire RAG workflow. It handles ingesting content from your data sources, chunking it into searchable segments, converting the segments into vector embeddings, and storing them in a vector database. Then, when a user submits a query, the system matches the input against the stored vectors to find semantically similar content, which is used to augment the prompt sent to the foundation model.
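The matching step described above can be sketched with toy vectors; real embeddings come from the knowledge base's configured embedding model, and the chunks and vectors below are hypothetical:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(query_vec, index, top_k=2):
    """Rank stored chunk embeddings by cosine similarity to the query —
    the semantic-matching step a knowledge base performs before
    augmenting the prompt with the top chunks."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]

# Hypothetical 3-dimensional "embeddings" for three document chunks.
index = [
    ("Down payments typically range from 3% to 20%.", [0.9, 0.1, 0.0]),
    ("Store hours are 9am to 5pm.",                   [0.0, 0.2, 0.9]),
    ("Lenders review your debt-to-income ratio.",     [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # stand-in embedding for a mortgage-related question
print(retrieve(query, index, top_k=2))  # the two mortgage chunks rank first
```

The managed service does this at scale against a real vector store (such as Amazon OpenSearch Serverless), but the ranking principle is the same.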

For this example, we uploaded PDFs (for example, Buying a New Home, Home Loan Toolkit, Shopping for a Mortgage) to Amazon Simple Storage Service (Amazon S3) and selected Amazon OpenSearch Serverless as the vector store. The following code demonstrates how to query this knowledge base using the RetrieveAndGenerate API, while automatically facilitating safety compliance alignment through a specific guardrail ID.

import boto3

bedrock_agent_runtime_client = boto3.client('bedrock-agent-runtime')
response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        'text': "I'm interested in buying a home. What steps should I take to be sure I'm ready to take on a mortgage?"
    },
    retrieveAndGenerateConfiguration={
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': '',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/nvidia.nemotron-nano-3-30b',
            'generationConfiguration': {
                'guardrailConfiguration': {
                    'guardrailId': '',
                    'guardrailVersion': '1'
                },
                'promptTemplate': {
                    'textPromptTemplate': (
                        "You are a helpful assistant that answers questions about mortgages "
                        "based on search results.\n\n"
                        "Search results:\n$search_results$\n\n"
                        "User query:\n$query$\n\n"
                        "Answer clearly and concisely."
                    )
                }
            },
            'orchestrationConfiguration': {
                'promptTemplate': {
                    'textPromptTemplate': (
                        "You are very knowledgeable about mortgages.\n\n"
                        "Conversation so far:\n$conversation_history$\n\n"
                        "User query:\n$query$\n\n"
                        "$output_format_instructions$"
                    )
                }
            }
        },
        'type': 'KNOWLEDGE_BASE'
    }
)
print(response)

This directs the NVIDIA Nemotron 3 Nano model to synthesize the retrieved documents into a clear, grounded answer using your custom prompt template. To set up your own pipeline, review the full walkthrough in the Amazon Bedrock User Guide.
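Rather than printing the raw response, you can pull out the generated answer and its cited source chunks. The following sketch assumes the documented RetrieveAndGenerate response shape; the sample payload is hypothetical:

```python
def summarize_rag_response(response):
    """Extract the generated answer text and the cited source chunks from a
    RetrieveAndGenerate response (output.text plus citations ->
    retrievedReferences -> content.text)."""
    answer = response["output"]["text"]
    sources = [
        ref["content"]["text"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
    ]
    return answer, sources

# Hypothetical response payload for illustration.
sample = {
    "output": {"text": "Start by checking your credit and budgeting for a down payment."},
    "citations": [
        {"retrievedReferences": [
            {"content": {"text": "Review your credit report before applying."}}
        ]}
    ],
}
answer, sources = summarize_rag_response(sample)
print(len(sources))  # 1
```

Surfacing the citations alongside the answer lets users verify that the response is grounded in the uploaded documents.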

Conclusion

In this post, we showed you how to get started with NVIDIA Nemotron 3 Nano on Amazon Bedrock for fully managed serverless inference. We also showed you how to use the model with Amazon Bedrock Knowledge Bases and Amazon Bedrock Guardrails. The model is now available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Tokyo), Asia Pacific (Mumbai), South America (São Paulo), Europe (London), and Europe (Milan) AWS Regions. Check the full Region list for future updates. To learn more, check out NVIDIA Nemotron and give NVIDIA Nemotron 3 Nano a try in the Amazon Bedrock console today.


About the authors

Antonio Rodriguez

Antonio Rodriguez is a Principal Generative AI Specialist Solutions Architect at Amazon Web Services. He helps companies of all sizes solve their challenges, embrace innovation, and create new business opportunities with Amazon Bedrock. Apart from work, he loves to spend time with his family and play sports with his friends.

Aris Tsakpinis

Aris Tsakpinis is a Senior Specialist Solutions Architect for Generative AI, focusing on open-weight models on Amazon Bedrock and the broader generative AI open-source ecosystem. Alongside his professional role, he is pursuing a PhD in Machine Learning Engineering at the University of Regensburg, where his research focuses on applied generative AI in scientific domains.

Abdullahi Olaoye

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open-source tools to optimize AI model deployment, inference, and generative AI workflows. He collaborates with cloud providers to help improve AI workload performance and drive adoption of NVIDIA-powered AI and generative AI solutions.

Curtice Lockhart

Curtice Lockhart is an AI Solutions Architect at NVIDIA, where he helps customers deploy language and vision models to build end-to-end AI workflows using NVIDIA’s tooling on AWS. He enjoys making complex AI concepts feel approachable and spends his time exploring art, music, and the outdoors.

Nirmal Kumar Juluru

Nirmal Kumar Juluru is a product marketing manager at NVIDIA driving the adoption of Nemotron and NeMo. He previously worked as a software developer. Nirmal holds an MBA from Carnegie Mellon University and a bachelor’s in computer science from BITS Pilani.
