NVIDIA Nemotron 3 Nano 30B MoE model is now available in Amazon SageMaker JumpStart

Today we’re excited to announce that the NVIDIA Nemotron 3 Nano 30B model with 3B active parameters is now generally available in the Amazon SageMaker JumpStart model catalog. You can accelerate innovation and deliver tangible business value with Nemotron 3 Nano on Amazon Web Services (AWS) without having to manage model deployment complexities. You can power your generative AI applications with Nemotron capabilities using the managed deployment capabilities offered by SageMaker JumpStart.

Nemotron 3 Nano is a small hybrid mixture of experts (MoE) language model with the highest compute efficiency and accuracy for developers to run highly skilled agentic tasks at scale. The model is fully open with open weights, datasets, and recipes, so developers can seamlessly customize, optimize, and deploy the model on their infrastructure to help meet their privacy and security requirements. Nemotron 3 Nano excels in coding and reasoning, and leads on benchmarks such as SWE Bench Verified, GPQA Diamond, AIME 2025, Arena Hard v2, and IFBench.

About Nemotron 3 Nano 30B

Nemotron 3 Nano is differentiated from other models by its architecture and accuracy, with strong performance across a variety of highly technical skills:

Architecture:
- MoE with hybrid Transformer-Mamba architecture
- Supports a token budget for providing optimal accuracy with minimal reasoning token generation
Accuracy:
- Main accuracy on coding, scientific reasoning, math, and instruction following
- Leads on benchmarks such as LiveCodeBench, GPQA Diamond, AIME 2025, BFCL, and IFBench (compared to other open language models under 30B)
Usability:
- 30B parameter model with 3 billion active parameters
- Has a context window of up to 1 million tokens
- Text-based foundation model, using text for both inputs and outputs

Prerequisites

To get started with Nemotron 3 Nano in Amazon SageMaker JumpStart, you must have a provisioned Amazon SageMaker Studio domain.
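The prerequisite can also be checked from code. The following is a minimal sketch, not part of the original post: the helper name `find_ready_domains` is hypothetical, and it assumes a client that behaves like the Boto3 `sagemaker` client, whose `list_domains` call returns a `Domains` list with `DomainName` and `Status` fields. Only the first page of results is inspected.

```python
def find_ready_domains(sagemaker_client):
    """Return the names of SageMaker domains whose status is InService.

    `sagemaker_client` is assumed to behave like boto3.client("sagemaker").
    This sketch reads only the first page returned by list_domains().
    """
    response = sagemaker_client.list_domains()
    return [
        domain["DomainName"]
        for domain in response.get("Domains", [])
        if domain.get("Status") == "InService"
    ]
```

With Boto3 installed and credentials configured, this could be called as `find_ready_domains(boto3.client("sagemaker"))`. An empty list suggests that no domain is ready in the selected Region.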

Get began with NVIDIA Nemotron 3 Nano 30B in SageMaker JumpStart

To test the Nemotron 3 Nano model in SageMaker JumpStart, open SageMaker Studio and choose Models in the navigation pane. Search for NVIDIA in the search bar and choose NVIDIA Nemotron 3 Nano 30B as the model.

On the model details page, choose Deploy and follow the prompts to deploy the model.

After the model is deployed to a SageMaker AI endpoint, you can test it. You can access the model using the following AWS Command Line Interface (AWS CLI) code examples. You can use nvidia/nemotron-3-nano as the model ID.

# "stream" is false for non-streaming mode; "enable_thinking" is false for non-reasoning mode.
# JSON has no comment syntax, so these notes stay outside the heredoc.
cat > input.json << EOF
{
  "model": "${MODEL_ID}",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "What is NVIDIA? Answer in 2-3 sentences."
    }
  ],
  "max_tokens": 512,
  "temperature": 0.2,
  "stream": false,
  "chat_template_kwargs": {"enable_thinking": false}
}
EOF

# The last argument is the output file that receives the response body.
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name "${ENDPOINT_NAME}" \
  --region "${AWS_REGION}" \
  --content-type 'application/json' \
  --body fileb://input.json \
  response.json
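The request body in the CLI example can also be generated from Python, which avoids hand-written JSON mistakes such as Python-style booleans. The sketch below is illustrative only: the helper names `build_chat_payload` and `extract_text` are hypothetical, and the response shape (an OpenAI-style chat completion with a `choices` list) is an assumption that should be checked against the actual contents of response.json.

```python
import json


def build_chat_payload(prompt, model_id="nvidia/nemotron-3-nano",
                       enable_thinking=False, max_tokens=512, temperature=0.2):
    """Build the same request body as the CLI example, as a Python dict."""
    return {
        "model": model_id,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,  # non-streaming mode
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


def extract_text(response):
    """Return the assistant text from a decoded response, or None.

    ASSUMPTION: the endpoint answers with an OpenAI-style chat completion,
    that is {"choices": [{"message": {"content": "..."}}]}.
    """
    choices = response.get("choices") or []
    if not choices:
        return None
    return choices[0].get("message", {}).get("content")


# json.dumps serializes Python's False as JSON's lowercase false.
body = json.dumps(
    build_chat_payload("What is NVIDIA? Answer in 2-3 sentences."), indent=2
)
```

Writing `body` to input.json gives a file that can be passed to the `--body fileb://input.json` argument, and `extract_text(json.load(open("response.json")))` would read the reply if the assumed response shape holds.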

Alternatively, you can access the model using SageMaker SDK and Boto3 code. The following Python code example shows how to send a text message to NVIDIA Nemotron 3 Nano 30B using the Boto3 sagemaker-runtime client. For more code examples, refer to the NVIDIA GitHub repo.

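import json

import boto3


class NemotronClient:  # illustrative wrapper name; the original excerpt shows only the method body
    def __init__(self, endpoint_name, region):
        self.endpoint_name = endpoint_name
        self.runtime_client = boto3.client("sagemaker-runtime", region_name=region)

    def parse_response(self, raw_response):
        # Stand-in for the custom parser, which the excerpt does not show;
        # it returns the decoded JSON unchanged.
        return raw_response

    def invoke(self, prompt):
        payload = {
            "messages": [
                {"role": "user", "content": prompt}
            ],
            "max_tokens": 1000
        }

        try:
            response = self.runtime_client.invoke_endpoint(
                EndpointName=self.endpoint_name,
                ContentType="application/json",
                Body=json.dumps(payload)
            )

            # The response body is a stream; read and decode it before parsing.
            response_body = response["Body"].read().decode("utf-8")
            raw_response = json.loads(response_body)

            # Parse the response using our custom parser
            return self.parse_response(raw_response)

        except Exception as e:
            raise RuntimeError(
                f"Failed to invoke endpoint '{self.endpoint_name}': {e}. "
                "Check that the endpoint is InService and you have least-privileged IAM permissions assigned."
            ) from e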
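The error message in the Python example asks you to confirm that the endpoint is InService. One way to check this from code is sketched below. The helper name `endpoint_is_in_service` is hypothetical, and the sketch assumes a client that behaves like the Boto3 `sagemaker` control-plane client (not `sagemaker-runtime`), whose `describe_endpoint` call returns an `EndpointStatus` field.

```python
def endpoint_is_in_service(sagemaker_client, endpoint_name):
    """Return True when the endpoint reports the InService status.

    `sagemaker_client` is assumed to behave like boto3.client("sagemaker").
    Errors from the call itself (for example, an unknown endpoint name)
    are not handled here and propagate to the caller.
    """
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return description.get("EndpointStatus") == "InService"
```

Running this check before sending a request separates "the endpoint is still creating or has failed" from permission problems, which the generic error message cannot tell apart.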

Now available

NVIDIA Nemotron 3 Nano is now available fully managed in SageMaker JumpStart. Refer to the model package for AWS Region availability. To learn more, check out the Nemotron Nano model page, the NVIDIA GitHub sample notebook for Nemotron 3 Nano 30B, and the Amazon SageMaker JumpStart pricing page.

Try the Nemotron 3 Nano model in Amazon SageMaker JumpStart today and send feedback to AWS re:Post for SageMaker JumpStart or through your usual AWS Support contacts.

About the authors

Dan Ferguson is a Solutions Architect at AWS, based in New York, USA. As a machine learning services expert, Dan works to support customers on their journey to integrating ML workflows efficiently, effectively, and sustainably.

Pooja Karadgi leads product and strategic partnerships for Amazon SageMaker JumpStart, the machine learning and generative AI hub within SageMaker. She is dedicated to accelerating customer AI adoption by simplifying foundation model discovery and deployment, enabling customers to build production-ready generative AI applications across the entire model lifecycle, from onboarding and customization to deployment.

Benjamin Crabtree is a Senior Software Engineer on the Amazon SageMaker AI team, specializing in delivering the “last mile” experience to customers. He’s passionate about democratizing the latest artificial intelligence breakthroughs by offering easy-to-use capabilities. Ben is also highly experienced in building machine learning infrastructure at scale.

Timothy Ma is a Principal Specialist in generative AI at AWS, where he collaborates with customers to design and deploy cutting-edge machine learning solutions. He also leads go-to-market strategies for generative AI services, helping organizations harness the potential of advanced AI technologies.

Abdullahi Olaoye is a Senior AI Solutions Architect at NVIDIA, specializing in integrating NVIDIA AI libraries, frameworks, and products with cloud AI services and open-source tools to optimize AI model deployment, inference, and generative AI workflows. He collaborates with AWS to enhance AI workload performance and drive adoption of NVIDIA-powered AI and generative AI solutions.

Nirmal Kumar Juluru is a product marketing manager at NVIDIA driving the adoption of AI software, models, and APIs in the NVIDIA NGC Catalog and NVIDIA AI Foundation models and endpoints. He previously worked as a software developer. Nirmal holds an MBA from Carnegie Mellon University and a bachelor's in computer science from BITS Pilani.

Vivian Chen is a Deep Learning Solutions Architect at NVIDIA, where she helps teams bridge the gap between complex AI research and real-world performance. Specializing in inference optimization and cloud-integrated AI solutions, Vivian focuses on turning the heavy lifting of machine learning into fast, scalable applications. She is passionate about helping clients navigate NVIDIA’s accelerated computing stack to ensure their models don’t just work in the lab, but thrive in production.
