On the AWS Summit in New York Metropolis, we launched a complete suite of mannequin customization capabilities for Amazon Nova basis fashions. Out there as ready-to-use recipes on Amazon SageMaker AI, you need to use them to adapt Nova Micro, Nova Lite, and Nova Professional throughout the mannequin coaching lifecycle, together with pre-training, supervised fine-tuning, and alignment.
On this multi-post sequence, we are going to discover these customization recipes and supply a step-by-step implementation information. We’re beginning with Direct Choice Optimization (DPO, an alignment method that provides an easy technique to tune mannequin outputs together with your preferences. DPO makes use of prompts paired with two responses—one most popular over the opposite—to information the mannequin towards outputs that higher replicate your required tone, model, or tips. You possibly can implement this system utilizing both parameter-efficient or full mannequin DPO, primarily based in your knowledge quantity and price concerns. The custom-made fashions may be deployed to Amazon Bedrock for inference utilizing provisioned throughput. The parameter-efficient model helps on-demand inference. Nova customization recipes can be found in SageMaker coaching jobs and SageMaker HyperPod, supplying you with flexibility to pick the setting that most closely fits your infrastructure and scale necessities.
On this put up, we current a streamlined method to customizing Amazon Nova Micro with SageMaker coaching jobs.
Resolution overview
The workflow for utilizing Amazon Nova recipes with SageMaker coaching jobs, as illustrated within the accompanying diagram, consists of the next steps:
- The person selects a particular Nova customization recipe which supplies complete configurations to manage Amazon Nova coaching parameters, mannequin settings, and distributed coaching methods. You need to use the default configurations optimized for the SageMaker AI setting or customise them to experiment with completely different settings.
- The person submits an API request to the SageMaker AI management aircraft, passing the Amazon Nova recipe configuration.
- SageMaker makes use of the coaching job launcher script to run the Nova recipe on a managed compute cluster.
- Based mostly on the chosen recipe, SageMaker AI provisions the required infrastructure, orchestrates distributed coaching, and, upon completion, robotically decommissions the cluster.
This streamlined structure delivers a totally managed person expertise, so you may shortly outline Amazon Nova coaching parameters and choose your most popular infrastructure utilizing easy recipes, whereas SageMaker AI handles the end-to-end infrastructure administration—inside a pay-as-you-go pricing mannequin that’s solely billed for the web coaching time in seconds.
The custom-made Amazon Nova mannequin is subsequently deployed on Amazon Bedrock utilizing the createcustommodel
API inside Bedrock – and may combine with native tooling similar to Amazon Bedrock Information Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Brokers.
Enterprise Use Case – Implementation Stroll-through
On this put up, we concentrate on adapting the Amazon Nova Micro mannequin to optimize structured perform calling for application-specific agentic workflows. We exhibit how this method can optimize Amazon Nova fashions for domain-specific use instances by a 81% enhance in F1 rating and as much as 42% good points in ROUGE metrics. These enhancements make the fashions extra environment friendly in addressing a wide selection of enterprise purposes, similar to enabling buyer assist AI assistants to intelligently escalate queries, powering digital assistants for scheduling and workflow automation, and automating decision-making in sectors like ecommerce and monetary providers.
As proven within the following diagram, our method makes use of DPO to align the Amazon Nova mannequin with human preferences by presenting the mannequin with pairs of responses—one most popular by human annotators and one much less most popular—primarily based on a given person question and out there device actions. The mannequin is educated with the nvidia/When2Call dataset to extend the probability of the tool_call
response, which aligns with the enterprise aim of automating backend actions when applicable. Over many such examples, the Amazon Nova mannequin learns not simply to generate appropriate function-calling syntax, but in addition to make nuanced selections about when and the right way to invoke instruments in advanced workflows—enhancing its utility in enterprise purposes like buyer assist automation, workflow orchestration, and clever digital assistants.
When coaching is full, we consider the fashions utilizing SageMaker coaching jobs with the suitable analysis recipe. An analysis recipe is a YAML configuration file that defines how your Amazon Nova giant language mannequin (LLM) analysis job can be executed. Utilizing this analysis recipe, we measure each the mannequin’s task-specific efficiency and its alignment with the specified agent behaviors, so we will quantitatively assess the effectiveness of our customization method. The next diagram illustrates how these levels may be carried out as two separate coaching job steps. For every step, we use built-in integration with Amazon CloudWatch to entry logs and monitor system metrics, facilitating sturdy observability. After the mannequin is educated and evaluated, we deploy the mannequin utilizing the Amazon Bedrock Customized Mannequin Import performance as a part of step 3.
Conditions
You have to full the next conditions earlier than you may run the Amazon Nova Micro mannequin fine-tuning pocket book:
- Make the next quota enhance requests for SageMaker AI. For this use case, you will have to request a minimal of two
p5.48xlarge
occasion (with 8 x NVIDIA H100 GPUs) and scale to extrap5.48xlarge
cases (relying on time-to-train and cost-to-train trade-offs to your use case). On the Service Quotas console, request the next SageMaker AI quotas:- P5 cases (
p5.48xlarge
) for coaching job utilization: 2
- P5 cases (
- (Non-compulsory) You possibly can create an Amazon SageMaker Studio area (seek advice from Use fast setup for Amazon SageMaker AI) to entry Jupyter notebooks with the previous function. (You need to use JupyterLab in your native setup, too.)
- Create an AWS Id and Entry Administration (IAM) function with managed insurance policies
AmazonSageMakerFullAccess
,AmazonS3FullAccess
, andAmazonBedrockFullAccess
to offer required entry to SageMaker AI and Amazon Bedrock to run the examples. - Assign the next coverage because the belief relationship to your IAM function:
- Clone the GitHub repository with the property for this deployment. This repository consists of a pocket book that references coaching property:
Subsequent, we run the pocket book nova-micro-dpo-peft.ipynb to fine-tune the Amazon Nova mannequin utilizing DPO, and PEFT on SageMaker coaching jobs.
Put together the dataset
To arrange the dataset, you might want to load the nvidia/When2Call
dataset. This dataset supplies synthetically generated person queries, device choices, and annotated preferences primarily based on actual situations, to coach and consider AI assistants on making optimum tool-use selections in multi-step situations.
Full the next steps to format the enter in a chat completion format, and configure the info channels for SageMaker coaching jobs on Amazon Easy Storage Service (Amazon S3):
- Load the nvidia/When2Call dataset:
The DPO method requires a dataset containing the next:
- Person prompts (e.g., “Write knowledgeable electronic mail asking for a elevate”)
- Most popular outputs (preferrred responses)
- Non-preferred outputs (undesirable responses)
The next code is an instance from the unique dataset:
- As a part of knowledge preprocessing, we convert the info into the format required by Amazon Nova Micro, as proven within the following code. For examples and particular constraints of the Amazon Nova format, see Making ready knowledge for fine-tuning Understanding fashions.
For the complete knowledge conversion code, see right here.
- Break up the dataset into prepare and take a look at datasets:
- Put together the coaching and take a look at datasets for the SageMaker coaching job by saving them as
.jsonl
recordsdata, which is required by SageMaker HyperPod recipes for Amazon Nova, and establishing the Amazon S3 paths the place these recordsdata can be uploaded:
DPO coaching utilizing SageMaker coaching jobs
To fine-tune the mannequin utilizing DPO and SageMaker coaching jobs with recipes, we use the PyTorch Estimator class. Begin by setting the fine-tuning workload with the next steps:
- Choose the occasion sort and the container picture for the coaching job:
- Create the PyTorch Estimator to encapsulate the coaching setup from a specific Amazon Nova recipe:
You possibly can level to the particular recipe with the training_recipe
parameter and override the recipe by offering a dictionary as recipe_overrides
parameter.
The PyTorch Estimator
class simplifies the expertise by encapsulating code and coaching setup instantly from the chosen recipe.
On this instance, training_recipe
: fine-tuning/nova/dpo-peft-nova-micro-v1
is defining the DPO fine-tuning setup with PEFT method
- Arrange the enter channels for the PyTorch Estimator by creating an TrainingInput objects from the offered S3 bucket paths for the coaching and take a look at datasets:
- Submit the coaching job utilizing the
match
perform name on the created Estimator:
estimator.match(inputs={"prepare": train_input, "validation": test_input}, wait=True)
You possibly can monitor the job instantly out of your pocket book output. You can even refer the SageMaker AI console, which exhibits the standing of the job and the corresponding CloudWatch logs for governance and observability, as proven within the following screenshots.

SageMaker coaching jobs console

SageMaker coaching jobs system metrics
After the job is full, the educated mannequin weights can be out there in an escrow S3 bucket. This safe bucket is managed by Amazon and makes use of particular entry controls. You possibly can entry the paths shared in manifest recordsdata which might be saved in a buyer S3 bucket as a part of the coaching course of.
Consider the fine-tuned mannequin utilizing the analysis recipe
To evaluate mannequin efficiency towards benchmarks or {custom} datasets, we will use the Nova analysis recipes and SageMaker coaching jobs to execute an analysis workflow, by pointing to the mannequin educated within the earlier step. Amongst a number of supported benchmarks, similar to mmlu
, math
, gen_qa
, and llm_judge
, within the following steps we’re going to present two choices for gen_qa
and llm_judge
duties, which permit us to guage response accuracy, precision and mannequin inference high quality with the likelihood to make use of our personal dataset and examine outcomes with the bottom mannequin on Amazon Bedrock.
Possibility A: Consider gen_qa activity
- Use the code within the to organize the dataset, structured within the following format as required by the analysis recipe:
- Save the dataset as
.jsonl
recordsdata, which is required by Amazon Nova analysis recipes, and add them to the Amazon S3 path:
- Create the analysis recipe pointing to educated mannequin, validation knowledge, and the analysis metrics relevant to your use case:
- Choose the occasion sort, the container picture for the analysis job, and outline the checkpoint path the place the mannequin can be saved. The really helpful occasion varieties for the Amazon Nova analysis recipes are:
ml.g5.12xlarge
for Amazon Nova Micro and Amazon Nova Lite, andml.g5.48xlarge
for Amazon Nova Professional:
- Create the PyTorch Estimator to encapsulate the analysis setup from the created recipe:
- Arrange the enter channels for PyTorch Estimator by creating an TrainingInput objects from the offered S3 bucket paths for the validation dataset:
- Submit the coaching job:
estimator.match(inputs={"prepare": eval_input}, wait=False)
Analysis metrics can be saved by the SageMaker coaching Job in your S3 bucket, underneath the required output_path
.
The next determine and accompanying desk present the analysis outcomes towards the bottom mannequin for the gen_qa
activity:
F1 | F1 QUASI | ROUGE 1 | ROUGE 2 | ROUGE L | |
Base | 0.26 | 0.37 | 0.38 | 0.28 | 0.34 |
Wonderful-tuned | 0.46 | 0.52 | 0.52 | 0.4 | 0.46 |
% Distinction | 81% | 40% | 39% | 42% | 38% |
Possibility B: Consider llm_judge activity
- For the
llm_judge
activity, construction the dataset with the under format, the placeresponse_A
represents the bottom reality andresponse_B
represents our custom-made mannequin output:
- Following the identical method described for the
gen_qa
activity, create an analysis recipe particularly for thellm_judge
activity, by specifyingdecide
as technique:
The whole implementation together with dataset preparation, recipe creation, and job submission steps, seek advice from the pocket book nova-micro-dpo-peft.ipynb.
The next determine exhibits the outcomes for the llm_judge
activity:
This graph exhibits the choice percentages when utilizing an LLM as a decide to guage mannequin efficiency throughout two completely different comparisons. In Graph 1, the fine-tuned mannequin outperformed the bottom reality with 66% choice versus 34%, whereas in Graph 2, the bottom mannequin achieved 56% choice in comparison with the bottom reality’s 44%.
Summarized analysis outcomes
Our fine-tuned mannequin delivers vital enhancements on the tool-calling activity, outperforming the bottom mannequin throughout all key analysis metrics. Notably, the F1
rating elevated by 81%, whereas the F1 Quasi
rating improved by 35%, reflecting a considerable increase in each precision and recall. When it comes to lexical overlap, the mannequin demonstrated enhanced accuracy in matching generated solutions to reference texts —instruments to invoke and construction of the invoked perform— attaining good points of 39% and 42% for ROUGE-1
and ROUGE-2
scores, respectively. The llm_judge
analysis additional validates these enhancements, with the fine-tuned mannequin outputs being most popular in 66.2% towards the bottom reality outputs. These complete outcomes throughout a number of analysis frameworks affirm the effectiveness of our fine-tuning method in elevating mannequin efficiency for real-world situations.
Deploy the mannequin on Amazon Bedrock
To deploy the fine-tuned mannequin, we will use the Amazon Bedrock CreateCustomModel
API and use Bedrock On-demand inference with the native mannequin invocation instruments. To deploy the mannequin, full the next steps:
- Create a {custom} mannequin, by pointing to the mannequin checkpoints saved within the escrow S3 bucket:
- Monitor the mannequin standing. Wait till the mannequin reaches the standing
ACTIVE
orFAILED
:
When the mannequin import is full, you will notice it out there by the AWS CLI:
- Configure Amazon Bedrock Customized Mannequin on-demand inference:
- Monitor the mannequin deployment standing. Wait till the mannequin reaches the standing
ACTIVE
orFAILED
:
- Run mannequin inference by AWS SDK:
- Submit the inference request by utilizing the
converse
API:
We get the next output response:
Clear up
To wash up your assets and keep away from incurring extra fees, observe these steps:
- Delete unused SageMaker Studio assets
- (Non-compulsory) Delete the SageMaker Studio area
- On the SageMaker console, select Coaching within the navigation pane and confirm that your coaching job isn’t working anymore.
- Delete {custom} mannequin deployments in Amazon Bedrock. To take action, use the AWS CLI or AWS SDK to delete it.
Conclusion
This put up demonstrates how one can customise Amazon Nova understanding fashions utilizing the DPO recipe on SageMaker coaching jobs. The detailed walkthrough with a particular concentrate on optimizing device calling capabilities showcased vital efficiency enhancements, with the fine-tuned mannequin attaining as much as 81% higher F1 scores in comparison with the bottom mannequin with coaching dataset of round 8k data.
The absolutely managed SageMaker coaching jobs and optimized recipes simplify the customization course of, so organizations can adapt Amazon Nova fashions for domain-specific use instances. This integration represents a step ahead in making superior AI customization accessible and sensible for organizations throughout industries.
To start utilizing the Nova-specific recipes, go to the SageMaker HyperPod recipes repository, the SageMaker Distributed Coaching workshop and the Amazon Nova Samples repository for instance implementations. Our group continues to increase the recipe panorama primarily based on buyer suggestions and rising machine studying developments, so you’ve got the instruments wanted for profitable AI mannequin coaching.
In regards to the authors
Mukund Birje is a Sr. Product Advertising and marketing Supervisor on the AIML group at AWS. In his present function he’s centered on driving adoption of Amazon Nova Basis Fashions. He has over 10 years of expertise in advertising and marketing and branding throughout quite a lot of industries. Exterior of labor you’ll find him mountain climbing, studying, and making an attempt out new eating places. You possibly can join with him on LinkedIn.
Karan Bhandarkar is a Principal Product Supervisor with Amazon Nova. He focuses on enabling prospects to customise the muse fashions with their proprietary knowledge to raised tackle particular enterprise domains and trade necessities. He’s keen about advancing Generative AI applied sciences and driving real-world affect with Generative AI throughout industries.
Kanwaljit Khurmi is a Principal Worldwide Generative AI Options Architect at AWS. He collaborates with AWS product groups, engineering departments, and prospects to supply steering and technical help, serving to them improve the worth of their hybrid machine studying options on AWS. Kanwaljit makes a speciality of aiding prospects with containerized purposes and high-performance computing options.
Bruno Pistone is a Senior World Extensive Generative AI/ML Specialist Options Architect at AWS primarily based in Milan, Italy. He works with AWS product groups and enormous prospects to assist them absolutely perceive their technical wants and design AI and Machine Studying options that take full benefit of the AWS cloud and Amazon Machine Studying stack. His experience contains: mannequin customization, generative AI, and end-to-end Machine Studying. He enjoys spending time with buddies, exploring new locations, and touring to new locations.