Automationscribe.com
  • Home
  • AI Scribe
  • AI Tools
  • Artificial Intelligence
  • Contact Us
No Result
View All Result
Automation Scribe
  • Home
  • AI Scribe
  • AI Tools
  • Artificial Intelligence
  • Contact Us
No Result
View All Result
Automationscribe.com
No Result
View All Result

Enhance your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

admin by admin
June 7, 2026
in Artificial Intelligence
0
Enhance your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
399
SHARES
2.3k
VIEWS
Share on FacebookShare on Twitter


AI brokers can autonomously deal with complicated, multi-step duties, however their effectiveness relies on calling the appropriate instruments to retrieve info or take motion. When an agent picks the improper software, codecs parameters incorrectly, or breaks a workflow chain, process completion instances develop, error charges rise, help prices enhance, and consumer experiences degrade. As extra organizations transfer agentic functions from pilot to manufacturing, having brokers that choose the appropriate software for every request is crucial for dependable automation.

On this submit, you discover ways to use Supervised Effective-Tuning (SFT) and Direct Choice Optimization (DPO) collectively to enhance the tool-calling accuracy of a small language mannequin (SLM). The instance makes use of Amazon SageMaker AI coaching jobs, so you possibly can deal with coaching code as a substitute of managing your personal coaching infrastructure. You additionally discover ways to consider tool-calling accuracy and examine a base mannequin to a number of fine-tuned variants, so you can also make data-driven choices about mannequin high quality.

Effective-tuning methodologies

Supervised fine-tuning entails curating a high-quality dataset that aligns carefully with the mannequin’s meant perform, offering specific examples of how the mannequin ought to carry out sure duties or work together with particular instruments. This methodology is especially efficient for educating the mannequin to acknowledge the nuances of tool-specific language, instructions, and constraints.

Direct Choice Optimization refines these interactions by incorporating human suggestions or predefined aims instantly into the coaching loop. DPO aligns the mannequin’s output extra carefully with goal outcomes by emphasizing a choice for sure varieties of responses or behaviors over others. The coaching information in DPO accommodates a “like this, not like that” choice, which optimizes the identical objectives as reinforcement studying with out reward capabilities or reward fashions. This method reduces useful resource necessities and coaching time whereas sustaining high quality.

Diagram showing the Direct Preference Optimization training flow that compares preferred and rejected responses to align model outputs with human preferences

Supply: arXiv:2305.18290 [cs.LG]

For instance, the HuggingFace TRL library for DPO takes coaching samples within the following format:

{
    "immediate": [""],
    "chosen": "",  # rated higher than ok
    "rejected": "",  # rated worse than j
}

This feedback-driven method permits for iterative enchancment of the mannequin’s tool-interaction capabilities primarily based on real-world utilization patterns within the coaching information.

Collectively, SFT and DPO kind a strong framework for fine-tuning language fashions to interface with a variety of digital instruments. Through the use of these strategies, you possibly can construct AI methods that perceive and generate human-like textual content and that carry out complicated duties by autonomously interacting with exterior functions, broadening the scope and utility of AI in each shopper and enterprise environments.

To grasp the prices related to Amazon SageMaker Studio notebooks and Amazon SageMaker AI coaching jobs, check with the SageMaker AI pricing web page.

Resolution overview

On this part, we stroll by means of the right way to fine-tune Qwen3 1.7B on Amazon SageMaker AI coaching jobs, a totally managed service that helps distributed multi-GPU and multi-node configurations. With SageMaker AI coaching jobs, you possibly can spin up high-performance clusters on demand, prepare billion-parameter fashions quicker, and mechanically shut down sources when the job finishes. Metrics from infrastructure and from contained in the coaching loop are despatched to MLflow on SageMaker AI for later evaluation.

Stipulations

To fine-tune function-calling fashions on SageMaker AI, you want the next stipulations:

Arrange your surroundings

Within the following sections, we run the code from a SageMaker Studio JupyterLab pocket book occasion. You may as well use your most well-liked IDE, resembling VS Code or PyCharm. Be sure that your native surroundings is configured to work with AWS, as listed within the stipulations.

Full the next steps to arrange your surroundings:

  1. On the SageMaker AI console, select Domains within the navigation pane, then open your area.
  2. Within the navigation pane below Purposes and IDEs, select Studio.
  3. On the Consumer profiles tab, find your consumer profile, then select Launch and Studio.
  4. In SageMaker Studio, launch an ml.t3.medium JupyterLab pocket book occasion with a minimum of 50 GB of storage. A big pocket book occasion isn’t required as a result of the fine-tuning job runs on a separate ephemeral coaching job occasion with NVIDIA accelerators.
  5. To start fine-tuning, clone the GitHub repository: git clone https://github.com/aws-samples/amazon-sagemaker-generativeai.git.
  6. Navigate to the 6_use_cases/usecases/function-calling-sft-dpo listing.
  7. Launch the run_training_job.ipynb pocket book with a Python 3.12 or greater model kernel.

Dataset preparation

Selecting and creating the appropriate dataset is a crucial first step in fine-tuning basis fashions (FMs). This instance makes use of the When2Call dataset revealed by NVIDIA, a benchmark designed to judge tool-calling decision-making for FMs. It consists of when to generate a software name, when to ask follow-up questions, when to point that the query can’t be answered with the instruments supplied, and what to do if the query appears to require software use however a software name can’t be made.

The analysis code and artificial information era scripts used to generate the datasets are in NVIDIA’s GitHub repository.

The datasets comprise three totally different elements.

  1. Dataset for supervised fine-tuning (SFT), which accommodates 15,000 samples.
    from datasets import load_dataset
    train_sft_ds = load_dataset("nvidia/When2Call", "train_sft")
    train_sft_ds
    DatasetDict({
        prepare: Dataset({
            options: ['tools', 'messages'],
            num_rows: 15000
        })

  2. Dataset for choice alignment, which makes use of Direct Choice Optimization (DPO) on this instance. This information accommodates 9,000 samples.
    from datasets import load_dataset
    train_pref_ds = load_dataset("nvidia/When2Call", "train_pref")
    train_pref_ds
    
    DatasetDict({
        prepare: Dataset({
            options: ['tools', 'messages', 'chosen_response', 'rejected_response'],
            num_rows: 9000
        })
    })

  3. The dataset for testing efficiency has two information: Multi-Alternative Query analysis (mcq) and LLM-as-a-judge (llm_judge), which is a subset of the MCQ analysis set and might be downloaded as a single DatasetDict.
    from datasets import load_dataset
    test_ds = load_dataset("nvidia/When2Call", "check")
    test_ds
    
    DatasetDict({
        llm_judge: Dataset({
            options: ['uuid', 'source', 'source_id', 'question', 'correct_answer', 'answers', 'target_tool', 'tools', 'orig_tools', 'orig_question', 'held_out_param'],
            num_rows: 300
        })
        mcq: Dataset({
            options: ['uuid', 'source', 'source_id', 'question', 'correct_answer', 'answers', 'target_tool', 'tools', 'orig_tools', 'orig_question', 'held_out_param'],
            num_rows: 3652
        })
    })

For this use case, we have to do a little bit of preprocessing on the dataset to match the anticipated codecs for TRL’s SFTTrainer and DPOTrainer. To do this, we have to construct a system immediate that accommodates the listing of obtainable instruments and add the system immediate to the messages lists from the unique dataset.

def generate_and_tokenize_prompt(data_point):
    """
    Generates a software utilizing immediate primarily based on affected person info.

    Args:
        data_point (dict): Dictionary containing goal and meaning_representation keys

    Returns:
        dict: Dictionary containing the formatted immediate
    """
    full_prompt = f"""
    You're a useful assistant with entry to the next instruments or perform calls. Your process is to supply a sequence of instruments or perform calls essential to generate response to the consumer utterance. Use the next instruments or perform calls as required:
    {data_point["tools"]}
    """
    return {"system_prompt": full_prompt.strip()}

dstrain_sft = dstrain_sft.map(
    generate_and_tokenize_prompt,
    batched=False

convos=[]
for mess, sys in zip(dstrain_sft['train']['messages'], dstrain_sft['train']['system_prompt']):
    message = {
        "content material": f"{sys}",
        "function": "system"
    }
    convos.append([message, mess[0], mess[1]])
dstrain_sft = dstrain_sft.rename_column("messages", "messages_1")
dstrain_sft['train'] = dstrain_sft['train'].add_column("messages", convos)

Along with what we did for SFT, we have to put together the information for DPO. The DPOTrainer from TRL accepts a particular format that features columns labeled as chosen and rejected along with messages, so we have to create the messages column and rename chosen_response and rejected_response.

ds_train_pref = ds_train_pref.map(
    generate_and_tokenize_prompt,
    batched=False

ds_train_pref = ds_train_pref.rename_column("chosen_response", "chosen")
ds_train_pref = ds_train_pref.rename_column("rejected_response", "rejected")

Now, save the SFT and DPO datasets in Amazon Easy Storage Service (Amazon S3) to make them out there for coaching.

# save train_dataset to s3 utilizing our SageMaker session
input_path = f's3://{sagemaker_session.default_bucket()}/datasets/nvidia_function_calling'

# Save datasets to s3
# We'll high quality tune solely with 20 information as a consequence of restricted compute useful resource for the workshop
dstrain_sft["train"].to_json(f"{input_path}/prepare/dataset.json", orient="information")
sft_dataset_s3_path = f"{input_path}/prepare/dataset.json"
ds_train_pref["train"].to_json(f"{input_path}/pref/dataset.json", orient="information")
perf_dataset_s3_path = f"{input_path}/pref/dataset.json"
# ds_train_pref["train"].to_json(f"{input_path}/pref/dataset.json", orient="information")
# perf_dataset_s3_path = f"{input_path}/pref/dataset.json"
print(f"Coaching information uploaded to:")
print(sft_dataset_s3_path)
print(f"DPO information uploaded to:")
print(perf_dataset_s3_path)
print(f"https://s3.console.aws.amazon.com/s3/buckets/{sagemaker_session.default_bucket()}/?area={sagemaker_session.boto_region_name}&prefix={input_path.break up('/', 3)[-1]}/")

Supervised fine-tuning (SFT) on the bottom mannequin

The next instance demonstrates the right way to fine-tune the Qwen3-1.7B mannequin. The repository accommodates the recipe within the scripts listing, the place you possibly can modify the bottom mannequin and coaching parameters for SFT. This instance makes use of a Spectrum-based fine-tuning recipe, however it’s also possible to use different PEFT strategies like LoRA or QLoRA.

The recipe accommodates the configuration for the mannequin and coaching parameters:

# Mannequin arguments
model_name_or_path: Qwen/Qwen3-1.7B
tokenizer_name_or_path: Qwen/Qwen3-1.7B
model_revision: foremost
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: /choose/ml/mannequin/Qwen3-1.7B-function-calling

# Dataset arguments
dataset_id_or_path: /choose/ml/enter/information/dataset/dataset.json
max_seq_length: 2048
packing: true

# Spectrum arguments
spectrum_config_path: /choose/ml/enter/information/code/spectrum-layer/snr_results_Qwen-Qwen3-1.7B_unfrozenparameters_50percent.yaml

# Coaching arguments
num_train_epochs: 10
per_device_train_batch_size: 4
gradient_accumulation_steps: 2
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
learning_rate: 5.0e-5
lr_scheduler_type: cosine
warmup_ratio: 0.1

# Logging arguments
logging_strategy: steps
logging_steps: 5
report_to:
- wandb
save_strategy: "no" # "epoch"
seed: 42

# Hugging Face Hub
push_to_hub: false
# hub_model_id: # if not outlined similar as output_dir
hub_strategy: every_save

Create a coaching job with SageMaker AI ModelTrainer

Subsequent, we use a SageMaker AI coaching job to spin up a coaching cluster and run the mannequin fine-tuning. The SageMaker AI Python SDK ModelTrainer APIs run coaching jobs on totally managed infrastructure, dealing with surroundings setup, scaling, and artifact administration. Through the use of ModelTrainer, you possibly can specify coaching scripts, enter information, and compute sources with out manually provisioning servers.

First, configure the coaching surroundings:

from sagemaker.config import load_sagemaker_config
configs = load_sagemaker_config()
from sagemaker.modules.prepare import ModelTrainer
from sagemaker.modules.configs import Compute, SourceCode, InputData, StoppingCondition, CheckpointConfig
env = {}
env["FI_PROVIDER"] = "efa"
env["NCCL_PROTO"] = "easy"
env["NCCL_SOCKET_IFNAME"] = "eth0"
env["NCCL_IB_DISABLE"] = "1"
env["NCCL_DEBUG"] = "WARN"
env["HF_token"] = os.environ['hf_token'] #required for gated fashions, might be omitted for others
env["data_location"] = sft_dataset_s3_path

To allow experiment monitoring in MLflow, provide the MLflow monitoring server ARN to the job.

# MLflow tracker
tracking_server_arn = ""
env["MLFLOW_TRACKING_ARN"] = tracking_server_arn

The Compute part of the coaching setup determines the infrastructure necessities for coaching. Within the SourceCode part, we outline the native paths to code that might be imported into the coaching job.

compute = Compute(
    instance_count=1,
    instance_type= "ml.p4d.24xlarge",
    volume_size_in_gb=96,
    keep_alive_period_in_seconds=3600,
)

source_code = SourceCode(
    source_dir="./scripts",
    necessities="necessities.txt",
    entry_script="run_training_sft.sh",
)

The next is the listing construction for fine-tuning on SageMaker AI coaching jobs. We additionally present the necessities.txt file within the scripts listing, which ModelTrainer mechanically detects and installs the listed dependencies at runtime. For superior eventualities resembling disabling construct isolation, you possibly can present a bash script because the entry level to run shell instructions previous to beginning coaching.

scripts/
├── accelerate_configs/ # Speed up configuration information
├── run_training_sft.sh # Launch script for distributed coaching with Speed up on SageMaker coaching jobs
├── run_training_dpo.sh # Launch script for distributed coaching with Speed up on SageMaker coaching jobs
├── run_sft.py # Primary coaching script for supervised fine-tuning (SFT)
├── run_dpo.py # Primary coaching script for Direct Choice Optimization (DPO)
├── recipes/ # Predefined coaching configuration recipes (YAML)
└── necessities.txt # Python dependencies put in at runtime

Subsequent, specify the Amazon Elastic Container Registry (Amazon ECR) location for the coaching container, the place to retailer mannequin checkpoints, and what to call the SageMaker AI coaching job. These values are equipped to the ModelTrainer API to configure the job.

image_uri = f"763104351884.dkr.ecr.{sagemaker_session.boto_session.region_name}.amazonaws.com/pytorch-training:2.8.0-gpu-py312-cu129-ubuntu22.04-sagemaker"

checkpoint_s3_path = f"s3://{bucket_name}/function-calling-sft-checkpoints/checkpoints"

job_prefix = f"model-trainer-distributed-function-calling-sft"

model_trainer = ModelTrainer(
    training_image=image_uri,
    compute=compute,
    hyperparameters=hyperparameters,
    surroundings=env,
    source_code=source_code,
    stopping_condition=StoppingCondition(
        max_runtime_in_seconds=90000,
    ),
    checkpoint_config=CheckpointConfig(
        s3_uri=f"{checkpoint_s3_path}/{job_prefix}",
    ),
    base_job_name=job_prefix

)

Lastly, configure the enter information parameters for the place the coaching information resides and begin the SFT coaching job with .prepare().

training_data = InputData(
    channel_name="training_dataset",
    data_source=sft_dataset_s3_path,
)

model_trainer.prepare(input_data_config=[training_data], wait=True)

To fine-tune throughout a number of GPUs, we use Hugging Face Speed up and DeepSpeed ZeRO-3, which work collectively to coach fashions throughout a number of GPUs or nodes extra effectively. Hugging Face Speed up streamlines distributed coaching launches by mechanically dealing with machine placement, course of administration, and combined precision settings. DeepSpeed ZeRO-3 reduces reminiscence utilization by partitioning optimizer states, gradients, and parameters throughout GPUs, so billion-parameter fashions match and prepare quicker.

You possibly can run your SFTTrainer script with Hugging Face Speed up utilizing a command like the next:

NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)
echo "Detected ${NUM_GPUS} GPUs on the machine"
speed up launch 
    --config_file accelerate_configs/deepspeed_zero3.yaml 
    --num_processes ${NUM_GPUS} run_sft.py 
    --config receipes/Qwen3-0.6B-spectrum.yaml

With the SFT mannequin artifact prepared, now you can use that as a base mannequin for DPO coaching. The DPO coaching recipe seems to be much like the SFT one with just a few small modifications.

  • beta – This can be a DPO-specific hyperparameter, sometimes certain between 0–2, that controls how aggressively the mannequin adopts new preferences. A price nearer to 0 is extra aggressive and a worth nearer to 2 is extra conservative. A typical start line is 0.1 to 0.5, which may drive important modifications in habits. Nonetheless, this may result in excessive variance and even degradation. The optimum worth is extremely depending on the dataset.
  • learning_rate – DPO advantages from decrease studying charges (for instance, 5e-7) with a warmup_ratio to stop overfitting. This worth contrasts with the SFT learning_rate from the earlier run of 5e-5. Though this instance makes use of a continuing lr_scheduler_type, cosine annealing is one other widespread choice.
  • batch_size – Giant batch sizes are inclined to carry out higher. The batch measurement on this instance is deliberately small to scale back useful resource necessities.
# Mannequin arguments
model_name_or_path: /choose/ml/enter/mannequin/Qwen3-1.7B-function-calling/
tokenizer_name_or_path: Qwen/Qwen3-1.7B
model_revision: foremost
torch_dtype: bfloat16
attn_implementation: flash_attention_2
bf16: true
tf32: true
output_dir: /choose/ml/mannequin/sft-dpo-qwen-3-1.7b-function-calling

# Dataset arguments
dataset_id_or_path: /choose/ml/enter/information/dataset/dataset.json

# Coaching arguments
beta: 0.1 # hyperparameter that controls how a lot the fine-tuned mannequin is allowed to diverge from its unique, reference mannequin
max_length: 1536
max_prompt_length: 768
loss_type: sigmoid
num_train_epochs: 10
per_device_train_batch_size: 2
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
learning_rate: 5.0e-7
lr_scheduler_type: fixed
warmup_ratio: 0.03

# Logging arguments
logging_strategy: steps
logging_steps: 5
report_to:
- mlflow
save_strategy: "no"
seed: 42

Optionally, you possibly can present a mix of loss values to carry out Combined Choice Optimization, which permits for the mixture and weighting of a number of loss sorts. On this instance, there’s SFT coaching information and DPO coaching information which are run individually. In case you solely have DPO coaching information, you need to use MPO with the sft loss kind to make use of the accepted column within the DPO information for SFT. If potential, offering separate, distinctive datasets leads to a bigger corpus of information and higher outcomes.

# MPO (Combined Choice Optimization): Combines DPO (sigmoid) for choice and BCO (bco_pair) for high quality

loss_type : ["sigmoid", "bco_pair", "sft"], # Loss sorts to mix
loss_weights : [0.8, 0.2, 1.0] # Corresponding weights, as used within the MPO paper

If loss_weights is omitted, all loss sorts can have equal weights (1.0 by default).

Direct Choice Optimization (DPO) coaching on the SFT-trained mannequin

Within the DPO instance, we present how one can cross configuration information into the coaching container as hyperparameters or as surroundings variables. The previous is picked up within the coaching script with TRLParser and the latter with Python os.environ references.

The DPO coaching configuration is outlined as follows:

from sagemaker.config import load_sagemaker_config
from sagemaker.modules.prepare import ModelTrainer
from sagemaker.modules.configs import Compute, SourceCode, InputData, StoppingCondition, CheckpointConfig

configs = load_sagemaker_config()

env = {}
env["FI_PROVIDER"] = "efa"
env["NCCL_PROTO"] = "easy"
env["NCCL_SOCKET_IFNAME"] = "eth0"
env["NCCL_IB_DISABLE"] = "1"
env["NCCL_DEBUG"] = "WARN"
env["HF_token"] = os.environ['hf_token'] #required for gated fashions, might be omitted for others
env["data_location"] = perf_dataset_s3_path
env["model_location"] = model_data

# MLflow tracker
tracking_server_arn = ""
env["MLFLOW_TRACKING_ARN"] = tracking_server_arn

compute = Compute(
    instance_count=1,
    instance_type= "ml.p4d.24xlarge",
    volume_size_in_gb=96,
    keep_alive_period_in_seconds=3600,
)

image_uri = f"763104351884.dkr.ecr.{sagemaker_session.boto_session.region_name}.amazonaws.com/pytorch-training:2.8.0-gpu-py312-cu129-ubuntu22.04-sagemaker"

checkpoint_s3_path = f"s3://{bucket_name}/function-calling-dpo-checkpoints/checkpoints"

job_prefix = f"model-trainer-distributed-function-calling-dpo"

hyperparameters = {
    "dataset_path": "/choose/ml/enter/information/dataset",
    "model_dir": "/choose/ml/mannequin",
}

source_code = SourceCode(
    source_dir="./scripts",
    necessities="necessities.txt",
    entry_script="run_training_dpo.sh",
)

model_trainer = ModelTrainer(
    training_image=image_uri,
    compute=compute,
    hyperparameters=hyperparameters,
    surroundings=env,
    source_code=source_code,
    stopping_condition=StoppingCondition(
        max_runtime_in_seconds=90000,
    ),
    checkpoint_config=CheckpointConfig(
        s3_uri=f"{checkpoint_s3_path}/{job_prefix}",
    ),
    base_job_name=job_prefix

)

training_data = InputData(
    channel_name="training_dataset",
    data_source=perf_dataset_s3_path,
)

Then kick off the coaching job for DPO:

model_trainer.prepare(input_data_config=[training_data], wait=True)

Outcomes

We ran the experiment for 3 totally different fashions, utilizing the NVIDIA-provided script for analysis, with the next outcomes. Among the many base fashions, Qwen3-0.6B was the strongest performer out of the field regardless of being the smallest, beating Qwen3-1.7B by roughly 6 p.c and Llama-3.2-3B-instruct by roughly 1 p.c.

After a cycle of fine-tuning, the rankings change. The Qwen3-1.7B mannequin positive factors roughly 19 p.c in accuracy and outperforms the others by roughly 4–7 p.c. The spherical of choice optimization was additionally efficient, including one other roughly 10.5 p.c accuracy and ending the experiment within the lead by roughly 8–9 p.c over the opposite fashions.

This exhibits the effectiveness of a multi-step method to mannequin customization. Qwen3-1.7B gained 30 p.c in total accuracy and carried out 9 p.c higher than the Llama-3.2-3B mannequin, which has nearly twice the parameter depend. Reaching comparable or higher efficiency with a smaller mannequin can scale back price and enhance throughput when it’s time to host the mannequin.

Mannequin Tuning Approach Acc-Norm
Llama 3.2 3B Instruct Base 46.50%
Llama 3.2 3B Instruct Spectrum SFT 53.41%
Llama 3.2 3B Instruct Spectrum SFT + DPO 62.67%
Qwen3-0.6B Base 47.64%
Qwen3-0.6B Spectrum SFT 56.10%
Qwen3-0.6B Spectrum SFT + DPO 62.02%
Qwen3-1.7B Base 41.57%
Qwen3-1.7B Spectrum SFT 60.43%
Qwen3-1.7B Spectrum SFT + DPO 71.06%

Clear up

To keep away from incurring prices for sources you now not want, full the next clean-up steps:

  • Delete any SageMaker AI coaching jobs you launched. Coaching jobs that full efficiently don’t proceed to incur prices, however you possibly can clear up information from the SageMaker AI console or with the AWS CLI.
  • Take away the datasets you uploaded to Amazon S3:
    aws s3 rm s3:///datasets/nvidia_function_calling/ --recursive

  • Cease or delete the SageMaker Studio JupyterLab pocket book occasion to keep away from idle prices.
  • Delete any mannequin checkpoints saved in Amazon S3 that you simply now not want.

Conclusion

On this submit, we confirmed the right way to enhance an agent’s tool-calling accuracy by combining supervised fine-tuning (SFT) with Direct Choice Optimization (DPO) on Amazon SageMaker AI. SFT makes use of labeled datasets to refine mannequin parameters, so the mannequin develops a foundational understanding by studying from expert-annotated examples. DPO then aligns the mannequin’s outputs with human preferences or particular efficiency standards by means of direct suggestions, with out the necessity to outline reward capabilities.

By integrating these two methodologies, you get a better-performing mannequin that advantages from the structured, knowledge-driven method of SFT and the adaptability and user-centered refinement of DPO. The result’s a mannequin that’s extra correct, extra related, and higher aligned with how customers need it to behave.

For extra examples on fine-tuning basis fashions, go to the SageMaker AI generative AI samples GitHub repository. For extra details about coaching fashions in SageMaker AI, see the SageMaker AI documentation.


Concerning the authors

Amin Dashti

Amin Dashti

Amin is a Senior Knowledge Scientist and researcher at AWS who bridges deep theoretical perception with sensible machine studying experience. With a background in theoretical physics and over eight years of expertise, he has designed and deployed scalable fashions throughout domains, together with predictive analytics and statistical inference in monetary methods and functions in laptop imaginative and prescient (CV) and pure language processing (NLP).

Giuseppe Zappia

Giuseppe Zappia

Giuseppe is a Principal Generative AI Specialist Options Architect at AWS, centered on serving to massive enterprises design and deploy generative AI options on AWS. He has over 20 years of expertise as a full stack software program engineer and has spent the previous 7 years at AWS centered on the sphere of AI.

Tags: accuracyAgentsAmazonDPOImproveSageMakerSFTtoolcalling
Previous Post

Constructing Semantic Search with Transformers.js and Sentence Embeddings

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Popular News

  • Greatest practices for Amazon SageMaker HyperPod activity governance

    Greatest practices for Amazon SageMaker HyperPod activity governance

    405 shares
    Share 162 Tweet 101
  • How Cursor Really Indexes Your Codebase

    404 shares
    Share 162 Tweet 101
  • Construct a serverless audio summarization resolution with Amazon Bedrock and Whisper

    403 shares
    Share 161 Tweet 101
  • Speed up edge AI improvement with SiMa.ai Edgematic with a seamless AWS integration

    403 shares
    Share 161 Tweet 101
  • Optimizing Mixtral 8x7B on Amazon SageMaker with AWS Inferentia2

    403 shares
    Share 161 Tweet 101

About Us

Automation Scribe is your go-to site for easy-to-understand Artificial Intelligence (AI) articles. Discover insights on AI tools, AI Scribe, and more. Stay updated with the latest advancements in AI technology. Dive into the world of automation with simplified explanations and informative content. Visit us today!

Category

  • AI Scribe
  • AI Tools
  • Artificial Intelligence

Recent Posts

  • Enhance your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
  • Constructing Semantic Search with Transformers.js and Sentence Embeddings
  • Choosing an Experimentation Platform: A Retrospective
  • Home
  • Contact Us
  • Disclaimer
  • Privacy Policy
  • Terms & Conditions

© 2024 automationscribe.com. All rights reserved.

No Result
View All Result
  • Home
  • AI Scribe
  • AI Tools
  • Artificial Intelligence
  • Contact Us

© 2024 automationscribe.com. All rights reserved.