This post is the second part of the DeepSeek series focusing on model customization with Amazon SageMaker HyperPod recipes (or recipes for brevity). In Part 1, we demonstrated the performance and ease of fine-tuning the DeepSeek-R1 distilled models using these recipes. In this post, we use the recipes to fine-tune the original DeepSeek-R1 671B parameter model. We demonstrate this through the step-by-step implementation of these recipes using both SageMaker training jobs and SageMaker HyperPod.
Business use case
After its public release, the DeepSeek-R1 model, developed by DeepSeek AI, showed impressive results across multiple evaluation benchmarks. The model follows the Mixture of Experts (MoE) architecture and has 671 billion parameters. Traditionally, large models are well suited to a wide spectrum of generalized tasks by virtue of being trained on a large amount of data. The DeepSeek-R1 model was trained on 14.8 trillion tokens. The original R1 model demonstrates strong few-shot or zero-shot learning capabilities, allowing it to generalize to new tasks and scenarios that weren't part of its original training.
However, many customers prefer to either fine-tune or run continuous pre-training of these models to adapt them to their specific business applications or to optimize them for specific tasks. A financial organization might want to customize the model with their custom data to assist with their data processing tasks. Or a hospital network can fine-tune it with their patient records to act as a medical assistant for their doctors. Fine-tuning can also extend the model's generalization ability. Customers can fine-tune it with a corpus of text in specific languages that aren't fully represented in the original training data. For example, a model fine-tuned with an additional trillion tokens of Hindi language will be able to expand the same generalization capabilities to Hindi.
The decision on which model to fine-tune depends on the end application as well as the available dataset. Based on the amount of proprietary data, customers can decide to fine-tune the larger DeepSeek-R1 model instead of one of the distilled versions. In addition, the R1 models have their own set of guardrails. Customers might want to fine-tune to update these guardrails or expand on them.
Fine-tuning larger models like DeepSeek-R1 requires careful optimization to balance cost, deployment requirements, and performance effectiveness. To achieve optimal results, organizations must meticulously select an appropriate environment, determine the best hyperparameters, and implement efficient model sharding strategies.
Solution architecture
SageMaker HyperPod recipes effectively address these requirements by providing a carefully curated mix of distributed training techniques, optimizations, and configurations for state-of-the-art (SOTA) open source models. These recipes have undergone extensive benchmarking, testing, and validation to provide seamless integration with the SageMaker training and fine-tuning processes.
In this post, we explore solutions that demonstrate how to fine-tune the DeepSeek-R1 model using these recipes on either SageMaker HyperPod or SageMaker training jobs. Your choice between these services will depend on your specific requirements and preferences. If you require granular control over training infrastructure and extensive customization options, SageMaker HyperPod is the ideal choice. SageMaker training jobs, on the other hand, are tailored for organizations that want a fully managed experience for their training workflows. To learn more about these service features, refer to Generative AI foundation model training on Amazon SageMaker.
The following diagram illustrates the solution architecture for training using SageMaker HyperPod. With HyperPod, users begin the process by connecting to the login/head node of the Slurm cluster. Each step runs as a Slurm job and uses Amazon FSx for Lustre to store model checkpoints. For DeepSeek-R1, the process consists of the following steps:
- Download the DeepSeek-R1 model and convert the weights from FP8 to BF16 format
- Load the model into memory and perform fine-tuning using Quantized Low-Rank Adaptation (QLoRA)
- Merge the QLoRA adapters with the base model
- Convert and load the model for batch evaluation
The following diagram illustrates the solution architecture for SageMaker training jobs. You can execute each step in the training pipeline by initiating the process through the SageMaker control plane using APIs, the AWS Command Line Interface (AWS CLI), or the SageMaker ModelTrainer SDK. In response, SageMaker launches training jobs with the requested number and type of compute instances to run specific tasks. For DeepSeek-R1, the process consists of three main steps:
- Download and convert R1 to BF16 datatype format
- Load the model into memory and perform fine-tuning
- Consolidate and load the checkpoints into memory, then run inference and metrics to evaluate performance improvements
Prerequisites
Complete the following prerequisites before running the DeepSeek-R1 671B model fine-tuning notebook:
- Make the following quota increase requests for SageMaker. You need to request a minimum of two ml.p5.48xlarge instances (with 8 x NVIDIA H100 GPUs) ranging to a maximum of four ml.p5.48xlarge instances (depending on time-to-train and cost-to-train trade-offs for your use case). On the Service Quotas console, request the following SageMaker quotas. It can take up to 24 hours for the quota increase to be approved:
  - P5 instances (ml.p5.48xlarge) for training job usage: 2–4
  - P5 instances (ml.p5.48xlarge) for HyperPod cluster usage: 2–4
- If you choose to use HyperPod clusters to run your training, set up a HyperPod Slurm cluster by referring to the Amazon SageMaker HyperPod Developer Guide. Alternatively, you can also use the AWS CloudFormation template provided in the Own Account workshop and follow the instructions to set up a cluster and a development environment to access and submit jobs to the cluster.
- (Optional) If you choose to use SageMaker training jobs, you can create an Amazon SageMaker Studio domain (refer to Use quick setup for Amazon SageMaker AI) to access Jupyter notebooks with the IAM role described in the next step. (You can use JupyterLab in your local setup, too.)
- Create an AWS Identity and Access Management (IAM) role with the managed policies AmazonSageMakerFullAccess, AmazonFSxFullAccess, and AmazonS3FullAccess to give SageMaker the required access to run the examples.
- Clone the GitHub repository with the assets for this deployment. This repository consists of a notebook that references training assets:
Solution walkthrough
To perform the solution, follow the steps in the next sections.
Technical considerations
The default weights provided by the DeepSeek team in their official R1 repository are of type FP8. However, we chose to disable FP8 in our recipes because we empirically found that training with BF16 enhances generalization across diverse datasets with minimal changes to the recipe hyperparameters. Therefore, to achieve stable fine-tuning for a model of 671B parameter size, we recommend first converting the model from FP8 to BF16 using the fp8_cast_bf16.py command-line script provided by DeepSeek. Executing this script will copy the converted BF16 weights in Safetensors format to the specified output directory. Remember to copy the model's config.yaml to the output directory so the weights are loaded accurately. These steps are encapsulated in a prologue script and are documented step by step under the Fine-tuning section.
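The following is a minimal sketch of this conversion step, assuming the fp8_cast_bf16.py script accepts --input-fp8-hf-path and --output-bf16-hf-path arguments (verify against the version you download); the paths are placeholders.

```python
# Sketch of the FP8 -> BF16 conversion step. Script arguments and paths are
# assumptions; verify them against the fp8_cast_bf16.py version you use.
import shutil
import subprocess
from pathlib import Path

fp8_path = Path("/fsx/models/DeepSeek-R1")        # downloaded FP8 checkpoint (placeholder)
bf16_path = Path("/fsx/models/DeepSeek-R1-bf16")  # output directory for BF16 weights (placeholder)
bf16_path.mkdir(parents=True, exist_ok=True)

# Cast the FP8 safetensors shards to BF16 using DeepSeek's conversion script
subprocess.run(
    [
        "python", "fp8_cast_bf16.py",
        "--input-fp8-hf-path", str(fp8_path),
        "--output-bf16-hf-path", str(bf16_path),
    ],
    check=True,
)

# The cast script writes only weight shards, so also copy the model's
# configuration and tokenizer files to the output directory so the converted
# model loads correctly.
for pattern in ("*.json", "*.yaml", "*.model"):
    for cfg in fp8_path.glob(pattern):
        shutil.copy(cfg, bf16_path)
```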
Customers can use a sequence length of 8K for training, as tested on p5.48xlarge instances, each equipped with eight NVIDIA H100 GPUs. You can also choose a smaller sequence length if needed. Training with a sequence length greater than 8K might lead to out-of-memory issues with GPUs. Also, converting the model weights from FP8 to BF16 requires a p5.48xlarge instance, which is also recommended for training because of the model's high host memory requirements during initialization.
Customers must upgrade the transformers version to transformers==4.48.2 to run the training.
Fine-tuning
Run the finetune_deepseek_r1_671_qlora.ipynb notebook to fine-tune the DeepSeek-R1 model using QLoRA on SageMaker.
Prepare the dataset
This section covers loading the FreedomIntelligence/medical-o1-reasoning-SFT dataset, tokenizing and chunking the dataset, and configuring the data channels for SageMaker training on Amazon Simple Storage Service (Amazon S3). Complete the following steps (a combined sketch of these steps follows the list):
- Format the dataset by applying the prompt format for DeepSeek-R1:
- Load the FreedomIntelligence/medical-o1-reasoning-SFT dataset and split it into training and validation datasets:
- Load the DeepSeek-R1 tokenizer from the Hugging Face Transformers library and generate tokens for the train and validation datasets. We use the original sequence length of 8K:
- Prepare the training and validation datasets for SageMaker training by saving them as arrow files, required by SageMaker HyperPod recipes, and constructing the S3 paths where these files will be uploaded. This dataset will be used in both the SageMaker training jobs and SageMaker HyperPod examples:
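The following is a combined, minimal sketch of the four steps above. The prompt template, dataset configuration name ("en"), field names (Question, Complex_CoT, Response), split ratio, and S3 paths are illustrative assumptions, not the notebook's exact values.

```python
# Combined sketch of the dataset preparation steps; template, fields, and
# bucket names are assumptions for illustration only.
from datasets import load_dataset
from transformers import AutoTokenizer

# 1. Prompt format applied to each record (assumed template)
def format_prompt(example):
    return {
        "text": (
            "Below is an instruction that describes a task.\n\n"
            f"### Question:\n{example['Question']}\n\n"
            f"### Response:\n{example['Complex_CoT']}\n{example['Response']}"
        )
    }

# 2. Load the dataset and split it into train/validation sets
dataset = load_dataset("FreedomIntelligence/medical-o1-reasoning-SFT", "en", split="train")
dataset = dataset.map(format_prompt)
splits = dataset.train_test_split(test_size=0.1, seed=42)
train_ds, val_ds = splits["train"], splits["test"]

# 3. Tokenize with the DeepSeek-R1 tokenizer at the 8K sequence length
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1", trust_remote_code=True)

def tokenize(example):
    return tokenizer(example["text"], truncation=True, max_length=8192)

train_tokens = train_ds.map(tokenize, remove_columns=train_ds.column_names)
val_tokens = val_ds.map(tokenize, remove_columns=val_ds.column_names)

# 4. Save as Arrow files and construct the S3 paths for the data channels
train_tokens.save_to_disk("./data/train")   # writes Arrow files to disk
val_tokens.save_to_disk("./data/val")
train_s3 = "s3://<your-bucket>/deepseek-r1/data/train"   # placeholder bucket
val_s3 = "s3://<your-bucket>/deepseek-r1/data/val"
```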
The next section describes how to run a fine-tuning example with SageMaker training jobs.
Option A: Fine-tune using SageMaker training jobs
Follow these high-level steps:
- Download DeepSeek-R1 to the FSx for Lustre mounted directory
- Convert DeepSeek-R1 from FP8 to BF16
- Fine-tune the DeepSeek-R1 model
- Merge the trained adapter with the base model
Define a utility function to create the ModelTrainer class for every step of the SageMaker training jobs pipeline:
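The following is a minimal sketch of such a utility, assuming the ModelTrainer interface from the SageMaker Python SDK's sagemaker.modules namespace (ModelTrainer, SourceCode, Compute); parameter names should be checked against the SDK version you use.

```python
# Sketch of a helper that builds a ModelTrainer for each pipeline step,
# assuming the sagemaker.modules ModelTrainer interface.
from sagemaker.modules.configs import Compute, SourceCode
from sagemaker.modules.train import ModelTrainer

def create_model_trainer(
    image_uri: str,
    instance_type: str,
    instance_count: int,
    source_dir: str,
    entry_script: str,
    job_prefix: str,
) -> ModelTrainer:
    """Build a ModelTrainer for one step of the training jobs pipeline."""
    return ModelTrainer(
        training_image=image_uri,
        source_code=SourceCode(source_dir=source_dir, entry_script=entry_script),
        compute=Compute(instance_type=instance_type, instance_count=instance_count),
        base_job_name=job_prefix,
    )
```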
Download DeepSeek-R1 to the FSx for Lustre mounted directory
Follow these steps:
- Select the instance type, Amazon FSx data channel, network configuration for the training job, and source code, then define the ModelTrainer class to run the training job on an ml.c5.18xlarge instance to download DeepSeek-R1 from the Hugging Face DeepSeek-R1 hub:
- Initiate the training by calling the train function of the ModelTrainer class (a minimal sketch follows this list):
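The sketch below builds a trainer for the download step with the helper defined earlier and starts it; the container image URI, script name, and omitted FSx/VPC configuration are assumptions.

```python
# Sketch: model-download step on an ml.c5.18xlarge instance.
# The image URI and download script are placeholders; the FSx for Lustre data
# channel and network configuration from the notebook are omitted for brevity.
download_trainer = create_model_trainer(
    image_uri="<training-image-uri>",      # placeholder container image
    instance_type="ml.c5.18xlarge",
    instance_count=1,
    source_dir="./scripts",
    entry_script="download_model.py",      # hypothetical download script name
    job_prefix="deepseek-r1-download",
)

# Start the download job and wait for completion
download_trainer.train(wait=True)
```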
Convert DeepSeek R1 from FP8 to BF16
Use ModelTrainer to convert the downloaded DeepSeek-R1 model weights from FP8 to BF16 format for optimal PEFT training. We use the script convert.sh to run the execution on an ml.c5.18xlarge instance.
Use the SageMaker training warm pool configuration to retain and reuse the provisioned infrastructure after the completion of the model download training job in the previous step:
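The following is a rough sketch of this step. Whether the warm pool is configured through Compute(keep_alive_period_in_seconds=...) and whether a shell script can be wired in through SourceCode(command=...) are assumptions about the SDK; confirm both against your SageMaker SDK version.

```python
# Sketch: FP8 -> BF16 conversion job reusing the warm pool from the download job.
# keep_alive_period_in_seconds and the command field are assumptions.
from sagemaker.modules.configs import Compute, SourceCode
from sagemaker.modules.train import ModelTrainer

convert_trainer = ModelTrainer(
    training_image="<training-image-uri>",          # placeholder container image
    source_code=SourceCode(
        source_dir="./scripts",
        command="bash convert.sh",                   # runs the convert.sh script named above
    ),
    compute=Compute(
        instance_type="ml.c5.18xlarge",
        instance_count=1,
        keep_alive_period_in_seconds=1800,           # retain instances in the warm pool
    ),
    base_job_name="deepseek-r1-fp8-to-bf16",
)
convert_trainer.train(wait=True)
```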
Fine-tune the DeepSeek-R1 model
The next phase involves fine-tuning the DeepSeek-R1 model using two ml.p5.48xlarge instances with distributed training. You implement this through the SageMaker recipe hf_deepseek_r1_671b_seq8k_gpu_qlora, which incorporates the QLoRA methodology. QLoRA makes the large language model (LLM) trainable on limited compute by quantizing the base model to 4-bit precision while using small, trainable low-rank adapters for fine-tuning, dramatically reducing memory requirements without sacrificing model quality:
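A minimal sketch of creating a recipe-based trainer follows, assuming ModelTrainer.from_recipe is available in your SageMaker SDK version; the override keys are illustrative and may require additional arguments such as a training image.

```python
# Sketch: recipe-based trainer for QLoRA fine-tuning on two P5 instances.
# The recipe override keys are illustrative; check the recipe YAML for exact names.
from sagemaker.modules.configs import Compute
from sagemaker.modules.train import ModelTrainer

recipe_overrides = {
    "run": {"results_dir": "/opt/ml/model"},
    "model": {
        "data": {
            "train_dir": "/opt/ml/input/data/train",
            "val_dir": "/opt/ml/input/data/val",
        }
    },
}

finetune_trainer = ModelTrainer.from_recipe(
    training_recipe="fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora",
    recipe_overrides=recipe_overrides,
    compute=Compute(instance_type="ml.p5.48xlarge", instance_count=2),
    base_job_name="deepseek-r1-671b-qlora",
)
```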
Initiate the training job to fine-tune the model. SageMaker training jobs will provision two P5 instances, orchestrate the SageMaker model parallel container smdistributed-modelparallel:2.4.1-gpu-py311-cu121, and execute the recipe to fine-tune DeepSeek-R1 with the QLoRA strategy on an ephemeral cluster:
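A minimal call is sketched below, reusing the trainer and S3 paths from the earlier sketches; the InputData field names are assumptions to verify against your SDK version.

```python
# Sketch: start the fine-tuning job with the S3 data channels prepared earlier.
# Channel names must match the directories referenced in the recipe overrides.
from sagemaker.modules.configs import InputData

finetune_trainer.train(
    input_data_config=[
        InputData(channel_name="train", data_source=train_s3),
        InputData(channel_name="val", data_source=val_s3),
    ],
    wait=True,
)
```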
Merge the trained adapter with the base model
Merge the trained adapters with the base model so it can be used for inference:
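The sketch below runs the merge as a separate training job using the helper defined earlier. The entry script name follows the merge_peft_checkpoint.py utility referenced later in this post, and the instance choice is an assumption driven by the model's memory footprint.

```python
# Sketch: adapter-merge step as its own training job. The image URI is a
# placeholder and the script wiring is an assumption based on the
# merge_peft_checkpoint.py utility mentioned in the HyperPod section.
merge_trainer = create_model_trainer(
    image_uri="<training-image-uri>",
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    source_dir="./scripts",
    entry_script="merge_peft_checkpoint.py",
    job_prefix="deepseek-r1-merge-adapter",
)
merge_trainer.train(wait=True)
```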
The next section shows how you can run similar steps on HyperPod to run your generative AI workloads.
Option B: Fine-tune using SageMaker HyperPod with Slurm
To fine-tune the model using HyperPod, make sure your cluster is up and ready by following the prerequisites mentioned earlier. To access the login/head node of the HyperPod Slurm cluster from your development environment, follow the login instructions at SSH into Cluster in the workshop.
Alternatively, you can also use AWS Systems Manager and run a command such as the following to start the session. You can find the cluster ID, instance group name, and instance ID on the Amazon SageMaker console.
- When you're in the cluster's login/head node, run the following commands to set up the environment. Run sudo su - ubuntu to run the remaining commands as the root user, unless you have a specific user ID to access the cluster and your POSIX user is created through a lifecycle script on the cluster. Refer to the multi-user setup for more details.
- Create a squash file using Enroot to run the job on the cluster. Enroot runtime offers GPU acceleration, rootless container support, and seamless integration with HPC environments, making it ideal for running workflows securely.
- After you've created the squash file, update the recipes_collection/config.yaml file with the absolute path to the squash file (created in the preceding step), and update the instance_type if needed. The final config file should have the following parameters:
Also update the file recipes_collection/cluster/slurm.yaml to add container_mounts pointing to the FSx for Lustre file system used in your cluster.
Follow these high-level steps to set up, fine-tune, and evaluate the model using HyperPod recipes:
- Download the model and convert the weights to BF16
- Fine-tune the model using QLoRA
- Merge the trained model adapter
- Evaluate the fine-tuned model
Download the model and convert weights to BF16
Download the DeepSeek-R1 model from the HuggingFace hub and convert the model weights from FP8 to BF16. You need to convert the weights to use QLoRA for fine-tuning. Copy and execute the following bash script:
Fine-tune the model using QLoRA
Download the prepared dataset that you uploaded to Amazon S3 into your FSx for Lustre volume attached to the cluster.
- Enter the following commands to download the files from Amazon S3:
- Update the launcher script to fine-tune the DeepSeek-R1 671B model. The launcher scripts serve as convenient wrappers for executing the training script, main.py, simplifying the process of fine-tuning and parameter adjustment. For fine-tuning the DeepSeek R1 671B model, you can find the specific script at:
Before running the script, you need to modify the location of the training and validation files, update the HuggingFace model ID, and optionally the access token for private models and datasets. The script should look like the following (update recipes.trainer.num_nodes if you're using a multi-node cluster):
You can view the recipe for this fine-tuning task under recipes_collection/recipes/fine-tuning/deepseek/hf_deepseek_r1_671b_seq8k_gpu_qlora.yaml and override additional parameters as needed.
- Submit the job by running the launcher script:
Monitor the job using Slurm commands such as squeue and scontrol show to view the status of the job and the corresponding logs. The logs can be found in the results folder in the launch directory. When the job is complete, the model adapters are stored in the EXP_DIR that you defined in the launch. The structure of the directory should look like this:
You can see that the trained adapter weights are saved as part of the checkpointing under ./checkpoints/peft_sharded/step_N. We will later use this adapter to merge with the base model.
Merge the trained model adapter
Follow these steps:
- Run a job using the smdistributed-modelparallel enroot image to merge the adapter with the base model.
- Download the merge_peft_checkpoint.py code from the sagemaker-hyperpod-training-adapter-for-nemo repository and store it in Amazon FSx. Modify the export variables in the following scripts accordingly to reflect the paths for SOURCE_DIR, ADAPTER_PATH, BASE_MODEL_BF16, and MERGE_MODEL_PATH.
Evaluate the fine-tuned model
Use the basic testing scripts provided by DeepSeek to deploy the merged model.
- Start by cloning their repo:
- You need to convert the merged model to a specific format for running inference. In this case, you need 4*P5 instances to deploy the model because the merged model is in BF16. Enter the following command to convert the model:
- When the conversion is complete, use the following sbatch script to run the batch inference, making the following adjustments:
  - Update the ckpt-path to the converted model path from the previous step.
  - Create a new prompts.txt file with each line containing a prompt. The job will use the prompts from this file and generate output.
Cleanup
To clean up your resources and avoid incurring additional charges, follow these steps:
- Delete any unused SageMaker Studio resources.
- (Optional) Delete the SageMaker Studio domain.
- Verify that your training job isn't running anymore. To do so, on your SageMaker console, choose Training and check Training jobs.
- If you created a HyperPod cluster, delete the cluster to stop incurring costs. If you created the networking stack from the HyperPod workshop, delete the stack as well to clean up the virtual private cloud (VPC) resources and the FSx for Lustre volume.
Conclusion
In this post, we demonstrated how to fine-tune large models such as DeepSeek-R1 671B using either SageMaker training jobs or SageMaker HyperPod with HyperPod recipes in a few steps. This approach minimizes the complexity of identifying optimal distributed training configurations and provides a simple way to properly size your workloads with the best price-performance architecture on AWS.
To start using SageMaker HyperPod recipes, visit our sagemaker-hyperpod-recipes GitHub repository for comprehensive documentation and example implementations. Our team continually expands the recipes based on customer feedback and emerging machine learning (ML) trends, making sure you have the necessary tools for successful AI model training.
About the Authors
Kanwaljit Khurmi is a Principal Worldwide Generative AI Solutions Architect at AWS. He collaborates with AWS product teams, engineering departments, and customers to provide guidance and technical assistance, helping them enhance the value of their hybrid machine learning solutions on AWS. Kanwaljit specializes in helping customers with containerized applications and high-performance computing solutions.
Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.
Anoop Saha is a Sr GTM Specialist at Amazon Web Services (AWS) focusing on generative AI model training and inference. He partners with top frontier model builders, strategic customers, and AWS service teams to enable distributed training and inference at scale on AWS and lead joint GTM motions. Before AWS, Anoop held several leadership roles at startups and large companies, primarily focusing on silicon and system architecture of AI infrastructure.
Rohith Nadimpally is a Software Development Engineer working on AWS SageMaker, where he accelerates large-scale AI/ML workflows. Before joining Amazon, he graduated with Honors from Purdue University with a degree in Computer Science. Outside of work, he enjoys playing tennis and watching movies.