This post is the second part of the GPT-OSS series focusing on model customization with Amazon SageMaker AI. In Part 1, we demonstrated fine-tuning GPT-OSS models using open source Hugging Face libraries with SageMaker training jobs, which support distributed multi-GPU and multi-node configurations, so you can spin up high-performance clusters on demand.
In this post, we show how to fine-tune GPT-OSS models using recipes on SageMaker HyperPod and training jobs. SageMaker HyperPod recipes help you get started with training and fine-tuning popular publicly available foundation models (FMs) such as Meta's Llama, Mistral, and DeepSeek in just minutes, using either SageMaker HyperPod or training jobs. The recipes provide pre-built, validated configurations that alleviate the complexity of setting up distributed training environments while maintaining enterprise-grade performance and scalability. We outline the steps to fine-tune the GPT-OSS model on a multilingual reasoning dataset, HuggingFaceH4/Multilingual-Thinking, so GPT-OSS can handle structured, chain-of-thought (CoT) reasoning across multiple languages.
Solution overview
This solution uses SageMaker HyperPod recipes to run a fine-tuning job on HyperPod with Amazon Elastic Kubernetes Service (Amazon EKS) orchestration or on training jobs. Recipes are processed through the SageMaker HyperPod recipe launcher, which serves as the orchestration layer responsible for launching a job on the corresponding architecture, such as SageMaker HyperPod (Slurm or Amazon EKS) or training jobs. To learn more, see SageMaker HyperPod recipes.
For details on fine-tuning the GPT-OSS model, see Fine-tune OpenAI GPT-OSS models on Amazon SageMaker AI using Hugging Face libraries.
In the following sections, we discuss the prerequisites for both options, and then move on to data preparation. The prepared data is saved to Amazon FSx for Lustre, which is used as the persistent file system for SageMaker HyperPod, or to Amazon Simple Storage Service (Amazon S3) for training jobs. We then use recipes to submit the fine-tuning job, and finally deploy the trained model to a SageMaker endpoint for testing and evaluating the model. The following diagram illustrates this architecture.
Prerequisites
To follow along, you must have the following prerequisites:
- A local development environment with AWS credentials configured for creating and accessing SageMaker resources, or a remote environment such as Amazon SageMaker Studio.
- For SageMaker HyperPod fine-tuning, complete the following:
- For fine-tuning the model using SageMaker training jobs, you must have one ml.p5.48xlarge instance (with 8 x NVIDIA H100 GPUs) for training job usage. If you don't have sufficient limits, request the following SageMaker quota on the Service Quotas console: P5 instances (ml.p5.48xlarge) for training jobs: 1.
It can take up to 24 hours for these limits to be approved. You can also use SageMaker training plans to reserve these instances for a specific timeframe and use case (cluster or training job usage). For more details, see Reserve training plans for your training jobs or HyperPod clusters.
Next, use your preferred development environment to prepare the dataset for fine-tuning. You can find the full code in the Generative AI using Amazon SageMaker repository on GitHub.
Data tokenization
We use the HuggingFaceH4/Multilingual-Thinking dataset, which is a multilingual reasoning dataset containing CoT examples translated into languages such as French, Spanish, and German. The recipe supports a sequence length of 4,000 tokens for the GPT-OSS 120B model. The following example code demonstrates how to tokenize the multilingual-thinking dataset. The recipe accepts data in Hugging Face (Arrow) format. After it's tokenized, you can save the processed dataset to disk.
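The following is a minimal sketch of that preparation step using the datasets and transformers libraries; the tokenizer ID, column name, and output path are assumptions, so adapt them to your setup.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Load the multilingual chain-of-thought dataset from the Hugging Face Hub
dataset = load_dataset("HuggingFaceH4/Multilingual-Thinking", split="train")

# Tokenizer for the base model being fine-tuned (gpt-oss-120b assumed here)
tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-120b")

def tokenize(example):
    # Render the chat messages with the model's chat template, then tokenize
    text = tokenizer.apply_chat_template(example["messages"], tokenize=False)
    return tokenizer(text, truncation=True, max_length=4000)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

# Persist the processed dataset in Arrow format for the recipe to consume
tokenized.save_to_disk("/fsx/multilingual_4k")  # illustrative path
```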
Now that you’ve got ready and tokenized the dataset, you may fine-tune the GPT-OSS mannequin in your dataset, utilizing both SageMaker HyperPod or coaching jobs. SageMaker coaching jobs are perfect for one-off or periodic coaching workloads that want non permanent compute sources, making it a completely managed, on-demand expertise to your coaching wants. SageMaker HyperPod is perfect for steady growth and experimentation, offering a persistent, preconfigured, and failure-resilient cluster. Relying in your selection, skip to the suitable part for subsequent steps.
Fine-tune the model using SageMaker HyperPod
To fine-tune the model using HyperPod, start by setting up the virtual environment and installing the required dependencies to execute the training job on the EKS cluster. Make sure the cluster is InService before proceeding, and that you're using Python 3.9 or greater in your development environment.
Next, download and set up the SageMaker HyperPod recipes repository:
You can now use the SageMaker HyperPod recipe launch scripts to submit your training job. Using the recipe involves updating the k8s.yaml configuration file and executing the launch script.
In recipes_collection/cluster/k8s.yaml, update the persistent_volume_claims section. It mounts the FSx claim to the /fsx directory of each compute pod:
SageMaker HyperPod recipes provide a launch script for each recipe within the launcher_scripts directory. To fine-tune the GPT-OSS-120B model, update the launch script located at launcher_scripts/gpt_oss/run_hf_gpt_oss_120b_seq4k_gpu_lora.sh and set the cluster_type parameter.
The updated launch script should look similar to the following code when running SageMaker HyperPod with Amazon EKS. Make sure that cluster=k8s and cluster_type=k8s are set in the launch script:
When the script is ready, you can launch fine-tuning of the GPT-OSS 120B model using the following code:
After submitting a job for fine-tuning, you can use the following command to verify successful submission. You should be able to see the pods running in your cluster:
To check the logs for the job, you can use the kubectl logs command:
kubectl logs -f hf-gpt-oss-120b-lora-h2cwd-worker-0
You should see the following logs when the training starts and completes. You will find the checkpoints written to the /fsx/experiment/checkpoints folder.
When the training is complete, the final merged model can be found in the experiment directory path you defined in the launcher script, under /fsx/experiment/checkpoints/peft_full/steps_50/final-model.
Fine-tune using SageMaker training jobs
You can also use recipes directly with SageMaker training jobs using the SageMaker Python SDK. Training jobs automatically spin up the compute, load the input data, run the training script, save the model to your output location, and tear down the instances, for a smooth training experience.
The following code snippet shows how to use recipes with the PyTorch estimator. You can use the training_recipe parameter to specify the training or fine-tuning recipe to be used, and recipe_overrides for any parameters that need replacement. For training jobs, update the input, output, and results directories to locations in /opt/ml as required by SageMaker training jobs.
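A minimal sketch of this setup is shown below; the recipe identifier, override paths, bucket, and instance type are assumptions based on the launcher script referenced earlier, so adjust them to match your environment and the recipes repository.

```python
import sagemaker
from sagemaker.pytorch import PyTorch

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Point the recipe's input/output directories at /opt/ml, as training jobs expect
recipe_overrides = {
    "run": {"results_dir": "/opt/ml/model"},
    "model": {"data": {"train_dir": "/opt/ml/input/data/train"}},
}

estimator = PyTorch(
    base_job_name="gpt-oss-recipe",
    role=role,
    instance_type="ml.p5.48xlarge",
    instance_count=1,
    sagemaker_session=sess,
    # Recipe name is assumed; see the recipes repo for the exact identifier
    training_recipe="fine-tuning/gpt_oss/hf_gpt_oss_120b_seq4k_gpu_lora",
    recipe_overrides=recipe_overrides,
    # image_uri="<training container published for this recipe>",  # optionally pin the image
)

# The train channel points at the tokenized dataset uploaded to Amazon S3
estimator.fit(inputs={"train": "s3://your-bucket/multilingual_4k/"})
```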
After the job is submitted, you can monitor the status of your training job on the SageMaker console by choosing Training jobs under Training in the navigation pane. Choose the training job that starts with gpt-oss-recipe to view its details and logs. When the training job is complete, the outputs are saved to an S3 location. You can get the location of the output artifacts from the S3 model artifact section on the job details page.
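If you prefer to retrieve the artifact location programmatically, a minimal sketch follows, assuming the estimator object from the previous step (the job name passed to boto3 is a placeholder):

```python
# S3 URI of the trained model artifacts, available once fit() completes
print(estimator.model_data)

# Alternatively, look it up by training job name with boto3
import boto3

sm = boto3.client("sagemaker")
desc = sm.describe_training_job(TrainingJobName="gpt-oss-recipe-<timestamp>")  # your job name
print(desc["ModelArtifacts"]["S3ModelArtifacts"])
```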
Run inference
After you fine-tune your GPT-OSS model with SageMaker recipes on either SageMaker training jobs or SageMaker HyperPod, the output is a customized model artifact that merges the base model with the customized PEFT adapters. This final model is stored in Amazon S3 and can be deployed directly from Amazon S3 to SageMaker endpoints for real-time inference.
To serve GPT-OSS models, you must use the latest vLLM containers (v0.10.1 or later). A full list of vllm-openai Docker image versions is available on Docker Hub.
The steps to deploy your fine-tuned GPT-OSS model are outlined in this section.
Build the latest GPT-OSS container for your SageMaker endpoint
If you're deploying the model from SageMaker Studio using JupyterLab or the Code Editor, both environments come with Docker preinstalled. Make sure you're using the SageMaker Distribution image v3.0 or later for compatibility. You can build your deployment container by running the following commands:
If you're running these commands from a local terminal or another environment, simply omit the %%bash line and run the commands as standard shell commands.
The build.sh script is responsible for automatically building and pushing a vllm-openai container that's optimized for SageMaker endpoints. After it's built, the custom SageMaker endpoint-compatible vllm image is pushed to Amazon Elastic Container Registry (Amazon ECR). SageMaker endpoints can then pull this image from Amazon ECR at runtime to spin up the container for inference.
The following is an example of the build.sh script:
The Dockerfile defines how we convert an open source vLLM Docker image into a SageMaker hosting-compatible image. This involves extending the base vllm-openai image, adding the serve entrypoint script, and making it executable. See the following example Dockerfile:
The serve script acts as a translation layer between SageMaker hosting conventions and the vLLM runtime. You can keep the same deployment workflow you're accustomed to when hosting models on SageMaker endpoints, while automatically converting SageMaker-specific configurations into the format expected by vLLM.
Key points to note about this script:
- It enforces the use of port 8080, which SageMaker requires for inference containers
- It dynamically translates environment variables prefixed with OPTION_ into CLI arguments for vLLM (for example, OPTION_MAX_MODEL_LEN=4096 becomes --max-model-len 4096), as illustrated in the sketch after this list
- It prints the final set of arguments for visibility
- It finally launches the vLLM API server with the translated arguments
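The actual serve script is a shell script; the following Python rendering of the same translation logic is for illustration only, assuming nothing beyond the OPTION_ prefix convention described above.

```python
import os

def option_env_to_cli_args(environ=os.environ):
    """Translate OPTION_* environment variables into vLLM CLI flags.

    For example, OPTION_MAX_MODEL_LEN=4096 becomes ["--max-model-len", "4096"].
    """
    args = []
    for key, value in environ.items():
        if key.startswith("OPTION_"):
            flag = "--" + key[len("OPTION_"):].lower().replace("_", "-")
            args.extend([flag, value])
    return args

if __name__ == "__main__":
    # Simulate the container environment and print the resulting arguments
    example_env = {"OPTION_MODEL": "/opt/ml/model", "OPTION_MAX_MODEL_LEN": "4096"}
    print(option_env_to_cli_args(example_env))
    # -> ['--model', '/opt/ml/model', '--max-model-len', '4096']
```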
The following is an example serve script:
Host customized GPT-OSS as a SageMaker real-time endpoint
Now you can deploy your fine-tuned GPT-OSS model using the ECR image URI you built in the previous step. In this example, the model artifacts are stored securely in an S3 bucket, and SageMaker downloads them into the container at runtime. Complete the following configurations:
- Set model_data to point to the S3 prefix where your model artifacts are located
- Set the OPTION_MODEL environment variable to /opt/ml/model, which is where SageMaker mounts the model inside the container
- (Optional) If you're serving a model from the Hugging Face Hub instead of Amazon S3, you can set OPTION_MODEL directly to the Hugging Face model ID instead
The endpoint startup can take several minutes as the model artifacts are downloaded and the container is initialized. The following is an example of the deployment code:
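A minimal sketch of such a deployment follows; the ECR image URI, S3 prefix, endpoint name, and instance type are placeholders, so replace them with values from your build and training steps.

```python
import sagemaker
from sagemaker.model import Model

sess = sagemaker.Session()
role = sagemaker.get_execution_role()

# Placeholders: your ECR image URI and the S3 prefix holding the fine-tuned artifacts
image_uri = "<account-id>.dkr.ecr.<region>.amazonaws.com/vllm-sagemaker:latest"
model_data = {
    "S3DataSource": {
        "S3Uri": "s3://your-bucket/gpt-oss-recipe/final-model/",
        "S3DataType": "S3Prefix",
        "CompressionType": "None",
    }
}

model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    sagemaker_session=sess,
    env={
        "OPTION_MODEL": "/opt/ml/model",   # where SageMaker mounts the artifacts
        "OPTION_MAX_MODEL_LEN": "4096",    # translated to --max-model-len by the serve script
    },
)

model.deploy(
    initial_instance_count=1,
    instance_type="ml.p4d.24xlarge",  # assumption; choose a GPU instance that fits the model
    endpoint_name="gpt-oss-custom-vllm",
)
```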
Sample inference
After your endpoint is deployed and in the InService state, you can invoke your fine-tuned GPT-OSS model using the SageMaker Python SDK.
The following is an example predictor setup:
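A minimal sketch, assuming the endpoint name used at deployment; the serializer and deserializer handle the JSON request and response bodies:

```python
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

# Attach to the deployed endpoint (name is a placeholder from the deployment step)
predictor = Predictor(
    endpoint_name="gpt-oss-custom-vllm",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)
```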
The modified vLLM container is fully compatible with the OpenAI-style messages input format, making it straightforward to send chat-style requests:
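A sketch of such a request, with an illustrative prompt and sampling parameters; the response follows the OpenAI chat completions format returned by the vLLM server:

```python
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful multilingual assistant."},
        {"role": "user", "content": "Résous ce problème étape par étape : 12 x 9 = ?"},
    ],
    "temperature": 0.2,
    "max_tokens": 512,
}

response = predictor.predict(payload)
# The assistant message is returned in the first choice
print(response["choices"][0]["message"]["content"])
```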
You have successfully deployed and invoked your custom fine-tuned GPT-OSS model on SageMaker real-time endpoints, using the vLLM framework for optimized, low-latency inference. You can find more GPT-OSS hosting examples in the OpenAI gpt-oss examples GitHub repo.
Clean up
To avoid incurring additional charges, complete the following steps to clean up the resources used in this post:
- Delete the SageMaker endpoint:
pretrained_predictor.delete_endpoint()
- If you created a SageMaker HyperPod cluster for the purposes of this post, delete the cluster by following the instructions in Deleting a SageMaker HyperPod cluster.
- Clean up the FSx for Lustre volume if it's no longer needed by following the instructions in Deleting a file system.
- If you used training jobs, the training instances are automatically deleted when the jobs are complete.
Conclusion
In this post, we showed how to fine-tune OpenAI's GPT-OSS models (gpt-oss-120b and gpt-oss-20b) on SageMaker AI using SageMaker HyperPod recipes. We discussed how SageMaker HyperPod recipes provide a powerful yet accessible solution for organizations to scale their AI model training capabilities with large language models (LLMs), including GPT-OSS, using either a persistent cluster through SageMaker HyperPod or an ephemeral cluster through SageMaker training jobs. The architecture streamlines complex distributed training workflows through its intuitive recipe-based approach, reducing setup time from weeks to minutes. We also showed how these fine-tuned models can be seamlessly deployed to production using SageMaker endpoints with vLLM optimization, providing enterprise-grade inference capabilities with OpenAI-compatible APIs. This end-to-end workflow, from training to deployment, helps organizations build and serve custom LLM solutions while using the scalable infrastructure of AWS and the comprehensive ML platform capabilities of SageMaker.
To get started with SageMaker HyperPod recipes, visit the Amazon SageMaker HyperPod recipes GitHub repo for comprehensive documentation and example implementations. If you're interested in exploring fine-tuning further, the Generative AI using Amazon SageMaker GitHub repo has the necessary code and notebooks. Our team continues to expand the recipe ecosystem based on customer feedback and emerging ML trends, so you have the tools needed for successful AI model training.
Special thanks to everyone who contributed to the launch: Hengzhi Pei, Zach Kimberg, Andrew Tian, Leonard Lausen, Sanjay Dorairaj, Manish Agarwal, Sareeta Panda, Chang Ning Tsai, Maxwell Nuyens, Natasha Sivananjaiah, and Kanwaljit Khurmi.
About the authors
Durga Sury is a Senior Solutions Architect at Amazon SageMaker, where she helps enterprise customers build secure and scalable AI/ML systems. When she's not architecting solutions, you can find her enjoying sunny walks with her dog, immersing herself in murder mystery books, or catching up on her favorite Netflix shows.
Pranav Murthy is a Senior Generative AI Data Scientist at AWS, specializing in helping organizations innovate with generative AI, deep learning, and machine learning on Amazon SageMaker AI. Over the past 10+ years, he has developed and scaled advanced computer vision (CV) and natural language processing (NLP) models to tackle high-impact problems, from optimizing global supply chains to enabling real-time video analytics and multilingual search. When he's not building AI solutions, Pranav enjoys playing strategic games like chess, traveling to discover new cultures, and mentoring aspiring AI practitioners. You can find Pranav on LinkedIn.
Sumedha Swamy is a Senior Manager of Product Management at Amazon Web Services (AWS), where he leads several areas of Amazon SageMaker, including SageMaker Studio, the industry-leading integrated development environment for machine learning, developer and administrator experiences, AI infrastructure, and the SageMaker SDK.
Dmitry Soldatkin is a Senior AI/ML Solutions Architect at Amazon Web Services (AWS), helping customers design and build AI/ML solutions. Dmitry's work covers a wide range of ML use cases, with a primary interest in generative AI, deep learning, and scaling ML across the enterprise. He has helped companies in many industries, including insurance, financial services, utilities, and telecommunications. You can connect with Dmitry on LinkedIn.
Arun Kumar Lokanatha is a Senior ML Solutions Architect with the Amazon SageMaker team. He specializes in large language model training workloads, helping customers build LLM workloads using SageMaker HyperPod, SageMaker training jobs, and SageMaker distributed training. Outside of work, he enjoys running, hiking, and cooking.
Anirudh Viswanathan is a Senior Product Manager, Technical, at AWS with the SageMaker team, where he focuses on machine learning. He holds a Master's in Robotics from Carnegie Mellon University and an MBA from the Wharton School of Business. Anirudh is a named inventor on more than 50 AI/ML patents. He enjoys long-distance running, exploring art galleries, and attending Broadway shows.