Use-case based deployments on SageMaker JumpStart

Amazon SageMaker JumpStart provides pretrained models for a wide range of problem types to help you get started with AI workloads. SageMaker JumpStart offers access to solutions for top use cases that can be deployed to SageMaker AI Managed Inference endpoints or SageMaker HyperPod clusters. Through pre-set deployment options, customers can quickly move from model selection to model deployment.

Model deployments through SageMaker JumpStart are fast and straightforward. Customers can select options based on expected concurrent users, with visibility into P50 latency, time to first token (TTFT), and throughput (tokens/second/user). While concurrent user configuration options are helpful for general-purpose scenarios, they aren’t task-aware, and we recognize that customers use SageMaker JumpStart for diverse, specific use cases like content generation, content summarization, or Q&A. Each use case might require specific configurations to improve performance. Moreover, the definition of performance isn’t limited to latency: some customers might measure performance in throughput or lowest cost per token.
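To make these metrics concrete, the following is a minimal Python sketch of how they could be computed from per-request measurements. The `RequestSample` type, the `summarize` function, and all numbers are hypothetical illustrations and are not part of any SageMaker API. The aggregate throughput assumes that every concurrent user sustains the median per-user rate, which is a simplification.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestSample:
    """One hypothetical request measurement (not a SageMaker data type)."""
    latency_s: float    # end-to-end time for the full response, in seconds
    ttft_s: float       # time until the first output token arrived, in seconds
    output_tokens: int  # number of tokens generated for this request


def summarize(samples, concurrent_users, instance_cost_per_hour):
    """Return illustrative metrics for a list of RequestSample objects."""
    p50_latency_s = median(s.latency_s for s in samples)
    p50_ttft_s = median(s.ttft_s for s in samples)
    # Throughput seen by one user: output tokens per second of request time.
    tokens_per_second_per_user = median(
        s.output_tokens / s.latency_s for s in samples
    )
    # Simplifying assumption: all users sustain the median per-user rate.
    aggregate_tokens_per_second = tokens_per_second_per_user * concurrent_users
    tokens_per_hour = aggregate_tokens_per_second * 3600
    cost_per_million_tokens = instance_cost_per_hour / tokens_per_hour * 1_000_000
    return {
        "p50_latency_s": p50_latency_s,
        "p50_ttft_s": p50_ttft_s,
        "tokens_per_second_per_user": tokens_per_second_per_user,
        "cost_per_million_tokens": cost_per_million_tokens,
    }


# Made-up measurements and a made-up hourly instance price.
samples = [
    RequestSample(latency_s=2.0, ttft_s=0.2, output_tokens=100),
    RequestSample(latency_s=4.0, ttft_s=0.4, output_tokens=200),
    RequestSample(latency_s=3.0, ttft_s=0.3, output_tokens=150),
]
metrics = summarize(samples, concurrent_users=4, instance_cost_per_hour=7.2)
print(metrics)
```

The same measurements can favor different deployments depending on which of these numbers matters most to the application, which is why a single latency figure is not always enough.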

Building on this foundation, we’re excited to announce the launch of SageMaker JumpStart optimized deployments. SageMaker JumpStart optimized deployments address the need for rich and straightforward deployment customization on SageMaker JumpStart by offering pre-defined deployment configurations designed for specific use cases. Customers keep the same level of visibility into the details of their proposed deployments, but now deployments are optimized for their specific use case and performance constraint.

Prerequisites

To begin using SageMaker JumpStart optimized deployments, customers require at minimum the following:

After these prerequisites are in place, customers can begin using SageMaker JumpStart optimized deployments immediately.

Getting started

To get started using SageMaker JumpStart optimized deployments, open SageMaker Studio and choose Models. Select any of the models that support optimized deployments (listed later in this post) and choose Deploy in the top-right corner. The resulting screen now includes a collapsible window labeled “Performance”, which contains the selection options for optimized deployments.

The displayed options require users to first select a use case. For text-based models, these use cases can range from generative writing to chat-style interactions; image and video will feature different use cases after support is added for these input types. After selecting a use case, customers must select one of three constraint optimizations: Cost optimized, Throughput optimized, or Latency optimized. There is also a Balanced option for customers seeking the best average performance across all logged metrics.
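One way to reason about these constraint choices is as a selection among candidate configurations, each with its own measured latency, throughput, and cost. The following Python sketch illustrates that idea only. This post does not describe how SageMaker JumpStart actually derives its pre-set configurations, so the `CandidateConfig` type, the `pick_config` function, the rank-sum rule used for the balanced case, and all numbers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateConfig:
    """A hypothetical deployment candidate with made-up benchmark numbers."""
    name: str
    p50_latency_s: float
    tokens_per_second: float
    cost_per_million_tokens: float


def _ranks(candidates, key, reverse=False):
    """Map each candidate name to its position (0 = best) for one metric."""
    ordered = sorted(candidates, key=key, reverse=reverse)
    return {c.name: i for i, c in enumerate(ordered)}


def pick_config(candidates, target):
    """Pick a candidate for 'cost', 'throughput', 'latency', or 'balanced'."""
    if target == "cost":
        return min(candidates, key=lambda c: c.cost_per_million_tokens)
    if target == "throughput":
        return max(candidates, key=lambda c: c.tokens_per_second)
    if target == "latency":
        return min(candidates, key=lambda c: c.p50_latency_s)
    if target == "balanced":
        # Assumed rule: the lowest sum of per-metric ranks wins.
        lat = _ranks(candidates, key=lambda c: c.p50_latency_s)
        tput = _ranks(candidates, key=lambda c: c.tokens_per_second, reverse=True)
        cost = _ranks(candidates, key=lambda c: c.cost_per_million_tokens)
        return min(candidates, key=lambda c: lat[c.name] + tput[c.name] + cost[c.name])
    raise ValueError(f"unknown optimization target: {target!r}")


# Made-up candidates for a single use case.
candidates = [
    CandidateConfig("low-cost", 1.8, 40.0, 4.0),
    CandidateConfig("low-latency", 0.6, 90.0, 9.0),
    CandidateConfig("high-throughput", 1.5, 150.0, 6.0),
    CandidateConfig("middle", 0.9, 100.0, 5.0),
]

for target in ("cost", "throughput", "latency", "balanced"):
    print(target, "->", pick_config(candidates, target).name)
```

In this toy data set, each single-metric target picks a different candidate, while the balanced target picks the one that is never the best on any metric but is second-best on all three.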

After a selection is made, a pre-set deployment configuration is defined for the endpoint. Customers can further review and select additional configuration values like timeouts, endpoint naming, and security settings. After configuration is complete, customers choose the Deploy option in the bottom-right corner.

Available models

SageMaker JumpStart optimized deployments are available for the following models:

Meta
- Llama-3.1-8B-Instruct
- Llama-2-7b-hf
- Llama-3.2-3B
- Meta-Llama-3-8B
- Llama-3.2-1B-Instruct
- Llama-3.2-1B
- Llama-3.1-70B-Instruct
- Llama-3.2-3B-Instruct
Microsoft
Mistral AI
- Mistral-7B-Instruct-v0.2
- Mistral-Small-24B-Instruct-2501
- Mistral-7B-v0.1
- Mistral-7B-Instruct-v0.3
- Mixtral-8x7B-Instruct-v0.1
Qwen
- Qwen3-8B
- Qwen3-32B
- Qwen3-0.6B
- Qwen2.5-7B-Instruct
- Qwen2.5-72B-Instruct
- Qwen2-VL-7B-Instruct
- Qwen2-1.5B-Instruct
- Qwen2-7B
Google
- gemma-7b
- gemma-7b-it
- gemma-2b
Tiiuae

These are the launch models for optimized deployments, and we’re actively expanding support to include more models.

Call to action

Customers can start working with SageMaker JumpStart optimized deployments immediately. Select one of the available optimized deployment models in the SageMaker Studio model hub. Experiment with the different deployment options to determine the right configuration for your application.