We’re excited to announce that Amazon Bedrock Customized Mannequin Import now helps Qwen fashions. Now you can import customized weights for Qwen2, Qwen2_VL, and Qwen2_5_VL architectures, together with fashions like Qwen 2, 2.5 Coder, Qwen 2.5 VL, and QwQ 32B. You may deliver your individual custom-made Qwen fashions into Amazon Bedrock and deploy them in a completely managed, serverless setting—with out having to handle infrastructure or mannequin serving.
On this put up, we cowl deploy Qwen 2.5 fashions with Amazon Bedrock Customized Mannequin Import, making them accessible to organizations trying to make use of state-of-the-art AI capabilities throughout the AWS infrastructure at an efficient price.
Overview of Qwen fashions
Qwen 2 and a couple of.5 are households of huge language fashions, out there in a variety of sizes and specialised variants to swimsuit numerous wants:
- Common language fashions: Fashions starting from 0.5B to 72B parameters, with each base and instruct variations for general-purpose duties
- Qwen 2.5-Coder: Specialised for code technology and completion
- Qwen 2.5-Math: Targeted on superior mathematical reasoning
- Qwen 2.5-VL (vision-language): Picture and video processing capabilities, enabling multimodal purposes
Overview of Amazon Bedrock Customized Mannequin Import
Amazon Bedrock Customized Mannequin Import permits the import and use of your custom-made fashions alongside present basis fashions (FMs) by means of a single serverless, unified API. You may entry your imported customized fashions on-demand and with out the necessity to handle the underlying infrastructure. Speed up your generative AI utility growth by integrating your supported customized fashions with native Amazon Bedrock instruments and options like Amazon Bedrock Data Bases, Amazon Bedrock Guardrails, and Amazon Bedrock Brokers. Amazon Bedrock Customized Mannequin Import is usually out there within the US-East (N. Virginia), US-West (Oregon), and Europe (Frankfurt) AWS Areas. Now, we’ll discover how you need to use Qwen 2.5 fashions for 2 frequent use instances: as a coding assistant and for picture understanding. Qwen2.5-Coder is a state-of-the-art code mannequin, matching capabilities of proprietary fashions like GPT-4o. It helps over 90 programming languages and excels at code technology, debugging, and reasoning. Qwen 2.5-VL brings superior multimodal capabilities. In response to Qwen, Qwen 2.5-VL shouldn’t be solely proficient at recognizing objects akin to flowers and animals, but additionally at analyzing charts, extracting textual content from photos, decoding doc layouts, and processing lengthy movies.
Stipulations
Earlier than importing the Qwen mannequin with Amazon Bedrock Customized Mannequin Import, just remember to have the next in place:
- An lively AWS account
- An Amazon Easy Storage Service (Amazon S3) bucket to retailer the Qwen mannequin information
- Enough permissions to create Amazon Bedrock mannequin import jobs
- Verified that your Area helps Amazon Bedrock Customized Mannequin Import
Use case 1: Qwen coding assistant
On this instance, we’ll show construct a coding assistant utilizing the Qwen2.5-Coder-7B-Instruct mannequin
- Go to to Hugging Face and seek for and replica the Mannequin ID Qwen/Qwen2.5-Coder-7B-Instruct:
You’ll use Qwen/Qwen2.5-Coder-7B-Instruct
for the remainder of the walkthrough. We don’t show fine-tuning steps, however you too can fine-tune earlier than importing.
- Use the next command to obtain a snapshot of the mannequin domestically. The Python library for Hugging Face gives a utility known as snapshot obtain for this:
Relying in your mannequin dimension, this might take a couple of minutes. When accomplished, your Qwen Coder 7B mannequin folder will include the next information.
- Configuration information: Together with
config.json
,generation_config.json
,tokenizer_config.json
,tokenizer.json
, andvocab.json
- Mannequin information: 4
safetensor
information andmannequin.safetensors.index.json
- Documentation:
LICENSE
,README.md
, andmerges.txt
- Add the mannequin to Amazon S3, utilizing
boto3
or the command line:
aws s3 cp ./extractedfolder s3://yourbucket/path/ --recursive
- Begin the import mannequin job utilizing the next API name:
It’s also possible to do that utilizing the AWS Administration Console for Amazon Bedrock.
- Within the Amazon Bedrock console, select Imported fashions within the navigation pane.
- Select Import a mannequin.
- Enter the small print, together with a Mannequin identify, Import job identify, and mannequin S3 location.
- Create a brand new service function or use an present service function. Then select Import mannequin
- After you select Import on the console, you need to see standing as importing when mannequin is being imported:
When you’re utilizing your individual function, ensure you add the next belief relationship as describes in Create a service function for mannequin import.
After your mannequin is imported, look ahead to mannequin inference to be prepared, after which chat with the mannequin on the playground or by means of the API. Within the following instance, we append Python
to immediate the mannequin to immediately output Python code to record objects in an S3 bucket. Keep in mind to make use of the fitting chat template to enter prompts within the format required. For instance, you will get the fitting chat template for any appropriate mannequin on Hugging Face utilizing under code:
Notice that when utilizing the invoke_model
APIs, you need to use the total Amazon Useful resource Title (ARN) for the imported mannequin. You will discover the Mannequin ARN within the Bedrock console, by navigating to the Imported fashions part after which viewing the Mannequin particulars web page, as proven within the following determine
After the mannequin is prepared for inference, you need to use Chat Playground in Bedrock console or APIs to invoke the mannequin.
Use case 2: Qwen 2.5 VL picture understanding
Qwen2.5-VL-* presents multimodal capabilities, combining imaginative and prescient and language understanding in a single mannequin. This part demonstrates deploy Qwen2.5-VL utilizing Amazon Bedrock Customized Mannequin Import and take a look at its picture understanding capabilities.
Import Qwen2.5-VL-7B to Amazon Bedrock
Obtain the mannequin from Huggingface Face and add it to Amazon S3:
Subsequent, import the mannequin to Amazon Bedrock (both by way of Console or API):
Check the imaginative and prescient capabilities
After the import is full, take a look at the mannequin with a picture enter. The Qwen2.5-VL-* mannequin requires correct formatting of multimodal inputs:
When supplied with an instance picture of a cat (such the next picture), the mannequin precisely describes key options such because the cat’s place, fur shade, eye shade, and normal look. This demonstrates Qwen2.5-VL-* mannequin’s potential to course of visible info and generate related textual content descriptions.
The mannequin’s response:
Pricing
You should utilize Amazon Bedrock Customized Mannequin Import to make use of your customized mannequin weights inside Amazon Bedrock for supported architectures, serving them alongside Amazon Bedrock hosted FMs in a completely managed approach by means of On-Demand mode. Customized Mannequin Import doesn’t cost for mannequin import. You might be charged for inference based mostly on two components: the variety of lively mannequin copies and their length of exercise. Billing happens in 5-minute increments, ranging from the primary profitable invocation of every mannequin copy. The pricing per mannequin copy per minute varies based mostly on components together with structure, context size, Area, and compute unit model, and is tiered by mannequin copy dimension. The customized mannequin unites required for internet hosting depends upon the mannequin’s structure, parameter depend, and context size. Amazon Bedrock mechanically manages scaling based mostly in your utilization patterns. If there aren’t any invocations for five minutes, it scales to zero and scales up when wanted, although this may contain cold-start latency of as much as a minute. Further copies are added if inference quantity persistently exceeds single-copy concurrency limits. The utmost throughput and concurrency per copy is set throughout import, based mostly on components akin to enter/output token combine, {hardware} kind, mannequin dimension, structure, and inference optimizations.
For extra info, see Amazon Bedrock pricing.
Clear up
To keep away from ongoing prices after finishing the experiments:
- Delete your imported Qwen fashions from Amazon Bedrock Customized Mannequin Import utilizing the console or the API.
- Optionally, delete the mannequin information out of your S3 bucket should you now not want them.
Do not forget that whereas Amazon Bedrock Customized Mannequin Import doesn’t cost for the import course of itself, you might be billed for mannequin inference utilization and storage.
Conclusion
Amazon Bedrock Customized Mannequin Import empowers organizations to make use of highly effective publicly out there fashions like Qwen 2.5, amongst others, whereas benefiting from enterprise-grade infrastructure. The serverless nature of Amazon Bedrock eliminates the complexity of managing mannequin deployments and operations, permitting groups to deal with constructing purposes slightly than infrastructure. With options like auto scaling, pay-per-use pricing, and seamless integration with AWS providers, Amazon Bedrock gives a production-ready setting for AI workloads. The mix of Qwen 2.5’s superior AI capabilities and Amazon Bedrock managed infrastructure presents an optimum steadiness of efficiency, price, and operational effectivity. Organizations can begin with smaller fashions and scale up as wanted, whereas sustaining full management over their mannequin deployments and benefiting from AWS safety and compliance capabilities.
For extra info, consult with the Amazon Bedrock Person Information.
Concerning the Authors
Ajit Mahareddy is an skilled Product and Go-To-Market (GTM) chief with over 20 years of expertise in Product Administration, Engineering, and Go-To-Market. Previous to his present function, Ajit led product administration constructing AI/ML merchandise at main expertise corporations, together with Uber, Turing, and eHealth. He’s obsessed with advancing Generative AI applied sciences and driving real-world influence with Generative AI.
Shreyas Subramanian is a Principal Knowledge Scientist and helps clients by utilizing generative AI and deep studying to resolve their enterprise challenges utilizing AWS providers. Shreyas has a background in large-scale optimization and ML and in the usage of ML and reinforcement studying for accelerating optimization duties.
Yanyan Zhang is a Senior Generative AI Knowledge Scientist at Amazon Net Companies, the place she has been engaged on cutting-edge AI/ML applied sciences as a Generative AI Specialist, serving to clients use generative AI to attain their desired outcomes. Yanyan graduated from Texas A&M College with a PhD in Electrical Engineering. Exterior of labor, she loves touring, figuring out, and exploring new issues.
Dharinee Gupta is an Engineering Supervisor at AWS Bedrock, the place she focuses on enabling clients to seamlessly make the most of open supply fashions by means of serverless options. Her staff focuses on optimizing these fashions to ship one of the best cost-performance steadiness for purchasers. Previous to her present function, she gained in depth expertise in authentication and authorization programs at Amazon, creating safe entry options for Amazon choices. Dharinee is obsessed with making superior AI applied sciences accessible and environment friendly for AWS clients.
Lokeshwaran Ravi is a Senior Deep Studying Compiler Engineer at AWS, specializing in ML optimization, mannequin acceleration, and AI safety. He focuses on enhancing effectivity, lowering prices, and constructing safe ecosystems to democratize AI applied sciences, making cutting-edge ML accessible and impactful throughout industries.
June Gained is a Principal Product Supervisor with Amazon SageMaker JumpStart. He focuses on making basis fashions simply discoverable and usable to assist clients construct generative AI purposes. His expertise at Amazon additionally consists of cell procuring purposes and final mile supply.