Recent advances in low-bit quantization for LLMs, such as AQLM and AutoRound, now show acceptable levels of degradation on downstream tasks, especially for large models. That said, 2-bit quantization still introduces noticeable accuracy loss in most cases.
One promising algorithm for low-bit quantization is VPTQ (MIT license), proposed by Microsoft. Released in October 2024, it has since shown excellent performance and efficiency in quantizing large models.
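The core idea behind vector quantization is to group weights into short vectors and store, for each vector, only the index of its nearest centroid in a shared codebook. The sketch below is a minimal NumPy illustration of that lookup idea, not Microsoft's implementation; the vector length, codebook size, and random codebook are arbitrary assumptions for the example.

```python
import numpy as np

def quantize_vectors(W, codebook, vec_len=4):
    # Split the weight matrix into consecutive vectors of length vec_len,
    # then assign each vector to its nearest codebook centroid (squared L2).
    vecs = W.reshape(-1, vec_len)
    dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize_vectors(idx, codebook, shape):
    # Reconstruct an approximate weight matrix by codebook lookup.
    return codebook[idx].reshape(shape)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)
# 256 centroids of length 4: each 4-weight vector is stored as one
# 8-bit index, i.e. 2 bits per weight (ignoring codebook overhead).
codebook = rng.normal(size=(256, 4)).astype(np.float32)

idx = quantize_vectors(W, codebook)
W_hat = dequantize_vectors(idx, codebook, W.shape)
```

In a real quantizer the codebook would be learned (e.g. by k-means on the weight vectors) rather than random, which is what drives the reconstruction error down.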
In this article, we will:
- Review the VPTQ quantization algorithm.
- Demonstrate how to use VPTQ models, many of which are already available. For instance, we can easily find low-bit variants of Llama 3.3 70B, Llama 3.1 405B, and Qwen2.5 72B.
- Evaluate these models and discuss the results to understand when VPTQ models can be a good choice for LLMs in production.
Remarkably, 2-bit quantization with VPTQ almost matches the performance of the original 16-bit model on tasks such as MMLU. Moreover, it makes it possible to run Llama 3.1 405B on a single GPU, while using less memory than a 16-bit 70B model!
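The memory claim can be checked with back-of-the-envelope arithmetic on the weights alone (this ignores codebook, activation, and KV-cache overhead, so real footprints will be somewhat higher):

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    # Approximate weight storage: parameters * bits, converted to GiB.
    return n_params * bits_per_weight / 8 / 2**30

# Llama 3.1 405B at 2 bits vs. a 70B model at 16 bits.
gib_405b_2bit = weight_gib(405e9, 2)    # roughly 94 GiB
gib_70b_16bit = weight_gib(70e9, 16)    # roughly 130 GiB
```

So the 2-bit 405B weights fit in noticeably less memory than the 16-bit weights of a 70B model, consistent with the claim above.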