2-bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy


Highly accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU

Benjamin Marie

Towards Data Science

(Image generated with ChatGPT)

Recent advances in low-bit quantization for LLMs, like AQLM and AutoRound, now show acceptable levels of degradation on downstream tasks, especially for large models. That said, 2-bit quantization still often introduces a noticeable accuracy loss.

One promising algorithm for low-bit quantization is VPTQ (MIT license), proposed by Microsoft. It was released in October 2024 and has since shown excellent performance and efficiency in quantizing large models.

In this article, we will:

  1. Review the VPTQ quantization algorithm (a toy sketch of the underlying vector quantization follows this list).
  2. Demonstrate how to use VPTQ models, many of which are already available. For instance, we can easily find low-bit variants of Llama 3.3 70B, Llama 3.1 405B, and Qwen2.5 72B.
  3. Evaluate these models and discuss the results to understand when VPTQ models can be a good choice for LLMs in production.
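
To make point 1 concrete, here is a toy NumPy sketch of the vector quantization at the heart of VPTQ: the weight matrix is split into short vectors, each vector is replaced by the index of its nearest centroid in a learned codebook, and decoding is a simple table lookup. The sizes below and the plain k-means fitting are assumptions for illustration only; the real algorithm uses second-order information and much larger codebooks. The sketch shows why storage shrinks: each vector of v weights costs only log2(k) index bits.

```python
import numpy as np

# Toy vector quantization of a weight matrix (illustrative only).
# With vectors of length v=8, a codebook of k=65536 centroids would give
# 16 bits / 8 weights = 2 bits per weight; we use a tiny codebook here
# so the example runs instantly.
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 256)).astype(np.float32)

v, k = 8, 16                        # vector length, number of centroids
vectors = W.reshape(-1, v)          # split the weights into short vectors

# Plain k-means, a stand-in for VPTQ's Hessian-weighted optimization.
centroids = vectors[rng.choice(len(vectors), k, replace=False)].copy()
for _ in range(10):
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(1)               # index of the nearest centroid
    for c in range(k):
        if (idx == c).any():
            centroids[c] = vectors[idx == c].mean(0)

# Only `idx` (log2(k) bits per vector) and the small codebook are stored.
W_hat = centroids[idx].reshape(W.shape)   # decoding is a table lookup
print(f"{np.log2(k) / v:.1f} bits/weight, "
      f"reconstruction MSE: {((W - W_hat) ** 2).mean():.4f}")
```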

Remarkably, 2-bit quantization with VPTQ achieves performance nearly comparable to the original 16-bit model on tasks such as MMLU. Moreover, it makes it possible to run Llama 3.1 405B on a single GPU while using less memory than a 70B model: as a back-of-the-envelope check, 405B parameters at 2 bits each come to roughly 101 GB of weights, versus about 140 GB for a 70B model at 16 bits.
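
As a preview of point 2, the snippet below loads one of the community VPTQ checkpoints through the Transformers-style wrapper that the VPTQ project documents in its README. Treat the vptq.AutoModelForCausalLM entry point and the exact model ID as assumptions to verify against the current release of the package.

```python
# pip install vptq transformers
# Minimal sketch; the wrapper and checkpoint ID below follow the VPTQ
# README at the time of writing and should be verified before use.
import transformers
import vptq

model_id = "VPTQ-community/Meta-Llama-3.1-70B-Instruct-v8-k65536-0-woft"

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)
model = vptq.AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain vector quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```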
