Automationscribe.com

2-bit VPTQ: 6.5x Smaller LLMs While Preserving 95% Accuracy

January 31, 2025
in Artificial Intelligence


Highly accurate 2-bit quantization for running 70B LLMs on a 24 GB GPU

Benjamin Marie

Towards Data Science

Generated with ChatGPT

Recent advances in low-bit quantization for LLMs, such as AQLM and AutoRound, now show acceptable levels of degradation on downstream tasks, especially for large models. That said, 2-bit quantization still introduces a noticeable accuracy loss in most cases.

One promising algorithm for low-bit quantization is VPTQ (MIT license), proposed by Microsoft. It was released in October 2024 and has since shown excellent accuracy and efficiency in quantizing large models.
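At its core, VPTQ is a form of vector quantization: weights are grouped into short vectors, and each vector is replaced by the index of its nearest centroid in a learned codebook, so only the indices and the small codebook need to be stored. The toy NumPy sketch below illustrates the idea only; it is not Microsoft's actual implementation, which additionally learns the codebook from the weights and uses second-order information, residual quantization, and other refinements. With a 256-entry codebook over 4-dimensional vectors, each 8-bit index covers 4 weights, i.e. 2 bits per weight.

```python
import numpy as np

def vq_quantize(weights: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map each weight vector to the index of its nearest codebook centroid."""
    vecs = weights.reshape(-1, codebook.shape[1])
    # squared Euclidean distance between every weight vector and every centroid
    dists = ((vecs[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1).astype(np.uint8)  # 256 centroids -> 8-bit indices

def vq_dequantize(indices: np.ndarray, codebook: np.ndarray, shape) -> np.ndarray:
    """Reconstruct an approximate weight matrix from the stored indices."""
    return codebook[indices].reshape(shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)           # toy weight matrix
codebook = rng.normal(size=(256, 4)).astype(np.float32)  # 256 centroids, dim 4

indices = vq_quantize(w, codebook)  # one 8-bit index per 4-weight vector
w_hat = vq_dequantize(indices, codebook, w.shape)
```

Storing 8-bit indices instead of 16-bit weights cuts the weight memory by 8x before accounting for the codebook and other overhead; in practice the effective bit width of VPTQ models lands slightly above 2 bits per weight, which is consistent with the roughly 6.5x compression figure in the title.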

In this article, we will:

  1. Review the VPTQ quantization algorithm.
  2. Demonstrate how to use VPTQ models, many of which are already available. For instance, we can easily find low-bit variants of Llama 3.3 70B, Llama 3.1 405B, and Qwen2.5 72B.
  3. Evaluate these models and discuss the results to understand when VPTQ models can be a good choice for LLMs in production.

Remarkably, 2-bit quantization with VPTQ almost matches the performance of the original 16-bit model on tasks such as MMLU. Moreover, it makes it possible to run Llama 3.1 405B on a single GPU, while using less memory than a 16-bit 70B model!
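A quick back-of-the-envelope calculation makes the memory claims concrete. The sketch below counts weight storage only, ignoring the codebook overhead, activations, and KV cache that a real deployment also needs:

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory footprint of the model weights alone, in gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# Llama 3.1 405B quantized to ~2 bits per weight
size_405b_2bit = weight_memory_gb(405e9, 2)   # ~101 GB
# A 70B model in 16-bit (bfloat16)
size_70b_bf16 = weight_memory_gb(70e9, 16)    # 140 GB
# A 70B model at ~2 bits fits on a 24 GB GPU
size_70b_2bit = weight_memory_gb(70e9, 2)     # 17.5 GB

print(f"405B @ 2-bit: {size_405b_2bit:.0f} GB")
print(f"70B @ bf16:   {size_70b_bf16:.0f} GB")
print(f"70B @ 2-bit:  {size_70b_2bit:.1f} GB")
```

So a 2-bit 405B model indeed takes less weight memory than a 16-bit 70B model, and a 2-bit 70B model fits within the 24 GB of a consumer GPU, as claimed above.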

About Us

Automation Scribe is your go-to site for easy-to-understand Artificial Intelligence (AI) articles. Discover insights on AI tools, AI Scribe, and more. Stay updated with the latest advancements in AI technology. Dive into the world of automation with simplified explanations and informative content. Visit us today!

© 2024 automationscribe.com. All rights reserved.
