In this tutorial, we will explore different methods for loading pre-quantized models, using Zephyr 7B as our example. We will cover the three most common quantization formats: GPTQ, GGUF (formerly GGML), and AWQ.
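To get a feel for what these formats do under the hood, here is a minimal sketch of symmetric (absmax) 4-bit quantization in plain Python. This is only the core idea; GPTQ, GGUF, and AWQ layer much more refined schemes on top (per-group scales, activation-aware calibration, optimized rounding), so treat the numbers here as illustrative, not as what any of those libraries actually compute.

```python
# Minimal sketch of symmetric (absmax) 4-bit quantization.
# Real formats (GPTQ, GGUF, AWQ) use more sophisticated variants of this idea.

def quantize_4bit(weights):
    """Map float weights to signed 4-bit integer codes in [-7, 7]."""
    scale = max(abs(w) for w in weights) / 7   # one scale for the whole tensor
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

weights = [0.12, -0.5, 0.33, 0.91, -0.07]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
print(codes)      # each code fits in 4 bits instead of 32
print(max(abs(a - b) for a, b in zip(weights, restored)))  # quantization error
```

The storage win is the point: each weight shrinks from 32 bits to 4, at the cost of a small reconstruction error bounded by half the scale.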
Timeline
0:00 Introduction
0:25 Loading Zephyr 7B
3:25 Quantization
7:42 Pre-quantized LLMs
8:42 GPTQ
10:29 GGUF
12:22 AWQ
14:46 Outro
📒 Google Colab notebook https://colab.research.google.com/drive/1rt318Ew-5dDw21YZx2zK2vnxbsuDAchH?usp=sharing
🛠️ Written version of this tutorial https://maartengrootendorst.substack.com/p/which-quantization-method-is-right
🤗 Zephyr 7B on HuggingFace https://huggingface.co/HuggingFaceH4/zephyr-7b-beta
Support my work:
👪 Join as a Channel Member: @maartengrootendorst
✉️ Newsletter https://maartengrootendorst.substack.com/
📖 Join Medium to Read my Blogs https://medium.com/@maartengrootendorst
I'm writing a book!
📚 Hands-On Large Language Models https://www.oreilly.com/library/view/hands-on-large-language/9781098150952/
#datascience #machinelearning #ai