
How to Quantize an LLM with GGUF or AWQ

Trelis Research · 12,299 views · 1 year ago

*GGUF and AWQ Quantization Scripts*
- Includes pushing model files to repo
Purchase here: https://buy.stripe.com/5kA6paaO9dmbcV2fZq

*ADVANCED Fine-tuning Repository Access*
1. Quantization Scripts
2. Unsupervised + Supervised Fine-tuning Notebooks
3. Q&A Dataset Preparation + Cleaning Scripts
4. Scripts to create and use Embeddings
Learn More: https://trelis.com/advanced-fine-tuning-scripts/

*Resources:*
- Presentation Slides: https://tinyurl.com/2s58xnam
- Llama.cpp: https://github.com/ggerganov/llama.cpp
- AutoAWQ: https://github.com/casper-hansen/AutoAWQ/
- Runpod Affiliate Link: (supports Trelis) https://tinyurl.com/yjxbdc9w
- AWQ paper: https://arxiv.org/pdf/2306.00978.pdf
- GPTQ paper: https://arxiv.org/pdf/2210.17323.pdf
- Ready-Quantized Models: https://huggingface.co/TheBloke/
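As a companion to the AutoAWQ link above, here is a minimal sketch of 4-bit AWQ quantization with the AutoAWQ library. This is not the video's script: the model ID and output path are placeholder assumptions, and the `quant_config` values shown are the library's commonly documented defaults.

```python
# Hypothetical AWQ quantization sketch using AutoAWQ (pip install autoawq).
# Model ID and output path are placeholders, not the video's exact choices.

# AWQ settings: 4-bit weights, group size 128, zero-point quantization.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

def quantize_awq(model_path: str, quant_path: str) -> None:
    # Imports are inside the function so the sketch parses without the
    # (large) dependencies installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Calibrate and quantize the weights, then save the quantized model.
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)

if __name__ == "__main__":
    quantize_awq("TinyLlama/TinyLlama-1.1B-Chat-v1.0", "tinyllama-awq")
```

Running this downloads the full-precision model, so it needs a GPU machine (e.g. a Runpod instance, per the affiliate link above) and disk space for both checkpoints.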

Referenced Videos:
- Supervised Fine-tuning (with bitsandbytes): https://youtu.be/DMcxxg5iEZQ
- Tiny Llama (run a GGUF model on your laptop): https://www.youtube.com/watch?v=T5l228844NI
- AWQ API setup and explanation: https://www.youtube.com/watch?v=GKd92rhTBGo

0:00 How to quantize a large language model
0:38 Why quantize a language model
1:30 What is quantization
2:23 Which quantization to use?
3:29 GGUF vs BNB vs AWQ vs GPTQ
10:01 How to quantize with AWQ
18:48 How to quantize with GGUF (GGML)
25:29 Recap
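The GGUF chapter uses llama.cpp's conversion and quantization tools. A minimal sketch of that two-step flow (convert the Hugging Face checkpoint to fp16 GGUF, then quantize to 4-bit) is below; the repo location, model path, and quant type are placeholder assumptions, not the video's exact commands.

```python
# Hypothetical GGUF quantization sketch that shells out to llama.cpp's tools.
# Assumes llama.cpp has been cloned and built locally; paths are placeholders.
import subprocess
from pathlib import Path

LLAMA_CPP = Path("llama.cpp")          # clone of https://github.com/ggerganov/llama.cpp
HF_MODEL = Path("models/my-hf-model")  # local Hugging Face checkpoint directory
QUANT_TYPE = "Q4_K_M"                  # 4-bit k-quant, a common size/quality trade-off

def make_gguf() -> Path:
    f16 = Path("model-f16.gguf")
    out = Path(f"model-{QUANT_TYPE.lower()}.gguf")
    # Step 1: convert the HF checkpoint to a single fp16 GGUF file.
    subprocess.run(
        ["python", str(LLAMA_CPP / "convert.py"), str(HF_MODEL),
         "--outtype", "f16", "--outfile", str(f16)],
        check=True,
    )
    # Step 2: quantize the fp16 GGUF down to 4 bits.
    subprocess.run(
        [str(LLAMA_CPP / "quantize"), str(f16), str(out), QUANT_TYPE],
        check=True,
    )
    return out

if __name__ == "__main__":
    print(make_gguf())
```

Unlike AWQ, this step needs no GPU: GGUF quantization runs on CPU, and the resulting file can be served locally with llama.cpp (see the Tiny Llama video linked above).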
