
Quantization vs Pruning vs Distillation: Optimizing NNs for Inference

Efficient NLP · 39,519 views · 1 year ago

Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io

Four techniques for optimizing your model's inference speed (short illustrative sketches of each follow the timestamps):
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
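The video itself isn't embedded here, so as a rough companion to the quantization segment: a minimal sketch of post-training dynamic quantization in PyTorch. The toy model, the layer sizes, and the choice of int8 are illustrative assumptions, not details taken from the video.

```python
import torch
import torch.nn as nn

# Toy model standing in for a real network (sizes are arbitrary).
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Post-training dynamic quantization: weights are stored as int8,
# and activations are quantized on the fly during inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # same interface; smaller, faster Linear layers
```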
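For the pruning segment, a minimal sketch using PyTorch's built-in pruning utilities; magnitude-based unstructured pruning at 50% sparsity is an assumed example setting. Note that zeroed weights only translate into real speedups on sparse-aware runtimes (see the SparseDNN reference below).

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)

# Unstructured magnitude pruning: zero out the 50% of weights
# with the smallest absolute values.
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Bake the mask into the weights so the layer is a plain Linear again.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")  # ~50%
```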
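For the knowledge distillation segment, a sketch of the standard distillation loss: the student is trained to match the teacher's temperature-softened output distribution, blended with the usual hard-label cross-entropy. The temperature T and mixing weight alpha are hyperparameters chosen here purely for illustration.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend of soft-target (teacher) and hard-target (label) losses."""
    # Soft targets: KL divergence between temperature-softened distributions;
    # the T*T factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```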
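The engineering-optimizations segment is about runtime-level speedups rather than changes to the model itself; one common example (an assumption, not necessarily what the video demonstrates) is compiling the model so the framework can fuse kernels:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# PyTorch 2.x: JIT-compile the forward pass into fused, optimized kernels.
compiled = torch.compile(model)

with torch.inference_mode():  # disable autograd bookkeeping for inference
    out = compiled(torch.randn(1, 512))
```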

References:

LLM Inference Optimization blog post: https://lilianweng.github.io/posts/2023-01-10-inference-optimization/

How to deploy your deep learning project on a budget: https://luckytoilet.wordpress.com/2023/06/20/how-to-deploy-your-deep-learning-side-project-on-a-budget/

Efficient deep learning survey paper: https://arxiv.org/abs/2106.08962

SparseDNN: https://arxiv.org/abs/2101.07948
