Quantization vs Pruning vs Distillation: Optimizing NNs for Inference
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io
Four techniques for optimizing your model's inference speed (a minimal code sketch for each follows the timestamps):
0:38 - Quantization
5:59 - Pruning
9:48 - Knowledge Distillation
13:00 - Engineering Optimizations
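
Quantization - a minimal PyTorch sketch, assuming post-training dynamic quantization (the toy model and layer sizes are hypothetical stand-ins, not code from the video). Weights are stored as int8 and activations are quantized on the fly, so the matrix multiplies run in integer arithmetic:

import torch
import torch.nn as nn

# Toy model as a stand-in; any nn.Module with Linear layers works the same way.
model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Dynamic quantization: int8 weights, activations quantized at runtime.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

out = quantized(torch.randn(1, 256))  # same call signature as the original model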
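
Pruning - a sketch of magnitude-based unstructured pruning using PyTorch's built-in pruning utilities (the layer and the 30% sparsity level are illustrative choices, not from the video):

import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(256, 128)

# Zero out the 30% of weights with the smallest L1 magnitude.
prune.l1_unstructured(layer, name="weight", amount=0.3)

# Fold the pruning mask into the weight tensor so the zeros become permanent.
prune.remove(layer, "weight")

Note that zeroed weights alone don't make dense kernels faster; you need structured pruning or a sparsity-aware runtime (e.g., SparseDNN from the references below) to turn sparsity into wall-clock speedup.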
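
Knowledge Distillation - a sketch of the standard distillation loss (Hinton-style soft targets); the temperature T and mixing weight alpha are hypothetical defaults, not values from the video:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft loss: student mimics the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard loss: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard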
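
Engineering Optimizations - one example of the kind of serving-side change that needs no model surgery (batching plus disabling autograd bookkeeping); the model here is the same toy stand-in as above, and this is only one of many options the video may cover:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 10)).eval()

# Batch several incoming requests into one forward pass and skip gradient
# tracking entirely during inference.
requests = [torch.randn(256) for _ in range(8)]
with torch.inference_mode():
    outputs = model(torch.stack(requests))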
References:
LLM Inference Optimization blog post: https://lilianweng.github.io/posts/2023-01-10-inference-optimization/
How to deploy your deep learning side project on a budget: https://luckytoilet.wordpress.com/2023/06/20/how-to-deploy-your-deep-learning-side-project-on-a-budget/
Efficient deep learning survey paper: https://arxiv.org/abs/2106.08962
SparseDNN: https://arxiv.org/abs/2101.07948