GaLore EXPLAINED: Memory-Efficient LLM Training by Gradient Low-Rank Projection

We explain GaLore, a new memory-efficient training technique that outperforms LoRA in accuracy and supports both pre-training and fine-tuning. Now you can train LLMs without running out of GPU memory! You can even pre-train a LLaMA-7B from scratch on a single 24 GB GPU (NVIDIA RTX 4090), for example. A minimal code sketch of the core idea is included at the end of this description.

AI Coffee Break Merch! 🛍️ https://aicoffeebreak.creator-spring.com/

Thanks to our Patrons who support us in Tier 2, 3, 4: 🙏
Dres. Trost GbR, Siltax, Vignesh Valliappan, Michael, Sunny Dhiana, Andy Ma

Outline:
00:00 Parameter-efficient Training
01:05 What is eating up GPU memory & LoRA recap
03:17 GaLore key idea
04:32 GaLore explained
08:43 Memory savings
09:38 Accuracy losses
10:23 Optimal T

📜 Zhao, J., Zhang, Z., Chen, B., Wang, Z., Anandkumar, A. and Tian, Y., 2024. GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. arXiv preprint arXiv:2403.03507. https://arxiv.org/abs/2403.03507

▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔥 Optionally, pay us a coffee to help with our Coffee Bean production! ☕
Patreon: https://www.patreon.com/AICoffeeBreak
Ko-fi: https://ko-fi.com/aicoffeebreak
Join this channel to get access to perks: https://www.youtube.com/channel/UCobqgqE4i5Kf7wrxRxhToQA/join
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
🔗 Links:
AICoffeeBreakQuiz: https://www.youtube.com/c/AICoffeeBreak/community
Twitter: https://twitter.com/AICoffeeBreak
Reddit: https://www.reddit.com/r/AICoffeeBreak/
YouTube: https://www.youtube.com/AICoffeeBreak

#AICoffeeBreak #MsCoffeeBean #MachineLearning #AI #research

Video editing: Nils Trost
Music 🎵: Bella Bella Beat - Nana Kwabena
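For readers who want the gist before watching: below is a minimal sketch of the gradient low-rank projection idea from the paper, assuming PyTorch. The function names, the projector-refresh helper, and the default hyperparameters (lr, betas, eps, scale) are illustrative assumptions, not the authors' reference implementation.

import torch

def update_projector(grad, rank):
    # Recompute the projector P from the top-r left singular vectors of the
    # current gradient; in the paper this is done only every T steps.
    U, _, _ = torch.linalg.svd(grad, full_matrices=False)
    return U[:, :rank]                            # shape (m, r)

def galore_adam_step(weight, grad, P, exp_avg, exp_avg_sq, step,
                     lr=1e-3, betas=(0.9, 0.999), eps=1e-8, scale=0.25):
    # Adam-style update whose optimizer state lives in the low-rank space,
    # so the moments are (r, n) tensors instead of (m, n).
    r_grad = P.T @ grad                           # project gradient down: (r, n)
    exp_avg.mul_(betas[0]).add_(r_grad, alpha=1 - betas[0])
    exp_avg_sq.mul_(betas[1]).addcmul_(r_grad, r_grad, value=1 - betas[1])
    m_hat = exp_avg / (1 - betas[0] ** step)      # bias-corrected first moment
    v_hat = exp_avg_sq / (1 - betas[1] ** step)   # bias-corrected second moment
    update = m_hat / (v_hat.sqrt() + eps)
    weight.add_(P @ update, alpha=-lr * scale)    # project back and apply to full weights

In use, exp_avg and exp_avg_sq would be zero tensors of shape (rank, n), and P would be refreshed with update_projector only every T steps, which is the "Optimal T" trade-off discussed at 10:23.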
