Meta's Llama 3 family of models, in 8B and 70B flavors, was just released and is already making waves in the open source community. With a much larger tokenizer, GQA across all model sizes, and 7.7 million GPU hours spent training on 15 TRILLION tokens, Llama 3 seems primed to overtake incumbent models like Mistral and Gemma. I review the most important parts of the announcement before testing the new 8B model against my own battery of questions. Let's go!
Links:
Official Announcement - https://ai.meta.com/blog/meta-llama-3/
Aston Zhang's Twitter post - https://twitter.com/astonzhangAZ/status/1780990210576441844
Meta's new Llama interface - https://www.meta.ai/
Llama3 on Ollama - https://ollama.com/library/llama3
Eric Hartford's Dolphin Llama3 - https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
Meta's training infrastructure - https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/
Timestamps:
00:00 - Intro
00:16 - The Announcement
00:58 - Benchmarks
01:24 - HumanEval
02:13 - Model Architecture
03:08 - Training Data
04:54 - 400B model?
05:07 - Llama 3 in the Ecosystem
05:32 - Details from Meta Researcher
05:54 - Why did they open source it?
06:46 - Trying Out Llama 3
07:00 - List 1-10
07:31 - JSON Recipe
08:36 - Environment State
10:13 - Dealing with incomplete info
11:01 - Robbing a bank
11:33 - Debugging code pt.1
12:38 - Debugging code pt.2
13:32 - Conclusion