
Meta's Llama3 - The Mistral Killer?

Decoder · 2,094 views · 1 year ago

Meta's Llama3 family of models, in 8B and 70B flavors, was just released and is already making waves in the open-source community. With a much larger 128K-token vocabulary, GQA across all model sizes, and 7.7 million GPU hours spent training on 15 TRILLION tokens, Llama3 seems primed to overtake incumbent models like Mistral and Gemini. I review the most important parts of the announcement before testing the new 8B model against my own battery of questions. Let's go!

Links:
Official Announcement - https://ai.meta.com/blog/meta-llama-3/
Aston Zhang's Twitter post - https://twitter.com/astonzhangAZ/status/1780990210576441844
Meta's new llama interface - https://www.meta.ai/
Llama3 on Ollama - https://ollama.com/library/llama3
Eric Hartford's Dolphin Llama3 - https://huggingface.co/cognitivecomputations/dolphin-2.9-llama3-8b
Meta's training infrastructure - https://engineering.fb.com/2024/03/12/data-center-engineering/building-metas-genai-infrastructure/
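
If you want to follow along with the tests yourself, here is a minimal sketch of querying the 8B model through Ollama's local HTTP API. It assumes Ollama is installed and running on its default port 11434 and that you have already pulled the model with "ollama pull llama3"; the prompt is just an illustrative placeholder.

import requests

# Ask the locally served Llama3 model a question via Ollama's /api/generate endpoint.
# "stream": False returns one JSON object instead of a token-by-token stream.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "List the numbers 1 through 10.", "stream": False},
)
print(resp.json()["response"])

You can swap the prompt for any of the questions from the video, or use "ollama run llama3" in a terminal for an interactive session.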

Timestamps:
00:00 - Intro
00:16 - The Announcement
00:58 - Benchmarks
01:24 - HumanEval
02:13 - Model Architecture
03:08 - Training data
04:54 - 400B model?
05:07 - Llama3 in the ecosystem
05:32 - Details from Meta Researcher
05:54 - Why did they open source it?
06:46 - Trying out Llama3
07:00 - List 1-10
07:31 - JSON Recipe
08:36 - Environment State
10:13 - Dealing with incomplete info
11:01 - Robbing a bank
11:33 - Debugging code pt.1
12:38 - Debugging code pt.2
13:32 - Conclusion
