
Chinchilla Explained: Compute-Optimal Massive Language Models

Edan Meyer · 20,800 views · 3 years ago

Chinchilla is a massive language model released by DeepMind as part of a recent paper on scaling large language models in a compute-optimal manner. With only 70 billion parameters, it outperforms recent models like GPT-3, Gopher, and Megatron-Turing NLG, which use hundreds of billions of parameters. DeepMind achieved this by training over 400 language models to find the optimal balance between model size and the amount of training data for a given compute budget.
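
The paper's core result can be summarized with a back-of-the-envelope rule. Below is a minimal Python sketch, assuming the standard estimate of roughly 6ND training FLOPs for a model with N parameters trained on D tokens, and the paper's finding that N and D should be scaled in roughly equal proportion, which works out to about 20 training tokens per parameter. The paper fits its exponents empirically, so this is an approximation, not the authors' exact fit.

```python
# A minimal sketch of the Chinchilla compute-optimal rule (an approximation,
# not the paper's exact empirical fit). Assumes the common estimate
# C ~ 6 * N * D training FLOPs, and the paper's finding that parameters N
# and training tokens D should scale in roughly equal proportion, which
# works out to roughly D ~ 20 * N.

def compute_optimal(compute_budget_flops: float) -> tuple[float, float]:
    """Return an approximate (parameters, tokens) pair for a FLOP budget."""
    # Substituting D = 20 * N into C = 6 * N * D gives C = 120 * N**2.
    n_params = (compute_budget_flops / 120) ** 0.5
    n_tokens = 20 * n_params
    return n_params, n_tokens

# Chinchilla's budget of ~5.76e23 FLOPs recovers its published shape:
# roughly 70B parameters trained on roughly 1.4T tokens.
n, d = compute_optimal(5.76e23)
print(f"params ~ {n:.2e}, tokens ~ {d:.2e}")
```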

Outline:
0:00 - Overview
1:51 - Paper Intro
6:15 - Methods
18:14 - Scaling Implications
23:43 - Chinchilla Overview
25:48 - Chinchilla Performance
29:49 - Summary
30:07 - Thoughts & Critiques

Paper (Training Compute-Optimal Large Language Models): https://arxiv.org/abs/2203.15556
