TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Paper: https://arxiv.org/abs/2410.23168
Code: https://github.com/haiyang-w/tokenformer
Notes: https://drive.google.com/file/d/17PsGwefQJoSQxBHykoSFeMrKZhPDFx-E/view?usp=sharing
00:00 Intro
02:48 Methodology
07:54 This is an MLP
10:18 How they change the transformer
16:00 Model scaling
20:48 Results