
Byte Latent Transformer: Patches Scale Better Than Tokens

Gabriel Mongaras · 2,769 views · 4 months ago

Paper: https://arxiv.org/abs/2412.09871
Code: https://github.com/facebookresearch/blt
Notes:
https://drive.google.com/file/d/1B5BdO9FtmxTJiWwVJ3Wa-v3pqaRdbWMh/view?usp=drive_link
https://drive.google.com/file/d/1BBYwr5botkuvI8CkjarIFNiN6B7uliWr/view?usp=drive_link

Chapters:
00:00 Intro
01:15 Current tokenization strategies
02:48 Methodology
08:08 Patching strategy
15:28 N-gram informed byte encodings
22:09 Encoder, global transformer, decoder
36:59 Inference
39:40 Some notes and results
