NEW: Llama 4 Scout applies two new AI methods to achieve the 10M-token context window of the latest Llama 4 model: a scaled softmax function for attention ( @UTokyoScience / The University of Tokyo) and an optimized layer configuration that interleaves RoPE and NoPE layers with normalization ( @CohereAI ). But can Llama 4 Scout actually reason over that context length?
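The scaled-softmax idea can be sketched in a few lines: the attention logits are multiplied by a factor that grows with the input length n (here s * log(n), with s as an illustrative scaling parameter), so attention scores do not flatten out as the context grows. This is a minimal NumPy sketch under that assumption, not Meta's or UTokyo's implementation.

```python
import numpy as np

def softmax(z):
    # Numerically stable standard softmax.
    e = np.exp(z - z.max())
    return e / e.sum()

def scaled_softmax(z, s=0.43):
    # Scale logits by s * log(n) before the softmax, where n is the
    # sequence length; s is an illustrative, tunable parameter.
    n = len(z)
    return softmax(s * np.log(n) * z)

scores = np.array([2.0, 1.0, 0.5, 0.1])
print(scaled_softmax(scores))
```

Because the scaling factor increases with n, longer inputs get sharper attention distributions than a plain softmax would produce, which is what helps at very long context lengths.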
All rights with authors:
"The Llama 4 herd: The beginning of a new era of natively multimodal AI innovation"
published April 5, 2025 on the META AI Blog
#airesearch
#meta
#llama4
#reasoning