Live REASONING TEST of Llama 4 Maverick. Llama 4 Maverick is a 400B parameter model with 128 experts and 17B active parameters.
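For context, here is a rough back-of-the-envelope sketch of what "400B total, 17B active" means for a mixture-of-experts model. The memory and FLOP figures below are my own illustrative estimates, not Meta's published numbers:

```python
# Rough MoE memory/compute arithmetic for Llama 4 Maverick (illustrative only).
# Assumed figures: 400B total parameters, 17B active per token, 128 experts.

TOTAL_PARAMS = 400e9      # all expert + shared weights must be stored
ACTIVE_PARAMS = 17e9      # weights actually used per generated token

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

for dtype, bytes_per in BYTES_PER_PARAM.items():
    weight_gb = TOTAL_PARAMS * bytes_per / 1e9
    print(f"{dtype}: ~{weight_gb:.0f} GB of weights to hold in memory")

# Compute per token scales with the ACTIVE parameters, not the total:
# roughly 2 * 17e9 = 34 GFLOPs per token (forward pass only), which is why
# a 400B MoE can generate at roughly the speed of a ~17B dense model.
print(f"~{2 * ACTIVE_PARAMS / 1e9:.0f} GFLOPs per token (forward, approx.)")
```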
I perform my standard logic test on the new Llama 4 Maverick 400B, in parallel with Claude 3.7 Sonnet Thinking 32K as a performance baseline.
Meta claims: "achieving comparable results to the new DeepSeek v3 on reasoning and coding". My test cannot confirm this statement by Meta.
To learn more about my standard causal reasoning test, see my video on Gemini 2.5 Pro (second half):
https://www.youtube.com/watch?v=iVZaJeXu7E8
The origin and original version of my extreme logic test (which I developed for Strawberry) can be found in this video:
https://www.youtube.com/watch?v=tpun1uOKecc
I have not yet published the very latest iteration of my extreme logic test with its advanced complexities, since I do not want my latest test configuration to end up in the training data for future model updates.
Since Llama 4 Behemoth (2T parameters) is still training in Meta's cloud, I will perform my logic tests on it when the model becomes available.
Llama 4 Scout, a 17 billion active parameter model with 16 experts (the smallest Llama 4), fits on a single H100 GPU (ONLY with Int4 quantization), according to Meta.
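A quick sanity check on that claim. The ~109B total parameter figure for Scout and the 80 GB of H100 HBM are my own assumptions for illustration, taken from publicly reported specs rather than measured in this test:

```python
# Does Llama 4 Scout fit on one H100? A back-of-the-envelope check.
# Assumptions: ~109B total parameters (17B active, 16 experts), 80 GB HBM per H100.

TOTAL_PARAMS = 109e9
H100_HBM_GB = 80

for dtype, bytes_per_param in [("bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    weights_gb = TOTAL_PARAMS * bytes_per_param / 1e9
    fits = "fits" if weights_gb < H100_HBM_GB else "does NOT fit"
    print(f"{dtype}: ~{weights_gb:.0f} GB of weights -> {fits} in {H100_HBM_GB} GB "
          f"(before KV cache and activations)")

# bf16: ~218 GB, int8: ~109 GB, int4: ~55 GB.
# Only the Int4 case leaves headroom for KV cache and activations, which is
# consistent with Meta's "single H100 only with Int4 quantization" statement.
```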
#meta
#llama4
#reasoning
#test