Why M3 Ultra and M4 Max dominate AMD and Intel desktops: a deep dive into memory performance

Petar 5,537 views 3 days ago

0:00 Introduction
2:07 Architectural overview: M1, M4 Max, M3 Ultra, Ryzen 9950X
6:24 Peak performance benchmark in dense matrix multiplication (max attainable GFLOPS)
6:49 Computational complexity of matrix-matrix multiplication
7:45 Roofline model explanation
9:39 Rooflines for the M4 Max and the Ryzen 9950X: regions where each CPU dominates
10:07 Computational complexity of matrix-vector multiplication
11:05 STREAM memory bandwidth benchmark and analysis
12:15 Peak performance in a memory-bound scenario: matrix-vector multiplication

In this deep-dive video, I examine the peak compute throughput and memory bandwidth of the Apple Mac Studio M4 Max and M3 Ultra, and compare them to the AMD Ryzen 9950X.

I use the Julia programming language to showcase the strengths and weaknesses of each CPU architecture by performing dense matrix-matrix and matrix-vector computations.
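Why these two kernels behave so differently comes down to how many floating-point operations they perform per byte of memory traffic. The sketch below is a rough Python illustration, not the Julia code from the video. The function names are hypothetical, and it assumes square n x n matrices of 8-byte floats with each operand moved to or from memory exactly once.

```python
def matmul_intensity(n: int, bytes_per_element: int = 8) -> float:
    """Approximate FLOPs per byte for C = A * B with n x n matrices.

    Assumes ideal reuse: A and B are each read once, C is written once.
    """
    flops = 2 * n**3                              # about n multiplies + n adds per output
    bytes_moved = 3 * n**2 * bytes_per_element    # read A, read B, write C
    return flops / bytes_moved


def matvec_intensity(n: int, bytes_per_element: int = 8) -> float:
    """Approximate FLOPs per byte for y = A * x with an n x n matrix."""
    flops = 2 * n**2                                   # about one multiply + one add per matrix entry
    bytes_moved = (n**2 + 2 * n) * bytes_per_element   # read A and x, write y
    return flops / bytes_moved


if __name__ == "__main__":
    for n in (100, 1_000, 10_000):
        print(n, matmul_intensity(n), matvec_intensity(n))
```

Under these assumptions the matrix-matrix intensity grows linearly with n (n/12 FLOPs per byte), while the matrix-vector intensity stays below 0.25 FLOPs per byte no matter how large the matrix is. That is why the first kernel can be limited by compute and the second by memory bandwidth.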

Using the roofline model, I visualize the region where we can expect the Apple CPUs to dominate in performance and the region where we can expect the AMD CPU to dominate.
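As a minimal sketch of the roofline idea (source 3), attainable performance is the smaller of the peak compute rate and the memory bandwidth multiplied by the arithmetic intensity of the kernel. The Python below is illustrative only. The function names are hypothetical, and the two machines use placeholder numbers chosen to show the crossover, not measurements from the video.

```python
def attainable_gflops(intensity: float, peak_gflops: float, bandwidth_gbs: float) -> float:
    """Roofline bound: min(compute roof, bandwidth roof) at a given FLOPs/byte."""
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops: float, bandwidth_gbs: float) -> float:
    """Arithmetic intensity (FLOPs/byte) at which a kernel stops being memory bound."""
    return peak_gflops / bandwidth_gbs


if __name__ == "__main__":
    # Placeholder machines (NOT measured values):
    # "wide" has high bandwidth and a lower compute peak, "fast" is the reverse.
    wide = {"peak_gflops": 1000.0, "bandwidth_gbs": 400.0}
    fast = {"peak_gflops": 2000.0, "bandwidth_gbs": 80.0}

    for intensity in (0.25, 2.5, 12.5, 50.0):
        w = attainable_gflops(intensity, **wide)
        f = attainable_gflops(intensity, **fast)
        print(f"{intensity:6.2f} FLOPs/byte  wide={w:7.1f}  fast={f:7.1f}")
```

With these placeholder values, the high-bandwidth machine is ahead at low intensity (the matrix-vector regime) and the high-peak machine is ahead once the intensity is large enough (the matrix-matrix regime). This is the kind of region split the roofline plot makes visible.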

Thanks to their high memory bandwidth, Apple CPUs are particularly well suited to memory-bound scientific applications such as computational fluid dynamics.
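For context on how a bandwidth figure is typically obtained, the STREAM benchmark (source 4) times simple vector kernels and divides the bytes moved by the elapsed time. The sketch below shows only that accounting for the triad kernel a[i] = b[i] + s * c[i]. It is a Python illustration with hypothetical function names and an invented timing in the usage example, not the benchmark itself and not the Julia code from the video.

```python
def triad_bytes(n: int, bytes_per_element: int = 8) -> int:
    """Bytes counted for one triad pass over arrays of length n.

    Each element involves two reads (b[i], c[i]) and one write (a[i]).
    """
    return 3 * n * bytes_per_element


def bandwidth_gbs(n: int, seconds: float, bytes_per_element: int = 8) -> float:
    """Sustained bandwidth in GB/s (1 GB = 1e9 bytes) from a measured triad time."""
    return triad_bytes(n, bytes_per_element) / seconds / 1e9


if __name__ == "__main__":
    # Invented example: 1e9 doubles per array, one pass timed at 0.5 s.
    print(bandwidth_gbs(10**9, 0.5))
```

The arrays need to be much larger than the last-level cache for the result to reflect main-memory bandwidth rather than cache bandwidth.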

Previous video in the series: https://youtu.be/EudKr2bny2c

Testing performed in the programming language Julia: https://julialang.org

Sources:
1. M1 Ultra in Computational Fluid Dynamics: http://hrtapps.com/blogs/20220427/
2. M1 Max memory bandwidth vs AMD:
https://www.anandtech.com/show/16529/amd-epyc-milan-review/4
https://www.anandtech.com/show/17024/apple-m1-max-performance-review/2
3. Roofline model: https://escholarship.org/content/qt78h8v7mr/qt78h8v7mr.pdf
4. STREAM benchmark: https://www.cs.virginia.edu/stream/ref.html
5. Memory bandwidth data for Epyc 9655: https://openbenchmarking.org/result/2411164-NE-AMDEPYCTU56#results
