Can Amazon compete against Nvidia GPUs?

Dr Waku 7,428 views 2 days ago

Amazon has been developing its own hardware for AI inference and training. As a hyperscaler cloud provider, Amazon does not want to be reliant on one supplier to provide GPUs. It's also potentially a very lucrative market for them to be in.

By funding the AI lab Anthropic with billions of dollars, Amazon has convinced it to try out the in-house AI accelerator chips that Amazon is developing. Amazon is building a cluster of 400,000 Trainium 2 chips (Project Rainier) just for Anthropic, which will use it to train its next AI model.

Amazon has been offering its hardware to certain customers for 25% of the cost of the equivalent Nvidia hardware. Amazon's chips are called Trainium (for accelerating training) and Inferentia (for accelerating inference). It takes 3-4x as many of Amazon's chips to match the performance of Nvidia's current H100s and B200s.

#gpu #amazon #nvidia

Amazon's New AI Chip Challenges Nvidia's Dominance
https://technologymagazine.com/articles/trainium2-examining-amazons-chips-challenging-nvidia

Amazon’s AI Self Sufficiency | Trainium2 Architecture & Networking
https://semianalysis.com/2024/12/03/amazons-ai-self-sufficiency-trainium2-architecture-networking/

Amazon racing to develop AI chips cheaper, faster than Nvidia's, executives say
https://www.reuters.com/technology/artificial-intelligence/amazon-racing-develop-ai-chips-cheaper-faster-than-nvidias-executives-say-2024-07-25/

Amazon Shares Rise After AWS Announces AI Supercomputer Nvidia Rival—Here’s What To Know
https://www.forbes.com/sites/stephenpastis/2024/12/03/amazon-shares-rise-after-aws-announces-ai-supercomputer-nvidia-rival-heres-what-to-know/

Amazon Developing Custom AI Processors To Compete With NVIDIA
https://wccftech.com/amazon-developing-custom-ai-processors-to-compete-with-nvidia/

Graviton progress: 50% of new AWS instances run on Amazon custom silicon
https://www.networkworld.com/article/3631134/graviton-progress-50-of-new-aws-instances-run-on-amazon-custom-silicon.html

Amazon’s Quebec closings are about control, not union weakness, experts say
https://www.montrealgazette.com/business/article703383.html

AWS Announces EC2 UltraCluster and GA of Trainium2 Instances
https://insidehpc.com/2024/12/aws-announces-ec2-ultracluster-and-ga-of-trainium2-instances/

Amazon Gives Anthropic $2.75 Billion So It Can Spend It On AWS XPUs
https://www.nextplatform.com/2024/03/27/amazon-gives-anthropic-2-75-billion-so-it-can-spend-it-on-aws-gpus/

Anthropic raises $3.5 billion, reaching $61.5 billion valuation as AI investment frenzy continues
https://venturebeat.com/ai/anthropic-raises-3-5-billion-reaching-61-5-billion-valuation-as-ai-investment-frenzy-continues/

Apple uses Amazon's Graviton and Inferentia chips, also explores Trainium2
https://www.datacenterdynamics.com/en/news/apple-uses-amazons-graviton-and-inferentia-chips-also-explores-trainium2/

NVIDIA H100 Tensor Core GPU
https://www.colfax-intl.com/nvidia/nvidia-h100

The Accelerator War – AWS Tranium, Google TPU, Habana Gaudi and Others
https://globaltechresearch.substack.com/p/the-accelerator-war-aws-tranium-google

Amazon’s Trainium Chip Is Offering NVIDIA H100 Performance At 25% Of The Cost
https://officechai.com/ai/amazons-trainium-chip-is-offering-nvidia-h100-performance-at-25-of-the-cost/


0:00 Intro
0:31 Contents
0:40 Part 1: Anthropic's partnership with Amazon
0:43 Introduction to Anthropic and Claude
1:35 Each AI lab has a big cloud partner
2:00 Anthropic's funding sources
2:30 Strings attached to Amazon's investment into Anthropic
3:01 Anthropic moving to Amazon's AI chips
3:48 Project Rainier: 400,000 Trainium 2 chips
4:52 Amazon getting multiple benefits from partnership
5:16 Part 2: Building custom chips
5:34 Hyperscalers (cloud providers) diversifying suppliers
6:32 In-house chips from Google, Microsoft, Amazon
7:08 Other hardware providers: AMD, Groq, Cerebras, Huawei
7:40 Amazon's hardware strategy
8:03 Amazon internal CPU design (Graviton 4)
8:27 ARM architecture instead of x86
9:28 Apple is using Graviton processors
9:59 50% of new servers use Graviton
10:30 Amazon offering its AI hardware for only 25% of the cost
11:17 Part 3: Breakdown of Amazon's AI accelerators
11:41 History of Inferentia and Trainium chips
12:16 Future Inferentia chips cancelled
12:57 Performance of Trainium chips
13:40 Comparison with concurrent Nvidia GPUs
14:48 Architecture of Trainium similar to TPUs
15:22 Four specific engines (tensor engine, etc)
15:46 How chips communicate with each other
16:13 Example: DeepSeek optimizations
16:50 Can't use CUDA with Trainium
17:28 Data center rack layout background info
18:05 Rack-scale 18U computer
18:55 Networking four racks together
19:39 Trn2 Ultraserver on AWS
19:58 Did Amazon diversify successfully?
21:10 Conclusion
21:53 Good hardware ideas
22:26 Amazon questionable business practices
23:15 Outro
23:36 Follow me on X, Bluesky, Substack!
