E01 Mixture of Experts Architecture | Why is DeepSeek cheap and good? (with Google Engineer)

Martin Is A Dad · 111 views · 3 weeks ago

DeepSeek is rattling the whole tech world. As a regular SWE, I want to share my insights on why it's cheap and good. The first episode is about the Mixture of Experts (MoE) architecture.
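For a rough picture of why MoE keeps compute low, here is a minimal sketch of sparse top-k expert routing. This is illustrative only, not DeepSeek's actual implementation; all names and sizes (d_model, n_experts, top_k, the single-matrix "experts") are assumptions for the toy example.

```python
# Minimal sketch of sparse Mixture-of-Experts routing (illustrative only;
# not DeepSeek's actual code). Sizes and names are toy assumptions.
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 16, 8, 2  # assumed toy dimensions

# Each "expert" is a feed-forward block; here just one weight matrix.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
router_w = rng.normal(size=(d_model, n_experts))  # router / gating weights

def moe_forward(x):
    """Route one token vector x to its top-k experts and mix their outputs."""
    logits = x @ router_w                      # score every expert
    top = np.argsort(logits)[-top_k:]          # keep only the k best experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts
    # Only top_k of n_experts actually run for this token,
    # which is the source of MoE's compute savings.
    return sum(g * (x @ experts[i]) for g, i in zip(gates, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)                # (16,)
```

The key idea is that the model holds many expert parameters but activates only a small subset per token, so inference cost scales with the active experts rather than the total parameter count.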

I'm optimistic about its impact on the tech industry and the world. It could bring the product-market fit of AI closer to everyone.

Related DeepSeek Series:
Episode #1: Mixture of Experts https://youtu.be/Id0_4-nJQN4
Episode #2: Multihead Latent Attention https://youtu.be/oYDkqSPXyMg

Related Transformer Series:
Episode #1: Attention Mechanism https://youtu.be/3RB8WVu9t4Q
Episode #2: Position Encoding https://youtu.be/E1XMcN2lMME
Episode #3: Keys, Values, Queries https://youtu.be/7i1wlvYLrUo
Episode #4: Multi Head Attention https://youtu.be/PwSMOwkcl1g
Episode #5: KV Cache and Masked Attention https://youtu.be/VAtqCJoiOKI

#llm
#deepseek
#openai
#gemini
#google
#nvda
#moe
#nlp
#transformer
