Direct Preference Optimization (DPO): Your Language Model is Secretly a Reward Model Explained
Gabriel Mongaras
17,872 views
1 year ago
Paper found here: https://arxiv.org/abs/2305.18290
Machine Learning
Artificial Intelligence
Attention
Deep Learning
Transformers
MHA
LLM
Large Language Models
GPT
Paper Explanations
Paper Review
Direct Preference Optimization
DPO
PPO
Reinforcement Learning
RL
finetuning
RLHF
Reinforcement Learning With Human Feedback
Reinforcement Learning From Human Feedback
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
48:46
Direct Preference Optimization (DPO) explained: Bradley-Terry model, log probabilities, math
Umar Jamil
22,807 views
19:39
RLHF & DPO Explained (In Simple Terms!)
Entry Point AI
8,232 views
1:16:15
Stanford CS224N | 2023 | Lecture 10 - Prompting, Reinforcement Learning from Human Feedback
Stanford Online
69,610 views
1:09:00
[GRPO Explained] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Yannic Kilcher
143,870 views
27:19
LoRA: Low-Rank Adaptation of LLMs Explained
Gabriel Mongaras
11,240 views
33:26
ORPO: Monolithic Preference Optimization without Reference Model (Paper Explained)
Yannic Kilcher
24,324 views
21:15
Direct Preference Optimization (DPO) - How to fine-tune LLMs directly without reinforcement learning
Serrano.Academy
15,624 views
1:19:27
Stanford CS25: V3 I Retrieval Augmented Language Models
Stanford Online
183,702 views
25:21
L4 TRPO and PPO (Foundations of Deep RL Series)
Pieter Abbeel
36,225 views
18:26
Explaining the Trump Tariff Equation
Stand-up Maths
1,107,833 views
17:07
LoRA explained (and a bit about precision and quantization)
DeepFindr
85,847 views
42:49
Direct Preference Optimization (DPO)
Trelis Research
7,726 views
38:24
Proximal Policy Optimization (PPO) - How to train Large Language Models
Serrano.Academy
50,110 views
33:39
Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou
AI Engineer
9,041 views
58:07
Aligning LLMs with Direct Preference Optimization
DeepLearningAI
31,504 views
38:55
CoPE - Contextual Position Encoding: Learning to Count What's Important
Gabriel Mongaras
1,524 views
1:21:39
DeepSeek-V3
Gabriel Mongaras
21,662 views
32:46
Chinchilla Explained: Compute-Optimal Massive Language Models
Edan Meyer
20,752 views
23:56
Why Trump's tariff chaos actually makes sense (big picture)
Money & Macro
3,346,390 views
11:29
Reinforcement Learning from Human Feedback (RLHF) Explained
IBM Technology
36,141 views