Momentum is great, but it would be even better if the gradient descent steps could slow down as they approach the bottom of a minimum. That is Nesterov Accelerated Gradient in a nutshell, check it out!
Code can be found here: https://github.com/yacineMahdid/artificial-intelligence-and-machine-learning
## Credit
Check out this blog post for a more detailed explanation of gradient descent optimizers: https://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient
The music is taken from YouTube Music!
## Table of Contents
- Introduction:
- Theory:
- Python Implementation:
- Conclusion:
Here is an explanation of Nesterov Accelerated Gradient from the very cool blog post mentioned in the credit section (check it out!):
"Nesterov accelerated gradient (NAG) [see reference] is a way to give our momentum term this kind of prescience. We know that we will use our momentum term γvt−1 to move the parameters θ. Computing θ−γvt−1 thus gives us an approximation of the next position of the parameters (the gradient is missing for the full update), a rough idea where our parameters are going to be. We can now effectively look ahead by calculating the gradient not w.r.t. to our current parameters θ but w.r.t. the approximate future position of our parameters:"
## Reference
Nesterov, Y. (1983). A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR (translated as Soviet Math. Dokl.), vol. 269, pp. 543–547.
----
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf
----
Follow Me Online Here:
Twitter: https://twitter.com/CodeThisCodeTh1
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacine-mahdid-809425163/
Instagram: https://www.instagram.com/yacine_mahdid/
___
Have a great week! 👋