Momentum is great, but it would be even better if the gradient descent steps could slow down as they approach the bottom of a minimum. That is Nesterov Accelerated Gradient in a nutshell, check it out!
Code can be found here: https://github.com/yacineMahdid/artificial-intelligence-and-machine-learning
## Credit
Check out this blog post for a more detailed explanation of gradient descent optimizers: https://ruder.io/optimizing-gradient-descent/index.html#nesterovacceleratedgradient
The music is taken from YouTube Music!
## Table of Contents
- Introduction:
- Theory:
- Python Implementation:
- Conclusion:
Here is an explanation of Nesterov Accelerated Gradient from the very cool blog post mentioned in the credit section (check it out!):
"Nesterov accelerated gradient (NAG) [see reference] is a way to give our momentum term this kind of prescience. We know that we will use our momentum term γvt−1 to move the parameters θ. Computing θ−γvt−1 thus gives us an approximation of the next position of the parameters (the gradient is missing for the full update), a rough idea where our parameters are going to be. We can now effectively look ahead by calculating the gradient not w.r.t. to our current parameters θ but w.r.t. the approximate future position of our parameters:"
## Reference
Nesterov, Y. (1983). A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR (translated as Soviet Math. Dokl.), vol. 269, pp. 543–547.
----
Join the Discord for general discussion: https://discord.gg/QpkxRbQBpf
----
Follow Me Online Here:
Twitter: https://twitter.com/CodeThisCodeTh1
GitHub: https://github.com/yacineMahdid
LinkedIn: https://www.linkedin.com/in/yacine-mahdid-809425163/
Instagram: https://www.instagram.com/yacine_mahdid/
___
Have a great week! 👋