Lecture notes: http://learning.stat.purdue.edu/mlss/_media/mlss/bottou.pdf
Large-scale Machine Learning and Stochastic Algorithms
During the last decade, data sizes have outgrown processor speed. We are now frequently facing statistical machine learning problems for which datasets are virtually infinite. Computing time is then the bottleneck.
The first part of the lecture centers on the qualitative difference between small-scale and large-scale learning problems. Whereas small-scale learning problems are subject to the usual approximation–estimation tradeoff, large-scale learning problems are subject to a qualitatively different tradeoff that involves the computational complexity of the underlying optimization algorithms in non-trivial ways. Unlikely optimization algorithms such as stochastic gradient descent show amazing performance for large-scale machine learning problems.
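As a concrete illustration (mine, not from the lecture), here is a minimal Python sketch of plain stochastic gradient descent on a least-squares linear model; the function name sgd_linear, the step size, and the epoch count are illustrative assumptions. Each update touches a single randomly chosen example, which is why the per-step cost is independent of the dataset size.

```python
import numpy as np

def sgd_linear(X, y, lr=0.01, epochs=5, seed=None):
    """Plain SGD for least-squares linear regression (illustrative sketch).

    Each step uses the gradient of the squared error on one example,
    so the cost per update does not grow with the number of examples.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):  # visit examples in random order
            residual = X[i] @ w - y[i]  # prediction error on example i
            w -= lr * residual * X[i]   # single-example gradient step
    return w

if __name__ == "__main__":
    # Synthetic check: recover known weights from noisy observations.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 5))
    w_true = np.arange(5.0)
    y = X @ w_true + 0.1 * rng.normal(size=10_000)
    print(sgd_linear(X, y, seed=1))  # should be close to [0, 1, 2, 3, 4]
```

Bottou's analyses typically use a decreasing step size (on the order of 1/t); the constant step above only keeps the sketch short.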
The second part gives a detailed overview of stochastic learning algorithms applied to both linear and nonlinear models. In particular, I would like to spend time on the use of stochastic gradient for structured learning problems and on the subtle connection between nonconvex stochastic gradient descent and active learning.
See the other lectures in the Purdue MLSS playlist: http://www.youtube.com/playlist?list=PL2A65507F7D725EFB&feature=view_all