Earth Mover's Distance and Maximum Mean Discrepancy | Unsupervised Learning for Big Data

Krishnaswamy Lab 5,015 3 years ago


Much of how we make sense of datapoints comes from figuring out how close to or far from other datapoints they are. But what happens when, as is increasingly common, our datapoints are themselves datasets? How do you measure the distance between two patients when you have hundreds of measurements for each one? This lecture presents an elegant solution to the problem: the Earth Mover's Distance (EMD). Alas, computing EMD exactly is often too expensive for large datasets, so we present a shortcut to a (partial) truth via the Maximum Mean Discrepancy (MMD), another application of the absurdly useful kernel trick.

This is part of a series of lectures from the Yale class "Unsupervised Learning for Big Data", taught by Professor Smita Krishnaswamy. Unsupervised learning is perhaps the most beautiful and most frequently astonishing area of machine learning. It doesn't need to guzzle tons of labeled data to solve problems by brute force. Instead, it uses elegant mathematical principles to understand (in some sense) the data itself and the patterns underlying it. Because this is a young field, there's no established textbook. Unsupervised learning is a collection of methods, and this course introduces several of the most useful ones, grounded in an intuitive understanding of the principles underlying them.

The tools from this class have been applied to an incredible range of problems, from molecular biology to financial modeling, medicine, and even astrophysics. We're making these lectures publicly available so that anyone can more easily use these powerful and elegant techniques in their own research. To learn more about the Krishnaswamy Lab's work, visit krishnaswamylab.org
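To make the two distances mentioned above concrete, here is a minimal sketch in plain Python. It is not code from the lecture. The function names (`emd_1d`, `gaussian_kernel`, `mean_kernel`, `mmd_squared`), the Gaussian kernel, and the `bandwidth` parameter are all assumptions chosen for illustration. The EMD function only covers the easy special case of two equal-size, one-dimensional samples, where the optimal transport plan simply matches sorted values; the general multi-dimensional case requires solving an optimization problem, which is where the cost comes from. The MMD function uses the biased (V-statistic) estimate, which needs only pairwise kernel evaluations.

```python
import math


def emd_1d(xs, ys):
    """EMD between two equal-size 1-D samples with uniform weights.

    In one dimension the optimal plan pairs the i-th smallest value of xs
    with the i-th smallest value of ys, so sorting is enough.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as sequences of floats."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def mean_kernel(A, B, bandwidth):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    total = sum(gaussian_kernel(a, b, bandwidth) for a in A for b in B)
    return total / (len(A) * len(B))


def mmd_squared(X, Y, bandwidth=1.0):
    """Biased estimate of squared MMD between two sets of points.

    Assumption: a Gaussian kernel with a hand-picked bandwidth.
    """
    return (
        mean_kernel(X, X, bandwidth)
        + mean_kernel(Y, Y, bandwidth)
        - 2.0 * mean_kernel(X, Y, bandwidth)
    )


# Hypothetical "patients", each a small dataset of 2-D measurements.
patient_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3)]
patient_b = [(2.0, 2.1), (1.9, 2.3), (2.2, 2.0)]

print(emd_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # every point moves by 1
print(mmd_squared(patient_a, patient_a))          # identical datasets
print(mmd_squared(patient_a, patient_b))          # well-separated datasets
```

The MMD estimate costs one kernel evaluation per pair of points, so it scales with the product of the two dataset sizes and involves no optimization. Its value depends on the kernel and bandwidth chosen, so it should be read as a kernel-dependent summary of how the two datasets differ, not as a drop-in replacement for EMD.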
