Entropy, cross-entropy, and KL divergence are often used in machine learning, in particular for training classifiers. This short video explains where they come from and why we use them in ML.
Paper:
* "A Mathematical Theory of Communication", Claude E. Shannon, 1948, http://pubman.mpdl.mpg.de/pubman/item/escidoc:2383164/component/escidoc:2383163/Shannon_Weaver_1949_Mathematical.pdf
Errata:
* At 5:05, the sign is reversed on the second line. It should read: "Entropy = -0.35 log2(0.35) - ... - 0.01 log2(0.01) = 2.23 bits"
* At 8:43, the predicted probabilities should always add up to 100%. Just pretend that I wrote, say, 23% instead of 30% for the Dog probability, and everything's fine.
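If you want to check numbers like these yourself, here is a minimal Python sketch of the three quantities, measured in bits. It is not taken from the video: the function names and the four-class distribution are hypothetical, chosen only so the arithmetic is easy to verify by hand. Note the leading minus sign in the entropy (the point of the 5:05 erratum) and the check that predicted probabilities sum to 1 (the point of the 8:43 erratum).

```python
import math


def entropy(p):
    """Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i)."""
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy in bits: H(p, q) = -sum_i p_i * log2(q_i)."""
    # A predicted distribution must sum to 1 (i.e. 100%).
    if not math.isclose(sum(q), 1.0, abs_tol=1e-9):
        raise ValueError("predicted probabilities must sum to 1")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # this outcome never happens, so it costs nothing
        if qi == 0:
            return math.inf  # the prediction ruled out something that happens
        total -= pi * math.log2(qi)
    return total


def kl_divergence(p, q):
    """KL divergence in bits: D_KL(p || q) = H(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)


# Hypothetical example, not the distribution used in the video.
p = [0.5, 0.25, 0.125, 0.125]   # true distribution
q = [0.25, 0.25, 0.25, 0.25]    # predicted distribution (uniform)

print(entropy(p))           # 1.75
print(cross_entropy(p, q))  # 2.0
print(kl_divergence(p, q))  # 0.25
```

Passing a prediction that does not sum to 1, such as `[0.25, 0.25, 0.25, 0.30]`, raises a `ValueError` in this sketch instead of silently returning a misleading number.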
The painting on the first slide is by Annie Clavel, a great French artist currently living in Los Angeles. The painting is reproduced with her kind authorization. Please visit her website: http://www.annieclavel.com/.