What is Batch Normalization, and why is it important in neural networks? We get into the math details too. Code is linked in the references below, followed by a short sketch of the forward pass.
Follow me on Medium: https://towardsdatascience.com/likelihood-probability-and-the-math-you-should-know-9bf66db5241b
REFERENCES
[1] The 2015 paper that introduced Batch Normalization: https://arxiv.org/abs/1502.03167
[2] The paper arguing that Batch Norm does NOT work by reducing internal covariate shift, contrary to the claim in [1]: https://arxiv.org/abs/1805.11604
[3] Using BN + Dropout: https://arxiv.org/abs/1905.05928
[4] Andrew Ng on why normalization speeds up training: https://www.coursera.org/lecture/deep-neural-network/normalizing-inputs-lXv6U
[5] Ian Goodfellow on how Batch Normalization helps regularization: https://www.quora.com/Is-there-a-theory-for-why-batch-normalization-has-a-regularizing-effect
[6] Coding Batch Normalization from scratch (gradient flow through the batch-norm layer): https://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html
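For a quick feel of what [6] walks through, here is a minimal NumPy sketch of the batch-norm forward pass in training mode. The function name, epsilon value, and toy example are illustrative assumptions, not the exact code from the reference.

import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    # x: (batch_size, num_features); gamma, beta: learnable per-feature scale and shift
    mu = x.mean(axis=0)                     # per-feature mean over the batch
    var = x.var(axis=0)                     # per-feature variance over the batch
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * x_hat + beta             # scale and shift so the layer can still represent the identity

# Toy usage: a batch of 4 samples with 3 features each
x = np.random.randn(4, 3)
out = batch_norm_forward(x, gamma=np.ones(3), beta=np.zeros(3))

At inference time, frameworks typically swap the batch statistics for running averages accumulated during training; that part is omitted here for brevity.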