We start with the what, why, and how, then delve into the details (the math) with examples.
REFERENCES
[1] An excellent discussion of the "dying ReLU" problem: https://www.quora.com/What-is-the-dying-ReLU-problem-in-neural-networks
[2] Saturating functions that "squeeze" their inputs: https://stats.stackexchange.com/questions/174295/what-does-the-term-saturating-nonlinearities-mean
[3] Desmos, for plotting math functions beautifully: https://www.desmos.com/
[4] The Exponential Linear Unit (ELU) paper: https://arxiv.org/abs/1511.07289
[5] A relatively new activation function (Swish): https://arxiv.org/pdf/1710.05941v1.pdf
[6] The image of activation functions is from Pawan Jain's blog: https://towardsdatascience.com/complete-guide-of-activation-functions-34076e95d044
[7] Why is a bias necessary in neural networks? https://stackoverflow.com/questions/7175099/why-the-bias-is-necessary-in-ann-should-we-have-separate-bias-for-each-layer