Miro notes: https://miro.com/app/board/uXjVIPXuHKk=/?share_link_id=843762054496
Colab code: https://colab.research.google.com/drive/1m4JuGfPdqL59SF1X_6lcRT-NEKy4azov?usp=sharing
Building a Two-Layer Neural Network from Scratch for Image Classification: A Step in the Computer Vision Journey
In this lecture of Computer Vision from Scratch, we take a meaningful leap from linear models to a slightly deeper neural network. The goal is to test whether adding hidden layers and non-linearity helps us improve image classification accuracy on the Five Flowers Dataset — which includes daisies, dandelions, roses, sunflowers, and tulips.
From Linear Models to Neural Networks
Previously, we used a simple linear model that flattened the image input and connected it directly to the output layer through a softmax function. While that allowed us to classify flower images to some extent, the model was limited in its capacity to learn the complex, nonlinear patterns in image data. Training accuracy ranged between 0.4 and 0.6, and validation accuracy fluctuated heavily, a clear sign of instability and overfitting.
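For reference, that baseline amounts to a single dense softmax layer applied to the flattened pixels. A minimal Keras sketch, which may differ from the exact code in the Colab notebook:

```python
import tensorflow as tf

# Linear baseline: flatten the 224x224x3 image and map it straight to 5 class scores.
linear_model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Flatten(),                       # 150,528 input values
    tf.keras.layers.Dense(5, activation="softmax"),  # 150,528*5 + 5 = 752,645 parameters
])
```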
The Shift to a Two-Layer Neural Network
This time, we moved to building a two-layer neural network, adding a hidden layer between the flattened input and the output (a code sketch follows the list):
Flatten Layer: Converts each 224x224 RGB image into a 1D vector of 224 × 224 × 3 = 150,528 values.
Dense Layer with 128 Neurons: The hidden layer.
Output Layer: Consists of 5 nodes (for 5 flower classes) with a softmax activation function.
ReLU Activation: Used in the hidden dense layer to introduce non-linearity.
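Here is a minimal Keras sketch of that architecture, compiled with the optimizer settings used later in the lecture (Adam, learning rate 0.001). The loss variant is my assumption, since it depends on whether the labels are integers or one-hot vectors:

```python
import tensorflow as tf

# Two-layer network: a 128-neuron hidden ReLU layer followed by a 5-way softmax output.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.Flatten(),                        # 224*224*3 = 150,528 values
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer with non-linearity
    tf.keras.layers.Dense(5, activation="softmax"),   # one probability per flower class
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="sparse_categorical_crossentropy",  # assumes integer labels; use categorical_crossentropy for one-hot
    metrics=["accuracy"],
)
model.summary()  # prints the per-layer parameter counts discussed below
```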
Why Activation Functions Matter
Without activation functions, a deep network remains equivalent to a single-layer linear model, as the multiple matrix multiplications collapse into one. This is why ReLU (Rectified Linear Unit) was introduced — to allow the model to learn nonlinear relationships in the image data.
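A quick NumPy check of that collapse (toy dimensions stand in for the real 150,528 → 128 → 5 network): two dense layers with no activation between them compute exactly the same function as one dense layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=6)                                  # a tiny flattened "image"
W1, b1 = rng.normal(size=(4, 6)), rng.normal(size=4)    # hidden layer, no activation
W2, b2 = rng.normal(size=(3, 4)), rng.normal(size=3)    # output layer, no activation

stacked = W2 @ (W1 @ x + b1) + b2                       # two linear layers in a row
collapsed = (W2 @ W1) @ x + (W2 @ b1 + b2)              # one equivalent linear layer
print(np.allclose(stacked, collapsed))                  # True: without ReLU, depth adds nothing
```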
Parameter Explosion
The new architecture increased the number of trainable parameters dramatically, from roughly 750,000 in the linear model to roughly 19 million in the two-layer network; the 150,528 flattened inputs feeding 128 hidden neurons account for nearly all of them. In theory this extra capacity should help, but performance doesn't always scale with parameter count.
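The jump is easy to verify by hand for the architecture described above, counting weights plus biases for each dense layer:

```python
inputs = 224 * 224 * 3                     # 150,528 flattened pixel values

# Linear baseline: one dense layer from 150,528 inputs to 5 outputs.
linear_params = inputs * 5 + 5             # 752,645

# Two-layer network: 128-neuron hidden layer plus the 5-way output layer.
hidden_params = inputs * 128 + 128         # 19,267,712
output_params = 128 * 5 + 5                # 645

print(linear_params, hidden_params + output_params)   # 752645 19268357
```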
Training Observations
Using the Adam optimizer with a learning rate of 0.001, we trained both models on batches of 16 images. While the loss dropped dramatically (from tens to single digits), classification accuracy did not improve significantly. This may seem counterintuitive, but here's why:
Lower Loss, Same Accuracy: Even though the model's predictions became more confident (lower cross-entropy loss), they weren't necessarily more often correct. If the model's confidence in the correct class rises (say, from 60% to 95%), the loss drops, but the predicted class, and therefore accuracy, stays the same.
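A tiny numeric illustration: the cross-entropy loss of a single example is the negative log of the probability assigned to the true class, so growing confidence in an already correct prediction shrinks the loss without changing accuracy.

```python
import math

before = -math.log(0.60)   # about 0.51: correct class predicted with 60% confidence
after  = -math.log(0.95)   # about 0.05: correct class predicted with 95% confidence
print(round(before, 2), round(after, 2))

# In both cases argmax picks the same (correct) class, so accuracy is identical,
# even though the loss has dropped by an order of magnitude.
```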
Hyperparameter Experiments
To investigate further, we explored how varying the image size and batch size affects model performance (a sketch of such a sweep follows the list):
Smaller images: Faster training, but possibly reduced performance due to the loss of fine detail.
Larger batch sizes: Smoother loss curves, but no dramatic improvement in results.
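A rough sketch of how such a sweep can be run with tf.keras. The dataset path, the specific image sizes and batch sizes, and the epoch count are illustrative placeholders; the loop in the Colab notebook may look different:

```python
import tensorflow as tf

def build_model(image_size: int) -> tf.keras.Model:
    """The two-layer network from above, parameterised by input resolution."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(image_size, image_size, 3)),
        tf.keras.layers.Rescaling(1.0 / 255),           # scale pixels to [0, 1]
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(5, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(0.001),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

for image_size in (64, 128, 224):      # smaller images train faster but lose detail
    for batch_size in (16, 64):        # larger batches give smoother loss curves
        train_ds = tf.keras.utils.image_dataset_from_directory(
            "flowers/train",           # placeholder path to the Five Flowers images
            image_size=(image_size, image_size),
            batch_size=batch_size,
        )
        model = build_model(image_size)
        model.fit(train_ds, epochs=5)  # epoch count chosen arbitrarily for the sketch
```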
Key Takeaways
Activation functions are crucial for introducing non-linearity and unlocking the power of deep neural networks.
Deeper isn’t always better — especially when limited by dataset size and computational resources.
Loss and accuracy are not the same — you can improve one without the other.
Hyperparameter tuning (batch size, image resolution, learning rate) plays a massive role in model performance and needs careful experimentation.
We are still learning from scratch — starting with modest accuracy is expected and acceptable.
What’s Next?
In the upcoming lecture, we’ll dive into regularization techniques like dropout to combat overfitting and experiment with deeper architectures. We'll also explore transfer learning, where we leverage pretrained models to improve performance without starting from zero.
If you’ve made it this far, give yourself a pat on the back. This was a dense but pivotal lecture. You’ve just built your first true neural network from scratch. Stick around — the journey is only beginning.