A simple neural network for computer vision | CV from scratch series

Vizuara 2,269 1 month ago

Miro notes: https://miro.com/app/board/uXjVIPolTio=/?share_link_id=225966248894
Numpy vs Tensorflow Colab: https://colab.research.google.com/drive/1WaFuI2T5gUvaewAM3XiLiIrogaRLjROV?usp=sharing
Linear neural network for image classification Google Colab: https://colab.research.google.com/drive/15D91NChzSrEM2kTr9E8GI6l1FGvu6men?usp=sharing
Substack article on this topic by Vizuara AI: https://open.substack.com/pub/aivizuara/p/a-simple-neural-network-with-1-hidden?r=2nt2pq&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

*****

The Power of Simplicity: Shallow Neural Networks for Image Classification

Neural networks have revolutionized how machines interpret and classify images. But what if we strip them down to their simplest form? In this article, we explore a shallow neural network for image classification, without any activation functions or convolutions. While deep learning dominates modern AI, this exercise demonstrates that even simple models can be surprisingly effective. We will see why activation functions and depth improve accuracy, but also why a basic model is not disastrous and still captures meaningful information.

Preparing the Dataset

We will use the famous 5-Flowers Dataset, which contains real-world images of:

- Daisies
- Dandelions
- Roses
- Sunflowers
- Tulips

Each category has 1,000 JPEG images, just like those captured on your smartphone. The dataset is publicly available on GitHub and Kaggle.

Reading and Preprocessing the Data

To process these images, we:

1. Load the files: read the JPEG images from disk.
2. Decode: convert the JPEGs into pixel values (RGB channels).
3. Scale: normalize pixel values from the range [0, 255] to [0, 1].
4. Resize: ensure all images share the same dimensions (224×224 pixels).

It is important to verify the dataset pipeline by displaying sample images to confirm that they are correctly loaded and formatted.
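The four preprocessing steps above can be sketched in plain NumPy. This is only an illustration, not the Colab's implementation: a random array stands in for a decoded JPEG photo, and nearest-neighbour index sampling stands in for a proper library resize.

```python
import numpy as np

def preprocess(img_uint8, size=224):
    """Scale a decoded RGB image to [0, 1] and resize it to size x size.

    Nearest-neighbour index sampling stands in for a proper library resize.
    """
    img = img_uint8.astype(np.float32) / 255.0      # [0, 255] -> [0, 1]
    h, w, _ = img.shape
    rows = np.arange(size) * h // size              # source row per output row
    cols = np.arange(size) * w // size              # source column per output column
    return img[rows][:, cols]                       # (size, size, 3)

# A random uint8 array stands in for a decoded 480x640 JPEG photo.
photo = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
x = preprocess(photo)
print(x.shape, x.dtype)
```

After this step every image, whatever its original size, is a float array of shape (224, 224, 3) with values in [0, 1], ready to be flattened into the model's input.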
Building a Shallow Neural Network (No Activation, No Convolution)

Understanding the Model

A shallow neural network treats each pixel as an independent feature. With an image size of 224×224×3, the model has 150,528 inputs (over 150,000 input neurons) feeding into a fully connected layer that outputs a probability for each flower category. Because there are no activation functions, the network operates purely linearly: it cannot build non-linear feature representations.

Training the Model

The model is trained with an optimization algorithm that adjusts the weights to minimize classification error. Training typically runs for several epochs, during which the model repeatedly improves its fit to the dataset by updating the weights based on feedback from its predictions.

Evaluating Model Performance

After training, we analyze the loss and accuracy over the epochs to assess how well the model has learned from the data. Validation accuracy indicates whether the model generalizes to unseen data.

Making Predictions

Once trained, the model can classify new images. Sample images from the evaluation set can be tested to verify how accurately the model predicts the correct flower categories. Comparing predictions against the actual labels helps assess the model's effectiveness.

Observations and Takeaways

Our shallow model achieves around 40–45% accuracy, well above random guessing (~20% for 5 classes). While not on par with deep convolutional models, it is far from useless: the model still learns meaningful patterns from raw pixel values despite its simplicity.

Why is this result NOT disastrous?

- Baseline for improvement: this model gives us a reference point before adding complexity.
- Demonstrates learning ability: despite lacking depth, it captures information useful for classification.
- Lightweight and fast: such models are computationally inexpensive and can serve as quick prototypes.
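Viewed as softmax regression on flattened pixels, the whole shallow model can be sketched end-to-end in NumPy. Everything below is a toy stand-in, not the Colab code: the data is synthetic and far smaller than 224×224×3 flower images, but the forward pass, cross-entropy gradient, and weight updates mirror what the linear model above does.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: 200 "images" flattened to d features, 5 flower classes.
n, d, k = 200, 48, 5              # the real model has d = 224 * 224 * 3 = 150_528
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=(d, k))).argmax(axis=1)   # linearly separable labels

W = np.zeros((d, k))              # one fully connected layer, no activation
b = np.zeros(k)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

losses = []
for epoch in range(200):          # plain gradient descent on cross-entropy
    p = softmax(X @ W + b)        # softmax only turns scores into probabilities
    losses.append(-np.log(p[np.arange(n), y] + 1e-9).mean())
    g = p.copy()
    g[np.arange(n), y] -= 1.0     # gradient of cross-entropy w.r.t. the logits
    W -= 0.5 * (X.T @ g) / n
    b -= 0.5 * g.mean(axis=0)

accuracy = (softmax(X @ W + b).argmax(axis=1) == y).mean()
print(f"loss {losses[0]:.3f} -> {losses[-1]:.3f}, train accuracy {accuracy:.2f}")
```

The falling loss curve here is the same signal we inspect when evaluating the real model: the single linear layer does learn, which is exactly why the 40–45% result on real flowers is not disastrous.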
How to Improve the Model

- Use a deep neural network (DNN): adding multiple layers allows the model to learn hierarchical features.
- Introduce non-linear activation functions: functions like ReLU allow the network to capture complex patterns.
- Implement a convolutional neural network (CNN): convolutions help the model recognize spatial relationships between pixels.

What's Next?

In the next article, we will build a deep neural network to compare performance improvements. Stay tuned!

Check out the Google Colab implementation to experiment with the model. Watch the lecture video on Vizuara's YouTube channel!

Thanks for reading Vizuara's AI Newsletter. Subscribe for free to get more hands-on AI tutorials!
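As a closing illustration of why the first two improvements matter, the toy NumPy sketch below trains the same kind of linear classifier and a one-hidden-layer ReLU network on an XOR-style problem. The task and all sizes are invented for the demo; no straight line separates the classes, so the linear model stays near chance while the ReLU network solves it.

```python
import numpy as np

rng = np.random.default_rng(1)

# XOR-style task: class 1 when the two features share a sign.
n, h = 400, 16
X = rng.uniform(-1.0, 1.0, size=(n, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# --- linear baseline (same shape as the shallow model above) ---
Wl, bl = np.zeros((2, 2)), np.zeros(2)
for _ in range(1000):
    G = softmax(X @ Wl + bl)
    G[np.arange(n), y] -= 1.0            # cross-entropy gradient at the logits
    Wl -= X.T @ G / n
    bl -= G.mean(axis=0)
linear_acc = (softmax(X @ Wl + bl).argmax(axis=1) == y).mean()

# --- one hidden layer with ReLU: depth plus non-linearity ---
W1, b1 = rng.normal(scale=0.5, size=(2, h)), np.zeros(h)
W2, b2 = rng.normal(scale=0.5, size=(h, 2)), np.zeros(2)
for _ in range(3000):
    Z = np.maximum(X @ W1 + b1, 0.0)     # ReLU introduces the non-linearity
    G = softmax(Z @ W2 + b2)
    G[np.arange(n), y] -= 1.0
    dZ = (G @ W2.T) * (Z > 0)            # backprop through the ReLU
    W2 -= Z.T @ G / n;  b2 -= G.mean(axis=0)
    W1 -= X.T @ dZ / n; b1 -= dZ.mean(axis=0)
Z = np.maximum(X @ W1 + b1, 0.0)
hidden_acc = (softmax(Z @ W2 + b2).argmax(axis=1) == y).mean()
print(f"linear {linear_acc:.2f} vs one hidden ReLU layer {hidden_acc:.2f}")
```

The same effect, at a much larger scale, is what the deep network in the next article should show on the flower images.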
