Practice your Python Pandas data science skills with problems on StrataScratch!
https://stratascratch.com/?via=keith
In this video we walk through a real-world Python machine learning project using the scikit-learn library. We work our way up to building a model that automatically classifies text as having either positive or negative sentiment, using Amazon reviews as our training data. Full video timeline in the comments!
Link to Code & Data:
https://github.com/keithgalli/sklearn
Raw Data download:
http://jmcauley.ucsd.edu/data/amazon/
Scikit-learn documentation:
https://scikit-learn.org/stable/documentation.html
Make sure you have scikit-learn installed! To do this, either run "pip install scikit-learn" (the library is imported as sklearn, but the package to install is named scikit-learn) or use Python through Anaconda.
Join the Python Army to get access to perks!
YouTube - https://www.youtube.com/channel/UCq6XkhO5SZ66N04IcPbqNcw/join
Patreon - https://www.patreon.com/keithgalli
---------------------------
Follow me on social media!
Instagram: https://www.instagram.com/keithgalli/
Twitter: https://twitter.com/keithgalli
To get one of the cool shirts I was wearing:
https://www.instagram.com/pagandvls/
---------------------------
Video outline!
0:00 - What we will be doing!
3:40 - Scikit-learn Overview
6:38 - How do we find training data?
9:33 - Download data
11:45 - Load our data into Jupyter Notebook
16:38 - Cleaning our code a bit (building data class)
20:13 - Using Enums
22:50 - Converting text to numerical vectors, bag of words (BOW) explanation
25:45 - Training/Test Split (make sure to "pip install scikit-learn"!)
33:45 - Bag of words in sklearn (CountVectorizer)
40:05 - fit_transform, fit, transform methods
42:05 - Model Selection (SVM, Decision Tree, Naive Bayes, Logistic Regression) & Classification
47:50 - predict method
53:35 - Analysis & Evaluation (using clf.score() method)
56:58 - F1 score
1:01:01 - Improving our model (evenly distributing positive & negative examples and loading in more data)
1:20:36 - Let's see our model in action! (qualitative testing)
1:22:24 - TfidfVectorizer
1:25:40 - GridSearchCV to automatically find the best parameters
1:31:30 - Further NLP improvement opportunities
1:32:50 - Saving our model (Pickle) and reloading it later
1:36:37 - Category Classifier
1:39:14 - Confusion Matrix
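For a rough idea of how the core steps in the outline fit together (bag of words, fit_transform/transform, a linear SVM, predict), here is a minimal sketch. The four training reviews and two test reviews are made up for illustration and are not from the Amazon dataset; the code in the video and the linked repo is more complete and may differ.

```python
from sklearn import svm
from sklearn.feature_extraction.text import CountVectorizer

# Made-up toy reviews for illustration only (not from the Amazon data).
train_x = [
    "great book, loved it",
    "excellent and fun read",
    "terrible, waste of money",
    "bad book, hated it",
]
train_y = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]
test_x = ["loved it, excellent", "terrible, hated it"]

# Bag of words: learn the vocabulary on the training text only,
# then reuse that same vocabulary to vectorize the test text.
vectorizer = CountVectorizer()
train_vectors = vectorizer.fit_transform(train_x)
test_vectors = vectorizer.transform(test_x)

# Fit a linear SVM on the word-count vectors.
clf = svm.SVC(kernel="linear")
clf.fit(train_vectors, train_y)

# Predict the sentiment label of each unseen review.
predictions = clf.predict(test_vectors)
for text, label in zip(test_x, predictions):
    print(f"{label}: {text}")
```

TfidfVectorizer exposes the same fit_transform/transform interface, so it can be dropped in where CountVectorizer is used here.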
---------------------
If you are curious to learn how I make my tutorials, check out this video: https://youtu.be/LEO4igyXbLs
*I use affiliate links on the products that I recommend. I may earn a purchase commission or a referral bonus from the usage of these links.