K-means Clustering From Scratch In Python [Machine Learning Tutorial]

Dataquest 93,953 3 years ago

In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that can find patterns in your data. K-means is one of the most popular forms of clustering. We'll create our algorithm using Python and pandas. We'll then compare it to the reference implementation from scikit-learn.

You can find the full project code here - https://github.com/dataquestio/project-walkthroughs/tree/master/kmeans .
You can download the data here - https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset .

Project Steps
- Write out pseudocode for the algorithm
- Code the k-means algorithm
- Plot the clusters from the algorithm
- Compare performance to the scikit-learn algorithm

Chapters
00:00 Intro
00:37 k-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps

------------------------------
Join 1M+ Dataquest learners today! Master data skills and change your life. Sign up for free: https://bit.ly/3O8MDef
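The chapter list above follows the usual k-means loop: scale the data, initialize random centroids, assign each point to its nearest centroid, update the centroids, and repeat. The block below is a minimal sketch of that loop, not the code from the video or the linked repository. The toy DataFrame, its column names, the min-max scaling to [0, 1], the choice of random rows as starting centroids, and every function name are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def init_centroids(X, k, rng):
    """Pick k distinct rows of X as starting centroids (one common choice, assumed here)."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].copy()


def assign_labels(X, centroids):
    """Label each point with the index of its nearest centroid (Euclidean distance)."""
    # Broadcasting gives a (n_points, k) matrix of distances.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def update_centroids(X, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new = centroids.copy()
    for j in range(len(centroids)):
        members = X[labels == j]
        if len(members) > 0:  # leave an empty cluster's centroid where it is
            new[j] = members.mean(axis=0)
    return new


def kmeans(X, k, max_iter=100, seed=0):
    """Alternate assignment and update until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = init_centroids(X, k, rng)
    for _ in range(max_iter):
        labels = assign_labels(X, centroids)
        new_centroids = update_centroids(X, labels, centroids)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return assign_labels(X, centroids), centroids


# Hypothetical toy data standing in for the player dataset.
players = pd.DataFrame({
    "overall": [90, 88, 91, 60, 63, 58],
    "age": [30, 31, 29, 19, 20, 18],
})

# Min-max scale each column so no single feature dominates the distance.
scaled = (players - players.min()) / (players.max() - players.min())

labels, centroids = kmeans(scaled.to_numpy(dtype=float), k=2)
players["cluster"] = labels
print(players)
```

On data this well separated, the first three rows and the last three rows should end up in different clusters, though which cluster gets the label 0 depends on the random start. To mirror the comparison step in the description, the same scaled array could be passed to scikit-learn's KMeans and the two sets of labels compared.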
