In this project, we'll build a k-means clustering algorithm from scratch. Clustering is an unsupervised machine learning technique that groups similar data points together without needing labels. K-means is one of the most popular clustering algorithms.
We'll create our algorithm using Python and pandas. We'll then compare it to the reference implementation from scikit-learn.
You can find the full project code here: https://github.com/dataquestio/project-walkthroughs/tree/master/kmeans
You can download the data here: https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset
Project Steps
- Write out pseudocode for the algorithm
- Code the k-means algorithm
- Plot the clusters from the algorithm
- Compare performance to the scikit-learn algorithm
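The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the video or the linked repository: the function names (`scale`, `get_labels`, `new_centroids`, `kmeans`), the min-max scaling to the 0-1 range, the choice of k random rows as starting centroids, and the tiny `demo` table are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def scale(data: pd.DataFrame) -> pd.DataFrame:
    # Assumption: min-max scale every column to the 0-1 range so that
    # no single feature dominates the distance calculation.
    return (data - data.min()) / (data.max() - data.min())


def get_labels(data: pd.DataFrame, centroids: pd.DataFrame) -> pd.Series:
    # Euclidean distance from every row to every centroid (rows x k),
    # then pick the position of the closest centroid for each row.
    diffs = data.to_numpy()[:, None, :] - centroids.to_numpy()[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    return pd.Series(distances.argmin(axis=1), index=data.index)


def new_centroids(data: pd.DataFrame, labels: pd.Series) -> pd.DataFrame:
    # Assumption: each centroid moves to the arithmetic mean of its points.
    # A cluster that ends up empty is simply dropped in this sketch.
    return data.groupby(labels).mean().reset_index(drop=True)


def kmeans(data: pd.DataFrame, k: int, max_iterations: int = 100, seed: int = 0):
    # Assumption: start from k randomly chosen rows of the data.
    centroids = data.sample(n=k, random_state=seed).reset_index(drop=True)
    for _ in range(max_iterations):
        labels = get_labels(data, centroids)
        updated = new_centroids(data, labels)
        # Stop once the centroids no longer move.
        if updated.shape == centroids.shape and np.allclose(
            updated.to_numpy(), centroids.to_numpy()
        ):
            break
        centroids = updated
    return labels, centroids


# Hypothetical toy data: two well-separated groups of three points each.
demo = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 8.0, 8.2, 7.9],
    "y": [1.0, 0.9, 1.1, 8.0, 8.1, 7.8],
})
labels, centroids = kmeans(scale(demo), k=2)
print(labels.tolist())
print(centroids)
```

On the toy data, the first three rows should end up sharing one label and the last three the other. Which group gets label 0 depends on the random starting centroids.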
Chapters
00:00 Intro
00:37 k-means overview
02:51 Loading in and cleaning FIFA data
06:11 Scaling the data
10:31 Initialize random centroids
14:20 Finding cluster labels for each data point
19:29 Update centroid values
23:30 Plotting k-means iterations
28:24 Pulling the algorithm together
35:25 Comparing our implementation to scikit-learn
37:56 Conclusion and next steps