In this video, we are going to learn about HDBSCAN, a density-based clustering algorithm, and then apply it to find clusters of weekly sales transactions. HDBSCAN can be used with any distance measure, but we will use only two: Euclidean distance and Dynamic Time Warping (DTW). We will see how the clustering results differ between the two.
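Why the choice of distance matters for weekly sales can be shown with a small sketch. It assumes classic DTW with absolute difference as the local cost and no warping-window constraint. The notebooks linked below may compute DTW differently, for example through a library. The helper names `dtw_distance`, `euclidean` and `pairwise_distances` are hypothetical. A precomputed matrix like the one built here can be passed to an HDBSCAN implementation that accepts `metric="precomputed"`, such as the `hdbscan` package or `sklearn.cluster.HDBSCAN`.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between two 1-D series.
    Assumptions: |a_i - b_j| as local cost, no warping-window constraint."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # advance in a only
                                 cost[i, j - 1],      # advance in b only
                                 cost[i - 1, j - 1])  # advance in both
    return float(cost[n, m])

def euclidean(a, b):
    """Point-by-point Euclidean distance (series must have equal length)."""
    return float(np.linalg.norm(np.asarray(a, float) - np.asarray(b, float)))

def pairwise_distances(series, dist):
    """Symmetric matrix of dist() over all pairs of series (zero diagonal)."""
    k = len(series)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            D[i, j] = D[j, i] = dist(series[i], series[j])
    return D

x = [0, 0, 1, 5, 1, 0, 0]   # a one-week sales spike
y = [0, 1, 5, 1, 0, 0, 0]   # the same spike, one week earlier
print(dtw_distance(x, y))   # 0.0: DTW aligns the shifted spikes
print(euclidean(x, y))      # about 5.83 (sqrt(34)): the shift is penalised
```

Under Euclidean distance the two products look different because their peaks fall in different weeks; under DTW they look identical. That difference in what counts as "similar" is what drives the different clusterings.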
Source code:
https://www.kaggle.com/code/leesstephanie/sales-clustering-with-hdbscan/notebook (real data set)
https://www.kaggle.com/leesstephanie/hdbscan-for-time-series-clustering (synthetic data set)
https://github.com/stephanielees/HDBSCAN_WeeklySales
More on linkage: https://youtube.com/clip/Ugkx1r3GK144oS4SAi-2L2f2INBNc8D9z39_?si=GXSrJF_7bVQ_X4t9
00:00 Intro
01:16 The intuition of HDBSCAN
01:53 Preparing to walk through the HDBSCAN algorithm
05:58 Core distance
07:18 Mutual reachability distance
10:38 Minimum Spanning Tree
11:26 Single Linkage Tree, Condensed Tree
20:39 Cluster selection
Application in Python:
24:09 Load data
26:51 Visualization
28:39 Apply HDBSCAN with Euclidean distance
33:35 Apply HDBSCAN with DTW distance
37:07 Discussion
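The first algorithm steps in the chapters above (core distance, mutual reachability distance, minimum spanning tree, single-linkage order) can be sketched on a precomputed distance matrix. This is a minimal illustration under stated assumptions, not the implementation used in the video. The core distance here excludes the point itself from its neighbours; libraries differ on this convention. The MST is built with Prim's algorithm on a dense matrix. The condensed tree and cluster selection steps are left out. The function names are hypothetical.

```python
import numpy as np

def core_distances(D, k):
    """Distance from each point to its k-th nearest neighbour.
    Assumption: the point itself is not counted as a neighbour."""
    # Each sorted row starts with the point's zero distance to itself.
    return np.sort(D, axis=1)[:, k]

def mutual_reachability(D, k):
    """d_mreach(a, b) = max(core_k(a), core_k(b), d(a, b))."""
    core = core_distances(D, k)
    M = np.maximum(D, np.maximum.outer(core, core))
    np.fill_diagonal(M, 0.0)
    return M

def minimum_spanning_tree(M):
    """Prim's algorithm on a dense matrix. Returns (u, v, weight) edges sorted
    by weight, which is the merge order of the single-linkage tree."""
    n = M.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)   # cheapest known edge from the tree to each vertex
    parent = np.full(n, -1)
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the cheapest vertex that is not yet in the tree.
        u = int(np.argmin(np.where(in_tree, np.inf, best)))
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((int(parent[u]), u, float(best[u])))
        # Update the cheapest connection for vertices still outside the tree.
        closer = (~in_tree) & (M[u] < best)
        best[closer] = M[u][closer]
        parent[closer] = u
    return sorted(edges, key=lambda e: e[2])

pts = np.array([0, 1, 2, 10, 11, 12], dtype=float)
D = np.abs(pts[:, None] - pts[None, :])        # any precomputed distances work
edges = minimum_spanning_tree(mutual_reachability(D, k=1))
print(edges[-1])  # (2, 3, 8.0): the heaviest edge separates {0, 1, 2} from {10, 11, 12}
```

Cutting MST edges from heaviest to lightest gives the single-linkage hierarchy. HDBSCAN then condenses that hierarchy using a minimum cluster size and selects the most stable clusters. Because `D` is just a matrix, the same steps apply whether it holds Euclidean or DTW distances.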
#timeseries #clustering #machinelearning #retailsales #sales #datascience #pythonprogramming #timeseriesclustering