Go in depth into BERTopic with its creator, Maarten Grootendorst. We explore three important pillars of the package: modularity, variations, and visualizations. Each pillar demonstrates how BERTopic gives control back to the developer, making it a one-stop shop for topic modeling. This video also demonstrates BERTopic's basic capabilities and some advanced tricks that both new and experienced users of BERTopic may enjoy.
Maarten is an open-source developer and maintainer (BERTopic, PolyFuzz, KeyBERT), a data scientist, and a psychologist.
===
Join the Cohere Discord: https://discord.gg/co-mmunity
Discussion thread for this episode (feel free to ask questions): https://discord.com/channels/954421988141711382/1032682672230768681
Maarten on Twitter: https://twitter.com/MaartenGr
BERTopic: https://maartengr.github.io/BERTopic/
BERTopic on GitHub: https://github.com/MaartenGr/BERTopic
BERTopic paper: https://arxiv.org/abs/2203.05794
===
Contents
0:00 Introduction
0:54 Maarten's introduction
1:44 BERTopic installation
3:19 What is Topic Modeling?
4:57 How BERTopic approaches Topic Modeling
9:04 Modularity, use the components you want (BERTopic Pillar #1)
11:17 Code demo of BERTopic
16:55 Visualization (BERTopic Pillar #2)
23:19 Variations on the pipeline (BERTopic Pillar #3)
29:44 Tips on evaluating topic modeling
31:42 Should a document have more than one topic?
33:33 Short texts vs. long texts in BERTopic
35:17 API Design philosophy
38:51 Intro to KeyBERT
40:41 Intro to PolyFuzz
42:15 Multilingual text in BERTopic
43:03 Dealing with the (-1) noise cluster
43:59 How BERTopic compares to LDA or Top2Vec
46:26 What happens after topic modeling? Is it used in online systems?
48:00 Using GPT language models in the pipeline
49:44 How people can help BERTopic