BERTopic for Topic Modeling - Maarten Grootendorst - Talking Language AI Ep#1

Cohere 33,433 2 years ago

Go in-depth into BERTopic with creator Maarten Grootendorst. We explore three important pillars of the package: modularity, variations, and visualizations. Each pillar demonstrates how BERTopic gives control back to the developer, making it a one-stop shop for topic modeling. This video also demonstrates BERTopic's basic capabilities and some advanced tricks that both new and advanced users of BERTopic may enjoy.

Maarten is an open source developer and maintainer (BERTopic, PolyFuzz, KeyBERT), data scientist, and psychologist.

Join the Cohere Discord: https://discord.gg/co-mmunity
Discussion thread for this episode (feel free to ask questions): https://discord.com/channels/954421988141711382/1032682672230768681
Maarten on Twitter: https://twitter.com/MaartenGr
BERTopic: https://maartengr.github.io/BERTopic/
BERTopic on GitHub: https://github.com/MaartenGr/BERTopic
BERTopic paper: https://arxiv.org/abs/2203.05794

Contents
0:00 Introduction
0:54 Maarten's introduction
1:44 BERTopic installation
3:19 What is Topic Modeling?
4:57 How BERTopic approaches Topic Modeling
9:04 Modularity, use the components you want (BERTopic Pillar #1)
11:17 Code demo of BERTopic
16:55 Visualization (BERTopic Pillar #2)
23:19 Variations on the pipeline (BERTopic Pillar #3)
29:44 Tips on evaluating topic modeling
31:42 Should a document have more than one topic?
33:33 Short texts vs. long texts in BERTopic
35:17 API Design philosophy
38:51 Intro to KeyBERT
40:41 Intro to PolyFuzz
42:15 Multilingual text in BERTopic
43:03 Dealing with the (-1) noise cluster
43:59 How BERTopic compares to LDA or Top2Vec
46:26 What happens after topic modeling? Is it used in online systems?
48:00 Using GPT language models in the pipeline
49:44 How people can help BERTopic
