In this tutorial we develop a simple yet effective topic modeling method in Python using NLP, deep learning, and unsupervised clustering. We use it to extract topics from reviews of drug/medication side effects in a dataset of ~3,000 patients, and we assign summary keywords to each topic so the themes in the text are easy to understand.
Although the tutorial applies it to drug side effect reviews, this model can be applied to any text dataset. For example, it could be used for topic modeling of Amazon product reviews.
We develop a method to extract or construct topics from the text by combining KMeans from scikit-learn, Google's Universal Sentence Encoder, and pretrained sentiment analysis models from the Hugging Face platform.
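The clustering step at the core of this approach can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the function name `cluster_embeddings` is hypothetical, and random 512-dimensional vectors stand in for the sentence embeddings that the Universal Sentence Encoder would produce for each review.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_embeddings(embeddings, n_topics, seed=0):
    """Group sentence embeddings into n_topics clusters; return one label per row."""
    model = KMeans(n_clusters=n_topics, n_init=10, random_state=seed)
    return model.fit_predict(embeddings)


# Stand-in data (assumption): two well-separated groups of 512-dimensional
# vectors, used here in place of real sentence embeddings of the reviews.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.1, size=(5, 512))
group_b = rng.normal(loc=1.0, scale=0.1, size=(5, 512))
fake_embeddings = np.vstack([group_a, group_b])

# Each review receives the index of the cluster (topic) it falls into.
labels = cluster_embeddings(fake_embeddings, n_topics=2)
print(labels)
```

Each cluster label plays the role of a topic; the number of topics is a parameter you choose up front, which is a property of KMeans itself.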
Traditional methods include LDA, LSA, and truncated SVD. Those methods rely on bag-of-words encodings of the text, whereas here we build a topic modeling algorithm that leverages deep learning models from the NLP space.
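For comparison, a bag-of-words baseline in the style of LSA might look like the sketch below, which applies truncated SVD to a TF-IDF term matrix with scikit-learn. The four example reviews are invented for illustration and are not taken from the dataset.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical example reviews, made up for this sketch.
reviews = [
    "severe headache and nausea after the first dose",
    "nausea and headache every morning",
    "dry mouth and trouble sleeping at night",
    "trouble sleeping and dry mouth since starting",
]

# Bag-of-words encoding: each review becomes a sparse vector of term weights.
vectorizer = TfidfVectorizer(stop_words="english")
term_matrix = vectorizer.fit_transform(reviews)

# Truncated SVD projects the term vectors onto a small number of latent topics.
svd = TruncatedSVD(n_components=2, random_state=0)
doc_topics = svd.fit_transform(term_matrix)
print(doc_topics.shape)  # (4, 2)
```

Because the encoding depends only on which words appear, two reviews that describe the same side effect in different words share little signal here, which is the gap sentence embeddings aim to close.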
Link to the GitHub repository with the topic modeling Python notebook:
https://github.com/michaelgcortes/topic-modeling