During this workshop, Vincent will guide you through a small project in which he tries to detect programming languages mentioned in transcripts of the popular Talk Python podcast. He uses spaCy for the project, and the task serves as a motivating example for kicking the tires of the library.
Timeline:
00:00:05 - Introduction to PyLadies Amsterdam
00:01:51 - Introducing Vincent D. Warmerdam
00:02:40 - Introduction to spaCy
00:14:43 - Pre-trained spaCy model
00:16:30 - Using pre-trained English language spaCy model
00:27:05 - Rendering docs with displaCy
00:31:37 - Detecting programming languages with spaCy
00:47:39 - spaCy Universe
00:48:34 - ChatGPT and NER
00:52:33 - ChatGPT vs spaCy for NER
00:56:42 - spaCy integrations with LLMs
01:02:22 - Pre-trained spaCy models for NER
01:08:59 - GLiNER
01:12:48 - Spancat
01:14:15 - Detecting programming languages with GLiNER
01:18:46 - GLiNER spaCy integrations
01:20:15 - Closing remarks and Q&A
01:22:02 - Announcements
01:24:42 - Getting Started with NLP and spaCy Course
GitHub repo:
https://github.com/pyladiesams/nlp-projects-with-spacy-may2024
Speaker:
Vincent D. Warmerdam
https://twitter.com/fishnets88
https://www.linkedin.com/in/vincentwarmerdam
Vincent is a senior data professional who has worked as an engineer, researcher, team lead, and educator. You might know him from tech talks in which he defends common sense over hype in data science.
Vincent is especially interested in understanding algorithmic systems so that their failures can be prevented. For that reason, he prefers simpler solutions that scale over the latest and greatest from the hype cycle. He currently works at probabl, collaborating on new tools for the scikit-learn ecosystem. Before that, he was a machine learning engineer at Explosion, the company behind spaCy.