00:00:00 - Using Hugging Face
00:03:24 - Fine-tuning a pretrained model
00:05:14 - ULMFiT
00:09:15 - Transformer
00:10:52 - Zeiler & Fergus
00:14:47 - US Patent Phrase to Phrase Matching Kaggle competition
00:16:10 - NLP Classification
00:20:56 - Kaggle configs, insert python in bash, read competition website
00:24:51 - pandas, NumPy, Matplotlib, & PyTorch
00:29:26 - Tokenization
00:33:20 - Hugging Face model hub
00:36:40 - Examples of tokenized sentences
00:38:47 - Numericalization
00:41:13 - Question: rationale behind how input data was formatted
00:43:20 - ULMFiT fits large documents easily
00:45:55 - Overfitting & underfitting
00:50:45 - Splitting the dataset
00:52:31 - Creating a good validation set
00:57:13 - Test set
00:59:00 - Metric vs loss
01:01:27 - The problem with metrics
01:04:10 - Pearson correlation
01:10:27 - Correlation is sensitive to outliers
01:14:00 - Training a model
01:19:20 - Question: when is it ok to remove outliers?
01:22:10 - Predictions
01:25:30 - Opportunities for research and startups
01:26:16 - Misusing NLP
01:33:00 - Question: isn’t the target categorical in this case?
Transcript thanks to wyquek, jmp, bencoman, fmussari, mike.moloch, amr.malik, kurianbenoy, gagan, and Raymond Wu on forums.fast.ai.
Timestamps thanks to RogerS49 and Wyquek on forums.fast.ai.