Handling categorical data in machine learning projects is a common topic in data science interviews. In this video, I'll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how to handle categorical features with a very large number of levels, and the pros and cons of each strategy.
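Here is a minimal pure-Python sketch of the dummy vs. non-dummy distinction. The function names `one_hot` and `dummy` are hypothetical and do not come from any library. One-hot encoding creates one 0/1 column per level. Dummy encoding drops one level as the reference category, so k levels become k - 1 columns. The reason for dropping a column is that the k one-hot columns always sum to 1, which makes them perfectly collinear with the intercept in a linear model (the "dummy variable trap"). Which level gets dropped is arbitrary; this sketch assumes the first level in sorted order.

```python
def one_hot(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """One-hot encode: one 0/1 column per distinct level (k columns for k levels)."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def dummy(values: list[str]) -> tuple[list[str], list[list[int]]]:
    """Dummy encode: drop the first level as the reference (k - 1 columns).

    The dropped reference level is represented by a row of all zeros.
    """
    levels, rows = one_hot(values)
    return levels[1:], [row[1:] for row in rows]


colors = ["red", "green", "blue", "green"]
print(one_hot(colors))
print(dummy(colors))
```

In pandas, `pd.get_dummies(df, drop_first=True)` produces the dummy-encoded version, and omitting `drop_first` gives the full one-hot version.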
Feature hashing
https://en.wikipedia.org/wiki/Feature_hashing
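The linked article describes the hashing trick. Below is a minimal sketch under stated assumptions. Each categorical value becomes a `"feature=value"` string. That string is hashed into one of `n_features` buckets, and a second hash gives it a +1 or -1 sign to reduce the bias from collisions. The output width stays fixed however many levels exist, which is why this helps with very high-cardinality features. The cost is that distinct levels can collide and the columns are no longer interpretable.

Some details are illustrative assumptions rather than anything from the video: the name `hash_features`, the salts, and `n_features=8`. SHA-256 is used only because Python's built-in `hash()` on strings is randomized per process. scikit-learn's `FeatureHasher` is a production implementation of the same idea (it uses MurmurHash3).

```python
import hashlib


def _stable_hash(token: str, salt: str) -> int:
    """Deterministic 64-bit hash (built-in hash() on str is randomized per process)."""
    digest = hashlib.sha256((salt + token).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little")


def hash_features(tokens: list[str], n_features: int = 8) -> list[int]:
    """Map tokens such as "city=Paris" into a fixed-length signed count vector."""
    vec = [0] * n_features
    for tok in tokens:
        idx = _stable_hash(tok, "index:") % n_features  # which bucket the token lands in
        sign = 1 if _stable_hash(tok, "sign:") % 2 == 0 else -1  # +1 or -1
        vec[idx] += sign
    return vec


row = {"city": "Paris", "device": "mobile"}
print(hash_features([f"{k}={v}" for k, v in row.items()]))
```

A city never seen during training still maps to some bucket, so the model does not need a new column for it.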
🟢 Get all my free data science interview resources:
https://www.emmading.com/resources
🟡 Product Case Interview Cheatsheet https://www.emmading.com/product-case-cheat-sheet
🟠 Statistics Interview Cheatsheet https://www.emmading.com/statistics-interview-cheat-sheet
🟣 Behavioral Interview Cheatsheet https://www.emmading.com/behavioral-interview-cheat-sheet
🔵 Data Science Resume Checklist https://www.emmading.com/data-science-resume-checklist
✅ We work with experienced data scientists to help them land their next dream job. Apply now: https://www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
https://www.linkedin.com/in/emmading001/
====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing
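The "Ordinal Features & Class Labels" chapter title suggests categories that are encoded as integers rather than expanded into columns. Here is a minimal sketch. It assumes a hypothetical clothing-size feature whose order (S < M < L < XL) comes from domain knowledge rather than from the data. An ordinal feature maps to its rank. Class labels for a classifier only need an arbitrary but consistent integer per class. The names `SIZE_ORDER`, `encode_ordinal`, and `encode_labels` are illustrative.

```python
SIZE_ORDER = ["S", "M", "L", "XL"]  # hypothetical ordering supplied by domain knowledge


def encode_ordinal(values: list[str], order: list[str]) -> list[int]:
    """Replace each level with its rank so the integer order matches the category order."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]  # raises KeyError for a level missing from `order`


def encode_labels(labels: list[str]) -> tuple[list[str], list[int]]:
    """Give each class an integer id; the numeric order carries no meaning for a target."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return classes, [index[c] for c in labels]


print(encode_ordinal(["M", "XL", "S"], SIZE_ORDER))  # [1, 3, 0]
print(encode_labels(["spam", "ham", "spam"]))  # (['ham', 'spam'], [1, 0, 1])
```

Applying integer codes like these to a nominal input feature (e.g. colors) would impose an order that does not exist. That is why nominal features usually get one-hot or dummy encoding instead.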