Handling Categorical Data in Machine Learning: Easy Explanation for Data Science Interviews

Emma Ding 7,937 views 2 years ago

Handling categorical data in machine learning projects is a very common topic in data science interviews. In this video, I’ll cover the difference between treating a variable as a dummy variable vs. a non-dummy variable, how you can deal with categorical features when the number of levels is very large, and the pros and cons of various strategies.
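As a rough illustration of the one-hot vs. dummy distinction mentioned above, here is a minimal pure-Python sketch. The helper names `one_hot` and `dummy` and the `colors` data are hypothetical and not taken from the video; a real project would more likely use a library encoder.

```python
def one_hot(values):
    """One-hot encoding: one 0/1 column per level, exactly one 1 per row."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


def dummy(values):
    """Dummy encoding: drop the first level as the reference category,
    so k levels become k - 1 columns."""
    levels, rows = one_hot(values)
    return levels[1:], [row[1:] for row in rows]


colors = ["red", "green", "blue", "green"]  # hypothetical example data

oh_levels, oh_rows = one_hot(colors)
dm_levels, dm_rows = dummy(colors)

print(oh_levels, oh_rows)
print(dm_levels, dm_rows)
```

In the dummy version, the dropped reference level ("blue" here, simply because it sorts first) shows up as a row of all zeros.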

Feature hashing
https://en.wikipedia.org/wiki/Feature_hashing
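To make the idea concrete, here is a minimal sketch of the hashing trick in pure Python, assuming MD5 as the hash function and 8 buckets. The function name `hash_features` and the sample tokens are hypothetical; this is not necessarily the exact scheme shown in the video.

```python
import hashlib


def hash_features(tokens, n_buckets=8):
    """Map any number of categorical tokens into a fixed-length vector.

    Each token is hashed to a bucket index; a second hash-derived bit
    picks a sign so that colliding tokens tend to cancel rather than
    pile up. (MD5 and n_buckets=8 are illustrative assumptions.)
    """
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % n_buckets
        sign = 1 if digest[8] & 1 == 0 else -1
        vec[index] += sign
    return vec


row = ["city=Paris", "device=mobile", "browser=firefox"]  # hypothetical tokens
print(hash_features(row))
```

The vector length stays fixed at `n_buckets` no matter how many distinct levels appear, at the cost of occasional collisions between unrelated levels.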

🟢 Get all my free data science interview resources
https://www.emmading.com/resources
🟡 Product Case Interview Cheatsheet https://www.emmading.com/product-case-cheat-sheet
🟠 Statistics Interview Cheatsheet https://www.emmading.com/statistics-interview-cheat-sheet
🟣 Behavioral Interview Cheatsheet https://www.emmading.com/behavioral-interview-cheat-sheet
🔵 Data Science Resume Checklist https://www.emmading.com/data-science-resume-checklist

✅ We work with experienced data scientists to help them land their next dream job. Apply now: https://www.emmading.com/coaching

// Comment
Got any questions? Something to add?
Write a comment below to chat.

// Let's connect on LinkedIn:
https://www.linkedin.com/in/emmading001/

====================
Contents of this video:
====================
00:00 Introduction
00:48 Categorical Data
02:22 Ordinal Features & Class Labels
03:38 One-Hot Encoding
05:32 Dummy Encoding
06:30 Problems of One-Hot & Dummy Encoding
07:26 Feature Hashing

Data Science

Data Science Interview

Emma Ding

Data Interview Pro

categorical data

Machine Learning

Handling Categorical Data

Machine Learning Interview

data science interview prep

data science career

data science interview preparation
