Outlier detection and removal: z score, standard deviation | Feature engineering tutorial python # 3

codebasics 126,035 5 years ago
If a dataset follows a normal distribution, then values that lie 3 or more standard deviations from the mean can be flagged as outliers. Many times these are legitimate values, and whether you remove them really depends on the situation. But removing outliers can significantly increase the statistical power of a machine learning model, so it is recommended that you treat outliers before building a model. A z score indicates how many standard deviations away from the mean a given sample is. We are going to go through all this theory and write Python code to remove outliers from a heights dataset that I have taken from Kaggle.

Link for Kaggle dataset: https://www.kaggle.com/mustafaali96/weight-height
Code & Exercise: https://github.com/codebasics/py/blob/master/ML/FeatureEngineering/2_outliers_z_score/2_outliers_z_score.ipynb
CSV file for exercise: https://github.com/codebasics/py/tree/master/ML/FeatureEngineering/2_outliers_z_score/Exercise

Topics
00:00 Introduction
00:20 Exploratory analysis on a Kaggle dataset
01:14 Plot histogram and bell curve
06:30 Use 3 standard deviations to remove outliers
12:14 Use z score to remove outliers
17:39 Exercise

Do you want to learn technology from me? Check https://codebasics.io/ for my affordable video courses.

Website: https://codebasics.io/
Facebook: https://www.facebook.com/codebasicshub
Twitter: https://twitter.com/codebasicshub
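
As a rough illustration of the two approaches named in the description above (the 3 standard deviation cutoff and the z score), here is a minimal sketch. It is not the code from the linked notebook: the function names, the synthetic heights, and the two injected extreme values are all assumptions made for the example, and it uses only the Python standard library instead of the Kaggle CSV.

```python
import random
import statistics


def remove_outliers_std(values, n_std=3.0):
    """Keep values within n_std sample standard deviations of the mean."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    lower = mean - n_std * std
    upper = mean + n_std * std
    return [v for v in values if lower <= v <= upper]


def z_scores(values):
    """Return (value - mean) / std for every value."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)
    return [(v - mean) / std for v in values]


def remove_outliers_z(values, threshold=3.0):
    """Keep values whose absolute z score is at most threshold."""
    return [v for v, z in zip(values, z_scores(values)) if abs(z) <= threshold]


# Hypothetical data: roughly normal heights in inches, seeded for repeatability.
rng = random.Random(0)
heights = [rng.gauss(66.0, 4.0) for _ in range(1000)]
heights += [30.0, 110.0]  # two injected extreme values (assumption)

kept_std = remove_outliers_std(heights)
kept_z = remove_outliers_z(heights)

print(len(heights), len(kept_std), len(kept_z))
```

The two filters express the same rule in different units: a z score of 3 corresponds to a value exactly 3 standard deviations from the mean, so both should drop the injected extremes. Whether 3 is the right threshold, and whether the flagged rows should be removed at all, depends on the data.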
