R Data Science - Cleaning Messy Data

fiiinspires 4,196 4 years ago

In this tutorial, we take messy text data and wrangle it into a clean form suitable for data analysis. You will learn to think through and implement data preparation tasks using the R statistical language. You will be exposed to some functions from the tidyverse package and be equipped to work efficiently with data.

Some concepts covered:
- install a package (tidyverse)
- read a comma-separated file (read_csv)
- how to spot a pattern in text data (regex)
- replace text (str_replace_all)
- clean column names (janitor, clean_names)
- separate a column into multiple columns
- visualize data (ggplot)
- write a regular expression pattern for extracting text
- create new columns using the mutate function
- use the mutate and across functions to transform existing columns
- remove extra spaces (str_squish)
- capitalize text (str_to_title)
- extract a date-timestamp from text with the lubridate package

Note: The data used is a subset of the emergency data available on Kaggle.

**Downloads**
Data: https://www.kaggle.com/mchirico/montcoalert/data
Code & Data: https://drive.google.com/file/d/1AyBMHXA5cv1bKX_Nd2WlxAQIBOLKKYd4/view?usp=sharing
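For a feel of how the steps listed in the description fit together, here is a minimal sketch. It is not the code from the video: the two messy rows, the `Call Info` column name, and the resulting column names are invented for illustration, and the real file is likely laid out differently. It assumes the tidyverse and janitor packages are installed.

```r
# install.packages(c("tidyverse", "janitor"))  # one-time setup
library(tidyverse)  # dplyr, tidyr, stringr, ggplot2, readr
library(lubridate)  # loaded explicitly; older tidyverse versions do not attach it
library(janitor)

# Hypothetical messy rows, invented for this sketch (not the Kaggle data).
# With a real file you would start from something like read_csv("your_file.csv").
raw <- tibble(
  `Call Info` = c(
    "EMS | back pains/injury ;  new hanover ; reported 2015-12-10 17:40:00",
    "Fire | gas-odor/leak ; hatfield   township ; reported 2015-12-10 17:40:01"
  )
)

clean <- raw %>%
  clean_names() %>%                                   # `Call Info` becomes call_info
  mutate(call_info = str_replace_all(call_info, "\\|", ";")) %>%  # one delimiter only
  separate(call_info,
           into = c("category", "reason", "township", "reported"),
           sep = ";") %>%
  mutate(
    # trim and collapse whitespace in the text columns
    across(c(category, reason, township), str_squish),
    # capitalize each word in the free-text columns
    across(c(reason, township), str_to_title),
    # extract the timestamp with a regex, then parse it with lubridate
    time_stamp = ymd_hms(
      str_extract(reported, "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}")
    )
  ) %>%
  select(-reported)

# A simple bar chart of calls per category
p <- ggplot(clean, aes(x = category)) +
  geom_bar()
```

Printing `clean` shows one row per call with `category`, `reason`, `township`, and a parsed `time_stamp` column; printing `p` draws the chart. Normalizing the delimiter before calling `separate` keeps the split pattern simple, which is one reasonable choice among several.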
