Text data, known as character data in R, is a common data type that often requires significant preprocessing and cleaning before you can use it for analysis and modeling. Other programming languages usually call it string data (character strings). In this lesson we cover basic R functions for working with text data, along with the basics of regular expressions, a powerful tool for matching patterns within text.
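As a rough preview of the kind of operations involved, here is a minimal sketch using only base R functions. The example strings and variable names are made up for illustration and are not taken from the lesson notebook, which may use different functions and data.

```r
# Hypothetical example strings (not from the lesson notebook)
titles <- c("  Data Science 101 ", "intro to R", "Regex Basics")

# Basic cleaning with base R string functions
cleaned <- trimws(titles)      # strip leading and trailing whitespace
lowered <- tolower(cleaned)    # convert every string to lower case
n_chars <- nchar(cleaned)      # number of characters in each string

# Regular expressions: "[0-9]+" matches one or more digits
has_digits <- grepl("[0-9]+", cleaned)

# "^intro" matches "intro" only at the start of a string;
# sub() replaces the first match in each element
renamed <- sub("^intro", "Introduction", cleaned)

print(cleaned)
print(n_chars)
print(has_digits)
print(renamed)
```

All of these functions are vectorized, so each call processes every string in the vector at once.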
This is lesson 14 of a 30-part introduction to the R programming language for data analysis and predictive modeling. Link to the code notebook below:
Introduction to R: Working With Character Strings https://www.kaggle.com/hamelg/intro-to-r-part-14-working-with-character-strings
This guide does not assume any prior exposure to R, programming or data science. It is intended for beginners with an interest in data science and those who might know other programming languages and would like to learn R.
I am making the videos for this guide so that you can learn a lot just by watching on YouTube. To get the most out of it, though, I recommend creating a Kaggle account so you can fork and edit each lesson, follow along, and run the code yourself.
Follow DataDaft on social media for news and updates:
Twitter: https://twitter.com/DataDaft
Introduction to R Playlist:
https://www.youtube.com/playlist?list=PLiC1doDIe9rDjk9tSOIUZJU4s5NpEyYtE