Text analysis in R. Part 1: Preprocessing

Kasper Welbers · 15,888 views · 4 years ago

This is a short series of videos on the basics of computational text analysis in R. It is loosely inspired by our Text analysis in R paper (http://vanatteveldt.com/p/welbers-text-r.pdf), closely related to our R course material Github page (https://github.com/ccs-amsterdam/r-course-material), and 42% a love letter to quanteda.

#### Useful links ####

# Low-level string processing:
A good place to start is learning how to use the stringr package. (I personally prefer the stringi package because I'm used to it, but stringr is probably more accessible to most users, since it follows tidyverse conventions.)

stringr vignette:
https://cran.r-project.org/web/packages/stringr/vignettes/stringr.html

Another great resource on stringr is the R for Data Science book, which also covers regular expressions in more depth:
https://r4ds.had.co.nz/strings.html

# Character encoding
'What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text' by David C. Zentgraf: https://kunststube.net/encoding/

'String encoding and R' by Kevin Ushey: https://kevinushey.github.io/blog/2018/02/21/string-encoding-and-r/