Cleaning and manipulating data with the tidyverse: dplyr, readr, and stringr in action (CC121)
Data cleaning is one of the more undervalued steps in a data analysis. In this episode we'll use a variety of functions from the tidyverse to get three data frames into the right format, and then we'll join them all together. This will help us get ready for downstream analyses looking for microbiome-based biomarkers associated with colorectal cancer.
In this episode, Pat will use the #tidyverse in #RStudio. The accompanying blog post can be found at https://www.riffomonas.org/code_club/2021-06-30-data-cleaning.
If you're interested in taking an upcoming 3-day R workshop, email me at riffomonas@gmail.com!
R: https://r-project.org
RStudio: https://rstudio.com
Raw data: https://github.com/riffomonas/raw_data/releases/latest
Workshops: https://www.mothur.org/wiki/workshops
You can also find complete tutorials for learning R with the tidyverse using:
Microbial ecology data: https://www.riffomonas.org/minimalR/
General data: https://www.riffomonas.org/generalR/
0:00 Introduction
2:29 Tidying a mothur shared file
6:21 Formatting a taxonomy file
15:14 Calculating genus relative abundances
17:39 Formatting metadata and joining to relative abundances
23:31 Committing changes
24:26 Recap