From the R/Medicine Conference 2022
Peter D.R. Higgins, MD, PhD, MSc, Director of the Inflammatory Bowel Disease (IBD) Program at the University of Michigan.
Deck: https://speakerdeck.com/higgi13425/big-data-with-arrow-and-duckdb
Sections
0:00 Introduction
0:40 Starting point
1:09 The motivating problem
2:10 The data
3:08 Options
4:25 Lots to like about {data.table}
5:23 Data on disk vs data in RAM
6:37 How to wrangle bigger-than-RAM data in R?
8:15 Speed-wrangling
9:42 What about the bigger-than-RAM problem?
10:19 Let’s try it out
11:35 What if data are still bigger-than-RAM?
15:42 Back to the question…
16:19 There’s always that (more than) one guy
16:43 Take home points - speed
17:15 Take home points - bigger-than-RAM data
18:12 Closing
More Resources
Main Site: https://www.r-consortium.org/
News: https://www.r-consortium.org/news
Blog: https://www.r-consortium.org/news/blog
Join: https://www.r-consortium.org/about/join
Twitter: https://twitter.com/Rconsortium
LinkedIn: https://www.linkedin.com/company/r-consortium/