
Using the {arrow} and {duckdb} packages to wrangle medical datasets that are Larger than RAM

R Consortium 8,528 3 years ago

From the R/Medicine 2022 conference: Peter D.R. Higgins, MD, Ph.D., MSc, Director of the Inflammatory Bowel Disease (IBD) Program at the University of Michigan.

Deck: https://speakerdeck.com/higgi13425/big-data-with-arrow-and-duckdb

Sections
0:00 Introduction
0:40 Starting point
1:09 The motivating problem
2:10 The data
3:08 Options
4:25 Lots to like about {data.table}
5:23 Data on disk vs data in RAM
6:37 How to wrangle bigger-than-RAM data in R?
8:15 Speed-wrangling
9:42 What about the bigger-than-RAM problem?
10:19 Let's try it out
11:35 What if data are still bigger-than-RAM?
15:42 Back to the question…
16:19 There's always that (more than) one guy
16:43 Take-home points: speed
17:15 Take-home points: bigger-than-RAM data
18:12 Closing

More Resources
Main Site: https://www.r-consortium.org/
News: https://www.r-consortium.org/news
Blog: https://www.r-consortium.org/news/blog
Join: https://www.r-consortium.org/about/join
Twitter: https://twitter.com/Rconsortium
LinkedIn: https://www.linkedin.com/company/r-consortium/
