Tutorial: Working with larger than memory data in R with Arrow and DuckDB

LatinR 373 5 months ago

While datasets are growing larger, recent advances in technologies such as Apache Arrow and DuckDB are making the analysis of datasets that used to require complex infrastructure accessible to anyone. The {arrow}, {duckdb}, and {duckplyr} packages open the door to analyzing gigabytes of data in seconds using the same interface as the {tidyverse}. By learning just a few concepts, R users can work easily with larger-than-memory datasets directly from their everyday computer.

In this tutorial, we will analyze real data to explore the formats used to store these large datasets on disk, how Arrow and DuckDB can be leveraged to analyze data, and how these tools integrate with the {tidyverse} interface.

After attending this tutorial, learners will:

- Understand when using Arrow or DuckDB can help speed up a data analysis
- Describe how Arrow and DuckDB can work with datasets that are larger than memory
- Recognize the types of data manipulation that benefit the most from tools like Arrow and DuckDB
- Decide which package ({arrow}, {duckdb}, or {duckplyr}) is best suited for their data analysis
- Develop their own data analysis using Arrow or DuckDB

This tutorial is aimed at everyone who needs to analyze datasets that are larger than the memory available on their everyday computer, or who is interested in learning how to speed up the analysis of large datasets. Participants who don’t have access to HPC will particularly benefit from this tutorial, as the tools used can easily be installed on a regular laptop and provide good performance.

Tutor: François Michonneau is an educator who loves working with data and putting R in production. He has been using R for over 20 years and maintains several packages on CRAN. After being part of the leadership at The Carpentries for 5 years, he worked at Voltron Data for a couple of years. He's currently looking for his next role.
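
As a rough illustration of the workflow the description refers to, the sketch below queries a Parquet dataset on disk with {dplyr} verbs, first through Arrow and then through DuckDB. It is not taken from the tutorial materials: the built-in `mtcars` data stands in for a genuinely large dataset, the directory name `mtcars_parquet` is made up for the example, and it assumes the {arrow}, {dplyr}, {duckdb}, and {dbplyr} packages are installed on R 4.1 or later.

```r
library(arrow)
library(dplyr)

# Assumption: mtcars is only a stand-in; a real use case would point at
# files too large to load into memory.
data_dir <- file.path(tempdir(), "mtcars_parquet")  # hypothetical location

# Store the data on disk as Parquet files, partitioned by the `cyl` column.
write_dataset(mtcars, data_dir, format = "parquet", partitioning = "cyl")

# open_dataset() only scans file metadata; no rows are loaded into R yet.
ds <- open_dataset(data_dir)

# The dplyr verbs build a lazy query that Arrow evaluates on collect().
result_arrow <- ds |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg), n = n()) |>
  arrange(cyl) |>
  collect()

# The same dataset can be handed to DuckDB without copying it into R;
# the same verbs are then translated to SQL and run by DuckDB.
result_duckdb <- ds |>
  to_duckdb() |>
  filter(hp > 100) |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE), n = n()) |>
  arrange(cyl) |>
  collect()
```

In both pipelines only the small summarised result is materialised in R by `collect()`, which is what makes this pattern usable on data that does not fit in memory.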