Building data pipelines with #python is an important skill for data engineers and data scientists. But what's the best library to use? In this video we look at three options: pandas, polars, and spark (pyspark).
Timeline:
00:00 Data Pipelines
01:11 The Data
02:32 Pandas
04:34 Polars
06:15 PySpark
09:15 Spark SQL
Follow me on twitch for live coding streams: https://www.twitch.tv/medallionstallion_
My other videos:
Speed Up Your Pandas Code: https://www.youtube.com/watch?v=SAFmrTnEHLg
Intro to Pandas video: https://www.youtube.com/watch?v=_Eb0utIRdkw
Exploratory Data Analysis Video: https://www.youtube.com/watch?v=xi0vhXFPegw
Working with Audio data in Python: https://www.youtube.com/watch?v=ZqpSb5p1xQo
Efficient Pandas Dataframes: https://www.youtube.com/watch?v=u4_c2LDi4b8
* Youtube: https://youtube.com/@robmulla?sub_confirmation=1
* Discord: https://discord.gg/HZszek7DQc
* Twitch: https://www.twitch.tv/medallionstallion_
* Twitter: https://twitter.com/Rob_Mulla
* Kaggle: https://www.kaggle.com/robikscube
#python #polars #spark #dataengineering