What is AWS EMR | Extract and Transform Redfin data with AWS EMR | EMR Studio | Pyspark Notebook

tuplespectra 14,476 2 years ago

#dataengineering #emr #spark #pyspark #jupyterlab #jupyternotebook #aws #emrstudio #etlpipeline #redfin

In this video, I explain what Amazon EMR (Elastic MapReduce) is and how it helps with processing big data. I then show how to create a VPC and spin up an EMR cluster inside it. Next, I create an Amazon EMR Studio workspace with JupyterLab and attach a Jupyter notebook to the provisioned cluster. Finally, I write PySpark code in that notebook to extract data from the Redfin data source, process it, and load the transformed data into an S3 bucket as Parquet files.

Please don't forget to LIKE, SHARE, COMMENT and SUBSCRIBE to our channel for more AWESOME videos.

**Books I recommend**
1. Grit: The Power of Passion and Perseverance https://amzn.to/3EZKSgb
2. Think and Grow Rich!: The Original Version, Restored and Revised https://amzn.to/3Q2K68s
3. The Book on Rental Property Investing: How to Create Wealth With Intelligent Buy and Hold Real Estate Investing https://amzn.to/3LLpXRy
4. How to Invest in Real Estate: The Ultimate Beginner's Guide to Getting Started https://amzn.to/48RbuOb
5. Introducing Python: Modern Computing in Simple Packages https://amzn.to/3Q4driR
6. Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter, 3rd Edition https://amzn.to/3rGF73G

***************** Commands used in this video *****************
Check out my GitHub repo: https://github.com/YemiOla/data_engineering_redfin_emr

***************** USEFUL LINKS *****************
1. Redfin Analytics|python ETL pipeline with airflow|Data Engineering Project|Snowpipe|Snowflake|Part 1 https://www.youtube.com/watch?v=NWZrBEnJ6Us
2. Redfin Analytics|python ETL pipeline with airflow|Data Engineering Project|Snowpipe|Snowflake|Part 2 https://www.youtube.com/watch?v=QKCsWpygBrg&t=1s
3. Zillow Data Analytics (RapidAPI) | End-To-End Python ETL Pipeline | Data Engineering Project |Part 1 https://www.youtube.com/watch?v=j_skupZ3zw0
4. Redfin Data Center: https://www.redfin.com/news/data-center/
5. Amazon EMR benefits overview: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-overview-benefits.html
6. PostgreSQL Playlist: https://www.youtube.com/watch?v=oFaLUCWRnRE&list=PLACD_PaYcVF09khO58CISr08Uy6w3cAIF
7. Apache Airflow Playlist: https://www.youtube.com/watch?v=uhQ54Dgp6To&list=PLACD_PaYcVF1Hzzc1Ds56bD7oUkfiL_Lv

DISCLAIMER: This video and description contain affiliate links. When you buy through one of these links, we receive a small commission at no cost to you. This helps us keep making awesome and valuable content for you.
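For readers who want a feel for the notebook step described in the summary (read Redfin data, transform it, write Parquet to S3), here is a minimal PySpark sketch. It is not the code from the video or the linked repo: the bucket name, prefixes, file name, and the column names in `KEEP_COLUMNS` are assumptions made for illustration, and the source is assumed to be a tab-separated file with a header row.

```python
# Hypothetical column subset; the real Redfin file may use different names.
KEEP_COLUMNS = ["period_begin", "period_end", "city", "state",
                "median_sale_price", "homes_sold", "inventory"]


def output_path(bucket: str, prefix: str) -> str:
    """Build the S3 URI that the Parquet files are written under."""
    return f"s3://{bucket.strip('/')}/{prefix.strip('/')}/"


def run_etl(source_uri: str, target_uri: str, columns=KEEP_COLUMNS):
    """Extract a tab-separated file, clean it, and write it as Parquet."""
    # Imported here so output_path() can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # In a notebook attached to an EMR cluster a session usually exists
    # already; getOrCreate() returns it instead of starting a new one.
    spark = SparkSession.builder.appName("redfin-etl-sketch").getOrCreate()

    # Extract: read the raw file, assuming a header row and tab separators.
    raw = spark.read.csv(source_uri, header=True, inferSchema=True, sep="\t")

    # Transform: keep a few columns, drop rows with missing values, and
    # derive year and month columns from the period end date.
    cleaned = (
        raw.select(*columns)
        .dropna()
        .withColumn("period_end_yr", F.year(F.to_date(F.col("period_end"))))
        .withColumn("period_end_month", F.month(F.to_date(F.col("period_end"))))
    )

    # Load: write the result as Parquet, replacing any previous output.
    cleaned.write.mode("overwrite").parquet(target_uri)
    return cleaned
```

In a notebook cell this might be called as `run_etl("s3://my-example-bucket/raw/city_market_tracker.tsv000.gz", output_path("my-example-bucket", "transformed/redfin"))`, where both locations are placeholders for your own bucket and file.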
