This is an AWS data engineering crash course video in which I explain data warehouse and data lake development in AWS. I give an overview of Amazon Redshift, AWS Glue, Apache Hudi, Amazon EMR, and Managed Airflow (MWAA). After the overview, I demo how to use these services together to build a data lake and a data warehouse in AWS.
There are 5 exercises in the video, and you can download the sample data and code to practice. In the first exercise, I show how to use an AWS Glue crawler to parse an input file and create a table in the Glue Data Catalog.
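For readers who prefer scripting over the console, here is a minimal sketch of creating and starting a crawler with boto3. The helper names, crawler name, role ARN, database, and S3 path are placeholders I made up for illustration and are not taken from the video.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler API call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(glue_client, request):
    """Create the crawler, then start it. glue_client is a boto3 Glue client."""
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   glue = boto3.client("glue")
#   req = crawler_request(
#       "demo-crawler",                                # hypothetical name
#       "arn:aws:iam::123456789012:role/demo-glue",    # hypothetical role
#       "demo_db",                                     # hypothetical database
#       "s3://demo-bucket/input/",                     # hypothetical path
#   )
#   create_and_start_crawler(glue, req)
```

Once the crawler finishes, the inferred table should appear in the chosen Glue database.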
In the second exercise, we use an AWS Glue PySpark application to load data from CSV files into data lake Hudi tables.
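To give a feel for what such a load looks like, here is a rough sketch of the Hudi write options a PySpark job typically passes. The option keys are standard Hudi configuration names, but the helper functions, table name, and field names are hypothetical and may differ from the code used in the video.

```python
def hudi_write_options(table, record_key, precombine_field,
                       partition_field=None, database="default"):
    """Return Hudi options for an upsert into a copy-on-write table."""
    options = {
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        # Sync the table definition to the catalog so it can be queried
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
        "hoodie.datasource.hive_sync.mode": "hms",
    }
    if partition_field:
        options["hoodie.datasource.write.partitionpath.field"] = partition_field
    return options


def write_hudi(df, options, target_path):
    """Write a Spark DataFrame to a Hudi table at target_path."""
    df.write.format("hudi").options(**options).mode("append").save(target_path)


# Usage inside a Glue or Spark job (paths and columns are placeholders):
#   df = spark.read.option("header", "true").csv("s3://demo-bucket/input/")
#   opts = hudi_write_options("orders", "order_id", "updated_at")
#   write_hudi(df, opts, "s3://demo-bucket/datalake/orders/")
```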
In the third exercise, we use Amazon EMR to read the data lake Hudi table and create an analytics Hudi table. If you follow the medallion architecture, this is like reading silver layer data and transforming it into the gold layer.
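One way to run a PySpark transformation like this on EMR is to submit it as a cluster step. The sketch below builds such a step for boto3; the helper name, step name, script location, and cluster ID are placeholders of mine, and the video may run the job differently.

```python
def spark_step(name, script_s3_path):
    """Build an EMR step that runs a PySpark script via spark-submit."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                # Hudi expects Kryo serialization in Spark
                "--conf",
                "spark.serializer=org.apache.spark.serializer.KryoSerializer",
                script_s3_path,
            ],
        },
    }


# Usage (assumes boto3, AWS credentials, and a running cluster):
#   import boto3
#   emr = boto3.client("emr")
#   step = spark_step("build-analytics-table",
#                     "s3://demo-bucket/code/analytics.py")  # hypothetical path
#   emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Inside the script itself, the silver table would be read with `spark.read.format("hudi").load(path)`, transformed, and written back as a new Hudi table.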
In the fourth exercise, I show how to read Hudi tables directly in Amazon Redshift and create snapshot tables to consume the analytics dataset.
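As a rough illustration, the helpers below generate the kind of SQL involved, assuming the Hudi tables are exposed through a Redshift external schema that points at the Glue Data Catalog. The function names and all schema, database, role, and table names are hypothetical and not from the video's script.

```python
def external_schema_sql(schema, catalog_db, iam_role_arn):
    """SQL to map a Glue Data Catalog database into Redshift."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def snapshot_sql(target_table, schema, source_table):
    """SQL to copy the current state of an external table into Redshift."""
    return (
        f"CREATE TABLE {target_table} AS\n"
        f"SELECT * FROM {schema}.{source_table};"
    )


# Example with placeholder names:
print(external_schema_sql(
    "datalake", "analytics_db",
    "arn:aws:iam::123456789012:role/demo-redshift"))
print(snapshot_sql("orders_snapshot", "datalake", "orders_summary"))
```

The generated statements can be pasted into the Redshift query editor; the snapshot table is a regular Redshift table, so later queries do not have to scan the data lake.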
Finally, in the fifth exercise, we use Managed Airflow to orchestrate and run the end-to-end pipeline covering the steps from the earlier exercises.
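The core idea of the orchestration is that each step runs only after the step it depends on. The sketch below shows that ordering in plain Python using the standard library; it is not Airflow code, and the task names are made up. In an actual Airflow DAG the same edges would be declared between operators with `>>`.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on (hypothetical names)
DEPENDENCIES = {
    "run_glue_crawler": set(),
    "glue_load_hudi": {"run_glue_crawler"},
    "emr_build_analytics": {"glue_load_hudi"},
    "redshift_snapshot": {"emr_build_analytics"},
}


def execution_order(dependencies):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


for task in execution_order(DEPENDENCIES):
    print(task)
```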
If you are an AWS beginner, I am sure you will learn a lot from this video. However, this is not an AWS zero-to-hero masterclass. It is more of a crash course in which I share how you can quickly build solutions in AWS using popular services.
I refer to the following additional videos. Do check these videos as well for a better understanding.
Amazon Redshift for beginners: https://youtu.be/dmsuzIOzmIs
AWS DataLake for beginners: https://youtu.be/m-WEGgYq25c
Feel free to reach out to me as well : [email protected]
If you wish to download the presentation slides, sample data files, and source code (AWS Glue job, Amazon EMR PySpark application, Amazon Redshift SQL script, and Managed Airflow DAG) used in the crash course video, check the link below:
https://mailchi.mp/45b9673b727b/aws-data-engineering-crash-course
Are you interested in attending 1-1 training on AWS with me? Send an email to [email protected] with the heading "1-1 AWS Training session" and I will get back to you with details about our initial introduction meeting.
Video timeline:
00:00 Introduction
01:39 Datawarehouse Migration Projects
04:12 Creating data lakes in AWS
08:38 Amazon Redshift overview
11:00 AWS Glue overview
13:50 Apache Hudi overview
15:46 Amazon EMR overview
17:39 Managed Airflow overview
19:45 Demo AWS Console
23:00 Exercise 1 (Glue crawlers)
28:20 Exercise 2 (Glue pyspark)
46:30 Exercise 3 (Amazon EMR)
01:02:20 Exercise 4 (Amazon Redshift)
01:16:56 Exercise 5 (MWAA end to end pipeline)
Do like, share, comment & subscribe to the channel if you are new here.
Cheers