In this video we cover an end-to-end, on-premise data lake/lakehouse setup! We simplify the makeup of the data lake and keep it SQL-based, using Apache Airflow, Iceberg, dbt, MinIO, Postgres, and Trino. We removed the JVM-based metastore from the equation in the Python-based setup, and we continue that trend here. Be sure to check out the related links to get familiar with the tech stack.
🔗 Tool setup guides:
Airflow overview setup link: https://youtu.be/In7zwp0FDX4
dbt series link: https://www.youtube.com/playlist?list=PLaz3Ms051BAm5RHojg6JA1WzRqM2mOAqE
MinIO setup link: https://youtu.be/DLRiUs1EvhM
Iceberg setup link: https://youtu.be/vnNHDylGtEk
Postgres setup link: https://youtu.be/fjYiWXHI7Mo
Install Python: https://www.youtube.com/watch?v=B0G-44dqHRM
Link to GitHub repo: https://github.com/hnawaz007/datalake/tree/main
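To give a flavor of the metastore-free approach before you dive in, here is a minimal sketch of configuring an Iceberg catalog entirely in Python. It assumes pyiceberg's SQL catalog backed by Postgres, with MinIO serving as the S3-compatible warehouse; the connection string, bucket name, ports, and credentials are placeholders, not the repo's actual values.

```python
from pyiceberg.catalog import load_catalog  # requires pyiceberg[sql-postgres]

# SQL catalog backed by Postgres -- no Hive metastore, no JVM.
# Every value below is a placeholder; adjust to your environment.
catalog = load_catalog(
    "lakehouse",
    type="sql",
    uri="postgresql+psycopg2://admin:admin@localhost:5432/iceberg_catalog",
    warehouse="s3://warehouse/",
    **{
        "s3.endpoint": "http://localhost:9000",  # MinIO API endpoint
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    },
)
```

With this in place, Postgres tracks the table metadata pointers while MinIO holds the actual data and metadata files, which is what lets the whole stack run without a JVM process.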
💡 Why This Matters:
No more JVM setup or Hive metastore requirements! With this modern stack, setting up a data lake becomes faster, leaner, and more efficient. This video gives you an end-to-end overview of the data lake setup and shows how to perform common data engineering tasks within it.
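One such task is querying the lake over SQL. As a rough sketch, assuming Trino's Iceberg connector is pointed at the same Postgres catalog, you can hit it from Python with the trino client; the host, port, user, and the catalog/schema names ("iceberg", "raw") are assumptions here, not fixed values from this setup.

```python
import trino  # pip install trino

# Connect to the Trino coordinator; all connection details are placeholders.
conn = trino.dbapi.connect(
    host="localhost", port=8080, user="admin",
    catalog="iceberg", schema="raw",
)
cur = conn.cursor()
cur.execute("SELECT count(*) FROM orders")
print(cur.fetchone()[0])
```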
👉 Start small with your flat file source and let MinIO + Iceberg + Postgres handle the rest.
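To make the "start small" idea concrete, here is a hedged sketch of landing a flat file in an Iceberg table with pyarrow and pyiceberg, reusing the catalog object from the sketch above. The file name orders.csv and the namespace "raw" are hypothetical.

```python
import pyarrow.csv as pv

# Read a local flat file into an Arrow table (orders.csv is hypothetical).
arrow_table = pv.read_csv("orders.csv")

# Create a namespace and an Iceberg table from the inferred Arrow schema,
# then append the rows: data files land in MinIO, metadata in Postgres.
catalog.create_namespace("raw")
table = catalog.create_table("raw.orders", schema=arrow_table.schema)
table.append(arrow_table)

# Sanity check: scan the table back through the catalog.
print(table.scan().to_arrow().num_rows)
```

From there, Airflow can schedule the ingestion, and dbt + Trino take over the SQL transformations on top of the same tables.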
#DataLake #Iceberg #dbt #DataEngineering #SimplifiedSetup