In this video we cover an end-to-end, on-premise data lake/lakehouse setup! We simplify the makeup of the data lake and keep it SQL-based, using Apache Airflow, Iceberg, dbt, MinIO, Postgres, and Trino. We removed the JVM-based metastore from the equation in the Python-based setup, and we continue that trend here. Be sure to check out the related links to get familiar with the tech stack.
🔗 Tool setup guides:
Airflow overview setup link: https://youtu.be/In7zwp0FDX4
dbt series link: https://www.youtube.com/playlist?list=PLaz3Ms051BAm5RHojg6JA1WzRqM2mOAqE
MinIO setup link: https://youtu.be/DLRiUs1EvhM
Iceberg setup link: https://youtu.be/vnNHDylGtEk
Postgres setup link: https://youtu.be/fjYiWXHI7Mo
Install Python: https://www.youtube.com/watch?v=B0G-44dqHRM
Link to GitHub repo: https://github.com/hnawaz007/datalake/tree/main
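To give a flavor of the metastore-free approach before you dive in, here is a minimal sketch of configuring an Iceberg catalog entirely in Python. It assumes pyiceberg's SQL catalog backed by Postgres, with MinIO serving as the S3-compatible warehouse; the connection string, bucket name, ports, and credentials are placeholders, not the repo's actual values.

```python
from pyiceberg.catalog import load_catalog  # requires pyiceberg[sql-postgres]

# SQL catalog backed by Postgres -- no Hive metastore, no JVM.
# Every value below is a placeholder; adjust to your environment.
catalog = load_catalog(
    "lakehouse",
    type="sql",
    uri="postgresql+psycopg2://admin:admin@localhost:5432/iceberg_catalog",
    warehouse="s3://warehouse/",
    **{
        "s3.endpoint": "http://localhost:9000",  # MinIO API endpoint
        "s3.access-key-id": "minioadmin",
        "s3.secret-access-key": "minioadmin",
    },
)
```

With this in place, Postgres tracks the table metadata pointers while MinIO holds the actual data and metadata files, which is what lets the whole stack run without a JVM process.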
💡 Why This Matters:
No more JVM setup or Hive metastore requirements! With this modern stack, setting up a data lake becomes faster, leaner, and more efficient. This video gives you an end-to-end overview of the data lake setup and shows how to perform common data engineering tasks within it.
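One such task is querying the lake over SQL. As a rough sketch, assuming Trino's Iceberg connector is pointed at the same Postgres catalog, you can hit it from Python with the trino client; the host, port, user, and the catalog/schema names ("iceberg", "raw") are assumptions here, not fixed values from this setup.

```python
import trino  # pip install trino

# Connect to the Trino coordinator; all connection details are placeholders.
conn = trino.dbapi.connect(
    host="localhost", port=8080, user="admin",
    catalog="iceberg", schema="raw",
)
cur = conn.cursor()
cur.execute("SELECT count(*) FROM orders")
print(cur.fetchone()[0])
```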
👉 Start small with your flat file source and let MinIO + Iceberg + Postgres handle the rest.
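To make the "start small" idea concrete, here is a hedged sketch of landing a flat file in an Iceberg table with pyarrow and pyiceberg, reusing the catalog object from the sketch above. The file name orders.csv and the namespace "raw" are hypothetical.

```python
import pyarrow.csv as pv

# Read a local flat file into an Arrow table (orders.csv is hypothetical).
arrow_table = pv.read_csv("orders.csv")

# Create a namespace and an Iceberg table from the inferred Arrow schema,
# then append the rows: data files land in MinIO, metadata in Postgres.
catalog.create_namespace("raw")
table = catalog.create_table("raw.orders", schema=arrow_table.schema)
table.append(arrow_table)

# Sanity check: scan the table back through the catalog.
print(table.scan().to_arrow().num_rows)
```

From there, Airflow can schedule the ingestion, and dbt + Trino take over the SQL transformations on top of the same tables.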
#DataLake #Iceberg #dbt #DataEngineering #SimplifiedSetup