In this project, you will build a complete Data Lakehouse from scratch using Databricks and the Medallion Architecture.

You will start with raw data and transform it step by step into clean, reliable, business-ready datasets using the Bronze, Silver, and Gold layers. This layered pattern is widely used to build modern data platforms in real companies.
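To make the three layers concrete before touching Databricks, here is a toy sketch in plain Python. It is not the project's code and uses no Databricks or Spark APIs. The `raw_orders` records, the `_source` tag, and the function names `to_bronze`, `to_silver`, and `to_gold` are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
raw_orders = [
    {"order_id": "1", "customer": " Alice ", "amount": "120.50"},
    {"order_id": "2", "customer": "bob", "amount": "80"},
    {"order_id": "2", "customer": "bob", "amount": "80"},      # duplicate row
    {"order_id": "3", "customer": "Alice", "amount": "n/a"},   # unparseable amount
]


def to_bronze(records):
    """Bronze: keep the raw data as-is, only tagging where it came from."""
    return [{**r, "_source": "orders_export"} for r in records]


def to_silver(bronze):
    """Silver: clean and validate (trim names, cast types, deduplicate)."""
    seen, clean = set(), []
    for r in bronze:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # drop rows whose amount cannot be parsed
        if r["order_id"] in seen:
            continue  # drop repeated order ids
        seen.add(r["order_id"])
        clean.append({
            "order_id": int(r["order_id"]),
            "customer": r["customer"].strip().title(),
            "amount": amount,
        })
    return clean


def to_gold(silver):
    """Gold: aggregate into a business-ready shape (revenue per customer)."""
    revenue = defaultdict(float)
    for r in silver:
        revenue[r["customer"]] += r["amount"]
    return dict(revenue)


bronze = to_bronze(raw_orders)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

In the project itself, each layer would be a set of tables rather than Python lists, but the division of responsibilities is the same: Bronze preserves what arrived, Silver fixes quality issues, and Gold shapes the data for analytics.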

You won’t just write code. You will design architecture, improve data quality, model data for analytics, and automate everything with pipelines and jobs.

By the end, you will have a production-style Lakehouse you can confidently showcase and extend further.

Let’s build it the right way.


📚 Project Resources

https://discord.gg/jVkHMdsD Join the Bootcamp Community to ask and discuss your questions.
Project datasets: the source data for building the Lakehouse project.
The full project is available on GitHub, but I strongly recommend not copying the code directly.
You will learn much more by building everything yourself. Only check the solution if you get stuck.
https://docs.databricks.com/aws/en/lakehouse/medallion Official explanation of the Bronze, Silver, and Gold architecture pattern.
https://www.youtube.com/live/ldBLOasG23w?si=r67xUXYnOJ934Ish Live session covering data engineering concepts and Databricks.
https://youtu.be/9GVqKuTVANE?si=dyJDrVlHuWZSpOTE A similar hands-on project where a full SQL data warehouse is built from scratch.
It covers the same layering and cleaning concepts using SQL Server, which helps you build the right data engineering mindset.

πŸ—ΊοΈ Project Phases & Guide

πŸ—οΈ Phase1 - Project Initialization

<aside>

Goal: Preparation steps before building the Lakehouse.

</aside>

<aside>

Result: The project is ready for building the Bronze, Silver, and Gold layers.

</aside>