In this project, you will build a complete Data Lakehouse from scratch using Databricks and the Medallion Architecture.
You will start with raw data and transform it step by step into clean, reliable, business-ready datasets using the Bronze, Silver, and Gold layers. This is how modern data platforms are built in real companies.
You won't just write code. You will design architecture, improve data quality, model data for analytics, and automate everything with pipelines and jobs.
By the end, you will have a production-style Lakehouse you can confidently showcase and extend further.
Let's build it the right way.
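To make the Bronze, Silver, and Gold flow concrete before touching Databricks, here is a minimal sketch in plain Python. It is only an illustration of the layering idea: the sample records, the `orders.csv` source name, and the function names are all hypothetical, and in the actual project each layer would be a table in the Lakehouse rather than a Python list.

```python
from collections import defaultdict

# Hypothetical raw records, as they might arrive from a source system.
RAW_ORDERS = [
    {"order_id": "1", "customer": " alice ", "amount": "120.50"},
    {"order_id": "2", "customer": "bob", "amount": "80"},
    {"order_id": "2", "customer": "bob", "amount": "80"},     # duplicate row
    {"order_id": "3", "customer": "Carol", "amount": "n/a"},  # unparseable amount
]


def to_bronze(raw_rows):
    """Bronze: keep the data as delivered, only tagging where it came from."""
    return [{**row, "_source": "orders.csv"} for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: fix types, trim text, and drop duplicates and bad rows."""
    seen_ids = set()
    clean_rows = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen_ids:
            continue  # skip repeated order ids
        seen_ids.add(row["order_id"])
        clean_rows.append(
            {
                "order_id": int(row["order_id"]),
                "customer": row["customer"].strip().title(),
                "amount": amount,
            }
        )
    return clean_rows


def to_gold(silver_rows):
    """Gold: aggregate into a business-ready shape (revenue per customer)."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


bronze = to_bronze(RAW_ORDERS)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Each layer reads only from the one before it, so a cleaning rule can change in Silver without re-ingesting the raw data.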
| Resource | Description |
|---|---|
| https://discord.gg/jVkHMdsD | Join the Bootcamp Community to discuss your questions. |
| Datasets for the Lakehouse project | Datasets used as the source data for building the Lakehouse project. |
| Project repository on GitHub | The full project is available on GitHub, but I strongly recommend not copying the code directly. You will learn much more by building everything yourself. Only check the solution if you get stuck. |
| https://docs.databricks.com/aws/en/lakehouse/medallion | Official explanation of the Bronze, Silver, and Gold architecture pattern. |
| https://www.youtube.com/live/ldBLOasG23w?si=r67xUXYnOJ934Ish | Live session covering data engineering concepts and Databricks. |
| https://youtu.be/9GVqKuTVANE?si=dyJDrVlHuWZSpOTE | A similar hands-on project where a full SQL data warehouse is built from scratch. It teaches the same layering and cleaning concepts using SQL Server, which is important for building the right data engineering mindset. |
<aside>
Goal: Preparation steps before building the Lakehouse.
</aside>
bronze, silver, gold, raw_sources
<aside>
Result: The project is ready for building the Bronze, Silver, and Gold layers.
</aside>
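As a rough sketch of what part of this preparation could look like, the snippet below generates one `CREATE SCHEMA` statement per layer. It rests on assumptions: that `bronze`, `silver`, and `gold` are created as schemas, and that they live in a catalog called `lakehouse_demo`, a hypothetical name. `raw_sources` is left out because the page does not say whether it is a schema, a volume, or a folder.

```python
# Hypothetical catalog name; replace it with the catalog used in your workspace.
CATALOG = "lakehouse_demo"

# Assumption: each medallion layer gets its own schema.
LAYERS = ["bronze", "silver", "gold"]


def schema_statements(catalog, layers):
    """Build one idempotent CREATE SCHEMA statement per layer."""
    return [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{layer}" for layer in layers]


statements = schema_statements(CATALOG, LAYERS)
for stmt in statements:
    # In a Databricks notebook, each statement could be run with spark.sql(stmt).
    print(stmt)
```

Because of `IF NOT EXISTS`, running the statements again on an already prepared workspace does not fail.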