In this project, you will build a complete Data Lakehouse from scratch using Databricks and the Medallion Architecture.

You will start with raw data and transform it step by step into clean, reliable, business-ready datasets using the Bronze, Silver, and Gold layers. This is how modern data platforms are built in real companies.

You won’t just write code. You will design architecture, improve data quality, model data for analytics, and automate everything with pipelines and jobs.

By the end, you will have a production-style Lakehouse you can confidently showcase and extend further.

Let’s build it the right way.

📚 Project Resources

https://discord.gg/jVkHMdsD	Join the bootcamp community to ask and discuss your questions.
Datasets as source for the lakehouse Project	Datasets used as the source data for building the Lakehouse project.
GitHub repository	The full project is available on GitHub, but I strongly recommend not copying the code directly. You will learn much more by building everything yourself. Only check the solution if you get stuck.
https://docs.databricks.com/aws/en/lakehouse/medallion	Official explanation of the Bronze, Silver, and Gold architecture pattern.
https://www.youtube.com/live/ldBLOasG23w?si=r67xUXYnOJ934Ish	Live session covering data engineering concepts and Databricks.
https://youtu.be/9GVqKuTVANE?si=dyJDrVlHuWZSpOTE	A similar hands-on project where a full SQL data warehouse is built from scratch. It teaches the same layering and cleaning concepts using SQL Server, which helps build the right data engineering mindset.

🗺️ Project Phases & Guide

🏗️ Phase 1 - Project Initialization

<aside>

Goal: Preparation steps before building the Lakehouse.

</aside>

[ ] Design the architecture
- [ ] Read the Databricks reference for the project → LINK
- [ ] Draw the data lakehouse architecture using draw.io or similar → LINK
[ ] Create GitHub repository → LINK
[ ] Connect GitHub to Databricks using the repository URL (Workspace → Create → Git Folder)
[ ] Create the Lakehouse schemas (Unity Catalog) using the UI or SQL: bronze, silver, gold
[ ] Create a volume named raw_sources inside the bronze schema
[ ] Upload the 6 CSV files from the engineering folder into the Bronze volume → LINK
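
If you prefer SQL over the UI for the schema and volume steps above, here is a minimal sketch of what that could look like from a notebook. The catalog name `workspace` and the helper `build_setup_statements` are assumptions made for illustration, so swap in the catalog you actually use. Outside Databricks, where no `spark` session exists, the script only prints the statements.

```python
# Assumption: the catalog is called "workspace"; replace it with your own catalog.
CATALOG = "workspace"
SCHEMAS = ["bronze", "silver", "gold"]
VOLUME = "raw_sources"


def build_setup_statements(catalog, schemas, volume):
    """Return the SQL statements that create the schemas and the Bronze volume.

    Hypothetical helper for this guide, not part of any Databricks API.
    """
    statements = [f"CREATE SCHEMA IF NOT EXISTS {catalog}.{s}" for s in schemas]
    # The volume lives inside the bronze schema and will hold the raw CSV files.
    statements.append(f"CREATE VOLUME IF NOT EXISTS {catalog}.bronze.{volume}")
    return statements


statements = build_setup_statements(CATALOG, SCHEMAS, VOLUME)

# In a Databricks notebook `spark` is predefined; anywhere else we just print.
spark_session = globals().get("spark")
for stmt in statements:
    if spark_session is not None:
        spark_session.sql(stmt)
    else:
        print(stmt)
```

With these names, files uploaded to the volume would be reachable under `/Volumes/workspace/bronze/raw_sources/`. The `IF NOT EXISTS` clauses make the script safe to run more than once.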

<aside>

Result: Project is ready to start building Bronze, Silver, and Gold layers.

</aside>