DataEngExpert

Spark, Databricks, Snowflake, Kafka, Flink — the most comprehensive data engineering curriculum on the internet.

Course syllabus

178 lessons • 116+ hours of content • 25 assignments

Kickoff
1
Bootcamp Kickoff
2
Boot Camp Database Setup
3
January 2025 Bootcamp Kickoff
4
Databricks Boot Camp Kickoff
5
Capstone Project Brainstorming
Week 1: Dimensional Data Modeling
1
Dimensional Data Modeling Complex Data Type and Cumulation Day 1 Lecture
2
Dimensional Data Modeling Complex Data Type and Cumulation Day 1 Lab
3
Dimensional Data Modeling: Building Slowly Changing Dimensions Day 2 Lecture
4
Dimensional Data Modeling: Building Slowly Changing Dimensions Day 2 Lab
5
Dimensional Data Modeling: Graph Data Modeling Day 3 Lecture
6
Dimensional Data Modeling: Graph Data Modeling Day 3 Lab
Week 2: Fact Data Modeling
1
Fact Data Modeling: Core Concepts, Deduplication Day 1 Lecture
2
Fact Data Modeling: Practical Insights into Data Modeling Day 1 Lab
3
Fact Data Modeling: Core Elements in Data Modeling Day 2 Lecture
4
Fact Data Modeling: Compact Tables for Efficient Data Representation Day 2 Lab
5
Fact Data Modeling: Minimizing Shuffle and Reducing Facts Day 3 Lecture
6
Fact Data Modeling: Practical Guide to Formatting and Aggregating Data Day 3 Lab
Week 3: Apache Spark Fundamentals
1
Apache Spark: Architecture, Optimization, and Best Practices Day 1 Lecture
2
Apache Spark: Hands-On for Broadcast and Hash Joins Day 1 Lab
3
Apache Spark: Managing Spark Jobs and Notebooks Day 2 Lecture
4
Apache Spark: User-Defined Functions and Broadcast Join Day 2 Lab
5
Unit Testing Spark Jobs: Importance, Challenges, and Leadership Perspectives Lecture
6
Unit Testing Spark Jobs: Mastering Spark and PySpark Testing Lab
7
Spark Basics
Week 4: Databricks Basics
1
Databricks Platform Overview Day 1 Lecture
2
Databricks Platform Overview Day 1 Lab
3
Introduction to Spark Day 2 Lecture
4
Introduction to Spark Day 2 Lab
5
Apache Spark Core Day 3 Lecture
6
Apache Spark Core Day 3 Lab
Week 5: Databricks & Advanced Spark
1
Apache Spark Shuffle Joins Day 1 Lecture
2
Apache Spark Memory Tuning, Partitioning Day 2 Lecture
3
Apache Spark Memory Tuning, Partitioning Day 2 Lab
4
Apache Spark Unit Testing Day 3 Lecture
5
Apache Spark Unit Testing Day 3 Lab
6
Setting Up CI/CD and Unit Testing in Databricks for Reliable Data Pipelines
7
Databricks and Advanced Spark Day1 Lecture
8
Databricks and Advanced Spark Day1 Lab
9
Databricks and Advanced Spark Day2 Lecture
10
Databricks and Advanced Spark Day2 Lab
11
Apache Spark Shuffle Joins Day 1 Lab
Week 6: Data Lakes with Delta Table
1
Delta Table Day 1 Lecture
2
Delta Table Day 1 Lab
3
Delta Lake Bonus
4
Delta Table Day 2 Lecture
5
Delta Table Day 2 Lab
6
Test Again
7
Data Lakes with Delta Table Day1 Lecture
8
Data Lakes with Delta Table Day1 Lab
9
Data Lakes with Delta Table Day2 Lecture
10
Data Lakes with Delta Table Day2 Lab
Week 7: Analytical Patterns & Advanced SQL
1
Applying Analytical Patterns Day 1 Lecture
2
Applying Analytical Patterns Day 1 Lab
3
Advanced SQL Patterns Day 2 Lecture
4
Advanced SQL Patterns Day 2 Lab
5
Analytical Patterns Recognizing Business Value Day 3 Lecture
6
Analytical Patterns Recognizing Business Value Day 3 Lab
7
Applying Analytical Patterns: Exploring SQL, Scaling Projects and Aggregation Analysis Day 1 Lecture
8
Applying Analytical Patterns: Mastering Growth Accounting and Retention Analysis Day 1 Lab
9
Applying Analytical Patterns: Recursive CTEs and Window Functions Day 2 Lecture
10
Applying Analytical Patterns: Aggregations and Cardinality Reduction Day 2 Lab
Week 8: Structured Streaming with Spark & Kafka
1
Apache Spark programming with Databricks Day 1 Lecture
2
Apache Spark programming with Databricks Day 1 Lab
3
Apache Spark programming with Databricks Day 2 Lecture
4
Apache Spark programming with Databricks Day 2 Lab
5
Structured Streaming Kafka to Delta Live Table Day1 Lecture
6
Structured Streaming Kafka to Delta Live Table Day1 Lab
7
Structured Streaming Kafka to Delta Live Table Day2 Lecture
8
Structured Streaming Kafka to Delta Live Table Day2 Lab
9
Exploring UDFs and SQL Benchmarks in Spark Streaming
10
Advanced Spark Optimization Techniques Day 1 Lecture
11
Advanced Spark Optimization Techniques Day 1 Lab
12
Spark Structured Streaming Day 2 Lecture
13
Spark Structured Streaming Day 2 Lab
14
Deep Dive On Workflows Day 3 Lecture
15
Deep Dive On Workflows Day 3 Lab
Week 9: Real-time Pipelines with Flink & Kafka
1
Flink Lab Setup
2
Streaming Pipelines: Mastering Streaming and Real-time Pipelines Day 1 Lecture
3
Streaming Pipelines: Setting up Streaming Pipelines Day 1 Lab
4
Streaming Pipelines: Exploring Data Collection and Processing Day 2 Lecture
5
Streaming Pipelines: Kafka, Postgres, Spark Integrations and Parallelism Day 2 Lab
Week 10: Managing Unstructured Data
1
Managing Unstructured Data - Day 1 Lecture
2
Managing Unstructured Data - Day 1 Lab
3
Managing Unstructured Data - Day 2 Lecture
4
Managing Unstructured Data - Day 2 Lab
5
Managing Unstructured Data Day1 Lecture
6
Managing Unstructured Data Day1 Lab
7
Managing Unstructured Data Day2 Lecture
8
Managing Unstructured Data Day2 Lab
Week 11: Data Quality Patterns
1
Data Quality Patterns: MIDAS Process from Airbnb Day 1 Lecture
2
Data Quality Patterns: Spec-Building Document Day 1 Lab
3
Data Quality Patterns: WAP Patterns Day 2 Lecture
Week 12: Data Pipeline Maintenance
1
Data Pipeline Maintenance: Navigating the Complexities of Data Engineering Day 1 Lecture
2
Data Pipeline Maintenance: Strategies for Maintenance and Doc Building Day 2 Lecture
Week 13: Data Visualization and Impact
1
Data Visualization and Impact: Mastering Data Engineering Day 1 Lecture
2
Data Visualization and Impact: Hands-On with the CSV files Day 1 Lab
3
Data Visualization and Impact: Insights and Best Practices Day 2 Lecture
4
Data Visualization and Impact: Exploring Data Visualization and Aggregation Techniques Day 2 Lab
Week 14: KPIs and Experimentation
1
KPIs and Experimentation: Decoding Business Success: Metrics, Growth Strategies and Collaborative Approaches Day 1 Lecture
2
KPIs and Experimentation: Setting up and Analysing Experiments Day 1 Lab
3
KPIs and Experimentation: Leading and Lagging Metrics Day 2 Lecture
Week 15: Building AI Agents with Databricks
1
Building AI Agents with Databricks Day1 Lecture
2
Building AI Agents with Databricks Day1 Lab
3
Building AI Agents with Databricks Day2 Lecture
4
Building AI Agents with Databricks Day2 Lab
5
Enhancements and Implementations in MLflow
Capstone Project
1
Capstone May 2025
2
Capstone Showcase Jan 2025
Q&A with Zach
1
Q&A Week 1
2
Q&A Week 2
3
Q&A Week 3
4
Q&A Week 4
5
Q&A with Zach Week 1
6
Q&A with Zach Week 2
7
Q&A with Zach Week 3
8
Q&A with Zach Week 4
9
Q&A with Zach Week 5
10
Navigating Data Engineering - Tips, Tools, and Career Insights
11
Q&A with Zach Week1
12
Q&A with Zach Week2
13
Q&A with Zach Week3
14
Q&A with Zach Week4
15
Q&A with Zach Week5
Tech Talks
1
Tech Talk - Alex
2
Tech Talk - Joe
3
Tech Talk - Shubham (2025-06-24)
4
Tech Talk - Vaishali
5
Tech Talk - Shubham (2026-02-01)
6
Tech Talk - Brian
7
Tech Talks 3
8
Tech Talk - Xinran
9
Tech Talk - Jason
TA Office Hours
1
TA Office Hour 1 (2025-01-14)
2
TA Office Hour 2 (2025-01-14)
3
TA Office Hour 3 (2025-01-21)
4
TA Office Hour 4 (2025-01-21)
5
TA Office Hour 5 (2025-01-29)
6
TA Office Hour 6 (2025-01-29)
7
TA Office Hour 1 (2025-06-04)
8
TA Office Hour 2 (2025-06-05)
9
TA Office Hour 3 (2025-06-11)
10
TA Office Hour 4 (2025-06-17)
11
TA Office Hour 5 (2025-06-17)
12
TA Office Hour 6 (2025-06-19)
13
TA Office Hour 7
14
TA Office Hour 8
15
TA Office hour
16
TA Office hour 1
17
TA Office hour 2
18
TA Office hour 3
19
TA Office hour 4
20
TA Office hour 5
21
TA Office hour 6
22
TA Office hour 7
23
TA Office hour 8
Career Development Sessions
1
Career Development - LinkedIn Optimization
2
Career Development - Resume Review
3
Career Development - Interview Help
4
Career Development - Data Modeling Interview
5
Career Development - Strategic Networking
Guest Speaker Sessions
1
Jason Reid (cofounder of Tabular)
2
Shachar Meir
3
Brian Pulliam
4
Sundas Khalid
5
YZ
6
Prasad Rao
7
Joe Reis
Bonus - Azure
1
Azure - Week 1
2
Azure - Week 2
3
Azure - Week 3
4
Azure - Week 4
5
Azure - Week 5
Bonus - LLMs
1
RAG and LLMs Day1 Lecture
2
RAG and LLMs Day1 Lab
3
RAG and LLMs Day2 Lecture
4
RAG and LLMs Day2 Lab
5
LLMs Day 3 Part1
6
LLMs Day3 Part2