Data Engineering Boot Camp V2 Combined Track

The second live boot camp offered by Zach Wilson in Summer 2023.

Taught by Zach Wilson

Founder at DataExpert.io

Learn directly from the experts

Zach Wilson

Founder at DataExpert.io

I have led teams of data engineers and software engineers at Airbnb, Facebook, and Netflix. My next goal is to upskill as many data knowledge workers as I can!

Course syllabus

67 lessons • 59+ hours of content • 11 assignments

Dimensional Data Modeling in Postgres
1. Dimensional Data Modeling - Complex Data Types Arrays and Structs with Postgres (Day 1 Lab)
2. Dimensional Data Modeling - Graph Dimensional Modeling with Postgres (Day 3 Lecture)
3. Dimensional Data Modeling - Slowly-changing Dimensions and idempotent queries with Postgres (Day 2 Lab)
4. Dimensional Data Modeling - Graph Data Modeling with Postgres (Day 3 Lab)
5. Dimensional Data Modeling - Slowly-changing Dimensions and idempotent queries with Postgres (Day 2 Lecture)
6. Dimensional Data Modeling - Complex Data Types Arrays and Structs with Postgres (Day 1 Lecture)
Dimensional Data Modeling with Apache Iceberg
1. Dimensional Data Modeling - Complex Data Types Arrays and Structs with Iceberg (Day 1 Lab)
2. Dimensional Data Modeling - Complex Data Types Arrays and Structs with Iceberg (Day 1 Lecture)
3. Dimensional Data Modeling - Slowly-changing Dimensions and idempotent queries with Trino (Day 2 Lab)
4. Dimensional Data Modeling - Slowly-changing Dimensions and Idempotent Queries in Iceberg (Day 2 Lecture)
5. Dimensional Data Modeling - Graph Data Modeling with Iceberg (Day 3 Lab)
6. Dimensional Data Modeling - Graph Data Modeling with Iceberg (Day 3 Lecture)
Fact Data Modeling in Postgres
1. Fact Data Modeling - Navigating Dimensions and Graph Modeling (Day 1 Lab)
2. Fact Data Modeling - Mastering Denormalization Timing and Processing Large Volume Data (Day 3 Lab)
3. Fact Data Modeling - Additive vs Non-Additive Dimensions and Beyond (Day 1 Lecture)
4. Fact Data Modeling - Exploring Datelist Structures for User Growth Analysis (Day 2 Lab)
5. Fact Data Modeling - Navigating Challenges: Denormalization and Large Volume Data Processing (Day 3 Lecture)
6. Fact Data Modeling - Distinguishing Facts from Dimensions and Leveraging Reduced Facts for Insightful Analysis (Day 2 Lecture)
Capstone Project
1. Capstone Project
Fact Data Modeling with Apache Iceberg
1. Fact Data Modeling - Practical Insights into Data Modeling and Analysis with Iceberg (Day 1 Lab)
2. Fact Data Modeling - Core Concepts, Deduplication Techniques, and Retention Considerations with Iceberg (Day 1 Lecture)
3. Fact Data Modeling - Compact Tables for Efficient Data Representation with Iceberg (Day 2 Lab)
4. Fact Data Modeling - Core Elements in Data Modeling with Iceberg (Day 2 Lecture)
5. Fact Data Modeling - Practical Guide to Formatting and Aggregating Data with Iceberg (Day 3 Lab)
6. Fact Data Modeling - Minimizing Shuffle and Reducing Facts with Iceberg (Day 3 Lecture)
Pipeline Spec Building and dbt fundamentals
1. Analytics Data Quality - Exploring Data Modeling and Quality Checks (Day 1 Lab)
2. Analytics Data Quality - Strategies and Insights from Zach's Airbnb Experience (Day 1 Lecture)
3. Analytics Data Quality - Mastering DBT Projects with Bruno: Troubleshooting, Profiles, Snapshots, and Testing (Day 2 Lab)
4. Analytics Data Quality - Mastering DBT Projects with Bruno: Practical Overview and Hands-On Demonstrations (Day 2 Lecture)
Unit Testing Spark Pipelines and Write-Audit-Publish
1. Infrastructure Data Quality - Mastering Spark and PySpark Testing: Comprehensive Overview and Practical Guidance (Day 1 Lab)
2. Infrastructure Data Quality - Elevating Data Quality in Analytics Engineering: Importance, Challenges, and Leadership Perspectives (Day 1 Lecture)
3. Infrastructure Data Quality - Implementing Data Quality Measures with Astronomer and Airflow: Hands-On Lab with Marc Lamberti (Day 2 Lab)
4. Infrastructure Data Quality - Elevating Infrastructure Data Quality: Insights and Best Practices with Marc Lamberti (Day 2 Lecture)
Analytical Patterns & Analysis with Trino
1. Applying Advanced SQL for Analytical Insights - Mastering Growth Accounting: Hands-On Journey through User Growth and Retention Analysis (Day 1 Lab)
2. Applying Advanced SQL for Analytical Insights - Exploring SQL, Scaling Projects, and Aggregation Analysis (Day 1 Lecture)
3. Applying Advanced SQL for Analytical Insights - Aggregations and Cardinality Reduction in Bootcamp Web Events (Day 2 Lab)
4. Applying Advanced SQL for Analytical Insights - Recursive CTEs, Window Functions, and Practical Insights (Day 2 Lecture)
Streaming Pipelines with Apache Flink
1. Flink Streaming - Real-Time Data Processing Essentials: Lambda vs Kappa Architectures, UDFs, and Windowing Techniques (Day 2 Lecture)
2. Flink Streaming - Fundamentals of Real-Time Data Processing (Day 1 Lecture)
3. Flink Streaming - Analyzing TechCreator.io Popularity with Flink: Tumbling Windows for Real-Time Traffic Insights (Day 2 Lab)
4. Flink Streaming - Setting Up Streaming Pipelines and Integrating Kafka with Postgres (Day 1 Lab)
Data Pipeline Maintenance
1. Data Pipeline Maintenance - Simulated Scenarios and Runbook Creation for Effective On-Call Procedures (Day 1 Lab)
2. Advanced Data Pipeline Maintenance - Signals of Technical Debt, Data Migration Models, and On-Call Best Practices (Day 2 Lecture)
3. Data Pipeline Maintenance - Challenges, Ownership Models, and Team Structures (Day 1 Lecture)
Product Sense, KPIs and Experimentation
1. Applying KPIs and Experimentation - Setting Up Web Experiments in Statsig (Day 1 Lab)
2. Applying KPIs and Experimentation - Understanding User Behavior and Driving Business Growth (Day 1 Lecture)
3. Applying KPIs and Experimentation - Leading vs. Lagging Metrics and the Power of Funnels in Data Analysis (Day 2 Lecture)
Batch Pipelines with Apache Spark
1. Spark Batch Processing - Caching, DataFrame, Dataset, SparkSQL, and Bucketing in Iceberg (Day 2 Lab)
2. Spark Batch Processing - Data Partitioning, Performance Optimization, and Iceberg Tables (Day 1 Lab)
3. Spark Batch Processing - Comparing with Hive and MapReduce, Key Components, and Performance Optimization (Day 1 Lecture)
4. Spark Batch Processing - Caching, UDFs, DataFrames, Datasets, SparkSQL, and Parquet (Day 2 Lecture)
Data Impact Communication & Visualization
1. Data Impact Communication & Visualization - Performant Dashboard Design and Effective Visualization Combinations (Day 2 Lecture)
2. Data Impact Communication & Visualization - Preparing for Complex Visualizations (Day 1 Lab)
3. Data Impact Communication & Visualization - Creating Executive Summary and Exploratory Dashboards with events.csv Dataset (Day 2 Lab)
4. Data Impact Communication & Visualization - Preparing for Complex Visualizations (Day 1 Lab)
Write-Audit-Publish and Pipeline Spec Building
1. Analytics Data Quality - Hands-On Exercises and Validating Real-World Datasets (Day 1 Lab)
2. Analytics Data Quality - Overcoming Data Quality Challenges: Identifying Poor Data Sources and Setting Row Count Checks (Day 2 Lab)
3. Analytics Data Quality - Mitigating Poor Data Quality: Causes, Contracts, and Row Count Thresholds (Day 2 Lecture)
4. Analytics Data Quality - Building Trust in Data: Validation Techniques and Quality Checks (Day 1 Lecture)
Analytical Patterns & Analysis with Postgres
1. Applying Advanced SQL for Analytical Insights - Window Functions, GROUPING SETS, CUBE, ROLLUP, and Funnel Analytics (Day 2 Lab)
2. Applying Advanced SQL for Analytical Insights - Bridging SQL Features and Data Modeling Insights (Day 2 Lecture)
3. Applying Advanced SQL for Analytical Insights - Growth Accounting, Survivorship Analysis, and Smoothing Trends (Day 1 Lab)
4. Applying Advanced SQL for Analytical Insights - Repeatable Analyses and State Change Tracking (Day 1 Lecture)
Write-Audit-Publish pattern and CI/CD (recorded)
1. Infrastructure Data Quality - Enhancing Data Quality and Infrastructure Efficiency in Data Engineering (Day 2 Lecture)
2. Infrastructure Data Quality - Real-World Validation with PySpark and PyTest (Day 1 Lab)
3. Infrastructure Data Quality - Preemptive Data Quality Assurance: Integrating Software Engineering Best Practices (Day 1 Lecture)
4. Infrastructure Data Quality - Setting Up CI/CD with GitHub Actions for Data Quality Assurance (Day 2 Lab)

What students say