In this lesson, Zach covers five key reasons Spark jobs run slowly, including data model bottlenecks and job misconfiguration. He stresses processing only the data a job actually needs and recommends incremental refreshes over full dataset refreshes, which can improve performance significantly. He also discusses how source file formats affect performance and why proper configuration matters for avoiding congestion.
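To make the incremental-versus-full refresh point concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. It is not taken from the lecture: the partition layout, the `SOURCE_PARTITIONS` data, and both function names are hypothetical, chosen only to show how much less data an incremental run touches.

```python
from datetime import date

# Hypothetical source data: rows grouped by the date partition they landed in.
SOURCE_PARTITIONS = {
    date(2024, 1, 1): [("user_a", 3), ("user_b", 5)],
    date(2024, 1, 2): [("user_a", 2)],
    date(2024, 1, 3): [("user_b", 7), ("user_c", 1)],
}


def full_refresh(source):
    """Recompute every partition's total, scanning all rows on each run."""
    rows_scanned = sum(len(rows) for rows in source.values())
    totals = {ds: sum(value for _, value in rows) for ds, rows in source.items()}
    return totals, rows_scanned


def incremental_refresh(source, existing_totals, run_date):
    """Recompute only the partition for run_date and reuse earlier results."""
    new_rows = source.get(run_date, [])
    totals = dict(existing_totals)  # copy so the previous output is not mutated
    totals[run_date] = sum(value for _, value in new_rows)
    return totals, len(new_rows)


# Day 3 run: totals for days 1 and 2 already exist from earlier runs.
previous = {date(2024, 1, 1): 8, date(2024, 1, 2): 2}
full_totals, full_scanned = full_refresh(SOURCE_PARTITIONS)
incr_totals, incr_scanned = incremental_refresh(
    SOURCE_PARTITIONS, previous, date(2024, 1, 3)
)
```

Both runs produce the same totals, but the incremental run scans only the newest partition. In a Spark job the analogous step would typically be a filter on the partition column before any aggregation, assuming the source table is partitioned by date.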

Databricks and Advanced Spark Day2 Lecture

