Databricks and Advanced Spark Day 2 Lecture

Description

In this lesson, Zach focuses on the five key reasons Spark jobs can be slow, including data model bottlenecks and job misconfigurations. He emphasizes processing only the data a job actually needs, and advocates incremental refresh strategies over full dataset refreshes because they can yield significant performance improvements. Zach also discusses how source file formats affect performance and why jobs need proper configuration to avoid congestion.
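To make the incremental-refresh idea concrete, here is a minimal pure-Python sketch, not Spark code and not taken from the lecture. It assumes a date-partitioned source (modeled as a dict of day to rows) and uses a "watermark" (the last day already processed) to decide what to recompute. The names `make_source`, `full_refresh`, and `incremental_refresh` are hypothetical. In Spark, the rough analogue would be filtering on a partition column so that only new partitions are read.

```python
from datetime import date, timedelta

# Hypothetical source: one "partition" of rows per day, standing in for a
# date-partitioned table.
def make_source(start: date, days: int, rows_per_day: int) -> dict[date, list[int]]:
    return {start + timedelta(d): list(range(rows_per_day)) for d in range(days)}

def full_refresh(source):
    """Recompute the per-day aggregate from every partition.

    Returns (result, rows_scanned)."""
    result = {day: sum(rows) for day, rows in source.items()}
    scanned = sum(len(rows) for rows in source.values())
    return result, scanned

def incremental_refresh(source, previous, watermark):
    """Reuse previous results and only process partitions newer than the watermark.

    Returns (result, rows_scanned, new_watermark)."""
    result = dict(previous)
    scanned = 0
    for day, rows in source.items():
        if day > watermark:  # skip partitions that were already processed
            result[day] = sum(rows)
            scanned += len(rows)
    new_watermark = max(result) if result else watermark
    return result, scanned, new_watermark

start = date(2024, 1, 1)
source = make_source(start, days=30, rows_per_day=1000)

# Pretend an earlier run already covered the first 29 days.
previous = {d: sum(r) for d, r in source.items() if d < start + timedelta(29)}
watermark = start + timedelta(28)

full, full_scanned = full_refresh(source)
inc, inc_scanned, new_wm = incremental_refresh(source, previous, watermark)

assert full == inc  # same answer either way
print(full_scanned, inc_scanned)  # 30000 1000
```

Both strategies produce the same result, but under these assumptions the incremental run scans one day of rows instead of thirty. The watermark approach is only one way to implement incremental refresh; it also assumes older partitions never change after they are processed.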