Spark Batch Processing - Hands-On Techniques for Broadcast Hash Join (Day 1 Lab)

Week 4: Batch Pipelines with Apache Spark
45 mins

Description

Dive into practical Spark join optimization techniques in this hands-on video. We start by setting up the Spark directory and running Docker Compose to bring up the environment. You will learn how to read CSV files with spark.read, setting the header option to true so the first row is used as column names. We then introduce the broadcast join and its benefits, running into and troubleshooting issues along the way. The highlight of the lab is a demonstration of how effective a broadcast hash join can be, followed by a discussion of the trade-off it makes: less compute and shuffling in exchange for more memory, since the smaller table is copied to every executor. [Recorded on Dec 5th, 2023]
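
To make the compute versus memory trade-off concrete, here is a minimal plain-Python sketch of the idea behind a broadcast hash join. It is not Spark code and not the code used in the lab: the function `broadcast_hash_join`, the sample tables, and the column names are all hypothetical, and the list of lists only stands in for the partitions of a large DataFrame.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of the large side against the small side."""
    # Build phase: hash the small table once. In Spark, this is the table that
    # would be copied to every executor, so it has to fit in memory there.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    # Probe phase: each partition is joined locally against the hash table,
    # so the large side never has to be shuffled by key.
    joined = []
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: a "large" table split into two partitions
# and a small dimension table.
orders_partitions = [
    [{"order_id": 1, "country_code": "US"}, {"order_id": 2, "country_code": "FR"}],
    [{"order_id": 3, "country_code": "US"}, {"order_id": 4, "country_code": "ZZ"}],
]
countries = [
    {"country_code": "US", "country_name": "United States"},
    {"country_code": "FR", "country_name": "France"},
]

result = broadcast_hash_join(orders_partitions, countries, "country_code")
for row in result:
    print(row)
```

The order with code "ZZ" has no match and is dropped, as in any inner join. In PySpark the same intent is usually expressed by wrapping the smaller DataFrame in `pyspark.sql.functions.broadcast` before calling `join`, which asks Spark to ship that table to the executors instead of shuffling both sides.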