Spark Batch Processing - User-Defined Functions (UDFs) and Broadcast Join (Day 2 Lab)

Week 4: Batch Pipelines with Apache Spark
71 mins
Apache Spark

Description

In this Spark lab, Zach explains User-Defined Functions (UDFs) and how they are applied within the Spark execution environment. Working through a practical code example, he shows how to update a schema and explains why caching matters. He then demonstrates a broadcast join, using a query that involves partitioning, and concludes with the advantages of PySpark UDFs. [Recorded on Dec 7, 2023]