Spark Batch Processing - User-Defined Functions (UDFs) and Broadcast Join (Day 2 Lab)

Week 4: Batch Pipelines with Apache Spark
71 mins

Description

In this Spark lab, Zach explains what User-Defined Functions (UDFs) are and how they are applied within the Spark execution environment. Working through a practical code example, he shows how to update a schema and explains why caching matters. He then demonstrates a broadcast join, using a query with partitioning, and concludes with the advantages of PySpark UDFs. [Recorded on Dec 7, 2023]
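
For readers who have not met the term, the idea behind a broadcast join can be sketched in plain Python. This is only a conceptual illustration, not Spark's implementation and not the code from the lab: the function name `broadcast_hash_join` and the sample `events` and `devices` data are hypothetical. The small side is turned into an in-memory lookup that every partition of the large side can use, so the large side never has to be shuffled.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join every partition of the large side against the small side."""
    # Build the lookup once. On a real cluster, this is the piece that
    # would be copied ("broadcast") to every worker.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    for partition in large_partitions:
        # Each partition is joined locally against the lookup, so rows of
        # the large side stay where they are.
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**row, **match})
    return joined


# Hypothetical sample data: a large side split into two partitions,
# and a small dimension table.
events = [
    [{"device_id": 1, "url": "/a"}, {"device_id": 2, "url": "/b"}],
    [{"device_id": 1, "url": "/c"}, {"device_id": 3, "url": "/d"}],
]
devices = [
    {"device_id": 1, "os": "iOS"},
    {"device_id": 2, "os": "Android"},
]

result = broadcast_hash_join(events, devices, "device_id")
for row in result:
    print(row)
```

In PySpark itself, the equivalent request is usually made by wrapping the smaller DataFrame in the `broadcast` hint from `pyspark.sql.functions` before joining.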