Advanced Spark (Day 2 Lab)

Week 4: Batch Pipelines with Apache Spark
37 mins

Description

In this lab, Zach shows students how to use Glue Job Runner and Iceberg to optimize data processing. He covers setting up the job, running Python functions, and using user-defined functions (UDFs). He then demonstrates how to monitor the job and view the output table, and explains the benefits of using Iceberg for data compression and partitioning. [Recorded on May 30th, 2024]