Advanced Spark (Day 2 Lab)

Week 4: Batch Pipelines with Apache Spark
37 mins

Description

In this lab, Zach shows students how to use Glue Job Runner and Apache Iceberg to optimize data processing. He covers setting up the job, running Python functions, and using UDFs. He also demonstrates how to monitor the job and view the output table, and explains the benefits of Iceberg for data compression and partitioning. [Recorded on May 30th, 2024]