Apache Spark Shuffle Joins Day 1 Lecture

Module
40 mins
ETL/ELT, Apache Spark

Description

In this lecture, Zach discusses how we handle two petabytes of data daily at Netflix, focusing on sampling techniques that optimize processing. He explains why precision matters in data analysis and how working from a 0.1% sample let us cut processing time and cost significantly. He also covers the challenges that dynamic IP addresses pose in our cloud environment and why effective logging requires collaboration with application owners.