In this video, Zach compares UDFs with built-in SQL functions in Spark Streaming, using a benchmark that processes 5 million random numbers. The UDFs turn out to be slower: the built-in functions ran about 10% faster, although the gap varies with caching and the complexity of the operations. Zach also raises the row count to 100 million and then 1 billion to see how performance changes. He encourages viewers to run similar benchmarks themselves, since several nuances can affect the results.
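For readers who want to try a comparison like the one described, here is a minimal PySpark sketch. It is not the code from the video. The workload (`x * x + 1.0`), the column names, the helper names `time_action`, `speedup_percent` and `run_benchmark`, and the use of a batch DataFrame in local mode rather than a streaming query are all assumptions made for illustration.

```python
import time


def time_action(action):
    """Run a zero-argument callable once and return the elapsed seconds."""
    start = time.perf_counter()
    action()
    return time.perf_counter() - start


def speedup_percent(slow_seconds, fast_seconds):
    """Percentage of the slow run's time that the fast run saves."""
    return (slow_seconds - fast_seconds) / slow_seconds * 100.0


def run_benchmark(num_rows=5_000_000):
    """Time the same arithmetic as a Python UDF and as built-in expressions."""
    # PySpark is imported here so the helpers above work without Spark installed.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import DoubleType

    spark = (
        SparkSession.builder.master("local[*]")
        .appName("udf-vs-builtin")
        .getOrCreate()
    )

    # One column of random doubles, cached and materialized up front so that
    # data generation is not counted in either timing.
    df = spark.range(num_rows).withColumn("x", F.rand(seed=42)).cache()
    df.count()

    # Hypothetical workload: identical math, expressed two ways.
    square_plus_one = F.udf(lambda x: x * x + 1.0, DoubleType())
    udf_df = df.select(square_plus_one("x").alias("y"))
    builtin_df = df.select((F.col("x") * F.col("x") + F.lit(1.0)).alias("y"))

    # Summing the result column forces Spark to evaluate every row.
    udf_seconds = time_action(lambda: udf_df.agg(F.sum("y")).collect())
    builtin_seconds = time_action(lambda: builtin_df.agg(F.sum("y")).collect())

    spark.stop()
    return udf_seconds, builtin_seconds
```

Calling `run_benchmark(5_000_000)` and passing the two timings to `speedup_percent` gives a figure comparable in kind to the one quoted above, though the number will depend on the machine, the Spark version, and the workload. Raising `num_rows` to 100 million or 1 billion repeats the larger experiments, provided the machine has enough memory for the cached data.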

Exploring UDFs and SQL Benchmarks in Spark Streaming
