
This hands-on lab covers practical Spark join optimization techniques. We start by setting up the Spark directory and running Docker Compose to bring up the environment. From there, we read CSV files with spark.read, setting the header option to true, then introduce the broadcast join and its benefits, troubleshooting the issues that come up along the way. The highlight of the lab is a demonstration of how effective a broadcast hash join can be, followed by a discussion of the trade-off between compute and memory usage. The lab is meant to give you real-world Spark optimization experience. [Recorded on Dec 5th, 2023]
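To make the broadcast hash join idea concrete, here is a minimal sketch in plain Python rather than Spark. It is not the code from the lab and not Spark's actual implementation; the table names, columns, and the helpers `read_csv_rows` and `broadcast_hash_join` are all hypothetical, chosen only for illustration. The small table is turned into an in-memory hash table once, and each partition of the large table is then joined against it without being reshuffled. In PySpark the same intent is usually expressed by wrapping the small DataFrame in `pyspark.sql.functions.broadcast` inside the join.

```python
import csv
import io
from collections import defaultdict


def read_csv_rows(text):
    """Parse CSV text whose first line is a header into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def broadcast_hash_join(large_partitions, small_rows, key):
    """Inner-join each partition of the large side against one small table."""
    # Build the hash table once from the small side. In a cluster, this is
    # the structure that would be copied ("broadcast") to every worker.
    lookup = defaultdict(list)
    for row in small_rows:
        lookup[row[key]].append(row)

    joined = []
    # Each partition is processed where it already lives; the large side
    # is never reshuffled by key.
    for partition in large_partitions:
        for row in partition:
            for match in lookup.get(row[key], []):
                joined.append({**match, **row})
    return joined


# Hypothetical sample data: a small dimension table and a larger fact table.
COUNTRIES_CSV = "country_id,country_name\n1,Canada\n2,Japan\n"
ORDERS_CSV = "order_id,country_id\n100,1\n101,2\n102,1\n103,3\n"

countries = read_csv_rows(COUNTRIES_CSV)
orders = read_csv_rows(ORDERS_CSV)

# Pretend the large table is split across two workers.
partitions = [orders[:2], orders[2:]]

result = broadcast_hash_join(partitions, countries, "country_id")
for row in result:
    print(row)
```

The sketch also shows where the trade-off comes from: every worker has to hold a full copy of the small table's hash table in memory, and in exchange the large table avoids the shuffle that a regular join would need. That is why the approach only pays off when one side is small enough to fit comfortably in memory.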

Spark Batch Processing - Hands-On Techniques for Broadcast Hash Join (Day 1 Lab) 
