Injecting quality into your Airflow DAG Lab

Module
54 mins

Description

In this video, Zach walks you through the setup and execution of a simple data pipeline using Airflow, focusing on reading data from Kafka and storing it in a production table. He covers key components like execution_timeout and max_active_runs, and the importance of data quality checks. He demonstrates how to handle missing data and ensure the pipeline is idempotent, meaning it won't create duplicates when rerun. He also highlights the significance of staging tables and the write-audit-publish pattern for maintaining data integrity.
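As a rough sketch of the write-audit-publish pattern the lesson covers: data lands in a staging table first, quality checks run against staging, and only then is the production partition overwritten so a rerun never creates duplicates. The function names, table shapes, and quality checks below are illustrative assumptions, not the lesson's actual code (tables are modeled as plain lists for simplicity).

```python
# Hypothetical sketch of write-audit-publish; names and checks are
# illustrative assumptions, not the lesson's actual pipeline code.

def write_to_staging(rows):
    """Write: land the raw batch (e.g. from Kafka) in a staging table."""
    return list(rows)

def audit(staging):
    """Audit: run data quality checks before anything reaches production."""
    assert len(staging) > 0, "quality check failed: staging is empty"
    assert all(r.get("event_id") is not None for r in staging), \
        "quality check failed: missing event_id"
    return True

def publish(production, staging):
    """Publish: overwrite the production partition rather than appending,
    so rerunning the pipeline is idempotent (no duplicate rows)."""
    production.clear()
    production.extend(staging)
    return production

production_table = []
kafka_batch = [{"event_id": 1, "value": "a"},
               {"event_id": 2, "value": "b"}]

staging_table = write_to_staging(kafka_batch)
audit(staging_table)
publish(production_table, staging_table)

# Rerunning the same run overwrites rather than appends: still 2 rows.
publish(production_table, write_to_staging(kafka_batch))
print(len(production_table))  # prints 2
```

In a real Airflow DAG, the timeouts and concurrency limits mentioned above would be set via the task's `execution_timeout` and the DAG's `max_active_runs` parameters.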