ETL

18 articles tagged with "ETL"

Checklist for Building a Cloud Data Engineer Portfolio

Two to three production-ready cloud data projects beat dozens of tutorials for landing data engineering interviews.

12 min read
Data Engineering
5 Tools To Showcase Data Engineering Skills

Learn how Airflow, AWS, Snowflake, dbt, and Spark projects can power a standout data engineering portfolio with real end-to-end workflows.

16 min read
Data Engineering
How To Add Data Quality Checks in Pipelines

Automated data validations for ingestion and transformations using Great Expectations and dbt-expectations to catch errors early and keep analytics trustworthy.

11 min read
Data Engineering
Green Data Pipelines vs. Traditional Pipelines

Compare green and traditional data pipelines: energy use, cost savings, scalability, and techniques like lazy evaluation, sparse models, and carbon-aware scheduling.

13 min read
Data Engineering
Open Source ETL Tools: Comparison Guide 2026

Compare six open-source ETL tools—Airbyte, Airflow, NiFi, Pentaho, Meltano, and Talend (retired)—to find the best fit for scale, real-time needs, and team skills.

17 min read
Data Engineering
How Databricks Handles Schema Transformations

Guide to schema enforcement, schema evolution, Auto Loader, mergeSchema, type widening, and streaming best practices in Databricks.

16 min read
Data Engineering
Error Handling in Airflow with Python Pipelines

Reliable Airflow pipelines require intentional error handling: retries, idempotent tasks, targeted exceptions, alerts, and robust logging.

12 min read
Data Engineering
Backward Compatibility in Schema Evolution: Guide

Evolve schemas without breaking pipelines: learn safe changes, compatibility modes (BACKWARD vs BACKWARD_TRANSITIVE), registry best practices, and rollout tips.

15 min read
Data Engineering
Kubernetes Best Practices for Data Teams

Kubernetes best practices for data teams: cluster setup, Spark/Airflow integration, resource requests, autoscaling, security, monitoring, GitOps, and cost.

20 min read
Data Engineering
How to Debug Airflow DAG Failures

Step-by-step checklist to diagnose and fix Airflow DAG failures: verify DAG import, inspect task logs, test with dag.test(), validate connections, and tune resources.

15 min read
Data Engineering
AWS vs Azure for Data Engineers: Tool Comparison

Compare AWS and Azure data engineering tools — storage, ETL, streaming, ML, and pricing — to choose the platform that fits your team's skills and infrastructure.

19 min read
Data Engineering
Data Engineering Bootcamp Checklist: What to Look For

Assess curriculum, hands-on projects, mentorship, cloud tools, and costs to pick a bootcamp that truly prepares you for data engineering roles.

17 min read
Data Engineering