Start Data Engineering

A newsletter with tutorials, data design patterns, open-source tools, and techniques used by data-driven companies to help you become a better data engineer.
Date Title
Aug 13, 2025 How to Use Spark SQL Merge Into - Step-by-Step Tutorial
Aug 12, 2025 Six Data Modeling Techniques For Building Production-Ready Tables Fast
Aug 11, 2025 Free 10-Minute Polars Tutorial for Data Engineers
Aug 10, 2025 Free Python Standard Library How-to Cheatsheet for Data Engineers
Aug 9, 2025 How to Get Really Good at Advanced SQL for Data Engineering
Aug 5, 2025 How to quickly set up a local Spark development environment?
Jun 10, 2025 Using Joins and Group Bys the right way for data warehousing
Jun 7, 2025 CTEs(Common Table Expression) or Temporary Tables for Spark SQL
Jun 3, 2025 Advanced SQL is knowing how to model the data & get there effectively
May 5, 2025 Data Engineering Interview Preparation Series #3: SQL
Apr 14, 2025 How to Extract Data from APIs for Data Pipelines using Python
Apr 5, 2025 How to create an SCD2 Table using MERGE INTO with Spark & Iceberg
Mar 18, 2025 How to quickly deliver data to business users? #1. Adv Data types & Schema evolution
Mar 1, 2025 How to Manage Upstream Schema Changes in Data Driven Fast Moving Company
Feb 16, 2025 Visual Studio Code (VSCode) extensions for data engineers
Feb 10, 2025 Should Data Pipelines in Python be Function based or Object-Oriented (OOP)?
Feb 3, 2025 How to turn a 1000-line messy SQL into a modular, & easy-to-maintain data pipeline?
Jan 28, 2025 How to ensure consistent metrics in your warehouse
Jan 20, 2025 Data Engineering Interview Preparation Series #2: System Design
Dec 18, 2024 How to reference a seed from a different dbt project?
Nov 22, 2024 What do Snowflake, Databricks, Redshift, BigQuery actually do?
Oct 17, 2024 25 SQL tips to level up your data engineering skills
Oct 14, 2024 How to use nested data types effectively in SQL
Sep 23, 2024 How to decide on a data project for your portfolio
Sep 18, 2024 How to build a data project with step-by-step instructions
Sep 5, 2024 What are the Key Parts of Data Engineering?
Aug 13, 2024 Data Engineering Interview Preparation Series #1: Data Structures and Algorithms
Jul 26, 2024 How to implement data quality checks with greatexpectations
Jul 16, 2024 What are the types of data quality checks?
Jul 1, 2024 SQL or Python for Data Transformations?
Jun 24, 2024 Why use Apache Airflow (or any orchestrator)?
Jun 14, 2024 Data Engineering Projects
Jun 12, 2024 Data Engineering Project for Beginners - Batch edition
Jun 11, 2024 Build Data Engineering Projects, with Free Template
May 30, 2024 Python Essentials for Data Engineers
May 29, 2024 dbt(Data Build Tool) Tutorial
May 28, 2024 Building Cost Efficient Data Pipelines with Python & DuckDB
May 21, 2024 Enable stakeholder data access with Text-to-SQL RAGs
May 9, 2024 How to reduce your Snowflake cost
Apr 22, 2024 How to test PySpark code with pytest
Apr 22, 2024 Docker Fundamentals for Data Engineers
Feb 22, 2024 Data Engineering Best Practices - #2. Metadata & Logging
Dec 13, 2023 Uplevel your dbt workflow with these tools and techniques
Nov 14, 2023 What is an Open Table Format? & Why to use one?
Oct 25, 2023 6 Steps to Avoid Messy Data in Your Warehouse
Jul 20, 2023 Data Engineering Best Practices - #1. Data flow & Code
Jun 30, 2023 What is a self-serve data platform & how to build one
Jun 13, 2023 How to become a valuable data engineer
May 15, 2023 Data Engineering Project: Stream Edition
Feb 15, 2023 Change Data Capture, with Debezium
Jan 12, 2023 Data Pipeline Design Patterns - #2. Coding patterns in Python
Dec 11, 2022 Data Pipeline Design Patterns - #1. Data flow patterns
Aug 11, 2022 How to gather requirements for your data project
Jun 24, 2022 5 Steps to land a high paying data engineering job
May 18, 2022 Setting up a local development environment for python data projects using Docker
Apr 12, 2022 What is the difference between a data lake and a data warehouse?
Mar 18, 2022 End-to-end data engineering project - batch edition
Feb 22, 2022 Automating data testing with CI pipelines, using Github Actions
Dec 12, 2021 How to choose the right tools for your data pipeline
Nov 11, 2021 Setting up end-to-end tests for cloud data pipelines
Oct 22, 2021 How to improve at SQL as a data engineer
Oct 12, 2021 6 Responsibilities of a Data Engineer
Oct 12, 2021 6 Key Concepts, to Master Window Functions
Oct 12, 2021 Whats the difference between ETL & ELT?
Oct 12, 2021 What are Common Table Expressions(CTEs) and when to use them?
Oct 12, 2021 How to add tests to your data pipelines
Oct 11, 2021 10 Skills to Ace Your Data Engineering Interviews
Oct 5, 2021 What is a staging area?
Oct 3, 2021 What is a Data Warehouse?
Sep 16, 2021 How to Scale Your Data Pipelines
Aug 29, 2021 Understand & Deliver on Your Data Engineering Task
Aug 17, 2021 4 Key Patterns to Load Data Into A Data Warehouse
Jul 21, 2021 How to Validate Datatypes in Python
Jun 25, 2021 Designing a Data Project to Impress Hiring Managers
May 13, 2021 How to make data pipelines idempotent
Apr 26, 2021 Writing memory efficient data pipelines in Python
Apr 8, 2021 How to gather requirements to re-engineer a legacy data pipeline
Mar 27, 2021 How to trigger a spark job from AWS Lambda
Feb 28, 2021 How to set up a dbt data-ops workflow, using dbt cloud and Snowflake
Feb 13, 2021 Apache Superset Tutorial
Feb 7, 2021 How to Join a fact and a type 2 dimension (SCD2) table
Jan 30, 2021 How to update millions of records in MySQL?
Jan 16, 2021 How to unit test sql transforms in dbt
Jan 6, 2021 How to Backfill a SQL query using Apache Airflow
Jan 1, 2021 How to do Change Data Capture (CDC), using Singer
Nov 8, 2020 How to Pull Data from an API, Using AWS Lambda
Oct 12, 2020 How to submit Spark jobs to EMR cluster from Airflow
Jul 26, 2020 Ensuring Data Quality, With Great Expectations
Jul 11, 2020 Designing a “low-effort” ELT system, using stitch and dbt
Jun 19, 2020 3 Key techniques, to optimize your Apache Spark code
Jun 11, 2020 What, why, when to use Apache Kafka, with an example
Jun 2, 2020 A proven approach to land a Data Engineering job
May 2, 2020 What Does It Mean for a Column to Be Indexed
Apr 25, 2020 Advantages of Using dbt(Data Build Tool)
Apr 18, 2020 Apache Airflow Review: the good, the bad
Apr 11, 2020 Review: Building a Real Time Data Warehouse
Apr 5, 2020 3 Key Points to Help You Partition Late Arriving Events
Mar 29, 2020 Scheduling a SQL script, using Apache Airflow, with an example
Mar 20, 2020 10 Key skills, to help you become a data engineer