Introduction
Description
Modern data teams need flexibility to work locally and scale effortlessly to the cloud. DuckDB and MotherDuck make that possible. In this course, you’ll learn how to build hybrid data workflows that run on your laptop, in the cloud, or across both, without changing tools.
You’ll explore how DuckDB enables fast, in-process analytics and how MotherDuck extends it into a fully managed cloud service with shared databases, governance, and elastic compute.
This is a fully hands-on course. You’ll build an end-to-end project analyzing real NYC service request data, explore it with both the CLI and the DuckDB UI, connect Python for ELT processing, and visualize hybrid results that span your laptop and the cloud.
By the end, you’ll know how to move seamlessly between environments, use MotherDuck for collaboration and scale, and even extend your setup with the new DuckLake lakehouse format.
Why DuckDB & MotherDuck?
DuckDB is the “SQLite for analytics”: a lightweight OLAP engine that runs anywhere — on a laptop, on a server, or embedded in Python. MotherDuck takes the same engine to the cloud with team features, managed storage, and hybrid execution.
You’ll learn how both fit into modern data platforms and why more engineers are adopting this local-to-cloud workflow for fast exploration, prototyping, and production analytics.
Building Your Local Lab
Start by setting up DuckDB on your machine and exploring data right away. No server, no complex configuration. You’ll query CSV and Parquet files, create a persistent .duckdb database, and explore your data visually using the built-in DuckDB UI.
Creating and Managing Tables
You’ll see how simple it is to turn a CSV into a full OLAP database table in a single step. From there, you’ll export clean results to CSV or Parquet, re-read them directly, and understand how DuckDB’s file-based model keeps analytics lightweight but powerful.
Transforming Data with Python
Move into Python and perform ELT the modern way. You’ll load data, clean and normalize columns, derive new metrics such as closed_in_days, and validate results, all from within a simple DuckDB-powered Python script. You’ll also learn how to export the results for use in other tools.
Local vs. Cloud Execution
Compare running the same query locally and in the cloud. Using EXPLAIN and EXPLAIN ANALYZE, you’ll see how DuckDB and MotherDuck split work across environments. You’ll visualize hybrid operators, understand cloud compute behavior, and learn quick optimizations that reduce cost and runtime.
Hybrid Analytics, One Workflow
Combine local and cloud data in a single query. You’ll union results from your local DuckDB with MotherDuck’s shared datasets and analyze them together. Then, using Python and simple visualization, you’ll create a Manhattan elevator complaint heatmap, pinpoint a potential HQ location, and query the surrounding area in the cloud for deeper business insights.
DuckLake Deep Dive
DuckLake extends the DuckDB/MotherDuck ecosystem into the lakehouse world. You’ll see how it stores Parquet files with full metadata, schema evolution, and transactional guarantees. Then, you’ll connect it to your own S3 bucket, create a DuckLake database, and query your data just like any other table.
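The setup reduces to a few SQL statements. This is a sketch only: it assumes the `ducklake` extension is available, and the bucket name, credentials, and Parquet file are all placeholders you would replace with your own:

```sql
-- Sketch: placeholder credentials, bucket, and file names throughout.
INSTALL ducklake;
CREATE SECRET (TYPE s3, KEY_ID 'YOUR_KEY', SECRET 'YOUR_SECRET', REGION 'us-east-1');

-- Metadata lives in the .ducklake catalog; table data lands in your bucket.
ATTACH 'ducklake:metadata.ducklake' AS lake (DATA_PATH 's3://your-bucket/lake/');
USE lake;

CREATE TABLE requests AS SELECT * FROM 'service_requests.parquet';
SELECT count(*) FROM requests;  -- reads Parquet files managed by DuckLake
```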
SQL on Your Lakehouse
Finish by running analytical SQL queries directly on your DuckLake data — aggregating, filtering, and joining across open Parquet storage with full transaction support. You’ll experience firsthand how the lakehouse model feels when powered by DuckDB simplicity and MotherDuck scalability.
Start Now:
DuckDB for Data Engineers is included in our Data Engineering Academy and is also available as a Free Lab.