Introduction
Description
dbt is a SQL-first transformation workflow that makes it easy to transform, test, and document your data. Another big plus: your team can work directly within the data warehouse to create trusted datasets for reporting, machine learning modeling, and operational workflows. That makes dbt a tool worth looking into as a data engineer, and this training is the perfect starting point for doing so!
Introduction to dbt
Before diving into the main part of this training, we discuss the current challenges and opportunities of ELT tools and processes and go through some ETL and ELT basics. Then we introduce dbt Core and dbt Cloud and talk about the benefits and special features of dbt.
Set up Snowflake, dbt Core & GitHub
The hands-on part requires some preparation. You are going to create a GitHub repository, a Snowflake warehouse, and a dbt Cloud account. You will then do a basic dbt setup, choose your data platform, and set up a dbt model, which can be a single .sql or .py file.
Creating data pipelines with dbt
Throughout the course, you learn how to create a series of pipelines (models) using e-commerce data, dbt Core, dbt Cloud, and Snowflake.
dbt materializations
After creating your pipelines, the next step is to decide how the transformed data is built in the target. This is controlled by a dbt materialization, which can be a table, a view, an incremental model, or an ephemeral model.
In the hands-on part of this section, you work with the materializations and create your first .sql and .py models. You will also learn about external and internal sources in dbt and work with their dependencies.
Testing dbt models
The next part of the training is about testing your dbt models. Tests are assertions you make about your models and other resources in your project. dbt offers two types of tests: schema (generic) tests and data (bespoke) tests. After learning about the different kinds of tests dbt can run, you will test your own dbt models.
Deploying and scheduling dbt models
Once your dbt models run on your local machine, you learn how to make them accessible to your team members, run them repeatedly, and keep them up to date. To do so, you learn common ways to deploy and schedule dbt models in dbt Cloud.
Advanced dbt features
In the last part of this training, you get to know some of the advanced features of dbt. You learn how continuous integration and continuous deployment (CI/CD) work by implementing CI/CD pipelines hands-on in dbt Cloud. You also learn what dbt documentation is, how it works, and what you can do with it, and finally generate documentation for your own project.
Provided material
- GitHub repository with all the source code
- E-commerce dataset for this course
- Hands-on explainer videos
- Curated list of links to further resources in each lesson
Requirements
You should have completed our Snowflake for Data Engineers course, or a similar course, before starting this one.
Pricing
dbt for Data Engineers is included in our Data Engineering Academy.