Description

Data pipelines are the backbone of any Data Science platform. Without them, data ingestion and machine learning processing, for example, would not be possible.

This 170-minute training will help you understand how to create streaming and batch processing pipelines, as well as machine learning pipelines, by going through the most essential basics - complemented by templates and examples for popular cloud computing platforms.


Basics Section

Platform & Pipeline Basics

In the first part, we will take a detailed look at the platform blueprint and the different types of pipelines. You will learn the differences between these types of pipelines, how they work, what machine learning looks like on a platform, and how you bring the different pipelines together.

Platform Blueprint & End-to-End Pipeline Example

The platform blueprint and end-to-end pipelines are crucial topics in the Data Engineering world. They can be found on every platform and apply everywhere. Without going into too much detail, the blueprint is a framework that presents the most important parts of a platform: connect, buffer, process, store, and visualize. It also shows where individual tools fit in. This way, you understand how a platform is structured and how it works. Using end-to-end pipelines as an example, I will also show you how to easily make use of the blueprint in your work as a Data Engineer.
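To make the blueprint tangible, here is a minimal sketch in Python that maps each blueprint stage to a few example tools (the tool picks are illustrations only, not recommendations):

# A minimal sketch of the platform blueprint stages.
# The stage names come from the blueprint; the tools listed are illustrative picks.
platform_blueprint = {
    "connect": ["API Gateway", "Kafka Connect"],    # where data enters the platform
    "buffer": ["Kafka", "Kinesis"],                 # decouples producers from consumers
    "process": ["Spark", "Flink"],                  # batch and stream processing
    "store": ["S3", "DynamoDB", "BigQuery"],        # raw files and queryable tables
    "visualize": ["Grafana", "Power BI"],           # dashboards on top of stored data
}

# An end-to-end pipeline is simply a path through these stages:
end_to_end_pipeline = ["connect", "buffer", "process", "store", "visualize"]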

Push and Pull Pipelines

As a Data Engineer, it is crucial to understand the difference between push and pull pipelines, which is why they are also a topic of this course. They describe the different ways data is transferred to the platform - whether it is sent (pushed) by the source or fetched (pulled) by the platform. For better understanding, I have also added many examples with further details.
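To give you a first idea, here is a minimal sketch of both styles in Python (the endpoint URLs are hypothetical placeholders):

import requests

# Push pipeline: the data source actively sends records to the platform's
# ingest endpoint (hypothetical URL).
def push_record(record: dict) -> None:
    requests.post("https://platform.example/ingest", json=record, timeout=5)

# Pull pipeline: the platform periodically fetches new records from the
# source's API (hypothetical URL).
def pull_records() -> list[dict]:
    response = requests.get("https://source.example/api/records", timeout=5)
    response.raise_for_status()
    return response.json()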

Batch & Streaming Pipelines

Batch and streaming pipelines are the classic types of pipelines that you will come across often in your work as a Data Engineer. In this chapter, you will learn the difference between these two types and how they work. You will also develop a feel for which type of pipeline you are dealing with, and which one you need to create for a given scenario.
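As a first taste, here is a minimal, framework-free sketch in Python: the batch job sees the complete dataset at once, while the streaming job handles each record as it arrives (the event source is simulated):

import time
from typing import Iterable, Iterator

def process(record: str) -> str:
    return record.upper()  # stand-in for any transformation

# Batch: the full dataset exists up front and is processed in one run.
def batch_job(records: list[str]) -> list[str]:
    return [process(r) for r in records]

# Streaming: records arrive over time and are processed as they come in.
def streaming_job(source: Iterable[str]) -> Iterator[str]:
    for record in source:  # waits for the next record to arrive
        yield process(record)

# Simulated unbounded source with a short delay per record.
def event_source() -> Iterator[str]:
    for record in ["a", "b", "c"]:
        time.sleep(0.1)
        yield record

print(batch_job(["a", "b", "c"]))           # one result set for the whole batch
for result in streaming_job(event_source()):
    print(result)                           # one result per incoming record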

Visualization Pipelines

Data processing and storage are huge topics, and it is very important to get this data flow visualized in some way - even if you don’t have direct access to the data. The chapter on visualization pipelines is a short guide to how you can manage that, complemented by an example with Apache Spark.
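To preview the idea, a visualization pipeline can be as small as reading stored data, aggregating it, and writing the result to a place your dashboard tool can read. Here is a minimal PySpark sketch (the paths and column name are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("visualization-pipeline").getOrCreate()

# Read processed data from storage (hypothetical path).
events = spark.read.parquet("s3://my-bucket/processed/events/")

# Aggregate into a shape a dashboard can consume directly.
daily_counts = events.groupBy(F.to_date("timestamp").alias("day")).count()

# Write the result where the visualization tool picks it up (hypothetical path).
daily_counts.write.mode("overwrite").parquet("s3://my-bucket/dashboards/daily_counts/")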

Lambda Architecture

Lambda architecture is a topic that you will always come across when dealing with platforms and pipelines. A lambda architecture lets you bring batch and streaming pipelines together within your platform. It is also used a lot for machine learning, where you can train models with batch pipelines and apply them in streaming pipelines. So it is definitely worth expanding your knowledge about it.
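Conceptually, it boils down to three layers, shown here as a minimal, framework-free sketch in Python: a batch layer that periodically recomputes a complete view, a speed layer that updates a real-time view incrementally, and a serving layer that merges both at query time:

# Batch layer: periodically recomputes an accurate view from the full history.
def batch_view(all_events: list[dict]) -> dict:
    view: dict = {}
    for event in all_events:
        view[event["key"]] = view.get(event["key"], 0) + event["value"]
    return view

# Speed layer: updates a real-time view incrementally as events stream in.
def update_realtime_view(view: dict, event: dict) -> None:
    view[event["key"]] = view.get(event["key"], 0) + event["value"]

# Serving layer: merges the batch result with the real-time delta at query time.
def query(key: str, batch: dict, realtime: dict) -> int:
    return batch.get(key, 0) + realtime.get(key, 0)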

Platform Examples

In the last part of the training, we go through some templates and examples based on the platform blueprint. I will show you what such a platform architecture can look like on AWS, Hadoop, GCP, or Azure - with all its streaming and batch pipelines.

This way, you can understand how such a platform looks in theory and how it would look in practice. I show you which tools can be used at which point within the architecture and which important techniques come into play. This knowledge will help you a lot in your job as a Data Engineer, as you will learn how to build pipelines and use tools like AWS Lambda, API Gateway, or DynamoDB in a more practical way.
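As a small taste of the AWS example, this is roughly how an ingestion path through API Gateway, AWS Lambda, and DynamoDB could look (a sketch only - the table name and event shape are assumptions, and API Gateway would be configured to invoke this handler):

import json
import boto3

# The DynamoDB table name is an assumption for this sketch.
table = boto3.resource("dynamodb").Table("events")

def handler(event, context):
    """AWS Lambda handler invoked by API Gateway with a JSON body."""
    record = json.loads(event["body"])  # proxy integration puts the payload in "body"
    table.put_item(Item=record)         # write the record to DynamoDB
    return {"statusCode": 200, "body": json.dumps({"stored": True})}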


Advanced Concepts Section

Processing Models: Event-Driven, Batch & Streaming

In this part, we take a closer look at the different processing models you will encounter in modern data platforms. You will understand how event-driven, batch, micro-batching, and streaming workflows differ, how they work under the hood, and in which scenarios each model is used. This chapter also connects platform goals like transactions, analytics, and reverse ETL with the right type of processing.
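Micro-batching in particular sits between the two worlds: it collects incoming events into small groups and processes each group as one unit. Here is a minimal sketch in Python (the batch size and event source are hypothetical):

from typing import Iterable, Iterator

def micro_batches(events: Iterable[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Group a stream of events into small batches (micro-batching)."""
    batch: list[dict] = []
    for event in events:
        batch.append(event)
        if len(batch) >= batch_size:
            yield batch  # hand this small batch to the processing step
            batch = []
    if batch:
        yield batch      # flush the remainder when the stream ends

# A simulated stream of 250 events yields batches of 100, 100, and 50.
for batch in micro_batches(({"id": i} for i in range(250))):
    print(len(batch))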

Platform Blueprint Recap & Goal-Oriented Design

We revisit the platform blueprint and show how common tools can be mapped to specific parts of your architecture. You will also learn how to approach platform design by starting from your data and business goals - instead of jumping directly into tool selection. This helps you make smarter decisions when building your own platform.

Modern Architecture Patterns

Architecture patterns like the Lakehouse and Medallion architecture are essential when it comes to structuring and managing data efficiently. This chapter explains how lakehouses bridge the gap between raw file storage and transactional tables, and how bronze, silver, and gold layers in the Medallion architecture help you maintain clean and scalable data pipelines.
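In code, the Medallion layers often map to a simple read-refine-write chain, as in this minimal PySpark sketch (paths and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("medallion").getOrCreate()

# Bronze: raw data stored as-is (hypothetical path).
bronze = spark.read.json("s3://my-bucket/bronze/orders/")

# Silver: cleaned and validated - remove duplicates and malformed rows.
silver = bronze.dropDuplicates(["order_id"]).filter(F.col("amount") > 0)
silver.write.mode("overwrite").parquet("s3://my-bucket/silver/orders/")

# Gold: business-level aggregates, ready for analytics and dashboards.
gold = silver.groupBy("customer_id").agg(F.sum("amount").alias("total_spent"))
gold.write.mode("overwrite").parquet("s3://my-bucket/gold/customer_totals/")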

Machine Learning & GenAI

This section explains how machine learning pipelines fit into your platform. You’ll learn where training, inference, and deployment typically happen - and how to integrate them effectively. We also introduce the concepts of semantic search and retrieval-augmented generation (RAG), a technique used in modern AI applications to retrieve answers from large document stores.
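At its core, RAG is retrieve-then-generate. Here is a minimal, self-contained sketch in Python - the bag-of-words embedding is a toy stand-in for a real embedding model, and in a real pipeline the final prompt would be sent to an LLM:

import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy bag-of-words embedding - a real system would call an embedding model."""
    vector = np.zeros(64)
    for word in text.lower().split():
        vector[hash(word) % 64] += 1.0
    return vector

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / ((np.linalg.norm(a) * np.linalg.norm(b)) or 1.0))

def build_rag_prompt(question: str, documents: list[str], top_k: int = 3) -> str:
    # 1. Semantic search: rank documents by similarity to the question.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    # 2. Augment: put the most relevant documents into the prompt.
    context = "\n\n".join(ranked[:top_k])
    # 3. Generate: in a real pipeline, this prompt goes to an LLM.
    return f"Answer using this context:\n{context}\n\nQuestion: {question}"

docs = [
    "Spark processes data in batches.",
    "Kafka buffers streaming events.",
    "DynamoDB stores key-value items.",
]
print(build_rag_prompt("How are streaming events buffered?", docs, top_k=1))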

Testing Your Platform

Testing is a critical part of maintaining a reliable platform. In this short but important part, we cover testing strategies that can be applied across ingestion, processing, and transformation stages of your data pipeline.
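To show the idea, a transformation can be tested like any other function. Here is a minimal pytest sketch (clean_amount is a hypothetical transformation from the processing stage):

import pytest

def clean_amount(raw: str) -> float:
    """Hypothetical transformation: parse a currency string into a float."""
    return float(raw.replace("€", "").replace(",", ".").strip())

def test_clean_amount_parses_currency():
    assert clean_amount("12,50 €") == 12.5

def test_clean_amount_rejects_garbage():
    with pytest.raises(ValueError):
        clean_amount("not a number")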


Training Curriculum



  Part 1 - Platform & Pipeline Basics
  Platform Examples (Currently slides only)
  Part 2 - Advanced Concepts