Apache Spark & Kafka Bootcamp

If you've been working with batch pipelines, you might have realized that sooner or later, they hit their limits. When you need immediate results—whether for critical real-time analytics, monitoring, or event-driven architectures—streaming pipelines become essential.

In this bootcamp, you'll learn how Apache Kafka and Apache Spark work together to enable scalable, real-time data processing. These two tools have become the backbone of modern Data Engineering, helping you process and analyze data as it arrives.

The best part is that we won’t just cover theory. You’ll get hands-on experience building an end-to-end streaming Data Engineering project, and walk away with practical skills you can apply to real-world problems.

Whether you're aiming to break into Data Engineering or enhance your current skill set, this bootcamp, with more than 16 hours of video material, is your perfect stepping stone.

Get access to this bootcamp

This bootcamp is part of my Data Engineering Academy. Enroll now to get access to this and all future bootcamps that are based on our full Academy course curriculum.

Work through this practical roadmap:

#1 Learn who Data Engineers are, what they do and how you can learn Data Engineering

Get an overview of the role of Data Engineers within the Data Science field. Learn what Data Engineers do, the tools they use, and how they fit into the Data Science ecosystem. Also, get into key concepts such as data pipelines, platforms, and the intersection between Data Engineering and Machine Learning. It's a great starting point for those new to the field, providing clear insights into Data Engineering jobs and necessary skills.

Learn more

#2 Get into all major Python topics a Data Engineer needs

Dive into essential Python skills specifically for Data Engineering tasks. Learn advanced Python features, data transformation using pandas, how to work with APIs and databases, handle JSONs, and implement unit tests. This hands-on training is ideal for both beginners and experienced engineers aiming to enhance their Python skills for building robust data pipelines and managing data efficiently.

Learn more

#3 Your quick start with Docker

In this course, you'll learn the key concepts of Docker, such as containers, images, and registries, along with practical skills for working with Docker Hub, building images, and deploying containers. We also cover best practices for using Docker in production environments, including security considerations and using tools like Portainer for monitoring.

Learn more

#4 Learn how to build data pipelines with templates and examples for Azure, GCP and Hadoop

I teach you how to build effective data pipelines, including stream, batch, and machine learning pipelines. We cover fundamental concepts like platform blueprints, push and pull ingestion pipelines, and visualization techniques. You’ll also explore the Lambda architecture and get hands-on examples with tools from AWS, Azure, GCP, and Hadoop.

Learn more

#5 Get into the most important security fundamentals for Data Engineering

Here, we cover essential security fundamentals for Data Engineers. Get into key topics like network security with firewalls, proxies, and bastion hosts, as well as identity and access management (IAM). You'll also learn about secure data transmission using HTTPS, SSH, and security tokens, along with practical advice on securing APIs.

Learn more

#6 Know the different data storage types and learn when to use which

I guide you through the process of selecting the appropriate data storage solutions, including relational DBs, NoSQL DBs, data warehouses, and data lakes. You'll learn about key concepts like OLTP vs. OLAP, ETL vs. ELT, and the specific use cases for different types of data stores. You also gain practical knowledge to make informed decisions on data storage architecture in your Data Engineering projects, ensuring you choose the right tools for the job.

Learn more

#7 Design schemas for SQL, NoSQL and Data Warehouses

Learn how to design schemas for various data stores, including SQL, NoSQL, and data warehouses. Also, understand why schema design is crucial for maintaining organized and efficient data models, helping prevent data swamps. We cover different types of databases - relational, wide-column, document, and more - providing practical examples to help you apply schema design in real-world scenarios.

Learn more

#8 Learn how to design and build APIs with Python and FastAPI

I teach you how to design, build, and deploy APIs using FastAPI and Docker. You'll learn API fundamentals, HTTP methods, and how to work with API response codes and parameters. We also cover hands-on coding with FastAPI to create and test APIs, including deploying them in Docker and testing with Postman. It's an ideal course for Data Engineers wanting to master API creation for data platforms.

Learn more

#9 Use Apache Kafka for streaming

Here, we teach you the basics of working with Apache Kafka, including how to set up message producers and consumers. You’ll learn Kafka's core components like topics, partitions, and brokers, and how it fits into data platforms for event and stream processing. We also cover setting up a Kafka environment with Docker, working with message queues, and testing with Python. It's ideal for Data Engineers looking to integrate Kafka into their workflows for stream processing and real-time data handling.

Learn more

#10 Do stream & batch processing with Apache Spark

This fundamentals course provides a comprehensive introduction to Apache Spark, covering its architecture, key components, and how to use it for distributed data processing. Learn how to work with Spark transformations, actions, DataFrames, SparkSQL, and RDDs using Jupyter notebooks on Docker. We also do hands-on coding with real datasets and teach you how to build and optimize your own data processing pipelines using Spark.

Learn more

#11 Gain experience with MongoDB

Learn the basics of document stores, set up a development environment using Docker, and design schemas for MongoDB. We also cover CRUD operations, working with subdocuments and arrays, and using MongoDB in Data Science platforms. It's ideal for Data Engineers looking to integrate MongoDB into their projects and understand its use in data platforms.

Learn more

#12 Combine your knowledge in this document streaming project

Learn how to build an end-to-end document streaming pipeline using FastAPI, Apache Kafka, Apache Spark, MongoDB, and Streamlit. You will learn to ingest, process, and visualize streaming e-commerce data in real time, using Docker for deployment. The hands-on project involves setting up APIs, streaming data via Kafka and Spark, storing it in MongoDB, and creating a dashboard for data visualization. It's ideal for practicing real-time data streaming in a modern data platform.

Learn more

Join the Academy to get access

This bootcamp is part of my Data Engineering Academy. Enroll now to get access to this and all future bootcamps that are based on our full Academy course curriculum.

Self-paced, hands-on, end-to-end
