As you know, the fundamentals form the basis of everything, which is why they are also the first step here on your Data Engineering journey.
This course is a guide to the Computer Science topics and resources you should know as a Data Engineer, and they will help you immensely along the way. The major topics in this fundamentals course are therefore Software Development and Relational Databases.
In this introductory course you learn exactly what Python is and how you can use it as a simple calculator with math expressions. You will also learn about strings, the concept of variables, logic or boolean expressions, and loops for repeated calculations. Get to know functions for reusing a piece of code, learn what lists are, and how you can use dictionaries and modules. You also learn how to write Python code and use it to work with JSON or CSV files.
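To give you a taste of the topics listed above — variables, boolean expressions, functions, dictionaries and the `json` module — here is a minimal sketch (all names and values are purely illustrative):

```python
import json

# Variables and math expressions: Python as a simple calculator
price = 19.99
quantity = 3
total = price * quantity

# A boolean expression evaluates to True or False
discounted = total > 50

# A function lets you reuse a piece of code
def apply_tax(amount, rate=0.19):
    """Return the amount including tax, rounded to cents."""
    return round(amount * (1 + rate), 2)

# Lists and dictionaries hold structured data
order = {"items": ["pen", "pad", "mug"], "total": apply_tax(total)}

# The json module turns a dictionary into a JSON string and back
as_json = json.dumps(order)
restored = json.loads(as_json)
```

The same `json` techniques carry over directly to reading and writing JSON files, and the `csv` module works similarly for CSV files.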
This course picks up where our Python introduction course left off. This way you have a really solid Python foundation even if you come from a different field and haven't coded before.
You learn all the important basics a Data Engineer needs: from advanced Python features and transforming data with pandas to working with APIs and Postgres databases.
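A typical pandas transformation looks like this — a minimal sketch with made-up data, assuming pandas is installed (in the course you work with real datasets):

```python
import pandas as pd

# Illustrative sales data; column names are made up for this example
df = pd.DataFrame({
    "city": ["Berlin", "Hamburg", "Berlin", "Munich"],
    "sales": [120, 80, 200, 150],
})

# A common pattern: filter rows, then aggregate per group
big = df[df["sales"] >= 100]
per_city = big.groupby("city", as_index=False)["sales"].sum()
```

`per_city` now holds one row per remaining city with its summed sales — the same filter-then-aggregate pattern you will use constantly when preparing data.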
Data pipelines are the number one thing within a Data Science platform. Without them, data ingestion or machine learning processing, for example, would not be possible.
This course will help you understand how to create stream and batch processing pipelines as well as machine learning pipelines by going through some of the most essential basics - complemented by templates and examples for useful cloud computing platforms.
Having a proper security concept around your platform and pipelines is extremely important. Almost anybody can build a one-off proof of concept that lacks security. If you want to become a good Data Engineer, you need to know how to secure your work and your data.
With this security course you are well prepared for your job as a Data Engineer. Here, I will show you the basic topics that you are going to encounter at work all the time.
One part of creating a data platform and pipelines is to choose data stores, which is the topic of this course.
Here, we will look into relational databases and NoSQL databases as well as data warehouses and data lakes. This way, you learn when to use the different databases and storages and how to incorporate them into your pipeline.
After this course you will know how to store your data and how to choose the right data storage for your purpose. It helps you understand the differences between the storage types and make good decisions in your future work as a Data Engineer. In later courses I will also work with specific data stores from these different categories.
In this course you learn why schema design is so crucial for your work as a Data Engineer, and how a schema helps you create a maintainable model of your data and prevent data swamps. We also go through the major data stores for which you can create a schema. This way you learn how to design the actual schemas for such databases and storages for different purposes.
With this knowledge and the knowledge from the "Choosing Data Stores" course, you can choose the right data store and design its schema, defining and implementing an important part of the data platform. You learn how to define a schema that fits your use case, based on the goals you have. This way, you also learn how to optimize your database for storing and retrieving data.
Docker is one of the most popular open-source platforms, and every Data Engineer should know how to work with it. Nowadays it is the best alternative to Virtual Machines, as it is lean and lightweight to use. With Docker, you can deploy your own code or run tools on your cloud, for example. It enables you to package applications into self-contained images, which makes them far more controllable compared to other technologies.
In this fundamentals course you will learn all the basics you need to master the Docker game in your job as a Data Engineer!
APIs are the cornerstone of every data platform. Either you host APIs that are used by clients, or you use external APIs. You have to know how to work with them.
In this course you learn all the necessary fundamentals of working with APIs. You will learn how to design APIs as well as how to code and deploy them with Docker. You will use the FastAPI framework for Python, which is perfect for quick API development.
In this course you learn all the basics you need to start working with Apache Kafka. Learn how you can set up your queue and how to write message producers and consumers. Once you go through this course it will be easy for you to work with Kafka and to understand how similar tools on the cloud platforms work.
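As a preview of what a producer and consumer look like, here is a sketch using the kafka-python package; the topic name, broker address, and event fields are illustrative assumptions, and running it requires a Kafka broker:

```python
import json

def encode_event(event: dict) -> bytes:
    """Serialize an event to UTF-8 JSON bytes for the Kafka message value."""
    return json.dumps(event).encode("utf-8")

def produce_and_consume():
    # Requires the kafka-python package and a broker on localhost:9092
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=encode_event,
    )
    producer.send("invoices", {"invoice_id": 1, "amount": 9.99})
    producer.flush()

    consumer = KafkaConsumer(
        "invoices",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    )
    for message in consumer:
        print(message.value)  # the dictionary sent above
        break

# produce_and_consume()  # uncomment with a running broker
```

The producer and consumer only agree on the topic name and the serialization format — that loose coupling is exactly what makes Kafka useful between pipeline stages.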
After Data Pipelines, Data Processing is one of the most important things within Data Engineering. As a Data Engineer, you will come across processing everywhere, and it is crucial to set up powerful, well-distributed processing for your work. A very useful and widely used tool for doing that is Apache Spark.
In this Apache Spark Fundamentals course, you learn about the Spark architecture and the fundamentals of how Spark works. You train your skills with Spark transformations and actions and you work with Jupyter Notebooks on Docker. You also dive into DataFrames, SparkSQL and RDDs to learn how they work. In the end you have all the fundamental knowledge to write your own jobs and build your pipelines with it.
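The key idea of transformations versus actions can be sketched in a few lines of PySpark; the job below is an illustrative example (it needs a local pyspark installation to run), with the filter logic pulled out into a plain function:

```python
def is_large(amount: int) -> bool:
    """Filter predicate used by the Spark job below."""
    return amount >= 100

def run_job():
    # Requires pyspark; SparkSession is the entry point to Spark
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("fundamentals-sketch").getOrCreate()
    rdd = spark.sparkContext.parallelize([50, 120, 300])

    # filter and map are lazy transformations; nothing runs yet.
    # collect is an action: it triggers the distributed computation.
    result = rdd.filter(is_large).map(lambda x: x * 2).collect()
    spark.stop()
    return result

# run_job()  # requires a local Spark installation
```

Because transformations are lazy, Spark can optimize the whole chain before executing it — one of the architectural points the course covers in depth.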
As a Data Engineer, you will often work on analytics platforms where companies store their data in data lakes and warehouses for visualization or machine learning.
Modern data warehouses and analytics stores no longer require you to load all of your data into them first. Many warehouses, like AWS Redshift, BigQuery or Snowflake, allow you to load or query data directly from files in your data lake. This data lake integration is the key to flexibility when interacting with your data, and it makes a modern data warehouse so pleasant to use for all kinds of analytics workloads.
In this course you learn how easy it is to use data lakes, warehouses and BI tools and how you can combine all these. You will also understand how to load your files into the lake and visualize them in a report.
The AWS project is the perfect project for everyone who wants to start with cloud platforms. Currently, AWS is the most widely used platform for data processing. It is really great to use, especially for people who are new to their Data Engineering job or looking for one.
In this project you learn, in easy steps, how to start with AWS, which topics need to be taken into consideration, and how you can set up a complete end-to-end project. For this, we use data from an e-commerce dataset. Along the way, you learn how to model data and which AWS tools are important, such as Lambda, API Gateway, Glue, Redshift, Kinesis and DynamoDB.
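Lambda is where most of the glue code in such a project lives. Here is a minimal sketch of a handler for an API Gateway proxy event; the payload fields are hypothetical, not from the project:

```python
import json

def handler(event, context):
    """Minimal AWS Lambda handler sketch for an API Gateway proxy event.

    API Gateway's proxy integration delivers the request body as a JSON
    string in event["body"], and expects statusCode/body in the response.
    """
    body = json.loads(event.get("body") or "{}")

    # In the project, the record would be forwarded to Kinesis
    # or written to DynamoDB at this point.
    return {
        "statusCode": 200,
        "body": json.dumps({"received_items": len(body.get("items", []))}),
    }
```

Because the handler is just a function taking an event dictionary, you can test it locally without any AWS infrastructure — a design choice that makes Lambda-based pipelines easy to develop.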
This is a full end-to-end example project. The project uses e-commerce data that contains invoices for customers and items on these invoices.
Our goal is to ingest the invoices one by one, as they are created, and visualize them in a user interface. The technologies we use are FastAPI, Apache Kafka, Apache Spark, MongoDB and Streamlit - tools you have already learned individually in the academy. I recommend taking a look at these courses first before you proceed.