Google Cloud Platform (GCP) is one of the most popular cloud computing services available. It is a crucial platform for Data Engineers, as it provides a comprehensive set of tools and services to build, manage, and optimize data pipelines. It enables efficient data storage, processing, analysis, and visualization, empowering Data Engineers to create scalable and high-performing data solutions.

In this hands-on training, you will learn everything you need to get started on Google Cloud Platform. We guide you through the end-to-end process of building a data pipeline: extracting data from an external weather API, processing it through a carefully crafted data pipeline, and visualizing the results in Looker Studio.

By the end of this well-rounded practical project, you will have gained the knowledge and experience to work with GCP in real-life scenarios. You will also see that the platform is easy to work with, and since the services are very comparable, the project will help you get started on other cloud platforms such as AWS as well.

A little side note: our team has also created a GitHub repository for this course, containing an overview of the project itself as well as code fragments that you can simply copy in, making it easier for you to follow along and replicate what we show you.

Course Outline

Data & Goals

In the beginning, we go through the objectives of our project. You get an in-depth overview of the architecture of our data engineering project, exploring its components and how they interact to form a seamless data pipeline. Furthermore, we take a detailed look at the weather API we will use to source our data.

We also guide you through the process of setting up a Google Cloud account, so that you can dive right into the next step. (Tip: for the GCP setup, you can use the $300 in free credits that Google provides!)

Project Setup

Now, with your Google Cloud account set up, you can create your project, enable the necessary APIs, and configure the scheduling.

Pipeline Creation - extract from API

In this section, you will configure the required resources to support your data pipeline. This includes setting up a serverless MySQL database with Cloud SQL, as well as a Linux Virtual Machine (VM) with Compute Engine for database management.
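To give you an idea of what the database side of this setup looks like, here is a minimal sketch of creating a table for the weather data from the management VM. The host, credentials, database name, and table schema below are illustrative assumptions, not the exact ones used in the course:

```python
# Sketch: create a table for incoming weather records on the Cloud SQL
# MySQL instance. Connection details and schema are assumptions for
# illustration -- adapt them to your own instance.

WEATHER_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS weather (
    id INT AUTO_INCREMENT PRIMARY KEY,
    city VARCHAR(64) NOT NULL,
    temperature_c FLOAT,
    humidity_pct FLOAT,
    recorded_at DATETIME
)
"""

def create_weather_table(host, user, password, database="weather_db"):
    """Connect to the Cloud SQL MySQL instance (e.g. from the Linux VM)
    and create the weather table if it does not exist yet."""
    import pymysql  # third-party MySQL driver, installed on the VM

    conn = pymysql.connect(host=host, user=user,
                           password=password, database=database)
    try:
        with conn.cursor() as cur:
            cur.execute(WEATHER_TABLE_DDL)
        conn.commit()
    finally:
        conn.close()
```

Running this once from the VM prepares the schema that the rest of the pipeline writes into.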

We also delve into the specifics of extracting data from the weather API on a schedule with Cloud Scheduler. You work with serverless Cloud Functions to pull data from the API and build a message queue with Pub/Sub to pass the data along.
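The extraction step can be sketched as a small HTTP-triggered Cloud Function: Cloud Scheduler calls it, it fetches the current weather, and it publishes the result to a Pub/Sub topic. The API URL, response field names, project ID, and topic name here are placeholder assumptions:

```python
# Sketch of the extraction Cloud Function. Cloud Scheduler triggers it
# over HTTP; it calls the weather API and publishes the result to Pub/Sub.
# API_URL, field names, project and topic IDs are illustrative assumptions.

import json
import urllib.request

API_URL = "https://api.example.com/current-weather?city=Berlin"  # placeholder

def to_pubsub_payload(api_response: dict) -> bytes:
    """Reduce the raw API response to the fields our pipeline needs,
    serialized as UTF-8 JSON (Pub/Sub message bodies are raw bytes)."""
    record = {
        "city": api_response.get("city"),
        "temperature_c": api_response.get("temp"),
        "humidity_pct": api_response.get("humidity"),
        "recorded_at": api_response.get("timestamp"),
    }
    return json.dumps(record).encode("utf-8")

def extract_weather(request):
    """HTTP entry point for the Cloud Function."""
    from google.cloud import pubsub_v1  # available in the Functions runtime

    with urllib.request.urlopen(API_URL) as resp:
        data = json.loads(resp.read())

    publisher = pubsub_v1.PublisherClient()
    topic_path = publisher.topic_path("your-project-id", "weather-topic")
    publisher.publish(topic_path, to_pubsub_payload(data)).result()
    return "ok"
```

Keeping the payload-building logic in its own function makes it easy to test without touching the network or Pub/Sub.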

Pipeline Creation - write to db

Here, we explore how to write the extracted data to the MySQL database. This includes creating serverless functions that write data to the database, and testing the data writing process.
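The loading step can be sketched as a Pub/Sub-triggered Cloud Function that decodes the incoming message and inserts it into the database. The field names, table layout, and connection details below are illustrative assumptions:

```python
# Sketch of the loading Cloud Function. It is triggered by the Pub/Sub
# topic, decodes the message, and inserts the record into Cloud SQL.
# Field names, table layout, and connection details are assumptions.

import base64
import json

def decode_event(event: dict) -> dict:
    """Pub/Sub delivers the message body base64-encoded under 'data'."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))

def write_weather(event, context):
    """Entry point: background Cloud Function triggered by Pub/Sub."""
    import pymysql  # listed in the function's requirements.txt

    record = decode_event(event)
    conn = pymysql.connect(host="10.0.0.3",  # assumed private instance IP
                           user="writer", password="secret",
                           database="weather_db")
    try:
        with conn.cursor() as cur:
            cur.execute(
                "INSERT INTO weather"
                " (city, temperature_c, humidity_pct, recorded_at)"
                " VALUES (%s, %s, %s, %s)",
                (record["city"], record["temperature_c"],
                 record["humidity_pct"], record["recorded_at"]),
            )
        conn.commit()
    finally:
        conn.close()
```

Using a parameterized INSERT (the `%s` placeholders) lets the driver handle escaping, which matters even for data you control.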


Data Visualization

In the last section, we focus on setting up Looker Studio to create meaningful visualizations. You create bubble and time series charts and finally monitor the weather data.