As a Data Engineer, you will often work on analytics platforms where companies store their data in data lakes and warehouses for visualization or machine learning.
Modern data warehouses and analytics stores no longer require you to load the data into them first. Many warehouses, such as AWS Redshift, BigQuery or Snowflake, let you query data directly from files in your data lake. This data lake integration is the key to flexibility when working with your data, and it is what makes a modern data warehouse so convenient for all kinds of analytics workloads.
In this training you learn how easy it is to use data lakes, warehouses and BI tools, and how you can combine them. You will also learn how to load your files into the lake and visualize them in a report.
Data Warehouse & Data Lake Basics
We start by going through how warehouses traditionally fit into a data science platform and what we can do with the modern ones. You learn how data actually gets into a data warehouse via ETL or ELT data integration, how it is structured once it is there, and how you can bring the data lake into the picture. Then we dig a bit deeper into data lakes: what they are, how they work and how you can access them directly.
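To make the ETL versus ELT distinction concrete, here is a minimal sketch. It is not taken from the training material: the sample rows, table names and helper functions are made up for illustration, and an in-memory SQLite database stands in for a real warehouse. In ETL the rows are cleaned in application code before loading; in ELT the raw rows are loaded unchanged and cleaned with SQL inside the store.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a file in the lake.
RAW_ROWS = [
    ("2024-01-01", "berlin", "19.50"),
    ("2024-01-01", "BERLIN", "5.25"),
    ("2024-01-02", "Paris", "12.00"),
]


def etl(conn, rows):
    """ETL: transform in application code first, then load the cleaned rows."""
    cleaned = [(day, city.title(), float(amount)) for day, city, amount in rows]
    conn.execute("CREATE TABLE sales_etl (day TEXT, city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows unchanged, then transform with SQL in the store."""
    conn.execute("CREATE TABLE sales_raw (day TEXT, city TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", rows)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT day,
               UPPER(SUBSTR(city, 1, 1)) || LOWER(SUBSTR(city, 2)) AS city,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
        """
    )


def totals(conn, table):
    """Return (city, total amount) pairs for one of the tables created above."""
    query = f"SELECT city, SUM(amount) FROM {table} GROUP BY city ORDER BY city"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse (assumption)
etl(conn, RAW_ROWS)
elt(conn, RAW_ROWS)
```

Both paths end with the same cleaned table. The difference is where the transformation runs, and with ELT the untouched raw table stays available if you later want to transform it differently.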
GCP Hands-on Cloud Storage, BigQuery & Data Studio
You are going to find out hands-on that, although the platforms are different and the tools have different names, they work in much the same way. We will start by configuring Cloud Storage, BigQuery and Data Studio on the Google Cloud Platform (GCP). We'll put a file into the lake, create a BigQuery table and build a Data Studio report that you can share with anyone you want.
AWS Hands-on S3, Athena, Glue & Quicksight
Then we move over to AWS and set up a manual data lake integration with S3, Athena and Quicksight. After that, you are going to switch the integration over to the Glue Data Catalog. I will show you exactly how I configured Glue for this.
Recap & Bonus Lesson AWS Redshift Spectrum
After a quick training recap, I have added a bonus lesson for you that shows how to do what you did with AWS Athena through Redshift Spectrum instead.
For the AWS Athena & Redshift part we use the Data Catalog we prepared in the AWS Capstone project.