In this project you will learn how to build a data pipeline: pull data from the Twitter API, then analyze, store, and visualize it.
You will host your machine learning algorithm on AWS using Lambda and set up your own Postgres database with RDS. You will create a Streamlit dashboard and gain experience hosting it using the Elastic Container Registry (ECR) and the Elastic Container Service (ECS).
This project also gives you insight into handling dependency management with Poetry.
A great source of public data is the Twitter API. Learn how to configure your access to the API and how to fetch tweets from a user timeline for processing. We look into the configuration and the payload that the API returns.
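As a rough sketch of what that payload handling can look like: the helper below pulls a few fields out of a single tweet object from the v1.1 user-timeline endpoint. The field names (`id_str`, `created_at`, `text`, `user.screen_name`) come from Twitter's v1.1 JSON format; the sample tweet itself is made up for illustration.

```python
import json

def extract_tweet_fields(tweet: dict) -> dict:
    """Reduce a raw v1.1 user-timeline tweet object to the fields we store."""
    return {
        "id": tweet["id_str"],
        "created_at": tweet["created_at"],
        "text": tweet["text"],
        "user": tweet["user"]["screen_name"],
    }

# A made-up example of the payload shape the API returns:
raw = json.loads("""{
    "id_str": "1234567890",
    "created_at": "Mon Jan 02 12:00:00 +0000 2023",
    "text": "Hello from the timeline!",
    "user": {"screen_name": "example_user"}
}""")

print(extract_tweet_fields(raw)["user"])  # → example_user
```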
Every platform needs a data store. Learn how to set up a Postgres database on RDS and why we store the JSON tweets in that database. You will also work with a virtual private cloud (VPC) to make the database reachable from the internet, and use pgAdmin to configure your tables and query data from your database.
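A minimal sketch of what the table and insert could look like. The table and column names here are my assumptions, not the course's exact schema; the function accepts any DB-API cursor, such as one from `psycopg2.connect(...).cursor()`.

```python
import json

# Assumed schema: one row per tweet, raw payload kept as JSONB.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS tweets (
    id         TEXT PRIMARY KEY,
    created_at TEXT,
    payload    JSONB
);
"""

INSERT_SQL = """
INSERT INTO tweets (id, created_at, payload)
VALUES (%s, %s, %s)
ON CONFLICT (id) DO NOTHING;
"""

def store_tweet(cursor, tweet: dict) -> None:
    """Insert one raw tweet dict via a DB-API cursor (e.g. psycopg2)."""
    cursor.execute(
        INSERT_SQL,
        (tweet["id_str"], tweet["created_at"], json.dumps(tweet)),
    )
```

In the real pipeline this would be wrapped in a `psycopg2.connect(host=..., dbname=..., user=..., password=...)` connection followed by `conn.commit()`.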
For the machine learning part we apply a pre-trained sentiment model from the Natural Language Toolkit (nltk). You create a Lambda function that queries tweets from Twitter's API, computes a sentiment for each tweet, and stores it in your database. To run your Lambda function you import the needed Python modules as layers: you use prepared Klayers and create a custom Python layer yourself.
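The handler could be structured roughly like this. The ±0.05 thresholds are the ones commonly used with nltk's VADER compound score; `fetch_timeline` and `store_tweet` are stand-ins for the course's actual fetching and storing code, not real functions.

```python
def label_sentiment(compound: float) -> str:
    """Map a VADER compound score (-1..1) to a coarse label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

def lambda_handler(event, context):
    # Imported inside the handler so the module still loads before the
    # nltk layer is attached; SentimentIntensityAnalyzer is nltk's VADER model.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    sia = SentimentIntensityAnalyzer()
    tweets = fetch_timeline()          # stand-in: pull tweets via twython
    for tweet in tweets:
        scores = sia.polarity_scores(tweet["text"])
        tweet["sentiment"] = label_sentiment(scores["compound"])
        store_tweet(tweet)             # stand-in: insert via psycopg2
    return {"processed": len(tweets)}
```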
You will also learn how easy it is to schedule a Lambda function using EventBridge.
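EventBridge triggers the function on a schedule expression; the two supported forms look like this (the concrete values are example choices, not the course's setting):

```
rate(1 hour)          # fixed-rate: run once every hour
cron(0 12 * * ? *)    # cron form: run every day at 12:00 UTC
```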
Dependency Management & Streamlit App
For visualization you create a Streamlit app. To develop and test your app locally you install Anaconda3 and create a virtual conda environment. With the provided git repository you understand how to manage and install your dependencies with Poetry. Together we go through the code of the app step by step, and I'll show you how to run the app in your new virtual environment to test it.
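Poetry reads the project's dependencies from a `pyproject.toml`. A minimal sketch for an app like this might look as follows; the package names and version constraints are illustrative, not the repository's exact file:

```toml
[tool.poetry]
name = "twitter-sentiment-dashboard"
version = "0.1.0"
description = "Streamlit dashboard for tweet sentiment"

[tool.poetry.dependencies]
python = "^3.9"
streamlit = "*"
psycopg2-binary = "*"
pandas = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
```

`poetry install` then resolves and installs these into the active environment, and `poetry run streamlit run app.py` starts the app (assuming the entry file is named `app.py`).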
ECS Deployment of Streamlit App
Once our visualization is ready, you learn how to work with Docker images and containers on AWS. You prepare an Elastic Container Registry and install the AWS CLI. To log in to your AWS account from the CLI, you learn how to create user groups and users with limited access to your account in IAM.
After building your Docker image you push it to ECR. You set up an ECS Fargate cluster and deploy your Streamlit app as a task on it.
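A Dockerfile for a Streamlit app like this commonly follows the pattern below; the base image, port, and file names are assumptions, not the course's exact Dockerfile:

```dockerfile
# Assumed layout: pyproject.toml and app.py in the build context.
FROM python:3.9-slim

WORKDIR /app

# Install Poetry and the project's dependencies into the image.
RUN pip install --no-cache-dir poetry
COPY pyproject.toml poetry.lock* ./
RUN poetry config virtualenvs.create false && poetry install --no-root

COPY app.py ./

# Streamlit's default port; the ECS task definition maps this.
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```

Building and pushing then follows the usual `docker build`, `docker tag`, and `docker push` steps against the ECR repository URI, after `aws ecr get-login-password` has authenticated your Docker client.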
- I recommend you do the AWS for Data Engineers project first to understand the AWS basics
- Understand how to install and work with Docker, for instance through the Document Streaming example project
- Lambda intro & IAM setup (3:11)
- Create Lambda function (1:24)
- The Lambda function code explained (8:22)
- Insert the code into your Lambda function (0:56)
- Add layers to Lambda from Klayers (5:32)
- Create & configure custom layers for twython & psycopg2 (4:40)
- Test Lambda & set environment variables (4:53)
- Schedule your Lambda with EventBridge (3:15)
- Setup container registry ECR (1:52)
- AWS CLI install and ECR login (5:19)
- Dockerfile explained, Docker image build & push image to ECR (2:52)
- Create ECS Fargate cluster (1:34)
- ECS task IAM configuration & Streamlit task creation (4:59)
- Fixing the ECS task (5:14)
- Stopping the task on ECS after you are finished (0:59)
Machine Learning & Containerization on AWS is included in our Data Engineering Academy