Introduction

Description

In this hands-on engineering project, you learn how to track people who make scans with their phones. The aim is to use Elasticsearch as a search engine on a dataset in which 100,000 people visit shops and together make 1,000,000 scans.


For the project, you create a custom dataset using Python, pandas and an open San Francisco dataset. This public dataset contains over 140,000 stores with their names and locations. From this set, you choose 10,000 shops and create 100,000 imaginary users, each of whom makes about 10 app scans. After writing your data into Elasticsearch, you build a user interface with Streamlit to visualize the data.
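To make the shape of the generated data concrete, here is a minimal sketch of the scan generation at reduced scale. It uses only the Python standard library rather than pandas, and the names `make_scans`, `device_id` and `business_id` are assumptions chosen for illustration. It also gives every user exactly 10 scans, whereas the project speaks of about 10.

```python
import random
import uuid


def make_scans(shop_ids, n_users, scans_per_user, seed=0):
    """Create imaginary users and let each one scan at random shops."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    # One random device ID per imaginary user (hypothetical ID format)
    device_ids = [str(uuid.UUID(int=rng.getrandbits(128))) for _ in range(n_users)]
    scans = []
    for device_id in device_ids:
        for _ in range(scans_per_user):
            scans.append({
                "device_id": device_id,
                "business_id": rng.choice(shop_ids),
            })
    return scans


# Stand-in for the chosen shops; the project itself uses 10,000 of them
shops = [f"shop-{i}" for i in range(100)]
scans = make_scans(shops, n_users=1_000, scans_per_user=10)
print(len(scans))
```

At full scale, the same idea with 100,000 users and 10 scans each yields the 1,000,000 scans mentioned above.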


The user interface you create for your website will include:

  • A free text search, in which you can search for shops by name,
  • A zip code search, which you can use to search for shops in individual regions,
  • A business ID search, which shows who visited a shop,
  • A device ID search, which tracks where a person has been.
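The four searches above could map to Elasticsearch query bodies roughly as follows. This is a sketch under assumptions: the field names `business_name`, `zip`, `business_id` and `device_id` are hypothetical, and the `term` queries assume those fields are mapped as `keyword` so that they match exact values.

```python
def shops_by_name(text):
    # Free text search: "match" analyzes the input and searches a text field
    return {"query": {"match": {"business_name": text}}}


def shops_by_zip(zip_code):
    # Zip code search: exact match on a keyword field (assumed mapping)
    return {"query": {"term": {"zip": zip_code}}}


def visitors_of_business(business_id):
    # Business ID search: all scans recorded at one shop
    return {"query": {"term": {"business_id": business_id}}}


def visits_of_device(device_id):
    # Device ID search: all scans made by one device
    return {"query": {"term": {"device_id": device_id}}}


print(shops_by_name("coffee"))
```

Each function only builds the request body as a Python dictionary; sending it to a cluster is left to whichever client the app uses.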


Working through this project, you learn how to transform data and how to upload Parquet files to Elasticsearch. You work with Kibana, the user interface for Elasticsearch, where you can manage the index and search for individual documents, and with Streamlit to build an interactive user interface.
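As an illustration of the upload step, the sketch below turns a few rows into the newline-delimited JSON body that the Elasticsearch bulk API expects: one action line followed by one document line per row, ending with a newline. The rows stand in for records read from a Parquet file, and the index name `scans`, the helper `to_bulk_body` and the field names are assumptions. The project itself may use a client helper instead of building the body by hand.

```python
import json


def to_bulk_body(index_name, docs):
    """Build an NDJSON bulk request body that indexes every document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(doc))                                # document line
    # The bulk API requires the body to end with a newline
    return "\n".join(lines) + "\n"


# Hypothetical rows, standing in for records loaded from a Parquet file
rows = [
    {"device_id": "d-1", "business_id": "b-7"},
    {"device_id": "d-2", "business_id": "b-3"},
]
body = to_bulk_body("scans", rows)
print(body)
```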


Project content overview

During this hands-on example project, you will

  • Prepare your San Francisco dataset of 10,000 businesses
  • Create 100,000 fake users
  • Merge fake data with San Francisco businesses
  • Create the 1,000,000 app scans
  • Prepare the Elasticsearch load and load the data
  • Create a Streamlit app with control elements, folium maps and tables
  • Set up the page and query Elasticsearch
  • and much more…


Requirements

Before starting the project, you should complete the training “Log Analysis with Elasticsearch” to understand and master the basics of Elasticsearch. Since this project is very data transformation-heavy, I also recommend that you work through the pandas lessons of the “Python for Data Engineers” training in advance.
This project will run on a computer with 8 GB of RAM.