As a Data Engineer, you often find it difficult to monitor pipelines and tell whether data is flowing correctly. When an error does occur, tracking down its source can be just as hard. The usual approach is to locate the log files of every single process and search through these huge files manually, which is time-consuming and nerve-racking.
Elasticsearch is a search engine that can do this searching for you in very little time. By sending your log information directly to it, you can search for whatever you need, much like using Google, and get to the relevant information quickly.
In this training, you get to know what Elasticsearch is, what makes it such a great tool, and how to use it efficiently. In the hands-on part, you learn how to write events to Elasticsearch, how to search them, and how to create dashboards with Kibana.
Why log analysis with Elasticsearch
Learn why log analysis and pipeline monitoring are so important for your work as a Data Engineer. I created a presentation covering the basics so that you understand how Elasticsearch is structured and how it works compared to a standard relational database.
Elasticsearch Docker Deployment
Before getting into the hands-on part, learn how to set up and run Elasticsearch on your own computer. For this, we will use Docker images for Elasticsearch and Kibana as well as a Docker Compose file to start the deployment. You also get to know the Kibana user interface and its most important features, with which you can visualize the data in Elasticsearch.
Writing events to Elasticsearch
In the practical part of this training, you create a new index in Elasticsearch and use a Python script to generate data that is then indexed and easy to search. After that, you write the log information into Elasticsearch.
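To give a rough idea of what this step can look like, here is a minimal sketch that uses only the Python standard library and Elasticsearch's REST API (create index, then the `_bulk` endpoint). It is not the script used in the training. The URL `http://localhost:9200` (a local instance with security disabled), the index name `pipeline-logs`, and the field names `level`, `pipeline` and `message` are all assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone

ES_URL = "http://localhost:9200"  # assumption: local instance, security disabled
INDEX = "pipeline-logs"           # hypothetical index name

# Hypothetical mapping: keyword fields for exact filtering, text for full-text search.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "@timestamp": {"type": "date"},
            "level": {"type": "keyword"},
            "pipeline": {"type": "keyword"},
            "message": {"type": "text"},
        }
    }
}


def build_event(level, pipeline, message, when=None):
    """Build one log event as a plain dict (the document to be indexed)."""
    when = when or datetime.now(timezone.utc)
    return {
        "@timestamp": when.isoformat(),
        "level": level,
        "pipeline": pipeline,
        "message": message,
    }


def to_bulk_body(events, index=INDEX):
    """Serialize events as NDJSON for the _bulk endpoint.

    Each event becomes two lines: an action line and the document itself.
    The body must end with a newline.
    """
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


def send(method, path, body, content_type="application/json"):
    """Send one request to Elasticsearch and return the parsed JSON response."""
    data = body if isinstance(body, str) else json.dumps(body)
    request = urllib.request.Request(
        ES_URL + path,
        data=data.encode("utf-8"),
        method=method,
        headers={"Content-Type": content_type},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_demo():
    """Create the index and write two sample events (needs a running instance)."""
    send("PUT", f"/{INDEX}", INDEX_BODY)
    events = [
        build_event("INFO", "orders", "batch loaded"),
        build_event("ERROR", "orders", "schema mismatch in column price"),
    ]
    result = send("POST", "/_bulk", to_bulk_body(events), "application/x-ndjson")
    print("bulk request reported errors:", result["errors"])
```

Calling `run_demo()` once Elasticsearch is up creates the index and writes both events in a single request. The official `elasticsearch` Python client wraps the same endpoints and is likely the more convenient choice in a real pipeline; the standard library is used here only to keep the sketch free of dependencies.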
Analyzing logs with Kibana
Once the data is in, you work with Kibana to query it and visualize it on dashboards. Learn how to create customized elements and dashboards for the use case in order to find out what is going on in your pipelines and processes, or whether data is missing somewhere. Finally, we dive into log analysis, where you learn how to search for errors in your data.
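Searching for errors comes down to a filtered query against the index, whether it is issued from Kibana or from code. As a hedged illustration, the sketch below builds such a query with Elasticsearch's Query DSL and sends it to the `_search` endpoint. The URL, the index name `pipeline-logs`, and the fields `level`, `pipeline` and `@timestamp` are assumptions for this example, with `level` and `pipeline` assumed to be mapped as `keyword`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local instance, security disabled
INDEX = "pipeline-logs"           # hypothetical index name


def build_error_query(pipeline=None, since="now-1h", size=20):
    """Build a search body for recent ERROR events, newest first."""
    filters = [
        {"term": {"level": "ERROR"}},                # exact match on a keyword field
        {"range": {"@timestamp": {"gte": since}}},   # date math, e.g. "now-1h"
    ]
    if pipeline:
        filters.append({"term": {"pipeline": pipeline}})
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],
        "query": {"bool": {"filter": filters}},
    }


def search_errors(**kwargs):
    """Run the query and return the matching documents (needs a running instance)."""
    body = json.dumps(build_error_query(**kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{ES_URL}/{INDEX}/_search",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        hits = json.load(response)["hits"]["hits"]
    return [hit["_source"] for hit in hits]
```

For example, `search_errors(pipeline="orders", since="now-24h")` would return the error events of the last day for a pipeline named `orders`. With the same field names, typing `level : "ERROR"` into the search bar of Kibana's Discover view should give a comparable result.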