Big Data has become a real phenomenon. Huge amounts of data are stored and processed every day, and at ever-increasing speed. At the same time, this data is often unstructured and inconsistent, which makes it difficult, if not impossible, to process with conventional methods or on single systems.

Companies around the world are developing ever more efficient software platforms and methods to collect electronic data on a large scale, store it in vast distributed architectures and process it systematically. One platform that has proven itself is Apache Hadoop.

Hadoop is an open-source, Java-based framework for the distributed storage and processing of Big Data across clusters of computers using simple programming models. It is a versatile, accessible and fast software library, designed to detect and handle failures at the application layer rather than relying on the hardware itself.
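The "simple programming model" at the heart of Hadoop is MapReduce: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. As a minimal sketch in plain Python (not Hadoop's actual Java API), the canonical word-count example looks like this:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the list of values for each key.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["Big Data with Hadoop", "Hadoop stores Big Data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # → {'big': 2, 'data': 2, 'with': 1, 'hadoop': 2, 'stores': 1}
```

On a real cluster, the map and reduce functions run in parallel on the nodes where the data blocks live; the point of the model is that the same two small functions scale from one machine to thousands.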

What you will learn

In this comprehensive training, led by Suyog Nagaokar, you will learn and master the Hadoop architecture and its components: HDFS, YARN, MapReduce, Hive and Sqoop.

With this Big Data training, you will gain a detailed understanding of the Hadoop ecosystem through hands-on labs. You will learn the most important Hadoop commands and how to implement and use each component to solve real business problems!

Furthermore, you will install and work with a real Hadoop installation right on your desktop using the Cloudera QuickStart VM. You will learn how to store and query your data with Sqoop, Hive and MySQL, and write Hive queries to analyze data on Hadoop.
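Hive queries (HiveQL) closely mirror standard SQL, which is why they pair naturally with data imported from MySQL via Sqoop. As a hedged illustration, the same kind of aggregation can be previewed with Python's built-in sqlite3 module; the `orders` table and its columns here are hypothetical, purely for illustration:

```python
import sqlite3

# Build a tiny in-memory table standing in for a Hive table on HDFS.
# The "orders" table and its columns are made up for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 120.0), ("bob", 80.0), ("alice", 50.0)])

# In Hive, this GROUP BY query would run as-is, translated under the
# hood into MapReduce (or Tez/Spark) jobs over the cluster.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # → [('alice', 170.0), ('bob', 80.0)]
```

The takeaway: if you can write SQL aggregations like this, you already have most of the syntax you need for analyzing data with Hive.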

By the end of this training, you will also be able to write Hadoop commands, process Big Data on a cluster with HDFS and MapReduce, and administer your cluster with YARN and Hue.


Requirements

  • Access to a PC running 64-bit Windows or Linux with an Internet connection
  • At least 8 GB of *free* (not total) RAM if you want to take part in the hands-on activities and exercises. If your PC does not meet this requirement, you can still follow the training without doing the hands-on activities
  • Some activities require prior programming experience, preferably in Python
  • A basic familiarity with the Linux command line will be very helpful

About the Author

Suyog Nagaokar

Suyog has 8+ years of experience in Data Engineering, providing automated and optimized solutions to businesses based on Hadoop, Spark and streaming frameworks, helping them generate value from data.

He has experience in the telecom and banking domains, focusing on Customer Genomics, Fraud Analytics, Digital Banking and Machine Learning for Telecom.

Suyog has also mentored industry professionals with 0-15 years of experience, as well as engineering students, on Big Data at renowned institutes such as EduPristine, IMS ProSchool and DataVision.

Connect with Suyog on LinkedIn

Project Curriculum

  1: What is Big Data and Hadoop
  2: Hadoop Distributions and Setup
  3: Data Warehousing with Apache Hive
  4: Import/Export Data with Apache Sqoop