Description

Big Data has become a real phenomenon. Huge amounts of data are stored and processed every day, and at high speed. At the same time, the data is often unstructured and inconsistent, which makes it difficult, if not impossible, to process with conventional methods or on single systems.

Companies around the world are developing ever more efficient software platforms and methods to collect electronic data on a large scale, store it in vast architectures, and process it systematically. One platform that has proven itself is Apache Hadoop.

Hadoop is a Java-based, open-source framework for the distributed storage and processing of Big Data across clusters of computers using simple programming models. Its versatile, accessible, high-throughput software library is designed to detect and handle failures at the application layer.
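
To make this concrete: once a cluster (or a single-node sandbox) is running, a classic MapReduce job can be launched with a single command. A minimal sketch, with hypothetical input and output paths:

    # Run the word-count example that ships with every Hadoop distribution
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
      wordcount /user/training/input /user/training/output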



What you will learn

In this comprehensive training, taught by Suyog Nagaokar, you will learn and master the Hadoop architecture and its components: HDFS, YARN, MapReduce, Hive, and Sqoop.

This Big Data training gives you a detailed understanding of the Hadoop ecosystem, reinforced with hands-on labs. You will learn the most important Hadoop commands and how to implement and use each component to solve real business problems!

Furthermore, you will install and work with a real Hadoop environment right on your desktop using the Cloudera QuickStart VM. You will learn how to store and query your data with Sqoop, Hive, and MySQL, and write Hive queries to analyze data on Hadoop.
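
As a taste of that workflow, here is a minimal sketch; the connection string, credentials, and the "orders" table are hypothetical:

    # Import a MySQL table into Hive with Sqoop
    sqoop import \
      --connect jdbc:mysql://localhost/retail \
      --username training --password training \
      --table orders \
      --hive-import

    # Analyze the imported data with a Hive query
    hive -e "SELECT order_status, COUNT(*) AS cnt FROM orders GROUP BY order_status;"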

By the end of this training, you will be able to write Hadoop commands, manage Big Data on a cluster with HDFS and MapReduce, and administer your cluster with YARN and Hue.
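
Day-to-day cluster work revolves around commands like the following; the paths and application ID are hypothetical:

    # Inspect and manage files on HDFS
    hdfs dfs -mkdir -p /user/training/input
    hdfs dfs -put sales.csv /user/training/input/
    hdfs dfs -ls /user/training/input

    # Monitor jobs running on YARN
    yarn application -list
    yarn logs -applicationId application_1234567890123_0001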



Requirements

  • Access to a PC running 64-bit Windows or Linux with an Internet connection
  • At least 8GB of *free* (not total) RAM if you want to participate in the hands-on activities and exercises. If your PC does not meet this requirement, you can still follow along in the training without doing the hands-on activities
  • Some activities will require prior programming experience, preferably in Python
  • A basic familiarity with the Linux command line will be very helpful


About the Author

Suyog Nagaokar


Suyog has 8+ years of experience in Data Engineering, delivering automated and optimized solutions to businesses based on Hadoop, Spark, and streaming frameworks, helping them generate value from their data.

He has experience in the telecom and banking domains, focusing on Customer Genomics, Fraud Analytics, Digital Banking, and Machine Learning for Telecom.

Suyog has also mentored engineering students and industry professionals with 0-15 years of experience on Big Data at renowned institutes such as EduPristine, IMS ProSchool, and DataVision.

Connect with Suyog on LinkedIn

Project Curriculum



  1: What is Big Data and Hadoop
  2: Hadoop Distributions and Setup
  3: Data Warehousing with Apache Hive
  4: Import/Export Data with Apache Sqoop