Big Data has become a real phenomenon. Huge amounts of data are stored and processed every day, and at high speed. At the same time, the data is often unstructured and inconsistent, which makes it difficult, if not impossible, to process using conventional methods or on a single system.
Companies around the world are developing ever more efficient software platforms and methods to collect electronic data on a large scale, store it in gigantic architectures, and process it systematically. One platform that has proven itself is Apache Hadoop.
Hadoop is an open-source, Java-based framework for the distributed storage and processing of Big Data across clusters of computers using simple programming models. Rather than relying on expensive specialized hardware, it is designed to detect and handle failures at the application layer, so the cluster keeps working even when individual machines fail.
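To give an intuition for how that fault tolerance works: HDFS splits each file into fixed-size blocks and stores several replicas of every block on different nodes, so losing one machine never loses data. The following is a toy Python sketch of that idea, not real HDFS code — the block size, node names, and round-robin placement are illustrative (real HDFS uses 128 MB blocks and rack-aware placement):

```python
from itertools import cycle

BLOCK_SIZE = 8          # bytes per block (illustrative; real HDFS default is 128 MB)
REPLICATION = 3         # copies of each block (HDFS default replication factor)
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster nodes

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split a file's bytes into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes=NODES, replication: int = REPLICATION):
    """Assign each block to `replication` distinct nodes.

    This sketch uses simple round-robin placement; real HDFS also
    considers rack topology when choosing replica locations.
    """
    node_cycle = cycle(range(len(nodes)))
    placement = {}
    for block_id in range(num_blocks):
        start = next(node_cycle)
        placement[block_id] = [
            nodes[(start + r) % len(nodes)] for r in range(replication)
        ]
    return placement

data = b"some file contents stored in HDFS"
blocks = split_into_blocks(data)
placement = place_replicas(len(blocks))
# With 3 replicas per block, losing any single node still
# leaves at least 2 live copies of every block.
```

The key design choice this mirrors is that replication, not RAID-style hardware redundancy, is what lets Hadoop run reliably on commodity machines.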
What you will learn
In this comprehensive training, held by Suyog Nagaokar, you will learn and master the Hadoop architecture and its components, such as HDFS, YARN, MapReduce, Hive and Sqoop.
With this Big Data training, you will understand the detailed concepts of the Hadoop ecosystem through hands-on labs. Understand the most important Hadoop commands and learn how to implement and use each component to solve real business problems!
Furthermore, you will install and work with a real Hadoop environment right on your desktop using the Cloudera QuickStart VM. Learn how to move and query your data with Sqoop, Hive, and MySQL, and write Hive queries to analyze data on Hadoop.
By the end of this training, you will be able to write Hadoop commands, store and process Big Data on a cluster with HDFS and MapReduce, and manage your cluster with YARN and Hue.
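As a small taste of the MapReduce programming model covered in the training, here is a minimal word-count sketch in plain Python — it simulates on one machine the map, shuffle, and reduce phases that Hadoop distributes across a cluster. This is illustrative only, not the Hadoop Java API:

```python
from collections import defaultdict

def map_phase(line: str):
    """Mapper: emit a (word, 1) pair for every word in the input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as Hadoop
    does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum all the counts emitted for one word."""
    return key, sum(values)

# Two "input splits" that Hadoop would feed to separate mappers.
lines = ["big data big clusters", "big data processing"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
# counts == {"big": 3, "data": 2, "clusters": 1, "processing": 1}
```

Each mapper works on its own slice of the data and each reducer on its own key group, which is exactly what lets Hadoop scale the same logic across many machines.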
Requirements
- Access to a PC running 64-bit Windows or Linux with an Internet connection
- At least 8GB of *free* (not total) RAM if you want to participate in the hands-on activities and exercises. If your PC does not meet these requirements, you can still follow the training without doing the hands-on activities
- Some activities require prior programming experience, preferably in Python
- A basic familiarity with the Linux command line will be very helpful
About the Author
Suyog has 8+ years of experience in Data Engineering, building automated and optimized solutions on Hadoop, Spark and streaming frameworks, helping businesses generate value from their data.
He has experience in the Telecom and Banking domains, focusing on Customer Genomics, Fraud Analytics, Digital Banking and Machine Learning for Telecom.
Suyog has also mentored industry professionals with 0-15 years of experience, as well as Engineering students, on Big Data at renowned institutes like EduPristine, IMS ProSchool and DataVision.
Connect with Suyog on LinkedIn
- What can you expect from this course? (2:09)
- Introduction to Big Data (14:49)
- What is Hadoop? Why Hadoop? (5:37)
- Hadoop Architecture – Overview (2:38)
- Hadoop Architecture – Key services (7:12)
- Storage/Processing characteristics (7:50)
- Store and process data in HDFS (3:55)
- Handling failures - Part 1 (5:09)
- Handling failures - Part 2 (7:32)
- Rack Awareness (5:58)
- Hadoop 1 vs. Hadoop 2 (12:50)
- Hive Overview (4:53)
- How Hive works (5:56)
- Hive query execution flow (4:58)
- Creating a Data Warehouse & Loading data (5:09)
- Creating a Hive Table (21:17)
- Load data from local & HDFS (17:18)
- Internal tables vs External tables (17:19)
- Partitioning & Bucketing (Cardinality concept) (16:23)
- Static Partitioning - Lab (14:56)
- Dynamic Partitioning - Lab (13:54)
- Bucketing - Lab (22:31)
- Storing Hive query output (11:33)
- Hive SerDe (14:25)
- ORC File Format (14:09)
- Sqoop overview (3:51)
- Sqoop list-databases and list-tables (6:30)
- Sqoop Eval (3:58)
- Import RDBMS table with Sqoop (11:39)
- Handling parallelism in Sqoop (9:01)
- Import table without primary key (11:00)
- Custom Query for Sqoop Import (8:47)
- Incremental Sqoop Import - Append (9:51)
- Incremental Sqoop Import - Last Modified (13:54)
- Sqoop Job (8:00)
- Sqoop Import to a Hive table (10:58)
- Sqoop Import all tables - Part 1 (6:19)
- Sqoop Import all tables - Part 2 (14:02)
- Sqoop Export (6:13)
- Export Hive table (4:35)
- Export with Staging table (6:23)
Data Engineering with Hadoop is included in our Data Engineering Academy