Curriculum
1 Introducing Big Data & Hadoop
Learning Objective:
You will be introduced to real-world Big Data problems and learn how to solve them with state-of-the-art tools. Understand how Hadoop improves on traditional processing with its outstanding features. You will learn the background of Hadoop and the different Hadoop distributions available in the market, and prepare the Unix box for the training.
Topics:
1.1 Big Data Introduction
- What is Big Data?
- Data Analytics
- Big Data Challenges
- Technologies that support Big Data
- What is Hadoop?
- History of Hadoop
- Basic Concepts
- Future of Hadoop
- The Hadoop Distributed File System
- Anatomy of a Hadoop Cluster
- Breakthroughs of Hadoop
Hadoop Distributions:
- Apache Hadoop
- Cloudera Hadoop
- Hortonworks Hadoop
- MapR Hadoop
Hands On:
Install a virtual machine on the host machine using VMware Player, and work with some basic Unix commands needed for Hadoop.
2 Hadoop Daemon Processes
Learning Objective:
You will learn about the different Hadoop daemons and their functionality at a high level.
Topics:
- Name Node
- Data Node
- Secondary Name Node
- Job Tracker
- Task Tracker
Hands On:
- Create a Unix shell script to start all the daemons at once.
- Start the HDFS and MapReduce daemons separately.
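A start-all script usually ends by verifying that every daemon came up, typically by checking the output of `jps`. A minimal Python sketch of that check, assuming the five classic Hadoop 1.x daemon names and a hardcoded sample of `jps` output for illustration:

```python
# Expected daemons in classic Hadoop 1.x (HDFS + MapReduce v1)
EXPECTED = {"NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker"}

def missing_daemons(jps_output: str) -> set:
    """Return the expected daemons absent from `jps`-style lines like '2301 NameNode'."""
    running = {parts[1]
               for line in jps_output.strip().splitlines()
               if len(parts := line.split()) == 2}
    return EXPECTED - running

# Sample output as `jps` would print it when everything is up
sample = """2301 NameNode
2450 DataNode
2603 SecondaryNameNode
2712 JobTracker
2888 TaskTracker
3001 Jps"""
print(missing_daemons(sample))  # set() -> nothing missing
```

An empty result means all five daemons are running; any names returned are the ones the script failed to start.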
3 HDFS (Hadoop Distributed File System)
Learning Objective:
You will learn how files are written to and read from HDFS, and understand how the Name Node, Data Node, and Secondary Name Node take part in the HDFS architecture. You will also learn the different ways of accessing HDFS data.
Topics:
- Blocks and Input Splits
- Data Replication
- Hadoop Rack Awareness
- Cluster Architecture and Block Placement
- Accessing HDFS
  - Java Approach
  - CLI Approach
Hands On:
- Write a shell script that writes files to and reads files from HDFS, change the replication factor at three levels, and use the Java API to work with HDFS.
- Run the various HDFS user commands as well as the admin commands.
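The interaction between blocks and replication is simple arithmetic worth internalizing. A minimal sketch, assuming the Hadoop 2.x default block size of 128 MB (Hadoop 1.x defaulted to 64 MB) and the default replication factor of 3:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # dfs.blocksize default in Hadoop 2.x

def block_count(file_bytes: int, block_size: int = BLOCK_SIZE) -> int:
    """Number of HDFS blocks a file occupies; the last block may be partial."""
    return math.ceil(file_bytes / block_size)

def replica_count(file_bytes: int, replication: int = 3,
                  block_size: int = BLOCK_SIZE) -> int:
    """Total block replicas stored across the cluster for one file."""
    return block_count(file_bytes, block_size) * replication

one_gb = 1024 ** 3
print(block_count(one_gb))    # 8 blocks of 128 MB
print(replica_count(one_gb))  # 24 replicas with dfs.replication=3
```

This is why changing the replication factor (per file, per client, or cluster-wide) directly multiplies the storage a file consumes.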
4 Hadoop Installation Modes and HDFS
Learning Objective:
You will learn the different installation modes of Hadoop, set up Pseudo-Distributed mode from scratch, and work with its configuration files. You will also learn the functionality of the different HDFS operations, with a visual representation of HDFS read and write actions involving the Name Node and Data Node daemons.
Topics:
- Local Mode
- Pseudo-distributed Mode
- Fully distributed Mode
- Pseudo Mode installation and configuration
- HDFS basic file operations
Hands On:
- Install VirtualBox Manager and install Hadoop in pseudo-distributed mode. Change the configuration files required for pseudo-distributed mode, and perform different file operations on HDFS.
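The pseudo-distributed configuration boils down to a handful of properties. A minimal sketch of the two key entries, assuming the conventional NameNode port 9000; on a single node the replication factor must be lowered to 1, since there is only one Data Node to hold replicas:

```xml
<!-- core-site.xml: point the default filesystem at the local NameNode -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- hdfs-site.xml: a single node can hold only one replica per block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>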
5 Hadoop Developer Tasks
Learning Objective:
Understand the different phases in MapReduce, including the Map, Shuffle, Sort, and Reduce phases. Get a deep understanding of the life cycle of an MR job submitted to YARN, and learn the Distributed Cache concept in detail with examples.
Write a WordCount MR program and monitor the job using the Job Tracker and YARN consoles. Also learn about further use cases.
Topics:
- Basic API Concepts
- The Driver Class
- The Mapper Class
- The Reducer Class
- The Combiner Class
- The Partitioner Class
- Examining a Sample MapReduce Program with several examples
- Hadoop’s Streaming API
Hands On:
- Write an MR job from scratch, implement different logic in the Mapper and Reducer, and submit the job in standalone and distributed modes.
- Write a WordCount MR job, calculate the average salary of employees who meet certain conditions, and perform a sales calculation using MR.
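Hadoop's Streaming API lets the Mapper and Reducer be any executables that read stdin and write stdout, so the WordCount phases can be sketched in plain Python. A minimal sketch, with `sorted()` standing in for Hadoop's shuffle-and-sort step between the two phases:

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: pairs arrive grouped by key after the shuffle/sort,
    so one pass with groupby sums the counts per word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)

lines = ["Hadoop streams data", "Hadoop maps data"]
shuffled = sorted(mapper(lines))  # stands in for Hadoop's shuffle & sort
print(dict(reducer(shuffled)))    # {'data': 2, 'hadoop': 2, 'maps': 1, 'streams': 1}
```

In an actual streaming job these two functions would be separate scripts passed via `-mapper` and `-reducer`, with the framework performing the sort between them.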
6 Hadoop Ecosystems
6.1 PIG
Learning Objective:
Understand the importance of Pig in the Big Data world: the Pig architecture, the Pig Latin commands for performing complex operations on relations, and Pig UDFs and aggregation functions from the Piggybank library. Learn how to pass dynamic arguments to Pig scripts.
Topics:
- PIG concepts
- Install and configure PIG on a cluster
- PIG vs. MapReduce and SQL
- Write sample PIG Latin scripts
- Modes of running PIG
- PIG UDFs
Hands On:
Log in to the Pig Grunt shell to issue Pig Latin commands in different execution modes. Explore the different ways of loading and lazily transforming Pig relations. Register a UDF in the Grunt shell and perform replicated join operations.
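Pig's replicated join works by loading the smaller relation into memory on every mapper, so each big-side tuple is joined without a reduce phase. A minimal Python sketch of that map-side logic, with hypothetical `orders`/`customers` data:

```python
def replicated_join(big, small, key_idx=0):
    """Map-side (replicated) join: build an in-memory hash of the small
    relation, then stream the big relation and probe it per tuple."""
    lookup = {}
    for row in small:
        lookup.setdefault(row[key_idx], []).append(row)
    for row in big:
        for match in lookup.get(row[key_idx], []):
            yield row + match[1:]  # join on the key, append the small side's fields

orders = [(101, "laptop"), (102, "phone"), (101, "mouse")]   # big relation
customers = [(101, "alice"), (102, "bob")]                    # small relation
print(list(replicated_join(orders, customers)))
```

This is the trade-off `JOIN ... USING 'replicated'` makes in Pig Latin: no shuffle at all, at the cost of the small relation having to fit in each mapper's memory.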
6.2 HIVE
Learning Objective:
Understand the importance of Hive in the Big Data world and the different ways of configuring the Hive metastore. Learn the different types of tables in Hive, how to optimize Hive jobs using partitioning and bucketing, and how to pass dynamic arguments to Hive scripts. You will also get an understanding of joins, UDFs, views, etc.
Topics:
- Hive concepts
- Hive architecture
- Installing and configuring HIVE
- Managed tables and external tables
- Joins in HIVE
- Multiple ways of inserting data into HIVE tables
- CTAS, views, alter tables
- User-defined functions (UDFs) in HIVE
Hands On:
- Execute Hive queries in different modes. Create internal and external tables. Optimize queries by creating tables with partitioning and bucketing. Run built-in and user-defined functions, including explode and window functions.
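Bucketing assigns each row to one of N files by hashing the `CLUSTERED BY` column modulo the bucket count; for integer keys Hive's hash is the value itself. A minimal sketch of that assignment, with a hypothetical `emp_id` column and 4 buckets:

```python
def bucket_for(key: int, num_buckets: int) -> int:
    """For integer keys Hive's bucketing hash is the value itself,
    so the target bucket is simply key mod num_buckets."""
    return key % num_buckets

# CLUSTERED BY (emp_id) INTO 4 BUCKETS: rows that hash to the same
# bucket land in the same file, which enables bucket-map joins and
# efficient sampling.
emp_ids = [7, 8, 12, 15]
print([bucket_for(e, 4) for e in emp_ids])  # [3, 0, 0, 3]
```

Two tables bucketed the same way on the join key can then be joined bucket-by-bucket without a full shuffle.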
6.3 SQOOP
Learning Objectives:
Learn how to import data, both fully and incrementally, from an RDBMS into HDFS and Hive tables, and how to export data from HDFS and Hive tables back to an RDBMS. Learn the architecture of Sqoop import and export.
Topics:
- SQOOP concepts
- SQOOP architecture
- Install and configure SQOOP
- Connecting to RDBMS
- Internal mechanism of import/export
- Import data from Oracle/MySQL to HIVE
- Export data to Oracle/MySQL
- Other SQOOP commands
Hands On:
Trigger a shell script that calls the Sqoop import and export commands. Learn to automate Sqoop incremental imports by recording the last value of the check column. Run a Sqoop export from a Hive table directly to an RDBMS.
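An incremental append import only picks up rows whose check column exceeds the stored `--last-value`, then advances that value for the next run. A minimal Python sketch of that bookkeeping, with a hypothetical table and `id` as the check column:

```python
def incremental_import(rows, check_col, last_value):
    """Simulate `sqoop import --incremental append`: select only rows whose
    check column exceeds last_value, then advance last_value for the next run."""
    new_rows = [r for r in rows if r[check_col] > last_value]
    new_last = max((r[check_col] for r in new_rows), default=last_value)
    return new_rows, new_last

table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
imported, last = incremental_import(table, "id", last_value=1)
print(len(imported), last)  # 2 3 -> ids 2 and 3 imported; the next run starts after 3
```

Sqoop automates exactly this state-keeping when the import is saved as a Sqoop job, storing the last value in its metastore between runs.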
6.4 HBASE
Learning Objectives:
Understand the different types of NoSQL databases and the CAP theorem. Learn the DDL and CRUD operations of HBase. Understand the HBase architecture and the importance of ZooKeeper in managing HBase. Learn about HBase column-family optimization and client-side buffering.
Topics:
- HBASE concepts
- ZOOKEEPER concepts
- HBASE and Region Server architecture
- File storage architecture
- NoSQL vs. SQL
- Defining a schema and basic operations
- DDLs
- DMLs
- HBASE use cases
Hands On:
- Create HBase tables using the shell and perform CRUD operations with the Java API. Change the column-family properties and perform the sharding process. Also create tables with multiple splits to improve the performance of HBase queries.
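Pre-splitting only helps if row keys actually spread across the regions; sequential keys (timestamps, auto-increment IDs) would still hammer one region. A common remedy is a salt prefix derived from the key. A minimal sketch, with a hypothetical character-sum standing in for a real hash:

```python
def salted_key(row_key: str, num_regions: int) -> str:
    """Prefix the row key with a deterministic salt so sequential keys
    spread across pre-split regions instead of hot-spotting one region."""
    salt = sum(ord(c) for c in row_key) % num_regions  # simple stand-in hash
    return f"{salt:02d}-{row_key}"

keys = ["user0001", "user0002", "user0003"]
print([salted_key(k, 4) for k in keys])  # ['00-user0001', '01-user0002', '02-user0003']
```

The table would then be created with split points at the salt boundaries ('01', '02', '03'), so writes land evenly across all four regions; reads for a single key recompute the same salt to find the right region.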
6.5 OOZIE
Learning Objectives:
Understand the Oozie architecture and monitor workflows using the Oozie console. Understand how coordinators and bundles work along with workflows in Oozie. Also learn the Oozie commands to submit, monitor, and kill a workflow.
Topics:
- OOZIE concepts
- OOZIE architecture
- Workflow engine
- Job coordinator
- Installing and configuring OOZIE
- HPDL and XML for creating workflows
- Nodes in OOZIE
- Action nodes and control nodes
- Accessing OOZIE jobs through the CLI and web console
- Develop and run sample workflows in OOZIE
- Run MapReduce programs
- Run HIVE scripts/jobs
Hands on:
Create a workflow for Sqoop incremental imports. Create workflows for Pig, Hive, and Sqoop exports, and run a coordinator to schedule the workflows.
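An Oozie workflow is an HPDL (XML) document of action and control nodes. A minimal sketch of a workflow with a single Hive action; the workflow name, script name, and schema versions here are illustrative assumptions:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="daily-etl">
  <start to="hive-load"/>
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_sales.hql</script>  <!-- hypothetical Hive script -->
    </hive>
    <ok to="end"/>       <!-- control flow on success -->
    <error to="fail"/>   <!-- control flow on failure -->
  </action>
  <kill name="fail">
    <message>Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

`start`, `kill`, and `end` are control nodes; `action` nodes do the work, and each must declare both an `ok` and an `error` transition. A coordinator XML would then reference this workflow to run it on a schedule.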
6.6 FLUME
Learning Objectives:
Understand the Flume architecture and its components: sources, channels, and sinks. Configure Flume with socket and file sources and with HDFS and HBase sinks. Understand the fan-in and fan-out architectures.
Topics:
- FLUME concepts
- FLUME architecture
- Installation and configuration
- Executing FLUME jobs
Hands on:
Create Flume configuration files and configure different sources and sinks. Stream Twitter data and load it into a Hive table.
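A Flume configuration file wires named sources, channels, and sinks to an agent. A minimal sketch of one agent (here named `a1`) with a netcat source, a memory channel, and an HDFS sink; the port and HDFS path are illustrative:

```properties
# Agent "a1": one netcat source -> memory channel -> HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for newline-terminated events on a local socket
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events into date-partitioned HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Wire the pieces together through the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Fan-out is expressed by listing multiple channels on one source; fan-in by pointing several agents' sinks at one downstream source.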
7 Data Analytics using Pentaho as an ETL tool
Learning Objective:
You will learn from the Pentaho Big Data best practices, guidelines, and techniques documents.
Topics:
- Data Analytics using Pentaho as an ETL tool
- Big Data integration with zero coding required
Hands on:
You will use Pentaho as an ETL tool for data analytics.
8 Integrations
Learning Objective:
You will see the different integrations among Hadoop ecosystem components in a data engineering flow, and understand how important it is to create a flow for the ETL process.
Topics:
- MapReduce and HIVE integration
- MapReduce and HBASE integration
- Java and HIVE integration
- HIVE–HBASE integration
Hands On:
Use storage handlers to integrate Hive and HBase. Integrate Hive and Pig as well.
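The HIVE–HBASE integration is declared entirely in Hive DDL via the HBase storage handler. A minimal sketch, with hypothetical table and column names; the `hbase.columns.mapping` property pairs each Hive column with the HBase row key or a column-family:qualifier:

```sql
-- Hive table backed by an HBase table: the first mapping entry (:key)
-- binds to the HBase row key, the second to column family "cf", qualifier "amount"
CREATE TABLE hive_hbase_sales (key STRING, amount DOUBLE)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:amount")
TBLPROPERTIES ("hbase.table.name" = "sales");
```

Once created, HiveQL queries against `hive_hbase_sales` read and write the underlying HBase table directly, so Hive's SQL layer and HBase's random-access CRUD operate on the same data.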