
Big Data Analytics Training Course

BDAT-HV
8 days
751 000 HUF + VAT

About the course

Big Data refers to large volumes of structured and unstructured data that cannot be handled by traditional databases and must instead be analysed with a range of software techniques to reveal patterns that can be used to meet business objectives. Analysing such large amounts of unstructured data helps in understanding and predicting human behaviour and in solving complex business problems. Big Data consists of data sets so large and complex that traditional data processing software cannot manage them.
Big Data analytics is the process of gathering, managing, and analysing large sets of data (Big Data) to uncover patterns and other useful information. These patterns are a goldmine of information, and analysing them provides insights that organizations can use to make business decisions.

Big Data analysis provides several advantages such as:

  • Better business decisions
  • More efficient operations
  • Higher profits
  • Happier, more satisfied customers

Benefits
Demand for big data analytics certification is growing, and it is more relevant in data science today than in almost any other field. The field of data analytics is still young, and there are not enough professionals with the right skills. A credible big data analytics certification therefore promises many growth opportunities for organizations as well as individuals in the booming field of data science.

Individual benefits

  • An individual with Big Data analytics skills can make decisions more effectively
  • An individual with Big Data skills can earn a better salary, enjoy faster career growth, and stand a better chance of being hired by top companies

Organizational benefits

  • Big Data allows organizations to understand consumer needs and make informed decisions
  • Big data tools can identify efficient ways of doing business through sentiment analysis
  • Businesses can get ahead of the competition by better understanding market conditions
  • With Big Data analytics, organizations understand ongoing trends and develop products accordingly

What you will learn

  • Understand the Fundamentals
    Learn the basics of Apache Hadoop & data ETL, ingestion, and processing with Hadoop tools.
  • Learn Pig framework
    Understand how to join multiple data sets and analyze disparate data with the Pig framework.
  • Understand the Hive framework
    How to organize data into tables, perform transformations, and simplify complex queries with Hive.
  • Perform Real-time analysis
    How to perform real-time interactive analyses on huge data sets stored in HDFS using SQL with Impala.
  • Choose the best tool
    How to pick the best tool in Hadoop, achieve interoperability, and manage repetitive workflows.

Who should attend

  • Data Architects
  • Data Scientists
  • Developers
  • Data Analysts
  • BI Analysts
  • BI Developers
  • SAS Developers
  • Project Managers
  • Mainframe and Analytics Professionals
  • Professionals who want to acquire knowledge of Big Data

We provide the course in English.

 


Curriculum

1 Introducing Big Data & Hadoop
Learning Objective:
You will be introduced to real-world Big Data problems and learn how to solve them with state-of-the-art tools. Understand how Hadoop improves on traditional processing with its outstanding features. You will get to know Hadoop's background and the different Hadoop distributions available on the market, and prepare the Unix box for the training.

Topics:

  • 1.1 Big Data Introduction

What is Big Data
Data Analytics
Big Data Challenges
Technologies supported by big data

  • 1.2 Hadoop Introduction

What is Hadoop?
History of Hadoop
Basic Concepts
Future of Hadoop
The Hadoop Distributed File System
Anatomy of a Hadoop Cluster
Breakthroughs of Hadoop
Hadoop Distributions:
Apache Hadoop
Cloudera Hadoop
Hortonworks Hadoop
MapR Hadoop
Hands On:

Install a virtual machine on the host machine using VMPlayer, and work with some basic Unix commands needed for Hadoop.
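
A rough sketch of the kind of host preparation this lab covers (the JDK check and password-less SSH are standard Hadoop prerequisites; the working directory name is only an example):

#!/bin/bash
java -version                                    # Hadoop needs a JDK on the box
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa         # password-less SSH, used by the
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  #   Hadoop start/stop scripts
chmod 600 ~/.ssh/authorized_keys
ssh localhost exit                               # verify the key-based login works
mkdir -p ~/hadoop/data                           # working directory for later labs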

2 Hadoop Daemon Processes
Learning Objective:
You will learn about the different daemons and their functionality at a high level.

Topics:

  • Name Node
  • Data Node
  • Secondary Name Node
  • Job Tracker
  • Task Tracker

Hands On:

  • Create a Unix shell script to start all the daemons at once.
  • Start HDFS and MapReduce separately.
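
A minimal sketch of such a start-up script, assuming a Hadoop 2.x layout under $HADOOP_HOME (on the older MRv1 releases that use the Job Tracker and Task Tracker, MapReduce is started with start-mapred.sh instead of start-yarn.sh):

#!/bin/bash
# start-hadoop.sh - bring up HDFS and MapReduce daemons in one go
$HADOOP_HOME/sbin/start-dfs.sh     # Name Node, Data Node, Secondary Name Node
$HADOOP_HOME/sbin/start-yarn.sh    # ResourceManager and NodeManager (YARN / MRv2)
jps                                # list the Java daemons that are now running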

3 HDFS (Hadoop Distributed File System)
Learning Objective:
You will get to know how to write and read files in HDFS, and understand how the Name Node, Data Node and Secondary Name Node take part in the HDFS architecture. You will also learn the different ways of accessing HDFS data.

Topics:

  • Blocks and Input Splits
  • Data Replication
  • Hadoop Rack Awareness
  • Cluster Architecture and Block Placement
  • Accessing HDFS
  • JAVA Approach
  • CLI Approach

Hands On:

  • Write a shell script that writes and reads files in HDFS. Change the replication factor at three levels. Use Java for working with HDFS.
  • Run different HDFS commands as well as admin commands.
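
A sketch of the HDFS commands involved; the file and directory names are only examples, and the "three levels" of replication correspond to the cluster-wide default in hdfs-site.xml, a per-command override, and a per-file change:

hdfs dfs -mkdir -p /user/train/input
hdfs dfs -put sales.csv /user/train/input/               # write a local file into HDFS
hdfs dfs -cat /user/train/input/sales.csv | head         # read it back
hdfs dfs -D dfs.replication=2 -put big.log /user/train/  # per-command replication override
hdfs dfs -setrep -w 2 /user/train/input/sales.csv        # per-file replication change
hdfs dfsadmin -report                                    # admin view: live Data Nodes, capacity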

4 Hadoop Installation Modes and HDFS
Learning Objective:
You will learn the different modes of Hadoop, set up pseudo-distributed mode from scratch and work with its configuration. You will learn the functionality of the different HDFS operations and see a visual representation of HDFS read and write actions involving the Name Node and Data Node daemons.

Topics:

  • Local Mode
  • Pseudo-distributed Mode
  • Fully distributed mode
  • Pseudo Mode installation and configurations
  • HDFS basic file operations

Hands On:

  • Install VirtualBox Manager and install Hadoop in pseudo-distributed mode. Change the different configuration files required for pseudo-distributed mode. Perform different file operations on HDFS.
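
A minimal sketch of the pseudo-distributed configuration; the port and property values shown are the commonly used defaults, and the exact file locations depend on your Hadoop version and distribution:

cat > $HADOOP_HOME/etc/hadoop/core-site.xml <<'EOF'
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://localhost:9000</value></property>
</configuration>
EOF
cat > $HADOOP_HOME/etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>
EOF
hdfs namenode -format      # format the Name Node once, before the first start
start-dfs.sh
hdfs dfs -mkdir -p /user/train
hdfs dfs -ls /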

5 Hadoop Developer Tasks
Learning Objective:
Understand the different phases in MapReduce, including the map, shuffle, sort and reduce phases. Get a deep understanding of the life cycle of an MR job submitted to YARN. Learn about the distributed cache concept in detail, with examples.
Write a word count MR program and monitor the job using the Job Tracker and the YARN console. Also learn about further use cases.

Topics:

  • Basic API Concepts
  • The Driver Class
  • The Mapper Class
  • The Reducer Class
  • The Combiner Class
  • The Partitioner Class
  • Examining a Sample MapReduce Program with several examples
  • Hadoop's Streaming API

Hands On:

  • Learn to write an MR job from scratch, implement different logic in the Mapper and Reducer, and submit the MR job in standalone and distributed mode.
  • Also learn to write a word count MR job, calculate the average salary of employees who meet certain conditions, and perform sales calculations using MR.
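
As a taste of the Streaming API listed above, the word count can be sketched with plain Unix commands as mapper and reducer (the jar path is that of a stock Apache install, and the input file name is only an example):

# rough tokenizer used as the streaming mapper, shipped to the cluster with -files
cat > wc_mapper.sh <<'EOF'
#!/bin/bash
tr -s '[:space:][:punct:]' '\n'
EOF
chmod +x wc_mapper.sh

hdfs dfs -mkdir -p /user/train/wc-in
hdfs dfs -put books.txt /user/train/wc-in/
hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
    -files wc_mapper.sh \
    -input /user/train/wc-in \
    -output /user/train/wc-out \
    -mapper wc_mapper.sh \
    -reducer 'uniq -c'       # reducer input is sorted by key, so uniq -c counts each word
hdfs dfs -cat /user/train/wc-out/part-*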

6 Hadoop Ecosystems
6.1 PIG
Learning Objective:
Understand the importance of Pig in the Big Data world, the Pig architecture, and the Pig Latin commands for performing complex operations on relations, as well as Pig UDFs and aggregation functions from the Piggy Bank library. Learn how to pass dynamic arguments to Pig scripts.

Topics

  • PIG concepts
  • Install and configure PIG on a cluster
  • PIG Vs MapReduce and SQL
  • Write sample PIG Latin scripts
  • Modes of running PIG
  • PIG UDFs.

Hands On:
Log in to the Pig Grunt shell to issue Pig Latin commands in different execution modes. Explore different ways of loading and lazily transforming Pig relations. Register a UDF in the Grunt shell and perform replicated join operations.
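
A small Pig Latin sketch of the kind of script used here (the data set, field names and the $min_amount parameter are only examples; -param supplies the dynamic argument mentioned above):

cat > top_sales.pig <<'EOF'
sales  = LOAD '/user/train/input/sales.csv' USING PigStorage(',')
         AS (region:chararray, amount:double);
big    = FILTER sales BY amount > $min_amount;   -- $min_amount is filled in by -param
byreg  = GROUP big BY region;
totals = FOREACH byreg GENERATE group, SUM(big.amount);
DUMP totals;
EOF
pig -x local -param min_amount=100 top_sales.pig       # local execution mode
pig -x mapreduce -param min_amount=100 top_sales.pig   # cluster execution mode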

6.2 HIVE
Learning Objective:

Understand the importance of Hive in the Big Data world and the different ways of configuring the Hive metastore. Learn the different types of tables in Hive, and learn how to optimize Hive jobs using partitioning and bucketing and by passing dynamic arguments to Hive scripts. You will also get an understanding of joins, UDFs, views, etc.

Topics:

  • Hive concepts
  • Hive architecture
  • Installing and configuring HIVE
  • Managed tables and external tables
  • Joins in HIVE
  • Multiple ways of inserting data in HIVE tables
  • CTAS, views, alter tables
  • User defined functions in HIVE
  • Hive UDF

Hands On:

  • Execute Hive queries in different modes. Create internal and external tables. Perform query optimization by creating tables with partitioning and bucketing. Run system-defined and user-defined functions, including explode and window functions.
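
A HiveQL sketch covering external tables, partitioning, bucketing and a dynamic argument (table and column names are only examples; --hivevar is one common way to pass parameters in):

cat > sales.hql <<'EOF'
-- external table over raw files already sitting in HDFS
CREATE EXTERNAL TABLE IF NOT EXISTS sales_raw (id INT, region STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/train/input/sales';

-- managed table partitioned by region and bucketed by id
CREATE TABLE IF NOT EXISTS sales_part (id INT, amount DOUBLE)
PARTITIONED BY (region STRING)
CLUSTERED BY (id) INTO 4 BUCKETS;

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE sales_part PARTITION (region)
SELECT id, amount, region FROM sales_raw;

-- ${hivevar:min_amount} is substituted from the command line
SELECT region, SUM(amount) FROM sales_part
WHERE amount > ${hivevar:min_amount}
GROUP BY region;
EOF
hive --hivevar min_amount=100 -f sales.hql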

6.3 SQOOP
Learning Objectives:

Learn how to import data, both fully and incrementally, from an RDBMS into HDFS and Hive tables, and how to export data from HDFS and Hive tables back to an RDBMS. Learn the architecture of Sqoop import and export.

Topics:

  • SQOOP concepts
  • SQOOP architecture
  • Install and configure SQOOP
  • Connecting to RDBMS
  • Internal mechanism of import/export
  • Import data from Oracle/MySQL to HIVE
  • Export data to Oracle/MySQL
  • Other SQOOP commands.

Hands On:
Trigger a shell script that calls Sqoop import and export commands. Learn to automate Sqoop incremental imports by supplying the last value of the appended column. Run a Sqoop export from a Hive table directly to an RDBMS.
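
A sketch of the Sqoop commands in question; the connection string, credentials and table names are only examples:

# plain import, then an incremental (append) import driven by the last seen key
sqoop import \
  --connect jdbc:mysql://dbhost/shop --username train -P \
  --table orders --target-dir /user/train/orders \
  --incremental append --check-column order_id --last-value 1000

# import straight into a Hive table
sqoop import \
  --connect jdbc:mysql://dbhost/shop --username train -P \
  --table customers --hive-import --hive-table shop.customers

# export a Hive table's warehouse directory back to the RDBMS
sqoop export \
  --connect jdbc:mysql://dbhost/shop --username train -P \
  --table order_totals --export-dir /user/hive/warehouse/shop.db/order_totals \
  --input-fields-terminated-by '\001'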

6.4 HBASE
Learning Objectives:

Understand the different types of NoSQL databases and the CAP theorem. Learn the different DDL and CRUD operations of HBase. Understand the HBase architecture and the importance of ZooKeeper in managing HBase. Learn about HBase column family optimization and client-side buffering.

Topics:

  • HBASE concepts
  • ZOOKEEPER concepts
  • HBASE and Region server architecture
  • File storage architecture
  • NoSQL vs SQL
  • Defining Schema and basic operations
  • DDLs
  • DMLs
  • HBASE use cases

Hands On:

  • Create HBase tables using the shell and perform CRUD operations with the Java API. Change column family properties and perform the sharding process. Also create tables with multiple splits to improve the performance of HBase queries.
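
An HBase shell sketch of the DDL and CRUD side (the Java client exercises the same operations through Put, Get and Scan; the table, column family and split points below are only examples):

hbase shell <<'EOF'
create 'customers', {NAME => 'info', VERSIONS => 3}
put  'customers', 'row1', 'info:name', 'Alice'
get  'customers', 'row1'
scan 'customers', {LIMIT => 10}
delete 'customers', 'row1', 'info:name'
alter 'customers', NAME => 'info', VERSIONS => 5
create 'clicks', 'cf', SPLITS => ['g', 'n', 't']   # pre-split into 4 regions
EOF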

6.5 OOZIE
Learning Objectives:

Understand the Oozie architecture and monitor Oozie workflows. Understand how coordinators and bundles work together with workflows in Oozie. Also learn the Oozie commands to submit, monitor and kill a workflow.

Topics:

  • OOZIE concepts
  • OOZIE architecture
  • Workflow engine
  • Job coordinator
  • Installing and configuring OOZIE
  • HPDL and XML for creating Workflows
  • Nodes in OOZIE
  • Action nodes and Control nodes
  • Accessing OOZIE jobs through CLI, and web console
  • Develop and run sample workflows in OOZIE
  • Run MapReduce programs
  • Run HIVE scripts/jobs.

Hands on:
Create a workflow for Sqoop incremental imports. Create workflows for Pig, Hive and Sqoop exports. Also run a coordinator to schedule the workflows.
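
A sketch of the Oozie CLI side; the URLs, paths and property values are only examples, and the workflow.xml is assumed to be already deployed in HDFS:

cat > job.properties <<'EOF'
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
oozie.wf.application.path=${nameNode}/user/train/workflows/sqoop-import
EOF

export OOZIE_URL=http://localhost:11000/oozie
oozie job -config job.properties -run    # prints the workflow job id
oozie job -info <job-id>                 # monitor its state and actions
oozie job -log  <job-id>
oozie job -kill <job-id>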

6.6 FLUME
Learning Objectives:

Understand the Flume architecture and its components: sources, channels and sinks. Configure Flume with socket and file sources and with HDFS and HBase sinks. Understand fan-in and fan-out architectures.

Topics:

  • FLUME Concepts
  • FLUME Architecture
  • Installation and configurations
  • Executing FLUME jobs

Hands on:
Create Flume configuration files and configure them with different sources and sinks. Stream Twitter data and create a Hive table from it.
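
A minimal Flume agent sketch with a file-based source and an HDFS sink (the agent, source, channel and sink names, and both paths, are only examples):

cat > agent1.conf <<'EOF'
a1.sources  = src1
a1.channels = ch1
a1.sinks    = sink1

a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /var/log/incoming
a1.sources.src1.channels = ch1

a1.channels.ch1.type = memory

a1.sinks.sink1.type      = hdfs
a1.sinks.sink1.hdfs.path = /user/train/flume/events
a1.sinks.sink1.channel   = ch1
EOF
flume-ng agent --conf $FLUME_HOME/conf --conf-file agent1.conf --name a1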

7 Data Analytics using Pentaho as an ETL tool
Learning Objective:
You will work through Pentaho's Big Data best practices, guidelines, and techniques documents.

Topics:

  • Data Analytics using Pentaho as an ETL tool
  • Big Data Integration with Zero Coding Required

Hands on:
You will use Pentaho as an ETL tool for data analytics.

8 Integrations
Learning Objective:
You will see different integrations among Hadoop ecosystem components in a data engineering flow, and understand how important it is to create a flow for the ETL process.

Topics:

  • MapReduce and HIVE integration
  • MapReduce and HBASE integration
  • Java and HIVE integration
  • HIVE - HBASE Integration

Hands On:
Use storage handlers to integrate Hive and HBase. Integrate Hive and Pig as well.
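
A sketch of the Hive-HBase storage handler in action (the column names and the backing HBase table name are only examples; for an HBase table that already exists, CREATE EXTERNAL TABLE is used instead):

hive -e "
CREATE TABLE customers_hbase (key STRING, name STRING, city STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name,info:city')
TBLPROPERTIES ('hbase.table.name' = 'customers');
"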



Prerequisites

There are no specific prerequisites required to learn Big Data.



