We look forward to seeing you at our courses and exams in 2025 as well!

Big Data Analytics Training Course

BDAT-HV
8 days
746 200 Ft + VAT
Course start dates:
Instructors:

About the course

Big Data refers to large amounts of structured and unstructured data that cannot be analysed with traditional databases and instead requires a range of software techniques to reveal patterns that can be used to meet business objectives. Analysing such large amounts of unstructured data helps in understanding and predicting human behaviour and in solving complex business problems. Big Data is huge and consists of complex data sets that traditional data processing software cannot manage.
Big Data analytics is the process of gathering, managing, and analysing large sets of data (Big Data) to uncover patterns and other useful information. These patterns are a rich source of information, and analysing them provides insights that organizations can use to make business decisions.

Big Data analysis provides several advantages, such as:

  • Better-informed business decisions
  • More efficient operations
  • Higher profits
  • Happier, more satisfied customers

Benefits
Big data analytics certification is growing in demand and is more relevant in data science today than in any other field. The field of data analytics is still young and there are not enough professionals with the right skills. A credible big data analytics certification therefore promises many growth opportunities for organizations as well as individuals in the booming field of data science.

Individual benefits

  • An individual with Big Data analytics skills can make decisions more effectively
  • An individual with Big Data skills can earn a better salary, enjoy faster career growth, and stand a better chance of being hired by top companies

Organizational benefits

  • Big Data allows organizations to understand consumer needs and make informed decisions
  • Big data tools can identify efficient ways of doing business through sentiment analysis
  • Businesses can get ahead of the competition by better understanding market conditions
  • With Big Data Analytics, organizations understand ongoing trends and develop products accordingly.

What you will learn

  • Understand the Fundamentals
    Learn the basics of Apache Hadoop & data ETL, ingestion, and processing with Hadoop tools.
  • Learn the Pig framework
    Understand how to join multiple data sets and analyze disparate data with the Pig framework.
  • Understand the Hive framework
    Learn how to organize data into tables, perform transformations, and simplify complex queries with Hive.
  • Perform real-time analysis
    Learn how to perform real-time, interactive analyses on huge data sets stored in HDFS using SQL with Impala.
  • Choose the best tool
    Learn how to pick the best tool in the Hadoop ecosystem, achieve interoperability, and manage repetitive workflows.

Who should attend

  • Data Architects
  • Data Scientists
  • Developers
  • Data Analysts
  • BI Analysts
  • BI Developers
  • SAS Developers
  • Project Managers
  • Mainframe and Analytics Professionals
  • Professionals who want to acquire knowledge of Big Data

We provide the course in English.

 

Curriculum

1 Introducing Big Data & Hadoop
Learning Objective:
You will be introduced to real-world Big Data problems and learn how to solve them with state-of-the-art tools. Understand how Hadoop improves on traditional processing with its outstanding features. You will get to know Hadoop's background and the different Hadoop distributions available on the market, and prepare the Unix box used for the training.

Topics:

  • 1.1 Big Data Introduction

What is Big Data
Data Analytics
Big Data Challenges
Technologies supported by big data

  • 1.2 Hadoop Introduction

What is Hadoop?
History of Hadoop
Basic Concepts
Future of Hadoop
The Hadoop Distributed File System
Anatomy of a Hadoop Cluster
Breakthroughs of Hadoop
Hadoop Distributions:
Apache Hadoop
Cloudera Hadoop
Hortonworks Hadoop
MapR Hadoop
Hands On:

Install a virtual machine using VMware Player on the host machine, and work with some basic Unix commands needed for Hadoop.

2 Hadoop Daemon Processes
Learning Objective:
You will learn what the different daemons are and understand their functionality at a high level.

Topics:

  • Name Node
  • Data Node
  • Secondary Name Node
  • Job Tracker
  • Task Tracker

Hands On:

  • Create a Unix shell script to start all the daemons at once.
  • Start HDFS and MapReduce daemons separately.

3 HDFS (Hadoop Distributed File System)
Learning Objective:
You will learn how to write and read files in HDFS, and understand how the Name Node, Data Node and Secondary Name Node take part in the HDFS architecture. You will also learn the different ways of accessing HDFS data.

Topics:

  • Blocks and Input Splits
  • Data Replication
  • Hadoop Rack Awareness
  • Cluster Architecture and Block Placement
  • Accessing HDFS
  • JAVA Approach
  • CLI Approach

Hands On:

  • Write a shell script that writes and reads files in HDFS. Change the replication factor at three levels. Use Java for working with HDFS (a sketch follows below).
  • Run different HDFS commands as well as admin commands.
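
A minimal sketch of the Java approach to HDFS: it writes a small file and then reads it back through the FileSystem API. The NameNode address and the file path are assumptions for a pseudo-distributed setup.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:8020");   // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/user/training/sample.txt");   // hypothetical file path

        // Write: create (or overwrite) the file and put one line into it
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello hdfs\n".getBytes("UTF-8"));
        }

        // Read: open the same file and print it line by line
        try (BufferedReader in = new BufferedReader(new InputStreamReader(fs.open(file), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
        fs.close();
    }
}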

4 Hadoop Installation Modes and HDFS
Learning Objective:
You will learn the different modes of Hadoop, understand pseudo-distributed mode from scratch, and work with its configuration. You will learn the functionality of different HDFS operations and see a visual representation of HDFS read and write actions involving the Name Node and Data Node daemons.

Topics:

  • Local Mode
  • Pseudo-distributed Mode
  • Fully distributed mode
  • Pseudo Mode installation and configurations
  • HDFS basic file operations

Hands On:

  • Install VirtualBox Manager and install Hadoop in pseudo-distributed mode. Change the different configuration files required for pseudo-distributed mode and perform different file operations on HDFS (a small configuration check follows below).
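
One quick way to confirm that the changed configuration files are picked up is to print a few properties through Hadoop's Configuration API. A minimal sketch; the property names are standard, while the values depend on your own core-site.xml and hdfs-site.xml.

import org.apache.hadoop.conf.Configuration;

public class ConfigCheck {
    public static void main(String[] args) {
        // Loads core-site.xml and hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
        System.out.println("dfs.replication = " + conf.get("dfs.replication", "3 (default)"));
    }
}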

5 Hadoop Developer Tasks
Learning Objective:
Understand the different phases in MapReduce, including the map, shuffle, sort and reduce phases. Get a deep understanding of the life cycle of an MR job submitted to YARN. Learn about the Distributed Cache concept in detail with examples.
Write a word count MR program and monitor the job using the Job Tracker and the YARN console. Also learn about further use cases.

Topics:

  • Basic API Concepts
  • The Driver Class
  • The Mapper Class
  • The Reducer Class
  • The Combiner Class
  • The Partitioner Class
  • Examining a Sample MapReduce Program with several examples
  • Hadoop's Streaming API

Hands On:

  • Learn to write an MR job from scratch, implement different logic in the Mapper and Reducer, and submit the MR job in standalone and distributed mode.
  • Also write a word count MR job, calculate the average salary of employees who meet certain conditions, and perform sales calculations using MR (a word count sketch follows after this list).
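
A compact sketch of the classic word count job, wiring together the Driver, Mapper, Reducer and Combiner classes listed in the topics above. Input and output paths are taken from the command line; nothing here is specific to this course's data sets.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every token in the input line
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also usable as the Combiner): sums the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: wires mapper, combiner, reducer and the input/output paths together
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}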

6 Hadoop Ecosystems
6.1 PIG
Learning Objective:
Understand the importance of Pig in the Big Data world, the Pig architecture, and the Pig Latin commands for performing complex operations on relations, as well as Pig UDFs and aggregation functions from the Piggy Bank library. Learn how to pass dynamic arguments to Pig scripts.

Topics

  • PIG concepts
  • Install and configure PIG on a cluster
  • PIG Vs MapReduce and SQL
  • Write sample PIG Latin scripts
  • Modes of running PIG
  • PIG UDFs.

Hands On:
Log in to the Pig Grunt shell to issue Pig Latin commands in different execution modes. Explore different ways of lazily loading and transforming Pig relations. Register a UDF in the Grunt shell and perform replicated join operations.
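
Besides the Grunt shell, Pig Latin can also be driven from Java through the PigServer API. A small sketch; the input file, its field layout and the filter threshold are assumptions for illustration.

import org.apache.pig.PigServer;

public class PigFromJava {
    public static void main(String[] args) throws Exception {
        // "local" runs against the local file system; use "mapreduce" on a cluster
        PigServer pig = new PigServer("local");

        // Load a comma-separated sales file (hypothetical layout) and keep rows above a threshold
        pig.registerQuery("sales = LOAD 'sales.csv' USING PigStorage(',') "
                + "AS (product:chararray, amount:int);");
        pig.registerQuery("big_sales = FILTER sales BY amount > 1000;");

        // Store the filtered relation back to the file system
        pig.store("big_sales", "big_sales_out");
        pig.shutdown();
    }
}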

6.2 HIVE
Learning Objective:

Understand the importance of Hive in the Big Data world and the different ways of configuring the Hive Metastore. Learn the different types of tables in Hive, how to optimize Hive jobs using partitioning and bucketing, and how to pass dynamic arguments to Hive scripts. You will also get an understanding of joins, UDFs, views, etc.

Topics:

  • Hive concepts
  • Hive architecture
  • Installing and configuring HIVE
  • Managed tables and external tables
  • Joins in HIVE
  • Multiple ways of inserting data in HIVE tables
  • CTAS, views, alter tables
  • User defined functions in HIVE
  • Hive UDF

Hands On:

  • Execute Hive queries in different modes. Create internal and external tables. Perform query optimization by creating tables with partitioning and bucketing. Run system-defined and user-defined functions, including explode and window functions (a JDBC sketch follows below).
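
A common way to run HiveQL programmatically is over JDBC against HiveServer2. A minimal sketch; the connection URL, credentials, table name and partition column are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // Managed table, partitioned by country so per-country queries scan less data
            stmt.execute("CREATE TABLE IF NOT EXISTS customers (id INT, name STRING) "
                    + "PARTITIONED BY (country STRING) "
                    + "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','");

            // Simple query against one partition
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT count(*) FROM customers WHERE country = 'HU'")) {
                while (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }
}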

6.3 SQOOP
Learning Objectives:

Learn how to import data, both fully and incrementally, from an RDBMS into HDFS and Hive tables, and how to export data from HDFS and Hive tables back to an RDBMS. Learn the architecture of Sqoop import and export.

Topics:

  • SQOOP concepts
  • SQOOP architecture
  • Install and configure SQOOP
  • Connecting to RDBMS
  • Internal mechanism of import/export
  • Import data from Oracle/MySQL to HIVE
  • Export data to Oracle/MySQL
  • Other SQOOP commands.

Hands On:
Trigger a shell script that calls the Sqoop import and export commands. Learn to automate Sqoop incremental imports by passing the last value of the appended column. Run a Sqoop export from a Hive table directly to an RDBMS.
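
The incremental import can also be triggered from Java by shelling out to the sqoop command line, mirroring the shell-script approach above. A sketch; the JDBC connection string, credentials, table and column names, and the last imported value are assumptions.

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class SqoopIncrementalImport {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                "sqoop", "import",
                "--connect", "jdbc:mysql://localhost:3306/sales_db",
                "--username", "training",
                "--password", "training",
                "--table", "orders",
                "--target-dir", "/user/training/orders",
                "--incremental", "append",     // only pull rows added since the last run
                "--check-column", "order_id",
                "--last-value", "1000");       // last value imported by the previous run
        pb.redirectErrorStream(true);
        Process p = pb.start();

        // Print Sqoop's output so the underlying MapReduce progress is visible
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(line);
            }
        }
        System.exit(p.waitFor());
    }
}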

6.4 HBASE
Learning Objectives:

Understand the different types of NoSQL databases and the CAP theorem. Learn the different DDL and CRUD operations of HBase. Understand the HBase architecture and the importance of ZooKeeper in managing HBase. Learn about HBase column family optimization and client-side buffering.

Topics:

  • HBASE concepts
  • ZOOKEEPER concepts
  • HBASE and Region server architecture
  • File storage architecture
  • NoSQL vs SQL
  • Defining Schema and basic operations
  • DDLs
  • DMLs
  • HBASE use cases

Hands On:

  • Create HBase tables using the shell and perform CRUD operations with the Java API (a sketch follows below). Change the column family properties and perform the sharding process. Also create tables with multiple splits to improve the performance of HBase queries.
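
A minimal sketch of HBase CRUD through the Java client API, assuming a table named customer with a column family info has already been created in the HBase shell.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseCrud {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml (ZooKeeper quorum, etc.) from the classpath
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("customer"))) {

            // Create/update: one row keyed by customer id, one cell in the 'info' family
            Put put = new Put(Bytes.toBytes("cust-001"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"), Bytes.toBytes("Alice"));
            table.put(put);

            // Read the same row back
            Result result = table.get(new Get(Bytes.toBytes("cust-001")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println("name = " + Bytes.toString(name));
        }
    }
}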

6.5 OOZIE
Learning Objectives:

Understand the Oozie architecture and monitor Oozie workflows using the Oozie console. Understand how coordinators and bundles work together with workflows in Oozie. Also learn the Oozie commands to submit, monitor and kill a workflow.

Topics:

  • OOZIE concepts
  • OOZIE architecture
  • Workflow engine
  • Job coordinator
  • Installing and configuring OOZIE
  • HPDL and XML for creating Workflows
  • Nodes in OOZIE
  • Action nodes and Control nodes
  • Accessing OOZIE jobs through CLI, and web console
  • Develop and run sample workflows in OOZIE
  • Run MapReduce programs
  • Run HIVE scripts/jobs.

Hands on:
Create a workflow for Sqoop incremental imports and workflows for Pig, Hive and Sqoop exports. Also run a coordinator to schedule the workflows.
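
Workflows can also be submitted and monitored from Java with the OozieClient API. A sketch; the Oozie URL, the HDFS application path and the nameNode/jobTracker values are assumptions for a pseudo-distributed setup.

import java.util.Properties;
import org.apache.oozie.client.OozieClient;
import org.apache.oozie.client.WorkflowJob;

public class OozieSubmit {
    public static void main(String[] args) throws Exception {
        OozieClient oozie = new OozieClient("http://localhost:11000/oozie");

        // Job properties: where workflow.xml lives and the values it references
        Properties props = oozie.createConfiguration();
        props.setProperty(OozieClient.APP_PATH,
                "hdfs://localhost:8020/user/training/workflows/sqoop-import");
        props.setProperty("nameNode", "hdfs://localhost:8020");
        props.setProperty("jobTracker", "localhost:8032");

        // Submit and start the workflow, then poll until it finishes
        String jobId = oozie.run(props);
        while (oozie.getJobInfo(jobId).getStatus() == WorkflowJob.Status.RUNNING) {
            Thread.sleep(10000);
        }
        System.out.println("Workflow " + jobId + " finished with status "
                + oozie.getJobInfo(jobId).getStatus());
    }
}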

6.6 FLUME
Learning Objectives:

Understand the Flume architecture and its components: sources, channels and sinks. Configure Flume with socket and file sources and with HDFS and HBase sinks. Understand the fan-in and fan-out architectures.

Topics:

  • FLUME Concepts
  • FLUME Architecture
  • Installation and configurations
  • Executing FLUME jobs

Hands on:
Create Flume configuration files and configure them with different sources and sinks. Stream Twitter data and create a Hive table from it.

7 Data Analytics using Pentaho as an ETL tool
Learning Objective:
You will work with the Pentaho Big Data best practices, guidelines, and techniques documents.

Topics:

  • Data Analytics using Pentaho as an ETL tool
  • Big Data Integration with Zero Coding Required

Hands on:
You will use Pentaho as an ETL tool for data analytics.

8 Integrations
Learning Objective:
You will see different integrations within the Hadoop ecosystem in a data engineering flow, and understand how important it is to create a flow for the ETL process.

Topics:

  • MapReduce and HIVE integration
  • MapReduce and HBASE integration
  • Java and HIVE integration
  • HIVE - HBASE Integration

Hands On:
Use storage handlers to integrate Hive and HBase, and integrate Hive and Pig as well (a storage handler sketch follows below).
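
The Hive-HBase integration can be expressed as a Hive table backed by an existing HBase table through the HBase storage handler, created over the same kind of JDBC connection shown earlier. A sketch; the table, column family and column names are assumptions.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveHbaseIntegration {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = con.createStatement()) {

            // Map Hive columns onto the HBase row key and the info:name column
            stmt.execute("CREATE EXTERNAL TABLE IF NOT EXISTS hbase_customers (id STRING, name STRING) "
                    + "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
                    + "WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,info:name') "
                    + "TBLPROPERTIES ('hbase.table.name' = 'customer')");
        }
    }
}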


Prerequisites

There are no specific prerequisites required to learn Big Data.

Related courses


