About the Course
Businesses around the world are looking for ways to leverage large volumes of data. Apache Pig was developed to run queries on large data sets stored in HDFS, and it runs on top of Hadoop. It is widely used for its simple syntax and the development time it saves.
This training will introduce you to the world of Hadoop and MapReduce. Through a series of practical, hands-on exercises, you will learn about HDFS, write complex MapReduce transformations, and write scripts using the advanced features of Pig.
Who Should Attend This Training
- Analytics Professionals
- BI/ETL/DW Professionals
- Project Managers
- Testing Professionals
- Mainframe Professionals
- Software Developers and Architects
- Graduates aiming to build a career in Big Data and Hadoop
What You Will Learn
- Hadoop Ecosystem: Get introduced to the world of Hadoop. Master the key concepts of the Hadoop ecosystem and architecture.
- Analyse Data Sets: Analyse large sets of data in a short time using Pig Latin scripts. Use MapReduce for data processing.
- Big Data Analytics: Discover the advantages of Pig and learn how to leverage it efficiently for Big Data analytics.
- Implement Pig: Expert-led training guides learners to implement Pig efficiently in future projects.
- Data Flows with Pig: Master key Pig configurations and understand Pig use cases to execute data flows with Pig.
- Advanced Pig: Gain a complete understanding of advanced concepts such as Pig Latin relational operators, Pig UDFs, and more.
We provide the course in English.
Curriculum
Module 1: The Hadoop Ecosystem
- Hadoop Overview
- Surveying the Hadoop components
- Defining the Hadoop Architecture
Module 2: Exploring HDFS and MapReduce
- Storing data in HDFS
- Achieving reliable and secure storage
- Monitoring storage metrics
- Controlling HDFS from the Command Line
- Parallel processing with MapReduce
- Detailing the MapReduce approach
- Transferring algorithms, not data
- Dissecting the key stages of a MapReduce job
- Automating data transfer
- Facilitating data ingress and egress
- Aggregating data with Flume
- Configuring data fan-in and fan-out
- Moving relational data with Sqoop
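The key stages of a MapReduce job covered in Module 2 (map, shuffle/sort, reduce) can be sketched with the classic word-count example. This is a minimal plain-Python illustration of the concept, not Hadoop's actual Java API:

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    """Shuffle/sort: group all values by key, as Hadoop does
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate the grouped values (here, sum the counts)."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data with hadoop", "pig runs on hadoop"]
counts = reduce_phase(shuffle_phase(map_phase(lines)))
print(counts["hadoop"])  # "hadoop" appears twice across the inputs
```

In a real Hadoop cluster the map and reduce functions run in parallel across many nodes, and the shuffle moves data over the network; the logical stages are the same.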
Module 3: Executing Data Flows with Pig
- Describing characteristics of Apache Pig
- Contrasting Pig with MapReduce
- Identifying Pig use cases
- Pinpointing key Pig configurations
Module 4: Advanced Pig
- Pig Latin: Relational Operators
- File Loaders
- Group Operator
- COGROUP Operator
- Joins and COGROUP
- Union, Diagnostic Operators
- Pig UDF
- Structuring unstructured data
- Representing data in Pig's data model
- Running Pig Latin commands at the Grunt Shell
- Expressing transformations in Pig Latin Syntax
- Invoking Load and Store functions
Module 5: Performing ETL with Pig
- Transforming data with Relational Operators
- Creating new relations with joins
- Reducing data size by sampling
- Extending Pig with user-defined functions
- Filtering data with Pig
- Consolidating data sets with unions
- Partitioning data sets with splits
- Injecting parameters into Pig scripts
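The ETL operations listed above (filtering, joining, unions) have direct analogues in any language. A minimal Python sketch over lists of tuples, using hypothetical sample relations, shows the shape of each operation; the Pig Latin equivalents are noted in the comments:

```python
# Hypothetical sample relations: (user_id, name) and (user_id, purchase_amount)
users = [(1, "ada"), (2, "bob"), (3, "cy")]
purchases = [(1, 30), (2, 75), (1, 120)]

# Filtering (Pig: FILTER purchases BY amount > 50)
big_purchases = [p for p in purchases if p[1] > 50]

# Join on user_id (Pig: JOIN users BY id, purchases BY id)
joined = [(uid, name, amt)
          for uid, name in users
          for puid, amt in purchases
          if uid == puid]

# Union of relations with the same schema (Pig: UNION purchases, more)
all_purchases = purchases + [(3, 10)]

print(big_purchases)  # [(2, 75), (1, 120)]
print(len(all_purchases))  # 4
```

Pig Latin expresses each of these as a single declarative statement and compiles them into MapReduce jobs, which is why it suits ETL over data too large for one machine.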
Prerequisites
There are no prerequisites to attend this course.