About the course
Apache Hadoop™ is a powerful, flexible data platform that enables the distributed processing of large data sets across clusters of computers and servers. Hadoop is a natural choice for organizations that face the challenges of handling vast amounts of structured and unstructured data, and the framework is widely used to analyze that data and help businesses make informed decisions based on the insights it yields.
This ever-growing volume of data, and the need to analyze it for favorable business outcomes, has in turn increased the demand for professionals skilled in Hadoop and data analysis. A Hadoop administrator's primary responsibility is to manage the deployment and maintenance of Hadoop clusters: ensuring their smooth operation, mitigating problems, keeping data safe, and improving performance.
Training in Hadoop administration will help prepare you for the demands of the industry. Rapid innovation in technology makes it essential for IT professionals to keep pace with the latest developments. Hadoop administrator training closes the gap between what you know and what the industry needs, making you a valuable employee. Furthermore, the demand for data analysts has risen sharply in recent years, making certified Hadoop administrators a sought-after resource.
Individual benefits
- Become an in-demand resource who can manage Hadoop clusters and help the organization with big data analysis.
- Gain exposure to multiple industries such as healthcare, consumer goods, banking, energy, and manufacturing.
Organizational benefits
- Efficient, reliable professionals to manage large Hadoop clusters in the organization
- Administrators who take care of the day-to-day running of Hadoop clusters and ensure data safety
- Sound knowledge of Hadoop architecture, which helps to plan clusters in an organized way
- Safer, more secure cluster operations
- Help in maintaining highly scalable storage platforms
What you will learn
- Build powerful applications: Understand how to use Apache Hadoop™ software to build powerful applications to analyze Big Data.
- Hadoop Distributed File System (HDFS): Learn about HDFS and its role in web-scale big data analytics.
- Cluster management in Hadoop: See what cluster management in Hadoop involves and how to set up, manage, and monitor a Hadoop cluster.
- Apache Hive installation: Learn the basics of Apache Hive, how to install it, and how to run HiveQL queries to create tables.
- Running scripts: Learn about Apache Sqoop and how to run scripts to transfer data between Hadoop and relational databases.
- Apache HBase: Learn the basics of Apache HBase and how to perform real-time read/write access to your Big Data.
Who should attend
- DevOps Engineers
- Architects
- Project Managers
- Linux / Unix Administrators
- Database Administrators
- Windows Administrators
- Infrastructure Administrators
- System Administrators
- Analytics Professionals
- Senior IT professionals
- Data Management Professionals
- Testing and Mainframe professionals
- Business Intelligence Professionals
- Anyone who wants to build a career in the distributed world of Big Data
We provide the course in English.
Curriculum
1 Introduction to Big Data and Hadoop
Learning Objective:
Understand what Big Data is and how Hadoop addresses the problems traditional systems struggle with. You will learn about Hadoop and its core components, see how reads and writes happen in HDFS, and get to know the roles and responsibilities of a Hadoop administrator.
Topics:
- Introduction to Big Data
- Limitations of existing solutions
- Common Big Data domain scenarios
- Hadoop architecture
- Hadoop components and ecosystem
- Data loading and reading from HDFS
- Replication rules
- Rack awareness theory
- Hadoop cluster administrator: roles and responsibilities
Hands-on:
Write and read data in HDFS, and submit jobs in Hadoop 1.0 and YARN.
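For a taste of the hands-on work, here is a minimal sketch of writing and reading an HDFS file through the standard Java FileSystem API. The file path is a placeholder, and fs.defaultFS is assumed to point at your NameNode:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml / hdfs-site.xml from the classpath;
        // fs.defaultFS should point at your NameNode, e.g. hdfs://namenode:8020
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/training/hello.txt"); // placeholder path

        // Write: create() asks the NameNode for block allocation,
        // then streams the data to DataNodes through a write pipeline
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read: open() fetches block locations from the NameNode,
        // then reads directly from the nearest DataNode
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}
```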
2 Hadoop Cluster and its Architecture
Learning Objectives:
Understand the different configuration files and build a multi-node Hadoop cluster. You will learn the differences between Hadoop 1.0 and Hadoop 2.0, and get to know the architecture of both Hadoop 1.0 and Hadoop 2.0 (YARN).
Topics:
- How HDFS works and its internals
- Hadoop server roles and their usage
- Hadoop installation and initial configuration
- Different modes of a Hadoop cluster
- Deploying Hadoop in pseudo-distributed mode
- Deploying a multi-node Hadoop cluster
- Installing Hadoop clients
- Understanding how HDFS works and resolving simulated problems
- Hadoop 1 and its core components
- Hadoop 2 and its core components
Hands-on:
Create pseudo-distributed and fully distributed Hadoop clusters, change configuration properties while submitting jobs, and run various HDFS admin commands.
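To illustrate how these configuration properties surface programmatically, the following sketch loads the standard config files and prints a few values an administrator commonly tunes. The /etc/hadoop/conf paths are assumptions about the install location; Hadoop normally resolves these files from the classpath:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ShowEffectiveConfig {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Assumed install location; adjust to your distribution's layout
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        // fs.defaultFS distinguishes local, pseudo-distributed, and fully
        // distributed modes (file:/// vs. hdfs://host:port)
        System.out.println("fs.defaultFS    = " + conf.get("fs.defaultFS"));
        // Replication factor applied to newly written files
        System.out.println("dfs.replication = " + conf.get("dfs.replication", "3"));
        // HDFS block size in bytes (128 MB by default on Hadoop 2+)
        System.out.println("dfs.blocksize   = " + conf.get("dfs.blocksize"));
    }
}
```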
3 Hadoop Cluster Administration and Understanding Different Processing Frameworks on Hadoop
Learning Objectives:
Understand the various properties of the NameNode, DataNode, and Secondary NameNode. You will learn how to commission and decommission DataNodes in the cluster, and explore the processing frameworks that run on Hadoop, their architecture from an administrator's perspective, and YARN schedulers.
Topics:
- Properties of the NameNode, DataNode, and Secondary NameNode
- OS tuning for Hadoop performance
- Understanding the Secondary NameNode
- Log files in Hadoop
- Working with a Hadoop distributed cluster
- Commissioning and decommissioning of nodes
- Different processing frameworks
- Understanding MapReduce
- Spark and its features
- Application workflow in YARN
- YARN metrics
- YARN Capacity Scheduler and Fair Scheduler
- Understanding schedulers and enabling them
Hands-on:
Change the configuration files of the Secondary NameNode, add and remove DataNodes in a distributed cluster, and switch schedulers at runtime while submitting jobs to YARN.
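As one way to verify commissioning and decommissioning, here is a sketch using the YARN Java client to list the RUNNING NodeManagers, assuming yarn-site.xml is on the classpath so the client can locate the ResourceManager:

```java
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnNodes {
    public static void main(String[] args) throws Exception {
        Configuration conf = new YarnConfiguration();
        YarnClient yarn = YarnClient.createYarnClient();
        yarn.init(conf);
        yarn.start();

        // A freshly commissioned node shows up as RUNNING; a decommissioned
        // node drops out of this list (query by DECOMMISSIONED to see it)
        List<NodeReport> nodes = yarn.getNodeReports(NodeState.RUNNING);
        for (NodeReport n : nodes) {
            System.out.printf("%s  state=%s  containers=%d  capability=%s%n",
                    n.getNodeId(), n.getNodeState(), n.getNumContainers(),
                    n.getCapability());
        }
        yarn.stop();
    }
}
```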
4 Hadoop Cluster Administration and Maintenance
Learning Objectives:
You will learn regular cluster administration tasks such as balancing data in the cluster, protecting data by enabling trash, attempting a manual failover, and creating backups within or across clusters.
Topics:
- NameNode Federation in Hadoop
- HDFS Balancer
- High availability in Hadoop
- Enabling trash functionality
- Checkpointing in Hadoop
- DistCp and Disk Balancer
Hands-on:
Work through cluster administration and maintenance tasks, and run DistCp and HDFS Balancer commands to distribute data evenly across the cluster.
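Before and after a balancer run, you can check how evenly data is spread. A sketch using the HDFS Java client, assuming it runs against a real distributed (not local) filesystem; note that the datanode report typically requires HDFS superuser privileges:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class ClusterBalanceReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // The cast assumes fs.defaultFS points at HDFS, not file:///
        DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(conf);

        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            // A large spread in used-percent across DataNodes is the signal
            // that running the HDFS Balancer is worthwhile
            System.out.printf("%-30s used: %5.1f%%%n",
                    dn.getHostName(), dn.getDfsUsedPercent());
        }
        dfs.close();
    }
}
```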
5 Backup, Recovery, and Maintenance
Learning Objectives:
You will learn how to back up and recover data on the master and slave nodes. You will also learn how to allocate namespace and space quotas on files and directories.
Topics:
- Key admin commands like DFSADMIN
- Safe mode
- Importing a checkpoint
- The MetaSave command
- Data backup and recovery
- Backup vs. disaster recovery
- Namespace count quotas and space quotas
- Manual failover and metadata recovery
Hands-on:
Perform regular backups using the MetaSave command, and run commands to recover data using checkpoints.
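A sketch of two of these admin checks from Java, querying safe mode and reading the quotas on a directory. The /user/training path is a placeholder, the cast assumes an HDFS filesystem, and quotas themselves are set by an administrator (e.g. via hdfs dfsadmin -setQuota):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.HdfsConstants;

public class AdminChecks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(conf);

        // Equivalent of "hdfs dfsadmin -safemode get"
        boolean inSafeMode =
                dfs.setSafeMode(HdfsConstants.SafeModeAction.SAFEMODE_GET);
        System.out.println("NameNode in safe mode: " + inSafeMode);

        // Namespace and space quotas on a directory; -1 means "not set"
        Path dir = new Path("/user/training"); // placeholder directory
        ContentSummary cs = dfs.getContentSummary(dir);
        System.out.println("namespace quota: " + cs.getQuota());
        System.out.println("space quota:     " + cs.getSpaceQuota());

        dfs.close();
    }
}
```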
6 Hadoop 2.0 Cluster: Planning and Management
Learning Objective:
You will understand cluster planning and management, and the aspects you need to consider when planning the setup of a new cluster.
Topics:
- Planning a Hadoop 2.0 cluster
- Cluster sizing
- Hardware, network, and software considerations
- Popular Hadoop distributions
- Workload and usage patterns
- Industry recommendations
Hands-on:
Set up a new cluster and scale it dynamically, and log in to different Hadoop distributions online.
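Cluster sizing starts with simple arithmetic. A worked example with illustrative figures only; the replication factor of 3 and 25% temporary overhead are common rules of thumb, not universal recommendations:

```java
public class ClusterSizing {
    public static void main(String[] args) {
        // Illustrative planning numbers only; substitute your own estimates
        double rawDataTb     = 100.0; // data to store, in TB
        int    replication   = 3;     // HDFS replication factor
        double tempOverhead  = 0.25;  // scratch space for MapReduce/Spark
        double diskPerNodeTb = 12.0;  // usable disk per DataNode, in TB

        double totalTb = rawDataTb * replication * (1 + tempOverhead);
        int nodes = (int) Math.ceil(totalTb / diskPerNodeTb);

        // 100 TB x 3 replicas x 1.25 = 375 TB -> 32 nodes at 12 TB each
        System.out.printf("Raw storage needed: %.0f TB -> %d DataNodes%n",
                totalTb, nodes);
    }
}
```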
7 Hadoop Security and Cluster Monitoring
Learning Objectives:
You will get to know Hadoop cluster monitoring and security concepts, and learn how to secure a Hadoop cluster with Kerberos.
Topics:
- Monitoring Hadoop clusters
- Authentication and authorization
- Nagios and Ganglia
- Hadoop security system concepts
- Securing a Hadoop cluster with Kerberos
- Common misconfigurations
- Overview of Kerberos
- Checking log files to troubleshoot Hadoop clusters
Hands-on:
Monitor the cluster and authorize access to Hadoop resources by granting tickets with Kerberos.
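A minimal sketch of Kerberos-authenticated access from Java, using the well-known UserGroupInformation API. The principal and keytab path are placeholders for your own realm and keytab layout:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // On a secured cluster this property is set in core-site.xml
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Placeholder principal and keytab; substitute your realm's values
        UserGroupInformation.loginUserFromKeytab(
                "hdfs-admin@EXAMPLE.COM",
                "/etc/security/keytabs/hdfs.keytab");
        System.out.println("Logged in as: "
                + UserGroupInformation.getCurrentUser());

        // Subsequent FileSystem calls now carry Kerberos credentials
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Home dir: " + fs.getHomeDirectory());
        fs.close();
    }
}
```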
8 Hadoop 2.0 with High Availability and Upgrading
Learning Objectives:
You will learn how to configure Hadoop 2 with high availability, upgrade to Hadoop 2, and work with the Hadoop ecosystem.
Topics:
- Configuring Hadoop 2 with high availability
- Upgrading to Hadoop 2
- Working with Sqoop
- Understanding Oozie
- Working with Hive
- Working with Pig
Hands-on:
Log in to the Hive and Pig shells and run their respective commands. You will also schedule an Oozie job.
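A sketch of connecting to Hive over JDBC and running a HiveQL statement. The HiveServer2 host, port, and credentials are placeholders, and the hive-jdbc driver must be on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcExample {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 endpoint and user
        String url = "jdbc:hive2://hiveserver:10000/default";
        try (Connection conn = DriverManager.getConnection(url, "training", "");
             Statement stmt = conn.createStatement()) {

            // HiveQL DDL: create a simple managed table
            stmt.execute("CREATE TABLE IF NOT EXISTS demo (id INT, name STRING)");

            // List the tables in the current database
            try (ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```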
9 Cloudera Manager and Cluster Setup
Learning Objectives:
You will learn how to work with CDH and its administration tool, Cloudera Manager, and how to administer and optimize the ecosystem components.
Topics:
- Cloudera Manager and cluster setup
- Hive administration
- HBase architecture
- HBase setup
- Hadoop/Hive/HBase performance optimization
- Pig setup and working with the Grunt shell
Hands-on:
Install CDH, work with Cloudera Manager, and install a new parcel on a CDH machine.
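Since this module also covers HBase setup, here is a minimal sketch of real-time read/write access through the HBase Java client. The 'demo' table and 'cf' column family are assumptions and must already exist; hbase-site.xml on the classpath supplies the ZooKeeper quorum:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseReadWrite {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("demo"))) {

            // Real-time write: a single Put against one row key
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("name"),
                    Bytes.toBytes("hadoop-admin"));
            table.put(put);

            // Real-time read: a Get by row key
            Result result = table.get(new Get(Bytes.toBytes("row1")));
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("cf"),
                                    Bytes.toBytes("name"))));
        }
    }
}
```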
Prerequisites
There are no specific prerequisites for the Hadoop Administration training, but basic knowledge of the Linux command line will be beneficial.