Hadoop Training Courses

Hadoop Training – Administrator

This training session covers the system administration aspects of Hadoop: from installation and configuration, through load balancing and tuning, to diagnosing and solving problems in your deployment.

Introduction
About This Course
About Z Data
Course logistics/administration

An Introduction To Hadoop And HDFS
Why Hadoop?
HDFS
MapReduce
Hive, Pig, HBase and other sub-projects
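To give a flavor of the MapReduce programming model introduced in this module, here is a minimal word-count sketch against the org.apache.hadoop.mapreduce API (a sketch only; details such as the Job constructor vary slightly between Hadoop releases).

// Minimal word-count sketch. API details (e.g. the Job constructor) vary
// slightly between Hadoop releases; this follows the Hadoop 1.x style.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every word in each input line.
  public static class TokenMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reducer: sums the counts emitted for each word.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      context.write(key, new IntWritable(sum));
    }
  }

  // Driver: configures and submits the job; args are the input and output paths.
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar and submitted with the hadoop jar command, it counts how often each word appears in the input directory.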

Planning Your Hadoop Cluster
General Planning Considerations
Choosing The Right Hardware
Node Topologies
Choosing The Right Software

Deploying Your Cluster
Installing Hadoop
Typical Configuration Parameters
Hands-On Exercise: Install a pseudo-distributed Hadoop Cluster
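As a companion to the configuration topics above, here is a small sketch that prints a few typical parameters as a client sees them (property names such as fs.default.name, dfs.block.size, and mapred.job.tracker assume a Hadoop 1.x-style setup and are renamed in later releases).

// Print a few typical Hadoop configuration parameters as the client sees them.
// Property names assume a Hadoop 1.x-style setup; later releases rename them.
import org.apache.hadoop.conf.Configuration;

public class ShowConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();    // picks up core-site.xml from the classpath
    conf.addResource("hdfs-site.xml");           // pull in the HDFS and MapReduce site files
    conf.addResource("mapred-site.xml");         // explicitly for this standalone sketch

    System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
    System.out.println("dfs.replication    = " + conf.getInt("dfs.replication", 3));
    System.out.println("dfs.block.size     = " + conf.getLong("dfs.block.size", 64L * 1024 * 1024));
    System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
  }
}

In the pseudo-distributed exercise these values point at localhost; on a real cluster they identify the NameNode and JobTracker.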

Cluster Maintenance
Starting and stopping MapReduce jobs
Hands-On Exercise: Using the JobTracker UI to monitor and kill jobs
Checking HDFS with fsck
Copying data with distcp
Rebalancing cluster nodes
Demo
Adding and removing cluster nodes
Backup And Restore
Upgrading and Migrating
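To complement command-line tools such as fsck and distcp covered in this module, here is a short sketch of the kind of programmatic spot-check an administrator can script against HDFS (the path is a placeholder).

// Programmatic spot-check of HDFS usage; a scriptable complement to the
// fsck and distcp command-line tools covered in this module.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsUsageCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);          // the cluster's default filesystem

    // Summarize everything under the root directory (adjust the path as needed).
    ContentSummary summary = fs.getContentSummary(new Path("/"));
    System.out.println("Files:       " + summary.getFileCount());
    System.out.println("Directories: " + summary.getDirectoryCount());
    System.out.println("Bytes (raw): " + summary.getLength());
    System.out.println("Default replication: " + fs.getDefaultReplication());

    fs.close();
  }
}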

Scheduling Jobs
The FIFO Scheduler
The Fair Scheduler
Hands-On Exercise: Using the Fair Scheduler
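For orientation before the hands-on exercise, here is a sketch of the properties involved in enabling the Fair Scheduler and steering a job into a pool (property names assume the MRv1 contrib Fair Scheduler that ships with Hadoop 1.x; the first two normally live in the JobTracker's mapred-site.xml rather than in job code).

// Property names assume the MRv1 contrib Fair Scheduler (Hadoop 1.x);
// they differ under YARN. Shown in Java purely for illustration.
import org.apache.hadoop.mapred.JobConf;

public class FairSchedulerExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf();

    // Cluster-wide settings, normally placed in the JobTracker's mapred-site.xml:
    conf.set("mapred.jobtracker.taskScheduler",
             "org.apache.hadoop.mapred.FairScheduler");
    conf.set("mapred.fairscheduler.poolnameproperty", "pool.name");

    // Per-job setting: submit this job into the "research" pool.
    conf.set("pool.name", "research");
  }
}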

Cluster Monitoring and Troubleshooting
General system profiling
Using the NameNode UI to inspect the filesystem
Monitoring with Ganglia
Demo
Other monitoring tools
Hadoop Log Files
Benchmarking Your Cluster
Typical problems
Useful alerts
Dealing with a corrupt NameNode

Installing And Managing Other Hadoop Projects
Hive
HBase
Pig
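A quick way to verify a Hive installation is a JDBC smoke test from Java (this assumes the original HiveServer listening on its default port 10000; HiveServer2 uses a different driver class and a jdbc:hive2:// URL).

// Connectivity check against a Hive installation over JDBC. Assumes the
// original HiveServer on its default port 10000; HiveServer2 uses a
// different driver class and a jdbc:hive2:// URL.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveSmokeTest {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    ResultSet rs = stmt.executeQuery("SHOW TABLES");
    while (rs.next()) {
      System.out.println(rs.getString(1));    // list the tables Hive knows about
    }
    con.close();
  }
}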

Populating HDFS From Databases Using Sqoop
What is Sqoop?
Sqoop command-line options
Hands-On Exercise: Importing data from MySQL
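The hands-on import boils down to a handful of command-line options; the sketch below drives the same options programmatically (it assumes Sqoop 1.x and its org.apache.sqoop.Sqoop.runTool entry point, and the MySQL host, database, table, and credentials are placeholders).

// Same options as the sqoop CLI, driven programmatically. Assumes Sqoop 1.x
// (older releases ship the entry point as com.cloudera.sqoop.Sqoop); the
// MySQL host, database, table, and credentials below are placeholders.
import org.apache.sqoop.Sqoop;

public class SqoopImportExample {
  public static void main(String[] args) {
    String[] importArgs = {
        "import",
        "--connect", "jdbc:mysql://dbhost/shop",
        "--username", "reporting",
        "--password", "secret",
        "--table", "orders",
        "--target-dir", "/user/training/orders"
    };
    System.exit(Sqoop.runTool(importArgs));
  }
}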

Hadoop Training – Developer

Augmenting Existing Systems with Hadoop
Hadoop rarely replaces existing infrastructure; instead, it adds a scalable batch-processing system that lets you do more with your data. This lecture helps you understand how Hadoop fits alongside the systems you already run.

Best Practices for Data Processing Pipelines
Before Hadoop can crunch large volumes of data, you first need to get that data into Hadoop. This lecture will help you understand how to import different types of data from various sources into Hadoop for further analysis.
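As a small illustration, one of the simplest ingestion steps is copying a local file into HDFS through the FileSystem API (the paths below are placeholders).

// Copy a local file into HDFS so MapReduce jobs can process it.
// Both paths are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LoadIntoHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);                        // default (HDFS) filesystem

    Path local = new Path("/var/log/webapp/access.log");         // placeholder source
    Path inHdfs = new Path("/user/training/logs/access.log");    // placeholder destination

    fs.copyFromLocalFile(local, inHdfs);
    System.out.println("Copied " + fs.getFileStatus(inHdfs).getLen() + " bytes into HDFS");
    fs.close();
  }
}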

Debugging MapReduce Programs
Debugging in a distributed environment is challenging. This lecture covers best practices in program design that make debugging easier, as well as local testing tools and techniques for debugging at scale.
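One at-scale technique previewed here is counting suspect records with a custom counter rather than trying to attach a debugger to remote tasks; a minimal sketch follows (the tab-separated record format and the counter names are illustrative).

// Count malformed records with a custom counter instead of failing the task;
// the totals show up in the job status and the JobTracker web UI.
// The tab-separated record format and counter names are illustrative.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParsingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] fields = value.toString().split("\t");
    if (fields.length < 3) {
      // Record the problem and move on rather than killing the whole task.
      context.getCounter("DataQuality", "MalformedRecords").increment(1);
      return;
    }
    context.write(new Text(fields[0]), new LongWritable(1));
  }
}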

Advanced Hadoop API
This lecture digs deeper into the Hadoop API, covering custom data types and file formats, direct HDFS access, intermediate data partitioning, and other tools such as the DistributedCache.
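As a taste of the partitioning material, here is a sketch of a custom partitioner that keeps every key sharing a (hypothetical) customer-ID prefix on the same reducer.

// Custom partitioner: all keys with the same customer-ID prefix (the part
// before the first ':', a hypothetical key layout) go to the same reducer.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class CustomerPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    String customerId = key.toString().split(":", 2)[0];
    // Mask the sign bit so the result is always a valid partition index.
    return (customerId.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

It is wired into a job with job.setPartitionerClass(CustomerPartitioner.class).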

Optimizing MapReduce Programs
We’ll use the Cloudera Training VM to work through an example in which you write a MapReduce program and improve its performance using techniques explored earlier in the course.
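One of those techniques is registering a combiner so partial sums are computed map-side, shrinking the data shuffled to the reducers; the sketch below reuses the mapper and reducer from the word-count example in the introduction module.

// Register a combiner so partial sums are computed map-side; this is safe
// because summing is commutative and associative. Reuses the TokenMapper and
// SumReducer classes from the earlier word-count sketch.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class CombinerSetup {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "word count with combiner");
    job.setJarByClass(CombinerSetup.class);
    job.setMapperClass(WordCount.TokenMapper.class);
    job.setCombinerClass(WordCount.SumReducer.class);   // the reducer doubles as the combiner
    job.setReducerClass(WordCount.SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Input/output paths and job submission as in the word-count example.
  }
}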


Contact us for further information