Hadoop Administrator Course Outline
Big Data / Hadoop Administrator course content for the weekend training program. We offer weekend, online, and fast-track training programs on Hadoop administration. If interested, please contact 9840014739.
Prerequisites - General administration experience with any RDBMS; Unix or networking experience is preferable.
Duration: 5 weekends
Module 1
Getting Started with Big Data
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System
Eclipse Installation
Overview of HDFS
Communication Protocols
Hadoop cluster Topology Overview
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic admin commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
Module 3
MapReduce Framework
Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
Module 4
Advanced MapReduce Programming
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing a MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster – Performance tuning
Hands-on Lab Exercises
Module 5 - Apache Hadoop Administration
Level 1
Operating System Preparation
Deployment Setup
Software
Hostname, DNS, and Identification
Users, Groups and Privileges
Kernel Tuning
vm.overcommit_memory
vm.swappiness
Best Practices for Hadoop setup and infrastructure
Hadoop multi-node cluster Installation preparation & Configuration
- Cluster network design
- Installation of Linux operating system
- Configuring SSH
- Understanding configuration files
- Understanding rack topology and implementation
Managing Hadoop cluster
- HDFS cluster management
- Secondary NameNode configuration
- TaskTracker management
- Configuring the HDFS quota
- Configuring Fair Scheduler
- Upgrading Hadoop
- Deploying and managing Hadoop clusters with Ambari
Monitoring Hadoop cluster
- Monitoring Hadoop cluster with Ganglia
- Monitoring Hadoop cluster with Ambari
- Monitoring Hadoop cluster with Nagios
Hadoop Cluster Performance Tuning
- Benchmarking and profiling
- Using compression for input and output
- Configuring optimal map and reduce slots for the TaskTracker
- Fine-tuning JobTracker configuration
- Fine-tuning TaskTracker configuration
- Tuning shuffle, merge and sort parameters
Security Implementation
Kerberos security Implementation
Workflow Scheduler
FIFO Scheduler Configuration
Capacity Scheduler Configuration
Fair Scheduler Configuration
Understanding dfsadmin & mradmin commands
Administration of HCatalog and Hive
Backup and Recovery
Level 2 Cluster maintenance
Starting and stopping Processes with Init Scripts
Starting and Stopping processes manually
HDFS maintenance Tasks
- DataNode failure & recovery
- NameNode failure & recovery
- JobTracker & TaskTracker failure & recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
MapReduce maintenance tasks
- Shared upon registration
Level 3 Monitoring
Hadoop Metrics
Health-check
Hadoop Processes
Remaining topics shared upon request
Level 4 Backup and Recovery
Data Backup
NameNode backup
Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through Grunt
Working with Scripts
Lab Exercises
Module 7
HBase and ZooKeeper
NoSQL vs. SQL
CAP theorem
Architecture
Installation
Configuration
Java API
Performance Tuning
Lab Exercises
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL
HCatalog & Hive Administration
Lab Exercises
Module 9
Other Hadoop ecosystem components
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume
Lab Exercises
Module 10
Hadoop on Cloud
Hadoop Certification
Hosting Hadoop on Amazon EC2
EMR Hands-on
Certification exam-oriented tips specific to Hadoop distributions