Monday 17 March 2014

Hadoop Administrator Course Outline


Big Data / Hadoop Administrator course content for the weekend training program. We offer weekend, online, and fast-track training programs in Hadoop administration. If interested, please contact 9840014739.

Prerequisites: General administration experience with any RDBMS, Unix, or networking is preferable.

Duration: 5 weekends

Module 1
Big Data: Getting Started
What is Big Data?
What is Apache Hadoop?
History of Hadoop
Understanding distributed file systems and Hadoop
Hadoop ecosystem components
Hadoop use cases
Ubuntu Installation
JDK Installation
Module 2
Hadoop Distributed File System

Eclipse Installation
Overview of HDFS
Communication Protocols
Hadoop cluster Topology Overview
Setting up SSH for Hadoop Cluster
Running Hadoop in pseudo-distributed mode
Linux basic admin commands
HDFS file commands
Reading and writing to HDFS programmatically
Hands-on Lab Exercises
Module 3
MapReduce Framework

Java Basics
Anatomy of a MapReduce Program
Writables
InputFormat
OutputFormat
Streaming API
Inherent failure handling
Reading and writing
Hands-on Lab Exercises
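To make the anatomy of a MapReduce program concrete without a cluster, here is a minimal in-memory sketch in plain Java of the classic word-count job: a map step emits (word, 1) pairs, a shuffle step groups and sorts them by key, and a reduce step sums each group. It does not use the Hadoop API. The class and method names (MapReduceSketch, map, shuffle, reduce, run) are hypothetical; a real job would instead extend Hadoop's Mapper and Reducer classes and use Writable key/value types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory illustration of the map -> shuffle/sort -> reduce flow.
public class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every token in one input line (one record).
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // Shuffle and sort: group intermediate values by key. TreeMap keeps keys sorted,
    // mirroring the sorted input each reducer receives.
    static TreeMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Drives the three phases over the input lines.
    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("big data big clusters", "Hadoop data")));
    }
}
```

Running main prints {big=2, clusters=1, data=2, hadoop=1}. In Hadoop the same three roles are split across processes and machines, with the framework handling the shuffle over the network.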
Module 4
Advanced MapReduce Programming
Input splits, Record Reader, Mapper, Partition & Shuffle, Reduce, OutputFormat
Writing MapReduce program
Streaming in Hadoop
Counters
Performance Tuning
Joins
Sorting
Determining the optimal number of reducers and partitions
Hadoop cluster – Performance tuning
Hands-on Lab Exercises
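Two of the Module 4 topics, partitioning and choosing the number of reducers, come down to small formulas. The plain-Java sketch below assumes the formula used by Hadoop's default HashPartitioner (mask off the sign bit of the key's hash code, then take it modulo the number of reduce tasks) and the rule of thumb from the Hadoop MapReduce tutorial of 0.95 or 1.75 times the cluster's total reduce capacity (nodes times reduce slots or containers per node). The class name, method names, and cluster sizes are hypothetical.

```java
// Hypothetical helper illustrating partition assignment and reducer-count sizing.
public class PartitionSketch {

    // Same arithmetic as Hadoop's default HashPartitioner: clearing the sign bit keeps
    // the value non-negative, so every key maps to a reducer in [0, numReduceTasks).
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Rule of thumb: factor 0.95 lets all reducers launch in one wave;
    // factor 1.75 lets faster nodes run a second wave for better load balancing.
    static int suggestedReducers(int nodes, int reduceSlotsPerNode, double factor) {
        return (int) Math.floor(factor * nodes * reduceSlotsPerNode);
    }

    public static void main(String[] args) {
        // Hypothetical cluster: 10 nodes with 4 reduce slots each -> 38 reducers.
        int reducers = suggestedReducers(10, 4, 0.95);
        for (String key : new String[] {"hadoop", "hdfs", "mapreduce", "pig"}) {
            System.out.println(key + " -> reducer " + partition(key, reducers));
        }
    }
}
```

Because the partition depends only on the key, all values for one key reach the same reducer; a skewed key distribution therefore overloads individual reducers regardless of how many are configured.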
Module 5 - Apache Hadoop Administration

Level 1  
Operating System Preparation
      Deployment Setup
      Software
      Hostname, DNS, and Identification
      Users, Groups and Privileges

 Kernel Tuning
     vm.overcommit_memory
     vm.swappiness

Best Practices for Hadoop setup and infrastructure

Hadoop multi-node cluster installation preparation & configuration
   - Cluster network design
   - Installation of the Linux operating system
   - Configuring SSH
   - Understanding configuration files
   - Understanding rack topology and implementation
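Rack topology matters mainly because of how HDFS places block replicas. The HDFS architecture guide describes the default policy for a replication factor of three: one replica on the writer's node (when the writer runs on a DataNode), one on a node in a different rack, and the last on a different node in that same remote rack. The plain-Java sketch below illustrates that policy under simplifying assumptions: the host-to-rack map stands in for what a configured topology script would report, all names are hypothetical, and candidates are taken in map order, whereas HDFS picks among eligible nodes randomly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of rack-aware replica placement for replication factor 3.
public class RackPlacementSketch {

    // nodeToRack maps a host name to its rack path, e.g. "dn1" -> "/rack1".
    // The writer is assumed to be a DataNode present in the map.
    static List<String> placeReplicas(String writer, Map<String, String> nodeToRack) {
        List<String> replicas = new ArrayList<>();
        replicas.add(writer); // replica 1: the writer's own node
        String writerRack = nodeToRack.get(writer);

        // Replica 2: first node found on a rack other than the writer's.
        String remoteNode = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(writerRack)) {
                remoteNode = e.getKey();
                break;
            }
        }
        if (remoteNode == null) {
            // Simplification: real HDFS falls back to other nodes instead of failing.
            throw new IllegalStateException("cluster has only one rack");
        }
        replicas.add(remoteNode);
        String remoteRack = nodeToRack.get(remoteNode);

        // Replica 3: a different node on the same remote rack.
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (e.getValue().equals(remoteRack) && !e.getKey().equals(remoteNode)) {
                replicas.add(e.getKey());
                break;
            }
        }
        if (replicas.size() < 3) {
            // Same simplification as above.
            throw new IllegalStateException("remote rack has only one node");
        }
        return replicas;
    }

    public static void main(String[] args) {
        // Hypothetical two-rack cluster; TreeMap gives a deterministic iteration order.
        Map<String, String> topology = new TreeMap<>();
        topology.put("dn1", "/rack1");
        topology.put("dn2", "/rack1");
        topology.put("dn3", "/rack2");
        topology.put("dn4", "/rack2");
        System.out.println(placeReplicas("dn1", topology)); // [dn1, dn3, dn4]
    }
}
```

Placing two replicas on one remote rack limits cross-rack write traffic, while keeping copies on two racks means a single rack failure does not lose the block.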

Managing Hadoop cluster
   - HDFS cluster management
   - Secondary NameNode configuration
   - TaskTracker management
   - Configuring HDFS quotas
   - Configuring the Fair Scheduler
   - Upgrading Hadoop
   - Deploying and managing Hadoop clusters with Ambari

Monitoring Hadoop cluster
   - Monitoring Hadoop cluster with Ganglia
   - Monitoring Hadoop cluster with Ambari
   - Monitoring Hadoop cluster with Nagios

Hadoop Cluster Performance Tuning
   - Benchmarking and profiling
   - Using compression for input and output
   - Configuring optimal map and reduce slots for the TaskTracker
   - Fine-tuning JobTracker configuration
   - Fine-tuning TaskTracker configuration
   - Tuning shuffle, merge, and sort parameters
Security Implementation
     Kerberos security implementation
Job Schedulers
     FIFO Scheduler configuration
     Capacity Scheduler configuration
     Fair Scheduler configuration

Understanding dfsadmin & mradmin commands

Administration of HCatalog and Hive

Backup and Recovery
Level 2 Cluster Maintenance
Starting and stopping processes with init scripts
Starting and stopping processes manually

  HDFS maintenance tasks
- DataNode failure & recovery
- NameNode failure & recovery
- JobTracker & TaskTracker failure & recovery
- Removing DataNodes
- Adding DataNodes
- Commissioning and decommissioning of nodes
  MapReduce maintenance tasks
- Shared upon registration
Level 3 Monitoring
Hadoop metrics

Health checks
     Hadoop processes
     The remaining topics are shared upon request
Level 4 Backup and Recovery
Data backup
 NameNode backup


Module 6
Pig and Pig Latin
Installation and configuration
Running Pig Latin through Grunt
Working with Scripts
Lab Exercises
Module 7
HBase and ZooKeeper
NoSQL vs. SQL
CAP Theorem
Architecture
Installation
Configuration
Java API
Performance Tuning
Lab Exercises
Module 8
Hive
Features of Hive
Architecture
Installation and configuration
HiveQL
HCatalog & Hive Administration 
Lab Exercises
Module 9
Other Hadoop eco system components
Overview of Ambari, Oozie, Mahout
Installing & configuring Sqoop, mysql-server
Installing & configuring Flume

Lab Exercises
Module 10
Hadoop on Cloud
Hadoop Certification
Hosting Hadoop on Amazon EC2
EMR Hands-on
Certification exam oriented tips specific to Hadoop distributions
