The Hadoop Administration course is structured in such a way that anyone who works with data and planning to start (or has already started) working on BIG Data using Apache Hadoop framework can benefit and be productive after this training.

Duration: 6 weeks/3 days


Experience with databases, application servers, and programming.

Course Summary:

We will understand the benefits of distributed computing, the Hadoop architecture (including HDFS and Mapreduce). The trainee will define administrator participation in Big Data projects. Learn to plan, implement, and maintain Hadoop clusters, deploy and maintain additional Big Data tools (Pig, Hive, Flume etc), HBase on a Hadoop cluster. Lastly monitor and maintain hundreds of servers and learn to pinpoint performance bottlenecks and fix them.

Unit 1: Introduction

  • Hadoop history and concepts

  • Ecosystem

  • Distributions

  • High level architecture

  • Hadoop myths

  • Hadoop challenges (hardware / software)

  • Planning and installation

  • Selecting software and Hadoop distributions

  • Sizing the cluster and planning for growth

  • Selecting hardware and network

  • Rack topology

  • Installation

  • Multi-tenancy

  • Directory structure and logs

  • Benchmarking


Unit 2: HDFS operations

  • Concepts (horizontal scaling, replication, data locality, rack awareness)

  • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)

  • Health monitoring

  • Command-line and browser-based administration

  • Adding storage and replacing defective drives


Unit 3: MapReduce operations

  • Parallel computing before MapReduce: compare HPC versus Hadoop administration

  • MapReduce cluster loads

  • Nodes and Daemons (JobTracker, TaskTracker)

  • MapReduce UI walk through

  • MapReduce configuration

  • Job config

  • Job schedulers

  • Administrator view of MapReduce best practices

  • Optimizing MapReduce

  • Fool proofing MR: what to tell your programmers

  • YARN: architecture  and uses

Unit 4: Advanced topics

  • Hardware monitoring

  • System software monitoring

  • Hadoop cluster monitoring

  • Adding and removing servers and upgrading Hadoop

  • Backup, recovery, and business continuity planning

  • Cluster configuration tweaks

  • Hardware maintenance schedule

  • Oozie scheduling for administrators

  • Securing your cluster with Kerberos

  • The future of Hadoop





  • Facebook - Black Circle
  • Twitter - Black Circle
  • YouTube - Black Circle
  • Google+ - Black Circle
  • Instagram - Black Circle


Phone : 408-505-5499

©2016 by Siliconvalley4u  Privacy