Hadoop Administration

The Hadoop Administration course is structured so that anyone who works with data and is planning to start (or has already started) working on Big Data with the Apache Hadoop framework can benefit from it and be productive after the training.

Duration: 6 weeks/3 days

Prerequisites:

Experience with databases, application servers, and programming.

Course Summary:

Trainees will learn the benefits of distributed computing and the Hadoop architecture (including HDFS and MapReduce), and will define the administrator's role in Big Data projects. They will learn to plan, implement, and maintain Hadoop clusters; to deploy and maintain additional Big Data tools (Pig, Hive, Flume, etc.) and HBase on a Hadoop cluster; and, lastly, to monitor and maintain hundreds of servers, pinpointing performance bottlenecks and fixing them.

Unit 1: Introduction

  • Hadoop history and concepts

  • Ecosystem

  • Distributions

  • High level architecture

  • Hadoop myths

  • Hadoop challenges (hardware / software)

  • Planning and installation

  • Selecting software and Hadoop distributions

  • Sizing the cluster and planning for growth

  • Selecting hardware and network

  • Rack topology

  • Installation

  • Multi-tenancy

  • Directory structure and logs

  • Benchmarking

 

Unit 2: HDFS operations

  • Concepts (horizontal scaling, replication, data locality, rack awareness)

  • Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)

  • Health monitoring

  • Command-line and browser-based administration

  • Adding storage and replacing defective drives

 

Unit 3: MapReduce operations

  • Parallel computing before MapReduce: comparing HPC and Hadoop administration

  • MapReduce cluster loads

  • Nodes and daemons (JobTracker, TaskTracker)

  • MapReduce UI walkthrough

  • MapReduce configuration

  • Job config

  • Job schedulers

  • Administrator view of MapReduce best practices

  • Optimizing MapReduce

  • Foolproofing MapReduce: what to tell your programmers

  • YARN: architecture and uses

Unit 4: Advanced topics

  • Hardware monitoring

  • System software monitoring

  • Hadoop cluster monitoring

  • Adding and removing servers and upgrading Hadoop

  • Backup, recovery, and business continuity planning

  • Cluster configuration tweaks

  • Hardware maintenance schedule

  • Oozie scheduling for administrators

  • Securing your cluster with Kerberos

  • The future of Hadoop


(408) 505-5499

2603 Camino Ramon, Ste 200, San Ramon, CA 94583, United States

©2016 by Siliconvalley4u