
Cloudera Administrator Training for Apache Hadoop
Learn to deploy, configure, and manage Cloudera's Apache Hadoop
implementation and HDFS.
In this interactive, hands-on Apache Hadoop course, you will gain a comprehensive
understanding of all the steps necessary to operate and maintain a Hadoop cluster. Covering
topics from installation and configuration through load balancing and tuning, this course is the
best preparation for the real-world challenges faced by Hadoop administrators.
This course covers concepts addressed on the Cloudera Certified Administrator for Apache
Hadoop (CCAH) exam and includes a CCAH exam voucher you'll receive at the end of class.

What You'll Learn

The internals of MapReduce and HDFS and how to build a Hadoop architecture

Proper cluster configuration and deployment to integrate with systems and hardware in
the data center

How to load data into the cluster from dynamically generated files using Flume and
from RDBMS using Sqoop

Configuring the FairScheduler to provide service-level agreements for multiple users
of a cluster

Installing and implementing Kerberos-based security for your cluster

Best practices for preparing and maintaining Apache Hadoop in production

Troubleshooting, diagnosing, tuning, and solving Hadoop issues

Who Needs to Attend


System administrators and others responsible for managing Apache Hadoop clusters in
production or development environments

Prerequisites
This course is designed for system administrators and IT managers who have basic Linux
systems administration experience. Prior knowledge of Hadoop is not required.

Follow-On Courses

There are no follow-ons for this course.

Certification Programs and Certificate Tracks


This course is part of the following programs or tracks:

CCAH: Cloudera Certified Administrator for Apache Hadoop (CDH4)

Course Outline
1. The Case for Apache Hadoop

Why Hadoop?

A Brief History of Hadoop

Core Hadoop Components

Fundamental Concepts

2. HDFS

HDFS Features

Writing and Reading Files

NameNode Considerations

HDFS Security

Using the NameNode Web UI

Using the Hadoop File Shell

3. Getting Data into HDFS

Ingesting Data from External Sources with Flume

Ingesting Data from Relational Databases with Sqoop

REST Interfaces

Best Practices for Importing Data

4. MapReduce

Features of MapReduce

Basic Concepts

Architectural Overview

MapReduce Version 2

Failure Recovery

Using the JobTracker Web UI

5. Planning Your Hadoop Cluster

General Planning Considerations

Choosing the Right Hardware

Network Considerations

Configuring Nodes

Planning for Cluster Management

6. Hadoop Installation and Initial Configuration

Deployment Types

Installing Hadoop

Specifying the Hadoop Configuration

Performing Initial HDFS Configuration

Performing Initial MapReduce Configuration

Log File Locations

7. Installing and Configuring Hive, Impala, and Pig

Hive

Impala

Pig

8. Hadoop Clients

Installing and Configuring Hadoop Clients

Installing and Configuring Hue

Hue Authentication and Configuration

9. Cloudera Manager

The Motivation for Cloudera Manager

Cloudera Manager Features

Standard and Enterprise Versions

Cloudera Manager Topology

Installing Cloudera Manager

Installing Hadoop Using Cloudera Manager

Performing Basic Administration Tasks

Using Cloudera Manager

10. Advanced Cluster Configuration

Advanced Configuration Parameters

Configuring Hadoop Ports

Explicitly Including and Excluding Hosts

Configuring HDFS for Rack Awareness

Configuring HDFS High Availability

11. Hadoop Security

Why Hadoop Security Is Important

Hadoop's Security System Concepts

What Kerberos Is and How It Works

Securing a Hadoop Cluster with Kerberos

12. Managing and Scheduling Jobs

Managing Running Jobs

Scheduling Hadoop Jobs

Configuring the FairScheduler

13. Cluster Maintenance

Checking HDFS Status

Copying Data Between Clusters

Adding and Removing Cluster Nodes

Rebalancing the Cluster

NameNode Metadata Backup

Cluster Upgrading

14. Cluster Monitoring and Troubleshooting

General System Monitoring

Managing Hadoop's Log Files

Monitoring Hadoop Clusters

Common Troubleshooting Issues

15. Conclusion

Labs
Throughout the course, you'll participate in hands-on labs to help build your knowledge and
apply the concepts discussed.
