Intel White Paper Optimizing Hadoop Deployments

WHITE PAPER
Cloud Computing
Optimizing Hadoop* Deployments

Designing the solution stack to maximize productivity while limiting energy consumption
and total cost of ownership
October 2009
Version 1.0

EXECUTIVE SUMMARY

This paper provides guidance to organizations as they make key choices in the planning stages of Hadoop* deployments, based on extensive lab testing that has been conducted with Hadoop at Intel. It begins with best practices for establishing server hardware specifications, helping architects choose optimal combinations of components. Next, it discusses the server software environment, including the choice of the operating system (OS) and version of Hadoop. Finally, it introduces some configuration and tuning advice that can help improve results within Hadoop environments.

Tuning Hadoop* clusters is vital to optimize the utilization of system resources and minimize operating costs. Testing in Intel labs has established a number of best practices to help meet those goals.

1 Overview

Having moved beyond its origins in search and Web indexing, Hadoop is becoming increasingly attractive to a broad spectrum of organizations as a framework for running computational workloads across large clusters of servers. Because these deployments can have very large infrastructure requirements, hardware and software choices made at design time have significant impact in terms of performance and total cost of ownership.

Intel is a major contributor to open source initiatives such as Linux*, Apache*, and Xen*. Intel has also devoted resources to Hadoop analysis and testing, both internally and with fellow travelers, such as the work with HP and Yahoo! as part of Open Cirrus*. Through these technical efforts, Intel has observed many practical trade-offs in hardware, software, and system settings that have real-world impacts. This paper discusses some of those optimizations, which fall into three general categories:

• Server hardware. The planning topic discussed most extensively here, this set of recommendations concerns choosing server components for an optimal balance between performance and both initial and recurring costs.
• System software. In addition to the choice of OS and Java* Virtual Machine (JVM*), the specific version of Hadoop and other software components have implications for performance, stability, and other factors.
• Configuration and tuning. Not surprisingly, the settings made to the Hadoop environment itself are an important factor in getting the full benefit of the rest of the hardware and software solution stack.

It is important to note that Hadoop deployments will vary considerably from customer to customer and from project to project. The suggestions for optimization in this paper are meant to be widely relevant to Hadoop, but results may be quite different depending on actual workloads.
Table of Contents

1 Overview
2 General Hadoop Cluster Topology
3 Server Hardware Configurations
3.1 Choosing a Server Platform
3.2 Selecting and Configuring the Hard Drive
3.3 Memory Sizing
3.4 Selecting a Motherboard
3.5 Specifying the Power Supply
3.6 Choosing Processors
4 System Software Selection and Configuration
4.1 Selecting the Operating System and JVM
4.2 Choosing Hadoop Versions and Distributions
5 Hadoop Configurations and Tuning
5.1 General Configurations
5.2 HDFS-Specific Configurations
5.3 Map/Reduce-Specific Configurations
6 Conclusion

2 General Hadoop Cluster Topology

A typical Hadoop cluster consists of a two- or three-level architecture made up of rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (1GbE) switch. Each rack-level switch is connected to a cluster-level switch, which is typically a larger port-density 10 Gigabit Ethernet (10GbE) switch that may span hundreds or thousands of servers. Those cluster-level switches may also interconnect with other cluster-level switches, or even uplink to another level of switching infrastructure.

Servers in a Hadoop cluster can be categorized in the following special-purpose capacities:

• JobTracker. Performs task assignment
• NameNode. Maintains all file system metadata if the Hadoop Distributed File System (HDFS) is used; preferably (but not required to be) a separate physical server from JobTracker
• Secondary NameNode. Periodically check-points the file system metadata on the NameNode
• TaskTrackers. Perform map/reduce tasks
• DataNodes. Store HDFS files and handle HDFS read/write requests; preferably co-located with TaskTrackers for optimal data locality

The bulk of Hadoop servers will be configured in TaskTrackers and DataNodes, and these are considered “slave nodes.”

For JobTracker and NameNodes, it is important to consider deploying additional RAM and secondary power supplies, to ensure the highest performance and reliability of these critical servers in the cluster. Given Hadoop’s data distribution model, however, it may not make sense to deploy power redundancy on the slave nodes.

3 Server Hardware Configurations

One of the most important decisions made in planning a Hadoop infrastructure deployment is the number, type, and configuration of the servers specified. As with other workloads, depending on the specific Hadoop application, computation may be bound by I/O, memory, or processor resources. Those requirements will require the system-level hardware to be adjusted on a case-by-case basis, but the general guidelines suggested in this section provide a point of departure for that fine-tuning.

3.1 Choosing a Server Platform

Typically, dual-socket servers are optimal for Hadoop deployments. Servers of this type are generally more efficient than large-scale multi-processor platforms for massively distributed implementations such as Hadoop, from a per-node, cost-benefit perspective. Similarly, dual-socket servers more than offset the added per-node hardware cost relative to entry-level servers through superior efficiencies in terms of load-balancing and parallelization overheads. Choosing hardware based on the most current platform technologies available helps to ensure the optimal intra-server throughput and energy efficiency.

3.2 Selecting and Configuring the Hard Drive

A relatively large number of hard drives per server (typically four to six) is recommended. While it is possible to use RAID 0 to concatenate smaller drives into a larger pool, using RAID on Hadoop servers is generally not recommended because Hadoop itself orchestrates data provisioning and redundancy across individual nodes. This approach provides good results across a wide spectrum of workloads because of the way that Hadoop interacts with storage.
The optimal balance between cost and performance is generally achieved with 7200 RPM SATA drives. This is likely to evolve quickly with the evolution of drive technologies, but it is a useful rule of thumb at the time of this writing. Hard drives should run in the AHCI (Advanced Host Controller Interface) mode with NCQ (Native Command Queuing) enabled, to improve performance when multiple simultaneous read/write requests are outstanding.

3.3 Memory Sizing

Sufficient memory capacity is critical for efficient operation of servers in a Hadoop cluster, supporting high throughput by allowing large numbers of map/reduce tasks to be carried out simultaneously. Typical Hadoop applications require approximately 1–2 GB of RAM per processor core, which corresponds to 8–16 GB for a dual-socket server using quad-core processors. When deploying servers based on the Intel® Xeon® processor 5500 series, it is recommended that DIMMs (dual in-line memory modules) be populated in multiples of six to balance across available memory channels (that is, system configurations of 12 GB, 24 GB, and so on). As a final consideration, ECC (error-correcting code) memory is highly recommended for Hadoop, to detect and correct errors introduced during storage and transmission of data.

3.4 Selecting a Motherboard

To maximize the energy efficiency and performance of a Hadoop cluster, it is important to select the server motherboard carefully. Hadoop deployments do not require many of the features typically found in an enterprise data center server, and the motherboard selected should use high efficiency voltage regulators and be optimized for airflow. Many vendors have designed systems based on Intel Xeon processors with those characteristics; they are typically marketed to cloud computing or Internet data center providers. One such product is the Intel® Server Board S5500WB (formerly code-named Willowbrook), which has been specifically designed for high-density computing environments.

Selecting a server with the right motherboard can have a positive financial impact to the bottom line compared to using other enterprise-focused systems that lack similar optimizations. As shown in Figure 1, lab testing demonstrates that the Intel Server Board S5500WB consumes up to 30 percent less power than a similar but non-power-optimized Intel Server Board S5520UR.[1]

Energy Consumption Comparison Between Server Motherboards (wall power, in watts)

Load                Intel® Server Board S5500WB    Intel Server Board S5520UR    Savings
0% (Active Idle)    74                             106                           32 W
100%                248                            290                           42 W

Figure 1. A power-optimized server motherboard such as the Intel® Server Board S5500WB can deliver significant energy savings (up to 30 percent wall power savings), relative to a typical enterprise configuration such as one based on Intel Server Board S5520UR.[1]

3.5 Specifying the Power Supply

As a key means of reducing overall cost of ownership, organizations should specify, as part of the design and planning process, their energy-efficiency requirements for server power supplies. Power supplies certified by the 80 PLUS* Program (www.80plus.org) at various levels, including bronze, silver, and gold (with gold being the most efficient), provide organizations with objective standards to use during the procurement process.
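The memory sizing guidance in Section 3.3 can be checked with a few lines of arithmetic. The following Python sketch is illustrative only: the function names, the 2 GB DIMM size, and the 12-slot limit are assumptions chosen for the example, not values prescribed by this paper.

```python
def ram_range_gb(sockets, cores_per_socket, gb_per_core_low=1.0, gb_per_core_high=2.0):
    """Return (low, high) RAM in GB using the 1-2 GB per core rule of thumb."""
    cores = sockets * cores_per_socket
    return cores * gb_per_core_low, cores * gb_per_core_high


def dimm_configs_gb(dimm_size_gb, max_dimms, multiple=6):
    """List total capacities when DIMMs are populated in multiples of `multiple`.

    max_dimms is a hypothetical slot count for the board being considered.
    """
    return [count * dimm_size_gb for count in range(multiple, max_dimms + 1, multiple)]


if __name__ == "__main__":
    low, high = ram_range_gb(sockets=2, cores_per_socket=4)
    print(f"Rule-of-thumb RAM for a dual-socket, quad-core server: {low:.0f}-{high:.0f} GB")
    print("Balanced capacities with 2 GB DIMMs:", dimm_configs_gb(2, max_dimms=12))
```

For a dual-socket, quad-core server this reproduces the 8–16 GB range, and with 2 GB DIMMs the balanced options are the 12 GB and 24 GB configurations mentioned in Section 3.3.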
3.6 Choosing Processors

The processor plays an important role in determining the speed, throughput, and efficiency of Hadoop clusters. The Intel Xeon processor 5500 series provides excellent performance for highly distributed workloads such as those associated with Hadoop applications.[2]

Lab testing was performed to establish the performance benefits of the Intel Xeon processor 5500 series relative to previous-generation Intel processors (Figure 2). The results show the Intel Xeon processor 5500 series can provide up to 36 percent faster performance for some key Hadoop workloads.[3]

[Figure 2 chart: Hadoop* Benchmark Comparison Between Two Generations of Intel® Xeon® Processors, using Hadoop 0.19.1. Relative job running time on the Sort, WordCount, and TeraSort benchmarks; lower values are better. With the Intel® Xeon® processor 5400 series normalized to 1x on each benchmark, the Intel Xeon processor 5500 series results range from 0.64x to 0.87x.]

Figure 2. The Intel® Xeon® processor 5500 series improves on the performance of its predecessors across a range of Hadoop* benchmarks.[3]

The latest Intel Xeon processor is not only faster at Hadoop tasks; it can also handle more throughput, defined here as the number of tasks completed per minute when the Hadoop cluster is at 100 percent utilization processing multiple Hadoop jobs. Intel lab tests have demonstrated that the Intel Xeon processor 5500 series provides up to 86 percent more throughput than the previous generation processors, as shown in Figure 3. This characteristic allows Hadoop clusters to handle far larger datasets and more operations in a given amount of time.[4]

This testing helps to support Intel’s recommendation of the Intel Xeon processor 5500 series as the server engine-of-choice for Hadoop clusters.

It is important to note that each Intel Xeon processor 5500 series typically has four cores, each of which can handle two threads when Intel® Hyper-Threading Technology (Intel® HT Technology) is enabled. Some Hadoop workloads, such as JavaSort*, can improve in performance by as much as 25 percent when running with Intel HT Technology enabled in the processor, versus having the capability turned off.[5] It is recommended that customers running Hadoop clusters turn on Intel HT Technology.

4 System Software Selection and Configuration

4.1 Selecting the Operating System and JVM

Using a Linux* distribution based on kernel version 2.6.30 or later is recommended when deploying Hadoop on current-generation servers because of the optimizations included for energy and threading efficiency. For example, Intel has observed that energy consumption can be up to 60 percent (42 watts) higher at idle for each server using older versions of Linux.[6] Such power inefficiency, multiplied over a large Hadoop cluster, could amount to significant additional energy costs.

For better performance, the local file systems (for example, ext3 or xfs) are usually mounted with the noatime attribute. In addition, Sun Java* 6 is required to run Hadoop, and the latest version (Java 6u14 or later) is recommended to take advantage of optimizations such as compressed ordinary object pointers.

The default Linux open file descriptor limit is set to 1024, which is usually too low for Hadoop daemons. This setting should be increased to approximately 64,000 using the /etc/security/limits.conf file or alternate means. If the Linux kernel 2.6.28 is used, the default open epoll file descriptor limit is 128, which is too low for Hadoop and should be increased to approximately 4096 using the /etc/sysctl.conf file or alternate means.
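One way to confirm that the open file descriptor limit described in Section 4.1 has actually taken effect is to query it from the account that runs the Hadoop daemons. The sketch below uses Python's standard resource module (available on Unix-like systems only); the helper name nofile_shortfall and the exact target of 64000 are assumptions made for illustration.

```python
import resource

# Approximate target taken from the guidance above; adjust for the deployment.
RECOMMENDED_NOFILE = 64000


def nofile_shortfall(soft_limit, target=RECOMMENDED_NOFILE):
    """Return how far the soft limit falls below the target (0 if it is sufficient)."""
    if soft_limit == resource.RLIM_INFINITY:
        return 0
    return max(0, target - soft_limit)


if __name__ == "__main__":
    # getrlimit returns the (soft, hard) limits for the current process.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    gap = nofile_shortfall(soft)
    if gap:
        print(f"Soft limit {soft} is {gap} below the target of {RECOMMENDED_NOFILE}")
    else:
        print(f"Soft limit {soft} meets the target (hard limit: {hard})")
```

Running the script as the user that starts the Hadoop daemons shows the limit those processes would inherit, which may differ from the limit seen in an interactive root shell.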
4.2 Choosing Hadoop Versions and Distributions

When selecting a version of Hadoop for the implementation, organizations must seek a balance between the enhancements available from the most recent available release and the stability available from more mature versions. For example, at the time of this writing, the most recent stable version of Hadoop is 0.18.3, while the latest release of Hadoop, version 0.20.0, contains important enhancements, including pluggable scheduling API, capacity scheduler, fair scheduler, and multiple task assignment.

One other potential advantage of using Hadoop 0.20.0 is in the area of performance. Intel’s lab testing shows that some workloads within Hadoop can benefit from the multi-task assignment features in 0.20.0. Although the Map stage in v0.20.0 is slower and uses more memory than v0.19.1, the overall job runs at about the same speed or up to 8 percent faster in v0.20.0 in the case of Hadoop Sort.[7]

The primary source for securing the latest distribution is the Apache Software Foundation Web site (www.apache.org). For companies planning Hadoop installations, it may be worthwhile to evaluate the Cloudera distribution, which includes RPM and Debian* packaging and tools for configuration. Intel has used Cloudera’s distribution on some of its lab systems for performance testing.

[Figure 3 chart: Hadoop Benchmark Comparison Between Two Generations of Intel® Xeon® Processors, Throughput Test. Relative tasks per minute on Hadoop Sort and Hadoop WordCount for the Intel® Xeon® processor 5400 series and the Intel Xeon processor 5500 series; higher values are better.]

Figure 3. The Intel® Xeon® processor 5500 series provides higher throughput than its predecessors on selected Hadoop* benchmarks.[4]

5 Hadoop Configurations and Tuning

To achieve maximum results from Hadoop implementations, Intel lab testing has identified some key considerations for configuring the Hadoop environment itself, which are described in this section. Similar to the system hardware and software recommendations given above, these settings must be tailored to the needs of the individual implementation, and users are encouraged to experiment with their own systems and environment. Nevertheless, there are factors in common among Hadoop deployments that provide general guidance.

5.1 General Configurations

• The numbers of NameNode and JobTracker server threads that handle remote procedure calls (RPCs), specified by dfs.namenode.handler.count and mapred.job.tracker.handler.count, respectively, both default to 10 and should be set to a larger number (for example, 64) for large clusters.
• The number of DataNode server threads that handle RPCs, as specified by dfs.datanode.handler.count, defaults to three and should be set to a larger number (for example, eight) if there are a large number of HDFS clients. (Note: Every additional thread consumes more memory.)
• The number of work threads on the HTTP server that runs on each TaskTracker to handle the output of map tasks on that server, as specified by tasktracker.http.threads, should be set in the range of 40 to 50 for large clusters.

5.2 HDFS-Specific Configurations

• The replication factor for each block of an HDFS file, as specified by dfs.replication, is typically set to three for fault-tolerance; setting it to a smaller value is not recommended.
• The default HDFS block size, as specified by dfs.block.size, is 64 MB in HDFS, and it is usually desirable to use a larger block size (such as 128 MB) for large file systems.
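The settings in Sections 5.1 and 5.2 are expressed as name/value properties in Hadoop's XML configuration files. As a minimal sketch, the Python script below renders the example values suggested above into that property format. The TUNING dictionary and the render_configuration helper are illustrative names, the choice of 40 for tasktracker.http.threads is simply the low end of the suggested range, and which configuration file each property belongs in depends on the Hadoop version, so that detail is left out here.

```python
import xml.etree.ElementTree as ET

# Example values taken from the guidance above; tune them for the actual cluster.
TUNING = {
    "dfs.namenode.handler.count": "64",
    "mapred.job.tracker.handler.count": "64",
    "dfs.datanode.handler.count": "8",
    "tasktracker.http.threads": "40",
    "dfs.replication": "3",
    "dfs.block.size": str(128 * 1024 * 1024),  # 128 MB expressed in bytes
}


def render_configuration(props):
    """Render a dict of property names and values as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(render_configuration(TUNING))
```

Generating the file from a single dictionary keeps the tuned values in one place, which can make it easier to compare settings between test runs.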
5.3 Map/Reduce-Specific Configurations

• The maximum number of map/reduce tasks that run simultaneously on a TaskTracker, as specified by mapred.tasktracker.{map/reduce}.tasks.maximum, should usually be set in the range of (cores_per_node)/2 to 2x(cores_per_node), especially for large clusters.
• The number of input streams (files) to be merged at once in the map/reduce tasks, as specified by io.sort.factor, should be set to a sufficiently large value (for example, 100) to minimize disk accesses.
• The JVM settings should have the parameter java.net.preferIPv4Stack set to true, to avoid timeouts in cases where the OS/JVM picks up an IPv6 address and must resolve the hostname.

5.4 Map Task-Specific Configurations

• The total size of result and metadata buffers associated with a map task, as specified by io.sort.mb, defaults to 100 MB and can be set to a higher level, such as 200 MB.
• The percentage of total buffer size that is dedicated to the metadata buffer, as specified by io.sort.record.percent, which defaults to 0.05, should be adjusted according to the key-value pair size of the particular Hadoop job.

6 Conclusion

Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stack. Fine-tuning the environment calls for a fairly in-depth configuration. The effort involved in the planning stages can pay off dramatically in terms of the performance and total cost of ownership associated with the environment. The testing at Intel labs that is summarized in this paper provides a substantial advantage to organizations in the planning stages, including the following composite system-stack recommendation:

Server Processor     Two-way Intel® Xeon® processor 5500 series
Hard Disks           Four to six 7200 RPM SATA drives
Memory               12-24 GB DDR3 ECC RAM
Motherboard          Intel® Server Board S5500WB
Power Supply         80 PLUS* Gold Certified
Operating System     Linux* based on Kernel 2.6.30 or later
JVM*                 Sun Java* 6u14 or later
Hadoop* Version      0.18.3 or 0.20.0

Once the preliminary system configurations are complete, the tuning advice given in this paper enables implementing organizations to improve their Hadoop environments further.

Join the conversation in The Server Room:
communities.intel.com/community/server
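As a worked example of the task-slot guidance in Section 5.3, the range of (cores_per_node)/2 to 2x(cores_per_node) can be computed directly. The following Python sketch is illustrative: the function name is hypothetical, and rounding the lower bound down (with a floor of one) for odd core counts is an assumption, since the paper does not say how to round.

```python
def task_slot_range(cores_per_node):
    """Return (low, high) bounds for mapred.tasktracker.{map/reduce}.tasks.maximum."""
    low = max(1, cores_per_node // 2)  # assumption: round down, never below one slot
    high = 2 * cores_per_node
    return low, high


if __name__ == "__main__":
    # A dual-socket, quad-core server has 8 physical cores per node.
    low, high = task_slot_range(8)
    print(f"Suggested range for simultaneous tasks per TaskTracker: {low} to {high}")
```

For the dual-socket, quad-core servers recommended in this paper, the range works out to 4 to 16 simultaneous tasks; where a value should fall within that range depends on the workload and on the memory available per task.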
Benchmarking Detail and Disclaimers

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel® products as
measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources
of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the
performance of Intel products, visit Intel Performance Benchmark Limitations (www.intel.com/performance/resources/benchmark_limitations.htm).
1 Source: Intel internal measurements as of January 2009. System configurations for each of the tests are the same (2 x 2.93 GHz NHM, 1 HDD, 6 x 2 GB DIMMs,
1 PSU).
2 For more information on the Intel® Xeon® processor 5500 series, see http://download.intel.com/products/processor/xeon/dc55kprodbrief.pdf.
3 Source: Intel internal measurement on September 1, 2009 running Hadoop* Sort, Word Count, and TeraSort benchmarks. Raw score of Intel® Xeon® processor
54xx-based cluster versus cluster based on Intel Xeon processor 55xx series. Cluster architecture includes 5 nodes per server processor architecture under
comparison (Intel Xeon processor 54xx series and Intel Xeon processor 55xx series). Each has 1 NameNode/JobTracker and 4 DataNodes/TaskTrackers. 1xGbE
connectivity to a single GbE switch. Intel Xeon processor 54xx series servers: 2 x Intel Xeon processor X5460 3.16 GHz (8 cores per node), 16 GB DDR2 RAM,
1 SAS (for system and log files) and 4 SATA disks (for Hadoop Distributed File System (HDFS) and intermediate results) per node.
Intel Xeon processor 55xx series servers: 2 x Xeon processor X5570 2.93 GHz (8 cores per node), 16 GB DDR3 RAM, 5 SATA disks per node (1 for system and log
files, and the other 4 for HDFS and intermediate results). Both EIST (Enhanced Intel® SpeedStep Technology) and Turbo mode disabled. Both hardware prefetcher
and adjacent cache-line prefetch enabled. SMT (Simultaneous Multi-Threading) enabled.
Software: Red Hat Enterprise Linux* 5.2 (kernel 2.6.18-92.el5 SMP x86_64). Ext3 filesystem (mounted with noatime enabled). Sun JVM 1.6 (Java* version 1.6.0_02
Java SE Runtime Environment Java HotSpot* 32-bit server virtual machine). Hadoop version 0.19.1 with patch JIRA Hadoop-5191.
4 Source: Intel internal measurement on September 1, 2009 running Hadoop* Sort and Word Count benchmarks. Raw score of Intel® Xeon® processor 54xx-based
cluster versus cluster based on Intel Xeon processor 55xx series. Cluster architecture includes 5 nodes per server processor architecture under comparison (Intel
Xeon processor 54xx series and Intel Xeon processor 55xx series). Each has 1 NameNode/JobTracker, 4 DataNodes/TaskTrackers.
1 x GbE connectivity to a single GbE switch. Intel Xeon processor 54xx series servers: 2 x Intel Xeon processor X5460 3.16 GHz (8 cores per node), 16 GB DDR2
RAM, 1 SAS (for system and log files) and 4 SATA disks (for Hadoop Distributed File System (HDFS) and intermediate results) per node, and Red Hat Enterprise
Linux* 5.2 (kernel 2.6.18-92.el5 SMP x86_64).
Intel Xeon processor 55xx series servers: 2 x Intel Xeon processor X5570 2.93 GHz (8 cores per node), 16 GB DDR3 RAM, 5 SATA disks per node (1 for system
and log files, and the other 4 for HDFS and intermediate results). Both EIST (Enhanced Intel® SpeedStep Technology) and Turbo mode disabled. Both hardware
prefetcher and adjacent cache-line prefetch enabled. SMT (Simultaneous Multi-Threading) enabled. Red Hat Enterprise Linux 5.2 (kernel 2.6.30 x86_64).
Software: Ext3 filesystem (mounted with noatime enabled). Sun JVM 1.6 (Java version 1.6.0_02 Java SE Runtime Environment Java HotSpot* 32-Bit Server VM).
Hadoop version 0.20.0.
5 Source: Intel internal measurement based on the following cluster and server configuration: 7 nodes in each, configured with 2GbE connectivity to each server. Intel®
Xeon® processor 54xx series server configuration: Intel® whitebox server based on the Intel® Server System SR1560SF, 2 x Intel Xeon processor X5482 3.2 GHz, 16
GB DDR2, 1 x Seagate Barracuda* ES 250 GB SATA drive. Intel Xeon processor 55xx series server configuration: 2 x Intel Xeon processor X5560 2.8 GHz, 18 GB
DDR3, 1 x 500 GB. Intel® Hyper-Threading Technology (Intel® HT Technology) requires a computer system with an Intel® processor supporting Intel HT Technology
and an Intel HT Technology-enabled chipset, BIOS, and operating system. Performance will vary depending on the specific hardware and software you use.
See www.intel.com/products/ht/hyperthreading_more.htm for more information including details on which processors support Intel HT Technology.
6 Source: Intel internal measurement as of September 14, 2009 based on running the same server with two different Linux* distributions: CentOS* 5.2 and Fedora* 11.
Power (W) at idle was measured at 110W when running CentOS 5.2 and 68W when running Fedora 11. Server configuration was an Intel® whitebox server based on
the Intel® Server Board SB5000WB, 2 x Intel® Xeon® processor L5520, 16 GB RAM, 1 HDD.
7 Sources: Intel internal measurement on September 14, 2009 and September 1, 2009 running Hadoop* Sort benchmark. Raw score of cluster based on
Intel® Xeon® processor 55xx series. Cluster architecture includes 5 nodes of Intel Xeon processor 55xx series. Each has 1 NameNode/JobTracker,
4 DataNodes/TaskTrackers. 1 x GbE connectivity to a single GbE switch. Intel Xeon processor 55xx series servers: 2 x Intel Xeon processor X5570 2.93
GHz (8 cores per node), 16 GB DDR3 RAM, 5 SATA disks per node (1 for system and log files, and the other 4 for HDFS and intermediate results). Both EIST
(Enhanced Intel® SpeedStep Technology) and Turbo mode disabled. Both hardware prefetcher and adjacent cache-line prefetch enabled. SMT (Simultaneous
Multi-Threading) enabled.
Software: Red Hat Enterprise Linux* 5.2 (kernel 2.6.30 x86_64). Ext3 filesystem (mounted with noatime enabled). Sun JVM 1.6 (Java* version 1.6.0_02 Java
SE Runtime Environment Java HotSpot* 32-bit server virtual machine). Hadoop version 0.19.1 with patch JIRA Hadoop-5191 vs. Hadoop 0.20.0.
Disclaimers & Legal Notices

THE INFORMATION IS FURNISHED FOR INFORMATIONAL USE ONLY, IS SUBJECT TO CHANGE WITHOUT NOTICE, AND SHOULD NOT BE CONSTRUED
AS A COMMITMENT BY INTEL CORPORATION. INTEL CORPORATION ASSUMES NO RESPONSIBILITY OR LIABILITY FOR ANY ERRORS OR
INACCURACIES THAT MAY APPEAR IN THIS DOCUMENT OR ANY SOFTWARE THAT MAY BE PROVIDED IN ASSOCIATION WITH THIS DOCUMENT.
THIS INFORMATION IS PROVIDED “AS IS” AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THE USE OF THIS
INFORMATION INCLUDING WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, COMPLIANCE WITH A SPECIFICATION OR STANDARD,
MERCHANTABILITY OR NONINFRINGEMENT.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL
OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND
CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED
WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A
PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. UNLESS
OTHERWISE AGREED IN WRITING BY INTEL, THE INTEL PRODUCTS ARE NOT DESIGNED NOR INTENDED FOR ANY APPLICATION IN WHICH THE
FAILURE OF THE INTEL PRODUCT COULD CREATE A SITUATION WHERE PERSONAL INJURY OR DEATH MAY OCCUR.
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any
features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or
incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published
specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications
and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be
obtained by calling 1-800-548-4725, or by visiting Intel’s Web Site http://www.intel.com/.
*Other names and brands may be claimed as the property of others.
Copyright © 2009 Intel Corporation. All rights reserved. Intel, the Intel logo, Intel SpeedStep, and Xeon are trademarks of Intel Corporation in the U.S. and other countries.
1009/KH/MESH/PDF 322723-002US
