A computer cluster is a group of loosely coupled computers that work together closely, so that in many respects they can be viewed as a single computer. Clusters are commonly connected through fast local area networks. Clusters are usually deployed to improve speed and/or reliability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or reliability. Cluster computing has emerged as a result of the convergence of several trends, including the availability of inexpensive high-performance microprocessors and high-speed networks, and the development of standard software tools for high-performance distributed computing. Clusters have evolved to support applications ranging from e-commerce to high-performance database applications.
Clustering has been available since the 1980s when it was used in DEC's VMS
systems. IBM's sysplex is a cluster approach for a mainframe system. Microsoft, Sun
Microsystems, and other leading hardware and software companies offer clustering
packages that are said to offer scalability as well as availability. Cluster computing
can also be used as a relatively low-cost form of parallel processing for scientific and
other applications that lend themselves to parallel operations.
CONTENTS
1. Introduction
2. History
3. Clusters
4. Why Clusters?
5. Comparing Old and New
6. Logical View of Clusters
7. Architecture
8. Components of Cluster Computer
9. Cluster Classifications
10. Issues to be Considered
11. Future Trends
12. Conclusion
13. References
INTRODUCTION
Computing is an evolutionary process. Five generations of development history—
with each generation improving on the previous one’s technology, architecture,
software, applications, and representative systems—make that clear. As part of this
evolution, computing requirements driven by applications have always outpaced the
available technology. So, system designers have always needed to seek faster, more
cost-effective computer systems. Parallel and distributed computing provides the best
solution, by offering computing power that greatly exceeds the technological
limitations of single processor systems. Unfortunately, although the parallel and
distributed computing concept has been with us for over three decades, the high cost
of multiprocessor systems has blocked commercial success so far. Today, a wide
range of applications are hungry for higher computing power, and even though single-processor PCs and workstations can now provide extremely fast processing, the even faster execution that multiple processors can achieve by working concurrently is still needed. Now, finally, costs are falling as well. Networked clusters of commodity PCs
and workstations using off-the-shelf processors and communication platforms such as
Myrinet, Fast Ethernet, and Gigabit Ethernet are becoming increasingly cost effective
and popular. This concept, known as cluster computing, will surely continue to
flourish: clusters can provide enormous computing power that a pool of users can
share or that can be collectively used to solve a single application. In addition, clusters
do not incur a very high cost, a factor that led to the sad demise of massively parallel
machines.
CLUSTER HISTORY
The first commodity clustering product was ARCnet, developed by Datapoint in 1977.
ARCnet wasn't a commercial success and clustering didn't really take off until DEC
released their VAXcluster product in the 1980s for the VAX/VMS operating system.
The ARCnet and VAXcluster products not only supported parallel computing, but
also shared file systems and peripheral devices. They were supposed to give you the
advantage of parallel processing while maintaining data reliability and uniqueness.
VAXcluster, now VMScluster, is still available on OpenVMS systems from HP
running on Alpha and Itanium systems. The history of cluster computing is intimately
tied up with the evolution of networking technology. As networking technology has
become cheaper and faster, cluster computers have become significantly more
attractive.

How can we run applications faster? There are three ways to improve performance:
• Work harder (use faster hardware)
• Work smarter (use more efficient algorithms and techniques)
• Get help (use multiple computers working in parallel)

Era of Computing
Rapid technical advances, the recent advances in VLSI technology and software technology, and grand challenge applications have become the main driving forces behind parallel computing.
CLUSTERS
Extraordinary technological improvements over the past few years in areas such as microprocessors, memory, buses, networks, and software have made it possible to assemble groups of inexpensive personal computers and/or workstations into a cost-effective system that functions in concert and possesses tremendous processing power. Cluster computing is not new, but in company with other technical capabilities, particularly in the area of networking, this class of machines is becoming a high-performance platform for parallel and distributed applications. Scalable computing clusters, ranging from a cluster of (homogeneous or heterogeneous) PCs or workstations to SMPs (Symmetric Multi-Processors), are rapidly becoming the standard platforms for high-performance and large-scale computing. A cluster is a group of independent computer systems and thus forms a loosely coupled multiprocessor system, as shown in the figure.
However, the cluster computing concept also poses three pressing research challenges:
• A cluster should be a single computing resource and provide a single system image. This is in contrast to a distributed system, where the nodes serve only as individual resources.
• It must provide scalability by letting the system scale up or down. The scaled-up system should provide more functionality or better performance, and the system's total computing power should increase proportionally to the increase in resources. The main motivation for a scalable system is to provide a flexible, cost-effective information-processing tool.
• The supporting operating system and communication mechanism must be efficient enough to remove the performance bottlenecks.
The concept of Beowulf clusters originated at the Center of Excellence in Space Data and Information Sciences (CESDIS), located at the NASA Goddard Space Flight Center in Maryland. The goal of building a Beowulf cluster is to create a cost-effective parallel computing system from commodity components that satisfies specific computational requirements of the earth and space sciences community. The first Beowulf cluster was built from 16 Intel DX4 processors connected by a channel-bonded 10 Mbps Ethernet, and it ran the Linux operating system. It was an instant success, demonstrating the concept of using a commodity cluster as an alternative
choice for high-performance computing (HPC). After the success of the first Beowulf cluster, several more were built by CESDIS using several generations and families of processors and networks. Beowulf is a concept of clustering commodity computers to form a parallel, virtual supercomputer. It is easy to build a unique Beowulf cluster from the components that you consider most appropriate for your applications. Such a system can provide a cost-effective way to gain features and benefits (fast and reliable services) that have historically been found only on more expensive proprietary shared-memory systems. The typical architecture of a cluster is shown in Figure 3. As the figure illustrates, numerous design choices exist for building a Beowulf cluster.
WHY CLUSTERS?
The question may arise: why are clusters designed and built when perfectly good commercial supercomputers are available on the market? The answer is that the latter are expensive, while clusters are surprisingly powerful. The supercomputer has come to play a larger role in business applications, and in areas from data mining to fault-tolerant performance, clustering technology has become increasingly important. Commercial products have their place, and there are perfectly good reasons to buy a commercially produced supercomputer if it is within our budget and our applications can keep the machine busy all the time. We will also need a data center to keep it in, and then there is the budget to keep up with the maintenance and upgrades required to keep our investment up to par. However, many who need to harness supercomputing power don't buy supercomputers because they can't afford them, and such machines are difficult to upgrade. Clusters, on the other hand, are a cheap and easy way to take off-the-shelf components and combine them into a single supercomputer. In some areas of research, clusters are actually faster than commercial supercomputers. Clusters also have the distinct advantage that they are simple to build using components available from hundreds of sources. We don't even have to use new equipment to build a cluster.

Price/Performance
The most obvious benefit of clusters, and the most compelling reason for the growth in their use, is that they have significantly reduced the cost of processing power. One indication of this phenomenon is the Gordon Bell Award for Price/Performance Achievement in Supercomputing, which in many of the last several years has been awarded to Beowulf-type clusters. One of the most recent entries, the Avalon cluster at Los Alamos National Laboratory, "demonstrates price/performance an order of magnitude superior to commercial machines of equivalent performance." This reduction in the cost of entry to high-performance computing (HPC) has been due to the commoditization of both hardware and software, particularly over the last 10 years. All the components of computers have dropped dramatically in price in that time. The components critical to the development of low-cost clusters are:
1. Processors: commodity processors are now capable of computational power previously reserved for supercomputers; witness Apple Computer's recent ad campaign touting the G4 Macintosh as a supercomputer.
2. Memory: the memory used by these processors has dropped in cost right along with the processors.
3. Networking components: the most recent group of products to experience commoditization and dramatic cost decreases is networking hardware. High-speed networks can now be assembled with these products for a fraction of the cost necessary only a few years ago.
4. Motherboards, buses, and other subsystems: all of these have become commodity products, allowing the assembly of affordable computers from off-the-shelf components.
Vendor lock-in
The age-old problem of proprietary vs. open systems that use industry-accepted standards is eliminated.

System manageability
The installation, configuration and monitoring of key elements of proprietary systems is usually accomplished with proprietary technologies, complicating system management. The servers of an HPC cluster can be easily managed from a single point using readily available network infrastructure and enterprise management software.

Reusability of components
Commercial components can be reused, preserving the investment. For example, older nodes can be deployed as file/print servers, web servers or other infrastructure servers.

Disaster recovery
Large SMPs are monolithic entities located in one facility. HPC clusters can be collocated or geographically dispersed to make them less susceptible to disaster.
The master node acts as a server for Network File System (NFS) and as a gateway to
the outside world. As an NFS server, the master node provides user file space and
other common system software to the compute nodes via NFS. As a gateway, the
master node allows users to gain access through it to the compute nodes. Usually, the
master node is the only machine that is also connected to the outside world using a
second network interface card (NIC). The sole task of the compute nodes is to execute
parallel jobs. In most cases, therefore, the compute nodes do not have keyboards,
mice, video cards, or monitors. All access to the client nodes is
provided via remote connections from the master node. Because compute nodes do
not need to access machines outside the cluster, nor do machines outside the cluster
need to access compute nodes directly, compute nodes commonly use private IP
addresses, such as the 10.0.0.0/8 or 192.168.0.0/16 address ranges. From a user’s
perspective, a Beowulf cluster appears as a Massively Parallel Processor (MPP)
system. The most common methods of using the system are to access the master node
either directly or through Telnet or remote login from personal workstations. Once on
the master node, users can prepare and compile their parallel applications, and also
spawn jobs on a desired number of compute nodes in the cluster. Applications must be
written in parallel style and use the message-passing programming model. Jobs of a parallel application are spawned on compute nodes, which work collaboratively until the application finishes. During execution, compute nodes use standard message-passing middleware, such as the Message Passing Interface (MPI) and Parallel Virtual Machine (PVM), to exchange information.
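The master/compute-node pattern described above can be sketched with Python's standard library. This is only an illustration of the message-passing model: worker threads and queues stand in for the compute nodes and the MPI/PVM middleware, which in a real cluster would exchange messages over the network.

```python
# A stdlib sketch of the message-passing model used on Beowulf compute
# nodes. Real clusters exchange messages over the network via MPI or
# PVM; here, worker threads with queues stand in for the nodes and the
# middleware, purely for illustration.
import threading
import queue

def compute_node(inbox, outbox, rank):
    """Each 'node' receives its slice of work from the master,
    computes a partial sum, and sends the result back."""
    chunk = inbox.get()               # receive work (blocking receive)
    outbox.put((rank, sum(chunk)))    # send the partial result back

def master(data, n_nodes=4):
    """Scatter slices of `data` to the nodes, then gather and combine
    the partial results -- the master/compute-node pattern."""
    results = queue.Queue()
    threads = []
    for rank in range(n_nodes):
        inbox = queue.Queue()
        t = threading.Thread(target=compute_node,
                             args=(inbox, results, rank))
        t.start()
        inbox.put(data[rank::n_nodes])   # send this node its share
        threads.append(t)
    partials = [results.get()[1] for _ in range(n_nodes)]
    for t in threads:
        t.join()
    return sum(partials)

if __name__ == "__main__":
    print(master(list(range(100))))   # sum of 0..99
```

The scatter/compute/gather structure is the same one an MPI program would express with send and receive calls between ranks.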
ARCHITECTURE
A cluster is a type of parallel or distributed processing system which consists of a collection of interconnected standalone computers cooperatively working together as a single, integrated computing resource.
There are some key concepts that must be understood when forming a cluster
computing resource. Nodes or systems are the individual members of a cluster. They
can be computers, servers, and other such hardware although each node generally has
memory and processing capabilities. If one node becomes unavailable the other nodes
can carry the demand load so that applications or services are always available. There
must be at least two nodes to compose a cluster structure otherwise they are just called
servers. The collection of software on each node that manages all cluster-specific activity is called the cluster service. The cluster service manages all of the resources, the canonical items in the system, and sees them as identical opaque objects. Resources can be physical hardware devices, such as disk drives and network cards, or logical items, such as logical disk volumes, TCP/IP addresses, applications, and databases.
When a resource is providing its service on a specific node it is said to be on-line. A
collection of resources to be managed as a single unit is called a group. Groups
contain all of the resources necessary to run a specific application, and if need be, to
connect to the service provided by the application in the case of client systems. These
groups allow administrators to combine resources into larger logical units so that they
can be managed as a unit. This, of course, means that all operations performed on a
group affect all resources contained within that group. Normally the development of a
cluster computing system occurs in phases. The first phase involves establishing the
underpinnings into the base operating system and building the foundation of the
cluster components. These things should focus on providing enhanced availability to
key applications using storage that is accessible to two nodes. The following stages
occur as the demand increases and should allow for much larger clusters to be formed.
These larger clusters should have a true distribution of applications, higher
performance interconnects, widely distributed storage for easy accessibility and load
balancing. Cluster computing will become even more prevalent in the future because
of the growing needs and demands of businesses as well as the spread of the Internet.
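The resource/group model described earlier in this section can be sketched in a few lines. The class and method names below are illustrative assumptions, not part of any real cluster service API; the point is only that an operation on a group affects every resource it contains.

```python
# A toy sketch of the resource/group model: a Group manages a set of
# Resources as a single unit, so an operation on the group (bring
# online / take offline) affects every resource contained within it.
# Names here are illustrative, not from any real cluster service API.
class Resource:
    """A managed item: a disk drive, a NIC, a TCP/IP address, an app..."""
    def __init__(self, name):
        self.name = name
        self.online = False

class Group:
    """A collection of resources managed as a single unit."""
    def __init__(self, name, resources):
        self.name = name
        self.resources = list(resources)

    def bring_online(self):
        # Operations performed on a group affect all of its resources.
        for r in self.resources:
            r.online = True

    def take_offline(self):
        for r in self.resources:
            r.online = False

    def is_online(self):
        return all(r.online for r in self.resources)

if __name__ == "__main__":
    web = Group("web-app", [Resource("disk0"), Resource("eth0"),
                            Resource("10.0.0.5"), Resource("httpd")])
    web.bring_online()
    print(web.is_online())  # True: every resource in the group is online
```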
Clustering Concepts
Clusters are in fact quite simple. They are a bunch of computers tied together with a
network working on a large problem that has been broken down into smaller pieces.
There are a number of different strategies we can use to tie them together. There are
also a number of different software packages that can be used to make the software
side of things work.
A modular architecture for SSI allows the use of services provided by lower level
layers to be used for the implementation of higher-level services. This unit discusses
design issues, architecture, and representative systems for job/resource management,
network RAM, software RAID, single I/O space, and virtual networking. A number of
operating systems have proposed SSI solutions, including MOSIX, Unixware, and
Solaris-MC. It is important to discuss one or more such systems, as they help students
to understand architecture and implementation issues.

Message Passing Primitives
Although new high-performance protocols are available for cluster computing, some instructors may want to provide students with a brief introduction to message-passing programs using the BSD Sockets interface over Transmission Control Protocol/Internet Protocol (TCP/IP) before introducing more complicated parallel programming with distributed-memory programming tools. If students have already had a course in data communications or computer networks, this unit can be skipped. Students should have access to a networked computer lab with the Sockets libraries enabled; Sockets usually come installed on Linux workstations.

Parallel Programming Using MPI
An introduction to distributed-memory programming using a standard tool such as the Message Passing Interface (MPI) [23] is basic to cluster computing. Current versions of MPI generally assume that programs will be written in C, C++, or Fortran; however, Java-based versions of MPI are becoming available.
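The kind of BSD Sockets exercise described above can be sketched as follows, here in Python rather than the C-family languages MPI itself assumes: one thread plays a "server" node that echoes back whatever a "client" node sends over TCP on localhost. The port and message are illustrative choices.

```python
# A minimal BSD-sockets message-passing sketch: a "server" node echoes
# each message a "client" node sends over TCP. Localhost and an
# ephemeral port keep the example self-contained.
import socket
import threading

def echo_server(sock):
    """Accept one connection and echo each message back to the sender."""
    conn, _ = sock.accept()
    with conn:
        while True:
            data = conn.recv(1024)
            if not data:        # peer closed the connection
                break
            conn.sendall(data)  # echo the message back

def run_demo(message: bytes) -> bytes:
    # Bind to an ephemeral port on localhost so the sketch is portable.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    t = threading.Thread(target=echo_server, args=(server,))
    t.start()

    # The "client" node connects, sends a message, and waits for the reply.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(message)
        reply = client.recv(1024)
    t.join()
    server.close()
    return reply

if __name__ == "__main__":
    print(run_demo(b"hello cluster"))
```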
• Provide a simple, straightforward view of all system resources and activities from any node of the cluster
• Free the end user from having to know where an application will run
• Free the operator from having to know where a resource is located
• Let the user work with familiar interfaces and commands, and allow administrators to manage the entire cluster as a single entity
• Reduce the risk of operator errors, with the result that end users see improved reliability and higher availability of the system
• Allow centralized/decentralized system management and control, avoiding the need for skilled administrators to perform routine system administration
• Present multiple, cooperating components of an application to the administrator as a single application
CLUSTER CLASSIFICATIONS
Clusters are classified into several categories based on facts such as: 1) application target, 2) node ownership, 3) node hardware, 4) node operating system, and 5) node configuration.

Clusters based on Application Target:
• High Performance (HP) clusters
• High Availability (HA) clusters

Clusters based on Node Ownership:
• Dedicated clusters
• Non-dedicated clusters

Clusters based on Node Hardware:
• Clusters of PCs (CoPs)
• Clusters of Workstations (COWs)
• Clusters of SMPs (CLUMPs)

Clusters based on Node Operating System:
• Linux clusters (e.g., Beowulf)
• Solaris clusters (e.g., Berkeley NOW)
• Digital VMS clusters
• HP-UX clusters
Clusters based on Node Configuration:
• Homogeneous clusters: all nodes have similar architectures and run the same OS
• Heterogeneous clusters: nodes have different architectures and run different OSs
ISSUES TO BE CONSIDERED
Cluster Networking
If you are mixing hardware with different networking technologies, there will be large differences in the speed with which data is accessed and in how individual nodes communicate. If it is within your budget, make sure that all of the machines you want to include in your cluster have similar networking capabilities and, if at all possible, network adapters from the same manufacturer.

Cluster Software
You will have to build versions of the clustering software for each kind of system you include in your cluster.

Programming
Our code will have to be written to support the lowest common denominator of data types supported by the least powerful node in our cluster. With mixed machines, the more powerful machines will have capabilities that cannot be matched by the less powerful ones.

Timing
This is the most problematic aspect of heterogeneous clusters. Since these machines have different performance profiles, our code will execute at different rates on the different kinds of nodes. This can cause serious bottlenecks if a process on one node is waiting for the results of a calculation on a slower node. A second kind of heterogeneous cluster is made from different machines in the same architectural family: e.g., a collection of Intel boxes where the machines are different generations, or machines of the same generation from different manufacturers.

Network Selection
There are a number of different network topologies, including buses, cubes of various degrees, and grids/meshes. These topologies are implemented using one or more network interface cards (NICs) installed in the head node and compute nodes of our cluster.

Speed Selection
No matter what topology you choose for your cluster, you will want the fastest network your budget allows. Fortunately, the availability of high-speed computers has also forced the development of high-speed networking systems. Examples are 10 Mbit Ethernet, 100 Mbit Ethernet, gigabit networking, channel bonding, etc.
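The Timing issue above, where a step that synchronizes all nodes finishes only when the slowest node does, can be shown with a toy model. The node speeds below are made-up numbers for illustration only.

```python
# A small model of the heterogeneous-cluster timing problem: in a
# synchronized step, each node gets an equal share of the work, and
# the step ends only when the slowest node finishes -- so step time
# is the maximum over nodes, not the average.
def step_time(work_units, node_speeds):
    """Time for one synchronized step across nodes with the given
    speeds (work units per second), splitting the work equally."""
    share = work_units / len(node_speeds)
    return max(share / speed for speed in node_speeds)

if __name__ == "__main__":
    homogeneous = [10.0, 10.0, 10.0, 10.0]   # identical nodes
    heterogeneous = [10.0, 10.0, 10.0, 2.5]  # one slow node
    print(step_time(400, homogeneous))    # 10.0 seconds
    print(step_time(400, heterogeneous))  # 40.0 -- the slow node dominates
```

Even though three of the four heterogeneous nodes are fast, the single slow node quadruples the step time, which is why naive equal splitting performs poorly on mixed hardware.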
FUTURE TRENDS
Examples of new applications that benefit from using Grid technology include: the coupling of advanced scientific instrumentation or desktop computers with remote supercomputers; collaborative design of complex systems via high-bandwidth access to shared resources; ultra-large virtual supercomputers constructed to solve problems too large to fit on any single computer; and rapid, large-scale parametric studies. Grid technology is currently under intensive development. Major Grid projects include NASA's Information Power Grid, two NSF Grid projects (the NCSA Alliance's Virtual Machine Room and NPACI), the European DataGrid Project, and the ASCI Distributed Resource Management project. The first Grid tools are also already available for developers. The Globus Toolkit [20] represents one such example and includes a set of services and software libraries to support Grids and Grid applications.
CONCLUSION
Clusters are promising:
• They solve the parallel processing paradox.
• They offer incremental growth and match funding patterns.
• New trends in hardware and software technologies are likely to make clusters more promising and to fill the SSI gap.
REFERENCES
www.buyya.com
www.beowulf.org
www.clustercomp.org
www.sgi.com
www.thu.edu.tw/~sci/journal/v4/000407.pdf
www.dgs.monash.edu.au/~rajkumar/cluster
www.cfi.lu.lv/teor/pdf/LASC_short.pdf
www.webopedia.com
www.howstuffworks.com