
FAQ/GeneralQuestions - Cluster Wiki Page 1 of 8

General Questions
l What is the Cluster Project?
l What is Red Hat Cluster Suite (RHCS)?
l What's the history of the cluster project?
l What does the Cluster Project encompass?
l How do you configure a cluster?
l Why do my changes to cluster.conf file keep disappearing?
l What's the "right way" to propagate the cluster.conf file to a running cluster?
l What are all the possible options that can go into cluster.conf file?
l Are there any examples of cluster.conf I can look at?
l Which kernel does this code run on?
l Where is the source code?
l Is the Cluster Project code in git production ready?
l Does the cluster project run on OS foo?
l Where is the project page?
l What hardware do I need to run one of these "clusters"?
l What is the largest cluster of this type in existence?
l Do I really need shared storage?
l Are there any manuals or other documentation I can reference?
l What ports do I have to enable for the iptables firewall?
l Are there any public source tarballs that compile against specific kernels?
l What are the differences between the RHEL4 and RHEL5 versions?
l Can I have a mixed cluster with some RHEL4 and some RHEL5 nodes?
l Can I use cluster suite from xen or vmware?
l When I reboot a xen dom, I get cluster errors and it gets fenced. What's going on and how do I
fix it?
l My cluster.conf failed to validate but I don't understand the error. What should I do?

What is the Cluster Project?

The Cluster Project is a set of components designed to enable clustering, which means a group of
computers all sharing resources, such as shared storage devices and services. Clustering ensures data
integrity when people are working on devices from multiple machines (or virtual machines) at the
same time.

What is Red Hat Cluster Suite (RHCS)?

Red Hat Cluster Suite is a marketing term under which some of this software is promoted. Red Hat
has bundled components from the cluster project together and made them available for its various platforms.

http://sources.redhat.com/cluster/wiki/FAQ/GeneralQuestions?action=print 3/18/2011

What's the history of the cluster project?

Somewhere around 1996, Red Hat developed its first Cluster Suite, which primarily managed cluster-
cooperative services. That's the equivalent of rgmanager now.

From 1997 to 2003, Sistina Software, which had been spun off from a project at the University of
Minnesota, developed a clustering file system that became the Global File System (GFS), which it
sold to customers.

In 2004, Red Hat, Inc. bought Sistina, merged GFS into its Cluster Suite, and open-sourced the whole
thing.

Today, the cluster project is open source and freely available to the public through Red Hat's git
repository. The open-source community continues to improve and develop the cluster project with
new clustering technology and infrastructures, such as OpenAIS.

What does the Cluster Project encompass?

That depends on what version you are using. Like all active technology, it is constantly evolving. The
Cluster Project involves development in many different areas including:

l CCS - cluster configuration system to manage the cluster.conf file


l Cluster Suite Deployment Tool - graphical tool to deploy Cluster Suite on multiple machines
l CLVM - clustering extensions to the LVM2 logical volume manager toolset
l CMAN - cluster manager
l Conga - gui-based cluster manager
l DLM - distributed lock manager
l Fence - I/O fencing system
l GFS* - shared-disk cluster file system (Global File System)
l GFS2* - shared-disk cluster file system (Global File System 2)
l GNBD - kernel module to share block devices to many machines over a network
l GULM - redundant server-based cluster and lock manager (alternative to CMAN and DLM)
l OpenAIS - open cluster infrastructure
l Magma - clustering/locking library used for transition between GULM and CMAN/DLM
l RGManager - resource group manager to monitor, start and stop applications, services,
resources
l system-config-cluster - graphical tool to manage Cluster Suite on multiple machines

How do you configure a cluster?

Assuming you have all the necessary pieces and/or RPMs in place, there are four ways to configure a
cluster:

l Manually edit /etc/cluster/cluster.conf and propagate it to all nodes.
l Use the system-config-cluster gui (RHEL4) for cluster configuration maintenance.
l Use the Cluster Suite Deployment tool gui, cs-deploy-tool (RHEL4), for initial cluster setup.
l Use the Conga web-based cluster configuration tool (RHEL5 and FC6).

Why do my changes to cluster.conf file keep disappearing?

The cluster configuration system (ccs) tries to manage the cluster.conf file and keep all the nodes in
sync. If you make changes to the cluster.conf file, you have to tell ccs and cman that you did it, so
they can update the other nodes. If you don't, your changes are likely to be overwritten with an older
version of the cluster.conf file from a different node. See the next question.

What's the "right way" to propagate the cluster.conf file to a running cluster?

The cluster configuration guis take care of propagating changes to cluster.conf to your cluster. The
system-config-cluster gui has a big button that says "Send to Cluster". If you're maintaining your
cluster.conf file by hand and want to propagate it to the rest of the cluster, do this:

1. Edit /etc/cluster/cluster.conf using the editor of your choice.
2. Tell ccs about the change: ccs_tool update /etc/cluster/cluster.conf
3. Find out what version cman currently thinks your cluster.conf is: cman_tool status | grep "Config version"
   It should come back with something like: Config version: 37
4. Tell cman your newer cluster.conf is a newer version: cman_tool version -r 38

Note: For RHEL5 and similar, cman_tool -r is no longer necessary.
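Put together, the manual steps above can be sketched as a short shell session. The ccs_tool and cman_tool commands need a running cluster, so they appear only as comments here; the runnable part just demonstrates computing the version bump from sample cman_tool output:

```shell
# Manual cluster.conf propagation, sketched. Commands that need a live
# cluster are commented out; sample output stands in for cman_tool.

# 1. Edit /etc/cluster/cluster.conf with your editor of choice.
# 2. Tell ccs about the change:
#      ccs_tool update /etc/cluster/cluster.conf
# 3. Ask cman what config version it currently has:
#      cman_tool status | grep "Config version"
sample='Config version: 37'        # example cman_tool output
current=${sample##*: }             # extract the number: 37
next=$((current + 1))              # the version to advertise: 38
# 4. Tell cman the newer version number (not needed on RHEL5 and later):
#      cman_tool version -r $next
echo "advertise config version $next"
```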

What are all the possible options that can go into cluster.conf file?

A list of options can be found at the following link. I won't guarantee it's complete or comprehensive,
but it's pretty close:

http://sources.redhat.com/cluster/doc/cluster_schema.html

Are there any examples of cluster.conf I can look at?

Take a look at the man page for cluster.conf (5). There's also a small example in the usage.txt file:
cluster/doc/usage.txt
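For quick reference, here is a minimal, hypothetical two-node cluster.conf along the lines of the usage.txt example. The node names, fence device, and addresses are invented; note the nodeid attribute, which RHEL5 requires for each node:

```xml
<?xml version="1.0"?>
<cluster name="example" config_version="1">
  <clusternodes>
    <clusternode name="node-01" nodeid="1">
      <fence>
        <method name="single">
          <device name="apc1" port="1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="node-02" nodeid="2">
      <fence>
        <method name="single">
          <device name="apc1" port="2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="apc1" agent="fence_apc" ipaddr="10.0.0.5" login="apc" passwd="apc"/>
  </fencedevices>
</cluster>
```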

Which kernel does this code run on?

The GFS 6.0 cluster code runs on 2.4.xx series kernels (for Red Hat Enterprise Linux 3). The GFS
6.1 code runs on 2.6.xx series kernels for Red Hat Enterprise Linux 4, Fedora Core and other
distributions.

Where is the source code?

The source code for the current development tree is kept in a git repository on sources.redhat.com.
You can view it through a web browser by following this link:


http://sourceware.org/git/?p=cluster.git;a=tree

You can check the entire source code tree out from git with this command:

git clone git://sources.redhat.com/git/cluster.git

You may first need to install git or git-core (e.g. yum -y install git). If you plan to make changes to the
source, you may want to configure your git environment:

git config --global user.name "Your Name Comes Here"


git config --global user.email you@yourdomain.example.com

For more information about using the cluster git repository, see ClusterGit.

Is the Cluster Project code in git production ready?

The git repository contains all the source code for all the branches. Initially, the branch shown is the
"master" branch, which is development (not stable) code. To get a list of branches, you can use this
command:

git branch -a

If you want to switch to a different branch, use the "git checkout" command. For example, to switch
to the STABLE2 branch (which should work with the current upstream kernel from kernel.org), use
this command:

git checkout STABLE2

The source code for the GFS2 file system is in a different git repository because it's now part of the
upstream kernel. You can check out the most recent source for it by doing:

git clone git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw.git

The source code for openais is currently kept in a subversion repository. You can check out the most
recent source for it by doing:

svn checkout http://svn.osdl.org/openais

Does the cluster project run on OS foo?

The cluster project was primarily designed to run on Linux. Some of the cluster infrastructure, such as
OpenAIS, has been successfully ported to FreeBSD and possibly Darwin.

Where is the project page?

The project page is http://sources.redhat.com/cluster/.

What hardware do I need to run one of these "clusters"?

It depends on which components you need to use. For a basic cluster, all you need is two or more
computers and a network between them. If you want to use GFS, you'll need shared storage.

What is the largest cluster of this type in existence?

The current architecture for general purpose high availability clusters is known to scale to 32 nodes.
At the present time we recommend a maximum of 16 nodes, since 16 or fewer nodes provides the
most stable configuration.

For GFS and GFS2 filesystems, the current limit is also 16 nodes.

Do I really need shared storage?

It depends on what you're planning to do. The point of using GFS and CLVM is that you have storage
you want to share between machines concurrently. Without shared storage, you have a local
filesystem and lvm2, neither of which need the cluster infrastructure. If you want to use the cluster
infrastructure for High Availability services, you don't need shared storage.

Are there any manuals or other documentation I can reference?

Yes. They are here:

l cluster/doc/usage.txt (frequently updated in git)


l http://www.redhat.com/docs/manuals/csgfs/ - Manuals and other docs
l http://www.redhat.com/docs/manuals/csgfs/Oracle_GFS-en-US/index.html - Oracle RAC
10gR2 & GFS installation guide
l http://sources.redhat.com/cluster/doc/nfscookbook.pdf - The Unofficial NFS/GFS Cookbook.

And, of course, this FAQ.

What ports do I have to enable for the iptables firewall?

These ports should be enabled:


port         component          release          protocol
41966        rgmanager          RHEL4/STABLE     TCP
41967        rgmanager          RHEL4/STABLE     TCP
41968        rgmanager          RHEL4/STABLE     TCP
41969        rgmanager          RHEL4/STABLE     TCP
50006        ccsd               *                TCP
50007        ccsd               *                UDP
50008        ccsd               *                TCP
50009        ccsd               *                TCP
21064        dlm                *                TCP
6809         cman               RHEL4/STABLE     UDP
5404, 5405   openais/corosync   RHEL5 and up     UDP
14567        gnbd               *                TCP
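As a sketch, the table above translates into iptables rules like the ones this loop prints. The chain (INPUT in the default filter table) is an assumption; the RHEL4/STABLE port set is shown, so swap in 5404/5405 for openais on RHEL5. The commands are only echoed here so you can review them before applying:

```shell
# Print (do not apply) iptables ACCEPT rules for the cluster ports above.
# RHEL4/STABLE port set; chain choice is an assumption -- review first.
tcp_ports="41966 41967 41968 41969 50006 50008 50009 21064 14567"
udp_ports="50007 6809"
for p in $tcp_ports; do
  echo "iptables -A INPUT -p tcp --dport $p -j ACCEPT"
done
for p in $udp_ports; do
  echo "iptables -A INPUT -p udp --dport $p -j ACCEPT"
done
```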

Are there any public source tarballs that compile against specific kernels?

Yes. From time to time, we build the STABLE branch against different kernels and release the
tarballs. You'll find them here: ftp://sources.redhat.com/pub/cluster/releases/

What are the differences between the RHEL4 and RHEL5 versions?

The cluster software isn't specific to any Linux distribution or release. However, many of the users are
running the software on Red Hat Enterprise Linux (RHEL). Most customers are currently running on
RHEL4 (or the RHEL4 equivalent of CentOS, or at least the RHEL4 branch of the source tree in
CVS). So they may want to know the differences between the way things work now in RHEL4 and
how they'll work in RHEL5.

This list is by no means complete, but these are the differences I know about offhand:

l There's a new web-based cluster configuration tool for RHEL4.5 and RHEL5 called "Conga".
l Cluster.conf in RHEL5 has a new "nodeid=X" requirement for each node. (X is the node
number).
l The ccsd, cman and fenced init scripts in RHEL4 were combined into a single init script for
RHEL5: service cman start. However, since a cluster can run without clvmd, gfs or rgmanager,
those remain separate init scripts.
l RHEL5 has a new "locking_type = 3" setting in /etc/lvm/lvm.conf, which tells the logical volume
manager to use the appropriate locking for clustered and non-clustered volumes.
l RHEL5 has no more lock_gulm locking protocol. Users are encouraged to use dlm locking.
l The new GFS2 file system will be available as a "tech preview" (not production ready) in
RHEL5.


l See the question "What improvements will GFS2 have over GFS(1)?" on the GFS page.

Some of the less noticed internal changes:

l A lot of code has been moved from the kernel to userland for better system integrity and easier
debugging.
l New group daemons run: groupd, gfs_controld and dlm_controld.
l The new cluster infrastructure is built on top of openais clustering technology, which uses
multicast rather than broadcast packets for better efficiency.
l DLM and GFS2 are now part of the base kernel; they were accepted into the 2.6.18 upstream
kernel by kernel.org.
l Lots of little improvements.

Can I have a mixed cluster with some RHEL4 and some RHEL5 nodes?

It's definitely not a good idea to mix the two within a single cluster. With the introduction of RHEL5,
there are now two distinct and separate cluster infrastructures. The older (RHEL4 or STABLE branch
in CVS) infrastructure passes cluster messages using a kernel module (cman or the one internal to
gulm). The newer infrastructure (RHEL5 or HEAD branch in CVS) passes cluster messages using
openais and userland daemons. If you try to mix and match the two, it will not work.

That said, you could probably still fetch the STABLE branch of the cluster code from CVS, compile it
on a RHEL5 system, and have it interact properly in a RHEL4 cluster through the old infrastructure.
Since the STABLE branch tracks the upstream kernel, you may also need to build a newer kernel
from source code as well on the RHEL5 system.

It would be extremely difficult, if not impossible, to go the other way around (i.e. to get the new
infrastructure and openais running on a RHEL4 system so it could interact with a RHEL5 cluster).

Can I use cluster suite from xen or vmware?

Yes you can. For example, you could have a single computer, running Xen virtualization, act as a
complete cluster consisting of several xen guests. There are special fencing issues to consider. For
example, if you use power fencing, one guest could cause the whole machine to be powered off and
never come back (because it wouldn't be alive to tell the power switch to power back on). There is a
special fencing agent designed to reboot xen guests as needed.

You can also create clusters made of several computers, each of which has several virtual xen guest
nodes. This has other fencing complications. For example, a xen guest can't use a simple xen fencing
agent to reboot a xen guest that's physically running on a different physical computer.

When I reboot a xen dom, I get cluster errors and it gets fenced. What's going on and how do I
fix it?

As I understand it, the problem is due to the fact that xen nodes tear down and rebuild the ethernet nic
after cluster suite has started. We're working on a more permanent solution. In the meantime, here is a
workaround:

Edit the file /etc/xen/xend-config.sxp and locate the line that reads:


(network-script network-bridge)

Change that line to read:

(network-script /bin/true)

Create and/or edit file /etc/sysconfig/network-scripts/ifcfg-eth0 to look something like:

DEVICE=eth0
ONBOOT=yes
BRIDGE=xenbr0
HWADDR=XX:XX:XX:XX:XX:XX

Create and/or edit file /etc/sysconfig/network-scripts/ifcfg-xenbr0 to look something like:

DEVICE=xenbr0
ONBOOT=yes
BOOTPROTO=static
IPADDR=10.0.0.116
NETMASK=255.255.255.0
GATEWAY=10.0.0.254
TYPE=Bridge
DELAY=0

Substitute your appropriate IP address, netmask and gateway information.

My cluster.conf failed to validate but I don't understand the error. What should I do?

As it turns out, the RelaxNG validator in libxml2, while efficient, sometimes produces rather odd
errors. We are looking into ways to make this output more user-friendly. Until this is resolved, the
easiest thing to do is use Jing, a stand-alone RelaxNG validator written in Java. For a concrete
example and error comparison, see bugzilla #531489.
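A Jing run looks something like the commented command below; the jar and schema paths are assumptions, so adjust them for your installation. The runnable part demonstrates a trivial pre-check you can do anywhere before fighting schema errors: confirm the cluster element carries a numeric config_version.

```shell
# Validating with Jing (paths are assumptions -- adjust for your system):
#   java -jar /usr/share/java/jing.jar cluster.rng /etc/cluster/cluster.conf
# Jing reports file:line:column messages that are usually clearer than the
# libxml2 RelaxNG errors. Quick sanity check on a sample <cluster> line:
line='<cluster name="demo" config_version="37">'
ver=$(printf '%s\n' "$line" | sed -n 's/.*config_version="\([0-9][0-9]*\)".*/\1/p')
echo "config_version: $ver"
```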

FAQ/GeneralQuestions (last edited 2010-02-13 17:14:23 by PerryMyers)
