Sun™ Cluster 3.1 8/05 Software Differences
Introduction
This document introduces the new features of Sun™ Cluster 3.1 8/05 (Sun
Cluster 3.1 Update 4) software and describes the differences between this
version and Sun Cluster 3.1 9/04 (Sun Cluster 3.1 Update 3) software.
Contents
This document contains:
Product Description
Differences in Detail
    Solaris™ 10 Operating System Support
    Solaris 10 OS Zones and the Solaris OS Container Agent
    Support for Network-Attached Storage – NetApp Filer
    Cluster Installation: Sun Java™ Enterprise System (Java ES) Installer Standardization
    New scinstall Features: “Easy Install”
    Shared Physical LAN Interconnects
    Infiniband as Sun Cluster Transport (Solaris 10 OS)
    New Data Service Agents
    VERITAS Foundation Suite 4.1 (VxVM 4.1 and VxFS 4.1)
Product Documentation
Further Training
Product Description
Sun Cluster 3.1 Update 4 software is the first Sun Cluster version to
support the Solaris™ 10 Operating System (Solaris OS). Therefore, the
most important feature of Sun Cluster 3.1 Update 4 software is Solaris 10
OS co-existence. Sun Cluster 3.1 Update 4 software makes use of the new
Solaris Service Management Facility (SMF) in Solaris 10 OS to manage the
cluster framework daemons and cluster services.
Sun Cluster 3.1 Update 4, while not yet providing full integration of the
Solaris 10 OS zones feature, does support an agent which makes it possible
to automate booting zones and failing over of zones between cluster
nodes. The zones themselves are ordinary Solaris 10 OS zones and do not
have any knowledge of the cluster.
Sun Cluster 3.1 Update 4 has a number of other noteworthy features. The
software includes a new general framework for support of network-
attached storage as the only cluster data storage. The current
implementation provides specific support only for the Network
Appliance (NetApp) Filer product. However, the fact that this feature is
designed using an extensible plug-in architecture will simplify support for
other network-attached storage in the future.
A feature that allows both private transport network traffic and public
network traffic to travel over the same physical network adapter, using a
technology called tagged VLANs, will enable future support for clustering
blade or blade-like hardware that can support only two physical network
adapters.
Note – Official support for Infiniband may post-date the first customer
ship of Sun Cluster 3.1 8/05. This is just a support and testing issue. The
Infiniband feature is already integrated into the cluster product.
Differences in Detail
The following sections provide details on the new features of Sun Cluster
3.1 Update 4 software.
Solaris™ 10 Operating System Support

On Solaris 8 and 9 OS, the Sun Cluster 3.1 Update 4 framework daemons
are launched from boot scripts. On Solaris 10 OS, they are registered as
SMF services. While SMF continues to support legacy boot scripts (SMF
essentially emulates the earlier /sbin/rc2 and /sbin/rc3 functionality if
such boot scripts still exist), the proper way to run services in Solaris 10
OS is to register them as SMF objects. Therefore, the decision was made to
run the cluster framework daemons in the so-called ‘correct’ Solaris 10 OS
fashion. The exception is the Network Time Protocol (NTP) configuration
that uses the cluster private node names. The reason for this is that the
standard NTP is already registered as an SMF service. If you manually
configure and enable the standard Solaris 10 OS NTP SMF service (to use
clock synchronization off an external source, for example) before you run
scinstall, then the cluster legacy NTP script will not be installed.
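For example, a minimal way to configure and enable the standard Solaris 10 OS NTP service before running scinstall might look like the following sketch (the ntp.conf contents are site-specific and purely illustrative):

# cp /etc/inet/ntp.client /etc/inet/ntp.conf
# vi /etc/inet/ntp.conf      (point the server entries at your external time source)
# svcadm enable svc:/network/ntp:default
# svcs network/ntp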
Note that the service is automatically enabled and that the listed
dependencies are merely startup dependencies. This ensures that the
dependees (pnm, rpc-fed, and rpc-pmf) are started before rgm is started.
You do not need to specify actual restart dependencies, because the
daemons are protected by failfasts as they have been in previous updates
of the Sun Cluster 3.1 software.
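For reference, a startup dependency in such a manifest might be expressed along the following lines (the FMRI shown is an illustrative assumption, not necessarily the exact name shipped with the product):

<dependency name='rpc-fed' grouping='require_all' restart_on='none' type='service'>
    <service_fmri value='svc:/system/cluster/rpc-fed:default'/>
</dependency>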
Finally, the service definition points to the actual scripts used to start and
stop the cluster daemon, as follows:
<exec_method name='start' type='method'
exec='/usr/cluster/lib/svc/method/svc_rgm start'
timeout_seconds='18446744073709551615'>
<method_context/>
</exec_method>
<exec_method name='stop' type='method'
exec='/usr/cluster/lib/svc/method/svc_rgm stop'
timeout_seconds='18446744073709551615'>
<method_context/>
</exec_method>
Actual cluster applications being started by data service agents are not
likely to be placed under control of SMF because this would make it
difficult for SMF to make failover applications behave properly. While
SMF can support dependencies, SMF does not have any particular
knowledge about which node in a cluster is likely to be among the correct
primaries for a cluster application.
Even when a data service agent references software whose native Solaris
OS version is already registered in SMF, it is likely to ignore the SMF
versions of the services. The Network File System (NFS) agent, for
example, ignores the SMF-registered service by keeping it disabled even
when NFS is registered and enabled as a cluster service.
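For example, on a Solaris 10 OS node where NFS is under cluster control, you might observe something similar to the following (output abbreviated; the state shown is illustrative):

# svcs network/nfs/server
STATE          STIME    FMRI
disabled       16:02:11 svc:/network/nfs/server:default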
Solaris 10 OS Zones and the Solaris OS Container Agent

Sun Cluster 3.1 Update 4 does not provide full zone integration. A full
zone integration feature would allow you to essentially have clustered
Solaris 10 OS zones, with cluster framework software installed in the
zones, and cluster configuration and agents working together inside the
zones to provide resource failover and scalability in zones. There could be
clustered relationships between zones running on different physical
machines or even between different local zones running on the same
physical machine.
What Sun Cluster 3.1 Update 4 does provide is an agent that allows you to
boot and control cluster-unaware local zones that live inside physical
machines (global zones) which are, of course, running the cluster
framework.
The official name of the agent is Sun Cluster Data Service for Solaris
Containers. A Solaris Container is just a zone that is managed with the
Solaris Resource Manager (SRM). The agent actually does not provide any
automated management of SRM.
If you choose to configure only whole root zones, which are initialized
with an entire copy of the global zone’s OS, then you could use the
/etc/system entry unmodified.
The Sun Cluster 3.1 zones agent for Solaris 10 OS provides two different
models for booting and controlling zones.
The sczsh and sczsmf resource flavors are not required, and it is
perfectly legal to just configure your zone manually to run all the boot
scripts and SMF services that you like. However, by using the zone agent
resources you gain the following benefits:
● You can have a fault-monitoring component. You can provide a
custom fault probe for your resource, and its exit code can indicate a
desire to have the entire resource group fail over (exit code 201) or
restart (other non-zero exit code).
● You can have dependencies, even on resources that live in other
resource groups that can be online on a different node.
Setting up the resource group first is an easy way to control the failover
storage. Assume you have a file system created on the shared storage,
ready to serve as the zonepath for the new zone. If you are using a volume
manager, the storage needs to be in a device group dedicated to this zone:
pecan:/# df -k
.
.
/dev/md/zoneds/dsk/d100 5160782 98161 5011014 2% /fro-zone
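One possible way to prepare such a resource group and its storage resource before configuring the zone is sketched below (the resource name fro-zone-hasp-rs is an assumption; the group and mount point match the example above):

pecan:/# scrgadm -a -g fro-zone-rg -h pecan,grape
pecan:/# scrgadm -a -t SUNW.HAStoragePlus
pecan:/# scrgadm -a -j fro-zone-hasp-rs -g fro-zone-rg -t SUNW.HAStoragePlus \
-x FilesystemMountPoints=/fro-zone
pecan:/# scswitch -Z -g fro-zone-rg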
The zone is configured and installed from one node only. The only
configuration options given are the zonepath and an IP address controlled
by the zone (the IP address is not required; it is included just to show an
example).
You must not set the autoboot parameter for the zone to true. The default
is false, which is what you want, and which is why it is not mentioned at
all in the following example:
pecan:/# zonecfg -z fro-zone
fro-zone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:fro-zone> create
zonecfg:fro-zone> set zonepath=/fro-zone
zonecfg:fro-zone> add net
zonecfg:fro-zone:net> set address=192.168.1.189
zonecfg:fro-zone:net> set physical=qfe2
zonecfg:fro-zone:net> end
zonecfg:fro-zone> commit
zonecfg:fro-zone> exit
Boot the zone and connect to its console. You will be prompted to
configure the zone (it looks just like a standard Solaris OS that is booting
after a sys-unconfig):
pecan:/etc/zones# zoneadm -z fro-zone boot
pecan:/etc/zones# zlogin -C fro-zone
[Connected to zone 'fro-zone' console]
Log in. You might want to do any other configuration of the zone at this
time, such as commenting out the CONSOLE=/dev/console line in
/etc/default/login.
Note that in the example the zone already has the IP address that you
configured with zonecfg:
fro-zone console login: root
Password:
May 12 16:15:16 fro-zone login: ROOT LOGIN /dev/console
Last login: Thu May 12 10:34:13 from 192.168.1.39
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
# ifconfig -a
lo0:1: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu
8232 index 1 inet 127.0.0.1 netmask ff000000
qfe2:1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index
3 inet 192.168.1.189 netmask ffffff00 broadcast 192.168.1.255
Make sure you can boot the zone on each intended failover node:
pecan# zoneadm -z fro-zone halt
pecan# scswitch -z -g fro-zone-rg -h grape
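Then repeat the verification on the second node, for example:

grape:/# zoneadm -z fro-zone boot
grape:/# zlogin -C fro-zone      (verify that the zone comes up, then disconnect)
grape:/# zoneadm -z fro-zone halt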
There is a parameter file that needs to persist and is read every time the
resource is stopped, started, or validated. Note that in the sczbt_config
file you specify a directory name for the parameter file. This directory could
be part of a global file system, or it could be a local directory available on
each node. If you happen to have a global file system available, it is most
convenient to place the parameter file directory there. If not, you will have
to manually copy the parameter file that is created by sczbt_register to
other nodes.
#
# The following variable will be placed in the parameter file
#
# Parameters for sczbt (Zone Boot)
#
Zonename=fro-zone
Zonebootopt=
Milestone=multi-user-server
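Registration of the zone boot resource itself is performed with the sczbt_register script provided by the agent. If the parameter directory is not on a global file system, copy the resulting parameter file to the other nodes, as in the following sketch (the parameter file name is an assumption):

pecan:/# ./sczbt_register
pecan:/# cd /etc/zoneagentparams
pecan:/# rcp sczbt_fro-zone-zboot-rs grape:/etc/zoneagentparams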
Now enabling the zone boot resource instance will automatically boot the
zone on the node where the failover group is primary. It will also
automatically place the IP address controlled by the
SUNW.LogicalHostname resource into the zone. This is demonstrated by
accessing the zone through this logical IP address.
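Assuming the zone boot resource was registered as fro-zone-zboot-rs (an assumed name), enabling it and checking where the zone is running might look like this:

pecan:/# scswitch -e -j fro-zone-zboot-rs
pecan:/# zoneadm list -v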
Instances of each of these resource flavors must live in the same resource
group as a zone boot resource (sczbt). The configuration and registration
scripts provided by the zone agent specifically for these resources will
automatically place a restart dependency on the zone boot resource. Any
other dependencies, as mentioned above, are fully customizable.
pecan# ./sczsh_register
pecan# cd /etc/zoneagentparams
pecan# rcp sczsh_frozone-myd-rs grape:/etc/zoneagentparams
This is a fairly simplistic example. The new resource causes
/marc/mydaemon to be launched in the zone after it is booted. Since the
probe command used by the fault monitor is just /usr/bin/pgrep
mydaemon >/dev/null, which will never return the value 201, failure of
the probe will always just cause a restart of the daemon rather than
failover of the entire zone.
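The parameters behind this example might look roughly like the following excerpt from the sczsh parameter file (the variable names follow the agent's configuration template; the exact names and values shown here are assumptions):

ServiceStartCommand="/marc/mydaemon"
ServiceStopCommand="/usr/bin/pkill -x mydaemon"
ServiceProbeCommand="/usr/bin/pgrep mydaemon > /dev/null"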
In order to provide a script or SMF resource whose failure could cause the
whole resource group, and hence the whole zone, to fail over, you would
have to do the following:
1. Provide a fault probe that could return the result 201 in order to
suggest an entire zone failover.
2. Modify properties of the resource to allow it to fail over as follows:
pecan:/# scrgadm -pvv -j frozone-myd-rs |grep -i 'failover.*value'
(frozone-rg:frozone-myd-rs:Failover_mode) Res property value: NONE
(frozone-rg:frozone-myd-rs:Failover_enabled) Res property value:
FALSE
pecan:/# scrgadm -c -j frozone-myd-rs -y Failover_mode=SOFT
pecan:/# scrgadm -c -j frozone-myd-rs -x Failover_enabled=TRUE
The new value for Failover_mode will cause resource group failover if
the script resource fails to start. The Failover_enabled property is an
extension property of the SUNW.gds resource type and will enable the
SUNW.gds fault monitor to request an entire resource group failover when
the fault probe returns value 201.
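A minimal fault probe implementing step 1 might look like the following sketch (the daemon name is carried over from the earlier example; the escalation policy is entirely up to you):

#!/bin/sh
# Hypothetical probe: exit 201 to request failover of the whole resource
# group (and therefore the zone); any other non-zero exit requests a
# local restart of the resource.
/usr/bin/pgrep mydaemon > /dev/null
if [ $? -ne 0 ]; then
    exit 201
fi
exit 0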
Support for Network-Attached Storage – NetApp Filer

A NAS device provides file services to all of the cluster nodes through the
Network File System (NFS) or any other network file-sharing protocol.
File services can run between the cluster nodes and the network storage
server on a dedicated subnet or, less likely, on the same public subnet
providing access to the clients of the cluster services. In other words, file
traffic is supported on any network except those that make up the cluster
interconnect.
There are some specific requirements for using the NetApp Filer as a Sun
Cluster storage device. On the file server device itself:
● The filer must be a NetApp clustered filer. There is nothing that the
Sun Cluster software can do to enforce this, but it makes sense to
require high availability for a device which is going to provide file
services to your high-availability cluster.
You must use the scnas command to register the NAS device into the Sun
Cluster Configuration Repository (CCR). This is true whether you want to
use the NAS storage device as a quorum device or not. The important
information being recorded is the NAS identity (IP address), login name,
and password which are used for failure fencing.
You must also use the scnasdir command to register on the NAS device
the specific directories that are being used to serve cluster data. The Sun
Cluster client implementation is then able to perform data fencing on
these specific directories. In the NetApp Filer implementation, data
fencing is accomplished by removing the name of a node from the
NetApp Filer exports list as it is being fenced out of the cluster.
Registering a NAS device with the scnas command looks like the
following:
# scnas -a -h netapps25 -t netapp -o userid=root
Please enter password:
Registering the specific NAS directories for failure fencing looks like the
following:
# scnasdir -r -h netapps25 -d /vol/vol_01_03
# scnasdir -r -h netapps25 -d /vol/vol_01_04
You can verify the configuration of the NAS device into the CCR using the
-p option to the commands as in the following example:
# scnas -p
# scnasdir -p
The stored password is not shown. It is stored in the CCR with a very
basic encryption scheme known as ROT-13 (alphabetic characters shifted
13 places).
The NAS quorum architecture, like the rest of the NAS architecture, is a
general architecture with specific support in Sun Cluster Update 4
software only for the NetApp Filer. On the NetApp Filer side, the
requirements for operation as a Sun Cluster quorum device are as follows:
● You must install the iSCSI license from your NAS device vendor.
● You must configure an iSCSI Logical Unit (LUN) for use as the
quorum device.
● When booting the cluster, you must always boot the NAS device
before you boot the cluster nodes.
If you want to use NAS storage as the quorum device, it cannot be
configured by the new scinstall feature that automatically selects a
quorum device (see ‘‘Auto-Configuration of the Quorum Device’’). Instead,
you use new options to the scconf command, which are also available
through the scsetup utility.
The NAS quorum device must be setup before configuring it with Sun
Cluster. For more information on setting up Netapp NAS filer,
creating the device, and installing the license and the Netapp
binaries, see the Sun Cluster documentation.
What name do you want to use for this quorum device? netapps
scconf -a -q name=netapps,type=netapp_nas,filer=netapps25,lun_id=0
# scstat -q
-- Quorum Summary --
For this specific application, you should use the following mount options
in the /etc/vfstab file of the cluster nodes:
● forcedirectio
● noac
● proto=tcp
There are no other NAS-specific instructions for using NAS devices with
RAC. You simply choose filesystem storage as you build your RAC
database, and you point it to the directories where your NAS directories
are mounted on the cluster nodes.
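For example, a hypothetical /etc/vfstab entry for a NetApp volume mounted with these options might look like the following (filer name, volume, and mount point are illustrative):

netapps25:/vol/vol_rac_01 - /oradata nfs - yes forcedirectio,noac,proto=tcp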
Cluster Installation: Sun Java™ Enterprise System (Java ES) Installer Standardization

While internally the scinstall command still has the ability to perform
the Sun Cluster framework pkgadd operations, this will not be a supported
mechanism for installing the cluster. The only supported mechanism is to
use the Java ES installer.
The Java ES installer does not have the ability to actually configure the
cluster. This will still be done through scinstall. You will get an error if
you try to use the Configure Now option of the Java ES installer as you
install the Sun Cluster framework.
New scinstall Features: “Easy Install”

The interactive procedure that allows you to configure the Sun Cluster
software one node at a time now supports a Typical configuration. This
is very similar to the Typical configuration that has been supported with
the procedure that allows you to configure the entire cluster from one
node since Sun Cluster 3.1 Update 1.
For all four modes of configuring the cluster (one-at-a-time and all-at-
once, with custom and typical options for each of these), the scinstall
utility asks you if you want to disable auto-configuration of the quorum
device. The default is to use the auto-configuration mode. You must
disable the quorum auto-configuration if you want to use a NAS device as
the quorum device (see ‘‘Using a NAS Device as a Quorum Device’’) or if
your shared device (the device that will be assigned the lowest DID
number) cannot be supported as a quorum device. The dialog looks like
the following:
The only time that you must disable this feature is when ANY of the
shared storage in your cluster is not qualified for use as a Sun
Cluster quorum device. If your storage was purchased with your
cluster, it is qualified. Otherwise, check with your storage vendor
to determine whether your storage device is supported as Sun Cluster
quorum device.
In Solaris 10 OS, as the last node boots into the cluster, you get the login
prompt on that node before the quorum auto-configuration runs. This is
because the boot environment is controlled by the SMF of Solaris 10 OS,
which runs boot services in parallel and gives you the login prompt before
many of the services are complete. The auto-configuration of the quorum
device does not complete until a minute or so later. You should have time
to log in on the last node, run the scstat -q command, and notice that
you still have no quorum device and that the installmode flag is still set.
Do not attempt to configure the quorum device by
hand, as the auto-configuration will eventually run to completion.
Updating "/etc/hostname.qfe1".
Note that this is the same singleton IPMP group that would be
automatically created if you added an instance of
SUNW.LogicalHostname or SUNW.SharedAddress onto an adapter that
was not part of an IPMP group. The functionality has now been moved
up into scinstall, although it remains in the validation methods of the
virtual IP resources as well, just in case you have adapters (perhaps these
are new adapters) that are not configured with IPMP when you add the
resources.
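A hypothetical /etc/hostname.qfe1 written this way might contain a single line similar to the following (the host name and IPMP group name are assumptions; scinstall picks its own defaults):

pecan netmask + broadcast + group sc_ipmp0 up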
Shared Physical LAN Interconnects

Using the shared physical LAN interconnect feature, servers with only two
physical network adapters (such as blade or blade-like hardware) could use
each physical adapter both as a single transport adapter and a single
public network adapter. For the public network part, this adapter could be
in the same IPMP group as a separate physical adapter, so that it would be
possible for public network IP addresses to fail over between the two.
The only adapters that support the tagged VLAN device driver and that
are also supported in the Sun Cluster environment are the Cassini
Ethernet (ce) adapters and the Broadcom Gigabit Ethernet (bge) adapters.
Thus the shared physical interconnects feature is only available with those
adapters.
The network adapters that are capable of tagged VLANs in the Solaris OS
use a VLAN-related instance number. For example, if physical adapter
ce1 is configured with tagged VLAN-ID 22, it will use instance
ce22001, that is, 1000 times the VLAN-ID plus the normal instance
number.
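As a quick sanity check of the numbering, the instance can be computed as follows (a trivial illustrative sketch, not a Sun-provided tool):

# Instance for a tagged VLAN adapter: VLAN-ID * 1000 + physical instance.
DRIVER=ce; INSTANCE=1; VLAN_ID=22
echo "${DRIVER}`expr ${VLAN_ID} \* 1000 + ${INSTANCE}`"      # prints ce22001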
The network adapters supported with tagged VLANs in the Sun Cluster
environment also support a related standard which allows an adapter to
specify prioritization of network traffic, from a lowest priority of 0 to a
highest priority of 7.
gabi:/# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu
8232 index 1
inet 127.0.0.1 netmask ff000000
bge2000: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu
1500 index 2
inet 192.168.1.21 netmask ffffff00 broadcast 192.168.1.255
groupname therapy
ether 0:9:3d:0:a7:12
bge2000:1:flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,
NOFAILOVER,CoS> mtu 1500 index 2
inet 192.168.1.121 netmask ffffff00 broadcast 192.168.1.255
ce2000:flags=209040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOF
AILOVER,CoS> mtu 1500 index 3
inet 192.168.1.221 netmask ffffff00 broadcast 192.168.1.255
groupname therapy
ether 0:3:ba:e:bd:e0
The interactive install (all four variations, including custom, typical, the
one-at-a-time, and the all-at-once methods) detects if you choose a VLAN-
capable private network adapter from the transport adapters menu. If
VLAN is already enabled on this physical adapter for the public network,
you are required to enter a VLAN-ID for the transport; otherwise, you are
asked whether you want to use one.
The following example shows that since the public network is already
VLAN-capable, the adapters menu shows the same physical adapters
already used by the public network:
>>> Cluster Transport Adapters and Cables <<<
You must configure at least two cluster transport adapters for each
node in the cluster. These are the adapters which attach to the
private cluster interconnect.
1) bge0
2) bge1
3) ce0
4) ce1
5) Other
Option: 1
This adapter is used on the public network also, you will need to
configure it as a tagged VLAN adapter for cluster transport.
The autodiscovery method for transport adapter entry (on all but the first
node or all but the node you are driving from in the all-at-once method)
will be able to auto-detect the correct VLAN-ID to use on VLAN-capable
adapters. This is shown in the following example:
Probing .......
The scconf options that allow you to add private network adapters can
be used either with a VLAN-enabled instance number, or with a new
property named vlan_id. In other words, the following are equivalent,
valid commands:
scconf -a -A trtype=dlpi,node=phys-node-1,name=ce1001
or
scconf -a -A trtype=dlpi,node=phys-node-1,name=ce1,vlan_id=1
scconf -a -A trtype=dlpi,name=bge1,node=gabi,vlan_id=5
scconf -a -m endpoint=gabi:bge5001,endpoint=switch1
The scstat command shows the virtual adapter names, for example,
ce5000, for a private network adapter using the VLAN feature.
dani:/# scstat -W
Infiniband as Sun Cluster Transport (Solaris 10 OS)

The Sun Cluster framework will not use uDAPL for any framework
transport activity, such as heartbeats. In this respect, Infiniband will be a
purely TCP/IP transport—at this time. However, uDAPL is available for
application usage, and Sun Cluster will support an ORACLE RAC
implementation using this API, when it exists.
VERITAS Foundation Suite 4.1 (VxVM 4.1 and VxFS 4.1)

On Solaris 8 and 9 OS, Sun Cluster 3.1 Update 4 continues to support both
the 3.5 and 4.0 versions of VxVM and VxFS. However, the 4.1 version is
recommended for new installations since VERITAS has already declared
end-of-life for 3.5 and may do so soon for 4.0.
Product Documentation
Table 1 provides a list of related documentation. Refer to the
documentation for detailed product information:
Title    Part Number    Downloadable Versions of Documentation
Further Training
There will not be any additional support readiness training (SRT) for the
Sun Cluster 3.1 Update 4 software framework. Because this is an
incremental release, not enough has changed to warrant a dedicated SRT.
The purpose of this document is to provide support personnel with
enough information to support the product.