
Sun Cluster 3.2 - Cheat Sheet

http://www.datadisk.co.uk/html_docs/sun/sun_cluster_cs.htm

Sun Cluster Cheat Sheet


This cheat sheet contains common commands and information for both Sun Cluster 3.1 and 3.2. There is some missing information (zones, NAS devices, etc.) that I hope to complete over time. Both versions of the cluster software also ship a text-based menu tool, so don't be afraid to use it, especially if the task is a simple one: scsetup (3.1) or clsetup (3.2). All of the commands in version 3.1 are still available in version 3.2.

Daemons and Processes

At the bottom of the installation guide I listed the daemons and processes running after a fresh install; now is the time to explain what these processes do. I have managed to obtain information on most of them but am still looking for the others.
Versions 3.1 and 3.2

clexecd - Used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (such as the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if it is killed and not restarted in 30 seconds.
cl_ccrad - Provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.
cl_eventd - The cluster event daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.
cl_eventlogd - The cluster event log daemon logs cluster events into a binary log file. At the time of writing there is no published interface to this log. It is automatically restarted if it is stopped.
failfastd - The failfast proxy server. The failfast daemon allows the kernel to panic if certain essential daemons have failed.
rgmd - The resource group management daemon, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.fed - The fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.pmfd - The process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS). A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.
pnmd - The public network management service daemon manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.
scdpmd - The disk path monitoring (DPM) daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. This multi-threaded daemon runs on each node and is started automatically by an rc script when a node boots; it monitors the availability of the logical paths visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies.

Version 3.2 only

qd_userd - Serves as a proxy whenever any quorum device activity requires execution of some command in userland, e.g. a NAS quorum device.
cl_execd - (no information yet)
ifconfig_proxy_serverd - (no information yet)
rtreg_proxy_serverd - (no information yet)
cl_pnmd - The public network management (PNM) module daemon. It is started at boot time and starts the PNM service. It keeps track of the local host's IPMP state and facilitates inter-node failover for all IPMP groups.
scprivipd - Provisions IP addresses on the clprivnet0 interface on behalf of zones.
sc_zonesd - Monitors the state of Solaris 10 non-global zones so that applications designed to fail over between zones can react appropriately to zone booting failures.
cznetd - Used for reconfiguring and plumbing the private IP addresses in a local zone after a virtual cluster is created; also see the cznetd.xml file.
rpc.fed - The "fork and exec" daemon, which handles requests from rgmd to spawn methods for specific data services. Failfast will panic the box if this is killed and not restarted in 30 seconds.
scqdmd - The quorum server daemon (possibly used to be called "scqsd").
pnm mod serverd - (no information yet)
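A quick way to confirm the framework daemons came up after a boot is just to grep the process table (a minimal sketch; the exact daemon list varies by release and by which services you have configured):

## check that the core cluster daemons are present (run as root on each node)
ps -ef | egrep 'clexecd|cl_ccrad|cl_eventd|failfastd|rgmd|rpc.fed|rpc.pmfd|pnmd|scdpmd' | grep -v grep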

File locations

Both versions (3.1 and 3.2)
  man pages                                  /usr/cluster/man
  log files                                  /var/cluster/logs, /var/adm/messages
  configuration files (CCR, eventlog, etc)   /etc/cluster/
  cluster and other commands                 /usr/cluster/lib/sc

Version 3.1 only
  sccheck logs                               /var/cluster/sccheck/report.<date>
  cluster infrastructure file                /etc/cluster/ccr/infrastructure

Version 3.2 only
  sccheck logs                               /var/cluster/logs/cluster_check/remote.<date>
  cluster infrastructure file                /etc/cluster/ccr/global/infrastructure
  command log                                /var/cluster/logs/commandlog

SCSI Reservations

Display reservation keys
  scsi2: /usr/cluster/lib/sc/pgre -c pgre_inkeys -d /dev/did/rdsk/d4s2
  scsi3: /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/d4s2

Determine the device owner
  scsi2: /usr/cluster/lib/sc/pgre -c pgre_inresv -d /dev/did/rdsk/d4s2
  scsi3: /usr/cluster/lib/sc/scsi -c inresv -d /dev/did/rdsk/d4s2
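d4 above is just an example DID device; to check the keys on every DID device a small loop works (a sketch assuming SCSI-3/PGR disks; cldevice list is 3.2, use scdidadm -L to get the device names on 3.1):

## list SCSI-3 reservation keys for all DID devices
for d in `cldevice list`; do
  echo "== ${d} =="
  /usr/cluster/lib/sc/scsi -c inkeys -d /dev/did/rdsk/${d}s2
done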


Command shortcuts

In version 3.2 there are a number of shortcut command names, listed below. I have kept the full command names in the rest of the document so that it is obvious what we are doing. All the commands are located in /usr/cluster/bin.

  cldevice               cldev
  cldevicegroup          cldg
  clinterconnect         clintr
  clnasdevice            clnas
  clquorum               clq
  clresource             clrs
  clresourcegroup        clrg
  clreslogicalhostname   clrslh
  clresourcetype         clrt
  clressharedaddress     clrssa

Shutting down and Booting a Cluster

Shutdown the entire cluster
  3.1: ## other nodes in cluster
       scswitch -S -h <host>
       shutdown -i5 -g0 -y
       ## last remaining node
       scshutdown -g0 -y
  3.2: cluster shutdown -g0 -y

Shutdown a single node
  3.1: scswitch -S -h <host>
       shutdown -i5 -g0 -y
  3.2: clnode evacuate <node>
       shutdown -i5 -g0 -y

Reboot a node into non-cluster mode
  3.1: ok> boot -x
  3.2: ok> boot -x
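A typical single-node maintenance pass on 3.2 strings these together (a sketch only; the table above uses shutdown -i5 to power the node off, here -i0 is used instead to stop at the OBP so you can boot -x):

## move services off the node, then take it down to the OBP
clnode evacuate <node>
shutdown -i0 -g0 -y
## from the OBP, boot outside the cluster to do the maintenance work
ok> boot -x
## a normal boot afterwards rejoins the cluster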

Cluster information

Cluster
  3.1: scstat -pv
  3.2: cluster list -v ; cluster show ; cluster status
Nodes
  3.1: scstat -n
  3.2: clnode list -v ; clnode show ; clnode status
Devices
  3.1: scstat -D
  3.2: cldevice list ; cldevice show ; cldevice status
Quorum
  3.1: scstat -q
  3.2: clquorum list -v ; clquorum show ; clquorum status
Transport info
  3.1: scstat -W
  3.2: clinterconnect show ; clinterconnect status
Resources
  3.1: scstat -g
  3.2: clresource list -v ; clresource show ; clresource status
Resource Groups
  3.1: scstat -g ; scrgadm -pv
  3.2: clresourcegroup list -v ; clresourcegroup show ; clresourcegroup status
Resource Types
  3.2: clresourcetype list -v ; clresourcetype list-props -v ; clresourcetype show
IP Network Multipathing
  3.1: scstat -i
  3.2: clnode status -m
Installation info (prints packages and version)
  3.1: scinstall -pv
  3.2: clnode show-rev -v
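For a quick daily health check the 3.2 status subcommands from the table above can simply be chained into one pass (a minimal sketch):

## one-shot cluster health check (3.2)
cluster status
clnode status
clquorum status
cldevice status -s fail      ## only failed disk paths
clresourcegroup status
clresource status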

Cluster Configuration

Release
  cat /etc/cluster/release
Integrity check
  3.1: sccheck
  3.2: cluster check
Configure the cluster (add nodes, add data services, etc)
  3.1: scinstall
  3.2: scinstall
Cluster configuration utility (quorum, data services, resource groups, etc)
  3.1: scsetup
  3.2: clsetup
Rename
  3.2: cluster rename -c <cluster_name>
Set a property
  3.2: cluster set -p <name>=<value>
List
  3.2: ## display the name of the cluster
       cluster list
       ## list cluster commands
       cluster list-cmds
       ## list the checks
       cluster list-checks
       ## detailed configuration
       cluster show -t global
Status
  3.2: cluster status
Reset the cluster private network settings
  3.2: cluster restore-netprops <cluster_name>
Place the cluster into install mode
  3.2: cluster set -p installmode=enabled
Add a node
  3.1: scconf -a -T node=<host>
  3.2: clnode add -c <clustername> -n <nodename> -e endpoint1,endpoint2 -e endpoint3,endpoint4
Remove a node
  3.1: scconf -r -T node=<host>
  3.2: clnode remove
Prevent new nodes from entering
  3.1: scconf -a -T node=.
Put a node into maintenance state
  3.1: scconf -c -q node=<node>,maintstate
  Note: use the scstat -q command to verify that the node is in maintenance mode; the vote count should be zero for that node.
Get a node out of maintenance state
  3.1: scconf -c -q node=<node>,reset
  Note: use the scstat -q command to verify that the node is out of maintenance mode; the vote count should be one for that node.
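It is worth confirming the quorum maths before and after touching the maintenance state (a 3.1 sketch using the commands above; node2 is a hypothetical node name):

## put node2 into maintenance state and confirm its vote count is now 0
scconf -c -q node=node2,maintstate
scstat -q
## later, bring it back and confirm the vote count returns to 1
scconf -c -q node=node2,reset
scstat -q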

Node Configuration

Add a node to the cluster
  3.2: clnode add [-c <cluster>] [-n <sponsornode>] -e <endpoint> -e <endpoint> <node>
Remove a node from the cluster
  3.2: ## make sure you are on the node you wish to remove
       clnode remove
Evacuate a node from the cluster
  3.1: scswitch -S -h <node>
  3.2: clnode evacuate <node>
Cleanup the cluster configuration (used after removing nodes)
  3.2: clnode clear <node>
List nodes
  3.2: ## standard list
       clnode list [+|<node>]
       ## detailed list
       clnode show [+|<node>]
Change a node's property
  3.2: clnode set -p <name>=<value> [+|<node>]
Status of nodes
  3.2: clnode status [+|<node>]
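Removing a node end to end chains several of these commands (a sketch of the order only, built from the rows above; check the admin guide for where each step must be run):

## drain the node first
clnode evacuate <node>
## make sure you are on the node you wish to remove
clnode remove
## back on a surviving node, clean up what is left behind
clnode clear <node>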

Admin Quorum Device

Quorum devices are nodes and disk devices, so the total quorum will be all nodes and devices added together. You can use the scsetup (3.1) / clsetup (3.2) interface to add or remove quorum devices, or use the commands below.

Adding a SCSI device to the quorum
  3.1: scconf -a -q globaldev=d11
       Note: if you get the error message "unable to scrub device", use scgdevs to add the device to the global device namespace.
  3.2: clquorum add [-t <type>] [-p <name>=<value>] [+|<devicename>]
Adding a NAS device to the quorum
  3.1: n/a
  3.2: clquorum add -t netapp_nas -p filer=<nasdevice>,lun_id=<IDnum> <nasdevice>
Adding a quorum server
  3.1: n/a
  3.2: clquorum add -t quorumserver -p qshost=<IPaddress>,port=<portnumber> <quorumserver>
Removing a device from the quorum
  3.1: scconf -r -q globaldev=d11
  3.2: clquorum remove [-t <type>] [+|<devicename>]
Remove the last quorum device
  3.1: ## evacuate all nodes
       ## put cluster into maint mode
       scconf -c -q installmode
       ## remove the quorum device
       scconf -r -q globaldev=d11
       ## check the quorum devices
       scstat -q
  3.2: ## place the cluster in install mode
       cluster set -p installmode=enabled
       ## remove the quorum device
       clquorum remove <device>
       ## verify the device has been removed
       clquorum list -v
List
  3.2: ## standard list
       clquorum list -v [-t <type>] [-n <node>] [+|<devicename>]
       ## detailed list
       clquorum show [-t <type>] [-n <node>] [+|<devicename>]
       ## status
       clquorum status [-t <type>] [-n <node>] [+|<devicename>]
Resetting quorum info
  3.1: scconf -c -q reset
  3.2: clquorum reset
  Note: this will bring all offline quorum devices online.
Bring a quorum device into maintenance mode (known as disabled in 3.2)
  3.1: ## obtain the device number
       scdidadm -L
       scconf -c -q globaldev=<device>,maintstate
  3.2: clquorum disable [-t <type>] [+|<devicename>]
Bring a quorum device out of maintenance mode (known as enabled in 3.2)
  3.1: scconf -c -q globaldev=<device>,reset
  3.2: clquorum enable [-t <type>] [+|<devicename>]
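Swapping a failing quorum disk for a new one is just an add followed by a remove (a 3.2 sketch; d11 and d12 are example DID devices):

## add the replacement device first so the cluster never runs without a quorum device
clquorum add d12
## then drop the old one and confirm
clquorum remove d11
clquorum status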

Device Configuration

Check device
  3.2: cldevice check [-n <node>] [+]
Remove all devices from a node
  3.2: cldevice clear [-n <node>]
Monitoring
  3.2: ## turn on monitoring
       cldevice monitor [-n <node>] [+|<device>]
       ## turn off monitoring
       cldevice unmonitor [-n <node>] [+|<device>]
Rename
  3.2: cldevice rename -d <destination_device_name>
Replicate
  3.2: cldevice replicate [-S <source-node>] -D <destination-node> [+]
Set properties of a device
  3.2: cldevice set -p default_fencing={global|pathcount|scsi3} [-n <node>] <device>
Status
  3.2: ## standard display
       cldevice status [-s <state>] [-n <node>] [+|<device>]
       ## display failed disk paths
       cldevice status -s fail
List all the configured devices, including paths, across all nodes
  3.1: scdidadm -L
  3.2: ## standard list
       cldevice list [-n <node>] [+|<device>]
       ## detailed list
       cldevice show [-n <node>] [+|<device>]
List all the configured devices, including paths, on the local node only
  3.1: scdidadm -l
  3.2: see above
Reconfigure the device database, creating new instance numbers if required
  3.1: scdidadm -r
  3.2: cldevice populate
       cldevice refresh [-n <node>] [+]
Perform the repair procedure for a particular path (use this when a disk gets replaced)
  3.1: scdidadm -R <c0t0d0s0>   ## by device
       scdidadm -R 2            ## by device id
  3.2: cldevice repair [-n <node>] [+|<device>]
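After physically swapping a failed shared disk the DID database has to be told about the new drive (a sketch tying the commands above together; c1t3d0 and d2 are hypothetical device names):

## 3.1: repair the DID instance for the replaced disk, by path or by instance number
scdidadm -R c1t3d0
## 3.2 equivalent: refresh the paths and repair the DID device
cldevice refresh
cldevice repair d2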

Disk groups

Create a device group
  3.1: n/a
  3.2: cldevicegroup create -t vxvm -n <node-list> -p failback=<true|false> <devgrp>
Remove a device group
  3.1: n/a
  3.2: cldevicegroup delete <devgrp>
Adding
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>:<host>,preferenced=true
  3.2: cldevicegroup add-device -d <device> <devgrp>
Removing
  3.1: scconf -r -D name=<disk group>
  3.2: cldevicegroup remove-device -d <device> <devgrp>
Set a property
  3.2: cldevicegroup set [-p <name>=<value>] [+|<devgrp>]
List
  3.1: scstat
  3.2: ## standard list
       cldevicegroup list [-n <node>] [-t <type>] [+|<devgrp>]
       ## detailed configuration report
       cldevicegroup show [-n <node>] [-t <type>] [+|<devgrp>]
Status
  3.1: scstat
  3.2: cldevicegroup status [-n <node>] [-t <type>] [+|<devgrp>]
Adding a single node
  3.1: scconf -a -D type=vxvm,name=appdg,nodelist=<host>
  3.2: cldevicegroup add-node [-n <node>] [-t <type>] [+|<devgrp>]
Removing a single node
  3.1: scconf -r -D name=<disk group>,nodelist=<host>
  3.2: cldevicegroup remove-node [-n <node>] [-t <type>] [+|<devgrp>]
Switch
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup switch -n <nodename> <devgrp>
Put into maintenance mode
  3.1: scswitch -m -D <disk group>
  3.2: n/a
Take out of maintenance mode
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: n/a
Onlining a disk group
  3.1: scswitch -z -D <disk group> -h <host>
  3.2: cldevicegroup online <devgrp>
Offlining a disk group
  3.1: scswitch -F -D <disk group>
  3.2: cldevicegroup offline <devgrp>
Resync a disk group
  3.1: scconf -c -D name=appdg,sync
  3.2: cldevicegroup sync [-t <type>] [+|<devgrp>]
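Moving a disk group to its other node ahead of storage work is a two-liner (a 3.2 sketch; appdg is the example group from the table above and node2 is a hypothetical node name):

## switch the device group and confirm where it is now primaried
cldevicegroup switch -n node2 appdg
cldevicegroup status appdg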

Transport Cable

Add
  3.2: clinterconnect add <endpoint>,<endpoint>
Remove
  3.2: clinterconnect remove <endpoint>,<endpoint>
  Note: it gets deleted
Enable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=enabled
  3.2: clinterconnect enable [-n <node>] [+|<endpoint>,<endpoint>]
Disable
  3.1: scconf -c -m endpoint=<host>:qfe1,state=disabled
  3.2: clinterconnect disable [-n <node>] [+|<endpoint>,<endpoint>]
List
  3.1: scstat
  3.2: ## standard and detailed list
       clinterconnect show [-n <node>] [+|<endpoint>,<endpoint>]
Status
  3.1: scstat
  3.2: clinterconnect status [-n <node>] [+|<endpoint>,<endpoint>]
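When a private interconnect NIC needs re-cabling you can disable just that endpoint and let traffic fall back to the remaining path (a 3.2 sketch; node1:qfe1 is a hypothetical endpoint):

## disable the cable end, do the physical work, then re-enable and check
clinterconnect disable node1:qfe1
clinterconnect enable node1:qfe1
clinterconnect status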

Resource Groups

Adding (failover)
  3.1: scrgadm -a -g <res_group> -h <host>,<host>
  3.2: clresourcegroup create <res_group>
Adding (scalable)
  3.2: clresourcegroup create -S <res_group>
Adding a node to a resource group
  3.2: clresourcegroup add-node -n <node> <res_group>
Removing
  3.1: scrgadm -r -g <group>
  3.2: ## remove a resource group
       clresourcegroup delete <res_group>
       ## remove a resource group and all its resources
       clresourcegroup delete -F <res_group>
Removing a node from a resource group
  3.2: clresourcegroup remove-node -n <node> <res_group>
Changing properties
  3.1: scrgadm -c -g <resource group> -y <property=value>
  3.2: clresourcegroup set -p <name>=<value> [+|<res_group>]   (e.g. clresourcegroup set -p Failback=true +)
Status
  3.1: scstat -g
  3.2: clresourcegroup status [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>]
Listing
  3.1: scstat -g
  3.2: clresourcegroup list [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>]
Detailed list
  3.1: scrgadm -pv -g <res_group>
  3.2: clresourcegroup show [-n <node>] [-r <resource>] [-s <state>] [-t <resourcetype>]
Display mode type (failover or scalable)
  3.1: scrgadm -pv -g <res_group> | grep 'Res Group mode'
Offlining
  3.1: scswitch -F -g <res_group>
  3.2: ## all resource groups
       clresourcegroup offline +
       ## individual group
       clresourcegroup offline [-n <node>] <res_group>
Onlining
  3.1: scswitch -Z -g <res_group>
  3.2: ## all resource groups
       clresourcegroup online +
       ## individual group
       clresourcegroup online [-n <node>] <res_group>
Evacuate all resource groups from a node (used when shutting down a node)
  3.2: clresourcegroup evacuate [+|-n <node>]
Unmanaging
  3.1: scswitch -u -g <res_group>
  3.2: clresourcegroup unmanage <res_group>
  Note: all resources in the group must be disabled.
Managing
  3.1: scswitch -o -g <res_group>
  3.2: clresourcegroup manage <res_group>
Switching
  3.1: scswitch -z -g <res_group> -h <host>
  3.2: clresourcegroup switch -n <node> <res_group>
Suspend
  3.1: n/a
  3.2: clresourcegroup suspend [+|<res_group>]
Resume
  3.1: n/a
  3.2: clresourcegroup resume [+|<res_group>]
Remaster (move the resource group(s) to their preferred node)
  3.1: n/a
  3.2: clresourcegroup remaster [+|<res_group>]
Restart a resource group (bring offline then online)
  3.1: n/a
  3.2: clresourcegroup restart [-n <node>] [+|<res_group>]
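A planned switchover of a failover group to its secondary node, with a check afterwards (a 3.2 sketch; rg_oracle comes from the HAStoragePlus example below and node2 is a hypothetical node name):

## move the group, then make sure it and its resources came online on node2
clresourcegroup switch -n node2 rg_oracle
clresourcegroup status rg_oracle
clresource status -g rg_oracle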

Resources

Adding a failover network resource
  3.1: scrgadm -a -L -g <res_group> -l <logicalhost>
  3.2: clreslogicalhostname create -g <res_group> <lh-resource>
Adding a shared network resource
  3.1: scrgadm -a -S -g <res_group> -l <logicalhost>
  3.2: clressharedaddress create -g <res_group> <sa-resource>
Adding a failover apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> \
         -y Scalable=False -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin
Adding a shared apache application and attaching the network resource
  3.1: scrgadm -a -j apache_res -g <res_group> \
         -t SUNW.apache -y Network_resources_used=<logicalhost> \
         -y Scalable=True -y Port_list=80/tcp \
         -x Bin_dir=/usr/apache/bin
Create a HAStoragePlus failover resource
  3.1: scrgadm -a -g rg_oracle -j hasp_data01 -t SUNW.HAStoragePlus \
         -x FileSystemMountPoints=/oracle/data01 \
         -x Affinityon=true
  3.2: clresource create -t HAStoragePlus -g <res_group> \
         -p FilesystemMountPoints=<mount-point-list> \
         -p Affinityon=true <rs-hasp>
Removing
  3.1: scrgadm -r -j res-ip
       Note: you must disable the resource first
  3.2: clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]


Changing or adding properties
  3.1: scrgadm -c -j <resource> -y <property=value>
  3.2: ## changing
       clresource set -t <type> -p <name>=<value> +
       ## adding
       clresource set -p <name>+=<value> <resource>
List
  3.1: scstat -g
  3.2: clresource list [-g <res_group>] [-t <resourcetype>] [+|<resource>]
       ## list properties
       clresource list-props [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Detailed list
  3.1: scrgadm -pv -j res-ip
       scrgadm -pvv -j res-ip
  3.2: clresource show [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Status
  3.1: scstat -g
  3.2: clresource status [-s <state>] [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Disable resource monitor
  3.1: scrgadm -n -M -j res-ip
  3.2: clresource unmonitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Enable resource monitor
  3.1: scrgadm -e -M -j res-ip
  3.2: clresource monitor [-n <node>] [-g <res_group>] [-t <resourcetype>] [+|<resource>]
Disabling
  3.1: scswitch -n -j res-ip
  3.2: clresource disable <resource>
Enabling
  3.1: scswitch -e -j res-ip
  3.2: clresource enable <resource>
Clearing a failed resource
  3.1: scswitch -c -h <host>,<host> -j <resource> -f STOP_FAILED
  3.2: clresource clear -f STOP_FAILED <resource>
Find the network of a resource
  3.1: scrgadm -pvv -j <resource> | grep -i network
Removing a resource and resource group
  3.1: ## offline the group
       scswitch -F -g rgroup-1
       ## remove the resource
       scrgadm -r -j res-ip
       ## remove the resource group
       scrgadm -r -g rgroup-1
  3.2: ## offline the group
       clresourcegroup offline <res_group>
       ## remove the resource
       clresource delete [-g <res_group>] [-t <resourcetype>] [+|<resource>]
       ## remove the resource group
       clresourcegroup delete <res_group>
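Putting the pieces together, a complete failover web group on 3.2 looks roughly like this (a sketch only; rg_web, web-lh, apache_res and the property values are hypothetical, SUNW.apache must already be registered, and the clresource options simply mirror the 3.1 scrgadm flags shown above):

## create the group, its logical hostname and the apache resource, then bring it online
clresourcegroup create rg_web
clreslogicalhostname create -g rg_web web-lh
clresource create -g rg_web -t SUNW.apache \
    -p Port_list=80/tcp -p Bin_dir=/usr/apache/bin \
    -p Resource_dependencies=web-lh apache_res
clresourcegroup manage rg_web
clresourcegroup online rg_web
clresourcegroup status rg_web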

Resource Types

Adding (register in 3.2)
  3.1: scrgadm -a -t <resource type>
  3.2: clresourcetype register <type>
Register a resource type to a node
  3.1: n/a
  3.2: clresourcetype add-node -n <node> <type>
Deleting (remove in 3.2)
  3.1: scrgadm -r -t <resource type>
  3.2: clresourcetype unregister <type>
Deregistering a resource type from a node
  3.1: n/a
  3.2: clresourcetype remove-node -n <node> <type>
Listing
  3.1: scrgadm -pv | grep "Res Type name"   (i.e. SUNW.HAStoragePlus)
  3.2: clresourcetype list [<type>]
Listing resource type properties
  3.2: clresourcetype list-props [<type>]
Show resource types
  3.2: clresourcetype show [<type>]
Set properties of a resource type
  3.2: clresourcetype set [-p <name>=<value>] <type>
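Before the HAStoragePlus example earlier will work, the type has to be registered once per cluster (a 3.2 sketch using the commands above):

## register the type and confirm it is known to the cluster
clresourcetype register SUNW.HAStoragePlus
clresourcetype list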
