Student Guide
Sun Microsystems, Inc.
UBRM05-104
500 Eldorado Blvd.
Broomfield, CO 80021
U.S.A.
Revision A
Do not duplicate or distribute without permission from Sun Microsystems, Inc. Use subject to license.
December 14, 2006 2:05 pm
Copyright 2007 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.

Sun, Sun Microsystems, the Sun logo, Sun Fire, Solaris, Sun Enterprise, Solstice DiskSuite, JumpStart, Sun StorEdge, SunPlex, SunSolve, Java, and OpenBoot are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Xylogics product is protected by copyright and licensed to Sun by Xylogics. Xylogics and Annex are trademarks of Xylogics, Inc.

Federal Acquisitions: Commercial Software – Government Users Subject to Standard License Terms and Conditions.

Export Laws. Products, Services, and technical data delivered by Sun may be subject to U.S. export controls or the trade laws of other countries. You will comply with all such laws and obtain all licenses to export, re-export, or import as may be required after delivery to You. You will not export or re-export to entities on the most current U.S. export exclusion lists or to any country subject to U.S. embargo or terrorist controls as specified in the U.S. export laws. You will not use or provide Products, Services, or technical data for nuclear, missile, or chemical biological weaponry end uses.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS, AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.

THIS MANUAL IS DESIGNED TO SUPPORT AN INSTRUCTOR-LED TRAINING (ILT) COURSE AND IS INTENDED TO BE USED FOR REFERENCE PURPOSES IN CONJUNCTION WITH THE ILT COURSE. THE MANUAL IS NOT A STANDALONE TRAINING TOOL. USE OF THE MANUAL FOR SELF-STUDY WITHOUT CLASS ATTENDANCE IS NOT RECOMMENDED.

Export Commodity Classification Number (ECCN) assigned: 7 August 2006
Please Recycle
Table of Contents
About This Course .......... Preface-xxiii
    Course Goals .......... Preface-xxiii
    Course Map .......... Preface-xxiv
    Topics Not Covered .......... Preface-xxv
    How Prepared Are You? .......... Preface-xxvi
    Introductions .......... Preface-xxvii
    How to Use Course Materials .......... Preface-xxviii
    Conventions .......... Preface-xxix
        Icons .......... Preface-xxix
Introducing Sun Cluster Hardware and Software .......... 1-1
    Objectives .......... 1-1
    Relevance .......... 1-2
    Additional Resources .......... 1-3
    Defining Clustering .......... 1-4
        High-Availability (HA) Platforms .......... 1-4
        Platforms for Scalable Applications .......... 1-5
    Sun Cluster 3.2 Hardware and Software Environment .......... 1-6
    Sun Cluster 3.2 Hardware Environment .......... 1-8
        Cluster Host Systems .......... 1-9
        Cluster Transport Interface .......... 1-9
        Public Network Interfaces .......... 1-10
        Boot Disks .......... 1-11
        Administrative Workstation .......... 1-12
        Cluster in a Box .......... 1-19
    Sun Cluster 3.2 Software Support .......... 1-21
        Software Revisions .......... 1-22
    Types of Applications in the Sun Cluster Software Environment .......... 1-25
        Cluster-Unaware (Off-the-Shelf) Applications .......... 1-25
    Sun Cluster 3.2 Software Data Service Support .......... 1-30
        HA and Scalable Data Service Support .......... 1-30
    Exploring the Sun Cluster Software HA Framework .......... 1-32
        Node Fault Monitoring and Cluster Membership .......... 1-32
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
        Network Fault Monitoring .......... 1-32
        Application Traffic Striping .......... 1-33
    Global Storage Services .......... 1-35
        Global Naming (DID Devices) .......... 1-35
        Global Devices .......... 1-36
        Device Files for Global Devices .......... 1-38
        Global File Systems .......... 1-39
        Failover (Non-Global) File Systems in the Cluster .......... 1-40
    Exercise: Guided Tour of the Training Lab .......... 1-41
        Preparation .......... 1-41
        Task .......... 1-41
    Exercise Summary .......... 1-42
Exploring Node Console Connectivity and the Cluster Console Software .......... 2-1
    Objectives .......... 2-1
    Relevance .......... 2-2
    Additional Resources .......... 2-3
    Accessing the Cluster Node Consoles .......... 2-4
        Accessing Serial Port Consoles on Traditional Nodes .......... 2-4
        Accessing Serial Port Node Consoles Using a Terminal Concentrator .......... 2-5
        Sun Terminal Concentrator (Sun NTS) .......... 2-6
        Other Terminal Concentrators .......... 2-7
        Alternatives to a Terminal Concentrator (TC) .......... 2-8
    Accessing the Node Consoles on Domain-Based Servers .......... 2-9
        Sun Enterprise 10000 Servers .......... 2-9
        Sun Fire High-End Servers .......... 2-9
        Sun Fire Midrange Servers .......... 2-9
    Describing Sun Cluster Console Software for an Administration Workstation .......... 2-11
        Console Software Installation .......... 2-11
        Cluster Console Window Variations .......... 2-11
        Cluster Console Tools Look and Feel .......... 2-13
        Cluster Console Common Window .......... 2-14
        Cluster Control Panel .......... 2-15
    Configuring Cluster Console Tools .......... 2-16
        Configuring the /etc/clusters File .......... 2-16
    Exercise: Configuring the Administrative Console .......... 2-22
        Preparation .......... 2-22
        Task 1 – Updating Host Name Resolution .......... 2-23
        Task 2 – Installing the Cluster Console Software .......... 2-23
        Task 3 – Verifying the Administrative Console Environment .......... 2-24
        Task 4 – Configuring the /etc/clusters File .......... 2-24
        Task 5 – Configuring the /etc/serialports File .......... 2-25
        Task 6 – Starting the cconsole Tool .......... 2-26
        Task 7 – Using the ccp Control Panel .......... 2-26
    Exercise Summary .......... 2-27
Preparing for Installation and Understanding Quorum Devices .......... 3-1
    Objectives .......... 3-1
    Relevance .......... 3-2
    Additional Resources .......... 3-3
    Configuring Cluster Servers .......... 3-4
        Boot Device Restrictions .......... 3-4
        Boot Disk JumpStart Software Profile .......... 3-5
        Server Hardware Restrictions .......... 3-5
    Configuring Cluster Storage Connections .......... 3-7
        Cluster Topologies .......... 3-7
        Clustered Pairs Topology .......... 3-8
        Single-Node Cluster Topology .......... 3-14
    Describing Quorum Votes and Quorum Devices .......... 3-15
        Why Have Quorum Voting at All? .......... 3-15
        Failure Fencing .......... 3-16
        Amnesia Prevention .......... 3-16
        Quorum Mathematics and Consequences .......... 3-17
        Two-Node Cluster Quorum Devices .......... 3-18
        Clustered-Pair Quorum Disk Devices .......... 3-19
        Pair+N Quorum Disks .......... 3-20
        N+1 Quorum Disks .......... 3-21
        Quorum Devices in the Scalable Storage Topology .......... 3-22
    Quorum Server Quorum Devices .......... 3-26
    Preventing Cluster Amnesia With Persistent Reservations .......... 3-28
        SCSI-2 and SCSI-3 Reservations .......... 3-31
        SCSI-3 Persistent Group Reservation (PGR) .......... 3-32
        SCSI-3 PGR Scenario With More Than Two Nodes .......... 3-32
        NAS Quorum and Quorum Server Persistent Reservations .......... 3-34
        Intentional Reservation Delays for Partitions With Fewer Than Half of the Nodes .......... 3-34
    Data Fencing .......... 3-36
    Configuring a Cluster Interconnect .......... 3-37
        Point-to-Point Cluster Interconnect .......... 3-37
        Switch-Based Cluster Interconnect .......... 3-38
        Cluster Transport Interface Addresses and Netmask .......... 3-38
        Choosing the Cluster Transport Netmask Based on Anticipated Nodes and Private Subnets .......... 3-39
    Identifying Public Network Adapters .......... 3-43
    Configuring Shared Physical Adapters .......... 3-44
        Configuring the Public Network .......... 3-44
        Allocating a Different VLAN ID for the Private Network .......... 3-44
    Exercise: Preparing for Installation .......... 3-45
        Preparation .......... 3-45
        Task 1 – Verifying the Solaris OS .......... 3-45
        Task 3 – Selecting Quorum Devices .......... 3-46
        Task 5 – Selecting Public Network Interfaces .......... 3-48
    Exercise Summary .......... 3-50
Installing and Configuring the Sun Cluster Software Framework .......... 4-1
    Objectives .......... 4-1
    Relevance .......... 4-2
    Additional Resources .......... 4-3
    Sun Cluster Software Installation and Configuration .......... 4-4
        Introduction to Sun Cluster Package Installation .......... 4-4
        Sun Cluster Packaging .......... 4-4
        Spooling Together CD-ROM 1 of 2 and CD-ROM 2 of 2 .......... 4-5
        Patches for the OS and for the Sun Cluster Software .......... 4-6
    Installing the Sun Cluster Packages With the Java ES Installer .......... 4-7
        Prerequisites for Installing Sun Cluster Software .......... 4-7
        Configuring the User root Environment .......... 4-14
    Sun Cluster Framework Configuration .......... 4-15
        Understanding the installmode Flag .......... 4-15
        Automatic Quorum Configuration (Two-Node Cluster Only) .......... 4-16
        Automatic Reset of installmode Without Quorum Devices (Clusters With More Than Two Nodes Only) .......... 4-17
        Configuration Information Required to Run scinstall .......... 4-17
        Variations in Interactive scinstall .......... 4-21
        Configuring the Entire Cluster at Once .......... 4-21
        Typical Installation Compared to Custom Installation .......... 4-22
    Configuring Using All-at-Once and Typical Modes: Example .......... 4-23
        All-at-Once Introduction and Choosing Typical Compared to Custom Installation .......... 4-24
    Configuring Using One-at-a-Time and Custom Modes: Example (First Node) .......... 4-31
    Configuring Additional Nodes for One-at-a-Time Method: Example .......... 4-42
    Solaris OS Files and Settings Automatically Configured by scinstall .......... 4-48
        Changes to the /etc/hosts File .......... 4-48
        Changes to the /etc/nsswitch.conf File .......... 4-48
        Modifying /etc/hostname.xxx Files to Include IPMP .......... 4-49
        Modifying the /etc/vfstab File .......... 4-50
        Inserting the /etc/notrouter File .......... 4-50
        Modifying the local-mac-address? EEPROM Variable .......... 4-50
    Automatic Quorum Configuration and installmode Resetting .......... 4-51
    Manual Quorum Selection .......... 4-52
        Verifying DID Devices .......... 4-52
        Choosing Quorum and Resetting the installmode Attribute (Two-Node Cluster) .......... 4-53
    Performing Post-Installation Verification .......... 4-58
        Verifying General Cluster Status .......... 4-58
    Exercise: Installing the Sun Cluster Server Software .......... 4-63
        Task 1 – Verifying the Boot Disk .......... 4-63
        Task 2 – Verifying the Environment .......... 4-64
        Task 3 – Updating Local Name Resolution .......... 4-64
        Task 4 – Installing the Sun Cluster Packages .......... 4-65
        Task 5 – Configuring a New Cluster: The All-Nodes-at-Once Method .......... 4-66
        Task 6 – Configuring a New Cluster: The One-Node-at-a-Time Method .......... 4-67
        Task 7 – Verifying an Automatically Selected Quorum Device (Two-Node Cluster) .......... 4-69
        Task 8 – Configuring a Quorum Device (Three-Node Cluster or Two-Node Cluster With No Automatic Selection) .......... 4-70
    Exercise Summary .......... 4-72
Performing Basic Cluster Administration .......... 5-1
    Objectives .......... 5-1
    Relevance .......... 5-2
    Additional Resources .......... 5-3
    Identifying Cluster Daemons .......... 5-4
    Using Cluster Commands .......... 5-7
        Commands Relating to Basic Cluster Administration .......... 5-7
        Additional Commands .......... 5-8
        Cluster Command Self-Documentation .......... 5-8
    Viewing and Administering Cluster Global Properties .......... 5-10
        Renaming the Cluster .......... 5-10
        Setting Other Cluster Properties .......... 5-11
    Viewing and Administering Nodes .......... 5-12
        Viewing Node Status and Configuration .......... 5-12
        Modifying Node Information .......... 5-13
        Viewing Software Release Information on a Node .......... 5-14
    Viewing and Administering Quorum .......... 5-16
        Viewing Quorum Status and Configuration .......... 5-16
        Adding and Removing (and Replacing) Quorum Devices .......... 5-17
        Installing a Quorum Server (Outside the Cluster) .......... 5-17
        Adding a Quorum Server Device to a Cluster .......... 5-18
        Registering NAS Devices and Choosing a NAS Device as a Quorum Device .......... 5-19
        Registering NAS Mounted Directories (for Data Fencing) .......... 5-20
    Viewing and Administering Disk Paths and Settings .......... 5-22
        Displaying Disk Paths .......... 5-22
        Displaying Disk Path Status .......... 5-23
        Changing Disk Path Monitoring Settings .......... 5-24
        Unmonitoring All Non-Shared Devices and Enabling reboot_on_path_failure .......... 5-25
        Viewing Settings Related to SCSI-2 and SCSI-3 Disk Reservations .......... 5-26
        Modifying Properties to Use SCSI-3 Reservations for Disks With Two Paths .......... 5-26
    Viewing and Administering Interconnect Components .......... 5-29
        Viewing Interconnect Status .......... 5-29
        Adding New Private Networks .......... 5-29
    Using the clsetup Command .......... 5-31
    Sun Cluster Manager .......... 5-33
        Logging Into the Sun Java Web Console .......... 5-34
        Accessing Sun Cluster Manager .......... 5-35
    Running Cluster Commands as Non-root User or Role Using Role Based Access Control (RBAC) .......... 5-38
    Controlling Clusters .......... 5-39
        Starting and Stopping Cluster Nodes .......... 5-39
        Booting a SPARC Platform Machine With the -x Command .......... 5-41
        Maintenance Mode Example .......... 5-45
    Modifying Private Network Address and Netmask .......... 5-46
    Exercise: Performing Basic Cluster Administration .......... 5-48
        Preparation .......... 5-48
        Task 1 – Verifying Basic Cluster Configuration and Status .......... 5-48
        Task 2 – Reassigning a Quorum Device .......... 5-49
        Task 3 – Adding a Quorum Server Quorum Device .......... 5-50
        Task 5 – Preventing Cluster Amnesia .......... 5-51
        Task 6 – Changing the Cluster Private IP Address Range .......... 5-52
        Task 7 – (Optional) Navigating Sun Cluster Manager .......... 5-53
    Exercise Summary .......... 5-54
Using VERITAS Volume Manager With Sun Cluster Software .......... 6-1
    Objectives .......... 6-1
    Relevance .......... 6-2
    Additional Resources .......... 6-3
    Introducing VxVM in the Sun Cluster Software Environment .......... 6-4
    Exploring VxVM Disk Groups .......... 6-5
        Shared Storage Disk Groups .......... 6-5
        VERITAS Management on Local Disks (Optional in VxVM 4.x and Above) .......... 6-5
        Sun Cluster Management of Disk Groups .......... 6-7
        Sun Cluster Global Devices Within a Disk Group .......... 6-7
        VxVM Cluster Feature Used Only for Oracle RAC .......... 6-8
    Initializing a VERITAS Volume Manager Disk .......... 6-9
        Traditional Solaris OS Disks and Cross-Platform Data Sharing (CDS) Disks .......... 6-9
    Reviewing the Basic Objects in a Disk Group .......... 6-11
        Disk Names or Media Names .......... 6-11
        Subdisk .......... 6-11
        Plex .......... 6-11
        Volume .......... 6-12
        Layered Volume .......... 6-12
    Exploring Volume Requirements in the Sun Cluster Environment .......... 6-13
        Simple Mirrors .......... 6-13
        Mirrored Stripe (Mirror-Stripe) .......... 6-14
        Striped Mirrors (Stripe-Mirror) .......... 6-15
        Dirty Region Logs for Volumes in the Cluster .......... 6-16
    Viewing the Installation and bootdg/rootdg Requirements in the Sun Cluster Environment .......... 6-17
        Requirements for bootdg/rootdg .......... 6-17
        DMP Restrictions in Sun Cluster 3.2 .......... 6-18
        Installing Supported Multipathing Software .......... 6-19
    Installing VxVM in the Sun Cluster 3.2 Software Environment .......... 6-20
        Using the installer or installvm Utility .......... 6-20
        Manually Using vxdiskadm to Encapsulate the OS Disk .......... 6-24
        Configuring a Pre-Existing VxVM for Sun Cluster 3.2 Software .......... 6-24
    Creating Shared Disk Groups and Volumes .......... 6-26
        Listing Available Disks .......... 6-26
        Initializing Disks and Putting Them Into a New Disk Group .......... 6-27
        Verifying Disk Groups Imported on a Node .......... 6-27
        Building a Mirrored Striped Volume (RAID 0+1) .......... 6-29
        Building a Striped Mirrored Volume (RAID 1+0) .......... 6-30
    Examining Hot Relocation .......... 6-31
    Registering VxVM Disk Groups .......... 6-32
        Using the clsetup Command to Register Disk Groups .......... 6-33
        Viewing and Controlling Registered Device Groups .......... 6-34
    Managing VxVM Device Groups .......... 6-35
        Resynchronizing Device Groups .......... 6-35
        Making Other Changes to Device Groups .......... 6-35
        Putting a Device Group Offline and Back Online .......... 6-35
    Using Global and Failover File Systems on VxVM Volumes .......... 6-37
        Creating File Systems .......... 6-37
        Mounting File Systems .......... 6-37
    Mirroring the Boot Disk With VxVM .......... 6-39
    ZFS as Failover File System Only .......... 6-40
        ZFS Includes Volume Management Layer .......... 6-40
        ZFS Removes Need for /etc/vfstab Entries .......... 6-40
        Example: Creating a Mirrored Pool and Some Filesystems .......... 6-40
    Exercise: Configuring Volume Management .......... 6-42
        Task 1 – Selecting Disk Drives .......... 6-44
        Task 4 – Adding vxio on Any Non-Storage Node on Which You Have Not Installed VxVM .......... 6-46
        Task 5 – Rebooting All Nodes .......... 6-46
        Task 7 – Registering Demonstration Disk Groups .......... 6-48
        Task 8 – Creating a Global nfs File System .......... 6-50
        Task 9 – Creating a Global web File System .......... 6-50
        Task 10 – Testing Global File Systems .......... 6-51
        Task 11 – Managing Disk Device Groups .......... 6-51
        Task 12 – (Optional) Viewing and Managing VxVM Device Groups Using Sun Cluster Manager .......... 6-54
        Task 13 – (Optional) Encapsulating the Boot Disk on a Cluster Node .......... 6-54
    Exercise Summary .......... 6-56
Using Solaris Volume Manager With Sun Cluster Software .......... 7-1
    Objectives .......... 7-1
    Relevance .......... 7-2
    Additional Resources .......... 7-3
    Viewing Solaris Volume Manager in the Sun Cluster Software Environment .......... 7-4
    Exploring Solaris Volume Manager Disk Space Management .......... 7-5
        Solaris Volume Manager Partition-Based Disk Space Management .......... 7-5
xiv
Solaris Volume Manager Disk Space Management With Soft Partitions................................................................... 7-6 Exploring Solaris Volume Manager Disksets ................................ 7-9 Solaris Volume Manager Multi-Owner Disksets (for Oracle RAC) ...................................................................................... 7-10 Using Solaris Volume Manager Database Replicas (metadb Replicas) ......................................................................................... 7-11 Local Replica Management.................................................... 7-11 Shared Diskset Replica Management................................... 7-12 Shared Diskset Replica Quorum Mathematics................... 7-12 Shared Diskset Mediators ...................................................... 7-12 Installing Solaris Volume Manager and Tuning the md.conf File...................................................................................... 7-14 Modifying the md.conf File (Solaris 9 OS Only)................ 7-14 Initializing the Local metadb Replicas on Boot Disk and Mirror................................................................................................. 7-15 Using DIDs Compared to Using Traditional c#t#d#........ 7-15 Adding the Local metadb Replicas to the Boot Disk ......... 7-15 Repartitioning a Mirror Boot Disk and Adding metadb Replicas....................................................................... 7-16 Using the metadb or metadb -i Command to Verify metadb Replicas .......................................................... 7-16 Creating Shared Disksets and Mediators ..................................... 7-17 Automatic Repartitioning and metadb Placement on Shared Disksets ....................................................................... 7-18 Using Shared Diskset Disk Space .................................................. 
7-20 Building Volumes in Shared Disksets With Soft Partitions of Mirrors ........................................................................ 7-21 Using Solaris Volume Manager Status Commands.................... 7-22 Checking Volume Status........................................................ 7-22 Managing Solaris Volume Manager Disksets and Sun Cluster Device Groups .................................................................... 7-24 Managing Solaris Volume Manager Device Groups .................. 7-26 Device Group Resynchronization......................................... 7-26 Other Changes to Device Groups ......................................... 7-26 Putting a Device Group Offline ............................................ 7-26 Using Global and Failover File Systems on Shared Diskset Volumes.......................................................................................... 7-27 Creating File Systems ............................................................. 7-27 Mounting File Systems........................................................... 7-27 Using Solaris Volume Manager to Mirror the Boot Disk ........... 7-28 Verifying Partitioning and Local metadbs.......................... 7-28 Building Volumes for Each Partition Except for Root ....... 7-29 Building Volumes for Root Partition ................................... 7-29 Running the Commands ........................................................ 7-30 Rebooting and Attaching the Second Submirror ............... 7-31
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
    ZFS as Failover File System Only ..... 7-32
    ZFS Includes Volume Management Layer ..... 7-32
    ZFS Removes Need for /etc/vfstab Entries ..... 7-32
    Example: Creating a Mirrored Pool and Some Filesystems ..... 7-32
    Exercise: Configuring Solaris Volume Manager ..... 7-34
    Preparation ..... 7-34
    Task 1 - Initializing the Solaris Volume Manager Local metadb Replicas ..... 7-36
    Task 2 - Selecting the Solaris Volume Manager Demo Volume Disk Drives ..... 7-37
    Task 3 - Configuring Solaris Volume Manager Disksets ..... 7-38
    Task 4 - Configuring Solaris Volume Manager Demonstration Volumes ..... 7-39
    Task 5 - Creating a Global nfs File System ..... 7-40
    Task 6 - Creating a Global web File System ..... 7-41
    Task 8 - Managing Disk Device Groups ..... 7-42
    Task 9 - Viewing and Managing Solaris Volume Manager Device Groups Using Sun Cluster Manager ..... 7-43
    Task 10 - (Optional) Managing and Mirroring the Boot Disk With Solaris Volume Manager ..... 7-43
    Exercise Summary ..... 7-46
Managing the Public Network With IPMP ..... 8-1
    Objectives ..... 8-1
    Relevance ..... 8-2
    Additional Resources ..... 8-3
    Introducing IPMP ..... 8-4
    Describing General IPMP Concepts ..... 8-5
    Defining IPMP Group Requirements ..... 8-5
    Configuring Standby Adapters in a Group ..... 8-7
    Examining IPMP Group Examples ..... 8-8
    Single IPMP Group With Two Members and No Standby ..... 8-8
    Single IPMP Group With Three Members Including a Standby ..... 8-9
    Two IPMP Groups on Different Subnets ..... 8-9
    Two IPMP Groups on the Same Subnet ..... 8-10
    Describing IPMP ..... 8-11
    Network Path Failure Detection ..... 8-11
    Network Path Failover ..... 8-12
    Network Path Failback ..... 8-12
    Configuring IPMP ..... 8-13
    Examining New ifconfig Options for IPMP ..... 8-13
    Putting Test Addresses on Physical or Virtual Interfaces ..... 8-14
    Using ifconfig Commands to Configure IPMP ..... 8-14
    Configuring the /etc/hostname.xxx Files for IPMP ..... 8-15
    Using IPV6 Test Address Only ..... 8-16
    Performing Failover and Failback Manually ..... 8-18
    Configuring IPMP in the Sun Cluster 3.2 Environment ..... 8-19
    Configuring IPMP Before or After Cluster Installation ..... 8-19
    Using Same Group Names on Different Nodes ..... 8-19
    Understanding Standby and Failback ..... 8-20
    Integrating IPMP Into the Sun Cluster 3.2 Software Environment ..... 8-21
    Capabilities of the pnmd Daemon in Sun Cluster 3.2 Software ..... 8-21
    Summary of IPMP Cluster Integration ..... 8-22
    Exercise: Configuring and Testing IPMP ..... 8-24
    Preparation ..... 8-24
    Task 1 - Verifying the local-mac-address? Variable ..... 8-24
    Task 2 - Verifying the Adapters for the IPMP Group ..... 8-25
    Task 3 - Verifying or Entering Test Addresses in the /etc/hosts File ..... 8-25
    Task 4 - Creating /etc/hostname.xxx Files ..... 8-26
    Task 6 - Verifying IPMP Failover and Failback ..... 8-27
Introducing Data Services, Resource Groups, and HA-NFS ..... 9-1
    Objectives ..... 9-1
    Relevance ..... 9-2
    Additional Resources ..... 9-3
    Introducing Data Services in the Cluster ..... 9-4
    Solaris 10 OS Non-Global Zones Act as Virtual Nodes for Data Services ..... 9-4
    Off-the-Shelf Application ..... 9-4
    Sun Cluster 3.2 Software Data Service Agents ..... 9-6
    Reviewing Components of a Data Service Agent ..... 9-7
    Fault Monitor Components ..... 9-7
    Introducing Data Service Packaging, Installation, and Registration ..... 9-8
    Data Service Packages and Resource Types ..... 9-8
    Introducing Resources, Resource Groups, and the Resource Group Manager ..... 9-9
    All Configuration Performed in the Global Zone ..... 9-9
    Resources ..... 9-10
    Resource Groups ..... 9-10
    Resource Group Manager ..... 9-12
    Describing Failover Resource Groups ..... 9-13
    Resources and Resource Types ..... 9-14
    Resource Type Versioning ..... 9-14
    Using Special Resource Types ..... 9-15
    The SUNW.LogicalHostname Resource Type ..... 9-15
    The SUNW.SharedAddress Resource Type ..... 9-15
    Guidelines for Using Global and Failover File Systems ..... 9-17
    HAStoragePlus and ZFS ..... 9-18
    Generic Data Service ..... 9-18
    Understanding Resource Dependencies and Resource Group Dependencies ..... 9-20
    Resource Group Dependencies ..... 9-21
    Configuring Resources and Resource Groups Through Properties ..... 9-22
    Standard Resource Properties ..... 9-22
    Some Significant Standard Resource Properties ..... 9-23
    Resource Group Properties ..... 9-25
    Some Significant Resource Group Properties ..... 9-26
    Specifying Non-Global Zone Names in Place of Node Names ..... 9-28
    Using the clresourcetype (clrt) Command ..... 9-29
    Registering Resource Types ..... 9-29
    Unregistering Types ..... 9-30
    Configuring Resource Groups Using the clresourcegroup (clrg) Command ..... 9-31
    Displaying Group Configuration Information ..... 9-31
    Configuring a LogicalHostname or a SharedAddress Resource ..... 9-33
    Examples of Using clrslh to Add a LogicalHostname ..... 9-33
    Configuring Other Resources Using the clresource (clrs) Command ..... 9-34
    Displaying Resource Configuration Information ..... 9-34
    Complete Resource Group Example for NFS ..... 9-36
    Controlling the State of Resources and Resource Groups ..... 9-38
    Introduction to Resource Group State ..... 9-38
    Introduction to Resource State ..... 9-38
    Suspended Resource Groups ..... 9-42
    Displaying Resource and Resource Group Status Using the clrg status and clrs status Commands ..... 9-43
    Example of Status Commands for a Single Failover Application ..... 9-43
    Using the clsetup Utility for Resource and Resource Group Operations ..... 9-44
    Using the Data Service Wizards in clsetup and Sun Cluster Manager ..... 9-45
    Exercise: Installing and Configuring HA-NFS ..... 9-46
    Preparation ..... 9-46
    Task 1 - Installing and Configuring the HA-NFS Agent and Server ..... 9-47
    Task 2 - Registering and Configuring the Sun Cluster HA-NFS Data Services ..... 9-49
    Task 4 - Observing Sun Cluster HA-NFS Failover Behavior ..... 9-51
    Task 5 - Generating Cluster Failures and Observing Behavior of the NFS Failover ..... 9-52
    Task 9 - Viewing and Managing Resources and Resource Groups Using Sun Cluster Manager ..... 9-57
    Exercise Summary ..... 9-58
Configuring Scalable Services and Advanced Resource Group Relationships ..... 10-1
    Objectives ..... 10-1
    Relevance ..... 10-2
    Additional Resources ..... 10-3
    Using Scalable Services and Shared Addresses ..... 10-4
    Exploring Characteristics of Scalable Services ..... 10-5
    File and Data Access ..... 10-5
    File Locking for Writing Data ..... 10-5
    Using the SharedAddress Resource ..... 10-6
    Client Affinity ..... 10-6
    Load-Balancing Weights ..... 10-6
    Exploring Resource Groups for Scalable Services ..... 10-7
    Understanding Properties for Scalable Groups and Services ..... 10-9
    The Desired_primaries and Maximum_primaries Properties ..... 10-9
    The Load_balancing_policy Property ..... 10-9
    The Load_balancing_weights Property ..... 10-9
    The Resource_dependencies Property ..... 10-10
    Adding Auxiliary Nodes for a SharedAddress Property ..... 10-11
    Reviewing Command Examples for a Scalable Service ..... 10-12
    Controlling Scalable Resources and Resource Groups ..... 10-13
    Resource Group Operations ..... 10-13
    Fault Monitor Operations ..... 10-14
    Using the clrg status and clrs status Commands for a Scalable Application ..... 10-15
    Advanced Resource Group Relationships ..... 10-16
    Weak Positive Affinities and Weak Negative Affinities ..... 10-16
    Strong Positive Affinities ..... 10-17
    Strong Positive Affinity With Failover Delegation ..... 10-18
    Exercise: Installing and Configuring Sun Cluster Scalable Service for Apache ..... 10-20
    Preparation ..... 10-20
    Task 1 - Preparing for Apache Data Service Configuration ..... 10-21
    Task 2 - Configuring the Apache Environment ..... 10-21
    Task 3 - Testing the Server on Each Node Before Configuring the Data Service Resources ..... 10-23
    Task 4 - Registering and Configuring the Sun Cluster Apache Data Service ..... 10-25
    Task 5 - Verifying Apache Web Server Access ..... 10-26
    Task 6 - Observing Cluster Failures ..... 10-26
    Task 7 - Configuring Advanced Resource Group Relationships ..... 10-27
    Exercise Summary ..... 10-29
Performing Supplemental Exercises for Sun Cluster 3.2 Software ..... 11-1
    Objectives ..... 11-1
    Relevance ..... 11-2
    Additional Resources ..... 11-3
    Exercise 1: Running a Clustered Scalable Application in Non-global Zones ..... 11-4
    Preparation ..... 11-4
    Task 1 - Configuring and Installing the Zones ..... 11-5
    Task 2 - Booting the Zones ..... 11-6
    Exercise 2: Integrating Oracle 10g Into Sun Cluster 3.2 Software as a Failover Application ..... 11-9
    Preparation ..... 11-9
    Task 1 - Creating a Logical IP Entry in the /etc/hosts File ..... 11-11
    Task 2 - Creating oracle and dba Accounts ..... 11-11
    Task 3 - Creating a Shared Storage File System for Oracle Software (VxVM) ..... 11-12
    Task 4 - Creating a Shared Storage File System for Oracle Software (SVM) ..... 11-13
    Task 5 - Preparing the oracle User Environment ..... 11-14
    Task 6 - Disabling Access Control of X Server on the Admin Workstation ..... 11-14
    Task 7 - Running the runInstaller Installation Script ..... 11-14
    Task 8 - Preparing an Oracle Instance for Cluster Integration ..... 11-18
    Task 9 - Verifying That Oracle Software Runs on Other Nodes ..... 11-21
    Task 10 - Registering the SUNW.HAStoragePlus Type ..... 11-22
    Task 11 - Installing and Registering Oracle Data Service ..... 11-22
    Task 12 - Creating Resources and Resource Groups for Oracle ..... 11-22
    Task 13 - Verifying That Oracle Runs Properly in the Cluster ..... 11-24
    Exercise 3: Running Oracle 10g RAC in Sun Cluster 3.2 Software ..... 11-25
    Preparation ..... 11-26
    Task 1 - Selecting the Nodes That Will Run Oracle RAC ..... 11-28
    Task 3A - (If Using VxVM) Installing RAC Framework Packages for Oracle RAC With VxVM Cluster Volume Manager ..... 11-30
    Task 3B - (If Using Solaris Volume Manager) Installing RAC Framework Packages for Oracle RAC With Solaris Volume Manager Multi-Owner Disksets ..... 11-31
    Task 4 - Installing Oracle Distributed Lock Manager ..... 11-31
    Task 5B - (If Using Solaris Volume Manager) Creating and Enabling the RAC Framework Resource Group ..... 11-32
    Task 6B - Creating Raw Volumes (Solaris Volume Manager) ..... 11-35
    Task 7 - Configuring Oracle Virtual IPs ..... 11-38
    Task 9A - Creating the dbca_raw_config File (VxVM) ..... 11-40
    Task 9B - Creating the dbca_raw_config File (Solaris Volume Manager) ..... 11-41
    Task 10 - Disabling Access Control on X Server of the Admin Workstation ..... 11-41
    Task 13 - Create Sun Cluster Resources to Control Oracle RAC Through CRS ..... 11-49
    Task 14 - Verifying That Oracle RAC Works Properly in a Cluster ..... 11-52
    Exercise Summary ..... 11-55
Terminal Concentrator ..... A-1
    Viewing the Terminal Concentrator ..... A-2
    Setup Port ..... A-3
    Terminal Concentrator Setup Programs ..... A-3
    Setting Up the Terminal Concentrator ..... A-4
    Connecting to Port 1 ..... A-4
    Enabling Setup Mode ..... A-4
    Setting the Terminal Concentrator Load Source ..... A-5
    Specifying the Operating System Image ..... A-6
    Setting the Serial Port Variables ..... A-6
    Setting the Port Password ..... A-7
    Setting a Terminal Concentrator Default Route ..... A-9
    Using Multiple Terminal Concentrators ..... A-11
    Troubleshooting Terminal Concentrators ..... A-12
    Using the telnet Command to Manually Connect to a Node ..... A-12
    Using the telnet Command to Abort a Node ..... A-12
    Using the Terminal Concentrator help Command ..... A-13
    Identifying and Resetting a Locked Port ..... A-13
Configuring Multi-Initiator SCSI ..... B-1
    Multi-Initiator Overview ..... B-2
    Installing a Sun StorEdge Multipack Device ..... B-3
    Installing a Sun StorEdge D1000 Array ..... B-7
    The nvramrc Editor and nvedit Keystroke Commands ..... B-11
Quorum Server ..... C-1
    Quorum Server Software Installation ..... C-2
    Configuring the Quorum Server ..... C-3
    Starting and Stopping a Quorum Server Daemon ..... C-4
    Clearing a Cluster From the Quorum Server ..... C-6
Role-Based Access Control ..... D-1
    Brief Review of RBAC Terminology ..... D-2
    Role ..... D-2
    Authorization ..... D-2
    Assigning Authorizations ..... D-3
    Command Privileges ..... D-3
    Profiles ..... D-3
    The Basic Solaris User Profile ..... D-4
    RBAC Relationships ..... D-4
    Simplified RBAC Authorizations in the Sun Cluster 3.2 Environment ..... D-5
    RBAC in the Sun Cluster 3.1 Environment (for Backward Compatibility) ..... D-5
    Assigning Sun Cluster Command Privileges to a User ..... D-6
    Assigning Sun Privileges to a Role ..... D-6
Preface
Upon completion of this course, you should be able to:

- Describe the major Sun Cluster software components and functions
- Configure access to node consoles and the cluster console software
- Understand the prerequisites for installing Sun Cluster software, with particular attention to understanding quorum devices
- Install and configure the Sun Cluster 3.2 software
- Manage the Sun Cluster framework
- Configure VERITAS Volume Manager for Sun Cluster software
- Configure Solaris Volume Manager for Sun Cluster software
- Manage public network adapters using IPMP in the Sun Cluster 3.2 environment
- Use resources and resource groups to manage clustered applications
- Use storage resources to manage both global and failover file systems, including ZFS
- Configure a failover application (Network File System (NFS))
- Configure a load-balanced application (Apache)
- Configure applications in non-global zones
- Configure HA-Oracle and Oracle Real Application Clusters (RAC)
Course Map
The following course map enables you to see what you have accomplished and where you are going in reference to the course goals.
Product Introduction
    Introducing Sun Cluster Hardware and Software

Installation
    Exploring Node Console Connectivity and the Cluster Console Software
    Preparing for Installation and Understanding Quorum Devices
    Installing and Configuring the Sun Cluster Software Framework

Operation
    Performing Basic Cluster Administration

Customization
    Using VERITAS Volume Manager for Volume Management
    Managing Volumes With Solaris Volume Manager
    Managing the Public Network With IPMP

Supplemental Exercises
    Performing Supplemental Exercises for Sun Cluster 3.2 Software
- Solaris 10 Operating System (OS) advanced features or differences
- Network administration
- Solaris OS administration, described in SA-239: Intermediate System Administration for the Solaris 9 Operating System and SA-299: Advanced System Administration for the Solaris 9 Operating System, and the Solaris 10 OS equivalents (SA-200-S10, SA-202-S10, and so on)
- Disk storage management, described in ES-222: Solaris Volume Manager Administration and ES-310: VERITAS Volume Manager Administration
Refer to the Sun Services catalog for specific information and registration.
- Can you explain virtual volume management terminology, such as mirroring, striping, concatenation, volumes, and mirror synchronization?
- Can you perform basic Solaris OS administration tasks, such as using the tar and ufsdump commands, creating user accounts, formatting disk drives, using vi, installing the Solaris OS, installing patches, and adding packages?
- Do you have prior experience with Sun hardware and the OpenBoot programmable read-only memory (PROM) technology?
- Are you familiar with general computer hardware, electrostatic precautions, and safe handling practices?
Introductions
Now that you have been introduced to the course, introduce yourselves to one another and to the instructor, addressing the items in the following list.
-   Name
-   Company affiliation
-   Title, function, and job responsibility
-   Experience related to topics presented in this course
-   Reasons for enrolling in this course
-   Expectations for this course
Goals: You should be able to accomplish the goals after finishing this course and meeting all of its objectives.

Objectives: You should be able to accomplish the objectives after completing a portion of the instructional content. Objectives support goals and can support other higher-level objectives.

Lecture: The instructor will present information specific to the objective of the module. This information should help you learn the knowledge and skills necessary to succeed with the activities.

Activities: The activities take on various forms, such as an exercise, self-check, discussion, and demonstration. Activities help to facilitate mastery of an objective.

Visual aids: The instructor might use several visual aids to convey a concept, such as a process, in a visual form. Visual aids commonly contain graphics, animation, and video.
Conventions
The following conventions are used in this course to represent various training elements and alternative learning resources.
Icons
Additional resources: Indicates other references that provide additional information on the topics described in the module.
Discussion: Indicates a small-group or class discussion on the current topic is recommended at this time.
Note: Indicates additional information that can help students but is not crucial to their understanding of the concept being described. Students should be able to understand the concept or complete the task without this information. Examples of notational information include keyword shortcuts and minor system adjustments.

Caution: Indicates that there is a risk of personal injury from a nonelectrical hazard, or a risk of irreversible damage to data, software, or the operating system. A caution indicates that a hazard is possible rather than certain, depending on the action of the user.
Typographical Conventions
Courier is used for the names of commands, files, directories, programming code, and on-screen computer output; for example:

Use ls -al to list all files.
system% You have mail.

Courier is also used to indicate programming constructs, such as class names, methods, and keywords; for example:

Use the getServletInfo method to get author information.
The java.awt.Dialog class contains the Dialog constructor.

Courier bold is used for characters and numbers that you type; for example:

To list the files in this directory, type:
# ls

Courier bold is also used for each line of programming code that is referenced in a textual description; for example:

1 import java.io.*;
2 import javax.servlet.*;
3 import javax.servlet.http.*;

Notice the javax.servlet interface is imported to allow access to its life cycle methods (Line 2).

Courier italic is used for variables and command-line placeholders that are replaced with a real name or value; for example:

To delete a file, use the rm filename command.

Courier italic bold is used to represent variables whose values are to be entered by the student as part of an activity; for example:

Type chmod a+rwx filename to grant read, write, and execute rights for filename to world, group, and users.

Palatino italic is used for book titles, new words or terms, or words that you want to emphasize; for example:

Read Chapter 6 in the User's Guide.
These are called class options.
Module 1
Upon completion of this module, you should be able to:

-   Define the concept of clustering
-   Describe the Sun Cluster 3.2 hardware and software environment
-   Explain the Sun Cluster 3.2 hardware environment
-   View the Sun Cluster 3.2 software support
-   Describe the types of applications in the Sun Cluster 3.2 software environment
-   Identify the Sun Cluster 3.2 software data service support
-   Explore the Sun Cluster 3.2 software high-availability (HA) framework
-   Define global storage services differences
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion: The following questions are relevant to understanding the content of this module:
-   Are there similarities between the redundant hardware components in a standalone server and the redundant hardware components in a cluster?
-   Why is it impossible to achieve industry standards of high availability on a standalone server?
-   Why are a variety of applications that run in the Sun Cluster software environment said to be cluster-unaware? How do cluster-unaware applications differ from cluster-aware applications?
-   What services does the Sun Cluster software framework provide for all the applications running in the cluster?
-   Do global devices and global file systems have uses other than for scalable applications in the cluster?
Additional Resources
Additional resources: The following references provide additional information on the topics described in this module:
-   Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
-   Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
-   Sun Cluster Concepts Guide for Solaris OS, part number 819-2969
Defining Clustering
Clustering is a general term that describes a group of two or more separate servers operating as a harmonious unit. Clusters generally have the following characteristics:
-   Separate server nodes, each booting from its own, non-shared copy of the OS
-   Dedicated hardware interconnect, providing a private transport only between the nodes of the same cluster
-   Multiported storage, providing paths from at least two nodes in the cluster to each physical storage device that stores data for the applications running in the cluster
-   Cluster software framework, providing cluster-specific knowledge to the nodes in the cluster about the health of the hardware and the health and state of their peer nodes
-   General goal of providing a platform for HA and scalability for the applications running in the cluster
-   Support for a variety of cluster-unaware applications and cluster-aware applications
HA Standards
HA standards are usually phrased with wording such as "provides five-nines availability." This means 99.999 percent uptime for the application, or about five minutes of downtime per year. One clean server reboot often already exceeds that amount of downtime.
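As a quick sanity check on the five-nines figure, the arithmetic can be sketched with a one-line awk calculation (a minimal illustration; the numbers come straight from the definition of 99.999 percent uptime over a 365.25-day year):

```shell
# Downtime budget implied by an availability target.
# 99.999% availability leaves 0.001% of the year as allowable downtime.
awk 'BEGIN {
    minutes_per_year = 365.25 * 24 * 60      # about 525,960 minutes
    availability     = 99.999 / 100
    printf "%.2f minutes of downtime per year\n", \
        minutes_per_year * (1 - availability)
}'
# prints: 5.26 minutes of downtime per year
```

The same calculation shows why each additional "nine" is an order of magnitude harder: four nines allows roughly 53 minutes per year, and three nines roughly 8.8 hours.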
How Clusters Provide HA (Inter-Node Failover)
Clusters provide an environment where, in the case of any single hardware or software failure in the cluster, application services and data are recovered automatically (without human intervention) and quickly (faster than a server reboot). The existence of the redundant servers in the cluster and redundant server-storage paths makes this possible.
Sun Cluster 3.2 Hardware and Software Environment

Clusters generally do not require a choice between availability and performance. HA is generally built into scalable applications as well as non-scalable ones. In scalable applications, you might not need to relocate failed applications because other instances are already running on other nodes. You might still need to perform recovery on behalf of failed instances.
-   Support for two to 16 nodes: Nodes can be added, removed, and replaced from the cluster without any interruption of application service.
-   Global device implementation: While data storage must be physically connected on paths from at least two different nodes in the Sun Cluster 3.2 hardware and software environment, all the storage in the cluster is logically available from every node in the cluster using standard device semantics. This provides the flexibility to run applications on nodes that are not even physically connected to the storage holding their data. More information about global device file naming and location can be found in "Global Storage Services" on page 1-35.
-   Global file system implementation: The Sun Cluster software framework provides a global file service independent of any particular application running in the cluster, so that the same files can be accessed on every node of the cluster, regardless of the storage topology.
-   Cluster framework services implemented in the kernel: The Sun Cluster software is tightly integrated with the Solaris 9 OS and Solaris 10 OS kernels. Node monitoring capability, transport monitoring capability, and the global device and file system implementation are implemented in the kernel to provide higher reliability and performance.
-   Off-the-shelf application support: The Sun Cluster product includes data service agents for a large variety of cluster-unaware applications. These are tested programs and fault monitors that make applications run properly in the cluster environment. A full list of data service agents is in "Sun Cluster 3.2 Software Data Service Support" on page 1-30.
-   Support for some off-the-shelf applications as scalable applications with built-in load balancing (global interfaces): The scalable application feature provides a single Internet Protocol (IP) address and load-balancing service for some applications, such as Apache Web Server and Sun Java System Web Server. Clients outside the cluster see the multiple node instances of the service as a single service with a single IP address.
-   Tight integration with Solaris 10 zones: The Sun Cluster 3.2 framework is aware of Solaris 10 zones and can manage failover and scalable applications running in non-global zones.
-   Cluster nodes running the Solaris 9 8/05 (Update 8) OS, or the Solaris 10 11/06 (Update 3) OS, or later
-   The same revision and same update of the OS on each node
-   Separate boot disks on each node (with a preference for mirrored boot disks)
-   One or more public network interfaces per system per subnet (with a preferred minimum of at least two)
-   A redundant private cluster transport interface
-   Dual-hosted, mirrored disk storage
-   One terminal concentrator (or any other console access method)
-   Administrative workstation
Figure 1-1 Typical cluster hardware configuration: an administrative workstation on the public network, a terminal concentrator connected to serial port A on each node, IPMP groups on the public network interfaces, a redundant transport between the nodes, and multihost storage.
The cluster transport provides the following services:

-   Cluster-wide monitoring and recovery
-   Global data access (transparent to applications)
-   Application-specific transport for cluster-aware applications (such as Oracle RAC)
Clusters must be defined with a minimum of two separate private networks that form the cluster transport. You can have more than two private networks (and you can add more later). These private networks can provide a performance benefit in certain circumstances because global data access traffic is striped across all of the transports. Crossover cables are often used in a two-node cluster. Switches are optional when you have two nodes and are required for more than two nodes. The following types of cluster transport hardware are supported:
-   Fast Ethernet
-   Gigabit Ethernet
-   Scalable Coherent Interface (SCI), intended for remote shared memory (RSM) applications. The SCI interconnect is supported only in clusters with a maximum of four nodes.
-   InfiniBand (Solaris 10 OS only): This is a relatively new industry-standard interconnect used outside of the Sun Cluster environment for interconnecting a variety of hosts and storage devices. In the Sun Cluster environment it is supported only as an interconnect between hosts, not between hosts and storage devices.
Boot Disks
The Sun Cluster 3.2 environment requires that the boot disks for each node be local to the node; that is, the boot disks are not connected to, or not visible from, any other node. For example, if the boot device were connected through a Storage Area Network (SAN), it would still be supported as long as the LUN is not visible to any other node. Using two local drives and mirroring the boot disk with VERITAS Volume Manager or Solaris Volume Manager software is recommended. Some of the newer server types already have a built-in hardware RAID mirror for the OS disk; this is supported in the cluster, so you do not need any additional software volume management for the OS disk.
If you are using a serial port console access mechanism (ttya), you will likely want a terminal concentrator to provide the convenience of remote access to your node consoles. A terminal concentrator (TC) is a device that provides data translation from the network to serial port interfaces. Each of the serial port outputs connects to a separate node in the cluster through serial port A. There is always a trade-off between convenience and security. You might prefer to have only dumb-terminal console access to the cluster nodes, and keep these terminals behind locked doors requiring stringent security checks to open them. This is acceptable (although less convenient to administer) for Sun Cluster 3.2 hardware as well.
Administrative Workstation
Included with the Sun Cluster software is the administration console software, which can be installed on any SPARC or x86 Solaris OS workstation. The software can be a convenience in managing the multiple nodes of the cluster from a centralized location. It does not affect the cluster in any other way.
Switches are required for the transport. There must be at least two switches. Each node is connected to each switch to form a redundant transport. A variety of storage topologies are supported. Figure 1-2 on page 1-13 shows an example of a Pair + N topology. All of the nodes in the cluster have access to the data in the shared storage through the global devices and global file system features of the Sun Cluster environment. See Module 3, "Preparing for Installation and Understanding Quorum Devices," for more information about these subjects.
Figure 1-2 Pair + N Topology (four nodes on the public network, joined by a redundant transport)
Figure 1-3 (Node 1 through Node 4 connected through switches)
Figure 1-4
At the time of writing this course, the only supported mechanism for performing the data replication is the Hitachi TrueCopy product, so this topology is limited to Hitachi storage. The data replication topology is ideal for wide-area clusters, where the data replication solution is preferred to the extra connectivity that would be required to actually connect the storage to nodes that are far apart. The reasons for this preference can be:
-   There might be an existing data replication framework in place, and using it within the cluster might be cheaper than physically cross-cabling all nodes to all storage.
-   You might benefit from each node having a physically local copy of the data, because it can guarantee optimal data read behavior.
Using this feature, such servers could use each physical adapter as both a single transport adapter and a single public network adapter.
Illustration of Shared Private and Public Networks
Figure 1-5 is a diagram of a two-node cluster where each node has only two physical network adapters that are capable of tagged VLANs.
Figure 1-5 Two nodes, each with two tagged-VLAN-capable adapters connected to two switches; VLANs V1 and V3 carry public network traffic, and VLAN V2 carries the private interconnect.
As shown in Figure 1-5, the switches are interconnected to support the two adapters on each node existing in the same IPMP group for public network address failover. In addition, each switch is being used as a private network interconnect. The isolation of traffic between the private network and public network adapters is controlled by the VLAN-ID that is assigned to the adapters themselves, which pass the information to the network fabric. This requires switches that are capable of tagged VLANs.
-   Redundant server nodes are required.
-   Redundant transport is required.
-   HA access from each node to data storage is required. That is, at least one of the following is required:
    -   Mirroring across controllers for Just a Bunch Of Disks (JBOD) or for hardware RAID devices without multipathing (such as single-brick 6120s, or 3310 RAID boxes each with only a single RAID controller)
    -   Multipathing from each connected node to HA hardware RAID devices
-   Redundant public network interfaces per subnet are recommended.
-   Mirrored boot disks are recommended.
You should locate redundant components as far apart as possible. For example, on a system with multiple input/output (I/O) boards, you should put the redundant transport interfaces, the redundant public nets, and the redundant storage array controllers on two different I/O boards.
Cluster in a Box
The Sun Cluster 3.2 hardware environment supports clusters with both (or all) nodes in the same physical frame for the following domain-based servers (current at the time of writing this course):
-   Sun Fire 25K/20K/15K/12K servers (high-end servers)
-   Sun Fire 3800 through 6800 and 4900/6900 servers (midrange servers)
-   Sun Enterprise 10000 servers
You should take as many redundancy precautions as possible, for example, running multiple domains in different segments on Sun Fire 3x00 through 6x00 servers and segmenting across the power plane on Sun Fire 6x00 servers.
If a customer chooses to configure the Sun Cluster 3.2 hardware environment in this way, the customer faces a greater chance of a single hardware failure causing downtime for the entire cluster than when clustering independent standalone servers, or domains from domain-based servers, across different physical platforms.
-   Solaris OS software
-   Sun Cluster software
-   Data service application software
-   Logical volume management
An exception is a configuration that uses hardware RAID. This configuration might not require a software volume manager. Figure 1-6 provides a high-level overview of the software components that work together to create the Sun Cluster software environment.
Figure 1-6 Sun Cluster software layers: cluster-unaware applications and their data service agents run in userland, above the Sun Cluster framework userland daemons; the remaining framework portions run in the kernel.
Software Revisions
The following software revisions are supported by Sun Cluster 3.2 software:
-   Solaris 9 OS 8/05 (Update 8) (SPARC only)
-   Solaris 10 OS 11/06 (Update 3) and later (SPARC and x86)
-   VERITAS Volume Manager 4.1 (for SPARC and x86)
-   VERITAS Volume Manager 5.0 (for SPARC only)
-   Solaris Volume Manager (part of the base OS in the Solaris 9 OS and Solaris 10 OS)
Note: New Solaris 10 OS updates might have some lag before being officially supported in the cluster. If there is any question, consult Sun support personnel who have access to the latest support matrices.
Solaris 10 OS Differences
Sun Cluster 3.2 is able to take full advantage of the advanced features of the Solaris 10 OS.
Solaris 10 OS Zones
The Solaris 10 OS zones feature is a powerful facility that allows you to create multiple virtual instances of the Solaris 10 OS, referred to as zones, each running inside a physical instance of the Solaris OS. Each configured non-global zone (the main Solaris 10 OS instance is known as the global zone) appears to be a complete OS instance isolated from every other zone and from the global zone. Each zone has its own zonepath, that is, its own completely separate directory structure for all of its files starting with the root directory. When configuring a non-global zone, its zonepath is expressed as a subdirectory somewhere inside a global zone mounted file system. When you log in to a non-global zone, however, you see the file systems belonging only to that non-global zone, users and processes belonging only to that non-global zone, and so on. Root users logged in to one non-global zone have superuser privileges only in that zone. The Sun Cluster 3.2 software framework is fully aware of Solaris 10 OS zones. You can choose to run cluster applications in non-global zones rather than in the global zone. From the point of view of the cluster framework, each zone that you create acts like a virtual node on which you can potentially run your clustered applications.
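As a concrete illustration of the zonepath and configuration concepts above, a non-global zone is typically created with the standard zonecfg and zoneadm commands. This is a sketch only; the zone name, zonepath, network interface, and address used here are hypothetical:

```shell
# Sketch: configure, install, and boot a non-global zone on Solaris 10.
# The zone name (appzone), zonepath, NIC (bge0), and address are made-up examples.
zonecfg -z appzone <<EOF
create
set zonepath=/zones/appzone
set autoboot=true
add net
set physical=bge0
set address=192.168.1.50/24
end
commit
EOF

zoneadm -z appzone install    # populate the zonepath with the zone's files
zoneadm -z appzone boot       # boot the virtual OS instance
zlogin appzone                # log in; only this zone's files and processes are visible
```

From inside the zone, commands such as ps and df show only that zone's processes and file systems, which is what makes the zone usable as a virtual node for clustered applications.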
ZFS
All Solaris 10 versions supported in Sun Cluster 3.2 include ZFS, Sun's next-generation file system. ZFS includes a built-in volume management layer that makes any other software volume manager unnecessary. Sun Cluster 3.2 supports HA-ZFS. Application data for cluster failover applications can be placed on ZFS file systems, and ZFS storage pools, with all of their included file systems, can fail over between nodes of the cluster.
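The built-in volume management described above means a mirrored pool and its file systems can be created in a few commands. This is a sketch; the pool name and device names are hypothetical, and under HA-ZFS the export/import step is driven by the cluster framework rather than run by hand:

```shell
# Sketch: create a mirrored ZFS pool and a file system for application data.
# Pool and disk names (appdata, c1t0d0, c2t0d0) are made-up examples.
zpool create appdata mirror c1t0d0 c2t0d0
zfs create appdata/oradata
zfs set mountpoint=/oradata appdata/oradata

# Failover moves the whole pool (HA-ZFS automates this between cluster nodes):
zpool export appdata    # on the node releasing the storage
zpool import appdata    # on the node taking over
```

Note that the pool, not an individual file system, is the unit of failover: every file system in the pool moves with it.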
Solaris 10 OS Service Management Facility (SMF)
The Solaris 10 Service Management Facility provides a new way of managing OS services. SMF replaces (more or less) both the entire /etc/rc?.d boot script scheme and the entire inetd framework. Instead, SMF provides a single unified framework for launching and restarting daemons, network services, and in fact anything that might be considered a service. The main advantages of SMF are that:
-   It provides a framework for specifying dependencies and ordering among services.
-   It provides a framework for restarting services, while taking dependencies into account.
-   It can speed up the boot process significantly, because SMF can automatically graph the dependencies and launch the appropriate non-dependent services in parallel.
Sun Cluster 3.2 framework daemons are defined as SMF services. The fact that cluster daemons are defined as SMF services in the Solaris 10 OS makes very little difference in how you manage the cluster. You should never start or stop Sun Cluster framework daemons by hand.
Sun Cluster 3.2 can integrate applications designed as SMF services into the cluster as clustered services. Sun Cluster has SMF-specific proxy agents that turn SMF-based applications, defined outside of the cluster using SMF, into clustered applications, while retaining all of the logic and dependencies implied by the SMF definition.
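The service inspection and restart behavior described above is visible through the standard svcs and svcadm commands (illustrative usage; the ssh service is used here simply because it exists on a default Solaris 10 installation):

```shell
# Sketch: inspect and manage SMF services on Solaris 10.
svcs -a                             # list all services and their current states
svcs -l svc:/network/ssh:default    # show one service in detail, including dependencies
svcs -d svc:/network/ssh:default    # list the services this service depends on
svcadm disable svc:/network/ssh:default
svcadm enable svc:/network/ssh:default
svcadm restart svc:/network/ssh:default
```

The dependency listings produced by svcs -d are the graph SMF uses to order and parallelize service startup at boot.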
The cluster's resource group manager (RGM) coordinates all stopping and starting of the applications, which are never started and stopped by traditional Solaris OS boot methods. A data service agent for the application provides the glue to make it work properly in the Sun Cluster software environment. This includes methods to start and stop the application appropriately in the cluster, as well as fault monitors specific to that application.
Failover Applications
The failover model is the easiest to support in the cluster. Failover applications run on only one node of the cluster at a time. The cluster provides HA by providing automatic restart on the same node or a different node of the cluster. Failover services are usually paired with an application IP address. This is an IP address that always fails over from node to node along with the application. In this way, clients outside the cluster see a logical host name with no knowledge of which node a service is running on. The client should not even be able to tell that the service is running in a cluster.

Note: Both IPv4 addresses and IPv6 addresses are supported.

Multiple failover applications in the same resource group can share an IP address, with the restriction that they must all fail over to the same node together, as shown in Figure 1-7.
Figure 1-7 Failover applications sharing an IP address across Node 1 and Node 2
Note: In Sun Cluster 3.2, the nodes hosting failover applications might actually be zones configured on different nodes rather than the physical node (or global zone). Each zone looks like a virtual node from the point of view of application resource group management.
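In Sun Cluster 3.2, pairing a failover application with an application IP address is typically done by placing a logical host name resource in the application's resource group. The following is a sketch using the 3.2 command set; the group name and host name are hypothetical, and the logical host name would have to resolve in naming services beforehand:

```shell
# Sketch: a failover resource group with a logical host name (the IP address
# that fails over with the application). Names app-rg and app-lh are made up.
clresourcegroup create app-rg
clreslogicalhostname create -g app-rg app-lh
clresourcegroup online -M app-rg     # bring the group online on one node
```

Application resources added later to app-rg would fail over together with the app-lh address, matching the shared-IP restriction described above.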
Scalable Applications
Scalable applications involve running multiple instances of an application in the same cluster and making it look like a single service by means of a global interface that provides a single IP address and load balancing. While scalable applications (Figure 1-8) are still off-the-shelf, not every application can be made to run as a scalable application in the Sun Cluster 3.2 software environment. Applications that write data without any type of locking mechanism might work as failover applications but do not work as scalable applications.
Figure 1-8 Scalable application: requests from Clients 1, 2, and 3 arrive at a global interface (xyz.com) on Node 1, which distributes them over the transport to HTTP application instances running on Nodes 1, 2, and 3, each serving the same web page.
Cluster-Aware Applications
Cluster-aware applications are applications in which knowledge of the cluster is built into the software. They differ from cluster-unaware applications in the following ways:
-   Multiple instances of the application running on different nodes are aware of each other and communicate across the private transport.
-   It is not required that the Sun Cluster software framework RGM start and stop these applications. Because these applications are cluster-aware, they can be started in their own independent scripts, or by hand.
-   Applications are not necessarily logically grouped with external application IP addresses. If they are, the network connections can be monitored by cluster commands. It is also possible to monitor these cluster-aware applications with Sun Cluster 3.2 software framework resource types.
Remote Shared Memory (RSM) Applications
Applications that run on Sun Cluster 3.2 hardware can make use of an application programming interface (API) called Remote Shared Memory (RSM). This API maps data from an application instance running on one node into the address space of an instance running on another node. This can be a highly efficient way for cluster-aware applications to share large amounts of data across the transport. This requires the SCI interconnect. Oracle RAC is the only application supported in the Sun Cluster 3.2 software environment that can make use of RSM.

Note: This course focuses on cluster-unaware applications rather than the types presented on this page. Module 11, "Performing Supplemental Exercises for Sun Cluster 3.2 Software," contains procedures and an optional exercise for running Oracle 10g RAC (without RSM).
-   Sun Cluster HA for Oracle Server (failover)
-   Sun Cluster HA for Oracle E-Business Suite (failover)
-   Sun Cluster HA for Oracle 9i Application Server (failover)
-   Sun Cluster HA for Sun Java System Web Server (failover or scalable)
-   Sun Cluster HA for Sun Java System Application Server (failover)
-   Sun Cluster HA for Sun Java System Application Server EE HADB (failover)
-   Sun Cluster HA for Sun Java System Directory Server (failover)
-   Sun Cluster HA for Sun Java System Messaging Server (failover)
-   Sun Cluster HA for Sun Java System Calendar Server (failover)
-   Sun Cluster HA for Sun Instant Messaging (failover)
-   Sun Cluster HA for Sun Java System Message Queue software (failover)
-   Sun Cluster HA for Apache Web Server (failover or scalable)
-   Sun Cluster HA for Apache Tomcat (failover or scalable)
-   Sun Cluster HA for Domain Name Service (DNS) (failover)
-   Sun Cluster HA for Network File System (NFS) (failover)
-   Sun Cluster HA for Dynamic Host Configuration Protocol (DHCP) (failover)
-   Sun Cluster HA for SAP (failover or scalable)
-   Sun Cluster HA for SAP LiveCache (failover)
-   Sun Cluster HA for SAP Web Application Server (failover or scalable)
-   Sun Cluster HA for Sybase ASE (failover)
-   Sun Cluster HA for Siebel (failover)
-   Sun Cluster HA for Samba (failover)
-   Sun Cluster HA for BEA WebLogic Application Server (failover)
-   Sun Cluster HA for IBM WebSphere MQ (failover)
-   Sun Cluster HA for IBM WebSphere MQ Integrator (failover)
-   Sun Cluster HA for MySQL (failover)
-   Sun Cluster HA for PostgreSQL (failover)
-   Sun Cluster HA for SWIFTalliance Access (failover)
-   Sun Cluster HA for SWIFTalliance Gateway (failover)
-   Sun Cluster HA for Agfa Impax (failover)
-   Sun Cluster HA for Sun N1 Grid Engine (failover)
-   Sun Cluster HA for Sun N1 Service Provisioning System (failover)
-   Sun Cluster HA for Kerberos (failover)
-   Sun Cluster HA for Solaris Containers (a whole Solaris 10 OS zone provided as a failover unit in 3.1; the agent still exists in 3.2 but is not used when treating the zone in the new way as a virtual node)
Figure 1-9 Cluster-aware application instances on two nodes communicating over the private network through clprivnet0 (172.16.193.1 and 172.16.193.2)
Note: Non-global zones can optionally be given their own IP addresses on the clprivnet0 subnet. You would only do this if you wanted to support cluster-aware applications inside non-global zones. At this time, Oracle RAC is not supported in non-global zones.
- Cluster and node names
- Cluster transport configuration
- Device group configuration (including registered VERITAS disk groups or Solaris Volume Manager disksets)
- A list of nodes that can master each device group
- Information about NAS devices
- Data service operational parameter values
- Paths to data service callback methods
- Disk ID (DID) device configuration
- Current cluster status
The CCR is accessed when error or recovery situations occur or when there has been a general cluster status change, such as a node leaving or joining the cluster.
Global Storage Services

Figure 1-10 demonstrates the relationship between typical Solaris OS logical path names and DID instances.
(Figure: through a storage junction, Node 1 sees d1=c0t0d0 and d2=c2t18d0; Node 2 sees d2=c5t18d0 and d3=c0t0d0; Node 3 sees d4=c0t0d0, d5=c1t6d0 (a CD-ROM), and d8191=/dev/rmt/0, a tape drive.)
Figure 1-10 DID Devices

Device files are created for each of the standard eight Solaris OS disk partitions in both the /dev/did/dsk and /dev/did/rdsk directories; for example: /dev/did/dsk/d2s3 and /dev/did/rdsk/d2s3. DIDs themselves are just a global naming scheme and not a global access scheme. DIDs are used as components of Solaris Volume Manager volumes and in choosing cluster quorum devices, as described in Module 3, "Preparing for Installation and Understanding Quorum Devices." DIDs are not used as components of VERITAS Volume Manager volumes.
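The mapping in Figure 1-10 can be illustrated with a short shell sketch. The sample data below is invented to match the figure; on a real cluster you would generate the input from a command such as scdidadm -L, which lists the DID mappings for all nodes:

```shell
# Invented sample modeled on Figure 1-10: DID instance, then the
# node-local Solaris path that maps to it.
cat > /tmp/did_sample <<'EOF'
d1 node1:/dev/rdsk/c0t0d0
d2 node1:/dev/rdsk/c2t18d0
d2 node2:/dev/rdsk/c5t18d0
d3 node2:/dev/rdsk/c0t0d0
EOF
# A DID listed by more than one node is a shared (multihost) device.
awk '{count[$1]++} END {for (d in count) if (count[d] > 1) print d}' /tmp/did_sample
```

The shared disk appears as d2 from both nodes, even though each node reaches it through a different controller.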
Global Devices
The global devices feature of Sun Cluster 3.2 software provides simultaneous access to the raw (character) device associated with storage devices from all nodes, regardless of where the storage is physically attached. This includes individual DID disk devices, CD-ROMs and tapes, as well as VERITAS Volume Manager volumes and Solaris Volume Manager volumes.
The Sun Cluster 3.2 software framework manages automatic failover of the primary node for global device groups. All nodes use the same device path, but only the primary node for a particular device actually talks through the storage medium to the disk device. All other nodes access the device by communicating with the primary node through the cluster transport. In Figure 1-11, all nodes have simultaneous access to the device /dev/vx/rdsk/nfsdg/nfsvol. Node 2 becomes the primary node if Node 1 fails.
(Figure 1-11: Nodes 1 through 4 all access the global device nfsdg/nfsvol through the same path; Node 1 is the primary.)
(Example df output showing the /global/.devices/node@1 and /global/.devices/node@2 file systems, each 81106 blocks and about 6% used.)
The device names under the /global/.devices/node@nodeID arena can be used directly, but, because they are unwieldy, the Sun Cluster 3.2 environment provides symbolic links into this namespace.

For VERITAS Volume Manager and Solaris Volume Manager, the Sun Cluster software links the standard device access directories into the global namespace:

proto192:/dev/vx# ls -l /dev/vx/rdsk/nfsdg
lrwxrwxrwx 1 root root 40 Nov 25 03:57 /dev/vx/rdsk/nfsdg -> /global/.devices/node@1/dev/vx/rdsk/nfsdg/

For individual DID devices, the standard directories /dev/did/dsk, /dev/did/rdsk, and /dev/did/rmt are not global access paths. Instead, Sun Cluster software creates alternate path names under the /dev/global directory that link into the global device space:

proto192:/dev/md/nfsds# ls -l /dev/global
lrwxrwxrwx 1 root other 34 Nov 6 13:05 /dev/global -> /global/.devices/node@1/dev/global/
proto192:/dev/md/nfsds# ls -l /dev/global/rdsk/d3s0
lrwxrwxrwx 1 root root 39 Nov 4 17:43 /dev/global/rdsk/d3s0 -> ../../../devices/pseudo/did@0:3,3s0,raw
proto192:/dev/md/nfsds# ls -l /dev/global/rmt/1
lrwxrwxrwx 1 root root 39 Nov 4 17:43 /dev/global/rmt/1 -> ../../../devices/pseudo/did@8191,1,tp
Note You do have raw (character) device access from one node, through the /dev/global device paths, to the boot disks of other nodes. In other words, while you cannot mount one node's root disk from another node, you can overwrite it with newfs or dd. It is not necessarily advisable to take advantage of this feature.
- The UNIX File System (ufs)
- The VERITAS File System (VxFS)
- The High Sierra File System (hsfs)
The Sun Cluster software makes a file system global with a global mount option. This is typically in the /etc/vfstab file, but can be put on the command line of a standard mount command:

# mount -o global /dev/vx/dsk/nfs-dg/vol-01 /global/nfs

The equivalent mount entry in the /etc/vfstab file is:

/dev/vx/dsk/nfs-dg/vol-01 /dev/vx/rdsk/nfs-dg/vol-01 \
/global/nfs ufs 2 yes global

The global file system works on the same principle as the global device feature. That is, only one node at a time is the primary and actually communicates with the underlying file system. All other nodes use normal file semantics but actually communicate with the primary over the cluster transport. The primary node for the file system is always the same as the primary node for the device on which it is built. The global file system is also known by the following names:
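A quick sanity check of such an entry can be scripted. This is a sketch that works on a scratch copy rather than the real /etc/vfstab; it simply verifies that the line has the seven vfstab fields and that the mount options field includes global:

```shell
# Scratch copy of the vfstab entry shown above (continuation joined).
printf '%s\n' \
  '/dev/vx/dsk/nfs-dg/vol-01 /dev/vx/rdsk/nfs-dg/vol-01 /global/nfs ufs 2 yes global' \
  > /tmp/vfstab.check
# A vfstab line has seven fields; the seventh carries the mount options.
awk 'NF == 7 && $7 ~ /(^|,)global(,|$)/ {print "entry ok"}' /tmp/vfstab.check
```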
- UFS
- VxFS
- Sun StorageTek QFS software
- Solaris ZFS (Solaris 10 OS only)
Note Officially, the acronyms QFS and ZFS do not stand for anything.
Preparation
No special preparation is required for this lab.
Task
You will participate in a guided tour of the training lab. While participating in the guided tour, you will identify the Sun Cluster hardware components including the cluster nodes, terminal concentrator, and administrative workstation. If this course is being run without local access to the equipment, take this opportunity to review the essentials of the Sun Cluster hardware and software, or to familiarize yourself with the remote lab environment.
Exercise Summary
Discussion Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 2
- Describe the different methods for accessing a console
- Access the node consoles on domain-based servers
- Configure the Sun Cluster console software on the administrative workstation
- Use the cluster console tools
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion The following questions are relevant to understanding the content of this module:
- What is the trade-off between convenience and security in different console access methods?
- How do you reach the node console on domain-based clusters?
- What benefits does using a terminal concentrator give you?
- Is installation of the administration workstation software essential for proper cluster operation?
Additional Resources
The following references provide additional information on the topics described in this module:
- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
- Most operations described during this course do not require node console access. Most cluster operations require only that you be logged in on a cluster node as root or as a user with cluster authorizations in the Role-Based Access Control (RBAC) subsystem. It is acceptable to have direct telnet, rlogin, or ssh access to the node.
- You must have node console access for certain emergency and informational purposes. If a node is failing to boot, the cluster administrator will have to access the node console to figure out why. The cluster administrator might like to observe boot messages even in normal, functioning clusters.
The following pages describe the essential difference between console connectivity on traditional nodes that are not domain-based and console connectivity on domain-based nodes: Sun Enterprise 10000 servers, Sun Fire midrange servers (the E4900, E6900, and 3800 through 6800 servers), and Sun Fire high-end servers (the 25K/20K/15K/12K servers).
Accessing the Cluster Node Consoles

The rule for console connectivity is simple. You can connect to the node ttyA interfaces any way you prefer, as long as whatever device you have connected directly to the interfaces does not spuriously issue BREAK signals on the serial line. BREAK signals on the serial port bring a cluster node to the OK prompt, killing all cluster operations on that node.

You can disable node recognition of a BREAK signal by a hardware keyswitch position (on some nodes), a software keyswitch position (on midrange and high-end servers), or a file setting (on all nodes). For those servers with a hardware keyswitch, turn the key to the third position to power the server on and disable the BREAK signal. For those servers with a software keyswitch, issue the setkeyswitch command with the secure option to power the server on and disable the BREAK signal.

For all servers, while running the Solaris OS, uncomment the line KEYBOARD_ABORT=alternate in /etc/default/kbd to disable receipt of the normal BREAK signal through the serial port. This setting takes effect on boot, or by running the kbd -i command as root.

The Alternate Break signal is defined by the particular serial port driver you happen to have on your system. You can use the prtconf command to figure out the name of your serial port driver, and then use man serial-driver to figure out the sequence. For example, for the zs driver, the sequence is carriage return, tilde (~), and Control-B: CR ~ CTRL-B. When the Alternate Break sequence is in effect, only serial console devices are affected.
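The /etc/default/kbd change amounts to uncommenting a single line. The following sketch operates on a scratch copy so it is safe to run anywhere; on a real node you would edit /etc/default/kbd itself as root and then run kbd -i:

```shell
# Scratch copy containing the stock commented-out line.
printf '#KEYBOARD_ABORT=alternate\n' > /tmp/kbd.copy
# Uncomment the line to select the Alternate Break sequence.
sed 's/^#KEYBOARD_ABORT=alternate$/KEYBOARD_ABORT=alternate/' \
    /tmp/kbd.copy > /tmp/kbd.new
grep '^KEYBOARD_ABORT' /tmp/kbd.new
# On the node itself (as root), apply the change without rebooting:
#   kbd -i
```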
A TC is also known as a Network Terminal Server (NTS).
Figure 2-1 shows a terminal concentrator network and serial port interfaces. The node public network interfaces are not shown. While you can attach the TC to the public net, most security-conscious administrators would attach it to a private management network.
Figure 2-1 Terminal Concentrator Network (with an administrative console and a setup device attached)
Most TCs enable you to administer TCP pass-through ports on the TC. When you connect with telnet to the TC's IP address and pass-through port, the TC transfers traffic directly to the appropriate serial port (perhaps with an additional password challenge).
When you use telnet to connect to the Sun TC from an administrative workstation using, for example, the following command, the Sun TC passes you directly to serial port 2, connected to your first cluster node:

# telnet tc-ipname 5002

The Sun TC supports an extra level of password challenge, a per-port password that you have to enter before going through to the serial port. Full instructions for installing and configuring the Sun TC are in Appendix A, "Terminal Concentrator."
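The pass-through convention lends itself to a tiny helper; the function name here is hypothetical, and the arithmetic simply encodes the usual rule that serial port N is reached through TCP port 5000 + N:

```shell
# Hypothetical helper: compute the TC pass-through TCP port for a
# given serial port number (convention: 5000 + serial port).
tc_port() {
    echo $((5000 + $1))
}
# Serial port 2 (the first cluster node in this course's wiring):
tc_port 2
# You would then connect with:  telnet tc-ipname $(tc_port 2)
```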
- Use dumb terminals for each node console. If these are in a secure physical environment, this is certainly the most secure, but least convenient, method.
- Use a workstation that has two serial ports as a tip launchpad, especially for a cluster with only two nodes. You can attach a workstation on the network exactly as you would place a TC, and attach its serial ports to the node consoles. You then add lines to the /etc/remote file of the Solaris OS workstation as follows:

node1:\
:dv=/dev/term/a:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:
node2:\
:dv=/dev/term/b:br#9600:el=^C^S^Q^U^D:ie=%$:oe=^D:

This allows you to access node consoles by accessing the launchpad workstation and manually typing tip node1 or tip node2. One advantage of using a Solaris OS workstation instead of a TC is that it is easier to tighten the security on the Solaris OS workstation. For example, you could easily disable telnet and rlogin access and require that administrators access the tip launchpad through Secure Shell.
- Use network access to the console on servers that specifically support the following types of remote control:
- Advanced Lights Out Manager (ALOM)
- Remote System Control (RSC)
- Service Processor (on Sun Fire V40z x86 nodes)
These servers provide a combination of firmware and hardware that gives direct access to the server's hardware and console. Dedicated Ethernet ports and dedicated IP addresses are used for remote console access and server management.
Accessing the Node Consoles on Domain-Based Servers

You can configure ScApp to provide remote access through telnet, through ssh, or neither. If you configure telnet access, the SC runs an emulation that is similar to the TC, where it listens on TCP ports 5001, 5002, and so forth. Accessing the SC through one of these ports passes you through (perhaps with an extra password challenge) to a domain console or domain shell.
Cluster console (cconsole)

The cconsole program accesses the node consoles through the TC or other remote console access method. Depending on your access method, you might be directly connected to the node console, you might have to respond to additional password challenges, or you might have to log in to an SSP or SC and manually connect to the node console.
Cluster console (crlogin)

The crlogin program accesses the nodes directly using the rlogin command.
Cluster console (ctelnet)

The ctelnet program accesses the nodes directly using the telnet command.
In certain scenarios, some tools might be unusable, but you might still want to install this software to use the other tools. For example, if you are using dumb terminals to attach to the node consoles, you will not be able to use the cconsole variation, but you can use the other two after the nodes are booted. Alternatively, you might have, for example, a Sun Fire 6800 server where telnet and rlogin are disabled in the domains, and only ssh is working as a remote login method. In that case, you might be able to use only the cconsole variation, but not the other two variations.
Figure 2-2
Figure 2-3
Figure 2-4
When you install the Sun Cluster console software on the administrative console, you must manually create the clusters and serialports files and populate them with the necessary information.
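As a sketch of the two file formats (the cluster, node, and TC names here are the hypothetical ones used earlier in this module): each /etc/clusters line maps a cluster name to its member node names, and each /etc/serialports line maps a node name to its console access device and TCP port:

```shell
# Example /etc/clusters content, written to a scratch file:
# cluster name, followed by the names of its nodes.
cat > /tmp/clusters.example <<'EOF'
sc-cluster node1 node2
EOF
# Example /etc/serialports content: node name, console access
# device (a TC here), and the TCP pass-through port for that node.
cat > /tmp/serialports.example <<'EOF'
node1 tc-ipname 5002
node2 tc-ipname 5003
EOF
# With the real files in place you would start the tool with:
#   cconsole sc-cluster
awk '{print $1 " -> " $2 ":" $3}' /tmp/serialports.example
```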
Configuring Cluster Console Tools

For a system using ALOM or RSC access instead of a TC, the columns display the name of the node, the name corresponding to the dedicated ALOM/RSC IP address for that node, and port 23. When the cconsole cluster command is executed, you are connected to the ALOM or RSC for that node. The actual login to the node console is performed manually in each window.

node1 node1-sc 23
node2 node2-sc 23
If you are using a tip launchpad, the columns display the name of the node, the name of the launchpad, and port 23. The actual login to the launchpad is performed manually in each window. The actual connection to each node console is made by executing the tip command and specifying the serial port used for that node in each window.

node1 sc-tip-ws 23
node2 sc-tip-ws 23

For the Sun Fire 3800 through 6800 servers and E4900/E6900 servers, the columns display the domain name, the Sun Fire server main system controller name, and a TCP port number similar to those used with a terminal concentrator. Port 5000 connects you to the platform shell; ports 5001, 5002, 5003, and 5004 connect you to the domain shell or domain console for domains A, B, C, or D, respectively. Whether you reach the domain shell or the domain console depends on the current state of the domain.

sf1_node1 sf1_mainsc 5001
sf1_node2 sf1_mainsc 5002
Using the Tools When ssh Is Required or Preferred to Access the Console
Certain environments might require ssh access to get to the console, or you might prefer to use ssh access even if telnet is also available. Examples are:
- TCs that have ssh enabled
- A tip launchpad on which you have enabled ssh
- Sun high-end servers (ssh to the SC) and the E10000 (ssh to the SSP)
- Sun Fire midrange SCs on which you have enabled ssh (telnet is then automatically disabled)
- The Sun Fire V40z service processor (supports only ssh, not telnet)
To use the tools in these scenarios, you can create an /etc/serialports file similar to the following:

# example for a 6800 (cluster in a box)
node1 my6800sc 22
node2 my6800sc 22
# example for a 15K (cluster in a box)
node1 my15kmainsc 22
node2 my15kmainsc 22
# example for two v40zs (each has its own sp)
v40node1 node1-sp 22
v40node2 node2-sp 22

Now create a file called, for example, /etc/consolewithssh. This is used by the customized telnet script below to figure out which console access devices require ssh access, and what the ssh user name should be. The ssh server on the Sun Fire midrange SC ignores the user name, so the file just includes anything as a placeholder. A Sun Fire 15K SC would have a user defined on the SC (in this example, mysmsuser) that is defined as a domain administrator of both domains in question.

# console access device / user
my6800sc anything
node1-sp admin
node2-sp admin
my15kmainsc mysmsuser
Now you can make a variation of the regular telnet executable:
# mv /usr/bin/telnet /usr/bin/telnet.real
# vi /usr/bin/telnet
#!/bin/ksh
HOST=$1
PORT=$2
LOGINFORSSH=$(cat /etc/consolewithssh | awk '$1 == "'$HOST'" {print $2}')
if [[ $LOGINFORSSH != "" ]]
then
    /usr/bin/ssh -p $PORT -l $LOGINFORSSH $HOST
else
    /usr/bin/telnet.real "$@"
fi
# chmod a+x /usr/bin/telnet
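The heart of the wrapper is the awk lookup; you can exercise it on its own against a scratch file in the /etc/consolewithssh format (the device and user names are the hypothetical ones from the example above):

```shell
# Scratch copy of the lookup table: console access device, ssh user.
cat > /tmp/consolewithssh <<'EOF'
my6800sc anything
node1-sp admin
my15kmainsc mysmsuser
EOF
# The same lookup the wrapper performs for a given host:
HOST=my15kmainsc
awk '$1 == "'$HOST'" {print $2}' /tmp/consolewithssh
```

For my15kmainsc this prints mysmsuser, so the wrapper would run ssh rather than the real telnet.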
Using the Tools When ssh Is Required or Preferred for Direct Access to the Nodes
If you want to use the cluster access tools for direct access to nodes using ssh, you can make the same sort of variation to the /usr/bin/rlogin command, and then use the crlogin variation of the tools. For example, the following globally substitutes ssh for the real rlogin:

# mv /usr/bin/rlogin /usr/bin/rlogin.real
# vi /usr/bin/rlogin
#!/bin/ksh
/usr/bin/ssh "$@"
# chmod a+x /usr/bin/rlogin
- Task 1: Updating Host Name Resolution
- Task 2: Installing the Cluster Console Software
- Task 3: Verifying the Administrative Console Environment
- Task 4: Configuring the /etc/clusters File
- Task 5: Configuring the /etc/serialports File
- Task 6: Starting the cconsole Tool
- Task 7: Using the ccp Control Panel
Preparation
This exercise assumes that the Solaris 10 OS software is already installed on all of the cluster systems. Perform the following steps to prepare for the lab:
1. Ask your instructor for the name assigned to your cluster.
   Cluster name: _______________________________
2. Record the information in Table 2-1 about your assigned cluster before proceeding with this exercise.
Table 2-1 Cluster Names and Addresses

System                   Name           IP Address
Administrative console
TC
Node 1
Node 2
Node 3 (if any)
Exercise: Configuring the Administrative Console

3. Ask your instructor for the location of the Sun Cluster software.
Note If your administrative console is in a remote lab, your instructor will assign you a user login. In this case, the cluster console software is already installed.

2. Check to see if the cluster console software is already installed:
   (# or $) pkginfo SUNWccon
   If it is already installed, you can skip the rest of the steps of this task.
3. Move to the Sun Cluster 3.2 packages directory:
   (# or $) cd sc32_location/Solaris_{sparc_or_x86}/Products
   (# or $) cd sun_cluster/Solaris_10/Packages
4. Verify that you are in the correct location:
   # ls -d SUNWccon
   SUNWccon
5. Install the cluster console software package:
   # pkgadd -d . SUNWccon
Note Create the .profile file in your home directory if necessary, and add the changes.

2. Execute the .profile file to verify the changes that have been made:
   (# or $) . $HOME/.profile
Note You can also log out and log in again to set the new variables.
Perform the following to configure the /etc/clusters file:
Edit the /etc/clusters file, and add a line using the cluster and node names assigned to your system.

Note If you are using a remote lab environment, the /etc/clusters file might already be set up for you. Examine the file.
Perform the following to configure the /etc/serialports file:

Edit the /etc/serialports file and add lines using the node and TC names assigned to your system.

Note In the remote lab environment, this file might already be set up for you. Examine the file.
Note Substitute the name of your cluster for clustername.

3. Place the cursor in the cconsole Common window, and press the Return key several times. You should see a response on all of the cluster host windows. If not, ask your instructor for assistance.
4. If your console device is an ALOM dedicated interface, you might need to enter an ALOM command to access the actual node consoles. Consult your instructor.
5. If the cluster host systems are not booted, boot them now.
   ok boot
6. After all cluster host systems have completed their boot, log in as user root.
7. Practice using the Common window Group Term Windows feature under the Options menu. You can ungroup the cconsole windows, rearrange them, and then group them together again.
Exercise Summary
Discussion Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 3
- List the Sun Cluster software boot disk requirements and hardware restrictions
- Identify typical cluster storage topologies
- Describe quorum votes and quorum devices
- Describe persistent quorum reservations and cluster amnesia
- Describe data fencing
- Configure a supported cluster interconnect system
- Identify public network adapters
- Configure shared physical adapters
Relevance
Discussion The following questions are relevant to understanding the content of this module:
- Is cluster planning required even before installation of the Solaris OS on a node?
- Do certain cluster topologies enforce which applications are going to run on which nodes, or do they just suggest this relationship?
- Why is a quorum device absolutely required in a two-node cluster?
- What is meant by a cluster amnesia problem?
Additional Resources
The following references provide additional information on the topics described in this module:
- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
- Sun Cluster Concepts Guide for Solaris OS, part number 819-2969
You cannot use a shared storage device as a boot device. If a storage device is connected to more than one host, it is shared. Solaris 9 9/05 OS (Update 8) and Solaris 10 11/06 OS (Update 3) or greater are supported, with the following restrictions:
- The same version of the Solaris OS, including the update revision (for example, Solaris 10 11/06), must be installed on all nodes in the cluster.
- VERITAS Dynamic Multipathing (DMP) is not supported. See Module 6, "Using VERITAS Volume Manager With Sun Cluster Software," for more details about this restriction. The supported versions of VERITAS Volume Manager must have the DMP device drivers enabled, but Sun Cluster 3.2 software still does not support an actual multipathing topology under control of Volume Manager DMP.
- Sun Cluster 3.2 software requires a minimum swap partition of 750 megabytes (Mbytes).
- There must be a minimum of 512 Mbytes for the /globaldevices file system.
- Solaris Volume Manager software requires a 20-Mbyte partition for its meta-state databases.
- VERITAS Volume Manager software requires two unused partitions for encapsulation of the boot disk.
Note The /globaldevices file system is modified during the Sun Cluster 3.2 software installation. It is automatically renamed to /global/.devices/node@nodeid, where nodeid represents the number that is assigned to a node when it becomes a cluster member. The original /globaldevices mount point is removed. The /globaldevices file system must have ample space and ample inode capacity for creating both block-special devices and character-special devices. This is especially important if a large number of disks are in the cluster. A file system size of 512 Mbytes should suffice for even the largest cluster configurations.
Note The logging option is the default starting in Solaris 9 9/04 (Update 7), and in all updates of the Solaris 10 OS.
Configuring Cluster Servers

All cluster servers must meet the following minimum hardware requirements:
- Each server must have a minimum of 512 Mbytes of memory.
- Each server must have a minimum of 750 Mbytes of swap space. If you are running applications on the cluster nodes that require swap space, you should allocate at least 512 Mbytes over that requirement for the cluster.
- Servers in a cluster can be heterogeneous, with certain restrictions on mixing certain types of host adapters in the same cluster. For example, Peripheral Component Interconnect (PCI) and SBus adapters cannot be mixed in the same cluster if attached to small computer system interface (SCSI) storage.
- Sun Cluster software never supports more than sixteen nodes. Some storage configurations have restrictions on the total number of nodes supported.
- A shared storage device can connect to as many nodes as the storage device supports.
- Shared storage devices do not need to connect to all nodes of the cluster. However, these storage devices must connect to at least two nodes.
Cluster Topologies
Cluster topologies describe typical ways in which cluster nodes can be connected to data storage devices. While Sun Cluster does not require you to configure a cluster by using specific topologies, the following topologies are described to provide the vocabulary for discussing a cluster's connection scheme. The following are some typical topologies:
- Clustered pairs topology
- Pair+N topology
- N+1 topology
- Multiported (more than two node) N*N scalable topology
- NAS device-only topology
Figure 3-1 Clustered Pairs Topology (Nodes 1 and 2 share one pair of storage arrays and Nodes 3 and 4 share another, with all four nodes on common transport switches)
- Nodes are configured in pairs. You can have any even number of nodes from 2 to 16.
- Each pair of nodes shares storage. Storage is connected to both nodes in the pair.
- All nodes are part of the same cluster.
- You are likely to design applications that run on the same pair of nodes physically connected to the data storage for that application, but you are not restricted to this design.
- Because each pair has its own storage, no one node must have a significantly higher storage capacity than the others.
- The cost of the cluster interconnect is spread across all the nodes.
- This configuration is well suited for failover data services.
Pair+N Topology
As shown in Figure 3-2, the Pair+N topology includes a pair of nodes directly connected to shared storage and nodes that must use the cluster interconnect to access shared storage because they have no direct connection themselves.
Figure 3-2 Pair+N Topology (Nodes 1 and 2 connect directly to the shared storage; Nodes 3 and 4 reach it over the redundant transport switches)
- All shared storage is connected to a single pair.
- Additional cluster nodes support scalable data services or failover data services with the global device and file system infrastructure.
- A maximum of sixteen nodes is supported.
- There are common redundant interconnects between all nodes.
- The Pair+N configuration is well suited for scalable data services.
A limitation of the Pair+N configuration is that there can be heavy data traffic on the cluster interconnects. You can increase bandwidth by adding more cluster transports.
N+1 Topology
The N+1 topology, shown in Figure 3-3, enables one system to act as the storage backup for every other system in the cluster. All of the secondary paths to the storage devices are connected to the redundant or secondary system, which can be running a normal workload of its own.
[Figure: Nodes 1, 2, and 3 are primaries, each with its own storage; Node 4 is the secondary, connected through redundant switches to all three storage arrays]
Figure 3-3 N+1 Topology
The secondary node is the only node in the configuration that is physically connected to all the multihost storage. The backup node can take over without any performance degradation. The backup node is more cost effective because it does not require additional data storage. This configuration is best suited for failover data services.
A limitation of the N+1 configuration is that if there is more than one primary node failure, you can overload the secondary node.
[Figure: Nodes 1-4 connected through a switch, with all nodes physically connected to both shared storage arrays]
Figure 3-4 Scalable Storage Topology
This configuration is required for running the ORACLE Parallel Server/Real Application Clusters (OPS/RAC) across more than two nodes. For ordinary, cluster-unaware applications, each particular disk group or diskset in the shared storage still only supports physical traffic from one node at a time. However, having more than two nodes physically connected to the storage adds flexibility and reliability to the cluster.
[Figure: Nodes 1-4 connected through a switch to a NAS device]
Figure 3-5
Note There is support for using a NAS device as a quorum device, and support for NAS data fencing. These concepts are described in more detail later in this module. To support the specific NetApp Filer implementation (which is the only NAS device supported in Sun Cluster 3.2), you must install a package called NTAPclnas on the cluster nodes. This package is available for NetApp customers with a support contract from http://now.netapp.com.
Figure 3-6
At the time of writing this course, the only supported mechanism for performing the data replication is the Hitachi TrueCopy product, so this topology is limited to Hitachi storage. The data replication topology is ideal for wider area clusters where a data replication solution is preferred to the extra connectivity that would be required to connect the storage directly to nodes that are far apart.
Non-Storage Topology
This can be suitable for an application that is purely compute-based and requires no data storage. Prior to Sun Cluster 3.2, two-node clusters required shared storage because they require a quorum device. Now you can choose to use a quorum server device, described later in this module.
Each node is assigned exactly one vote. Certain devices can be identified as quorum devices and are assigned votes. The following are the types of quorum devices:
Directly attached, multiported disks. Disks are the traditional type of quorum device and have been supported in all versions of Sun Cluster 3.x.
There must be a majority (more than 50 percent of all possible votes present) to form a cluster or remain in a cluster.
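The majority rule can be sketched with a little shell arithmetic; the vote totals below are illustrative, not taken from any particular cluster:

```shell
#!/bin/sh
# Majority rule: a partition needs more than 50 percent of all possible votes.
# Example: two nodes (one vote each) plus one disk quorum device (one vote).
possible=3
needed=$(( possible / 2 + 1 ))   # floor(3/2) = 1, plus 1 = 2 votes needed
echo "possible=$possible needed=$needed"
```

With seven possible votes (four nodes plus three quorum disk votes), the same arithmetic gives a quorum of four, which matches the Pair+N figures later in this module.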
These are two distinct problems, and it is actually quite clever that they are solved by the same quorum mechanism in the Sun Cluster 3.x software. Other vendors' cluster implementations have two distinct mechanisms for solving these problems, making cluster management more complicated.
Failure Fencing
As shown in Figure 3-7, if interconnect communication between nodes ceases, either because of a complete interconnect failure or a node crashing, each node must assume the other is still functional. This is called split-brain operation. Two separate clusters cannot be allowed to exist because of the potential for data corruption. Each node tries to establish a cluster by gaining another quorum vote. Both nodes attempt to reserve the designated quorum device. The first node to reserve the quorum device establishes a majority and remains a cluster member. The node that fails the race to reserve the quorum device aborts the Sun Cluster software because it does not have a majority of votes.
[Figure: two-node split brain: Node 1 (1 vote) and Node 2 (1 vote), each with its own CCR database copy recording the quorum device, racing across the failed interconnect to reserve the quorum device]
Figure 3-7 Failure Fencing
Amnesia Prevention
If it were allowed to happen, a cluster amnesia scenario would involve one or more nodes being able to form the cluster (boot first in the cluster) with a stale copy of the cluster configuration. Imagine the following scenario:
1. In a two-node cluster (Node 1 and Node 2), Node 2 is halted for maintenance or crashes.
2. Cluster configuration changes are made on Node 1.
3. Node 1 is shut down.
4. You try to boot Node 2 to form a new cluster.
There is more information about how quorum votes and quorum devices prevent amnesia in Preventing Cluster Amnesia With Persistent Reservations on page 3-28.
A quorum device must be available to both nodes in a two-node cluster. Quorum device information is maintained globally in the Cluster Configuration Repository (CCR) database. A traditional disk quorum device can also contain user data. The maximum and optimal number of votes contributed by quorum devices should be the number of node votes minus one (N-1). If the number of quorum devices equals or exceeds the number of nodes, the cluster cannot come up if too many quorum devices fail, even if all nodes are available. Clearly, this is unacceptable.
Quorum devices are not required in clusters with more than two nodes, but they are recommended for higher cluster availability. A single quorum device can be automatically configured by scinstall, for a two-node cluster only, using a shared disk quorum device only. All other quorum devices are manually configured after the Sun Cluster software installation is complete. Disk quorum devices are configured (specified) using DID device names.
The total possible quorum votes (the number of nodes plus the number of disk quorum votes defined in the cluster)
The total present quorum votes (the number of nodes booted in the cluster plus the number of disk quorum votes physically accessible by those nodes)
The total needed quorum votes, which is greater than 50 percent of the possible votes
Describing Quorum Votes and Quorum Devices
The consequences are quite simple:
A node that cannot find the needed number of votes at boot time freezes, waiting for other nodes to join to obtain the needed vote count. A node that is booted in the cluster but can no longer find the needed number of votes kernel panics.
[Figure: Node 1 (1 vote) and Node 2 (1 vote) attached to a shared quorum device]
Figure 3-8
[Figure: clustered-pair quorum configuration: Nodes 1-4 (one vote each) connected through switches, with a quorum device in each pair's shared storage]
Figure 3-9
There are many possible split-brain scenarios. Not all of the possible split-brain combinations allow the continuation of clustered operation. The following is true for a clustered pair configuration without the extra quorum device shown in Figure 3-9.
There are six possible votes. A quorum is four votes.
If both quorum devices fail, the cluster can still come up. The nodes wait until all are present (booted).
If Nodes 1 and 2 fail, there are not enough votes for Nodes 3 and 4 to continue running. A token quorum device between Nodes 2 and 3 can eliminate this problem. A Sun StorEdge MultiPack desktop array could be used for this purpose.
An entire pair of nodes can fail, and there are still four votes out of seven.
[Figure: Pair+N quorum configuration: Nodes 1-4 (one vote each), with three quorum devices, QD(1) each, in the storage attached to Nodes 1 and 2]
Figure 3-10 Pair+N Quorum Devices
The following is true for the Pair+N configuration shown in Figure 3-10:
There are three quorum disks. There are seven possible votes. A quorum is four votes.
Nodes 3 and 4 do not have access to any quorum devices.
Node 1 or 2 can start clustered operation by itself.
Up to three nodes can fail (Nodes 1, 3, and 4, or Nodes 2, 3, and 4), and clustered operation can continue.
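The vote accounting above can be checked with a short shell sketch; the numbers are taken from Figure 3-10:

```shell
#!/bin/sh
# Pair+N accounting from Figure 3-10: four node votes plus three quorum disk votes.
nodes=4
qd=3
possible=$(( nodes + qd ))        # 7 possible votes
needed=$(( possible / 2 + 1 ))    # a quorum is 4 votes
# If Nodes 2, 3, and 4 fail, Node 1 still holds its own vote plus all
# three quorum disk votes, so it reaches quorum by itself.
node1=$(( 1 + qd ))
echo "possible=$possible needed=$needed node1=$node1"
```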
[Figure: N+1 quorum configuration: Nodes 1, 2, and 3 (one vote each) with two quorum devices, QD(1) each]
Figure 3-11 N+1 Quorum Devices
The following is true for the N+1 configuration shown in Figure 3-11:
There are five possible votes. A quorum is three votes.
If Nodes 1 and 2 fail, Node 3 can continue.
[Figure: scalable storage topology: Nodes 1, 2, and 3 (one vote each) all attached to a single quorum device, QD(2)]
Figure 3-12 Quorum Devices in the Scalable Storage Topology
The following is true for the quorum devices shown in Figure 3-12:
The single quorum device has a vote count equal to the votes of the nodes directly attached to it minus one.
Note This rule is universal. In all the previous examples, there were two nodes (with one vote each) directly attached to the quorum device, so the quorum device had one vote.
The mathematics and consequences still apply. The reservation is performed using a SCSI-3 Persistent Group Reservation (see SCSI-3 Persistent Group Reservation (PGR) on page 3-32). If, for example, Nodes 1 and 3 can intercommunicate but Node 2 is isolated, Node 1 or 3 can reserve the quorum device on behalf of both of them.
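The vote rule above (quorum device votes equal the number of directly attached nodes minus one) is easy to sketch; the counts below mirror Figure 3-12:

```shell
#!/bin/sh
# Quorum device vote rule: votes = (directly attached nodes) - 1.
attached=3                            # Figure 3-12: Nodes 1, 2, and 3 attach to the device
qd_votes=$(( attached - 1 ))          # 2 votes, matching QD(2) in the figure
possible=$(( attached + qd_votes ))   # 5 possible votes
needed=$(( possible / 2 + 1 ))        # a quorum is 3 votes
echo "qd_votes=$qd_votes possible=$possible needed=$needed"
```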
Note It would seem that in the same race, Node 2 could win and eliminate both Nodes 1 and 3. Intentional Reservation Delays for Partitions With Fewer Than Half of the Nodes on page 3-34 shows why this is unlikely.
[Figure: Nodes 1-4 (one vote each) connected through a switch to a NAS device serving as the quorum device]
On the NetApp Filer side, the requirements for operation as a Sun Cluster quorum device are as follows:
You must install the iSCSI license from your NAS device vendor.
You must configure an iSCSI logical unit (LUN) for use as the quorum device.
When booting the cluster, you must always boot the NAS device before you boot the cluster nodes.
The iSCSI functionality is built into the NTAPclnas package that must be installed on each cluster node. A cluster can use a NAS device for only a single quorum device, and there should be no need for other quorum devices. This is true because, in a cluster of more than two nodes, the NAS quorum device acts like a SCSI-3 quorum device attached to all the nodes. Multiple clusters using the same NAS device can use separate iSCSI LUNs on that device as their quorum devices.
[Figure: Nodes 1-4 (one vote each) connected through a switch to an external machine running the Quorum Server software; the scqsd daemon provides three votes for this cluster and can also serve as quorum for other clusters]
Figure 3-14 Quorum Server Quorum Devices
The characteristics of the quorum server quorum device are:
The same quorum server daemon can be used as a quorum device for an unlimited number of clusters.
The quorum server software must be installed separately on the server (external) side. No additional software is necessary on the cluster side.
A quorum server is especially useful when there is a great physical distance between cluster nodes. It would be an ideal solution for a cluster using the data replication topology. A quorum server can be used on any cluster where you prefer the logic of having a single cluster quorum device with quorum device votes automatically assigned to be one fewer than the node votes. For example, with a clustered pairs topology, you might prefer the simplicity of a quorum server quorum device. In that example, any single node could boot into the cluster by itself, if it could access the quorum server. Of course, you might not be able to run clustered applications unless the storage for a particular application is also available, but those relationships can be controlled properly by the application resource dependencies that we will learn about in Module 9.
In this simple scenario, the problem is that if you were allowed to boot Node 2 at the end, it would not have the correct copy of the CCR. Node 2 would have to use the copy that it has (because there is no other copy available), and you would lose the changes to the cluster configuration made in Step 2. The Sun Cluster software quorum involves persistent reservations that prevent Node 2 from booting into the cluster. It is not able to count the quorum device as a vote. Therefore, Node 2 waits until the other node boots to achieve the correct number of quorum votes.
Survives even if all nodes connected to the device are reset
Survives even after the quorum device itself is powered on and off
Clearly, this involves writing some type of information on the disk itself. The information is called a reservation key and is as follows:
Each node is assigned a unique 64-bit reservation key value.
Every node that is physically connected to a quorum device has its reservation key physically written onto the device. This set of keys is called the keys registered on the device.
Exactly one node's key is recorded on the device as the reservation holder, but this node has no special privileges greater than any other registrant. You can think of the reservation holder as the last node to manipulate the keys, but the reservation holder can later be fenced out by another registrant.
Figure 3-15 and Figure 3-16 on page 3-30 consider two nodes booted into a cluster connected to a quorum device.
[Figure 3-15: Nodes 1 and 2 booted into the cluster, both reservation keys registered on the quorum device]
Preventing Cluster Amnesia With Persistent Reservations
If Node 1 needs to fence out Node 2 for any reason, it preempts Node 2's registered key off of the device. If a node's key is preempted from the device, it is fenced from the device. If there is a split brain, each node races to preempt the other's key.
[Figure: only Node 1's reservation key remains registered on the quorum device after Node 2 leaves the cluster]
Figure 3-16 Node 2 Leaves the Cluster
Now the rest of the equation is clear. The reservation is persistent, so a node booting into the cluster cannot count the quorum device vote unless its reservation key is already registered on the quorum device. Therefore, in the scenario illustrated in the previous paragraph, if Node 1 subsequently goes down so that there are no remaining cluster nodes, only Node 1's key remains registered on the device. If Node 2 tries to boot first into the cluster, it cannot count the quorum vote, and it must wait for Node 1 to boot. After Node 1 joins the cluster, it can detect Node 2 across the transport and add Node 2's reservation key back to the quorum device so that everything is equal again. A reservation key only gets added back to a quorum device by another node in the cluster whose key is already there.
Disks to which there are exactly two paths use SCSI-2.
Disks to which there are more than two paths (for example, any disk with physical connections from more than two nodes) must use SCSI-3.
Module 5 describes how you can change the first of these policies so that you can use SCSI-3 even for disks with only two paths. The next several subsections outline the differences between SCSI-2 and SCSI-3 reservations.
The persistent reservations are not supported directly by the SCSI-2 command set. Instead, they are emulated by the Sun Cluster software. Reservation keys are written (by the Sun Cluster software, not directly by the SCSI reservation mechanism) on private cylinders of the disk (cylinders that are not visible in the format command, but are still directly writable by the Solaris OS). The reservation keys have no impact on using the disk as a regular data disk, where you will not see the private cylinders.
The race (for example, in a split-brain scenario) is still decided by a normal SCSI-2 disk reservation. It is not really a race to eliminate the other's key; it is a race to do a simple SCSI-2 reservation. The winner then uses PGRE to eliminate the other's reservation key.
The persistent reservations are implemented directly by the SCSI-3 command set. Disk firmware itself must be fully SCSI-3 compliant. Removing another node's reservation key is not a separate step from physical reservation of the disk, as it is in SCSI-2. With SCSI-3, the removal of the other node's key is both the fencing and the amnesia prevention. SCSI-3 reservations are generally simpler in the cluster because everything that the cluster needs to do (both fencing and persistent reservations to prevent amnesia) is done directly and simultaneously with the SCSI-3 reservation mechanism. With more than two disk paths (that is, any time more than two nodes are connected to a device), Sun Cluster must use SCSI-3.
[Figure: Nodes 1-4 all connected to a quorum device on which the reservation keys of all four nodes are registered]
Now imagine that because of multiple transport failures there is a partitioning where Nodes 1 and 3 can see each other over the transport, and Nodes 2 and 4 can see each other over the transport. In each pair, the node with the lower reservation key tries to eliminate the registered reservation keys of the other pair. The SCSI-3 protocol ensures that only one pair will remain registered (the operation is atomic). This is shown in Figure 3-18.
[Figure: Nodes 1 and 3 remain registered on the quorum device; the keys of Nodes 2 and 4 have been preempted]
Figure 3-18 One Pair of Nodes Eliminating the Other
In the diagram, Node 1 has successfully won the race to eliminate the keys for Nodes 2 and 4. Because Nodes 2 and 4 have their reservation keys eliminated, they cannot count the three votes of the quorum device. Because they fall below the needed quorum, they kernel panic. Cluster amnesia is avoided in the same way as in a two-node cluster. If you now shut down the whole cluster, Node 2 and Node 4 cannot count the quorum device because their reservation keys are eliminated. They would have to wait for either Node 1 or Node 3 to join. One of those nodes can then add back reservation keys for Node 2 and Node 4.
Intentional Reservation Delays for Partitions With Fewer Than Half of the Nodes
Imagine the same scenario just presented, but three nodes can talk to each other while the fourth is isolated on the cluster transport, as shown in Figure 3-19.
[Figure: Node 3 is isolated on the cluster transport; Nodes 1, 2, and 4 can still intercommunicate; all four reservation keys remain registered on the quorum device]
Figure 3-19 Node 3 Is Isolated on the Cluster Transport
Is there anything to prevent the lone node from eliminating the cluster keys of the other three and making them all kernel panic? In this configuration, the lone node intentionally delays before racing for the quorum device. The only way it can win is if the other three nodes are really dead (or each of them is also isolated, which would make them all delay the same amount).
The delay is implemented when the number of nodes that a node can see on the transport (including itself) is fewer than half the total nodes.
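The delay condition can be expressed as a one-line test; this is a sketch of the rule as described, not the actual cluster implementation:

```shell
#!/bin/sh
# A partition delays its race for the quorum device when the nodes it can
# see on the transport (including itself) number fewer than half the total.
total=4
visible=1                           # the isolated node in Figure 3-19
if [ $(( visible * 2 )) -lt "$total" ]; then
    echo "minority partition: delay before racing"
else
    echo "race immediately"
fi
```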
Data Fencing
As an extra precaution, nodes that are eliminated from the cluster because of quorum problems also lose access to all shared data devices. This eliminates a potential timing problem: the node or nodes that remain in the cluster have no idea whether the nodes being eliminated are actually still running. If they are running, they will kernel panic (after they recognize that they have fallen beneath the required quorum votes). However, the surviving node or nodes cannot wait for the other nodes to kernel panic before taking over the data; the reason nodes are being eliminated is that there has been a communication failure with them. To eliminate this potential timing problem, which could otherwise lead to data corruption, before a surviving node or nodes reconfigures the data and applications, it fences the eliminated node or nodes from all shared data devices, in the following manner:
With the default pathcount policy, a SCSI-2 reservation is used for two-path devices and SCSI-3 for devices with more than two paths. You can change the policy to use SCSI-3 even if there are only two paths (see Module 5). If you use the default SCSI-2 for a two-path device, data fencing is just the reservation and does not include any PGRE.
For NetApp Filer NAS devices, a surviving node informs the NAS device to eliminate the NFS share from the eliminated node or nodes.
Data fencing is released when a fenced node is able to boot successfully into the cluster again.
Note This data fencing is the reason you absolutely cannot put any boot device in a shared storage array. If you do so, a node that is eliminated from the cluster is fenced off from its own boot device.
[Figure: hme2 interfaces on system boards of two nodes connected point-to-point]
Figure 3-20 Point-to-Point Cluster Interconnect
During the Sun Cluster 3.2 software installation, you must provide the names of the end-point interfaces for each cable.
Caution If you provide the wrong interconnect interface names during the initial Sun Cluster software installation, the first node is installed without errors, but when you try to manually install the second node, the installation hangs. You have to correct the cluster configuration error on the first node and then restart the installation on the second node.
[Figure: Nodes 1-4 connected to a switch-based cluster interconnect]
Note If you specify more than two nodes during the initial portion of the Sun Cluster software installation, the use of switches is assumed.
Configuring a Cluster Interconnect
The netmask property associated with the entire cluster transport describes, together with the base address, the entire range of addresses associated with the transport. For example, if you used the default base address of 172.16.0.0 and the default netmask of 255.255.248.0, you would be dedicating an 11-bit range (255.255.248.0 has 11 zero bits at the end) to the cluster transport. This range is 172.16.0.0 through 172.16.7.255.
Note When you set 255.255.248.0 as the cluster transport netmask, you will not see this netmask actually applied to any of the private network adapters. Once again, the cluster uses this netmask to define the entire range it has access to, and then subdivides the range even further to cover the multiple separate networks that make up the cluster transport.
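The range calculation above can be verified with a little shell arithmetic (default base 172.16.0.0 and netmask 255.255.248.0):

```shell
#!/bin/sh
# 255.255.248.0 leaves 11 zero (host) bits, so the transport range covers
# 2^11 = 2048 addresses starting at the base address 172.16.0.0.
host_bits=11
size=$(( 1 << host_bits ))              # 2048 addresses
last_octet3=$(( size / 256 - 1 ))       # highest third octet in the range: 7
echo "range: 172.16.0.0 - 172.16.${last_octet3}.255"
```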
Choosing the Cluster Transport Netmask Based on Anticipated Nodes and Private Subnets
While you can choose the cluster transport netmask by hand, the cluster prefers instead that you specify:
The maximum anticipated number of private networks
The maximum anticipated number of nodes
The Sun Cluster software uses a formula to derive a suggested netmask. Sun Cluster's actual needs are:
Physical subnets (the subnets for the actual private network switches or crossover cables) with a capacity of the actual number of nodes, plus the all-zeros and all-ones addresses. If you say your anticipated maximum is eight nodes, Sun Cluster needs to use subnets that have four bits (not three) for the host part, because you must allow for the all-zeros and all-ones addresses.
A single additional virtual subnet corresponding to the addresses on the clprivnet0 adapters. This needs more capacity than just the number of nodes because you can actually choose to give non-global zones their own IP addresses on this network as well.
When you say you want P physical private network subnets, the cluster uses the conservative formula (4/3) * (P + 2) to calculate the number of subnets it anticipates needing (including the wider clprivnet0 subnet). This should just be considered a conservative formula.
For example, if you say your absolute maximum is eight nodes and four physical subnets, Sun Cluster will calculate:
Four bits required for the host part for actual physical subnets (as above).
Three bits required for subnetting (apply the formula (4/3) * (4+2) and you get eight subnets, which require three bits).
The total is 4 + 3, or seven bits, of private network space. So the suggested netmask will have seven zero bits, which is 255.255.255.128.
Note In Sun Cluster 3.2, if you want to restrict private network addresses to a class C-like space, similar to 192.168.5.0, you can do it easily even with relatively large numbers of nodes and subnets. In the above example, you could support eight nodes and as many as four private networks all in the space 192.168.5.0 - 192.168.5.127.
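The derivation above (eight nodes, four physical subnets) can be sketched as follows; this reproduces the arithmetic in the text, not the actual scinstall code:

```shell
#!/bin/sh
# Host bits: enough for the nodes plus the all-zeros and all-ones addresses.
nodes=8
host_bits=0
cap=1
while [ "$cap" -lt $(( nodes + 2 )) ]; do
    host_bits=$(( host_bits + 1 ))
    cap=$(( cap * 2 ))
done                                     # 4 bits (2^4 = 16 >= 10)
# Subnet bits: ceiling of (4/3)*(P+2) subnets; here (4/3)*6 = 8 subnets.
subnets=4
want=$(( (4 * (subnets + 2) + 2) / 3 ))  # integer ceiling: 8
subnet_bits=0
cap=1
while [ "$cap" -lt "$want" ]; do
    subnet_bits=$(( subnet_bits + 1 ))
    cap=$(( cap * 2 ))
done                                     # 3 bits
echo "total zero bits: $(( host_bits + subnet_bits ))"   # 7 -> 255.255.255.128
```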
# ifconfig -a
lo0: flags=1000849&lt;UP,LOOPBACK,RUNNING,MULTICAST,IPv4&gt; mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843&lt;UP,BROADCAST,RUNNING,MULTICAST,IPv4&gt; mtu 1500 index 2
        inet 129.200.9.2 netmask ffffff00 broadcast 129.200.9.255
        ether 8:0:20:96:3:86
3. Choose a candidate interface (at this point, it might be an actual private net, one ready to be set up as a secondary public net, or one not connected to anything at all).
4. Make sure that the interface you are looking at is not actually on the public net:
a. In one window, run the snoop command for the interface in question:
# ifconfig qfe1 plumb
# snoop -d qfe1
(hope to see no output here)
b. In another window, ping the public broadcast address, and make sure no traffic is seen by the candidate interface:
# ping public_net_broadcast_address
If you do see snoop output, you are looking at a public network adapter. Do not continue. Just return to Step 3 and pick a new candidate.
5. Now that you know your adapter is not on the public net, check to see if it is connected on a private net. Make up some unused subnet address just to test connectivity across a private network. Do not use addresses in the existing public subnet space.
# ifconfig qfe1 192.168.1.1 up
6. Perform Steps 4 and 5 to try to guess the matching candidate on the other node. Choose a corresponding IP address, for example 192.168.1.2.
7. Test that the nodes can ping across each private network, for example:
# ping 192.168.1.2
192.168.1.2 is alive
8. After you have identified the new network interfaces, bring them down again. Cluster installation fails if your transport network interfaces are still up from testing.
# ifconfig qfe1 down unplumb
9. Repeat Steps 3 through 8 with transport adapter candidates for the second cluster transport. Repeat again if you are configuring more than two cluster transports.
Making sure it is not one of those you identified to be used as the private transport
Making sure it can snoop public network broadcast traffic:
# ifconfig ifname plumb
# snoop -d ifname
(other window or node)# ping -s pubnet_broadcast_addr
Task 1 Verifying the Solaris OS
Task 2 Identifying a Cluster Topology
Task 3 Selecting Quorum Devices
Task 4 Verifying the Cluster Interconnect Configuration
Task 5 Selecting Public Network Interfaces
Preparation
To begin this exercise, you must be connected to the cluster hosts through the cconsole tool, and you should be logged into them as user root. Note During this exercise, when you see italicized variable names, such as IPaddress, enclosure_name, node1, or clustername embedded in a command string, substitute the names appropriate for your cluster.
Table 3-1 Topology Configuration
Number of nodes
Number of storage arrays
Types of storage arrays
2. Verify that the storage arrays in your cluster are properly connected for your target topology. Recable the storage arrays if necessary.
Note You can use the strategy presented on page 3-41 if you are remote from the cluster equipment, you want to pretend you are remote, or your instructor does not want to tell you, so that you gain experience doing it yourself. Do not use IP addresses in the existing public network space.
2. Complete the form in Figure 3-22 if your cluster uses an Ethernet-based point-to-point interconnect configuration.
[Form: Node 1 and Node 2, each with blanks for a primary and a secondary interconnect interface]
Figure 3-22 Ethernet Interconnect Point-to-Point Form
Perform the following steps to configure a switch-based Ethernet interconnect:
3. Complete the form in Figure 3-23 on page 3-48 if your cluster uses an Ethernet-based cluster interconnect with switches.
a. Record the logical names of the cluster interconnect interfaces (hme2, qfe1, and so forth).
Note You can use the strategy presented on page 3-41 if you are remote from the cluster equipment, you want to pretend you are remote, or your instructor does not want to tell you, so that you gain experience doing it yourself.
b. Add or delete nodes in the diagram as appropriate.
[Form: Nodes 1-4, each with blanks for a primary and a secondary interconnect interface, connected to Switch 1 and Switch 2]
Figure 3-23 Ethernet Interconnect With Switches Form
4. Verify that each Ethernet interconnect interface is connected to the correct switch. If you are remote, and you do not want to take your instructor's word for it, you can verify that all nodes can ping each other across the private switches by using the strategy presented on page 3-41.
Note – If you have any doubt about the interconnect cabling, consult with your instructor now. Do not continue this exercise until you are confident that your cluster interconnect system is cabled correctly, and that you know the names of the cluster transport adapters.
3-48
Exercise: Preparing for Installation

Perform the following steps to select public network interfaces:
1. Record the logical names of potential IPMP Ethernet interfaces on each node in Table 3-2.
Note – You can use the strategy presented on page 3-43 if you are remote from the cluster equipment, if you want to pretend you are remote, or if your instructor does not want to tell you so that you gain experience doing it yourself. Do not use IP addresses in the existing public network range.

Table 3-2 Logical Names of Potential IPMP Ethernet Interfaces

System            Primary IPMP Interface    Backup IPMP Interface
Node 1            ______                    ______
Node 2            ______                    ______
Node 3 (if any)   ______                    ______
Node 4 (if any)   ______                    ______
Note – It is important that you are sure about the logical name of each public network interface (hme2, qfe3, and so on).
3-49
Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
3-50
Module 4
Upon completion of this module, you should be able to:

- Understand the Sun Cluster installation and configuration steps
- Install the Sun Cluster packages using the Sun Java Enterprise System (Java ES) installer
- Describe the Sun Cluster framework configuration
- Configure a cluster installation using all-at-once and typical modes
- Configure a cluster installation using one-at-a-time and custom modes
- Configure additional nodes for the one-at-a-time method
- Describe the Solaris OS files and settings that are automatically configured by scinstall
- Perform automatic quorum configuration
- Describe the manual quorum selection
- Perform post-installation configuration
4-1
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion – The following questions are relevant to understanding the content of this module:

- What advantages are we given by standardizing on the Java Enterprise System (Java ES) installer as the utility to lay down the cluster packages?
- Why might you not want the configuration utilities to automatically choose a quorum device for you for a two-node cluster?
- Are there pieces of the cluster configuration you might need to configure manually after running the configuration utility scinstall?
4-2
Additional Resources
The following references provide additional information on the topics described in this module:
- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
4-3
Installing the Sun Cluster software packages and configuring the cluster framework must be done as separate steps, although both steps can be automated as part of a Solaris JumpStart installation.
The Sun Cluster software packages can be installed in one of two ways:

- Using the Java ES installer utility that is always packaged with the Sun Cluster framework (this can also be automated as part of a JumpStart install).
- Using a flash image that was created on a system where the Sun Cluster packages have been installed using the Java ES installer. The flash image must have been created before the cluster was actually configured with scinstall.
- The full collection of Java ES software (Sun Cluster, Java System Web Server, Messaging Server, Directory Server, and so on).
- A partial set of Java ES software, containing only availability-related products. This is called the Java ES Availability Suite.
- Two CD-ROMs that, combined, make up the software distribution for either SPARC or x86 versions
- A single DVD containing the distribution for both SPARC and x86
4-4
4-5
- OS and hardware patches – These must be installed before the Sun Cluster framework packages are installed. OS patches should be consistent between cluster nodes, although when you add new ones after cluster configuration it is almost always possible to do so in a rolling fashion, patching and rebooting one node at a time.
- Sun Cluster framework patches – Available patches should be added after the Sun Cluster framework packages are installed, but before the Sun Cluster framework is configured. You add these patches manually before you configure the Sun Cluster framework with scinstall. If you need to add new patches after the cluster is already installed and your applications are already running, you will usually be able to do so in a rolling fashion.

- Sun Cluster Data Service Agent patches – You cannot add these until you install the particular agent being patched.
4-6
4-7
The Common Agent Container (CACAO) This is a Java application that serves as a container for a Sun Cluster remote management interface that is implemented as a series of Java objects. In the standard Sun Cluster implementation, this interface is used only by the Sun Cluster Manager web interface and by the data service wizards that are part of clsetup.
- Java Runtime Environment 1.5
- The Java Management Development Kit (JMDK) runtime libraries – These Java libraries are required by the management objects.
- Java Activation Framework – These are libraries that allow run-time inspection of Java classes.
- Java Assisted Take-off (JATO) – This is a Web framework, based on the Java language, that is used by the Sun Cluster Manager.
- Java Mail – This is a Java library implementation for sending electronic mail.
- Ant – This is a Java-based utility for compilation of projects (sort of a platform-independent make).
- The Sun Java Web Console Application 3.0.2 – Sun Java Web Console serves as a single point of entry and single sign-on (SSO) for a variety of Web-based applications, including the Sun Cluster Manager used in the Sun Cluster software.
4-8
Auxiliary Components Listed as Separate Applications
For the initial release of Sun Cluster 3.2, there is one component listed on the Java ES install screens as a separate application: Java DB, Sun's implementation of the Apache pure-Java database standard known as Derby. Like the shared components, it is required so that the Java ES installer can install or upgrade the Java DB software as you install Sun Cluster.
4-9
Figure 4-1
4-10
Component Selection Screen
You can choose the Java ES components that you want. The screen shown in Figure 4-2 is stretched to show the entire component selection screen from the Java ES Availability Suite distribution. Recall that this distribution includes only the availability products. In the screenshot, Sun Cluster 3.2 is selected. This includes the Sun Cluster core product and the Sun Cluster Manager. A check box near the bottom of the screen allows you to install multilingual support. Adding the multilingual packages for Sun Cluster can significantly increase the time required for your installation. Note that you do not need to explicitly select the All Shared Components item; a later screen informs you about which of these need to be installed or upgraded. Note that Java DB is listed separately, and is marked as required because we have chosen to install Sun Cluster.
Figure 4-2
4-11
Shared Components Screen
The screen shown in Figure 4-3 shows only the shared components that the Java ES installer has determined must be installed or upgraded. Note that in this example some shared components listed on a previous page of this module, such as JRE 1.5 and Web Console, are not shown. This is because the base OS we are running (Solaris 10 Update 3) already contains the correct versions of these components; this determination is made automatically, correctly, and silently by the Java ES installer.
Figure 4-3
4-12
Java ES Configuration: Configure Later
For some of the other Java System applications, the Java ES installer has the ability to actually configure the applications. It does not have this ability for the Sun Cluster framework, and you must choose the Configure Later option, as shown in Figure 4-4.
Figure 4-4
Note – If you choose Configure Now, you get a screen saying that this option is not applicable for Sun Cluster. However, you also get an extra screen asking if you want to enable the remote configuration utility (discussed later). If you choose Configure Later, as shown here, the installer assumes you do want to enable remote configuration.
Software Installation
After confirmation, the installer proceeds to install the shared components (the auxiliary software identified on page 4-8), and then the actual Sun Cluster framework packages. You do not need to reboot your node before proceeding to the configuration of the Sun Cluster framework. You will be allowing the configuration utility, scinstall, to reboot all of your nodes.
4-13
4-14
- Using the scinstall utility interactively – This is the most common method of configuring Sun Cluster, and the only one that is described in detail in this module.
- Using JumpStart software off a JumpStart server – The scinstall utility can be run on the JumpStart server to provision that server so that the client (node) performs cluster configuration as part of the JumpStart finish scripts. If you used this method, you would manually provision the JumpStart server in one of two ways:
  - Provide clients a flash image from a cluster node on which the Java ES installer had been used to install the cluster software packages. The JumpStart finish scripts then run the scinstall configuration functionality.
  - Provide the clients a completely fresh OS (or a flash image from a Solaris OS without cluster packages). The JumpStart finish scripts then run the Java ES installer to provide the cluster packages, and then run the scinstall configuration functionality.
The first node installed (node ID 1) has a quorum vote of one. All other nodes have a quorum vote of zero.

This allows you to complete the rebooting of the second node into the cluster while maintaining the quorum mathematics rules. If the second node had a vote (making a total of two in the cluster), the first node would kernel panic when the second node was rebooted after the cluster software was installed, because the first node would lose operational quorum.
4-15
Sun Cluster Framework Configuration

One important side effect of the installmode flag is that you must be careful not to reboot the first node (node ID 1) until you can choose quorum devices and eliminate (reset) the installmode flag. If you accidentally reboot the first node, all the other nodes kernel panic because they have zero votes out of a possible total of one. If the installation is a single-node cluster, the installmode flag is not set. Post-installation steps to choose a quorum device and reset the installmode flag are unnecessary.
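The quorum arithmetic behind this behavior can be sketched in a few lines. This is an illustrative sketch, not Sun code: a cluster retains operational quorum only while the surviving votes form a strict majority of all configured votes, and a shared-disk quorum device carries one vote fewer than the number of nodes attached to it.

```python
def majority(total_votes):
    # Operational quorum requires a strict majority of all configured votes.
    return total_votes // 2 + 1

# While installmode is set on a two-node cluster: node 1 has one
# vote, node 2 has zero, so the total is 1 and the majority is 1.
assert majority(1) == 1
# Node 1 alone keeps quorum, so rebooting node 2 is safe. But if
# node 1 reboots, the other nodes hold 0 of a possible 1 vote.

# After installmode is reset with a quorum disk configured: each
# node has one vote, and the dual-ported disk carries 2 - 1 = 1.
total = 2 + (2 - 1)
assert majority(total) == 2   # either node plus the disk survives
```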
- You want to choose the quorum yourself.
- You have a dual-ported disk or LUN that is not capable of being a quorum device.
- You want to use a NAS device as a quorum device.
- You want to use the quorum server as a quorum device.
4-16
Automatic Reset of installmode Without Quorum Devices (Clusters With More Than Two Nodes Only)
In clusters with more than two nodes, scinstall inserts a script to automatically reset the installmode flag. It does not automatically configure a quorum device; you still have to do that manually after the installation. By resetting installmode, each node is assigned its proper single quorum vote.
The nodes in the cluster must be able to resolve each other's host names. If for some reason this is true but the names are not in each node's /etc/hosts file (for example, the names can be resolved only through NIS or DNS), the /etc/hosts file is automatically modified to include these names.
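As an illustration, using the node names that appear later in this module (the IP addresses below are placeholders invented for this example, not values from the course), the resulting /etc/hosts entries might look like:

```
# Fragment added automatically by scinstall when the node names
# were previously resolvable only through NIS or DNS.
# (Addresses shown are hypothetical.)
192.168.1.101   vincent
192.168.1.102   theo
```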
4-17
Cluster Transport IP Network Number and Netmask
As described in Module 3, the default cluster transport IP address range begins with 172.16.0.0, and the netmask is 255.255.248.0. You should keep the default if it causes no conflict with anything else visible on any other networks. It is perfectly fine for multiple clusters to use the same addresses on their cluster transports, because these addresses are not visible anywhere outside the cluster.

Note – The netmask refers to the range of IP addresses that are reserved for all possible cluster transport addresses. This will not match the actual netmask that you will see configured on the transport adapters if you check using ifconfig -a.

If you need to specify a different IP address range for the transport, you can do so. Rather than being asked initially for a specific netmask, you are asked for the anticipated maximum number of nodes and physical private networks. A suggested netmask is then calculated for you.
4-18
4-19
Using DES Authentication for Nodes to Authenticate With Each Other as They Join the Cluster
By default, nodes are authenticated as they join the cluster using standard UNIX authentication. A reverse IP address lookup is done for a node trying to join the cluster, and if the resulting name is in the list of nodes that you typed in as nodes for this cluster, it is allowed to add itself to the cluster. The reasons for considering more stringent authentication are the following:
- Nodes that are adding themselves to the cluster communicate across the public network.
- A bogus node adding itself to the cluster can add bogus information to the CCR of existing nodes, or, in fact, copy out any file from existing nodes.

DES authentication, also known as secure remote procedure call (secure RPC) authentication, is a much stronger form of authentication that cannot be defeated by something simple like spoofing IP addresses.
4-20
- The node you are driving from will be the last node to join the cluster, because it needs to configure and reboot all the other nodes first. If you care about which node IDs are assigned to the nodes, you should drive from the node that you want to have the highest node ID, and list the other nodes in reverse order.
- The Sun Cluster software packages must already be installed on all nodes (by the Java ES installer).
- You therefore do not need remote shell access (neither rsh nor ssh) between the nodes. The remote configuration is performed using an RPC service installed by the Sun Cluster packages. If you are concerned about authentication, you can use DES authentication as described on page 4-20.
4-21
- It uses network address 172.16.0.0 with netmask 255.255.248.0 for the cluster interconnect.
- It assumes that you want to perform autodiscovery of cluster transport adapters on the other nodes with the all-at-once method. (With the one-node-at-a-time method, it asks you if you want to use autodiscovery in both Typical and Custom modes.)
- It uses the names switch1 and switch2 for the two transport switches, and assumes the use of switches even for a two-node cluster.
- It assumes that you have the standard empty /globaldevices mounted as an empty file system with the recommended size (512 Mbytes), to be remounted as the /global/.devices/node@# file system.
- It assumes that you want to use standard system authentication (not DES authentication) for new nodes configuring themselves into the cluster.
4-22
      * ?) Help with menu options
      * q) Quit

    Option:  1

  *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  1
4-23
    This option creates and configures a new cluster.

    You must use the Java Enterprise System (JES) installer to install the Sun Cluster framework software on each machine in the new cluster before you select this option.

    If the "remote configuration" option is unselected from the JES installer when you install the Sun Cluster framework on any of the new nodes, then you must configure either the remote shell (see rsh(1)) or the secure shell (see ssh(1)) before you select this option. If rsh or ssh is used, you must enable root access to all of the new member nodes from this node.

    Press Control-d at any time to return to the Main Menu.
yes
4-24
Configuring Using All-at-Once and Typical Modes: Example

        ?) Help
        q) Return to the Main Menu

    Option [1]:  1
  >>> Cluster Nodes <<<

    This Sun Cluster release supports a total of up to 16 nodes.

    Please list the names of the other nodes planned for the initial cluster configuration. List one node name per line. When finished, type Control-D:

    Node name (Control-D to finish):  vincent
    Node name (Control-D to finish):  ^D

    This is the complete list of nodes:

        vincent
        theo

    Is it correct (yes/no) [yes]?  yes

Attempting to contact "vincent" ... done
Searching for a remote configuration method ... done

    The Sun Cluster framework is able to complete the configuration process without remote shell access.
4-25
Cluster Transport Adapters
  >>> Cluster Transport Adapters and Cables <<<

    You must identify the two cluster transport adapters which attach this node to the private cluster interconnect.

    Select the first cluster transport adapter for "theo":

        1) hme0
        2) qfe0
        3) qfe3
        4) Other

    Option:  1

Searching for any unexpected network traffic on "hme0" ... done
Verification completed. No traffic was detected over a 10 second sample period.

    Select the second cluster transport adapter for "theo":

        1) hme0
        2) qfe0
        3) qfe3
        4) Other

    Option:  2
4-26
Cluster Transport Adapters (Tagged VLAN-Capable Adapters)
The following shows a slight difference in the questionnaire when you have private network adapters capable of tagged VLANs. In this example, you do not have a shared physical adapter, and therefore you do not need to use the tagged VLAN feature.

  >>> Cluster Transport Adapters and Cables <<<

    You must identify the two cluster transport adapters which attach this node to the private cluster interconnect.

    Select the first cluster transport adapter for "dani":

        1) bge1
        2) ce1
        3) Other

    Option:  1

    Will this be a dedicated cluster transport adapter (yes/no) [yes]?  yes

Searching for any unexpected network traffic on "bge1" ... done
Verification completed. No traffic was detected over a 10 second sample period.

    Select the second cluster transport adapter for "dani":

        1) bge1
        2) ce1
        3) Other

    Option:  2

    Will this be a dedicated cluster transport adapter (yes/no) [yes]?  yes

Searching for any unexpected network traffic on "ce1" ... done
Verification completed. No traffic was detected over a 10 second sample period.
4-27
Cluster Transport Adapters (Actually Using Shared Adapters)
In the following variation, you actually choose an adapter for the private network that is already configured (already using tagged VLAN) for the public network.
  >>> Cluster Transport Adapters and Cables <<<

    You must identify the two cluster transport adapters which attach this node to the private cluster interconnect.

    Select the first cluster transport adapter for "dani":

        1) bge1
        2) ce0
        3) ce1
        4) Other

    Option:  2
This adapter is used on the public network also, you will need to configure it as a tagged VLAN adapter for cluster transport. What is the cluster transport VLAN ID for this adapter? 5
Searching for any unexpected network traffic on "ce5000" ... done Verification completed. No traffic was detected over a 10 second sample period.
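The adapter name ce5000 that scinstall probes above is not arbitrary: Solaris names a tagged-VLAN logical adapter by adding the VLAN ID times 1000 to the physical instance number (here, VLAN ID 5 on instance 0 of the ce driver). A minimal sketch of that naming convention follows; the helper function name is our own, not a Sun API:

```python
def vlan_adapter_name(driver, instance, vlan_id):
    # Solaris tagged-VLAN naming: logical PPA = vlan_id * 1000 + instance
    return "%s%d" % (driver, vlan_id * 1000 + instance)

# VLAN ID 5 on adapter ce0 yields the ce5000 adapter probed above.
assert vlan_adapter_name("ce", 0, 5) == "ce5000"
```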
4-28
Automatic Quorum Configuration
>>> Quorum Configuration <<< Every two-node cluster requires at least one quorum device. By default, scinstall will select and configure a shared SCSI quorum disk device for you. This screen allows you to disable the automatic selection and configuration of a quorum device. The only time that you must disable this feature is when ANY of the shared storage in your cluster is not qualified for use as a Sun Cluster quorum device. If your storage was purchased with your cluster, it is qualified. Otherwise, check with your storage vendor to determine whether your storage device is supported as Sun Cluster quorum device. If you disable automatic quorum device selection now, or if you intend to use a quorum device that is not a shared SCSI disk, you must instead use scsetup(1M) to manually configure quorum once both nodes have joined the cluster for the first time. Do you want to disable automatic quorum device selection (yes/no) [no]? no
4-29
Installation and Configuration Messages
Installation and Configuration

Log file - /var/cluster/logs/install/scinstall.log.20852

Testing for "/globaldevices" on "theo" ... done
Testing for "/globaldevices" on "vincent" ... done

Starting discovery of the cluster transport configuration.
The following connections were discovered:

    theo:hme0  switch1  vincent:hme0
    theo:qfe0  switch2  vincent:qfe0
Completed discovery of the cluster transport configuration.

Started sccheck on "theo".
Started sccheck on "vincent".

sccheck completed with no errors or warnings for "theo".
sccheck completed with no errors or warnings for "vincent".

Configuring "vincent" ... done
Rebooting "vincent" ...

Configuring "theo" ... done
Rebooting "theo" ...

Log file - /var/cluster/logs/install/scinstall.log.20852
Rebooting ...
4-30
  *** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  2
4-31
First Node Introduction
*** Establish Just the First Node of a New Cluster ***
This option is used to establish a new cluster using this machine as the first node in that cluster. Before you select this option, the Sun Cluster framework software must already be installed. Use the Java Enterprise System (JES) installer to install Sun Cluster software. Press Control-d at any time to return to the Main Menu.
4-32
Cluster Name
>>> Cluster Name <<< Each cluster has a name assigned to it. The name can be made up of any characters other than whitespace. Each cluster name should be unique within the namespace of your enterprise. What is the name of the cluster you want to establish? orangecat
4-33
Cluster Nodes
  >>> Cluster Nodes <<<

    This Sun Cluster release supports a total of up to 16 nodes.

    Please list the names of the other nodes planned for the initial cluster configuration. List one node name per line. When finished, type Control-D:

    Node name (Control-D to finish):  theo
    Node name (Control-D to finish):  ^D
    This is the complete list of nodes:

        vincent
        theo

    Is it correct (yes/no) [yes]?  yes
4-34
Transport IP Address Range and Netmask
  >>> Network Address for the Cluster Transport <<<

    The cluster transport uses a default network address of 172.16.0.0. If this IP address is already in use elsewhere within your enterprise, specify another address from the range of recommended private addresses (see RFC 1918 for details).

    The default netmask is 255.255.248.0. You can select another netmask, as long as it minimally masks all bits that are given in the network address.

    The default private netmask and network address result in an IP address range that supports a cluster with a maximum of 64 nodes and 10 private networks.

    Is it okay to accept the default network address (yes/no) [yes]?  yes
    Is it okay to accept the default netmask (yes/no) [yes]?  no

    The combination of private netmask and network address will dictate the maximum number of both nodes and private networks that can be supported by a cluster. Given your private network address, this program will generate a range of recommended private netmasks that are based on the maximum number of nodes and private networks that you anticipate for this cluster.

    In specifying the anticipated maximum number of nodes and private networks for this cluster, it is important that you give serious consideration to future growth potential. While both the private netmask and network address can be changed later, the tools for making such changes require that all nodes in the cluster be booted in noncluster mode.

    Maximum number of nodes anticipated for future growth [64]?  8
    Maximum number of private networks anticipated for future growth [10]?  4

    Specify a netmask of 255.255.255.128 to meet anticipated future requirements of 8 cluster nodes and 4 private networks. To accommodate more growth, specify a netmask of 255.255.254.0 to support up to 16 cluster nodes and 8 private networks.
255.255.255.128
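The suggested netmasks in this dialogue can be reproduced with a short calculation. The exact formula scinstall uses is not documented in this module, so the sketch below is a reverse-engineered approximation; it does, however, match all three suggestions shown above (64 nodes and 10 networks give 255.255.248.0, 8 and 4 give 255.255.255.128, and 16 and 8 give 255.255.254.0). Treat it as illustrative only.

```python
import math

def suggested_netmask(max_nodes, max_privnets):
    """Approximate the transport netmask scinstall suggests
    (illustrative reverse-engineering, not Sun's actual code)."""
    def next_pow2(n):
        return 1 << math.ceil(math.log2(n))
    # Model: each private network gets a subnet big enough for every
    # node plus 2 extra addresses, rounded up to a power of two...
    subnet = next_pow2(max_nodes + 2)
    # ...and the overall range reserves room for the private networks
    # plus 2 extra subnets, again rounded up to a power of two.
    total = next_pow2(subnet * (max_privnets + 2))
    prefix = 32 - int(math.log2(total))
    mask = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF
    return ".".join(str((mask >> s) & 0xFF) for s in (24, 16, 8, 0))

print(suggested_netmask(64, 10))  # 255.255.248.0  (the default)
print(suggested_netmask(8, 4))    # 255.255.255.128
print(suggested_netmask(16, 8))   # 255.255.254.0
```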
4-35
Choosing Whether to Define Switches and Switch Names
  >>> Point-to-Point Cables <<<

    The two nodes of a two-node cluster may use a directly-connected interconnect. That is, no cluster switches are configured. However, when there are greater than two nodes, this interactive form of scinstall assumes that there will be exactly two cluster switches.

    Does this two-node cluster use switches (yes/no) [yes]?  yes

  >>> Cluster Switches <<<

    All cluster transport adapters in this cluster must be cabled to a "switch". And, each adapter on a given node must be cabled to a different switch. Interactive scinstall requires that you identify two switches for use in the cluster and the two transport adapters on each node to which they are cabled.

    What is the name of the first switch in the cluster [switch1]?  <CR>
    What is the name of the second switch in the cluster [switch2]?  <CR>
Option:
4-36
Configuring Using One-at-a-Time and Custom Modes: Example (First Node)

Searching for any unexpected network traffic on "hme0" ... done
Verification completed. No traffic was detected over a 10 second sample period.

    The "dlpi" transport type will be set for this cluster.

    Name of the switch to which "hme0" is connected [switch1]?  <CR>
Each adapter is cabled to a particular port on a switch. And, each port is assigned a name. You can explicitly assign a name to each port. Or, for Ethernet and Infiniband switches, you can choose to allow scinstall to assign a default name for you. The default port name assignment sets the name to the node number of the node hosting the transport adapter at the other end of the cable. Use the default port name for the "hme0" connection (yes/no) [yes]? yes
4-37
Second Transport Adapter
    Select the second cluster transport adapter:

        1) hme0
        2) qfe0
        3) qfe3
        4) Other

    Option:  2
Adapter "qfe0" is an Ethernet adapter. Searching for any unexpected network traffic on "qfe0" ... done Verification completed. No traffic was detected over a 10 second sample period. Name of the switch to which "qfe0" is connected [switch2]? switch2
Use the default port name for the "qfe0" connection (yes/no) [yes]? yes
4-38
Automatic Quorum Configuration (Two-Node Cluster)
>>> Quorum Configuration <<< Every two-node cluster requires at least one quorum device. By default, scinstall will select and configure a shared SCSI quorum disk device for you. This screen allows you to disable the automatic selection and configuration of a quorum device. The only time that you must disable this feature is when ANY of the shared storage in your cluster is not qualified for use as a Sun Cluster quorum device. If your storage was purchased with your cluster, it is qualified. Otherwise, check with your storage vendor to determine whether your storage device is supported as Sun Cluster quorum device. If you disable automatic quorum device selection now, or if you intend to use a quorum device that is not a shared SCSI disk, you must instead use scsetup(1M) to manually configure quorum once both nodes have joined the cluster for the first time. Do you want to disable automatic quorum device selection (yes/no) [no]? no
4-39
Automatic Reboot
>>> Automatic Reboot <<< Once scinstall has successfully initialized the Sun Cluster software for this machine, the machine must be rebooted. After the reboot, this machine will be established as the first node in the new cluster. Do you want scinstall to reboot for you (yes/no) [yes]? yes
Option Confirmation
  >>> Confirmation <<<

    Your responses indicate the following options to scinstall:

      scinstall -i \
           -C orangecat \
           -F \
           -T node=vincent,node=theo,authtype=sys \
           -w netaddr=172.16.0.0,netmask=255.255.255.128,maxnodes=8,maxprivatenets=4 \
           -A trtype=dlpi,name=hme0 -A trtype=dlpi,name=qfe0 \
           -B type=switch,name=switch1 -B type=switch,name=switch2 \
           -m endpoint=:hme0,endpoint=switch1 \
           -m endpoint=:qfe0,endpoint=switch2 \
           -P task=quorum,state=INIT

    Are these the options you want to use (yes/no) [yes]?  yes

    Do you want to continue with this configuration step (yes/no) [yes]?  yes
Configuration Messages
Some of the post-installation steps for which scinstall is printing out messages here (NTP, the hosts file, the nsswitch.conf file, and IPMP) are described later in this module.

Checking device to use for global devices file system ... done
Initializing cluster name to "orangecat" ... done
Initializing authentication options ... done
Initializing configuration for adapter "hme0" ... done
Initializing configuration for adapter "qfe0" ... done
Initializing configuration for switch "switch1" ... done
Initializing configuration for switch "switch2" ... done
Initializing configuration for cable ... done
Initializing configuration for cable ... done
Initializing private network address options ... done
Setting the node ID for "vincent" ... done (id=1)
Checking for global devices global file system ... done
Updating vfstab ... done
Verifying that NTP is configured ... done
Initializing NTP configuration ... done
Updating nsswitch.conf ... done
Adding cluster node entries to /etc/inet/hosts ... done
Configuring IP multipathing groups ... done
Verifying that power management is NOT configured ... done
Unconfiguring power management ... done
/etc/power.conf has been renamed to /etc/power.conf.062606112159
Power management is incompatible with the HA goals of the cluster.
Please do not attempt to re-configure power management.
Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
Ensure network routing is disabled ... done

Log file - /var/cluster/logs/install/scinstall.log.3082

Rebooting ...
*** New Cluster and Cluster Node Menu ***

    Please select from any one of the following options:

        1) Create a new cluster
        2) Create just the first node of a new cluster on this machine
        3) Add this machine as a node in an existing cluster

        ?) Help with menu options
        q) Return to the Main Menu

    Option:  3
This option is used to add this machine as a node in an already established cluster. If this is a new cluster, there may only be a single node which has established itself in the new cluster.

Before you select this option, the Sun Cluster framework software must already be installed. Use the Java Enterprise System (JES) installer to install Sun Cluster software.

Press Control-d at any time to return to the Main Menu.
Sponsoring Node
After you enter the name of the sponsoring node (the first node already booted into the cluster) and the cluster name, the authentication and the cluster name are checked with the sponsoring node.

>>> Sponsoring Node <<<

For any machine to join a cluster, it must identify a node in that cluster willing to "sponsor" its membership in the cluster. When configuring a new cluster, this "sponsor" node is typically the first node used to build the new cluster. However, if the cluster is already established, the "sponsoring" node can be any node in that cluster.

Already established clusters can keep a list of hosts which are able to configure themselves as new cluster members. This machine should be in the join list of any cluster which it tries to join. If the list does not include this machine, you may need to add it by using claccess(1CL) or other tools.

And, if the target cluster uses DES to authenticate new machines attempting to configure themselves as new cluster members, the necessary encryption keys must be configured before any attempt to join.

What is the name of the sponsoring node? vincent
Cluster Name
>>> Cluster Name <<<

Each cluster has a name assigned to it. When adding a node to the cluster, you must identify the name of the cluster you are attempting to join. A sanity check is performed to verify that the "sponsoring" node is a member of that cluster.

What is the name of the cluster you want to join? orangecat

Attempting to contact "vincent" ... done

Cluster name "orangecat" is correct.

Press Enter to continue:
Option for sccheck
>>> Check <<<

This step allows you to run sccheck(1M) to verify that certain basic hardware and software pre-configuration requirements have been met. If sccheck(1M) detects potential problems with configuring this machine as a cluster node, a report of failed checks is prepared and available for display on the screen.

Data gathering and report generation can take several minutes, depending on system configuration.

Do you want to run sccheck (yes/no) [yes]? no
Probing ........

The following connections were discovered:

    vincent:hme0  switch1  theo:hme0
    vincent:qfe0  switch2  theo:qfe0
Automatic Reboot and Option Confirmation
>>> Automatic Reboot <<<

Once scinstall has successfully initialized the Sun Cluster software for this machine, the machine must be rebooted. The reboot will cause this machine to join the cluster for the first time.

Do you want scinstall to reboot for you (yes/no) [yes]? yes

>>> Confirmation <<<

Your responses indicate the following options to scinstall:

scinstall -i \
     -C orangecat \
     -N vincent \
     -A trtype=dlpi,name=hme0 -A trtype=dlpi,name=qfe0 \
     -m endpoint=:hme0,endpoint=switch1 \
     -m endpoint=:qfe0,endpoint=switch2

Are these the options you want to use (yes/no) [yes]? yes
Do you want to continue with this configuration step (yes/no) [yes]? yes
Configuration Messages
Checking device to use for global devices file system ... done
Adding node "theo" to the cluster configuration ... done
Adding adapter "hme0" to the cluster configuration ... done
Adding adapter "qfe0" to the cluster configuration ... done
Adding cable to the cluster configuration ... done
Adding cable to the cluster configuration ... done
Copying the config from "vincent" ... done
Copying the postconfig file from "vincent" if it exists ... done
Setting the node ID for "theo" ... done (id=2)
Verifying the major number for the "did" driver with "vincent" ... done
Checking for global devices global file system ... done
Updating vfstab ... done
Verifying that NTP is configured ... done
Initializing NTP configuration ... done
Updating nsswitch.conf ... done
Adding cluster node entries to /etc/inet/hosts ... done
Configuring IP multipathing groups ... done
Verifying that power management is NOT configured ... done
Unconfiguring power management ... done
/etc/power.conf has been renamed to /etc/power.conf.062606155915
Power management is incompatible with the HA goals of the cluster.
Please do not attempt to re-configure power management.
Ensure that the EEPROM parameter "local-mac-address?" is set to "true" ... done
Ensure network routing is disabled ... done
Updating file ("ntp.conf.cluster") on node vincent ... done
Updating file ("hosts") on node vincent ... done

Log file - /var/cluster/logs/install/scinstall.log.3196

Rebooting ...
/etc/hosts
/etc/nsswitch.conf
/etc/inet/ntp.conf.cluster
/etc/hostname.xxx
/etc/vfstab
/etc/notrouter
local-mac-address? setting in electrically erasable programmable read-only memory (EEPROM) (SPARC only)
It makes sure the files keyword precedes every other name service for every entry in the file. It adds the cluster keyword for the hosts and netmasks keywords. This keyword modifies the standard Solaris OS resolution libraries so that they can resolve the cluster transport host names and netmasks directly from the CCR. The default transport host names (associated with IP addresses on the clprivnet0 adapter) are clusternode1-priv, clusternode2-priv, and so on. These names can be used by any utility or application as normal resolvable names without having to be entered in any other name service.
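As a hypothetical illustration (exact entries vary with which other name services the node uses; here we assume DNS), the affected lines of /etc/nsswitch.conf would end up looking something like the following, with the cluster keyword added in front and files preceding every other name service:

```
hosts:      cluster files dns
netmasks:   cluster files
```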
Two-node cluster where you disabled automatic quorum selection
Any cluster of more than two nodes
vincent:/# cldev list -v
DID Device   Full Device Path
----------   ----------------
d1           vincent:/dev/rdsk/c0t0d0
d2           vincent:/dev/rdsk/c0t1d0
d3           vincent:/dev/rdsk/c0t6d0
d4           vincent:/dev/rdsk/c1t0d0
d4           theo:/dev/rdsk/c1t0d0
d5           vincent:/dev/rdsk/c1t1d0
d5           theo:/dev/rdsk/c1t1d0
d6           vincent:/dev/rdsk/c1t2d0
Manual Quorum Selection
d6           theo:/dev/rdsk/c1t2d0
d7           vincent:/dev/rdsk/c1t3d0
d7           theo:/dev/rdsk/c1t3d0
d8           vincent:/dev/rdsk/c1t8d0
d8           theo:/dev/rdsk/c1t8d0
d9           vincent:/dev/rdsk/c1t9d0
d9           theo:/dev/rdsk/c1t9d0
d10          vincent:/dev/rdsk/c1t10d0
d10          theo:/dev/rdsk/c1t10d0
d11          vincent:/dev/rdsk/c1t11d0
d11          theo:/dev/rdsk/c1t11d0
d12          vincent:/dev/rdsk/c2t0d0
d12          theo:/dev/rdsk/c2t0d0
d13          vincent:/dev/rdsk/c2t1d0
d13          theo:/dev/rdsk/c2t1d0
d14          vincent:/dev/rdsk/c2t2d0
d14          theo:/dev/rdsk/c2t2d0
d15          vincent:/dev/rdsk/c2t3d0
d15          theo:/dev/rdsk/c2t3d0
d16          vincent:/dev/rdsk/c2t8d0
d16          theo:/dev/rdsk/c2t8d0
d17          vincent:/dev/rdsk/c2t9d0
d17          theo:/dev/rdsk/c2t9d0
d18          vincent:/dev/rdsk/c2t10d0
d18          theo:/dev/rdsk/c2t10d0
d19          vincent:/dev/rdsk/c2t11d0
d19          theo:/dev/rdsk/c2t11d0
d20          theo:/dev/rdsk/c0t0d0
d21          theo:/dev/rdsk/c0t1d0
d22          theo:/dev/rdsk/c0t6d0
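Because a quorum disk must be connected to at least two nodes, a quick way to shortlist candidates is to pick out the DID instances that appear with more than one host name in the cldev list -v output. The following sketch runs that filter against a captured excerpt of the listing above rather than a live cluster:

```shell
# Captured excerpt of "cldev list -v" output (on a real node you would
# pipe the live command output instead of using this sample).
sample='d1 vincent:/dev/rdsk/c0t0d0
d2 vincent:/dev/rdsk/c0t1d0
d3 vincent:/dev/rdsk/c0t6d0
d4 vincent:/dev/rdsk/c1t0d0
d4 theo:/dev/rdsk/c1t0d0
d5 vincent:/dev/rdsk/c1t1d0
d5 theo:/dev/rdsk/c1t1d0'

# Count the distinct hosts that list a path to each DID device, and
# keep only devices seen from two or more nodes.
shared=$(echo "$sample" | awk '
    {split($2, p, ":"); if (!seen[$1, p[1]]++) n[$1]++}
    END {for (d in n) if (n[d] >= 2) print d}' | sort)
echo "$shared"
```

Local disks such as d1 (a boot disk) drop out, because only one node has a path to them.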
Choosing Quorum Using the clsetup Utility
The clsetup utility is a menu-driven interface which, when the installmode flag is reset, turns into a general menu-driven alternative to low-level cluster commands. This is described in more detail in Module 5.

The clsetup utility recognizes if the installmode flag is still set, and will not present any of its normal menus until you reset it. For a two-node cluster, this involves choosing a single quorum device first.

# /usr/cluster/bin/clsetup

>>> Initial Cluster Setup <<<

This program has detected that the cluster "installmode" attribute is still enabled. As such, certain initial cluster setup steps will be performed at this time. This includes adding any necessary quorum devices, then resetting both the quorum vote counts and the "installmode" property.

Please do not proceed if any additional nodes have yet to join the cluster.

Is it okay to continue (yes/no) [yes]? yes
Do you want to add any quorum devices (yes/no) [yes]? yes

Following are supported Quorum Devices types in Sun Cluster. Please refer to Sun Cluster documentation for detailed information on these supported quorum device topologies.

What is the type of device you want to use?

    1) Directly attached shared disk
    2) Network Attached Storage (NAS) from Network Appliance
    3) Quorum Server

    q)

Option:  1

>>> Add a SCSI Quorum Disk <<<

A SCSI quorum device is considered to be any Sun Cluster supported attached storage which is connected to two or more nodes of the cluster. Dual-ported SCSI-2 disks may be used as quorum devices in two-node clusters. However, clusters with more than two nodes require that
SCSI-3 PGR disks be used for all disks with more than two node-to-disk paths.

You can use a disk containing user data or one that is a member of a device group as a quorum device.

For more information on supported quorum device topologies, see the Sun Cluster documentation.

Is it okay to continue (yes/no) [yes]? yes

Which global device do you want to use (d<N>)? d4

Is it okay to proceed with the update (yes/no) [yes]? yes

clquorum add d4

Command completed successfully.

Press Enter to continue:

Do you want to add another quorum device (yes/no) [yes]? no
Once the "installmode" property has been reset, this program will skip "Initial Cluster Setup" each time it is run again in the future. However, quorum devices can always be added to the cluster using the regular menu options.

Resetting this property fully activates quorum settings and is necessary for the normal and safe operation of the cluster.

Is it okay to reset "installmode" (yes/no) [yes]? yes
*** Main Menu *** Please select from one of the following options:
        1) Quorum
        2) Resource groups
        3) Data Services
        4) Cluster interconnect
        5) Device groups and volumes
        6) Private hostnames
        7) New nodes
        8) Other cluster tasks
        ?) Help with menu options
        q) Quit

    Option:  1

*** Quorum Menu ***

    Please select from one of the following options:

        1) Add a quorum device
        2) Remove a quorum device

        ?) Help
        q) Return to the Main Menu

    Option:  1

From here, the dialog looks similar to the previous example, except that installmode is already reset, so after adding your quorum devices you just return to the main menu.
Nodes
Quorum votes (including node and device quorum votes)
Device groups
Resource groups and related resources
Cluster interconnect status
Note: The cluster command-line interface (CLI) commands are described in detail starting in the next module and continuing on a per-subject basis as you configure storage and applications into the cluster in Modules 6 through 10.
Performing Post-Installation Verification

The following two commands give identical output, and show the cluster membership and quorum vote information:

vincent:/# cluster status -t quorum
vincent:/# clquorum status

=== Cluster Quorum ===

--- Quorum Votes Summary ---

            Needed   Present   Possible
            ------   -------   --------
            2        3         3
--- Quorum Votes by Node ---

Node Name    Present   Possible   Status
---------    -------   --------   ------
vincent      1         1          Online
theo         1         1          Online
--- Quorum Votes by Device ---

Device Name   Present   Possible   Status
-----------   -------   --------   ------
d4            1         1          Online
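The Needed figure follows the usual majority rule: a cluster partition must hold more than half of the possible votes to keep running. A small sketch of that arithmetic (assuming the standard floor(possible/2) + 1 majority calculation) for this two-node-plus-one-disk configuration:

```shell
# Two nodes at one vote each, plus one quorum disk vote.
node_votes=2
device_votes=1
possible=$(( node_votes + device_votes ))

# Majority: strictly more than half of the possible votes.
needed=$(( possible / 2 + 1 ))
echo "Needed $needed, Possible $possible"
```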
The following two commands are identical, and show the status of the private networks that make up the cluster interconnect (cluster transport):

vincent:/# cluster status -t intr
vincent:/# clintr status

=== Cluster Transport Paths ===

Endpoint1      Endpoint2   Status
---------      ---------   ------
vincent:qfe0   theo:qfe0   Path online
vincent:hme0   theo:hme0   Path online
  Node ID:                                      2
  Enabled:                                      yes
  privatehostname:                              clusternode2-priv
  reboot_on_path_failure:                       disabled
  globalzoneshares:                             1
  defaultpsetmin:                               1
  quorum_vote:                                  1
  quorum_defaultvote:                           1
  quorum_resv_key:                              0x44A025C200000002
  Transport Adapter List:                       hme0, qfe0

=== Transport Cables ===

Transport Cable:                                vincent:hme0,switch1@1
  Endpoint1:                                    vincent:hme0
  Endpoint2:                                    switch1@1
  State:                                        Enabled

Transport Cable:                                vincent:qfe0,switch2@1
  Endpoint1:                                    vincent:qfe0
  Endpoint2:                                    switch2@1
  State:                                        Enabled

Transport Cable:                                theo:hme0,switch1@2
  Endpoint1:                                    theo:hme0
  Endpoint2:                                    switch1@2
  State:                                        Enabled

Transport Cable:                                theo:qfe0,switch2@2
  Endpoint1:                                    theo:qfe0
  Endpoint2:                                    switch2@2
  State:                                        Enabled

=== Transport Switches ===

Transport Switch:                               switch1
  State:                                        Enabled
  Type:                                         switch
  Port Names:                                   1 2
  Port State(1):                                Enabled
  Port State(2):                                Enabled

Transport Switch:                               switch2
  State:                                        Enabled
  Type:                                         switch
  Port Names:                                   1 2
  Port State(1):                                Enabled
  Port State(2):                                Enabled

=== Quorum Devices ===

Quorum Device Name:                             d4
  Enabled:                                      yes
  Votes:                                        1
  Global Name:                                  /dev/did/rdsk/d4s2
  Type:                                         scsi
  Access Mode:                                  scsi2
  Hosts (enabled):                              vincent, theo
. . .
// The output continues on to show the configuration of resource types,
// resource groups, resources, and disks. We will cover these in later modules.
Task 1: Verifying the Boot Disk
Task 2: Verifying the Environment
Task 3: Updating Local Name Resolution
Task 4: Installing the Sun Cluster Packages
Task 5: Configuring a New Cluster (The All-Nodes-at-Once Method)
Task 6: Configuring a New Cluster (The One-Node-at-a-Time Method)
Task 7: Verifying an Automatically Selected Quorum Device (Two-Node Cluster)
Task 8: Configuring a Quorum Device (Three-Node Cluster or Two-Node Cluster With No Automatic Selection)
Task 9: Verifying the Cluster Configuration and Status
Exercise: Installing the Sun Cluster Server Software

2. Use the prtvtoc command to verify that each boot disk meets the Sun Cluster software partitioning requirements.

   # prtvtoc /dev/dsk/c0t0d0s2
Note: In the Remote Lab Environment, the machine named vnchost may be serving as the shared administrative station for all the students.

2. Check specifically that the line for ipnodes in the /etc/nsswitch.conf file references only the files keyword, regardless of any other name service you may be using. Make sure the line reads:

   ipnodes: files
Note: In Solaris 10, having other name services listed on this line causes IPv4 name resolution to use those other name services before it uses your local /etc/hosts file. This can corrupt certain cluster operations, specifically causing failures in the creation of Solaris Volume Manager disksets.
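A pre-flight check for this requirement can be scripted. The sketch below runs against a temporary sample file so it is self-contained; on a real cluster node you would point it at /etc/nsswitch.conf instead:

```shell
# Build a sample nsswitch.conf fragment to check (a stand-in for the
# real /etc/nsswitch.conf on a node).
conf=$(mktemp)
cat > "$conf" <<'EOF'
hosts:      files dns
ipnodes:    files
EOF

# The ipnodes entry must reference only the "files" keyword; anything
# else (dns, nis, ...) after it is flagged.
if awk '$1 == "ipnodes:" && NF == 2 && $2 == "files" {ok = 1}
        END {exit !ok}' "$conf"; then
    result="ipnodes OK"
else
    result="ipnodes must read: ipnodes: files"
fi
echo "$result"
rm -f "$conf"
```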
d. Feel free to skip multilingual support if English-only is fine for you, or feel free to select it. (It adds significant time to the installation.)

e. Confirm the upgrade of the shared components.

f. Verify that system requirements are met.

g. Choose Configure Later (if you accidentally choose Configure Now, you will be asked if you want to enable remote configuration; respond Yes).

h. Confirm that you want to continue with the installation.

4. You will observe the following error messages as each node reboots, just the very first time, into the cluster. They are all normal.
/usr/cluster/bin/scdidadm: Could not load DID instance list.
/usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.

5. You will observe the following additional error messages every time a node boots into the cluster. They are also normal.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
t_optmgmt: System error: Cannot assign requested address
svc.startd[8]: system/cluster/scsymon-srv:default misconfigured
k. Examine the scinstall command options for correctness. Accept them if they seem appropriate.

You must wait for this node to complete rebooting to proceed to the second node.

3. You will observe the following error messages as the node reboots, just the very first time, into the cluster. They are all normal.

/usr/cluster/bin/scdidadm: Could not load DID instance list.
/usr/cluster/bin/scdidadm: Cannot open /etc/cluster/ccr/did_instances.

4. You will observe the following additional error messages every time a node boots into the cluster. They are also normal.
ip: joining multicasts failed (18) on clprivnet0 - will use link layer broadcasts for multicast
t_optmgmt: System error: Cannot assign requested address
svc.startd[8]: system/cluster/scsymon-srv:default misconfigured

Perform the following steps on each additional node in the cluster. If you have more than two nodes, you could perform all the other nodes simultaneously, although then it will be hard to predict which node gets which node ID. If you want this to be predictable, do one node at a time.

1. Start the scinstall utility on the new node.

2. As the installation proceeds, make the following choices:

   a. Choose Option 1 from the Main Menu.

   b. Choose Option 3 from the Install Menu, Add this machine as a node in an existing cluster.

   c. From the Type of Installation Menu, choose Option 1, Typical.

   d. Provide the name of a sponsoring node.

   e. Provide the name of the cluster that you want to join. Type cluster show -t global on the first node (the sponsoring node) if you have forgotten the name of the cluster.

   f. Use auto-discovery for the transport adapters.

   g. Reply yes to the automatic reboot question.

   h. Examine and approve the scinstall command-line options.

   i. You will see the same normal boot error messages on the additional nodes as on the first node.
Note: You see interconnect-related errors on the existing nodes beginning when a new node adds itself to the cluster until the new node completes the first portion of its reboot operation.
Task 8: Configuring a Quorum Device (Three-Node Cluster or Two-Node Cluster With No Automatic Selection)
Perform the following steps for quorum selection on a three-node cluster:

1. In a three-node cluster, wait after the third node boots until you see console messages (on all nodes) indicating that the quorum votes are being reset.

2. On Node 1, type the cldev list -v command, and record the DID devices you intend to configure as quorum disks. For example, if you had three nodes in a Pair + 1 configuration, you would want two quorum devices.

   Quorum disks: _______ (d4, d6, and so on)

   Note: Pay careful attention. The first few DID devices might be local disks, such as the boot disk and a CD-ROM (target 6). Examine the standard logical path to make sure the DID device you select is a disk in a storage array and is connected to more than one node.

3. On Node 1, type the clsetup command. Navigate to the section for adding a new quorum device and supply the name of the first DID device (global device) that you selected in the previous step. You should see output similar to the following:

   clquorum add d12
   May 3 22:29:13 vincent cl_runtime: NOTICE: CMM: Cluster members: vincent theo apricot.
   May 3 22:29:13 proto192 cl_runtime: NOTICE: CMM: node reconfiguration #4 completed.

4. If you have three nodes in a Pair +N configuration, you should add a second quorum device.

5. Reply yes to the reset installmode question (two-node cluster only). You should see a Cluster initialization is complete message.

6. Type q to quit the clsetup utility.
Exercise Summary
Discussion: Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 5
Identify the cluster daemons
Use cluster commands
View and administer cluster global properties
View and administer quorum
View and administer disk paths and settings
View and administer interconnect components
Use the clsetup command and the Sun Cluster Manager
Perform basic cluster startup and shutdown operations, including booting nodes in non-cluster mode and placing nodes in a maintenance state
Modify the private network settings from non-cluster mode
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion: The following questions are relevant to understanding the content of this module:
What are the various ways of getting more information about the Sun Cluster commands themselves?

Why are you sometimes required to add a new quorum device before deleting an old one?

What are some of the advantages of clsetup and Sun Cluster Manager?

What is the value of being able to assign authorization for non-root users to do some cluster management?
Additional Resources
The following references provide additional information on the topics described in this module:
Sun Cluster System Administration Guide for the Solaris OS, part number 819-2971
Sun Cluster Quorum Server User Guide, part number 819-5360
Solaris 9 OS: Daemons are launched by traditional Solaris OS boot scripts (and are thus guaranteed to be running by the time you get the console login prompt after a boot). A special cluster-specific daemon monitor, rpc.pmfd, is required for restarting some daemons.

Solaris 10 OS: Daemons are launched by the Solaris 10 OS Service Management Facility (SMF). Therefore, at boot time you might see a console login prompt before many of these daemons are launched. SMF itself can restart some daemons.
The following is taken from a cluster node running Solaris 10 OS:

# ps -ef | grep clust
    root     4     0  0 13:48:57 ?   0:09 cluster
    root   223     1  0 13:49:52 ?   0:00 /usr/cluster/lib/sc/qd_userd
    root  2390     1  0 13:52:02 ?   0:01 /usr/cluster/bin/pnmd
    root   257     1  0 13:50:09 ?   0:00 /usr/cluster/lib/sc/failfastd
    root   270     1  0 13:50:10 ?   0:00 /usr/cluster/lib/sc/clexecd
    root   271   270  0 13:50:10 ?   0:00 /usr/cluster/lib/sc/clexecd
    root  2395     1  0 13:52:03 ?   0:00 /usr/cluster/lib/sc/sc_zonesd
    root  2469     1  0 13:52:05 ?   0:00 /usr/cluster/lib/sc/scprivipd
    root  2403     1  0 13:52:03 ?   0:00 /usr/cluster/lib/sc/rpc.pmfd
    root  2578     1  0 13:52:10 ?   0:02 /usr/cluster/lib/sc/rgmd
    root  2401     1  0 13:52:03 ?   0:00 /usr/cluster/lib/sc/sc_delegated_restarter
    root  2394     1  0 13:52:03 ?   0:01
    root  2388     1  0 13:52:02 ?   0:00
    root  2473     1  0 13:52:05 ?   0:01
    root  2418     1  0 13:52:04 ?   0:10
    root  2423     1  0 13:52:04 ?   0:00 /usr/cluster/lib/sc/cl_eventlogd
cluster This is a system process (created by the kernel) to encapsulate the kernel threads that make up the core kernel range of operations. There is no way to kill this process (even with a KILL signal) because it is always in the kernel.
Do not duplicate or distribute without permission from Sun Microsystems, Inc. Use subject to license.
failfastd: This daemon is the failfast proxy server. Other daemons that require services of the failfast device driver register with failfastd. The failfast daemon allows the kernel to panic if certain essential daemons have failed.

clexecd: This is used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (like the cluster shutdown command). This daemon registers with failfastd so that a failfast device driver will panic the kernel if this daemon is killed and not restarted in 30 seconds.
cl_eventd: This daemon registers and forwards cluster events (such as nodes entering and leaving the cluster). There is also a protocol whereby user applications can register themselves to receive cluster events. The daemon is automatically respawned if it is killed.

qd_userd: This daemon serves as a proxy whenever any quorum device activity requires execution of some command in userland (for example, a NAS quorum device). If you kill this daemon, you must restart it manually.

rgmd: This is the resource group manager, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
rpc.fed: This is the fork-and-exec daemon, which handles requests from rgmd to spawn methods for specific data services. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.

rpc.pmfd: This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons (in Solaris 9 OS), and for most application daemons and application fault monitors (in Solaris 9 and 10 OS).
A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.
pnmd: This is the public network management daemon, which manages network status information received from the local IPMP daemon running on each node and facilitates application failovers caused by complete public network failures on nodes. It is automatically restarted if it is stopped.

cl_eventlogd: This daemon logs cluster events into a binary log file. At the time of writing for this course, there is no published interface to this log. It is automatically restarted if it is stopped.

cl_ccrad: This daemon provides access from userland management applications to the CCR. It is automatically restarted if it is stopped.

scdpmd: This daemon monitors the status of disk paths, so that they can be reported in the output of the cldev status command. It is automatically restarted if it is stopped.

sc_zonesd: This daemon monitors the state of Solaris 10 non-global zones so that applications designed to failover between zones can react appropriately to zone booting and failure. A failfast driver panics the kernel if this daemon is stopped and not restarted in 30 seconds.
sc_delegated_restarter: This daemon restarts cluster applications that are written as SMF services and then placed under control of the cluster using the Sun Cluster 3.2 SMF proxy feature. It is automatically restarted if it is stopped.

scprivipd: This daemon provisions IP addresses on the clprivnet0 interface, on behalf of zones. It is automatically restarted if it is stopped.
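As a quick health sketch, you can scan a process listing for the daemons described above. To keep the example self-contained it checks a captured sample rather than live ps -ef output; the daemon names come from the list in this section:

```shell
# Captured sample of cluster daemon command paths (a stand-in for
# "ps -ef | grep clust" output on a live node).
ps_sample='/usr/cluster/lib/sc/rgmd
/usr/cluster/lib/sc/rpc.pmfd
/usr/cluster/bin/pnmd'

# Report any expected daemon that does not appear in the listing.
missing=""
for d in rgmd rpc.pmfd pnmd cl_eventd; do
    echo "$ps_sample" | grep -q "/$d\$" || missing="$missing $d"
done
missing=${missing# }
echo "missing: ${missing:-none}"
```

In this sample, cl_eventd is reported missing; on a healthy node the expected result would be "missing: none".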
The name of the command is related to the cluster object you are trying to display or administer.

Every command requires a sub-command that indicates what it is you actually want to do.

Every command has a common style of syntax whereby:
When using sub-commands to operate on (or delete) a specific object, you give the name of the object as the last item on the command line, or you give a + to indicate a wildcard. The wildcard might still be limited by other command-line arguments (specific group or subtypes, if appropriate).

When using sub-commands to display configuration or status, the default is to show all objects (of that category), but you can give an optional last argument to show only a specific object.
clnode: Configuration, status, and settings for nodes

clquorum (clq): Configuration, status, settings, adding, and deleting quorum devices (includes node quorum information)

clinterconnect (clintr): Configuration, status, settings, adding, and deleting private networks

cldevice (cldev): Configuration, status, and settings for individual devices (disk, CD-ROM, and tape)

cluster:
q q
Administering cluster global settings Showing conguration and status of everything; can be limited by other arguments to certain types or groups of information
Additional Commands
The following commands relate to administration of device groups and cluster application resources, which are the topics of Modules 6 through 10. No additional information is presented about them in this module:
- cldevicegroup (cldg) - Configuration, status, settings, adding, and deleting device groups (including VxVM and Solaris Volume Manager device groups)
- clresourcegroup (clrg) - Configuration, status, settings, adding, and deleting application resource groups
- clresource (clrs) - Configuration, status, settings, adding, and deleting individual resources in application resource groups
- clreslogicalhostname (clrslh) and clressharedaddress (clrssa) - Configuration, status, settings, adding, and deleting IP resources in application resource groups (these commands simplify tasks that can also be accomplished with clresource)
If you run a command without any sub-command, the usage message always lists the possible sub-commands:
# clquorum
clquorum:  (C961689) Not enough arguments.
clquorum:  (C101856) Usage error.
Usage:  clquorum <subcommand> [<options>] [+ | <devicename> ...]
        clquorum [<subcommand>] -? | --help
        clquorum -V | --version
Manage cluster quorum

SUBCOMMANDS:
add       Add a quorum device to the cluster configuration
disable   Put quorum devices into maintenance state
enable    Take quorum devices out of maintenance state
export    Export quorum configuration
list      List quorum devices
remove    Remove a quorum device from the cluster configuration
reset     Reset the quorum configuration
show      Show quorum devices and their properties
status    Display the status of the cluster quorum

If you run a subcommand that requires arguments, the usage message gives you more information about the particular subcommand. It does not give you all the information about the names of properties that you might need to set (for that, you have to go to the man pages):
# clquorum add
clquorum:  (C456543) You must specify the name of the quorum device to add.
clquorum:  (C101856) Usage error.
Usage:  clquorum add [<options>] <devicename>
        clquorum add -a [-v]
        clquorum add -i <configfile> [<options>] + | <devicename> ...

Add a quorum device to the cluster configuration

OPTIONS:
-?                           Display this help text.
-a                           Automatically select and add a quorum device.
-i {- | <clconfiguration>}   Specify XML configuration as input.
-p <name>=<value>            Specify the properties.
-t <type>                    Specify the device type.
-v                           Verbose output.
Note: If you perform the man cluster command on the Solaris 10 OS, you might get the wrong man page because there is a man page for a completely unrelated SQL command called cluster. To fix that, you can put /usr/cluster/man as the first directory in your MANPATH variable, or you can type man -s1cl cluster.
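The MANPATH adjustment mentioned in the note can be sketched as follows (a minimal sh example; the /usr/share/man fallback is an assumption for shells where MANPATH starts out unset):

```shell
# Put the Sun Cluster man pages first so `man cluster` finds them
# before the unrelated SQL man page. The /usr/share/man fallback is
# an assumption for environments where MANPATH is initially unset.
MANPATH=/usr/cluster/man:${MANPATH:-/usr/share/man}
export MANPATH
echo "$MANPATH"
```

You would typically place these lines in the root user's shell initialization file so that every session picks them up.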
=== Cluster ===

Cluster Name:        orangecat
installmode:         disabled
heartbeat_timeout:   4000
heartbeat_quantum:   800
private_netaddr:     172.16.0.0
private_netmask:     255.255.248.0
max_nodes:           64
max_privatenets:     10
global_fencing:      pathcount
Node List:           vincent, theo
Note: Modifying the private_netaddr and private_netmask properties is a special case: it is done only when the entire cluster is down and all nodes are booted in non-cluster mode. This is covered later in the module.
Viewing and Administering Nodes

reboot_on_path_failure:    disabled
globalzoneshares:          1
defaultpsetmin:            1
quorum_vote:               1
quorum_defaultvote:        1
quorum_resv_key:           0x44A1995C00000002
Transport Adapter List:    hme0, qfe0
Note: If it seems that the OS can resolve private hostnames that no longer exist (because you changed them), it is because of the OS name-caching daemon (nscd). You can use nscd -i hosts to clear this cache.
You can also directly examine the file /etc/cluster/release to get quick information about the release of the cluster software installed on a particular node:

# cat /etc/cluster/release
Sun Cluster 3.2 for Solaris 10 sparc
Copyright 2006 Sun Microsystems, Inc.  All Rights Reserved.
--- Quorum Votes by Node ---

Node Name    Present   Possible   Status
---------    -------   --------   ------
vincent      1         1          Online
theo         1         1          Online
--- Quorum Votes by Device ---

Device Name    Present   Possible   Status
-----------    -------   --------   ------
d4             1         1          Online

# clq show d4

=== Quorum Devices ===

Quorum Device Name:   d4
Enabled:              yes
Votes:                1
Global Name:          /dev/did/rdsk/d4s2
Type:                 scsi
Access Mode:          scsi2
Hosts (enabled):      vincent, theo
--- Quorum Votes by Node ---

Node Name    Present   Possible   Status
---------    -------   --------   ------
vincent      1         1          Online
theo         1         1          Online
--- Quorum Votes by Device ---

Device Name    Present   Possible   Status
-----------    -------   --------   ------
qservydude     1         1          Online
=== Quorum Devices ===

Quorum Device Name:   qservydude
Enabled:              yes
Votes:                1
Global Name:          qservydude
Type:                 quorum_server
Hosts (enabled):      vincent, theo
Quorum Server Host:   clustergw
Port:                 9000
A Netapp NAS device can be configured as a quorum device for Sun Cluster. The NAS configuration data includes a device name, which is given by the user and must be unique across all quorum devices; the filer name; and a LUN id, which defaults to 0 if not specified. Please refer to the clquorum(1M) man page and other Sun Cluster documentation for details.

The NAS quorum device must be set up before configuring it with Sun Cluster. For more information on setting up the Netapp NAS filer, creating the device, and installing the license and the Netapp binaries, see the Sun Cluster documentation.

Is it okay to continue (yes/no) [yes]?  yes
What name do you want to use for this quorum device?  netappq
What is the name of the filer [netapps]?  mynetapp
What is the LUN id on the filer [0]?  0
Is it okay to proceed with the update (yes/no) [yes]?  yes

clquorum add -t netapp_nas -p filer=mynetapp -p lun_id=0 netappq
You can verify the configuration of the NAS device in the CCR using the clnas show command, as in the following example:

# clnas show

=== Filers of type "netapp" ===

Filer name:
type:
password:
userid:
directories:
- Display disk path information (which nodes are connected to which disks) and the mapping between DID device names and c#t#d# names
- Display disk path status information (disk paths are monitored by the scdpmd daemon)
- Allow you to change disk path monitoring settings:
  - By default, all paths are monitored.
  - We will see at least one reason you might want to turn off monitoring for non-shared disks.
- Allow you to change properties, which can affect whether SCSI-2 or SCSI-3 is used for fencing disks with exactly two paths.
[Example cldev status output truncated in this copy: it listed device instances /dev/did/rdsk/d5 through /dev/did/rdsk/d9 and /dev/did/rdsk/d11 through /dev/did/rdsk/d19, with the path from each connected node (vincent, theo); the status column is not recoverable.]
You can limit the output to a specific DID device by giving its name as an argument. You can also display only those paths that have a specific status, as in the following example:

# cldev status -s Fail

Device Instance    Node    Status
---------------    ----    ------
Viewing and Administering Disk Paths and Settings

You can unmonitor specific paths, or re-monitor them if you have previously unmonitored them. By default you affect the path from all connected nodes to the DID device that you mention. You can limit your action to a path from a specific node by using the -n option. For example, the following commands unmonitor all paths to disk device d8, and specifically the path from node theo to disk device d9:

# cldev monitor all
# cldev unmonitor d8
# cldev unmonitor -n theo d9
# cldev status -s Unmonitored

Device Instance      Node       Status
---------------      ----       ------
/dev/did/rdsk/d8     theo       Unmonitored
                     vincent    Unmonitored
/dev/did/rdsk/d9     theo       Unmonitored
As is the convention for all cluster commands, you can use + as the wildcard. So cldev monitor + would turn monitoring back on for all devices, regardless of their previous state.
# cldev unmonitor d1 d2 d4
# clnode set -p reboot_on_path_failure=enabled vincent
# cldev unmonitor d20 d21
# clnode set -p reboot_on_path_failure=enabled theo
Each individual disk has its own fencing property. The default value for every disk is global, which means that fencing for that disk follows the global property.

# cldev show d7

=== DID Device Instances ===

DID Device Name:    /dev/did/rdsk/d7
Full Device Path:   theo:/dev/rdsk/c1t3d0
Full Device Path:   vincent:/dev/rdsk/c1t3d0
Replication:        none
default_fencing:    global
Modifying Properties to use SCSI-3 Reservations for Disks With Two Paths
The alternate value for the global_fencing property is prefer3. The prefer3 global value implies that the cluster should try to use SCSI-3 fencing for every device that is still following the global policy. When you change the global policy from pathcount to prefer3:
- The cluster probes the current two-path devices by sending a sample SCSI-3 ioctl command. The cluster assumes that any device that responds appropriately to such a command can support SCSI-3.
- The cluster immediately begins to use SCSI-3 fencing for any non-quorum disk that seems to support SCSI-3.
- If there is a current quorum disk, configured as a SCSI-2 quorum at the time that the global policy is changed to prefer3, that quorum disk will not have its behavior changed to SCSI-3. You would have to delete the quorum device and add it back (or add a different quorum device) to get SCSI-3 behavior on the quorum disk.
The following command changes the global policy. You can see that SCSI-2 behavior is left intact for an existing quorum disk. There is no output or change in per-disk properties for other, non-quorum disks.

# cluster set -p global_fencing=prefer3
Warning: Device instance d4 is a quorum device - fencing protocol remains PATHCOUNT for the device.

# cldev show d4

=== DID Device Instances ===

DID Device Name:    /dev/did/rdsk/d4
Full Device Path:   vincent:/dev/rdsk/c1t0d0
Full Device Path:
Replication:
default_fencing:    pathcount

# clq show d4

=== Quorum Devices ===

Quorum Device Name:   d4
Enabled:              yes
Votes:                1
Global Name:          /dev/did/rdsk/d4s2
Type:                 scsi
Access Mode:          scsi2
Hosts (enabled):      vincent, theo
The per-disk policy for the existing quorum device has been set to pathcount so that it does not follow the global default of prefer3. For other disks, the individual default_fencing property remains at the value global, and the cluster uses SCSI-3 the next time they need to be fenced.
The per-disk default_fencing property can take the following values:

- global (follow the cluster-wide global_fencing property)
- scsi3 (force SCSI-3, even if the cluster-wide global_fencing is pathcount)
- pathcount (perform the old-style behavior: use SCSI-2 if there are exactly two paths and SCSI-3 if there are more than two paths, even if global_fencing is prefer3)
As expected, you cannot change this property for a current quorum device (you would have to delete it and add it back, as in the previous example). Among non-quorum devices, you might have many disks that can handle SCSI-3 reservations but a few that you know in advance have problems with it. You could, for example, set the cluster-wide global_fencing to prefer3 and the individual property on those problem disks to pathcount.
No particular software administration is required to repair a broken interconnect path, if you are not redefining which adapters are used. If a cable breaks, for example, the cluster immediately reports a path failure. If you replace the cable, the path immediately goes back online.
- Define the two adapter endpoints (for example, theo:qfe3 and vincent:qfe3)
- Define the cable between the two endpoints
- Define the adapter endpoints for each node
- Define the switch endpoint (the cluster assumes any endpoint not in the form node:adapter is a switch)
- Define cables between each adapter and the switch
For example, the following commands define a new private network for a two-node cluster. The definitions define a switch. This does not mean that the switch needs to physically exist; it is just a definition in the cluster.

# clintr add vincent:qfe3
# clintr add theo:qfe3
# clintr add switch3
# clintr add vincent:qfe3,switch3
# clintr add theo:qfe3,switch3
# clintr status

=== Cluster Transport Paths ===

Endpoint1       Endpoint2    Status
---------       ---------    ------
vincent:qfe3    theo:qfe3    Path online
vincent:qfe0    theo:qfe0    Path online
vincent:hme0    theo:hme0    Path online
*** Main Menu ***

Please select from one of the following options:

    1) Quorum
    2) Resource groups
    3) Data Services
    4) Cluster interconnect
    5) Device groups and volumes
    6) Private hostnames
    7) New nodes
    8) Other cluster tasks
    ?) Help
    q) Return to the Main Menu

Option:

You still need to know, in general, how things are done. For example, if you wanted to permanently delete an entire private network, you need to perform the following tasks in the correct order:

1. Disable the cable or cables that define the transport. This is a single operation for a crossover-cable definition, or multiple operations if a switch is defined (one cable per node connected to the switch).
2. Remove the definition(s) of the transport cable or cables.
3. Remove the definition of the switch.
4. Remove the definitions of the adapters, one per node.
Nothing bad happens if you try to do things in the wrong order; you are just informed that you missed a step. This would be the same whether you use the command line or clsetup. The clsetup command simply saves you from needing to remember the commands, subcommands, and options.
- You still really need to understand the cluster tasks before accomplishing anything.
- It saves you from having to remember options of the lower-level commands.
- It accomplishes all its actions through the lower-level commands, and shows the lower-level commands as it runs them.
- It offers some graphical topology views of the Sun Cluster nodes with respect to device groups, resource groups, and so forth.
- It highlights faulted components in real time. The browser refreshes automatically as component status changes.
Many of the exercises in the modules following this one offer you the opportunity to view and manage device and resource groups through Sun Cluster Manager. However, this course does not focus on the details of using Sun Cluster Manager as the method of accomplishing the task. Similar to clsetup, after you understand the nature of a task, Sun Cluster Manager can guide you through it. This is frequently easier than researching and typing all the options correctly to the lower-level commands.
Figure 5-1
Log in with the user name of root and the root password. Alternatively, you can create a non-root user or role authorized to have full or limited access through RBAC. By default, any user will be able to log in to Sun Cluster Manager through the Sun Java Web Console and have view-only access.
Figure 5-2
Figure 5-3
Figure 5-4
Running Cluster Commands as Non-root User or Role Using Role Based Access Control (RBAC)
Sun Cluster 3.2 has a simplified RBAC structure that allows you to assign cluster administrative privileges to non-root users or roles. There are only three authorizations that pertain to Sun Cluster:
- solaris.cluster.read - This authorization gives the ability to run any status, list, or show subcommands. By default every user has this authorization, since it is in the Basic Solaris User profile.
- solaris.cluster.modify - This gives the ability to run add, create, delete, remove, and related subcommands.
- solaris.cluster.admin - This gives the ability to run switch, online, offline, enable, and disable subcommands.
Note: Sun Cluster 3.1 required RBAC authorizations of a much finer granularity (separate authorizations to manage quorum, interconnect, resource groups, and so on). These old authorizations are still accessible and supported for backward compatibility, but they are no longer required.

Like any RBAC authorizations, the cluster authorizations can be:

- Assigned directly to a user
- Assigned directly to a role (allowed users have to assume the role, and then they get the authorization)
- Assigned to a rights profile that is then given to a user or a role
There is a review of RBAC concepts in the appendix.

Note: The Sun Java Web Console detects whether the user logging in has the ability to assume any roles. If so, the Web Console gives you additional options to assume the role. This information is passed down to Sun Cluster Manager, which follows the RBAC authorizations.
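As an illustration of assigning a cluster authorization directly to a user, an entry in the standard Solaris /etc/user_attr database might look like the following (the user name admin2 is hypothetical; the field layout is the documented user_attr(4) format):

```
# Hypothetical /etc/user_attr entry granting a non-root user the
# cluster read and admin authorizations:
admin2::::auths=solaris.cluster.read,solaris.cluster.admin
```

Equivalently, the authorizations could be attached to a role or placed in a rights profile that is then assigned to users or roles.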
Controlling Clusters
Basic cluster control includes starting and stopping clustered operation on one or more nodes and booting nodes in non-cluster mode.
Jul 10 00:23:31 vincent rpcbind: rpcbind terminating on signal.
Jul 10 00:23:32 vincent syslogd: going down on signal 15
Jul 10 00:23:33 rpc.metad: Terminated
Jul 10 00:23:34 cl_eventlogd[2816]: Going down on signal 15.
Jul 10 00:23:35 Cluster.PNM: PNM daemon exiting.
Jul 10 00:23:35 Cluster.scdpmd: going down on signal 15
svc.startd: The system is down.
WARNING: Failfast: Node-level failfast controls for zone global (zone_id 0) disallow the firing of a failfast.
WARNING: Failfast: Node-level failfast controls for zone global (zone_id 0) disallow the firing of a failfast.
NOTICE: CMM: Node theo (nodeid = 2) is down.
NOTICE: clcomm: Path vincent:hme0 - theo:hme0 being drained
NOTICE: CMM: Cluster members: vincent.
NOTICE: clcomm: Path vincent:qfe0 - theo:qfe0 being drained
TCP_IOC_ABORT_CONN: local = 000.000.000.000:0, remote = 172.016.004.002:0, start = -2, end = 6
WARNING: Failfast: Node-level failfast controls for zone global (zone_id 0) disallow the firing of a failfast.
TCP_IOC_ABORT_CONN: aborted 0 connection
WARNING: Failfast: Node-level failfast controls for zone global (zone_id 0) disallow the firing of a failfast.
WARNING: Failfast: Node-level failfast controls for zone global (zone_id 0) disallow the firing of a failfast.
WARNING: Failfast: Node-level failfast controls for zone global (zone_id 0) disallow the firing of a failfast.
NOTICE: CMM: node reconfiguration #8 completed.
syncing file systems... done
NOTICE: CMM: Node theo (nodeid = 2) is dead.
WARNING: CMM: Node being shut down.
Program terminated
{1} ok
Rebooting with command: boot -x
Boot device: /pci@1f,4000/scsi@3/disk@0,0:a  File and args: -x
SunOS Release 5.10 Version Generic_118833-17 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: vincent

vincent console login: root
Password:
Jul 10 00:31:50 vincent login: ROOT LOGIN /dev/console
Last login: Mon Jul 10 00:19:32 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# cluster status
cluster:  (C374108) Node is not in cluster mode.
You need to use the arrow keys to highlight the line that begins with kernel, then press e again to edit that specific line and add the -x:

[ Minimal BASH-like line editing is supported. For the first word, TAB lists possible command completions. Anywhere else TAB lists the possible completions of a device/filename. ESC at any time exits. ]

grub edit> kernel /platform/i86pc/multiboot -x
Press Return to get back to the screen listing boot parameters, where you will see your -x in the appropriate place. Now you can press b to boot.

GNU GRUB version 0.95 (615K lower / 2095552K upper memory)
+----------------------------------------------------------------------+
| root (hd0,0,a)                                                       |
| kernel /platform/i86pc/multiboot -x                                  |
| module /platform/i86pc/boot_archive                                  |
+----------------------------------------------------------------------+
Use the ^ and v keys to select which entry is highlighted. Press 'b' to boot, 'e' to edit the selected command in the boot sequence, 'c' for a command-line, 'o' to open a new line after ('O' for before) the selected line, 'd' to remove the selected line, or escape to go back to the main menu.

SunOS Release 5.10 Version Generic_118855-14 64-bit
Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
Use is subject to license terms.
Hostname: gabi

gabi console login: root
Password:
Jul 10 00:31:50 gabi login: ROOT LOGIN /dev/console
Last login: Mon Jul 10 00:19:32 on console
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
# cluster status
cluster:  (C374108) Node is not in cluster mode.
--- Quorum Votes by Node ---

Node Name    Present   Possible   Status
---------    -------   --------   ------
vincent      0         0          Offline
theo         1         1          Online
--- Quorum Votes by Device ---

Device Name    Present   Possible   Status
-----------    -------   --------   ------
d4             0         0          Offline
In addition, the vote count for any dual-ported quorum device physically attached to the node is also set to 0. You can reset the maintenance state for a node by rebooting the node into the cluster. The node and any dual-ported quorum devices regain their votes.
Figure 5-5  [Diagram: a switch connecting Node 1, Node 2, Node 3, and Node 4, each with one quorum vote]
Now, imagine that Nodes 3 and 4 are down. The cluster can still continue with Nodes 1 and 2 because you still have four out of the possible seven quorum votes. Now imagine that you want to halt Node 1 and keep Node 2 running as the only node in the cluster. If there were no such thing as maintenance state, this would be impossible. Node 2 would panic with only three out of seven quorum votes. But if you put Nodes 3 and 4 into maintenance state, you reduce the total possible votes to three and Node 2 can continue with two out of three votes.
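The vote arithmetic in this scenario can be sketched with a few lines of shell (the vote totals are taken from the example above; the majority rule shown, floor(possible/2)+1, is the standard quorum majority and is stated here as an illustrative assumption, not Sun Cluster source code):

```shell
# Votes needed to maintain quorum: a majority of the total
# possible votes, i.e. floor(possible / 2) + 1.
needed() { echo $(( $1 / 2 + 1 )); }

# Full configuration: 7 possible votes, so 4 are needed.
echo "possible=7 needed=$(needed 7)"

# With Nodes 3 and 4 in maintenance state, possible drops to 3,
# so only 2 votes are needed and Node 2 can survive alone.
echo "possible=3 needed=$(needed 3)"
```

This is why putting Nodes 3 and 4 into maintenance state lets Node 2 continue: its two remaining votes meet the reduced majority of two out of three.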
*** Main Menu ***

Select from one of the following options:

    1) Change Network Addressing and Ranges for the Cluster Transport
    2) Show Network Addressing and Ranges for the Cluster Transport

    ?) Help with menu options
    q) Quit

Option:  1
>>> Change Network Addressing and Ranges for the Cluster Transport <<<

Network addressing for the cluster transport is currently configured as follows:
The default network address for the cluster transport is 172.16.0.0.

Do you want to use the default (yes/no) [yes]?  no
What network address do you want to use?  192.168.5.0
. . . . . .
Maximum number of nodes anticipated for future growth [2]?  8
Maximum number of private networks anticipated for future growth [2]?  4

Specify a netmask of 255.255.255.128 to meet anticipated future requirements of 8 cluster nodes and 4 private networks.

What netmask do you want to use [255.255.255.128]?  <CR>
Is it okay to proceed with the update (yes/no) [yes]?  yes
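The suggested netmask can be checked with a short shell sketch that counts the 1-bits in a dotted-quad netmask to get the prefix length and the size of the address range (this bit-counting helper is purely illustrative and is not part of the cluster tooling):

```shell
# Count the 1-bits in a dotted-quad netmask to find the prefix
# length, then compute how many addresses the range contains.
mask=255.255.255.128
bits=0
IFS=.
for octet in $mask; do
  n=$octet
  while [ "$n" -gt 0 ]; do
    bits=$(( bits + n % 2 ))
    n=$(( n / 2 ))
  done
done
unset IFS
echo "prefix=/$bits addresses=$(( 1 << (32 - bits) ))"
```

For 255.255.255.128 this gives a /25 prefix, a range of 128 addresses, which clsetup deems sufficient for 8 nodes and 4 private networks.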
Press Enter to continue:

After you reboot into the cluster, your new private network information is automatically applied. For cluster-aware applications, the same clprivnet0 private hostnames (by default, clusternode1-priv, and so on) now resolve to the new addresses.
Task 1 - Verifying Basic Cluster Configuration and Status
Task 2 - Reassigning a Quorum Device
Task 3 - Adding a Quorum Server Quorum Device
Task 4 - Placing a Node into Maintenance State
Task 5 - Preventing Cluster Amnesia
Task 6 - Changing the Cluster Private IP Address Range
Task 7 - (Optional) Navigating Sun Cluster Manager
Preparation
Join all nodes in the cluster and run the cconsole tool on the administration workstation.

Note: During this exercise, when you see italicized names, such as IPaddress, enclosure_name, node1, or clustername embedded in a command string, substitute the names appropriate for your cluster.
Exercise: Performing Basic Cluster Administration

3. Record the quorum configuration from the previous step.
   Quorum votes needed:
   Quorum votes present:
   Quorum votes possible:
4. Verify all cluster disk paths:
   # cldev status
5. Verify the interconnect status:
   # clintr status
6. Verify the revision of the currently installed Sun Cluster software on each cluster node:
   # clnode show-rev -v
Note: The number of possible quorum votes should be reduced.

4. Boot Node 2 again. This should reset its maintenance state. You should see the following message on both nodes:
   NOTICE: CMM: Votecount changed from 0 to 1 for node nodename
5. Verify the cluster quorum configuration again. The number of possible quorum votes should be back to normal.
5. Now boot Node 2. Both nodes should complete the cluster software startup sequence and join in clustered operation.
6.
7. When the command succeeds, reboot all nodes back into the cluster.
Verify that the private network addresses have changed.
a. Use ifconfig -a and verify the information for the physical private network adapters and for the clprivnet0 adapter.
b. Verify the output of cluster show -t global.
Note: Consult your instructor about how this will be accomplished.

2. Log in as the user root with the root password.
3. Navigate to the Sun Cluster Manager application.
4. Familiarize yourself with Sun Cluster Manager navigation and the topological views.
Exercise Summary
Discussion: Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 6
- Describe the most important concepts of VERITAS Volume Manager (VxVM) and issues involved in using VxVM in the Sun Cluster 3.2 software environment
- Differentiate between bootdg/rootdg and shared storage disk groups
- Initialize a VERITAS Volume Manager disk
- Describe the basic objects in a disk group
- Describe the types of volumes that will be created for Sun Cluster 3.2 software environments
- Describe the general procedures and restrictions for installing and administering VxVM in the cluster
- Use basic commands to put disks in disk groups and build volumes
- Describe the two flags used to mark disks under the typical hot relocation
- Register, manage, and resynchronize VxVM disk groups as cluster device groups
- Create global and failover file systems on VxVM volumes
- Describe the procedure used for mirroring the boot drive
- Describe ZFS and its built-in volume management in contrast to traditional file systems used with a volume manager like VxVM
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion: The following questions are relevant to understanding the content of this module:
- Why is it so important to mirror JBOD disks with VxVM if the cluster already provides node-to-node failover?
- Are there any VxVM feature restrictions when VxVM is used in the Sun Cluster software environment?
- In what way does Sun Cluster provide high availability for disk groups?
- Can you prevent Sun Cluster from providing a global device service for your volumes?
Additional Resources
The following references provide additional information on the topics described in this module:
q
Do not duplicate or distribute without permission from Sun Microsystems, Inc. Use subject to license.
Sun Cluster System Administration Guide for Solaris OS, part number 819-2971. Sun Cluster Software Installation Guide for Solaris OS, part number 8192970. Sun Cluster Concepts Guide for Solaris OS, part number 819-2969. Veritas Storage Foundation Installation Guide (version 5.0 for Solaris), Symantec Corporation (available with Veritas Storage Foundation software).
q q
- VxVM 4.1 MP1, Solaris 9 OS/Solaris 10 OS (SPARC)
- VxVM 5.0, Solaris 9 OS/Solaris 10 OS (SPARC)
- VxVM 4.1 MP2 for Solaris 10 OS (x86 AMD 64 platform)
Figure 6-1 [diagram: Node 1 and Node 2 attached to shared storage]
The node that imports a disk group has the following characteristics:

- It physically reads and writes data to the drives within the disk group.
- It manages the disk group. VxVM commands pertaining to that disk group can be issued only from the node importing the disk group.
- It can voluntarily give up ownership of the disk group by deporting the disk group.
The private region is used to store configuration information. The public region is used for data storage.
Figure 6-2 [diagram: sliced disk layout with a private region for configuration and a public region for data storage]
Initializing a VERITAS Volume Manager Disk

Starting in VxVM 4.0, the new default way to partition disks is called Cross-Platform Data Sharing (CDS) disks. The VxVM configuration on these disks can be read by VxVM running on all its supported platforms, not just on the Solaris OS. As shown in Figure 6-3, this layout combines the configuration and data storage into one large partition (slice 7) covering the entire disk.
Figure 6-3 [diagram: CDS disk layout with the private and public regions combined into a single partition]
Because in the Sun Cluster environment you cannot access VxVM disks from any servers other than those in the same cluster, it does not matter which disk layout you choose for your data disks. You cannot use CDS disks for the bootdg, however. While initializing a disk and putting it in a disk group are two separate operations, the vxdiskadd utility can perform both steps for you. The default is always to use CDS disks, but the vxdiskadd command can guide you through setting up your disks either way.

Note You cannot use the CDS layout for EFI disks. The EFI label is required for disks of size 1 terabyte or greater. VxVM can deal with these properly, but only by using the sliced layout that is specific to the Solaris OS.
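As a sketch, the two layouts can also be selected explicitly when initializing a disk from the command line (the device name c1t4d0 is a placeholder, and this assumes the standard VxVM utility location):

```shell
# Hedged sketch -- device names are hypothetical.
# Initialize a disk with the default CDS layout:
/etc/vx/bin/vxdisksetup -i c1t4d0

# Initialize a disk with the Solaris-specific sliced layout
# (needed, for example, for bootdg disks or EFI-labeled disks):
/etc/vx/bin/vxdisksetup -i c1t4d0 format=sliced
```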
Subdisk
A subdisk is a contiguous piece of a physical disk that is used as a building block for a volume. The smallest possible subdisk is 1 block and the largest is the whole public region of the disk.
Plex
A plex is a collection of subdisk objects, and describes the layout of the subdisks as either concatenated, striped, or RAID5. A plex, or data plex, has the following characteristics:
- A plex is a copy of the data in the volume.
- A volume with one data plex is not mirrored.
- A volume with two data plexes is mirrored.
- A volume with five data plexes is mirrored five ways.
Volume
The volume is the actual logical storage device created and used by all disk consumers. The disk consumers may be file systems, swap space, or applications, such as Oracle using raw volumes. Only volumes are given block and character device files. The basic volume creation method (vxassist) lets you specify parameters for your volume. This method automatically creates the subdisks and plexes for you.
Layered Volume
A layered volume is a technique used by VxVM that lets you create RAID 1+0 configurations. For example, you can create two mirrored volumes, and then use those as components in a larger striped volume. This striped volume contains a single plex composed of sub-volumes rather than subdisks. While this sounds complicated, the automation involved in the recommended vxassist command makes such configurations easy to create.
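For illustration, a layered RAID 1+0 volume can be created in a single step with vxassist (the disk group and volume names here are hypothetical):

```shell
# Hypothetical names (datadg, datavol); vxassist builds the
# sub-volumes, plexes, and subdisks of the layered layout itself.
vxassist -g datadg make datavol 2g layout=stripe-mirror
```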
Simple Mirrors
Figure 6-4 demonstrates subdisks from disks in different storage arrays forming the plexes of a mirrored volume.
Figure 6-4 [diagram: subdisks disk01-01 and disk02-01, from disks in different storage arrays, forming plex1 and plex2 of a mirrored volume]
Figure 6-5 Mirror-Stripe Volume [diagram: striped plexes built from subdisks on c1t1d0 (disk03) and c2t1d0 (disk04), mirrored against each other]
Figure 6-6 Stripe-Mirror Volume [diagram: subdisks disk01-01 through disk04-01 on c1t1d0 (disk03) and c2t1d0 (disk04) forming mirrored sub-volumes that are striped together in plex1 and plex2]
Fortunately, as demonstrated later in the module, this is easy to create. The RAID 1+0 configuration is often preferred because there might be less data to resynchronize after a disk failure, and because the volume can survive simultaneous failures in each storage array. For example, you could lose both disk01 and disk04, and your volume would still be available.
Size of DRL
DRLs are small. Use the vxassist command to size them. The size of the logs that the command chooses depends on the version of VxVM, but might be, for example, 26 kilobytes (Kbytes) of DRL per 1 Gbyte of storage. If you grow a volume, just delete the DRL, and then use the vxassist command to add it back again so that it is sized properly.
Location of DRL
You can put DRL subdisks on the same disk as the data if the volume will not be heavily written. An example of this is Web data. For a heavily written volume, performance is better with the DRL on a different disk than the data. For example, you could dedicate one disk to holding all the DRLs for all the volumes in the group. They are so small that performance on this one disk is still fine.
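The add-and-resize approach described above can be sketched with vxassist as follows (the group and volume names are the ones used in this module's examples):

```shell
# Sketch -- group/volume names from the module's examples.
# Let vxassist size and add a dirty region log to an existing volume:
vxassist -g nfsdg addlog nfsvol logtype=drl

# After growing the volume, remove and re-add the log
# so that vxassist sizes it properly for the new volume size:
vxassist -g nfsdg remove log nfsvol
vxassist -g nfsdg addlog nfsvol logtype=drl
```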
Viewing the Installation and bootdg/rootdg Requirements in the Sun Cluster Environment
The rest of this module is dedicated to general procedures for installing and administering VxVM in the cluster. It describes both easy automatic ways and more difficult manual ways of meeting the requirements described in the following list:
- The vxio major number must be identical on all nodes. This is the entry in the /etc/name_to_major file for the vxio device driver. If the major number is not the same on all nodes, the global device infrastructure cannot work on top of VxVM.
- VxVM must be installed on all nodes physically connected to shared storage. On non-storage nodes, you can install VxVM if you will use it to encapsulate and mirror the boot disk. If not (for example, you might be using Solaris Volume Manager software to mirror root storage), then a non-storage node requires only that the vxio major number be added to the /etc/name_to_major file.
- A license is required on all storage nodes not attached to Sun StorEdge A5x00 arrays. If you choose not to install VxVM on a non-storage node, you do not need a license for that node.
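The vxio major-number check can be sketched as follows. On a real cluster you would run `grep vxio /etc/name_to_major` on each node; here the two files are simulated copies so the comparison logic can be shown on its own:

```shell
# Sketch: verify that the vxio major number matches across nodes.
# The files below simulate each node's /etc/name_to_major
# (assumption: on a real cluster you read the real file on each node).
printf 'clone 11\nvxio 270\n' > /tmp/name_to_major.node1
printf 'clone 11\nvxio 270\n' > /tmp/name_to_major.node2

m1=$(awk '$1 == "vxio" {print $2}' /tmp/name_to_major.node1)
m2=$(awk '$1 == "vxio" {print $2}' /tmp/name_to_major.node2)

if [ "$m1" = "$m2" ]; then
    echo "vxio major number consistent: $m1"
else
    echo "MISMATCH: node1=$m1 node2=$m2 -- global devices cannot work" >&2
fi
```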
- A unique volume name across all nodes for the /global/.devices/node@# volume
- A unique minor number across nodes for the /global/.devices/node@# volume

The reason for these restrictions is that this file system is mounted as a global file system. The typical Solaris OS /etc/mnttab logic still demands that each device and major/minor combination be unique. If you want to encapsulate the boot disk, you should use the clvxvm utility, which automates the correct creation of this volume (with different volume names and minor numbers on each node). This is described later in this module.
- Have no multiple paths at all from a node to the same storage
- Have multiple paths from a node under the control of one of the following:
  - Sun StorEdge Traffic Manager software
  - Sun Dynamic Link Manager, for Hitachi storage
  - EMC PowerPath software, for EMC storage
- Using the installer or installvm utilities to install and initialize VxVM 4.1 or 5.0
- Installing the packages by hand
- Using the clvxvm utility to synchronize vxio major numbers and to encapsulate the root disk
- Addressing VxVM cluster-specific issues if you installed VxVM before the cluster was installed

The installer and installvm utilities have the following characteristics:

- They can install the software on multiple servers (multiple nodes) if rsh/rcp or ssh/scp is enabled between the nodes (or you can choose to run them separately on each node).
- They install quite a few packages (for example, 30 packages for VxVM 5.0), even if you choose only the required packages. These include Web servers and Java environments to run the GUI back ends. This is many more than the minimal software required to run VxVM in the cluster.
- They guide you through entering licenses and initializing the software at the end of the installation.
VRTSvxvm VRTSvlic
The VxVM 4.1 and 5.0 packages are distributed on the media as .tar.gz files. The installer and installvm utilities extract these correctly. If you want to install VxVM without using the utilities, you must copy and extract these archives manually and then install them with pkgadd on each node. You then have to run vxinstall manually to license and initialize the software.
VxVM can be installed without optional packages to conserve disk space. Additional packages are typically installed to simplify future upgrades.

1) Required Veritas Volume Manager packages - 839 MB required
2) All Veritas Volume Manager packages - 863 MB required
3) Storage Foundation Enterprise packages - 1051 MB required
- It encapsulates your boot disk in a disk group named rootdg, and creates the bootdg link.
- It gives different volume names to the volumes containing the /global/.devices/node@# file systems on each node.
- It edits the vfstab file for this same volume, replacing the DID device with the original c#t#d#s#. This allows VxVM encapsulation to properly recognize that this is a partition on the OS disk.
- It installs a script to reminor the bootdg on reboot.
- It reboots you into a state where VxVM on your node is fully operational.
- It reboots the first time using -x into non-clustered mode because the second reboot that is performed by VxVM automatically takes you back into the cluster.

Note VxVM itself (not our utilities) still puts in a nologging option for the root file system when you encapsulate. This is a leftover from an old problem that has been fixed. You must remove the nologging option from /etc/vfstab manually and reboot one more time.
Fixing the vxio Major Number
If you need to change the vxio major number to make it agree with the other nodes, do the following:

1. Manually edit the /etc/name_to_major file.
2. Unmirror and unencapsulate the root disk, if it is encapsulated.
3. Reboot.
4. Reencapsulate and remirror the boot disk, if desired.
The vxdisk list command with the -o alldgs option scans every disk to see if there is any disk group information, even about disk groups not currently imported on this node. In output such as the following (truncated), you can see that there is no disk group information configured on any disk other than the first disk. Disks in the "online invalid" state are not yet initialized by VxVM:

# vxdisk -o alldgs list
DEVICE       TYPE            DISK      GROUP     STATUS
c0t0d0s2     auto:sliced     -         -         online
c0t1d0s2     auto:none       -         -         online invalid
c1t0d0s2     auto:none       -         -         online invalid
...
Note The ID of a disk group contains the name of the node on which it was created. Later, you might see the same disk group with the same ID imported on a different node.

# vxdisk list
DEVICE       TYPE            DISK      GROUP     STATUS
c0t0d0s2     auto:sliced     -         -         online
c0t1d0s2     auto:none       -         -         online invalid
c1t0d0s2     auto:cdsdisk    -         -         online
c1t1d0s2     auto:cdsdisk    -         -         online
c1t2d0s2     auto:none       -         -         online invalid
c1t3d0s2     auto:none       -         -         online invalid
c1t8d0s2     auto:none       -         -         online invalid
c1t9d0s2     auto:none       -         -         online invalid
c1t10d0s2    auto:none       -         -         online invalid
c1t11d0s2    auto:none       -         -         online invalid
c2t0d0s2     auto:cdsdisk    -         -         online
c2t1d0s2     auto:cdsdisk    -         -         online
c2t2d0s2     auto:none       -         -         online invalid
c2t3d0s2     auto:none       -         -         online invalid
c2t8d0s2     auto:none       -         -         online invalid
c2t9d0s2     auto:none       -         -         online invalid
c2t10d0s2    auto:none       -         -         online invalid
c2t11d0s2    auto:none       -         -         online invalid
Creating Shared Disk Groups and Volumes

The following shows the status of an entire group where no volumes have been created yet:

# vxprint -g nfsdg
TY NAME   ASSOC      KSTATE   LENGTH    PLOFFS   STATE   TUTIL0  PUTIL0
dg nfsdg  nfsdg      -        -         -        -       -       -
dm nfs1   c1t0d0s2   -        71061376  -        -       -       -
dm nfs2   c2t0d0s2   -        71061376  -        -       -       -
dm nfs3   c1t1d0s2   -        71061376  -        -       -       -
dm nfs4   c2t1d0s2   -        71061376  -        -       -       -
The following partial vxprint output shows nfsvol as a mirror-stripe volume, with two striped plexes:

sd nfs1-01     nfsvol-01    nfs1       0        102400   0/0     c1t0d0    ENA
sd nfs3-01     nfsvol-01    nfs3       0        102400   1/0     c1t1d0    ENA
pl nfsvol-02   nfsvol       ENABLED    ACTIVE   204800   STRIPE  2/128     RW
sd nfs2-01     nfsvol-02    nfs2       0        102400   0/0     c2t0d0    ENA
sd nfs4-01     nfsvol-02    nfs4       0        102400   1/0     c2t1d0    ENA

To convert the volume to a stripe-mirror layout, remove it and re-create it:

# vxassist -g nfsdg remove volume nfsvol
# vxassist -g nfsdg make nfsvol 100m layout=stripe-mirror mirror=ctlr

The resulting layered volume looks like the following:

# vxprint -g nfsdg -ht
dg nfsdg        default      default    -        1152725899.26.vincent

dm nfs1         c1t0d0s2     auto       -        71061376 -
dm nfs2         c2t0d0s2     auto       -        71061376 -
dm nfs3         c1t1d0s2     auto       -        71061376 -
dm nfs4         c2t1d0s2     auto       -        71061376 -

v  nfsvol       -            ENABLED    ACTIVE   204800   SELECT  nfsvol-03 fsgen
pl nfsvol-03    nfsvol       ENABLED    ACTIVE   204800   STRIPE  2/128     RW
sv nfsvol-S01   nfsvol-03    nfsvol-L01 1        102400   0/0     2/2       ENA
sv nfsvol-S02   nfsvol-03    nfsvol-L02 1        102400   1/0     2/2       ENA

v  nfsvol-L01   -            ENABLED    ACTIVE   102400   SELECT  -         fsgen
pl nfsvol-P01   nfsvol-L01   ENABLED    ACTIVE   102400   CONCAT  -         RW
sd nfs1-02      nfsvol-P01   nfs1       0        102400   0       c1t0d0    ENA
pl nfsvol-P02   nfsvol-L01   ENABLED    ACTIVE   102400   CONCAT  -         RW
sd nfs2-02      nfsvol-P02   nfs2       0        102400   0       c2t0d0    ENA

v  nfsvol-L02   -            ENABLED    ACTIVE   102400   SELECT  -         fsgen
pl nfsvol-P03   nfsvol-L02   ENABLED    ACTIVE   102400   CONCAT  -         RW
sd nfs3-02      nfsvol-P03   nfs3       0        102400   0       c1t1d0    ENA
pl nfsvol-P04   nfsvol-L02   ENABLED    ACTIVE   102400   CONCAT  -         RW
sd nfs4-02      nfsvol-P04   nfs4       0        102400   0       c2t1d0    ENA
SPARE
Disks marked with the SPARE flag are the preferred disks to use for hot relocation. These disks can still be used to build normal volumes.

NOHOTUSE
Disks marked with the NOHOTUSE flag are excluded from consideration for hot relocation.

In the following example, disk nfs2 is set as a preferred disk for hot relocation, and disk nfs1 is excluded from hot relocation usage:
# vxedit -g nfsdg set spare=on nfs2
# vxedit -g nfsdg set nohotuse=on nfs1
The flag settings are visible in the output of the vxdisk list and vxprint commands.
# cldevicegroup show
# cldg status

=== Cluster Device Groups ===

--- Device Group Status ---

Device Group Name    Primary    Secondary    Status
-----------------    -------    ---------    ------
- The -n option (node list) should list all the nodes, and only the nodes, physically connected to the disk group's disks.
- The preferenced=true/false property affects whether the node list indicates an order of failover preference. On a two-node cluster this option is only meaningful if failback is true.
- The failback=true/false property affects whether a preferred node takes back its device group when it joins the cluster. The default value is false. If it is true, it only works if preferenced is also set to true.
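Following the cldg syntax used elsewhere in this module, these properties can be adjusted on an existing device group along these lines (a sketch; the device group name is taken from the module's examples):

```shell
# Sketch: give the nfsdg device group an ordered node preference
# and have the preferred node take the group back when it rejoins.
cldg set -p preferenced=true -p failback=true nfsdg
```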
Note By default, even if there are more than two nodes in the node list for a device group, only one node shows up as secondary. If the primary fails, the secondary becomes primary and another (spare) node becomes secondary. The numsecondaries=# parameter is used on the cldg command line to allow more than one secondary. When VxVM disk groups are registered as Sun Cluster software device groups and have the status Online, never use the vxdg import and vxdg deport commands to control ownership of the disk group. This causes the cluster to treat the device group as failed. Instead, use the following command syntax to control disk group ownership: # cldg switch -n node_to_switch_to nfsdg
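A sketch of setting the numsecondaries parameter mentioned above (using the nfsdg device group from the examples):

```shell
# Sketch: allow two secondary nodes for the nfsdg device group.
cldg set -p numsecondaries=2 nfsdg
```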
Note If you run this command on a node that is not primary for the device group, you might get a warning that makes it seem as if the command failed. It is just a warning, and the command succeeds.
Managing VxVM Device Groups

Type the following command to put the device group online on the first node in the list:
# cldg online nfsdg
Alternatively, you can choose exactly which node will become the device group primary:
# cldg switch -n theo nfsdg
Note You do not need to perform both commands. One or the other will do.
- Global file systems: Accessible to all cluster nodes simultaneously, even those not physically connected to the storage
- Failover file systems: Mounted only on the node running the failover data service, which must be physically connected to the storage
UFS and VxFS are two file system types that can be used as either global or failover file systems. These examples and the exercises assume you are using UFS.

Note There is a convention that the mount point for global file systems be somewhere underneath the /global directory. This is just a convention.
A local failover file system entry looks like the following, and it should be identical on all nodes that may run services which access the file system (these can only be nodes physically connected to the storage):
/dev/vx/dsk/nfsdg/nfsvol /dev/vx/rdsk/nfsdg/nfsvol /localnfs ufs 2 no -
Note The logging option is the default for all OS versions supported with Sun Cluster 3.2. You do not need to specify it explicitly.
Caution Do not mirror each volume separately with vxassist. This leaves you in a state where the mirrored drive cannot be unencapsulated. That is, if you ever need to unencapsulate the second disk, you will not be able to. If you mirror the root drive correctly as shown, the second drive can also be unencapsulated, because correctly restricted subdisks are laid down on the new drive. These can later be turned into regular Solaris OS partitions because they are on cylinder boundaries.

# vxmirror -g bootdg rootdg_1 rootmir
...
3. The previous command also creates aliases for both the original root partition and the new mirror. You need to manually set your boot-device variable so you can boot off both the original and the mirror.
# eeprom | grep vx
devalias vx-rootdg_1 /pci@1f,4000/scsi@3/disk@0,0:a
devalias vx-rootmir /pci@1f,4000/scsi@3/disk@8,0:a
# eeprom boot-device="vx-rootdg_1 vx-rootmir"
4. Verify that the system has been instructed to use these device aliases on boot:
# eeprom | grep use-nvramrc
use-nvramrc?=true
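The mirrored pool shown in the following status output would have been created with a command along these lines (a sketch using the pool and disk names from the example):

```shell
# Sketch: create a mirrored ZFS pool named marcpool from two
# shared disks (names taken from the example output below).
zpool create marcpool mirror c1t0d0 c2t0d0
```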
ZFS as Failover File System Only

 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        marcpool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
errors: No known data errors

Now we can create file systems that occupy the pool. ZFS automatically creates mount points and mounts the file systems. You never even need to make /etc/vfstab entries.

vincent:/# zfs create marcpool/myfs1
vincent:/# zfs create marcpool/myfs2
vincent:/# zfs list
NAME             USED  AVAIL  REFER  MOUNTPOINT
marcpool         ...
marcpool/myfs1   ...
marcpool/myfs2   ...
vincent:/# df -k
...
marcpool         ...   1%  /marcpool
marcpool/myfs1   ...   1%  /marcpool/myfs1
marcpool/myfs2   ...   1%  /marcpool/myfs2
The mount points default to /poolname/fsname, but you can change them to whatever you want:
vincent:/# zfs set mountpoint=/oracle marcpool/myfs1
vincent:/# zfs set mountpoint=/shmoracle marcpool/myfs2
vincent:/# df -k | grep pool
marcpool         34836480    26  34836332   1%  /marcpool
marcpool/myfs1   34836480    24  34836332   1%  /oracle
marcpool/myfs2   34836480    24  34836332   1%  /shmoracle
Note Module 9 includes an exercise to create a zpool and ZFS file system, and to use HAStoragePlus to make the pool and all of its ZFS file systems fail over.
Task 1 Selecting Disk Drives
Task 2 Using pkgadd to Install and Initialize VxVM Software (on All Storage Nodes)
Task 3 Using clvxvm to Verify the vxio Major Number
Task 4 Adding vxio on Any Non-Storage Node on Which You Have Not Installed VxVM
Task 5 Rebooting All Nodes
Task 6 Configuring Demonstration Volumes
Task 7 Registering Demonstration Disk Groups
Task 8 Creating a Global nfs File System
Task 9 Creating a Global web File System
Task 10 Testing Global File Systems
Task 11 Managing Disk Device Groups
Task 12 (Optional) Viewing and Managing VxVM Device Groups Using Sun Cluster Manager
Task 13 (Optional) Encapsulating the Boot Disk on a Cluster Node
Preparation
Record the location of the VxVM software you will install during this exercise. Location: _____________________________ During this exercise, you create two data service disk groups. Each data service disk group contains a single mirrored volume. Encapsulating the boot disk is an optional exercise at the end. The setup is shown in Figure 6-7.
Figure 6-7 [diagram: Node 1 and Node 2, each with a local boot disk (bootdg) on controller c0 and connections through controllers c1 and c2 to storage arrays (Array A) holding the quorum disk and data disks]
Note During this exercise, when you see italicized names, such as IPaddress, enclosure_name, node1, or clustername embedded in a command string, substitute the names appropriate for your cluster.
Task 2 Using pkgadd to Install and Initialize VxVM Software (on All Storage Nodes)
In this lab, just to save time, you add VxVM packages manually, rather than using installer or installvm. This is completely supported. You need to install VxVM only on nodes connected to the shared storage. If you have a non-storage node (Pair +1 cluster), you do not need to install VxVM on that node. Perform the following steps on all cluster nodes that are connected to shared storage:

1. Spool and expand the VxVM packages.
# cd vxvm_sw_dir/volume_manager/pkgs
# cp VRTSvlic.tar.gz VRTSvxvm.tar.gz \
VRTSvmman.tar.gz /var/tmp
# cd /var/tmp
# gzcat VRTSvlic.tar.gz | tar xf -
# gzcat VRTSvxvm.tar.gz | tar xf -
# gzcat VRTSvmman.tar.gz | tar xf -

2. Add the new VxVM packages. Answer yes to all questions (including conflicts in directories).
# pkgadd -d /var/tmp VRTSvlic VRTSvxvm VRTSvmman

3. Add the VxVM 5.0 patch provided with the courseware. At the time of writing, there were no patches. Consult your instructor.
# cd location_of_vxvm_patch_for_course
# patchadd patchid

4. Run vxinstall to initialize VxVM:
a. Enter the license information as directed by your instructor.
b. Answer No when asked about enclosure-based naming.
c. Answer No when asked about a default disk group.
Task 4 Adding vxio on Any Non-Storage Node on Which You Have Not Installed VxVM
If you have a non-storage node on which you have not installed VxVM, edit /etc/name_to_major and add a line at the bottom containing the same vxio major number as used on the storage nodes: vxio same_major_number_as_other_nodes
Note Be careful to ensure that on the non-storage node the major number is not already used by some other device driver on another line in the file. If it is, you will have to change the vxio major number on all nodes, picking a number higher than any that currently exists on any node.
# vxdiskadd c#t#d# c#t#d# Which disk group [<group>,none,list,q,?] (default: none) nfsdg Create a new group named nfsdg? [y,n,q,?] (default: y) y Create the disk group as a CDS disk group? [y,n,q,?] (default: y) y Use default disk names for these disks? [y,n,q,?] (default: y) y Add disks as spare disks for nfsdg? [y,n,q,?] (default: n) n Exclude disks from hot-relocation use? [y,n,q,?] (default: n) y Add site tag to disks? [y,n,q,?] (default: n) n . . Continue with operation? [y,n,q,?] (default: y) y . Do you want to use the default layout for all disks being initialized? [y,n,q,?] (default: y) y
Caution If you are prompted about encapsulating the disk, you should reply no. If you are prompted about clearing old disk usage status from a previous training class, you should reply yes. In your work environment, be careful when answering these questions because you can destroy critical data or cluster node access information.

2. Verify the status of the disk group and the names and ownership of the disks in the nfsdg disk group.
# vxdg list
# vxdisk list

3. On Node 1, verify that the new nfsdg disk group is globally linked.
# ls -l /dev/vx/dsk/nfsdg
lrwxrwxrwx 1 root root 12 Dec 6 2006 /dev/vx/dsk/nfsdg -> /global/.devices/node@1/dev/vx/dsk/nfsdg

4. On Node 1, create a 500-Mbyte mirrored volume in the nfsdg disk group.
# vxassist -g nfsdg make nfsvol 500m layout=mirror
Exercise: Configuring Volume Management

5. On Node 1, create the webdg disk group with your previously selected logical disk addresses. Answers in the dialog can be similar to those in Step 1.
# vxdiskadd c#t#d# c#t#d#
Which disk group [<group>,none,list,q,?] (default: none) webdg

6. On Node 1, create a 500-Mbyte mirrored volume in the webdg disk group.
# vxassist -g webdg make webvol 500m layout=mirror

7. Use the vxprint command on both nodes. Notice that Node 2 does not see the disk groups created and still imported on Node 1.

8. Issue the following command on the node that currently owns the disk group:
# newfs /dev/vx/rdsk/webdg/webvol
It should fail with a "no such device or address" error.
Note Put the local node (Node 1) first in the node list.

2. On Node 1, use the clsetup utility to register the webdg disk group.
# clsetup

3. From the main menu, complete the following steps:
a. Select option 5, Device groups and volumes.
b. From the device groups menu, select option 1, Register a VxVM disk group as a device group.
Name of the VxVM disk group you want to register? webdg Do you want to configure a preferred ordering (yes/no) [yes]? yes Are both nodes attached to all disks in this group (yes/no) [yes]? yes
Note Read the previous question carefully. On a cluster with more than two nodes, you will be asked if all nodes are connected to the disks. In a Pair +1 configuration, for example, you need to answer no and respond when you are asked which nodes are connected to the disks.

Which node is the preferred primary for this device group? node2
Enable "failback" for this disk device group (yes/no) [no]? yes
Note Make sure you specify node2 as the preferred primary node. You might see warnings about disks configured by a previous class that still contain records about a disk group named webdg. These warnings can be safely ignored.

5. From either node, verify the status of the disk groups.
# cldg status
Note Do not use the line continuation character (\) in the vfstab file.

3. On Node 1, mount the /global/nfs file system.
# mount /global/nfs

4. Verify that the file system is mounted and available on all nodes.
# mount
# ls /global/nfs
lost+found
3. On all nodes, add a mount entry in the /etc/vfstab file for the new file system with the global mount option.
/dev/vx/dsk/webdg/webvol /dev/vx/rdsk/webdg/webvol \
/global/web ufs 2 yes global
Note Do not use the line continuation character (\) in the vfstab file.

4. On Node 2, mount the /global/web file system.
# mount /global/web

5. Verify that the file system is mounted and available on all nodes.
# mount
# ls /global/web
lost+found
Adding a Volume to a Disk Device Group
Perform the following steps to add a volume to an existing device group:

1. Make sure the device group is online (to the Sun Cluster software).
# cldg status

2. On the node that is primary for the device group, create a 100-Mbyte test volume in the nfsdg disk group.
# vxassist -g nfsdg make testvol 100m layout=mirror

3. Verify the status of the volume testvol.
# vxprint -g nfsdg testvol

4. Prove that the new volume cannot be used in the cluster (you will get an error message):
# newfs /dev/vx/rdsk/nfsdg/testvol

5. Synchronize the changes to the nfsdg disk group configuration, and prove that the volume is now usable:
# cldg sync nfsdg
# newfs /dev/vx/rdsk/nfsdg/testvol
Migrating Device Groups
You use the cldg show command to determine current device group configuration parameters. Perform the following steps to verify device group behavior:

1. Verify the current demonstration device group configuration.
# cldg show

2. From either node, switch the nfsdg device group to Node 2.
# cldg switch -n node2name nfsdg
Type the init 0 command to shut down Node 1.

3. Boot Node 1. The nfsdg disk group should automatically migrate back to Node 1. Verify this with cldg status from any node.

4. Disable the device group failback feature. It is inconsistent with how the device groups work with the applications that will be configured in Modules 9 and 10:
# cldg set -p failback=false nfsdg
# cldg set -p failback=false webdg
# cldg show
Task 12 (Optional) Viewing and Managing VxVM Device Groups Using Sun Cluster Manager
In this task, you view and manage VxVM device groups through Sun Cluster Manager. Perform the following steps on your administration workstation or display station.
1. In a Web browser, log in to Sun Java Web Console on any cluster node:
https://nodename:6789
2. Enter the Sun Cluster Manager Application.
3. Open up the arrow pointing to the Storage folder on the left.
4. From the subcategories revealed for Storage, click Device Groups.
5. You will see error messages about the singleton device groups for each DID. This is normal.
6. Click the name of your VxVM device groups (near the bottom of the list).
7. Use the Switch Primaries button to switch the primary for the device group.
Exercise: Configuring Volume Management
Complete the following steps on any node on which you want to encapsulate root (encapsulate one node at a time and wait for the reboot if you want to do it on more than one node):
1. Run clvxvm encapsulate.
2. Wait until your nodes have been rebooted twice.
3. Verify that the OS has successfully been encapsulated.
# df -k
# swap -l
# grep /dev/vx /etc/vfstab
4. Manually edit /etc/vfstab and replace the nologging option on the line for the root volume (rootvol) with a minus sign (-). Make sure you still end with seven fields on the line.
5. Reboot the node one more time so that root file system logging is re-enabled.
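The seven-field requirement in step 4 can be checked mechanically before rebooting. A minimal sketch (the sample line is hypothetical; your real rootvol entry will differ):

```shell
# Hypothetical vfstab entry for rootvol after the nologging option has been
# replaced with a minus sign; a valid vfstab line has exactly seven fields
line='/dev/md/dsk/d10 /dev/md/rdsk/d10 / ufs 1 no -'
# Count whitespace-separated fields with awk
nfields=$(printf '%s\n' "$line" | awk '{print NF}')
echo "$nfields"
```

Running the same awk count against the actual rootvol line in /etc/vfstab catches a dropped field before the next boot fails.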
Exercise Summary
Discussion
Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 7
- Describe Solaris Volume Manager traditional partitioning and soft partitions
- Differentiate between shared disksets and local disksets
- Describe Solaris Volume Manager multi-owner disksets
- Describe volume database (metadb) management issues and the role of Solaris Volume Manager mediators
- Configure the Solaris Volume Manager software
- Create the local metadb replicas
- Add disks to shared disksets
- Build Solaris Volume Manager mirrors with soft partitions in disksets
- Use Solaris Volume Manager status commands
- Perform Sun Cluster software-level device group management
- Create global file systems
- Mirror the boot disk with Solaris Volume Manager
- Describe ZFS and its built-in volume management in contrast to traditional file systems used with a volume manager like Solaris Volume Manager
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion
The following questions are relevant to understanding the content of this module:
- Is it easier to manage Solaris Volume Manager devices with or without soft partitions?
- How many different collections of metadbs will you need to manage?
- What is the advantage of using DID devices as Solaris Volume Manager building blocks in the cluster?
- How are Solaris Volume Manager disksets registered in the cluster?
Additional Resources
The following references provide additional information on the topics described in this module:
- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
- Sun Cluster Concepts Guide for Solaris OS, part number 819-2969
- Solaris Volume Manager Administration Guide, part number 816-4520
- Using Solaris Volume Manager with and without soft partitions
- Identifying the purpose of Solaris Volume Manager disksets in the cluster
- Managing local diskset database replicas (metadb) and shared diskset metadb replicas
- Initializing Solaris Volume Manager to build cluster disksets
- Building cluster disksets
- Mirroring with soft partitions in cluster disksets
- Encapsulating and mirroring the cluster boot disk with Solaris Volume Manager
Figure 7-1  Traditional partitioning: a physical disk drive with slice 7 holding the metadb and other slices (3, 4, 6) used as submirror volumes such as d6 and d12
Exploring Solaris Volume Manager Disk Space Management
The following are limitations with using only partitions as Solaris Volume Manager building blocks:
- The number of building blocks per disk or LUN is limited to the traditional seven (or eight, if you want to push it) partitions. This is particularly restrictive for large LUNs in hardware RAID arrays.
- Disk repair is harder to manage because an old partition table on a replacement disk must be recreated or replicated manually.
Soft partitions address these limitations:
- They allow you to manage volumes whose size meets your exact requirements without physically repartitioning Solaris OS disks.
- They allow you to create more than seven volumes using space from a single large drive or LUN.
- They can grow if there is space left inside the parent component. The space does not have to be physically contiguous. Solaris Volume Manager finds whatever space it can to allow your soft partition to grow within the same parent component.
There are two ways to deal with soft partitions in Solaris Volume Manager:
- Make soft partitions on disk partitions, and then use those to build your submirrors as shown in Figure 7-2.
Figure 7-2  Soft partitions (d18, d28, d38) created on disk slice s0 and used as submirrors of volumes d10, d20, and d30; slice 7 holds the metadb
- Build mirrors using entire partitions, and then use soft partitions to create the volumes with the size you want as shown in Figure 7-3.
Figure 7-3  Mirrors built from entire partitions: on each disk, slice 7 holds the metadb and slice s0 is used whole as a submirror; soft partitions are then carved out of the resulting mirror
All the examples later in this module use the second strategy. The advantages of this are the following:
- Volume management is simplified.
- Hot spare management is consistent with how hot spare pools work.
The only disadvantage of this scheme is that you have large mirror devices even if you are only using a small amount of space. This slows down resynching of mirrors.
Figure 7-4  [Diagram: Node 1 and Node 2 attached to shared storage]
- You must add local replicas manually.
- You can put local replicas on any dedicated partitions on the local disks. You might prefer to use slice 7 (s7) as a convention because the shared diskset replicas have to be on that partition.
- You must spread local replicas evenly across disks and controllers.
- If you have fewer than three local replicas, Solaris Volume Manager logs warnings. You can put more than one copy in the same partition to satisfy this requirement. For example, with two local disks, set up a dedicated partition on each disk and put three copies on each disk.
- If fewer than 50 percent of the defined metadb replicas are available, Solaris Volume Manager ceases to operate.
- If 50 percent or fewer of the defined metadb replicas are available at boot time, you cannot boot the node. However, you can boot to single-user mode and use the metadb command to delete the replicas that are not available.
There are separate metadb replicas for each diskset:
- They are automatically added to disks as you add disks to disksets.
- They will be, and must remain, on slice 7.
- You must use the metadb command to remove and add replicas if you replace a disk containing replicas.
- If fewer than 50 percent of the defined replicas for a diskset are available, the diskset ceases to operate.
- If exactly 50 percent of the defined replicas for a diskset are available, the diskset still operates, but it cannot be taken or switched over.
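The two replica-quorum rules above can be written as a small decision function. This is a sketch of the rules as stated, not a real Solaris Volume Manager tool:

```shell
# Apply the diskset replica-quorum rules: fewer than half available means the
# diskset stops; exactly half means degraded (cannot be taken or switched);
# a majority means fully operational
replica_state() {
  avail=$1; total=$2
  if [ $((2 * avail)) -lt "$total" ]; then
    echo "diskset ceases to operate"
  elif [ $((2 * avail)) -eq "$total" ]; then
    echo "operates, but cannot be taken or switched over"
  else
    echo "fully operational"
  fi
}
replica_state 2 6   # fewer than 50 percent of six replicas
replica_state 3 6   # exactly 50 percent
replica_state 4 6   # a majority
```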
Using Solaris Volume Manager Database Replicas (metadb Replicas)
This is unacceptable in the Sun Cluster environment, because it can take a while to fix a broken array or controller, and you still want to be able to gracefully survive a node failure during that period. The Sun Cluster 3.2 software environment includes special add-ons to Solaris Volume Manager called mediators. Mediators allow you to identify the nodes themselves as tie-breaking votes in the case of failure of exactly 50 percent of the metadb replicas of a shared diskset. The mediator data is stored in the memory of a running Solaris OS process on each node. If you lose an array, the node mediators change to golden status, indicating that they count as two extra votes in the shared diskset quorum mathematics (one from each node). This allows you to maintain normal diskset operations with exactly 50 percent of the metadb replicas surviving. You can also lose a node at this point (you would still have one tie-breaking, golden mediator).
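The mediator arithmetic described above can be sketched the same way: with exactly half the replicas surviving, the two golden mediator votes restore a majority. This is an illustrative model of the vote counting, not an actual cluster utility:

```shell
# Does the diskset hold a majority once golden mediator votes are added?
# avail = surviving replicas, total = defined replicas,
# golden_votes = extra votes from golden mediators (two when golden: one per node)
has_majority() {
  avail=$1; total=$2; golden_votes=$3
  if [ $((2 * (avail + golden_votes))) -gt "$total" ]; then
    echo yes
  else
    echo no
  fi
}
has_majority 3 6 0   # half the replicas, no golden mediators
has_majority 3 6 2   # half the replicas, golden mediators
```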
md_nsets
Aliases
Creating Shared Disksets and Mediators
# metadb -s nfsds
        flags           first blk
     a  u  r            16
     a  u  r            16
# medstat -s nfsds
Mediator        Status          Golden
vincent         Ok              No
theo            Ok              No
A small portion of the drive (starting at cylinder 0) is mapped to slice 7 or slice 6 to be used for state database replicas (at least 4 Mbytes in size).
Note – Sun Cluster 3.2 is the first Sun Cluster version to fully support disks with Extensible Firmware Interface (EFI) labels, which are required for disks larger than 1 terabyte (Tbyte). These disks have no slice 7. Solaris Volume Manager automatically detects an EFI disk and uses slice 6 for the metadb partition.
- One metadb is added to slice 7 or 6 as appropriate.
- Slice 7 or 6 is marked with the flag bits 01 (this shows up as wu, writable but unmountable, when you view the disk from the format command).
- The rest of the drive is mapped to slice 0 (on a standard VTOC disk, even slice 2 is deleted).
The drive is not repartitioned if the disk already has no slice 2 and slice 7 already has the following characteristics:
- It starts at cylinder 0.
- It has at least 4 Mbytes (large enough to hold a state database).
- The flag bits are already 01 (wu in the format command).
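The skip-repartitioning test above can be expressed directly. A sketch of the slice 7 checks only (the separate "no slice 2" condition is noted but not modeled; all input values are hypothetical):

```shell
# Decide whether SVM would repartition a drive added to a diskset, based on
# the slice 7 characteristics listed in the text: starts at cylinder 0,
# at least 4 Mbytes, flag bits already 01 (wu)
needs_repartition() {
  start_cyl=$1; size_mb=$2; flags=$3
  if [ "$start_cyl" -eq 0 ] && [ "$size_mb" -ge 4 ] && [ "$flags" = "01" ]; then
    echo no    # slice 7 already suitable; drive left alone
  else
    echo yes   # drive is repartitioned
  fi
}
needs_repartition 0 16 01   # suitable slice 7
needs_repartition 0 2 01    # too small to hold a state database
```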
Regardless of whether the disk is repartitioned, diskset metadb replicas are added automatically to slice 7 or slice 6 as appropriate (as shown in Figure 7-5). If you have exactly two disk controllers, you should always add an equivalent number of disks or LUNs from each controller to each diskset to maintain the balance of metadb replicas in the diskset across controllers.
Figure 7-5  Automatic repartitioning of a drive added to a diskset: a small slice 7 holds the metadb, and slice 0 spans the rest of the drive
Note – If it is impossible to add the same number of disks from each controller to a diskset, you should manually run the command metadb -s setname -b to balance the number of metadb replicas across controllers.
- Always use slice 0 as is in the diskset (almost the entire drive or LUN).
- Build submirrors out of slice 0 of disks across two different controllers.
- Build a mirror out of those submirrors.
- Use soft partitioning of the large mirrors to size the volumes according to your needs.
Figure 7-6 demonstrates the strategy again in terms of volume (d#) and DID (d#s#). While Solaris Volume Manager would allow you to use c#t#d#, always use DID numbers in the cluster to guarantee a unique, agreed-upon device name from the point of view of all nodes.
Figure 7-6  Volume d100, a mirror of submirrors built on DID devices (d#s#), with soft partitions allocated from the mirror
- Use -s disksetname with the command
- Use disksetname/d# for volume operands in the command
This module uses the former model for all the examples. The following is an example of the commands that are used to build the configuration described in Figure 7-6 on page 7-20:
# metainit -s nfsds d101 1 1 /dev/did/rdsk/d9s0
nfsds/d101: Concat/Stripe is setup
# metainit -s nfsds d102 1 1 /dev/did/rdsk/d17s0
nfsds/d102: Concat/Stripe is setup
# metainit -s nfsds d100 -m d101
nfsds/d100: Mirror is setup
# metattach -s nfsds d100 d102
nfsds/d100: submirror nfsds/d102 is attached
# metainit -s nfsds d10 -p d100 200m
d10: Soft Partition is setup
# metainit -s nfsds d11 -p d100 200m
d11: Soft Partition is setup
The following commands show how a soft partition can be grown if there is space in the parent volume, even non-contiguous space. You will see in the output of metastat on the next page how both soft partitions contain non-contiguous space, because of the order in which they are created and grown.
# metattach -s nfsds d10 400m
nfsds/d10: Soft Partition has been grown
# metattach -s nfsds d11 400m
nfsds/d11: Soft Partition has been grown
Note – The volumes are being increased by 400 Mbytes, to a total size of 600 Mbytes.
nfsds/d102: Submirror of nfsds/d100 State: Resyncing Size: 71118513 blocks (33 GB) Stripe 0:
Using Solaris Volume Manager Status Commands
    Device     Start Block  Dbase   State    Reloc  Hot Spare
    d17s0      0            No      Okay     No
nfsds/d10: Soft Partition
    Device: nfsds/d100
    State: Okay
    Size: 1228800 blocks (600 MB)
        Extent          Start Block
             0          32
             1          819296
Device Relocation Information:
Device   Reloc   Device ID
d17      No      -
d9       No      -
Managing Solaris Volume Manager Disksets and Sun Cluster Device Groups
When Solaris Volume Manager is used in the cluster environment, the commands are tightly integrated into the cluster. Creation of a shared diskset automatically registers that diskset as a cluster-managed device group:
# cldg status
=== Cluster Device Groups ===
--- Device Group Status ---
Device Group Name    Primary    Secondary    Status
-----------------    -------    ---------    ------
nfsds                vincent    theo         Online
# cldg show nfsds
=== Device Groups ===
Device Group Name:                  nfsds
  Type:                             SVM
  failback:                         false
  Node List:                        vincent, theo
  preferenced:                      true
  numsecondaries:                   1
  diskset name:                     nfsds
Addition or removal of a directly connected node to the diskset using metaset -s nfsds -a -h newnode automatically updates the cluster to add or remove the node from its list. There is no need to use cluster commands to resynchronize the device group if volumes are added and deleted. In the Sun Cluster environment, use only the cldg switch command (rather than the metaset -[rt] commands) to change physical ownership of the diskset. As demonstrated in this example, if a mirror is in the middle of synching, this forces a switch anyway and restarts the synch of the mirror all over again. # cldg switch -n theo nfsds
Dec 10 14:06:03 vincent Cluster.Framework: stderr: metaset: vincent: Device busy
[a mirror was still synching, will restart on other node]
# cldg status
=== Cluster Device Groups ===
--- Device Group Status ---
Device Group Name    Primary    Secondary    Status
-----------------    -------    ---------    ------
nfsds                theo       vincent      Online
- Global file systems: These are accessible to all cluster nodes simultaneously, even those not physically connected to the storage.
- Failover file systems: These are mounted only on the node running the failover data service, which must be physically connected to the storage.
The file system type can be UFS or VxFS regardless of whether you are using global or failover file systems. The examples and the lab exercises in this module assume that you are using UFS.
A failover file system entry looks similar to the following, and it should be identical on all nodes that might run services that access the file system (they can only be nodes that are physically connected to the storage).
/dev/md/nfsds/dsk/d10 /dev/md/nfsds/rdsk/d10 /localnfs ufs 2 no -
- The boot drive is mirrored after cluster installation.
- All partitions on the boot drive are mirrored. The example has root (/), swap, and /global/.devices/node@1. That is, you will be manually creating three separate mirror devices.
- The geometry and partition tables of the boot disk and the new mirror are identical. Copying the partition table from one disk to the other was previously demonstrated in "Repartitioning a Mirror Boot Disk and Adding metadb Replicas" on page 7-16.
- Both the boot disk and mirror have three copies of the local metadb replicas. This was done previously.
- Soft partitions are not used on the boot disk. That way, if you need to back out, you can just go back to mounting the standard partitions by editing the /etc/vfstab manually.
- The /global/.devices/node@1 is a special, cluster-specific case. For that one device you must use a different volume d# for the top-level mirror on each node. The reason is that each of these file systems appears in the /etc/mnttab of each node (as global file systems), and the Solaris OS does not allow duplicate device names.
/dev/did/dsk/d11s3    491520    3677    438691    1%    /global/.devices/node@1
/dev/did/dsk/d1s3     491520    3735    438633    1%    /global/.devices/node@2
# swap -l
swapfile             dev
/dev/dsk/c0t0d0s1    32,1
# metadb
        flags
     a m  pc luo
     a    pc luo
     a    pc luo
     a m  pc luo
     a    pc luo
     a    pc luo
Using Solaris Volume Manager to Mirror the Boot Disk
2. Create a mirror using only the submirror mapping to the existing partition. Do not attach the other half of the mirror until after a reboot.
3. Use the metaroot command, which automatically edits the /etc/vfstab and /etc/system files.
# metainit -f d31 1 1 c0t0d0s3
d31: Concat/Stripe is setup
# metainit d32 1 1 c0t8d0s3
d32: Concat/Stripe is setup
// The following volume name (d30) must be different on each node
# metainit d30 -m d31
d30: Mirror is setup
(Change the correct line, for /global/.devices/node@#, manually:)
/dev/md/dsk/d30 /dev/md/rdsk/d30 /global/.devices/node@1 ufs 2 no global
ZFS as Failover File System Only
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        marcpool    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c2t0d0  ONLINE       0     0     0
errors: No known data errors
Now we can create file systems that occupy the pool. ZFS automatically creates mount points and mounts the file systems. You never even need to make /etc/vfstab entries.
vincent:/# zfs create marcpool/myfs1
vincent:/# zfs create marcpool/myfs2
vincent:/# zfs list
NAME
marcpool
marcpool/myfs1
marcpool/myfs2
vincent:/# df -k
marcpool          1%
marcpool/myfs1    1%
marcpool/myfs2    1%
The mount points default to /poolname/fsname, but you can change them to whatever you want:
vincent:/# zfs set mountpoint=/oracle marcpool/myfs1
vincent:/# zfs set mountpoint=/shmoracle marcpool/myfs2
vincent:/# df -k |grep pool
marcpool          34836480    26    34836332    1%    /marcpool
marcpool/myfs1    34836480    24    34836332    1%    /oracle
marcpool/myfs2    34836480    24    34836332    1%    /shmoracle
Note – Module 9 includes an exercise to create a zpool and ZFS file system, and to use HAStoragePlus to make the pool and all of its ZFS file systems fail over.
- Task 1 – Initializing the Solaris Volume Manager Local metadb Replicas
- Task 2 – Selecting the Solaris Volume Manager Demo Volume Disk Drives
- Task 3 – Configuring Solaris Volume Manager Disksets
- Task 4 – Configuring Solaris Volume Manager Demonstration Volumes
- Task 5 – Creating a Global nfs File System
- Task 6 – Creating a Global web File System
- Task 7 – Testing Global File Systems
- Task 8 – Managing Disk Device Groups
- Task 9 – Viewing and Managing Solaris Volume Manager Device Groups Using Sun Cluster Manager
- Task 10 (Optional) – Managing and Mirroring the Boot Disk With Solaris Volume Manager
Preparation
This exercise assumes you are running the Sun Cluster 3.2 software environment on Solaris 10 OS. At least one local disk on each node must have a small unused slice that you can use for the local metadb replicas. The examples in this exercise use slice 7 of the boot disk. If you have encapsulated (but not mirrored) your root disk in a previous VERITAS Volume Manager lab, you can still do this lab if you have a second local disk. To run Solaris Volume Manager, you must have at least one local disk that has room for local metadb replicas and is not under control of VERITAS Volume Manager.
Exercise: Configuring Solaris Volume Manager
During this exercise, you create two data service disksets that each contain a single mirrored volume as shown in Figure 7-7.
Figure 7-7  Lab configuration: each node's boot disk on c0 holds the local metadb replicas; shared storage on c1 and c2 holds the quorum disk and the primary and mirror disks for the demonstration disksets
Note – During this exercise, when you see italicized names, such as IPaddress, enclosure_name, node1, or clustername embedded in a command string, substitute the names appropriate for your cluster.
2. On each node in the cluster, use the metadb command to create three replicas on the unused boot disk slice.
# metadb -a -c 3 -f replica_slice
Caution – Make sure you reference the correct slice address on each node. You can destroy your boot disk if you make a mistake.
3. On all nodes, verify that the replicas are configured and operational.
# metadb
        flags           first blk       block count
     a        u         16              8192            /dev/dsk/c0t0d0s7
     a        u         8208            8192            /dev/dsk/c0t0d0s7
     a        u         16400           8192            /dev/dsk/c0t0d0s7
Task 2 Selecting the Solaris Volume Manager Demo Volume Disk Drives
Perform the following steps to select the Solaris Volume Manager demo volume disk drives:
1. On Node 1, type the cldev list -v command to list all of the available DID drives.
2. If you happen to also be using VxVM, type vxdisk -o alldgs list to ensure that you pick drives that do not conflict with VxVM.
3. Record in Table 7-2 the logical path and DID path numbers of four disks that you will use to create the demonstration disksets and volumes. Remember to mirror across arrays.
Note – You need to record only the last portion of the DID path. The first part is the same for all DID devices: /dev/did/rdsk.

Table 7-2 Logical Path and DID Numbers

Diskset    Volumes    Primary Disk    Mirror Disk
example    d100       c2t3d0 d4       c3t18d0 d15
nfsds      d100
webds      d100
Note – Make sure the disks you select are not local devices. They must be dual-hosted and available to more than one cluster host.
# cldg status
4. Create a 500-Mbyte soft partition on top of your mirror. This is the volume you will actually use for your file system data.
# metainit -s webds d100 -p d99 500m
5. Verify the status of the new volume.
# metastat -s webds
Note – If you have already done the VxVM exercises and already have a mounted /global/nfs file system, you can choose a different mount point for this one.
# mkdir /global/nfs
3. On all nodes, add a mount entry in the /etc/vfstab file for the new file system with the global mount option.
/dev/md/nfsds/dsk/d100 /dev/md/nfsds/rdsk/d100 \
/global/nfs ufs 2 yes global
Note – Do not use the line continuation character (\) in the vfstab file.
4. On Node 1, mount the /global/nfs file system.
# mount /global/nfs
5. Verify that the file system is mounted and available on all nodes.
# mount
# ls /global/nfs
lost+found
Note – If you have already done the VxVM exercises and already have a mounted /global/web file system, you can choose a different mount point for this one.
# mkdir /global/web
3. On all nodes, add a mount entry in the /etc/vfstab file for the new file system with the global mount option.
/dev/md/webds/dsk/d100 /dev/md/webds/rdsk/d100 \
/global/web ufs 2 yes global
Note – Do not use the line continuation character (\) in the vfstab file.
4. On Node 1, mount the /global/web file system.
# mount /global/web
5. Verify that the file system is mounted and available on all nodes.
# mount
# ls /global/web
lost+found
Note – You can bring a device group online to a selected node as follows:
# cldg switch -n node_to_switch_to devgrpname
2. Verify the current demonstration device group configuration.
# cldg show
3. Shut down Node 1. The nfsds and webds disksets should automatically migrate to Node 2 (verify with the cldg status command).
4. Boot Node 1. Both disksets should remain mastered by Node 2.
5. Use the cldg switch command from either node to migrate the nfsds diskset to Node 1.
# cldg switch -n node1 nfsds
Task 9 Viewing and Managing Solaris Volume Manager Device Groups Using Sun Cluster Manager
In this task you view and manage Solaris Volume Manager device groups through Sun Cluster Manager. Perform the following steps on your administration workstation or display station.
1. In a Web browser, log in to Sun Java Web Console on any cluster node:
https://nodename:6789
2. Enter the Sun Cluster Manager Application.
3. Open up the arrow pointing to the Storage folder on the left.
4. From the subcategories revealed for Storage, click Device Groups.
5. You will see error messages about the singleton device groups for each DID. This is normal.
6. Click the name of your Solaris Volume Manager device groups (near the bottom of the list).
7. Use the Switch Primaries button to switch the primary for the device group.
Task 10 (Optional) Managing and Mirroring the Boot Disk With Solaris Volume Manager
If you have time and interest, try this procedure. Typically, you would run it on all nodes but feel free to run it on one node only as a proof of concept. This task assumes that you have a second drive with identical geometry to the rst. If you do not, you can still perform the parts of the lab that do not refer to the second disk. That is, create mirrors with only one submirror for each partition on your boot disk. The example uses c0t8d0 as the second drive.
# metadb -a -c 3 c0t8d0s7
Example for /global/.devices/node@# Partition
# metainit -f d31 1 1 c0t0d0s3
d31: Concat/Stripe is setup
# metainit d32 1 1 c0t8d0s3
d32: Concat/Stripe is setup
// The following volume name (d30) must be different on each node
# metainit d30 -m d31
d30: Mirror is setup
# vi /etc/vfstab
(Change the correct line, for /global/.devices/node@#, manually:)
/dev/md/dsk/d30 /dev/md/rdsk/d30 /global/.devices/node@1 ufs 2 no global
// This one has to be a different mirror volume number for each node
# metattach d30 d32
Exercise Summary
Discussion
Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 8
- Define the purpose of IPMP
- Define the concepts of an IPMP group
- List examples of network adapters in IPMP groups on a single Solaris OS server
- Describe the operation of the in.mpathd daemon
- List the options to the ifconfig command that support IPMP, and configure IPMP using /etc/hostname.xxx files
- Perform a forced failover of an adapter in an IPMP group
- Configure IPMP manually with ifconfig commands
- Describe the integration of IPMP into the Sun Cluster software environment
Relevance
Discussion
The following questions are relevant to understanding the content of this module:
- Should you configure IPMP before or after you install the Sun Cluster software?
- Why is IPMP required even if you do not have redundant network adapters?
- Is the configuration of IPMP any different in the Sun Cluster environment than on a stand-alone Solaris OS?
- Is the behavior of IPMP any different in the Sun Cluster environment?
Additional Resources
The following references provide additional information on the topics described in this module:
- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
- Sun Cluster Concepts Guide for Solaris OS, part number 819-2969
- Solaris Administration Guide: IP Services, part number 816-4554
q q
8-3
Introducing IPMP
IPMP has been a standard part of the base Solaris OS since Solaris 8 OS Update 3 (01/01). IPMP enables you to configure redundant network adapters, on the same server (node) and on the same subnet, as an IPMP failover group. The IPMP daemon detects failures and repairs of network connectivity for adapters, and provides failover and failback of IP addresses among members of the same group. Existing TCP connections to the IP addresses that fail over as part of an IPMP group are interrupted only briefly; they survive the failover without data loss and without being disconnected.

In the Sun Cluster 3.2 software environment, you must use IPMP to manage any public network adapters on which you will be placing the IP addresses associated with applications running in the cluster. These application IP addresses are implemented by mechanisms known as LogicalHostname and SharedAddress resources. The configuration of these resources requires that the public network adapters being used are under the control of IPMP. Module 9, Introducing Data Services, Resource Groups, and HA-NFS, and Module 10, Configuring Scalable Services and Advanced Resource Group Relationships, provide detail about the proper configuration of applications and their IP addresses in the Sun Cluster software environment.
- A network adapter can be a member of only one group. When both IPv4 and IPv6 are configured on a physical adapter, the group names are always the same (and thus need not be specified explicitly for IPv6).
- All members of a group must be on the same subnet. The members, if possible, should be connected to physically separate switches on the same subnet. Adapters on different subnets must be in different groups. You can have multiple groups on the same subnet.
- You can have a group with only one member. However, there is no redundancy and there is no automatic failover.
- Each Ethernet adapter must have a unique MAC address. This is achieved by setting the local-mac-address? variable to true in the OpenBoot PROM.
- Network adapters in the same group must be the same type. For example, you cannot combine Ethernet adapters and Asynchronous Transfer Mode (ATM) adapters in the same group.
- In Solaris 9 OS, when more than one adapter is in an IPMP group, each adapter requires a dedicated test IP address, or test interface. This is an extra static IP address for each group member, configured specifically for testing the health of the adapter using ping traffic. The test interface enables test traffic on all the members of the IPMP group. This is the reason that local-mac-address?=true is required for IPMP.
Solaris 10 OS does not require test addresses, even with multiple adapters in a group. If you set up adapters in Solaris 10 OS IPMP groups without test addresses, the health of the adapter is determined solely by the link state of the adapter.
Note: Using IPMP without test addresses reduces network traffic and reduces the administrative strain of allocating the addresses. However, the testing is less robust. For example, this author has personally experienced adapters with a valid link state but broken receive logic. Such an adapter would be properly faulted using test addresses. The remaining examples in the module focus on creating IPMP configurations with test addresses.
If both IPv4 and IPv6 are configured on the same adapters, it is possible to have test addresses for both IPv4 and IPv6, but you do not have to. If a failure is detected (even if you have only an IPv4 test address), all IPv4 and IPv6 addresses (except the test address) fail over to the other physical adapter. If you do choose to have an IPv6 test address, that test address is always the link-local IPv6 address (the address automatically assigned to the adapter, which can be used only on the local subnet). See Configuring Adapters for IPv6 (Optional) on page 8-16 to see how to set up IPv6 on your adapters.
On a standby adapter, you are allowed to (and must) configure only the test interface. Any attempt to manually configure any other address fails. Additional IP addresses are added to the adapter only as a result of a failure of another member of the group. The standby adapter is preferred as a failover target if another member of the group fails. You must have at least one member of the group that is not a standby adapter.
Note: The examples of two-member IPMP groups in the Sun Cluster software environment will not use any standby adapters. This allows the Sun Cluster software to balance additional IP addresses associated with applications across both members of the group. If you had a three-member IPMP group in the Sun Cluster software environment, setting up one of the members as a standby would still be a valid option.
Figure 8-1 Mutual failover between IPMP group members on a Solaris OS server
Figure 8-2 Failover in IPMP group therapy (qfe0 and qfe4, with qfe8 as a standby adapter)

Figure 8-3 Two IPMP groups on a single Solaris OS server: therapy (qfe0, qfe4) and mentality (qfe1, qfe5)
Figure 8-4 IPMP groups on a Solaris OS server
Describing IPMP
The in.mpathd daemon controls the behavior of IPMP. This behavior can be summarized as a three-part scheme:
- Network path failure detection
- Network path failover
- Network path failback
The in.mpathd daemon starts automatically when an adapter is made a member of an IPMP group through the ifconfig command.
If no targets are discovered through the routing table, targets are discovered by sending a ping to the 224.0.0.1 (all-hosts) multicast IP address.
To ensure that each adapter in the group functions properly, the in.mpathd daemon probes all the targets separately through all the adapters in the multipathing group, using each adapter's test address. If there are no replies to five consecutive probes, the in.mpathd daemon considers the adapter to have failed. The probing rate depends on the failure detection time (FDT). The default value for the failure detection time is 10 seconds, which corresponds to approximately one probe every two seconds.

You might need to manipulate the choice of ping targets. For example, you might have default routers that are explicitly configured not to answer pings. One strategy is to enter specific static host routes to IP addresses that are on the same subnet as your adapters. These are then chosen first as the targets. For example, if you wanted to use 192.168.1.39 and 192.168.1.5 as the targets, you could run these commands:

# route add -host 192.168.1.39 192.168.1.39 -static
# route add -host 192.168.1.5 192.168.1.5 -static

You can put these commands in a boot script. The Solaris Administration Guide: IP Services from docs.sun.com, listed in the resources section at the beginning of this module, suggests creating a boot script named /etc/rc2.d/S70ipmp.targets. This works on both Solaris 9 OS and Solaris 10 OS.
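Such a boot script can be a minimal sketch like the following (the script name is the one suggested by the guide cited above; the target addresses are the example addresses from the preceding commands):

```shell
#!/sbin/sh
# /etc/rc2.d/S70ipmp.targets
# Add static host routes on the public subnet so that in.mpathd
# selects these addresses first as its probe targets.
case "$1" in
start)
        /usr/sbin/route add -host 192.168.1.39 192.168.1.39 -static
        /usr/sbin/route add -host 192.168.1.5 192.168.1.5 -static
        ;;
esac
exit 0
```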
Configuring IPMP
This section outlines how IPMP is configured directly through options to the ifconfig command. Most of the time, you just specify the correct IPMP-specific ifconfig options in the /etc/hostname.xxx files. After these are correct, you never have to change them again.
group groupname
Adapters on the same subnet are placed in a failover group by configuring them with the same group name. The group name can be any name and must be unique only within a single Solaris OS (node). It is irrelevant whether you have the same or different group names across the different nodes of a cluster.

-failover
Use this option to demarcate the test address for each adapter. The test interface is the only interface on an adapter that does not fail over to another adapter when a failure occurs.

deprecated
This option, while not required, is typically used on the test interfaces. Any IP address marked with this option is not used as the source IP address for any client connections initiated from this machine.

standby
This option, when used with the physical adapter (-failover deprecated standby), turns that adapter into a standby-only adapter. No other virtual interfaces can be configured on that adapter until a failure occurs on another adapter within the group.

addif
Use this option to create the next available virtual interface for the specified adapter. The maximum number of virtual interfaces per physical interface is 8192.

removeif
Use this option to remove a virtual interface by giving just the IP address assigned to the interface. You do not need to know the virtual interface number.
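Putting these options together, the /etc/hostname.xxx files for a two-member group might look like the following sketch (the host names vincent, vincent-qfe1-test, and vincent-qfe2-test and the group name therapy are the examples used elsewhere in this module, and must resolve through /etc/hosts):

```
# /etc/hostname.qfe1 -- data address plus test address
vincent group therapy netmask + broadcast + up
addif vincent-qfe1-test deprecated -failover netmask + broadcast + up

# /etc/hostname.qfe2 -- test address only
vincent-qfe2-test group therapy deprecated -failover netmask + broadcast + up
```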
# ifconfig qfe1 vincent group therapy netmask + broadcast + up
# ifconfig qfe1 addif vincent-qfe1-test -failover deprecated \
netmask + broadcast + up
# ifconfig qfe2 plumb
# ifconfig qfe2 vincent-qfe2-test group therapy -failover deprecated \
netmask + broadcast + up
# ifconfig -a
...
qfe1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 172.20.4.192 netmask ffffff00 broadcast 172.20.4.255
        groupname therapy
        ether 8:0:20:f1:2b:d
qfe1:1: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 2
        inet 172.20.4.194 netmask ffffff00 broadcast 172.20.4.255
qfe2: flags=9040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,IPv4,NOFAILOVER> mtu 1500 index 3
        inet 172.20.4.195 netmask ffffff00 broadcast 172.20.4.255
        groupname therapy
        ether 8:0:20:f1:2b:e
Note: Generally, you do not need to edit the default /etc/default/mpathd configuration file.

The three settings you can alter in this file are:

FAILURE_DETECTION_TIME
You can lower the value of this parameter. If the load on the network is too great, the system cannot meet the failure detection time value; the in.mpathd daemon then prints a message on the console indicating that the time cannot be met, along with the time that it currently can meet. If the response comes back correctly, the in.mpathd daemon meets the failure detection time provided in this file.

FAILBACK
After a failover, failback takes place when the failed adapter is repaired. However, the in.mpathd daemon does not fail back the addresses if the FAILBACK option is set to no.

TRACK_INTERFACES_ONLY_WITH_GROUPS
On stand-alone servers, you can set this to no so that in.mpathd monitors traffic even on adapters not in groups. In the cluster environment, make sure you leave the value at yes so that private transport adapters are not probed.
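For reference, the shipped defaults in /etc/default/mpathd correspond to the behavior described above; this is a sketch of the default values (FAILURE_DETECTION_TIME is expressed in milliseconds, so 10000 is the 10-second default):

```
# Default /etc/default/mpathd settings (sketch)
FAILURE_DETECTION_TIME=10000
FAILBACK=yes
TRACK_INTERFACES_ONLY_WITH_GROUPS=yes
```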
- Store cluster-wide status of IPMP groups on each node in the CCR, and enable retrieval of this status with the clnode status -m command
- Facilitate application failover in the case where all members of an IPMP group on one node have failed, but the corresponding group on the same subnet on another node has a healthy adapter
Clearly, IPMP itself is unaware of any of these cluster requirements. Instead, the Sun Cluster 3.2 software uses a cluster-specific public network management daemon (pnmd) to perform this cluster integration.
- Populate the CCR with public network adapter status
- Facilitate application failover
When pnmd detects that all members of a local IPMP group have failed, it consults a file named /var/cluster/run/pnm_callbacks. This file contains entries that would have been created by the activation of LogicalHostname and SharedAddress resources. (There is more information about this in Module 9, Introducing Data Services, Resource Groups, and HA-NFS, and Module 10, Configuring Scalable Services and Advanced Resource Group Relationships.)
Integrating IPMP Into the Sun Cluster 3.2 Software Environment

It is the job of the hafoip_ipmp_callback, in the following example, to decide whether to migrate resources to another node.

# cat /var/cluster/run/pnm_callbacks
therapy orangecat-nfs.mon \
        /usr/cluster/lib/rgm/rt/hafoip/hafoip_ipmp_callback mon nfs-rg orangecat-nfs
Figure 8-5 Integration of in.mpathd and pnmd in the Sun Cluster software environment
Task 1: Verifying the local-mac-address? Variable
Task 2: Verifying the Adapters for the IPMP Group
Task 3: Verifying or Entering Test Addresses in the /etc/hosts File
Task 4: Creating /etc/hostname.xxx Files
Task 5: Rebooting and Verifying That IPMP Is Configured
Task 6: Verifying IPMP Failover and Failback
Preparation
No preparation is required for this exercise.
2. Enter the IP addresses in /etc/hosts on each node if they are not already there. The following is only an example. It does not matter if you use the same names as another group working on another cluster, as long as the IP addresses are different:

# IPMP TEST ADDRESSES
172.20.4.194 vincent-qfe1-test
172.20.4.195 vincent-qfe2-test
172.20.4.197 theo-qfe1-test
172.20.4.198 theo-qfe2-test
Module 9
Introducing Data Services, Resource Groups, and HA-NFS
- Describe how data service agents enable a data service in a cluster to operate properly
- List the components of a data service agent
- Describe data service packaging, installation, and registration
- Describe the primary purpose of resource groups
- Differentiate between failover and scalable data services
- Describe how to use special resource types
- List the components of a resource group
- Differentiate between standard, extension, and resource group properties
- Register resource types
- Configure resource groups and resources
- Perform manual control and switching of resources and resource groups
Relevance
Discussion: The following questions are relevant to understanding the content of this module:
- What is a data service agent for the Sun Cluster 3.2 software?
- What is involved in fault monitoring a data service?
- What is the implication of having multiple applications in the same failover resource group?
- How do you distinguish the resources representing different instances of the same application?
- What are the specific requirements for setting up NFS in the Sun Cluster 3.2 software environment?
Additional Resources
The following references provide additional information on the topics described in this module:
- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
- Sun Cluster Concepts Guide for Solaris OS, part number 819-2969
Off-the-Shelf Application
For most applications supported in the Sun Cluster software environment, software customization is not required to enable the application to run correctly in the cluster. A Sun Cluster 3.2 software agent is provided to enable the data service to run correctly in the cluster environment.
Application Requirements
You should identify requirements for all of the data services before you begin Solaris OS and Sun Cluster software installation. Failure to do so might result in installation errors that require you to completely reinstall the Solaris OS and Sun Cluster software.
The local disks of each cluster node
Placing the software and configuration files on the individual cluster nodes lets you upgrade application software later without shutting down the service. The disadvantage is that you have several copies of the software and configuration files to maintain and administer. If you are running applications in non-global zones, you might install the application only in the local storage visible to the non-global zones in which you intend to run it.
A global file system or failover file system in the shared storage
If you put the application binaries on a shared file system, you have only one copy to maintain and manage. However, you must shut down the data service in the entire cluster to upgrade the application software. If you can spare a small amount of downtime for upgrades, place a single copy of the application and configuration files on a shared global or failover file system.
Figure 9-1 Data service fault monitors
- Methods to start and stop the service in the cluster
- Fault monitors for the service
- Methods to start and stop the fault monitoring
- Methods to validate configuration of the service in the cluster
- A registration information file that allows the Sun Cluster software to store all the information about the methods into the CCR

You then only need to reference a resource type to refer to all the components of the agent.
Sun Cluster software provides an API that can be called from shell programs, and an API that can be called from C or C++ programs. Most of the program components of data service methods that are supported by Sun products are actually compiled C and C++ programs.
- The daemons, by placing them under control of the process monitoring facility (rpc.pmfd). This facility calls action scripts if data service daemons stop unexpectedly.
- The service, by using client commands.
Note: Data service fault monitors do not need to monitor the health of the public network itself, because this is already done by the combination of in.mpathd and pnmd, as described in Module 8, Managing the Public Network With IPMP.
Figure 9-2 Each resource is an instance of a resource type
Resources
In the context of clusters, the word resource refers to any element above the layer of the cluster framework that can be turned on or off, and can be monitored in the cluster. The most obvious example of a resource is an instance of a running data service. For example, an Apache web server, with a single httpd.conf file, counts as one resource. Each resource is an instance of a specific type. For example, the type for an Apache web server is SUNW.apache. Other types of resources represent IP addresses and storage that are required by the cluster. A particular resource is identified by the following:
- Its type, though the type is not unique for each instance
- A unique name, which is used as input and output within utilities
- A set of properties, which are parameters that define a particular resource
You can configure multiple resources of the same type in the cluster, either in the same or different resource groups. For example, you might want to run two failover Apache web server application services that reside, by default, on different nodes in the cluster, but could still fail over to the same node if there were only one node available. In this case, you have two resources, both of type SUNW.apache, in two different resource groups.
Resource Groups
Resource groups are collections of resources. Resource groups are either failover or scalable.
Failover Resource Groups
For failover applications in the cluster, the resource group becomes the unit of failover. That is, the resource group is the collection of services that always run together on one node of the cluster at one time, and simultaneously fail over or switch over to another node or non-global zone.
Number of Data Services in a Resource Group
You can put multiple data services in the same resource group. For example, you can run two Sun Java System Web Server application services, a Sun Java System Directory Server software identity management service, and a Sybase database as four separate resources in the same failover resource group. These applications always run on the same node at the same time, and always simultaneously migrate to another node. Or you can put all four of the preceding services in separate resource groups. The services could still run on the same node, but would fail over and switch over independently.
Figure 9-3
The previous failover resource group has been defined with a unique name in the cluster. Configured resource types are placed into the empty resource group. These are known as resources. By placing these resources into the same failover resource group, they fail over together.
- Global raw devices
- Global file systems
- Traditional (non-ZFS) failover file systems
- Failover ZFS zpools, including all file systems within
You use the START method of SUNW.HAStoragePlus to check whether the global devices or file systems in question are accessible from the node where the resource group is going online. You almost always use the Resource_dependencies standard property to place a dependency so that the real application resources depend on the SUNW.HAStoragePlus resource. In this way, the resource group manager does not try to start services if the storage the services depend on is not available. You can optionally still set the AffinityOn resource property to True, so that the SUNW.HAStoragePlus resource type attempts colocation of resource groups and device groups on the same node, thus enhancing the performance of disk-intensive data services.
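As a sketch of this pattern (the group, resource, and mount point names here are hypothetical, and application-specific properties are omitted), the storage resource and the dependent application resource might be created like this:

```
# Storage resource, checked by its START method when the group goes online
# clresource create -g nfs-rg -t SUNW.HAStoragePlus \
      -p FilesystemMountPoints=/global/nfs -p AffinityOn=True nfs-stor

# Application resource that must not start before the storage is available
# clresource create -g nfs-rg -t SUNW.nfs \
      -p Resource_dependencies=nfs-stor nfs-res
```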
Using Special Resource Types

The configuration of a global or failover file system looks very similar. Both use the FilesystemMountPoints property to point to the file system or file systems in question. The SUNW.HAStoragePlus methods distinguish between global and failover file systems by the /etc/vfstab entry.
- A scalable service
- A failover service that must fail over to a node not physically connected to the storage
- A single file system that contains data for different failover services that may be running on different nodes
- Access to the data from a node that is not currently running the service
If your storage is physically connected to all possible nodes for the resource group containing the HAStoragePlus resource, you can still use AffinityOn=true to migrate the underlying storage ownership along with the service, if that makes sense, as a performance benefit.
- The file system is for a failover service only.
- The Nodelist for the resource group contains only nodes that are physically connected to the storage; that is, the Nodelist for the resource group is the same as the Nodelist for the device group.
- Only services in a single resource group are using the file system.
If these conditions are true, a failover file system provides a higher level of performance than a global file system, especially for services that are file system intensive.
HAStoragePlus Resources in Zones
When a traditional (non-ZFS) file system that is controlled by an HAStoragePlus resource is contained in a resource group that is mastered by a non-global zone, the file system is still always mounted in the appropriate global zone, and then made available to the non-global zone through a loopback mount.
Many Sun-supported data services (for example, DHCP, Samba, and many more) do not actually supply new resource types. Rather, they supply configuration scripts that configure resources to represent the applications as instances of SUNW.gds. In the lab exercises for this module, you will get an opportunity to use an instance of the SUNW.gds type to put your own customized application under control of the Sun Cluster software.
- Resource B must be started first (without the dependency, the RGM might have been able to start them in parallel).
- Resource A must be stopped first (without the dependency, the RGM might have been able to stop them in parallel).
- The rgmd daemon does not try to start resource A if resource B fails to go online, and does not try to stop resource B if resource A fails to be stopped.
- You are allowed to specifically, manually disable resource B, even if resource A is still running. This is somewhat counterintuitive, but because you are explicitly issuing the command (clrs disable) to disable the dependee, you can do it.
- If A and B are in different resource groups, you are not allowed to take the resource group containing B offline if resource A is still online.
Restart Dependencies (Stronger Than Regular Dependencies)
Restart dependencies have all the attributes of regular dependencies, with the additional attribute that if the RGM is told to restart resource B, or is informed that some agent restarted resource B itself, the RGM automatically restarts resource A. As with regular dependencies, you are allowed to manually, specifically disable the dependee, resource B, while leaving resource A running. However, when you manually re-enable resource B, the restart dependency takes effect, and the RGM restarts resource A.
The Failover_mode Property
The Failover_mode property describes what should happen to a resource if the resource fails to start up or shut down properly. Table 9-1 describes how the values of the Failover_mode property work.

Table 9-1 The Failover_mode Property Values

NONE
  Failure to start: Other resources in the same resource group can still start (if nondependent).
  Failure to stop: The STOP_FAILED flag is set on the resource.
  Can fault monitor cause RGM to fail the resource group over? Yes
  Can fault monitor cause RGM to restart the resource? Yes

SOFT
  Failure to start: The whole resource group is switched to another node.
  Failure to stop: The STOP_FAILED flag is set on the resource.
  Can fault monitor cause RGM to fail the resource group over? Yes
  Can fault monitor cause RGM to restart the resource? Yes

HARD
  Failure to start: The whole resource group is switched to another node.
  Failure to stop: The node reboots.
  Can fault monitor cause RGM to fail the resource group over? Yes
  Can fault monitor cause RGM to restart the resource? Yes

RESTART_ONLY
  Failure to start: Other resources in the same resource group can still start (if nondependent).
  Failure to stop: The STOP_FAILED flag is set on the resource.
  Can fault monitor cause RGM to fail the resource group over? No
  Can fault monitor cause RGM to restart the resource? Yes

LOG_ONLY
  Failure to start: Other resources in the same resource group can still start (if nondependent).
  Failure to stop: The STOP_FAILED flag is set on the resource.
  Can fault monitor cause RGM to fail the resource group over? No
  Can fault monitor cause RGM to restart the resource? No
If the STOP_FAILED flag is set, it must be manually cleared by using the clrs clear command before the service can start again:

# clrs clear -n node -f STOP_FAILED resourcename
Extension Properties
Names of extension properties are specific to resources of a particular type. You can get information about extension properties from the man page for a specific resource type. For example:

# man SUNW.apache
# man SUNW.HAStoragePlus

If you are setting up the Apache web server, for example, you must create a value for the Bin_dir property, which points to the directory containing the apachectl script that you want to use to start the web server. The HAStoragePlus type has the following important extension properties regarding file systems:

FilesystemMountPoints=list_of_storage_mount_points
AffinityOn=True/False
Zpools=list_of_zpools

Use the first of these extension properties to identify which storage resource is being described. The second extension property is a parameter that tells the cluster framework to switch physical control of the storage group to the node running the service. Switching control to the node that is running the service optimizes performance when services in a single failover resource group are the only services accessing the storage. The third property (Zpools) is used only for specifying pool names for a failover ZFS resource, in which case the first two properties are not used.

Many resource types (including LDAP and DNS) have an extension property called Confdir_list that points to the configuration. For example:

Confdir_list=/global/dnsconflocation

Many have other ways of identifying their configuration and data. There is no hard-and-fast rule about extension properties.
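For example, an Apache resource might be created with its Bin_dir extension property set as follows (a sketch only; the group and resource names are hypothetical, and any other properties required by your configuration are omitted):

```
# clresource create -g web-rg -t SUNW.apache \
      -p Bin_dir=/usr/apache/bin -p Port_list=80/tcp web-res
```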
The Pingpong_interval Property
This property controls resource group behavior in a couple of different ways:
- If a resource group fails to start twice on the same node or zone (failure of START methods of the same or different resources in the group) within the interval (expressed in seconds), the RGM does not consider that node or zone a candidate for the group failover.
- If the fault monitor for one particular resource requests that a group be failed off a particular node or zone, and then the fault monitor for the same resource requests another failover that would bring the group back to the original node or zone, the RGM rejects the second failover if it is within the interval.
Note – The Pingpong_interval property is meant to prevent faulty start scripts, faulty fault monitors, or problem applications from causing endless ping-ponging of a resource group between nodes or zones.
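The interval is an ordinary resource group property, so it can be tuned with clrg set (the group name here is from the NFS example; treat this as a sketch):

```shell
# Lower the ping-pong interval from its 3600-second default to 10 minutes
clrg set -p Pingpong_interval=600 nfs-rg
```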
- When specifying or modifying the node list for a resource type
- When specifying or modifying the node list for a resource group
- When switching a resource group to a particular node
- When enabling or disabling a resource or resource flag on a particular node
The commands to perform these operations are presented in the next few sections, but there are some general rules in using zone names:
- The form nodename:zonename is used to specify a zone on a particular node.
- You can easily give zones on different nodes the same or different zone names. They are considered to be different virtual nodes by the cluster either way.
- The syntax -n nodename -z zonename is identical to -n nodename:zonename.
- The syntax -n node1,node2,... -z zonename is a shorthand for -n node1:zonename,node2:zonename,... In other words, it is a shorthand you can choose when specifying multiple zones on different nodes if you happen to have chosen to use the same zone name.
vincent:/# clrt show apache

=== Registered Resource Types ===

Resource Type:       SUNW.apache:4
  RT_description:    Apache Web Server on Sun Cluster
  RT_version:        4
  API_version:       2
  RT_basedir:        /opt/SUNWscapc/bin
  Single_instance:   False
  Proxy:             False
  Init_nodes:        All potential masters
  Installed_nodes:   vincent theo
  Failover:          False
  Pkglist:           SUNWscapc
  RT_system:         False
  Global_zone:       False
Note – The Failover (False) and Global_zone (False) values mean that the resource type is not limited to being failover only (that is, it can be failover or scalable) and is not limited to the global zone only.
Unregistering Types
You can unregister only unused types (types for which there are no resource instances):
# clrt unregister nfs
Note – If you omit the node list when creating a group, it defaults to all physical nodes of the cluster, and none of the non-global zones.
Configuring Resource Groups Using the clresourcegroup (clrg) Command

# clrg show ora-rg

=== Resource Groups and Resources ===

Resource Group:      ora-rg
  RG_description:    <NULL>
  RG_mode:           Failover
  RG_state:          Unmanaged
  Failback:          False
  Nodelist:          vincent:frozone theo:frozone

# clrg show -p pathprefix nfs-rg

=== Resource Groups and Resources ===

Resource Group:      nfs-rg
  Pathprefix:        /global/nfs/admin

# clrg show -v nfs-rg

=== Resource Groups and Resources ===

Resource Group:                  nfs-rg
  RG_description:                <NULL>
  RG_mode:                       Failover
  RG_state:                      Unmanaged
  RG_project_name:               default
  RG_affinities:                 <NULL>
  RG_SLM_type:                   manual
  Auto_start_on_new_cluster:     True
  Failback:                      False
  Nodelist:                      vincent theo
  Maximum_primaries:             1
  Desired_primaries:             1
  RG_dependencies:               <NULL>
  Implicit_network_dependencies: True
  Global_resources_used:         <All>
  Pingpong_interval:             3600
  Pathprefix:                    /global/nfs/admin
  RG_System:                     False
  Suspend_automatic_recovery:    False
Hostnamelist: The list of IPs (on a single subnet) corresponding to this resource. Multiple logical IP resources on different subnets must be separate resources, although they can still be in the same resource group. Each name in Hostnamelist can be an IPv4 address (associated with an address in the /etc/hosts file), an IPv6 address (associated with an IPv6 address in the /etc/inet/ipnodes file), or both (associated with both an IPv4 and an IPv6 address in the /etc/inet/ipnodes file). If you do not specify the Hostnamelist, these commands assume that the resource name is also the value of Hostnamelist.
NetIfList: An indication of which adapters to use on each node, in the format ipmp_grp@node_id,ipmp_grp@node_id,... If you do not specify NetIfList, the commands try to figure out whether there is only one IPMP group per node on the subnet indicated by Hostnamelist. If so, the value of NetIfList is derived automatically. When you use the -N shorthand of the clrslh command you can use the node names, but the actual value of the property will contain the node IDs.
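As a sketch, when automatic derivation is not possible (for example, a node has more than one IPMP group on the subnet), NetIfList can be supplied through the -N shorthand; the IPMP group name therapy and the host and group names here are assumptions borrowed from examples elsewhere in this guide:

```shell
# 1 and 2 are node IDs; therapy is an assumed IPMP group name
clrslh create -g nfs-rg \
    -N therapy@1,therapy@2 \
    clustername-nfs
```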
Configuring Other Resources Using the clresource (clrs) Command

# clrs show -p resource_dependencies nfs-res
=== Resources ===

Resource:                  nfs-res
  Resource_dependencies:   nfs-stor

  --- Standard and extension properties ---
at_creation – You can set the property only with the clrs create command when you add the resource.
when_disabled – You can change the property with the clrs set command only while the resource is disabled.
anytime – You can change the property at any time with the clrs set command.
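For example, changing a when_disabled property requires a disable/set/enable sequence; the property name below is a placeholder, not a real property:

```shell
clrs disable nfs-res
clrs set -p Some_when_disabled_property=newvalue nfs-res   # placeholder property name
clrs enable nfs-res
```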
Online an offline resource group onto its preferred node. With -M, the command first brings the group into the managed state if it is unmanaged. Without -M, the group must already be in the managed state. With -e, all disabled resources are enabled. Without -e, the enabled/disabled state of each resource is preserved.
# clrg online [-M] [-e] nfs-rg
Switch a failover resource group to a specific node. (It might already be online on another node, or offline.) The -M and -e options behave as just described.
# clrg switch [-M] [-e] -n node nfs-rg
Offline a resource group. The group remains in the managed state, so it is still subject to automatic recovery:
# clrg offline nfs-rg
Restart a resource group:
# clrg restart nfs-rg
Resource Operations
Use these commands to do the following:
- Disable a resource and its fault monitor:
# clrs disable nfs-res
- Enable a resource and its fault monitor:
# clrs enable nfs-res
Note – The enabled/disabled setting is a flag that persists, and can be set, even when the resource group containing the resource is offline or unmanaged.
- Disable the fault monitor for a resource, but leave the resource enabled:
# clrs unmonitor nfs-res
- Re-enable the fault monitor for a resource (the resource itself is already enabled):
# clrs monitor nfs-res
The -M switch for clrg switch or clrg online manages a group and then brings it online. By default, all transitions preserve the state of the enabled/disabled flags (for the service and its fault monitoring). You can force all resources to enabled by adding the -e option to clrg switch or clrg online.
Figure 9-4 (state diagram) summarizes resource group transitions: clrg online (onto the preferred node) or clrg switch -n node moves a managed group between Offline (no resources running) and Online, preserving the enabled/disabled state of each resource; clrs disable and clrs enable can be run in any state and determine whether a resource runs when its group is switched on; clrg offline takes a group offline with enabled/disabled state preserved; clrg manage brings an Unmanaged group into the managed state.
- You can still transition the resource group any way you like using the commands presented on the previous pages.
- You can still enable or disable any individual resource, using the commands presented on the previous pages. If the resource group is online, the resources will go on and off accordingly.
- The fault monitors for resources will still be started.
- Resources will not automatically be restarted by fault monitors, nor will entire groups automatically fail over, even if an entire node fails.
The reason you might want to suspend an online resource group is to perform maintenance on it; that is, to start and stop some applications manually while preserving the online status of the group and other components, so that dependencies can still be honored correctly. The reason you might suspend an offline resource group is so that it does not go online automatically when you did not intend it to do so. For example, when you put a resource group offline (but it is still managed and not suspended), a node failure still causes the group to go online.
To suspend a group, type:
# clrg suspend grpname
To remove the suspension of a group, type:
# clrg resume grpname
To see whether a group is currently suspended, use clrg status, as demonstrated on the next page.
Displaying Resource and Resource Group Status Using the clrg status and clrs status Commands
There are separate commands to show the status of resource groups and resources. When viewing resources, you can still use -g grpname to limit the output to a single group.
# clrs status -g nfs-rg

=== Cluster Resources ===

Resource Name    Node Name    State      Status Message
-------------    ---------    -----      --------------
nfs-stor         vincent      Offline    Offline
                 theo         Online     Online

orangecat-nfs    vincent      Offline    Offline
                 theo         Online     Online - LogicalHostname online.

nfs-res          vincent      Offline    Offline
                 theo         Online     Online - Service is online.
Using the clsetup Utility for Resource and Resource Group Operations
The clsetup utility has an extensive set of menus pertaining to resource and resource group management. These menus are accessed by choosing Option 2 (Resource Group) from the main menu of the clsetup utility. The clsetup utility is an intuitive, menu-driven interface that guides you through the options without your having to remember the exact command-line syntax. The clsetup utility calls the clrg, clrs, clrslh, and clrssa commands that have been described in previous sections of this module. The Resource Group menu for the clsetup utility looks similar to the following:

*** Resource Group Menu ***

Please select from one of the following options:

     1) Create a resource group
     2) Add a network resource to a resource group
     3) Add a data service resource to a resource group
     4) Resource type registration
     5) Online/Offline or Switchover a resource group
     6) Suspend/Resume recovery for a resource group
     7) Enable/Disable a resource
     8) Change properties of a resource group
     9) Change properties of a resource
    10) Remove a resource from a resource group
    11) Remove a resource group
    12) Clear the stop_failed error flag from a resource
Option:
Using the Data Service Wizards in clsetup and Sun Cluster Manager
The data service wizards, new to Sun Cluster 3.2, are yet another way to configure application resource groups and resources. These wizards guide you through the entire task of integrating some of the popular applications into the cluster. They are available through the Tasks item in Sun Cluster Manager and the Data Services submenu of clsetup:

*** Data Services Menu ***

Please select from one of the following options:

    1) Apache Web Server
    2) Oracle
    3) NFS
    4) Oracle Real Application Clusters
    5) SAP Web Application Server
    6) Highly Available Storage
    7) Logical Hostname
    8) Shared Address
?) Help
q) Return to the Main Menu

Option:

The wizards are intended as an alternative to the Resource Group menu items presented on the previous page. The NFS wizard, for example, can create and bring online the NFS resource group and resources, just as the Resource Group menu items can. It can also create the vfstab entry for your storage (it will not actually build your volume or file system), and provision the proper share command for you. The idea is that it can guide you through the task in a wizard-like fashion, without even mentioning the terminology of resources and resource groups.
Task 1: Installing and Configuring the HA-NFS Agent and Server
Task 2: Registering and Configuring the Sun Cluster HA-NFS Data Services
Task 3: Verifying Access by NFS Clients
Task 4: Observing Sun Cluster HA-NFS Failover Behavior
Task 5: Generating Cluster Failures and Observing Behavior of the NFS Failover
Task 6: Configuring NFS to Use a Failover File System
Task 7: Putting Failover Application Data in a ZFS File System
Task 8: Making a Customized Application Fail Over With a Generic Data Service Resource
Task 9: Viewing and Managing Resources and Resource Groups Using Sun Cluster Manager
Preparation
The following tasks are explained in this section:
- Preparing to register and configure the Sun Cluster HA for NFS data service
- Registering and configuring the Sun Cluster HA for NFS data service
- Verifying access by NFS clients
- Observing Sun Cluster HA for NFS failover behavior
Note – During this exercise, when you see italicized terms, such as IPaddress, enclosure_name, node1, or clustername, embedded in a command string, substitute the names appropriate for your cluster.
Note – This should speed up HA-NFS switchovers and failovers. If your client and server are both running the Solaris 10 OS, you will be using NFS Version 4 by default. This is a stateful protocol that intentionally delays resumption of NFS activity so that clients have a chance to reclaim their state any time the server is recovering (which includes any cluster switchover or failover). The GRACE_PERIOD controls the length of the delay.
3. Verify that the /global/nfs file system is mounted and ready for use.
# df -k
Exercise: Installing and Configuring HA-NFS

4. Add an entry to the /etc/hosts file on each cluster node and on the administrative workstation for the logical host name resource clustername-nfs. Substitute the IP address supplied by your instructor.
IP_address clustername-nfs
Note – In the RLDC, the /etc/hosts file on the vnchost already contains the appropriate entry for each cluster. Verify that the entry for your cluster exists on the vnchost, and use the same IP address to create the entry on your cluster nodes.

Perform the remaining steps on just one node of the cluster.

5. Create the administrative directory that contains the dfstab.nfs-res file for the NFS resource.
# cd /global/nfs
# mkdir admin
# cd admin
# mkdir SUNW.nfs

6. Create the dfstab.nfs-res file in the /global/nfs/admin/SUNW.nfs directory. Add the entry to share the /global/nfs/data directory.
# cd SUNW.nfs
# vi dfstab.nfs-res
share -F nfs -o rw /global/nfs/data

7. Create the directory specified in the dfstab.nfs-res file.
# cd /global/nfs
# mkdir /global/nfs/data
# chmod 777 /global/nfs/data
# touch /global/nfs/data/sample.file

Note – You are changing the mode of the data directory only for the purposes of this lab. In practice, you would be more specific about the share options in the dfstab.nfs-res file.
Task 2 Registering and Configuring the Sun Cluster HA-NFS Data Services
Perform the following steps on one node only to configure HA-NFS in the cluster:

1. Register the NFS and SUNW.HAStoragePlus resource types.
# clrt register SUNW.nfs
# clrt register SUNW.HAStoragePlus
# clrt list -v

2. Create the failover resource group. Note that on a three-node cluster the node list (specified with -n) can include nodes not physically connected to the storage if you are using a global file system.
# clrg create -n node1,node2,[node3] \
-p Pathprefix=/global/nfs/admin nfs-rg

3. Add the logical host name resource to the resource group.
# clrslh create -g nfs-rg clustername-nfs

4. Create the SUNW.HAStoragePlus resource. If all of your nodes are connected to the storage (a two-node cluster, for example), you should set the value of AffinityOn to true. If you have a third, nonstorage node, you should set the value of AffinityOn to false.
# clrs create -t HAStoragePlus -g nfs-rg \
-p AffinityOn=[true|false] \
-p FilesystemMountPoints=/global/nfs nfs-stor

5. Create the SUNW.nfs resource.
# clrs create -t nfs -g nfs-rg \
-p Resource_dependencies=nfs-stor nfs-res

6. Bring the resource group to a managed state and then online.
# clrg online -M nfs-rg
Note – The very first time you do this you may see a console warning message concerning lockd: cannot contact statd. Both daemons are restarted (successfully) and you can ignore the message.

7. Verify that the data service is online.
# clrs list -v
# clrs status
# clrg status
Task 5 Generating Cluster Failures and Observing Behavior of the NFS Failover
Generate failures in your cluster to observe the recovery features of Sun Cluster 3.2 software. If you have physical access to the cluster, you can physically pull out network cables or power down a node. If you do not have physical access to the cluster, you can still bring a node to the ok prompt using a break signal through your terminal concentrator. Try to generate the following failures:
- Node failure (power down a node or bring it to the ok prompt)
- Single public network interface failure (pull a public network cable, or sabotage the adapter using ifconfig modinsert as you did in Module 8)
- Multiple public network failure on a single node
Try your tests while the resource group is in its normal, nonsuspended state. Repeat some tests after suspending the resource group:
# clrg suspend nfs-rg
When you are satisfied with your results, remove the suspension:
# clrg resume nfs-rg
3. Unmount the global NFS file system.
# umount /global/nfs

4. On each node, edit the /etc/vfstab file to make /global/nfs a local file system (if you have a non-storage node, you can either make this edit, or just remove the line completely):
a. Change yes to no in the mount-at-boot column.
b. Remove the word global from the mount options. Replace it with a minus sign (-). Make sure you still have seven fields on that line.

5. Restart the NFS resource group.
# clrg online nfs-rg

6. Observe the file system behavior. The file system should be mounted only on the node running the NFS service and should fail over appropriately.
Note – Ordinarily, just by convention, you would not have a failover file system mounted under a subdirectory of /global. However, it is all just a convention, and keeping the same mount point simplifies the flow of these exercises.
2. Disable your old (non-ZFS) NFS resource, and remount the ZFS under /global/nfs. If you have already done Task 6, the traditional /global/nfs is already a failover file system, and it will be unmounted by the clrs disable (so the umount is there just in case you had not done Task 6).
# clrs disable nfs-stor
# clrs delete nfs-stor
# umount /global/nfs
# zfs set mountpoint=/global/nfs nfspool/nfs
6. Recreate your NFS resources using the ZFS storage:
# clrs create -g nfs-rg -t HAStoragePlus \
-p Zpools=nfspool nfs-stor
# clrs create -g nfs-rg -t nfs \
-p Resource_dependencies=nfs-stor nfs-res
# clrg status
# clrs status
# df -k
7. Observe switchover and failover behavior of the NFS application, which will now include the zpool containing your ZFS file system.
Task 8 Making a Customized Application Fail Over With a Generic Data Service Resource
In this task, you can see how easy it is to get any daemon to fail over in the cluster by using the Generic Data Service (so that you do not have to invent your own resource type). Perform the following steps on the nodes indicated:

1. On all nodes (or do it on one node and copy the file to the other nodes in the same location), create a daemon that represents your customized application:
# vi /var/tmp/myappdaemon
#!/bin/ksh
while :
do
    sleep 10
done

2. Make sure the file is executable on all nodes.

3. From any one node, create a new failover resource group for your application:
# clrg create -n node1,node2,[node3] myapp-rg

4. From one node, register the Generic Data Service resource type:
# clrt register SUNW.gds

5. From one node, create the new resource and enable the group:
# clrs create -g myapp-rg -t SUNW.gds \
-p Start_Command=/var/tmp/myappdaemon \
-p Probe_Command=/bin/true -p Network_aware=false \
myapp-res
# clrg online -M myapp-rg

6. Verify the behavior of your customized application.
a. Verify that you can manually switch the group from node to node.
b. Kill the daemon. Wait a little while and note that it restarts on the same node. Wait until clrs status shows that the resource is fully online again.
c. Repeat step b a few times. Eventually, the group switches over to the other node.
Task 9 Viewing and Managing Resources and Resource Groups Using Sun Cluster Manager
Perform the following steps on your administration workstation:

1. In a Web browser, log in to the Sun Java Web Console on any cluster node:
https://nodename:6789
2. Log in as root and enter the Sun Cluster Manager Application.
3. Click the Resource Groups folder on the left.
4. Investigate the status information and graphical topology information that you can see regarding resource groups.
5. Go back to the Status Panel for Resource Groups.
6. Select the check box next to a resource group, and use the Switch Primary button to switch the primary node for a resource group.
Exercise Summary
Discussion Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 10
- Describe the characteristics of scalable services
- Describe the function of the load-balancer
- Create the failover resource group for the SharedAddress resource
- Create the scalable resource group for the scalable service
- Describe how the SharedAddress resource works with scalable services
- Add auxiliary nodes
- Create these resource groups
- Control scalable resources and resource groups
- View scalable resource and group status
- Configure and run Apache as a scalable service in the cluster
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion The following questions are relevant to understanding the content of this module:
- Web servers are supported as scalable applications. But will every application within a web server necessarily behave properly in a scalable environment?
- Why might you want a client affinity for load balancing?
- What reason might you have to have separate resource groups with a requirement that group A run only on a node also running group B? Why not just combine them into one group?
Additional Resources
The following references provide additional information on the topics described in this module:

- Sun Cluster System Administration Guide for Solaris OS, part number 819-2971
- Sun Cluster Software Installation Guide for Solaris OS, part number 819-2970
- Sun Cluster Concepts Guide for Solaris OS, part number 819-2969
[Figure: Three clients send requests over the network to the global interface (xyz.com) on Node 1; request distribution over the cluster transport spreads the requests among HTTP applications running on Nodes 1, 2, and 3, each serving the same web pages.]
Client Affinity
Certain types of applications require that load balancing in a scalable service be on a per-client basis, rather than a per-connection basis. That is, they require that the same client IP always have its requests forwarded to the same node or zone. The prototypical example of an application requiring such client affinity is a shopping cart application, where the state of the client's shopping cart is recorded only in the memory of the particular node where the cart was created. A single SharedAddress resource can provide both standard (per-connection) load balancing and the type of sticky load balancing required by shopping carts. A scalable service's agent registers which type of load balancing is required, based on a property of the data service.
Load-Balancing Weights
Sun Cluster 3.2 software also lets you control, on a scalable-resource-by-scalable-resource basis, the weighting that should be applied for load balancing. The default weighting is equal (connections or clients, depending on the stickiness) per node or zone, but through the use of properties described in the following sections, a more capable node or zone can be made to handle a greater percentage of requests.
Note – You could have a single SharedAddress resource that provides load balancing for the same application using multiple IP addresses on the same subnet. You could have multiple SharedAddress resources that provide load balancing for the same application using IP addresses on different subnets. The load-balancing properties discussed on this page are set per scalable application, not per address.
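Weights are expressed as weight@nodeid pairs, as in the apache-res example later in this module. As a sketch, the following sends roughly three quarters of new connections to node 1:

```shell
# 3@1,1@2 means node 1 gets weight 3 and node 2 gets weight 1
clrs set -p Load_balancing_weights=3@1,1@2 apache-res
```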
[Figure: The SUNW.apache resource has Resource_dependencies on both a SUNW.HAStoragePlus resource and a SUNW.SharedAddress resource.]
- Resource group name: sa-rg
- Properties: [Nodelist=vincent,theo Mode=Failover Failback=False...]
- Resource group contents: See Table 10-1.

Table 10-1 sa-rg Resource Group Contents

Resource Name    Resource Type        Properties
apache-lh        SUNW.SharedAddress   HostnameList=apache-lh
                                      Netiflist=therapy@1,therapy@2

- Resource group name: apache-rg
- Properties: [Nodelist=vincent,theo Mode=Scalable Desired_primaries=2 Maximum_primaries=2]
- Resource group contents: See Table 10-2.

Table 10-2 apache-rg Resource Group Contents

Resource Name    Resource Type        Properties
web-stor         SUNW.HAStoragePlus   FilesystemMountPoints=/global/web
                                      AffinityOn=False
apache-res       SUNW.apache          Bin_dir=/global/web/bin
                                      Load_balancing_policy=LB_WEIGHTED
                                      Load_balancing_weights=3@1,1@2
                                      Scalable=TRUE
                                      Port_list=80/tcp
                                      Resource_dependencies=web-stor,apache-lh
- Lb_weighted: Client connections are all load balanced. This is the default. Repeated connections from the same client might be serviced by different nodes.
- Lb_sticky: Connections from the same client IP to the same server port all go to the same node. Load balancing is only for different clients. This applies only to the ports listed in the Port_list property.
- Lb_sticky_wild: Connections from the same client to any server port go to the same node. This is good when port numbers are generated dynamically and not known in advance.
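The policy is chosen when the scalable resource is created. As a sketch reusing the names from the earlier Apache example (treat the exact property values as assumptions):

```shell
# Create the Apache resource with a sticky (per-client) policy
clrs create -g apache-rg -t SUNW.apache \
    -p Bin_dir=/global/web/bin \
    -p Scalable=TRUE \
    -p Port_list=80/tcp \
    -p Load_balancing_policy=Lb_sticky \
    -p Resource_dependencies=web-stor,apache-lh \
    apache-res
```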
Online an offline resource group onto its preferred nodes:
# clrg online [-M] [-e] apache-rg
Online a resource group on specific nodes. Any nodes not mentioned are not affected (if the group is running on other nodes, it just stays running there):
# clrg online -n node,node [-M] [-e] apache-rg
Switch a scalable resource group to the specified nodes. On any other nodes, it is switched off:
# clrg switch [-M] [-e] -n node,node,... apache-rg
Offline a resource group (it goes off of all nodes):
# clrg offline [-M] [-e] apache-rg
Offline a resource group on specific nodes. Any nodes not mentioned are not affected (if the group is running on other nodes, it just stays running there):
# clrg offline -n node,node [-M] [-e] apache-rg
Restart a resource group:
# clrg restart apache-rg
Restart a resource group on a particular node or nodes:
# clrg restart -n node,node,... apache-rg
Resource Operations
Use the following commands to:
- Disable a resource and its fault monitor on all nodes:
# clrs disable apache-res
- Disable a resource and its fault monitor on specified nodes:
# clrs disable -n node,node,... apache-res
- Enable a resource and its fault monitor on all nodes:
# clrs enable apache-res
- Enable a resource and its fault monitor on specified nodes:
# clrs enable -n node,node,... apache-res
- Disable just the fault monitor on all nodes:
# clrs unmonitor apache-res
- Disable just the fault monitor on specified nodes:
# clrs unmonitor -n node,node,... apache-res
- Enable a fault monitor on all nodes:
# clrs monitor apache-res
- Enable a fault monitor on specified nodes:
# clrs monitor -n node,node,... apache-res
Using the clrg status and clrs status Commands for a Scalable Application

Use the status subcommands as follows to show the state of resource groups and resources for a scalable application:

# clrg status

=== Cluster Resource Groups ===

Group Name    Node Name    Suspended    Status
----------    ---------    ---------    ------
sa-rg         vincent      No           Online
              theo         No           Offline
web-rg        vincent      No           Online
              theo         No           Online

# clrs status -g sa-rg,web-rg

=== Cluster Resources ===

Resource Name    Node Name    State      Status Message
-------------    ---------    -----      --------------
orangecat-web    vincent      Online     Online - SharedAddress online.
                 theo         Offline    Offline
web-stor         vincent      Online     Online
                 theo         Online     Online
apache-res       vincent      Online     Online - Service is online.
                 theo         Online     Online - Service is online.
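Because the status output has a fixed column layout, it is easy to scan from a script. The following is a monitoring sketch: clrs_status_sample is a canned stand-in for the real command so the example is self-contained; pipe live `clrs status` output into the awk filter instead:

```shell
# Canned copy of the "clrs status" layout shown above (stand-in
# for running the real command on a cluster node).
clrs_status_sample() {
cat <<'EOF'
=== Cluster Resources ===

Resource Name   Node Name   State     Status Message
-------------   ---------   -----     --------------
orangecat-web   vincent     Online    Online - SharedAddress online.
orangecat-web   theo        Offline   Offline
web-stor        vincent     Online    Online
web-stor        theo        Online    Online
EOF
}

# Print every resource/node pair whose State column is not Online.
# NR > 4 skips the banner, blank line, header, and dashed rule.
not_online=$(clrs_status_sample |
    awk 'NR > 4 && NF >= 3 && $3 != "Online" { print $1 "/" $2 }')
echo "$not_online"
```

With the sample above this prints `orangecat-web/theo`, the one row that is offline.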
Advanced Resource Group Relationships

The following will be affected by a weak affinity (if the target group is actually online):

- Failover of the source group. If the target is online, when the source group needs to fail over, it fails over to the node running the target group, even if that node is not a preferred node on the source group's node list.
- Putting the resource group online onto a nonspecified node:
# clrg online source-grp
Similarly, when a source group goes online and you do not specify a specific node, it goes onto the same node as the target, even if that node is not a preferred node on the source group's node list.

However, weak affinities are not enforced when you manually bring or switch a group onto a specific node. The following command will succeed, even if the source group has a weak affinity for a target running on a different node:

# clrg switch -n specific-node src-grp

There can be multiple resource groups as the value of the property. In other words, a source can have more than one target. In addition, a source can have both weak positive and weak negative affinities. In these cases, the source prefers to choose a node satisfying the greatest possible number of weak affinities. For example, it selects a node that satisfies two weak positive affinities and two weak negative affinities rather than a node that satisfies three weak positive affinities and no weak negative affinities.
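The "greatest possible number of weak affinities" rule above can be made concrete with a small scoring sketch. Everything here is illustrative, not RGM internals: score_node is a made-up helper, and n1/n2 are hypothetical node names:

```shell
# score_node: count how many weak affinities a candidate node satisfies.
#   $1 = candidate node
#   $2 = nodes hosting the weak-positive targets (one entry per target)
#   $3 = nodes hosting the weak-negative targets (one entry per target)
score_node() {
    s=0
    for n in $2; do [ "$n" = "$1" ] && s=$((s + 1)); done   # want same node
    for n in $3; do [ "$n" != "$1" ] && s=$((s + 1)); done  # want other node
    echo "$s"
}

pos="n1 n1 n2 n2 n2"   # five weak-positive targets: two on n1, three on n2
neg="n2 n2"            # two weak-negative targets, both on n2

score_node n1 "$pos" "$neg"   # 2 positives + 2 negatives satisfied -> 4
score_node n2 "$pos" "$neg"   # 3 positives + 0 negatives satisfied -> 3
```

Node n1 scores 4 and n2 only 3, matching the example in the text: two satisfied positives plus two satisfied negatives beat three satisfied positives with no satisfied negatives.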
The only node or nodes on which the source can be online are nodes on which the target is online. If the source and target are currently running on one node, and you switch the target to another node, it drags the source with it.

If you offline the target group, it offlines the source as well. An attempt to switch the source to a node where the target is not running will fail. If a resource in the source group fails, the source group still cannot fail over to a node where the target is not running. (See the next section for the solution to this example.)

The source and target are closely tied together. If you have two failover resource groups with a strong positive affinity relationship, it might make sense to make them just one group. So why does strong positive affinity exist?

- The relationship can be between a failover group (source) and a scalable group (target). That is, you are saying the failover group must run on some node already running the scalable group.
- You might want to be able to offline the source group but leave the target group running, which works with this relationship.
- For some reason, you might not be able to put some resources in the same group (the resources might check and reject you if you try to put one in the same group as another resource). But you still want them all to be running on the same node or nodes.

The only difference between the +++ and the ++ is that here (with +++), if a resource in the source group fails and its fault monitor suggests a failover, the failover can succeed. What happens is that the RGM moves the target group over to where the source wants to fail over to, and then the source is dragged correctly.
Exercise: Installing and Configuring Sun Cluster Scalable Service for Apache
In this exercise, you complete the following tasks:

Task 1 – Preparing for Apache Data Service Configuration
Task 2 – Configuring the Apache Environment
Task 3 – Testing the Server on Each Node Before Configuring the Data Service Resources
Task 4 – Registering and Configuring the Sun Cluster Apache Data Service
Task 5 – Verifying Apache Web Server Access
Task 6 – Observing Cluster Failures
Task 7 – Configuring Advanced Resource Group Relationships
Preparation
The following tasks are explained in this section:

- Preparing for Sun Cluster HA for Apache registration and configuration
- Registering and configuring the Sun Cluster HA for Apache data service
- Verifying Apache web server access and scalable capability
Note – During this exercise, when you see italicized names, such as IPaddress, enclosure_name, node1, or clustername, embedded in a command string, substitute the names appropriate for your cluster.
Note – In the RLDC, the vnchost already contains the appropriate entry for each cluster. Verify that the entry for your cluster exists on the vnchost, and use the same IP address to create the entry on your cluster nodes. In a non-RLDC environment, create the /etc/hosts entry on your administrative workstation as well.
2. Copy the sample /etc/apache2/httpd.conf-example to /global/web/conf/httpd.conf:
# mkdir /global/web/conf
# cp /etc/apache2/httpd.conf-example /global/web/conf/httpd.conf
3. Edit the /global/web/conf/httpd.conf file, and change the entries shown in Table 10-3. The changes are shown in their order in the file, so you can search for the first place to change, change it, then search for the next, and so on.

Table 10-3 Entries in the /global/web/conf/httpd.conf File

Old Entry: KeepAlive On
New Entry: KeepAlive Off

Old Entry: Listen 80
New Entry: Listen clustername-web:80

Old Entry: ServerName 127.0.0.1
New Entry: ServerName clustername-web

Old Entry: DocumentRoot "/var/apache2/htdocs"
New Entry: DocumentRoot "/global/web/htdocs"

Old Entry: <Directory "/var/apache2/htdocs">
New Entry: <Directory "/global/web/htdocs">

Old Entry: ScriptAlias /cgi-bin/ "/var/apache2/cgi-bin/" [one line]
New Entry: ScriptAlias /cgi-bin/ "/global/web/cgi-bin/" [one line]

Old Entry: <Directory "/var/apache2/cgi-bin">
New Entry: <Directory "/global/web/cgi-bin">
4. Create directories for the Hypertext Markup Language (HTML) and CGI files, and populate them with the sample files:
# cp -rp /var/apache2/htdocs /global/web
# cp -rp /var/apache2/cgi-bin /global/web
5. Copy the file called test-apache.cgi from the classroom server to /global/web/cgi-bin. You use this file to test the scalable service. Make sure that test-apache.cgi can be executed by all users:
# chmod 755 /global/web/cgi-bin/test-apache.cgi
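The Table 10-3 edits in step 3 can also be applied with a sed script. This is a sketch, not part of the lab: the sample file below contains only the affected directives, and clustername-web stands for your cluster's shared-address host name. Run it against a scratch copy and review the result:

```shell
# Sample of the directives that Table 10-3 changes.
cat > /tmp/httpd.conf.sample <<'EOF'
KeepAlive On
Listen 80
ServerName 127.0.0.1
DocumentRoot "/var/apache2/htdocs"
<Directory "/var/apache2/htdocs">
ScriptAlias /cgi-bin/ "/var/apache2/cgi-bin/"
<Directory "/var/apache2/cgi-bin">
EOF

# Apply the old -> new substitutions from Table 10-3.
sed -e 's/^KeepAlive On/KeepAlive Off/' \
    -e 's/^Listen 80/Listen clustername-web:80/' \
    -e 's/^ServerName 127.0.0.1/ServerName clustername-web/' \
    -e 's|/var/apache2/htdocs|/global/web/htdocs|g' \
    -e 's|/var/apache2/cgi-bin|/global/web/cgi-bin|g' \
    /tmp/httpd.conf.sample > /tmp/httpd.conf.new

grep -c global/web /tmp/httpd.conf.new   # four lines now reference /global/web
```

A real httpd.conf has far more context around these directives, so always diff the result against the original before copying it into /global/web/conf.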
Task 3 Testing the Server on Each Node Before Configuring the Data Service Resources
Perform the following steps, repeating them on each cluster node (one at a time):
1. Temporarily configure the logical shared address (on one node):
# ifconfig pubnet_adapter addif clustername-web netmask + broadcast + up
2. Start the server (on that node):
# /global/web/bin/apachectl start
3. Verify that the server is running:
vincent:/# ps -ef | grep apache2
    root 4601    1 0 10:20:04 ? 0:01 /usr/apache2/bin/httpd -f /global/web/conf/httpd.conf -k start
webservd 4602 4601 0 10:20:05 ? 0:00 /usr/apache2/bin/httpd -f /global/web/conf/httpd.conf -k start
webservd 4603 4601 0 10:20:05 ? 0:00 /usr/apache2/bin/httpd -f /global/web/conf/httpd.conf -k start
webservd 4604 4601 0 10:20:05 ? 0:00 /usr/apache2/bin/httpd -f /global/web/conf/httpd.conf -k start
webservd 4605 4601 0 10:20:05 ? 0:00 /usr/apache2/bin/httpd -f /global/web/conf/httpd.conf -k start
webservd 4606 4601 0 10:20:05 ? 0:00 /usr/apache2/bin/httpd -f /global/web/conf/httpd.conf -k start
4. Connect to the server from the web browser on your administration or display station. Use http://clustername-web (Figure 10-3).

Note – If you cannot connect, you might need to disable proxies or set a proxy exception in your web browser.

5. Stop the Apache web server:
# /global/web/bin/apachectl stop
6. Verify that the server has stopped:
# ps -ef | grep apache2
    root 8394 8393 0 17:11:14 pts/6 0:00 grep apache2
Task 4 Registering and Configuring the Sun Cluster Apache Data Service
Perform the following steps on (any) one node of the cluster:
1. Register the resource type required for the Apache data service:
# clrt register SUNW.apache
2. Create a failover resource group for the shared address resource. If you have more than two nodes, you can include more than two nodes after the -n:
# clrg create -n node1,node2 sa-rg
3. Create the SharedAddress logical host name resource in the resource group:
# clrssa create -g sa-rg clustername-web
4. Bring the failover resource group online:
# clrg online -M sa-rg
5. Create a scalable resource group to run on all nodes of the cluster (the example assumes two nodes):
# clrg create -n node1,node2 -p Maximum_primaries=2 \
-p Desired_primaries=2 web-rg
6. Create a storage resource in the scalable group:
# clrs create -g web-rg -t SUNW.HAStoragePlus \
-p FilesystemMountPoints=/global/web -p AffinityOn=false web-stor
7. Create an application resource in the scalable resource group:
# clrs create -g web-rg -t SUNW.apache -p Bin_dir=/global/web/bin \
-p Scalable=TRUE -p Resource_dependencies=clustername-web,web-stor \
apache-res
8. Bring the scalable resource group online:
# clrg online -M web-rg
9. Verify that the data service is online:
# clrg status
# clrs status
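The nine steps above can be collected into a single script. The following sketch is a dry run: the CLR* variables default to echo so the script only prints the commands for review; point them at the real binaries under /usr/cluster/bin to execute. Node names and the clustername-web placeholder come from this lab's setup:

```shell
# Dry run of the Task 4 sequence: prints each command instead of
# running it. Set CLRT/CLRG/CLRSSA/CLRS to the real binaries under
# /usr/cluster/bin to execute for real.
setup_scalable_apache() {
    CLRT="echo clrt"; CLRG="echo clrg"; CLRSSA="echo clrssa"; CLRS="echo clrs"
    $CLRT register SUNW.apache
    $CLRG create -n node1,node2 sa-rg
    $CLRSSA create -g sa-rg clustername-web
    $CLRG online -M sa-rg
    $CLRG create -n node1,node2 -p Maximum_primaries=2 -p Desired_primaries=2 web-rg
    $CLRS create -g web-rg -t SUNW.HAStoragePlus \
        -p FilesystemMountPoints=/global/web -p AffinityOn=false web-stor
    $CLRS create -g web-rg -t SUNW.apache -p Bin_dir=/global/web/bin \
        -p Scalable=TRUE -p Resource_dependencies=clustername-web,web-stor apache-res
    $CLRG online -M web-rg
}

setup_scalable_apache
```

The printed sequence (eight commands) mirrors steps 1 through 8; step 9's status checks are left to run interactively.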
7. Set a weak positive affinity so that myapp-rg (source) always prefers to run on the same node as nfs-rg (target):
# clrg set -p RG_affinities=+nfs-rg myapp-rg
8. Print the value of the Nodelist for myapp-rg:
# clrg show -p Nodelist myapp-rg
9. Switch your nfs-rg so that it is not on the preferred node of myapp-rg:
# clrg switch -n nonpreferrednode nfs-rg
10. Put myapp-rg online without specifying the node. Where does it end up? Why does it not end up on the first node of its own node list?
# clrg online myapp-rg
# clrg status
11. Switch the myapp-rg so it is no longer on the same node as nfs-rg. Can you do it? Is a weak affinity a preference or a requirement?
# clrg switch -n othernode myapp-rg
# clrg status
12. Switch the myapp-rg offline and now change the affinity to a strong positive affinity with delegation. What is the difference between ++ and +++ (use +++)? The answer lies a few steps further on.
# clrg offline myapp-rg
# clrg set -p RG_affinities=+++nfs-rg myapp-rg
13. Put myapp-rg online without specifying the node. Where does it end up?
# clrg online myapp-rg
# clrg status
14. Switch the myapp-rg so it is no longer on the same node as nfs-rg. Can you do it?
# clrg switch -n othernode myapp-rg
# clrg status
15. What happens if you switch the target group nfs-rg?
# clrg switch -n othernode nfs-rg
# clrg status
16. What happens if the RGM wants to fail over the source myapp-rg because its fault monitor indicates application failure? Kill myappdaemon on whichever node it is running a few times (be patient with restarts) and observe.
Exercise Summary
Discussion – Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Module 11
- Configure a scalable application in non-global zones
- Configure HA-Oracle in a Sun Cluster 3.2 software environment as a failover application
- Configure Oracle RAC 10g in a Sun Cluster 3.2 software environment
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Relevance
Discussion – The following questions are relevant to understanding the content of this module:
- Why is it more difficult to install Oracle software for a standard Oracle HA database on the local disks of each node rather than installing it in the shared storage? Is there any advantage to installing software on the local disks of each node?
- How is managing Oracle RAC different from managing the other types of failover and scalable services presented in this course?
Additional Resources
Additional resources – The following references provide additional information on the topics described in this module:
Task 1 – Configuring and Installing the Zones
Task 2 – Booting the Zones
Task 3 – Modifying Your Resource Groups So Apache Runs in the Non-global Zones
Task 4 – Verify That the Apache Application Is Running Correctly in the Non-global Zones
Preparation
Before continuing with this exercise, read the background information in this section.
Note – Only scalable, load-balanced applications require a dedicated non-global zone IP address. You could run failover applications in non-global zones without dedicated public network IP addresses in the zones.

2. Configure the zone:
# zonecfg -z myzone
myzone: No such zone configured
Use 'create' to begin configuring a new zone.
zonecfg:myzone> create
zonecfg:myzone> set zonepath=/myzone
zonecfg:myzone> set autoboot=true
zonecfg:myzone> add net
zonecfg:myzone:net> set address=x.y.z.w
zonecfg:myzone:net> set physical=node_pubnet_adapter
zonecfg:myzone:net> end
zonecfg:myzone> commit
zonecfg:myzone> exit
3. Install the zone:
# zoneadm -z myzone install
Preparing to install zone <myzone>.
Creating list of files to copy from the global zone.
. . .
The file </myzone/root/var/sadm/system/logs/install_log> contains a log of the zone installation.
Task 3 Modifying Your Resource Groups So Apache Runs in the Non-global Zones
Perform the following in the global zone on the nodes indicated:
1. From any one node, offline the scalable and shared address resource groups:
# clrg offline web-rg sa-rg
2. On all nodes (still in the global zone), manually do a loopback mount of the /global/web file system. This allows the Apache resource to validate correctly when the node list of its group is changed to the zones. Once the resource group is running in the zones, the HAStoragePlus methods will automatically do the mounts:
# mkdir -p /myzone/root/global/web
# mount -F lofs /global/web /myzone/root/global/web
3. From any one node, modify the node lists:
# clrg set -n node1:myzone,node2:myzone[,node3:myzone] sa-rg
# clrg set -n node1:myzone,node2:myzone[,node3:myzone] web-rg
4. From any one node, unmanage and then remanage all the groups. This is required to properly reconfigure the shared address in all the zones and start the load-balanced service:
# clrs disable -g sa-rg,web-rg +
# clrg unmanage sa-rg web-rg
# clrg online -Me sa-rg web-rg
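Step 2 must be repeated in every node's global zone. A small generator sketch can print the exact command pair to run on each node; lofs_mount_cmds is an illustrative helper (not a cluster tool), and the zone and file system names below are the ones used in this lab:

```shell
# Print the step-2 loopback-mount commands for one node's global zone.
lofs_mount_cmds() {  # $1 = zone name, $2 = global file system
    echo "mkdir -p /$1/root$2"
    echo "mount -F lofs $2 /$1/root$2"
}

lofs_mount_cmds myzone /global/web
```

For the lab values this prints the same mkdir and mount commands shown in step 2; with a different zone path or file system, the generated paths change accordingly.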
Note – You may see console messages about the scalable service being registered with the physical node name, although the application will really be running in the zone. This is normal.
Task 4 Verify That the Apache Application Is Running Correctly in the Non-global Zones
Perform the following steps to verify the correct operation of Apache, and to demonstrate that load balancing is working when the scalable application runs across non-global zones:
1. On any node, from the global zone, verify the cluster application status:
# clrg status
# clrs status
2. On each node, log in to the non-global zone and verify that the Apache application is running inside the zone:
# zlogin myzone
(in the non-global zone) # ps -ef | grep apache2
# exit
3. On the administration workstation or display, point a web browser to http://clustername-web/cgi-bin/test-apache.cgi. Press reload or refresh several times to demonstrate that load balancing is working across the zones. It is not round-robin load balancing; therefore, you might get a response from the same zone several times in a row.
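Step 3's reload test can be scripted by counting distinct responders. In this sketch, fetch() returns canned zone names (zoneA/zoneB are hypothetical) so the example is self-contained; for a live check, replace it with something like `curl -s http://clustername-web/cgi-bin/test-apache.cgi`, the URL used in this lab:

```shell
# Stand-in for fetching test-apache.cgi: returns the name of the
# zone that "answered" request $1 (canned, deliberately not round-robin).
fetch() {
    case $(($1 % 3)) in
        0) echo zoneA ;;
        *) echo zoneB ;;
    esac
}

# Issue 8 requests and count how many distinct zones responded.
distinct=$(for i in 1 2 3 4 5 6 7 8; do fetch "$i"; done | sort -u | grep -c .)
echo "$distinct distinct responders"
```

Seeing more than one distinct responder confirms the load balancer is spreading requests, even when several consecutive replies come from the same zone.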
Exercise 2: Integrating Oracle 10g Into Sun Cluster 3.2 Software as a Failover Application
In this exercise, you integrate Oracle 10g into Sun Cluster 3.2 software as a failover application by completing the following tasks:
Task 1 – Creating a Logical IP Entry in the /etc/hosts File
Task 2 – Creating oracle and dba Accounts
Task 3 – Creating a Shared Storage File System for Oracle Software (VxVM)
Task 4 – Creating a Shared Storage File System for Oracle Software (SVM)
Task 5 – Preparing the oracle User Environment
Task 6 – Disabling Access Control of X Server on the Admin Workstation
Task 7 – Running the runInstaller Installation Script
Task 8 – Preparing an Oracle Instance for Cluster Integration
Task 9 – Verifying That Oracle Software Runs on Other Nodes
Task 10 – Registering the SUNW.HAStoragePlus Type
Task 11 – Installing and Registering Oracle Data Service
Task 12 – Creating Resources and Resource Groups for Oracle
Task 13 – Verifying That Oracle Runs Properly in the Cluster
Preparation
Before continuing with this exercise, read the background information in this section.
Background
It is relatively straightforward to configure the Oracle 10g Database software as a failover application in the Sun Cluster 3.2 environment. Like the majority of failover applications, the Oracle software is cluster-unaware. You will be installing and configuring the software exactly as you would on a stand-alone system, and then manipulating the configuration so that it listens on a failover logical IP address.
Task 3 Creating a Shared Storage File System for Oracle Software (VxVM)
Perform the following steps on any node physically connected to the storage. Only step 6 is performed on all nodes.
1. Select two disks from shared storage (one from one array and one from the other array) for a new disk group. Make sure that you do not use any disks already in use in existing device groups. Note the logical device names (referred to as cAtAdA and cBtBdB in step 2). The following example checks against both VxVM and Solaris Volume Manager disks, in case you are running both:
# vxdisk -o alldgs list
# metaset
# cldev list -v
2. Create a disk group using these disks:
# /etc/vx/bin/vxdisksetup -i cAtAdA
# /etc/vx/bin/vxdisksetup -i cBtBdB
# vxdg init ora-dg ora1=cAtAdA ora2=cBtBdB
3. Create a mirrored volume to hold the Oracle binaries and data:
# vxassist -g ora-dg make oravol 3g layout=mirror
4. Register the new disk group (and its volume) with the cluster. The nodelist property contains all nodes physically connected to the storage:
# cldg create -t vxvm -n node1,node2 ora-dg
5. Create a UFS file system on the volume:
# newfs /dev/vx/rdsk/ora-dg/oravol
6. Create a mount point and an entry in /etc/vfstab on all nodes:
# mkdir /global/oracle
# vi /etc/vfstab
/dev/vx/dsk/ora-dg/oravol /dev/vx/rdsk/ora-dg/oravol /global/oracle ufs 2 yes global
7. On any cluster node, mount the file system:
# mount /global/oracle
8. From any cluster node, modify the owner and group associated with the newly mounted global file system:
# chown oracle:dba /global/oracle
Task 4 Creating a Shared Storage File System for Oracle Software (SVM)
Perform the following steps on any node physically connected to the storage. Only step 5 is performed on all nodes.
1. Select two disks from shared storage (one from one array and one from the other array) for a new diskset. Make sure you do not use any disks already in use in existing device groups. Note the DID device names (referred to as dA and dB in step 2). The following example checks against both VxVM and Solaris Volume Manager disks, in case you are running both:
# vxdisk -o alldgs list
# metaset
# scdidadm -L
2. Create a diskset using these disks:
# metaset -s ora-ds -a -h node1 node2
# metaset -s ora-ds -a /dev/did/rdsk/dA
# metaset -s ora-ds -a /dev/did/rdsk/dB
3. Create a soft partition (d100) of a mirrored volume for Oracle:
# metainit -s ora-ds d11 1 1 /dev/did/rdsk/dAs0
# metainit -s ora-ds d12 1 1 /dev/did/rdsk/dBs0
# metainit -s ora-ds d10 -m d11
# metattach -s ora-ds d10 d12
# metainit -s ora-ds d100 -p d10 3g
4. Create a file system:
# newfs /dev/md/ora-ds/rdsk/d100
5. Create a mount point and an entry in /etc/vfstab on all nodes.
6. On any cluster node, mount the file system:
# mount /global/oracle
7. From any cluster node, modify the owner and group associated with the newly mounted global file system.
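Step 3's five commands follow mechanically from the diskset name, the two DID devices, and the soft-partition size. A generator sketch (svm_mirror_cmds is an illustrative helper, not a Solaris tool; the d10/d11/d12/d100 metadevice names are the ones used above):

```shell
# Print the metainit/metattach sequence for a mirrored soft partition.
svm_mirror_cmds() {  # $1=diskset  $2=first DID  $3=second DID  $4=size
    echo "metainit -s $1 d11 1 1 /dev/did/rdsk/${2}s0"   # first submirror
    echo "metainit -s $1 d12 1 1 /dev/did/rdsk/${3}s0"   # second submirror
    echo "metainit -s $1 d10 -m d11"                     # one-way mirror
    echo "metattach -s $1 d10 d12"                       # attach second side
    echo "metainit -s $1 d100 -p d10 $4"                 # soft partition
}

svm_mirror_cmds ora-ds dA dB 3g   # prints the five commands from step 3
```

Printing the commands first (instead of running them) lets you double-check the DID devices against `scdidadm -L` before touching the diskset.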
2. Mount the /global/oracle directory as a non-global file system, local to the node on which the installation is to be performed. This is done only to speed up the installation:
# mount -o noglobal /global/oracle

Note – If you get a "no such device" error, the other node is probably still primary for the underlying storage. Use cldg switch -n this_node ora-dg and try to mount again.

3. Switch the user to the oracle user:
# su - oracle
4. Change directory to the location of the Oracle software:
$ cd ORACLE-10gR2-db-software-location
5. Run the runInstaller script:
$ ./runInstaller
6. Respond to the dialogs by using Table 11-1.
Table 11-1 The runInstaller Script Dialog Answers

Dialog: Welcome
Action: Click Advanced Installation (near the bottom). [The basic installation makes some wrong choices for Sun Cluster.] Click Next.

Dialog: Specify Inventory Directory and Credentials
Action: Verify and click Next.

Dialog: Select Installation Type
Action: Select the Custom radio button, and click Next.

Dialog: Specify Home Details
Action: Verify and click Next. [The path is taken from the $ORACLE_HOME that you set in your profile, so if it looks wrong, you need to quit and start again.]
Dialog: Enterprise Edition Options
Action: Deselect the following components, and click Next:
- Oracle Enterprise Manager Console DB 10.2.0.1.0
- Oracle Programmer 10.2.0.1.0
- Oracle XML Development Kit 10.2.0.1.0

Dialog: Product Specific Prerequisite Checks
Action: Most likely, these checks will all succeed, and you will be moved automatically to the next screen without having to touch anything. If you happen to get a warning, click Next, and if you get a pop-up warning window, click Yes to proceed.

Dialog: Privileged Operating System Groups
Action: Verify that they both say dba, and click Next.

Dialog: Create Database
Action: Verify that the Create a Database radio button is selected, and click Next.

Dialog: Summary
Action: Verify, and click Install.

Dialog: Oracle Net Configuration Assistant Welcome
Action: Verify that the Perform Typical Configuration check box is not selected, and click Next.

Dialog: Listener Configuration, Listener Name
Action: Verify that the Listener name is LISTENER, and click Next.

Dialog: Select Protocols
Action: Verify that TCP is among the Selected Protocols, and click Next.

Dialog: TCP/IP Protocol Configuration
Action: Verify that the Use the Standard Port Number of 1521 radio button is selected, and click Next.

Dialog: More Listeners
Action: Verify that the No radio button is selected, and click Next.

Dialog: Listener Configuration Done
Action: Click Next.

Dialog: Naming Methods Configuration
Action: Verify that the No, I Do Not Want to Configure Additional Naming Methods radio button is selected, and click Next.
Continue through the remaining dialogs in order, taking the following actions:

- Click Finish.
- Select the General Purpose radio button, and click Next.
- Type MYORA in the Global Database Name text field, observe that it is echoed in the SID text field, and click Next.
- Verify this is not available (because you deselected the software option earlier), and click Next.
- Verify that the Use the Same Password for All Accounts radio button is selected. Enter cangetin as the password, and click Next.
- Verify that File System is selected, and click Next.
- Verify that Use Common Location is selected, and verify the path. Click Next.
- Uncheck all the boxes, and click Next.
- Uncheck Sample Schemas, and click Next.
- On the Memory tab, select the Typical radio button. Change the Percentage to a ridiculously small number (1%). Click Next and accept the error telling you the minimum memory required. The percentage will automatically be changed on your form. Click Next.
- Click Next.
- Verify that Create Database is selected, and click Finish.
Dialog: Confirmation
Action: Verify the summary of database creation options, parameters, character sets, and data files, and click OK.

Note – This might be a good time to take a break. The database configuration assistant takes 15 to 20 minutes to complete.

Action: Click Exit. Open a terminal window as user root on the installer node and run:
/global/oracle/oraInventory/orainstRoot.sh
Now run:
/global/oracle/product/10.2.0/db_1/root.sh
Accept the default path name for the local bin directory. Click OK in the script prompt window.

Dialog: End of Installation
Action: Click Exit and confirm.
$ sqlplus /nolog SQL> connect / as sysdba SQL> create user sc_fm identified by sc_fm; SQL> grant create session, create table to sc_fm; SQL> grant select on v_$sysstat to sc_fm; SQL> alter user sc_fm default tablespace users quota 1m on users; SQL> quit
2. Test the new Oracle user account used by the fault monitor (as user oracle):
$ sqlplus sc_fm/sc_fm
SQL> select * from sys.v_$sysstat;
SQL> quit
3. Create a sample table (as user oracle). You can be creative about your data values:
$ sqlplus /nolog SQL> connect / as sysdba SQL> create table mytable (Name VARCHAR(20), age NUMBER(4)); SQL> insert into mytable values ('vincent', 14); SQL> insert into mytable values ('theo', 14); SQL> commit; SQL> select * from mytable; SQL> quit
Note – This sample table is created only for a quick verification that the database is running properly.

4. Shut down the Oracle instance:
$ sqlplus /nolog
SQL> connect / as sysdba
SQL> shutdown
SQL> quit
5. Stop the Oracle listener:
$ lsnrctl stop
6. Configure the Oracle listener (as user oracle):
$ vi $ORACLE_HOME/network/admin/listener.ora
Modify the HOST variable to match logical host name ora-lh. Add the following lines between the second-to-last and last right parentheses of the SID_LIST_LISTENER information:
    (SID_DESC =
      (SID_NAME = MYORA)
      (ORACLE_HOME = /global/oracle/product/10.2.0/db_1)
      (GLOBALDBNAME = MYORA)
    )
Your entire file should look identical to the following if your logical host name is literally ora-lh:
SID_LIST_LISTENER =
  (SID_LIST =
    (SID_DESC =
      (SID_NAME = PLSExtProc)
      (ORACLE_HOME = /global/oracle/product/10.2.0/db_1)
      (PROGRAM = extproc)
    )
    (SID_DESC =
      (SID_NAME = MYORA)
      (ORACLE_HOME = /global/oracle/product/10.2.0/db_1)
      (GLOBALDBNAME = MYORA)
    )
  )
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = ora-lh)(PORT = 1521))
      )
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC))
      )
    )
  )
7. Configure the tnsnames.ora file (as user oracle):
$ vi $ORACLE_HOME/network/admin/tnsnames.ora
Modify the HOST variables to match logical host name ora-lh.
8. Rename the parameter file (as user oracle):
$ cd /global/oracle/admin/MYORA/pfile
$ mv init.ora.* initMYORA.ora
9. Unmount the /global/oracle file system and remount it using the global option (as user root):
# umount /global/oracle
# mount /global/oracle
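An unbalanced parenthesis is the easiest mistake to make when hand-editing listener.ora in step 6. The following is a quick sanity-check sketch; balance() is not a Sun Cluster or Oracle tool, just an awk one-liner you would point at $ORACLE_HOME/network/admin/listener.ora after editing:

```shell
# Report whether the parentheses in a file are balanced.
balance() {
    awk '{
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "(") depth++
            else if (c == ")") depth--
            if (depth < 0) { bad = 1; exit }   # closed more than opened
        }
    }
    END { if (bad || depth != 0) print "unbalanced"; else print "balanced" }' "$1"
}

# Demonstrate on a one-line sample in the listener.ora style.
cat > /tmp/listener.sample <<'EOF'
(ADDRESS = (PROTOCOL = TCP)(HOST = ora-lh)(PORT = 1521))
EOF
balance /tmp/listener.sample
```

This only checks nesting depth, not Oracle Net semantics, but it catches the most common editing slip before you restart the listener.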
4. Put the resource group online (required by the dependency specification in the next step):
# clrg online -M ora-rg
5. Create an oracle_server resource by creating and running a script (the command is too long to type interactively in some shells):
a. Write a script containing the clrs command by typing:
# vi /var/tmp/setup-ora-res.sh
#!/bin/sh
/usr/cluster/bin/clrs create -g ora-rg \
-t oracle_server \
-p Resource_dependencies=ora-stor \
-p ORACLE_sid=MYORA \
-p ORACLE_home=/global/oracle/product/10.2.0/db_1 \
-p Alert_log_file=/global/oracle/admin/MYORA/bdump/alert_MYORA.log \
-p Parameter_file=/global/oracle/admin/MYORA/pfile/initMYORA.ora \
-p Connect_string=sc_fm/sc_fm ora-server-res
b. Run the script by typing:
# sh /var/tmp/setup-ora-res.sh
6. Create an oracle_listener resource by typing:
# clrs create -g ora-rg -t oracle_listener \
-p ORACLE_home=/global/oracle/product/10.2.0/db_1 \
-p Listener_name=LISTENER -p Resource_dependencies=ora-stor \
ora-listener-res
7. Verify the application:
# clrg status
# clrs status
# ps -ef | grep oracle
Task 1 - Selecting the Nodes That Will Run Oracle RAC
Task 1.5 - Unconfigure Any Nonstorage Node From the Cluster
Task 2 - Creating User and Group Accounts
Task 3A - (If Using VxVM) Installing RAC Framework Packages for Oracle RAC With VxVM Cluster Volume Manager
Task 3B - (If Using Solaris Volume Manager) Installing RAC Framework Packages for Oracle RAC With Solaris Volume Manager Multi-Owner Disksets
Task 4 - Installing Oracle Distributed Lock Manager
Task 5A - (If Using VxVM) Creating and Enabling the RAC Framework Resource Group
Task 5B - (If Using Solaris Volume Manager) Creating and Enabling the RAC Framework Resource Group
Task 6A - Creating Raw Volumes (VxVM)
Task 6B - Creating Raw Volumes (Solaris Volume Manager)
Task 7 - Configuring Oracle Virtual IPs
Task 8 - Configuring the oracle User Environment
Task 9A - Creating the dbca_raw_config File (VxVM)
Task 9B - Creating the dbca_raw_config File (Solaris Volume Manager)
Task 10 - Disabling Access Control on X Server of the Admin Workstation
Task 11 - Installing Oracle CRS Software (respond to the dialog boxes by using Table 11-2)
Task 12 - Creating Resources and Resource Groups for Oracle
Task 13 - Create Sun Cluster Resources to Control Oracle RAC Through CRS
Task 14 - Verifying That Oracle RAC Works Properly in a Cluster
Preparation
Before continuing with this exercise, read the background information in this section.
Background
Oracle 10g RAC software in the Sun Cluster environment encompasses several layers of software, as follows:
RAC Framework - This layer sits just above the Sun Cluster framework. It encompasses the UNIX distributed lock manager (udlm) and a RAC-specific cluster membership monitor (cmm). In the Solaris 10 OS, you must create a resource group rac-framework-rg to control this layer (in the Solaris 9 OS, it is optional).
Oracle Cluster Ready Services (CRS) - CRS is essentially Oracle's own implementation of a resource group manager. That is, for Oracle RAC database instances and their associated listeners and related resources, CRS takes the place of the Sun Cluster resource group manager.
Oracle Database - The actual Oracle RAC database instances run on top of CRS. The database software must be installed separately (it is a different product) after CRS is installed and enabled. The database product has hooks that recognize that it is being installed in a CRS environment.
Sun Cluster control of RAC - The Sun Cluster 3.2 environment features new proxy resource types that allow you to monitor and control Oracle RAC database instances using standard Sun Cluster commands. The Sun Cluster proxy resources issue commands to CRS to achieve the underlying control of the database.
Exercise 3: Running Oracle 10g RAC in Sun Cluster 3.2 Software

The various RAC layers are illustrated in Figure 11-1.
[Figure 11-1: RAC software layers - Oracle CRS runs above the RAC Framework, which is controlled by rac-framework-rg; a Sun Cluster RAC proxy resource lets you use Sun Cluster commands to control the database by "proxying" through CRS.]
RAC data storage is supported on:
- Raw devices using the VxVM Cluster Volume Manager (CVM) feature
- Raw devices using the Solaris Volume Manager multi-owner diskset feature
- Raw devices using no volume manager (assumes hardware RAID)
- Shared QFS file system on raw devices or Solaris Volume Manager multi-owner devices
- A supported NAS device (starting in Sun Cluster 3.1 8/05, the only such supported device is a clustered Network Appliance Filer)
Note: Use of global devices (using normal device groups) or a global file system is not supported. The rationale is that if you used global devices or a global file system, your cluster transport would be used both for the application-specific RAC traffic and for the underlying device traffic. The performance detriment this might cause could eliminate the advantages of using RAC in the first place.
In this lab, you can choose to use either VxVM or Solaris Volume Manager. The variations in the lab are indicated by tasks that have an A or B suffix. You will always choose only one such task. When you have built your storage devices, there are very few variations in the tasks that actually install and test the Oracle software. This lab does not actually address careful planning of your raw device storage. In the lab, you create all the volumes using the same pair of disks.
Task 3A (If Using VxVM) Installing RAC Framework Packages for Oracle RAC With VxVM Cluster Volume Manager
Perform the following steps on all selected cluster nodes.
1. Install the appropriate packages from the software location:
# cd sc32_location/Solaris_sparc/Product
# cd sun_cluster_agents/Solaris_10/Packages
# pkgadd -d . SUNWscucm SUNWudlm SUNWudlmr SUNWcvm SUNWcvmr SUNWscor
2. Enter a license for the CVM feature of VxVM. Your instructor will tell you from where you can paste a CVM license.
# vxlicinst
3. Verify that you now have a second license for the CVM_FULL feature:
# vxlicrep
4. Refresh the VxVM licenses that are in memory:
# vxdctl license init
Task 3B (If Using Solaris Volume Manager) Installing RAC Framework Packages for Oracle RAC With Solaris Volume Manager Multi-Owner Disksets
Perform the following steps on all selected cluster nodes. For Solaris Volume Manager installations, no reboot is required.
1. Install the appropriate packages from the software location:
# cd sc32_location/Solaris_sparc/Product
# cd sun_cluster_agents/Solaris_10/Packages
# pkgadd -d . SUNWscucm SUNWudlm SUNWudlmr SUNWscmd SUNWscor
2. List the local metadbs:
# metadb
3. Add metadbs on the root drive, only if they do not yet exist:
# metadb -a -f -c 3 c#t#d#s7
The pkgadd command prompts you for the group that is to act as the DBA of the database. Respond by typing dba:
Please enter the group which should be able to act as the DBA of the database (dba): [?] dba
Task 5A (If Using VxVM) Creating and Enabling the RAC Framework Resource Group
On any one of the selected nodes, run the following commands as root to create the RAC framework resource group:
# clrt register rac_framework
# clrt register rac_udlm
# clrt register rac_cvm
# clrg create -n node1,node2 -p Desired_primaries=2 \
-p Maximum_primaries=2 rac-framework-rg
# clrs create -g rac-framework-rg -t rac_framework rac-framework-res
# clrs create -g rac-framework-rg -t rac_udlm \
-p Resource_dependencies=rac-framework-res rac-udlm-res
# clrs create -g rac-framework-rg -t rac_cvm \
-p Resource_dependencies=rac-framework-res rac-cvm-res
# clrg online -M rac-framework-rg
If all goes well, you will see a message on the console:
Unix DLM version(2) and SUN Unix DLM Library Version (1):compatible
Following that, there will be about 100 lines of console output as CVM initializes. If your nodes are still alive (no kernel panic), you are in good shape.
Task 5B (If Using Solaris Volume Manager) Creating and Enabling the RAC Framework Resource Group
On any one of the selected nodes, run the following commands as root to create the RAC framework resource group. As you create the group, use only those nodes on which you have selected to run RAC (the ones connected to the physical storage):
# clrt register rac_framework
# clrt register rac_udlm
# clrt register rac_svm
# clrg create -n node1,node2 -p Desired_primaries=2 \
-p Maximum_primaries=2 rac-framework-rg
# clrs create -g rac-framework-rg -t rac_framework rac-framework-res
# clrs create -g rac-framework-rg -t rac_udlm \
-p Resource_dependencies=rac-framework-res rac-udlm-res
# clrs create -g rac-framework-rg -t rac_svm \
-p Resource_dependencies=rac-framework-res rac-svm-res
# clrg online -M rac-framework-rg
If all goes well, you will see a message on both consoles:
Unix DLM version(2) and SUN Unix DLM Library Version (1):compatible
Note: Just to speed up the lab, we will create unmirrored volumes. You can always add another disk and mirror all volumes later at your leisure, or pretend that you were going to.

3. Create a CVM shared disk group using this disk:
# /etc/vx/bin/vxdisksetup -i c#t#d#
# vxdg -s init ora-rac-dg ora1=cAtAdA
4. Create the volumes for the database. You can use the provided create_rac_vxvm_volumes script. The script is reproduced here, if you want to type the commands in yourself rather than use the script:
vxassist -g ora-rac-dg make raw_system 500m
vxassist -g ora-rac-dg make raw_spfile 100m
vxassist -g ora-rac-dg make raw_users 120m
vxassist -g ora-rac-dg make raw_temp 100m
vxassist -g ora-rac-dg make raw_undotbs1 312m
vxassist -g ora-rac-dg make raw_undotbs2 312m
vxassist -g ora-rac-dg make raw_sysaux 300m
vxassist -g ora-rac-dg make raw_control1 110m
vxassist -g ora-rac-dg make raw_control2 110m
vxassist -g ora-rac-dg make raw_redo11 120m
vxassist -g ora-rac-dg make raw_redo12 120m
vxassist -g ora-rac-dg make raw_redo21 120m
vxassist -g ora-rac-dg make raw_redo22 120m
vxassist -g ora-rac-dg make raw_ocr 100m
vxassist -g ora-rac-dg make raw_css_voting_disk 20m
Note: You need to create 3 additional volumes for each additional node (more than 2) that you may be using. For example, for a third node create the volumes raw_undotbs3 (size 312m), raw_redo13 (120m), and raw_redo23 (120m).

5. Zero out the configuration and voting devices. This eliminates problems that might arise from data left over from a previous class:
Note: Do not be concerned with error messages that appear at the end of running the above commands. You will probably get errors indicating that the end of the device does not align exactly on a 16K boundary.

6. Change the owner and group associated with the newly created volumes (assumes sh or ksh syntax here):
# cd /dev/vx/rdsk/ora-rac-dg
# for vol in *
> do
> vxedit -g ora-rac-dg set user=oracle group=dba $vol
> done
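The earlier note says each node beyond the second needs three more raw volumes (one undo, two redo). As a small illustrative sketch (the helper function is our own, not part of the guide), the extra vxassist commands for any node number can be generated rather than typed by hand:

```shell
# Hypothetical helper (not in the guide): print the vxassist commands
# for the three extra raw volumes that node number N requires.
extra_volume_cmds() {
    n=$1
    echo "vxassist -g ora-rac-dg make raw_undotbs$n 312m"
    echo "vxassist -g ora-rac-dg make raw_redo1$n 120m"
    echo "vxassist -g ora-rac-dg make raw_redo2$n 120m"
}

extra_volume_cmds 3
```

For a third node this prints the commands for raw_undotbs3, raw_redo13, and raw_redo23, matching the sizes given in the note.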
Note: Just to speed up the lab, we will create unmirrored volumes. You can always add another disk and mirror the parent volume later at your leisure, or pretend that you were going to.
2. Create a multi-owner diskset using this disk:
# metaset -s ora-rac-ds -a -M -h node1 node2
# metaset -s ora-rac-ds -a /dev/did/rdsk/d#
3. Create a big mirror from which to partition all the necessary data volumes (we still call it a mirror although it has only one submirror; you could then easily add the second half of the mirror later at your leisure):
# metainit -s ora-rac-ds d11 1 1 /dev/did/rdsk/d#s0
# metainit -s ora-rac-ds d10 -m d11
4. Create the volumes for the database. You can use the provided create_rac_svm_volumes script. The script is reproduced here, including the comments indicating which volumes you are creating, if you want to type the commands in yourself rather than use the script:
# system volume d100 (500mb)
metainit -s ora-rac-ds d100 -p d10 500m
# spfile volume d101 (100mb)
metainit -s ora-rac-ds d101 -p d10 100m
# users volume d102 (120mb)
metainit -s ora-rac-ds d102 -p d10 120m
# temp volume d103 (100mb)
metainit -s ora-rac-ds d103 -p d10 100m
# undo volume #1 d104 (312mb)
metainit -s ora-rac-ds d104 -p d10 312m
# undo volume #2 d105 (312mb)
metainit -s ora-rac-ds d105 -p d10 312m
# sysaux volume d106 (300mb)
metainit -s ora-rac-ds d106 -p d10 300m
# control volume #1 d107 (110mb)
metainit -s ora-rac-ds d107 -p d10 110m
# control volume #2 d108 (110mb)
metainit -s ora-rac-ds d108 -p d10 110m
# redo volume #1.1 d109 (120m)
metainit -s ora-rac-ds d109 -p d10 120m
# redo volume #1.2 d110 (120m)
metainit -s ora-rac-ds d110 -p d10 120m
# redo volume #2.1 d111 (120m)
metainit -s ora-rac-ds d111 -p d10 120m
# redo volume #2.2 d112 (120m)
metainit -s ora-rac-ds d112 -p d10 120m
# OCR configuration volume d113 (100mb)
metainit -s ora-rac-ds d113 -p d10 100m
# CSS voting disk d114 (20mb)
metainit -s ora-rac-ds d114 -p d10 20m
Note: You need to create 3 additional volumes for each additional node (more than 2) that you may be using. For example, for a third node create d115 (312mb) for the undo #3 volume, d116 (120mb) for redo #1.3, and d117 (120mb) for redo #2.3.
5. From either node, zero out the configuration and voting devices. This eliminates problems that might arise from data left over from a previous class:
Note: Do not be concerned with error messages that appear at the end of running the above commands. You will probably get errors indicating that the end of the device does not align exactly on a 16K boundary.

6. On all selected nodes, change the owner and group associated with the newly created volumes:
# chown oracle:dba /dev/md/ora-rac-ds/dsk/*
# chown oracle:dba /dev/md/ora-rac-ds/rdsk/*
$ vi .profile
ORACLE_BASE=/oracle
ORACLE_HOME=$ORACLE_BASE/product/10.2.0/db_1
CRS_HOME=$ORACLE_BASE/product/10.2.0/crs
TNS_ADMIN=$ORACLE_HOME/network/admin
DBCA_RAW_CONFIG=$ORACLE_BASE/dbca_raw_config
#SRVM_SHARED_CONFIG=/dev/md/ora-rac-ds/rdsk/d101
SRVM_SHARED_CONFIG=/dev/vx/rdsk/ora-rac-dg/raw_spfile
DISPLAY=display-station-name-or-IP:display#
if [ `/usr/sbin/clinfo -n` -eq 1 ]; then
    ORACLE_SID=sun1
fi
if [ `/usr/sbin/clinfo -n` -eq 2 ]; then
    ORACLE_SID=sun2
fi
PATH=/usr/ccs/bin:$ORACLE_HOME/bin:$CRS_HOME/bin:/usr/bin:/usr/sbin
export ORACLE_BASE ORACLE_HOME TNS_ADMIN DBCA_RAW_CONFIG CRS_HOME
export SRVM_SHARED_CONFIG ORACLE_SID PATH DISPLAY
3. You might need to make a modification in the file, depending on your choice of VxVM or Solaris Volume Manager. Make sure the line beginning with SRVM_SHARED_CONFIG lists the correct option and that the other choice is deleted or commented out.
4. Make sure your actual X-Windows display is set correctly on the line that begins with DISPLAY=.
5. Read in the contents of your new .profile file and verify the environment:
$ . ./.profile
$ env
6. Enable rsh for the oracle user:
$ echo + > /oracle/.rhosts
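The two clinfo if-blocks in the .profile map a cluster node id to an instance name. Purely as an illustrative sketch (the function name and the error fallback are our own assumptions, not part of the guide), the same mapping can be written as a single shell function:

```shell
# Hypothetical sketch: map a Sun Cluster node id (the value printed by
# /usr/sbin/clinfo -n) to the Oracle SID used in this lab.
sid_for_node() {
    case "$1" in
        1) echo sun1 ;;
        2) echo sun2 ;;
        *) echo "no SID configured for node id $1" >&2; return 1 ;;
    esac
}

sid_for_node 2   # prints "sun2"
```

In the .profile you would then use ORACLE_SID=$(sid_for_node $(/usr/sbin/clinfo -n)); the case statement also makes it obvious where a sun3 entry would go for a third node.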
Note: Add entries for undotbs3, redo1_3, and redo2_3 using the appropriate additional volume names you created if you have a third node. Extrapolate appropriately for each additional node.
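The dbca_raw_config file maps each database file to a raw volume, one name=device pair per line. The exact left-hand names expected by DBCA are an assumption here (the task's full listing is not reproduced above), but with the VxVM volumes created in Task 6A an excerpt would look roughly like:

```
system=/dev/vx/rdsk/ora-rac-dg/raw_system
spfile=/dev/vx/rdsk/ora-rac-dg/raw_spfile
users=/dev/vx/rdsk/ora-rac-dg/raw_users
temp=/dev/vx/rdsk/ora-rac-dg/raw_temp
undotbs1=/dev/vx/rdsk/ora-rac-dg/raw_undotbs1
undotbs2=/dev/vx/rdsk/ora-rac-dg/raw_undotbs2
sysaux=/dev/vx/rdsk/ora-rac-dg/raw_sysaux
control1=/dev/vx/rdsk/ora-rac-dg/raw_control1
control2=/dev/vx/rdsk/ora-rac-dg/raw_control2
redo1_1=/dev/vx/rdsk/ora-rac-dg/raw_redo11
redo1_2=/dev/vx/rdsk/ora-rac-dg/raw_redo12
redo2_1=/dev/vx/rdsk/ora-rac-dg/raw_redo21
redo2_2=/dev/vx/rdsk/ora-rac-dg/raw_redo22
```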
Note: Add entries for undotbs3, redo1_3, and redo2_3 using the appropriate additional volume names you created if you have a third node. Extrapolate appropriately for each additional node.
Table 11-2 Oracle CRS Installation Dialog Actions

Dialog: Welcome
Action: Click Next.

Dialog: Specify Inventory directory and credentials
Action: Verify, and click Next.

Dialog: Specify Home Details
Action: Change the Path to /oracle/product/10.2.0/crs. Click Next.

Dialog: Product Specific Prerequisite Checks
Action: Most likely, these checks will all succeed, and you will be moved automatically to the next screen without having to touch anything. If you happen to get a warning, click Next, and if you get a pop-up warning window, click Yes to proceed.
Action: Enter the name of the cluster (this is actually unimportant). For each node listed, verify that the Virtual Host Name column contains nodename-vip, similar to what you entered in the /etc/hosts file. Click Next. If you get an error concerning a node that cannot be clustered (null), it is probably because you do not have an /oracle/.rhosts file. You must have one even on the node on which you are running the installer.
Action: Be very careful with this section! To mark an adapter as described here, highlight it and click Edit, then choose the appropriate radio button and click OK. Mark your actual public adapters as Public. Mark only the clprivnet0 interface as Private. Mark all other adapters, including actual private network adapters, as Do Not Use. Click Next.
Action: Choose the External Redundancy radio button.
VxVM: Enter /dev/vx/rdsk/ora-rac-dg/raw_ocr
SVM: Enter /dev/md/ora-rac-ds/rdsk/d113
Click Next.
Action: Choose the External Redundancy radio button.
VxVM: Enter /dev/vx/rdsk/ora-rac-dg/raw_css_voting_disk
SVM: Enter /dev/md/ora-rac-ds/rdsk/d114
Click Next.
Summary
Action: On all selected nodes, one at a time, starting with the node on which you are running the installer, open a terminal window as user root and run the scripts:
/oracle/oraInventory/orainstRoot.sh
/oracle/product/10.2.0/crs/root.sh
The second script formats the voting device and enables the CRS daemons on each node. Entries are put in /etc/inittab so that the daemons run at boot time. On all but the first node, the messages:
EXISTING Configuration detected...
NO KEYS WERE Written.....
are correct and expected.
Please read the following carefully: On the last node only, the second script tries to configure Oracle's Virtual IPs. There is a known bug in the SPARC version: if your public net addresses are in the known non-Internet-routable ranges (10, 172.16 through 31, 192.168), this part fails right here. If you get the "The given interface(s) ... is not public" message, then, before you continue, as root (on that one node with the error):
1. Set your DISPLAY (for example, DISPLAY=machine:#; export DISPLAY)
2. Run /oracle/product/10.2.0/crs/bin/vipca
3. Use Table 11-3 to respond, and when you are done, RETURN HERE.
When you have run to completion on all nodes, including running vipca by hand if you need to, click OK on the Execute Configuration Scripts dialog.
Configuration Assistants
Table 11-3 VIPCA Dialog Actions

Dialog: Welcome
Action: Click Next.

Dialog: Network Interfaces
Action: Verify that all/both of your public network adapters are selected. Click Next.

Dialog: Virtual IPs for Cluster Nodes
Action: For each node, enter the nodename-vip name that you created in your hosts files previously. When you press TAB, the form might automatically fill in the IP addresses and the information for other nodes. If not, fill in the form manually. Verify that the netmasks are correct. Click Next.

Dialog: Summary
Action: Verify, and click Finish.

Dialog: Configuration Assistant Progress Dialog
Action: Confirm that the utility runs to 100%.

Dialog: Configuration Results
Action: Click OK.
Table 11-4 Install Oracle Database Software Dialog Actions

Dialog: Welcome
Action: Click Next.

Dialog: Select Installation Type
Action: Select the Custom radio button, and click Next.

Dialog: Specify Home Details
Action: Verify, especially that the destination path is /oracle/product/10.2.0/db_1. Click Next.

Dialog: Specify Hardware Cluster Installation Mode
Action: Verify that the Cluster Installation radio button is selected. Put check marks next to all of your selected cluster nodes. Click Next.
Enterprise Edition Options:
Oracle Enterprise Manager Console DB 10.2.0.1.0
Oracle Programmer 10.2.0.1.0
Oracle XML Development Kit 10.2.0.1.0
Action: Click Next.

Dialog: Product Specific Prerequisite Checks
Action: Most likely, these checks will all succeed, and you will be moved automatically to the next screen without having to touch anything. If you happen to get a warning, click Next, and if you get a pop-up warning window, click Yes to proceed.

Dialog: Privileged Operating System Groups
Action: Verify that dba is listed in both entries, and click Next.

Dialog: Create Database
Action: Verify that Create a Database is selected, and click Next.

Dialog: Summary
Action: Verify, and click Install.

Dialog: Oracle Net Configuration Assistant Welcome
Action: Verify that the Perform Typical Configuration check box is not selected, and click Next.

Dialog: Listener Configuration, Listener Name
Action: Verify the Listener name is LISTENER, and click Next.

Dialog: Select Protocols
Action: Verify that TCP is among the Selected Protocols, and click Next.

Dialog: TCP/IP Protocol Configuration
Action: Verify that the Use the Standard Port Number of 1521 radio button is selected, and click Next.

Dialog: More Listeners
Action: Verify that the No radio button is selected, and click Next (be patient with this one; it takes a few seconds to move on).

Action: Click Next.

Action: Verify that the No, I Do Not Want to Configure Additional Naming Methods radio button is selected, and click Next.
Dialog: Done
Action: Click Finish.

Dialog: DBCA Step 1: Database Templates
Action: Select the General Purpose radio button, and click Next.

Dialog: DBCA Step 2: Database Identification
Action: Type sun in the Global Database Name text field (notice that your keystrokes are echoed in the SID Prefix text field), and click Next.

Dialog: DBCA Step 3: Management Options
Action: Verify that Enterprise Manager is not available (you eliminated it when you installed the database software), and click Next.

Dialog: DBCA Step 4: Database Credentials
Action: Verify that the Use the Same Password for All Accounts radio button is selected. Enter cangetin as the password, and click Next.
Dialog: DBCA Step 5
Action: Select the Raw Devices radio button. Verify that the check box for Raw Devices Mapping File is selected, and that the value is /oracle/dbca_raw_config. Click Next.
Dialog: DBCA Step 6: Recovery Configuration
Action: Leave the boxes unchecked, and click Next.

Dialog: DBCA Step 7: Database Content
Action: Uncheck Sample Schemas, and click Next.

Dialog: DBCA Step 8: Database Services
Action: Click Next.

Dialog: DBCA Step 9: Initialization Parameters
Action: On the Memory tab, verify that the Typical radio button is selected. Change the percentage to a ridiculously small number (1%). Click Next and accept the error telling you the minimum memory required. The percentage will automatically be changed on your form. Click Next.
Action: Verify that the database storage locations are correct by clicking leaves in the file tree in the left pane and examining the values shown in the right pane. These locations are determined by the contents of the /oracle/dbca_raw_config file that you prepared in a previous task. Click Next.

Action: Verify that the Create Database check box is selected, and click Finish. Verify, and click OK. Click Exit. Wait a while (anywhere from a few seconds to a few minutes; be patient) and you will get a pop-up window informing you that Oracle is starting the RAC instances. On each of your selected nodes, open a terminal window as root, and run the script:
/oracle/product/10.2.0/db_1/root.sh
Accept the default path name for the local bin directory. Click OK in the script prompt window.
End of Installation
Task 13 Create Sun Cluster Resources to Control Oracle RAC Through CRS
Perform the following steps on only one of your selected nodes to create Sun Cluster resources that monitor your RAC storage, and that allow you to use Sun Cluster to control your RAC instances through CRS: 1. Register the types required for RAC storage and RAC instance control:
2. Create a CRS framework resource (the purpose of this resource is to try to cleanly shut down CRS if you are evacuating the node using cluster commands):
# clrs create -g rac-framework-rg -t crs_framework \
-p Resource_dependencies=rac-framework-res \
crs-framework-res
3. Create a group to hold the resource to monitor the shared storage:
# clrg create -n node1,node2 \
-p Desired_primaries=2 \
-p Maximum_primaries=2 \
-p RG_affinities=++rac-framework-rg \
rac-storage-rg
4. Create the resource to monitor the shared storage. Do one of the following commands:
If you are using VxVM, do this one and skip the next one:
# clrs create -g rac-storage-rg -t ScalDeviceGroup \
-p DiskGroupName=ora-rac-dg \
-p Resource_dependencies=rac-cvm-res{local_node} \
rac-storage-res
If you are using Solaris Volume Manager, perform this one:
# clrs create -g rac-storage-rg -t ScalDeviceGroup \
-p DiskGroupName=ora-rac-ds \
-p Resource_dependencies=rac-svm-res{local_node} \
rac-storage-res
5. Everyone should continue here. Bring the resource group that monitors the shared storage online. Create a group and a resource to allow you to run cluster commands that control the database through CRS.
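The two variants of the storage-monitoring resource differ only in the disk group name and the dependency resource. Purely as an illustrative convenience (the helper function and its argument are our own, not part of the guide), the differing clrs arguments can be derived from a single setting:

```shell
# Hypothetical helper (not in the guide): print the clrs arguments that
# differ between the VxVM and Solaris Volume Manager variants.
storage_res_args() {
    case "$1" in
        vxvm) echo "-p DiskGroupName=ora-rac-dg -p Resource_dependencies=rac-cvm-res{local_node}" ;;
        svm)  echo "-p DiskGroupName=ora-rac-ds -p Resource_dependencies=rac-svm-res{local_node}" ;;
        *)    echo "unknown volume manager: $1" >&2; return 1 ;;
    esac
}

storage_res_args svm
```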
Note: For the following command, make sure you understand which node has node id 1, which has node id 2, and so forth, so that you match correctly with the names of the database sub-instances. You can use clinfo -n to verify the node id on each node.
# vi /var/tmp/cr_rac_control
clrs create -g rac-control-rg -t scalable_rac_server_proxy \
-p DB_NAME=sun \
-p ORACLE_SID{name_of_node1}=sun1 \
-p ORACLE_SID{name_of_node2}=sun2 \
-p ORACLE_home=/oracle/product/10.2.0/db_1 \
-p Crs_home=/oracle/product/10.2.0/crs \
-p Resource_dependencies_offline_restart=rac-storage-res{local_node} \
-p Resource_dependencies=rac-framework-res \
rac-control-res
# sh /var/tmp/cr_rac_control
# clrg online -M rac-control-rg
# clrs enable -n node2 rac-control-res
7. On that (affected) node, you should be able to repeat step 5. It might take a few attempts before the database is initialized and you can successfully access your data.
8. Cause a crash of node 1:
# <Control-]>
telnet> send break
9. On the surviving node, you should see (after 45 seconds or so) the CRS-controlled virtual IP for the crashed node migrate to the surviving node. Run the following as user oracle:
$ crs_stat -t | grep vip
ora.node1.vip  application  ONLINE  ONLINE  node2
ora.node2.vip  application  ONLINE  ONLINE  node2
10. While this virtual IP has failed over, verify that there is actually no failover listener controlled by Oracle CRS. This virtual IP fails over merely so a client quickly gets a TCP disconnect without having to wait for a long time-out. Client software then has a client-side option to fail over to the other instance.
$ sqlplus SYS@sun1 as sysdba
SQL*Plus: Release 10.2.0.1.0 - Production on Tue May 24 10:56:18 2005
Copyright (c) 1982, 2005, ORACLE. All rights reserved.
Enter password:
ERROR:
ORA-12541: TNS:no listener
Enter user-name: ^D
11. Boot the node that you had halted, by typing boot or go at the OK prompt on the console. If you choose the latter, the node will panic and reboot.
12. After the node boots, monitor the automatic recovery of the virtual IP, the listener, and the database instance by typing, as user oracle, on the surviving node:
$ crs_stat -t
$ /usr/cluster/bin/clrs status
It can take several minutes for the full recovery. Keep repeating the commands.
13. Verify the proper operation of the Oracle database by contacting the various sub-instances as the user oracle on the various nodes:
$ sqlplus SYS@sun1 as sysdba
Enter password: cangetin
SQL> select * from mytable;
SQL> quit
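Step 12 asks you to keep re-running crs_stat -t and clrs status until recovery completes. A small retry wrapper (our own sketch, not part of the guide) can automate that kind of polling:

```shell
# Hypothetical retry wrapper: run a command up to MAX times, sleeping
# DELAY seconds between attempts, until it exits 0.
retry() {
    max=$1; delay=$2; shift 2
    i=1
    while ! "$@"; do
        [ "$i" -ge "$max" ] && return 1
        i=$((i + 1))
        sleep "$delay"
    done
}

# Cluster usage would look like this (illustration only, not run here):
# retry 30 10 sh -c '/usr/cluster/bin/clrs status | grep -q Online'
retry 3 0 true
```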
Exercise Summary
Discussion Take a few minutes to discuss what experiences, issues, or discoveries you had during the lab exercises.
Appendix A
Terminal Concentrator
This appendix describes the configuration of the Sun Terminal Concentrator (TC) as a remote connection mechanism to the serial consoles of the nodes in the Sun Cluster software environment.
Copyright 2006 Sun Microsystems, Inc. All Rights Reserved. Sun Services, Revision A
Setup Device
Figure A-1
Note: If the programmable read-only memory (PROM) operating system is older than version 52, you must upgrade it.
Setup Port
Serial port 1 on the TC is a special-purpose port that is used only during initial setup, primarily to set the IP address and load sequence of the TC. You can access port 1 from either a tip connection or from a locally connected terminal.
Connecting to Port 1
To perform basic TC setup, you must connect to the TC setup port. Figure A-2 shows a tip hardwire connection from the administrative console. You can also connect an American Standard Code for Information Interchange (ASCII) terminal to the setup port.
Figure A-2
Figure A-3  Terminal concentrator front panel, showing the STATUS indicators (POWER, UNIT, NET, ATTN, LOAD, ACTIVE), the power indicator, and the Test button
After you have enabled Setup mode, a monitor:: prompt should appear on the setup device. Use the addr, seq, and image commands to complete the configuration.
Note Although you can load the TC's operating system from an external server, this introduces an additional layer of complexity that is prone to failure.
Note Do not define a dump or load address that is on another network, because you will receive additional questions about a gateway address. If you make a mistake, you can press Control-C to abort the setup and start again.
Setting Up the Terminal Concentrator

Password: (type the password)
annex# admin
Annex administration MICRO-XL-UX R7.0.1, 8 ports
admin: show port=1 type mode
Port 1: type: hardwired mode: cli
admin: set port=1 type hardwired mode cli
admin: set port=2-8 type dial_in mode slave
admin: set port=1-8 imask_7bits Y
admin: quit
annex# boot
bootfile: <CR>
warning: <CR>
Note Do not perform this procedure through the special setup port; use public network access.
admin: set port=2 port_password homer
admin: set port=3 port_password marge
admin: reset 2-3
admin: set annex enable_security Y
admin: reset annex security
Figure A-4  Default router configuration: the TC (129.50.1.35) on network 129.50.1 reaches host 129.50.2.12 on network 129.50.2 through the gateway 129.50.1.23; Node 1 through Node 4 connect to the TC. The corresponding config.annex entry is:

%gateway
net default gateway 129.50.1.23 metric 1 active
Note You must enter an IP routing address appropriate for your site. While the TC is rebooting, the node console connections are not available.
Figure A-5  Multiple-TC configuration: Node 1 and Node 2 each reach a separate TC over the network
Appendix B
Multi-Initiator Overview
This section applies only to SCSI storage devices, not to Fibre Channel storage used for the multihost disks.

In a stand-alone server, the server node controls the SCSI bus activities using the SCSI host adapter circuit connecting the server to a particular SCSI bus. This SCSI host adapter circuit is referred to as the SCSI initiator, because it initiates all bus activities for that SCSI bus. The default SCSI address of SCSI host adapters in Sun systems is 7.

Cluster configurations share storage between multiple server nodes. When the cluster storage consists of single-ended or differential SCSI devices, the configuration is referred to as multi-initiator SCSI. As this terminology implies, more than one SCSI initiator exists on the SCSI bus.

The SCSI specification requires that each device on a SCSI bus have a unique SCSI address. (The host adapter is also a device on the SCSI bus.) The default hardware configuration in a multi-initiator environment results in a conflict because all SCSI host adapters default to 7. To resolve this conflict, on each SCSI bus leave one of the SCSI host adapters with the SCSI address of 7, and set the other host adapters to unused SCSI addresses. Proper planning dictates that these unused SCSI addresses include both currently and eventually unused addresses. An example of addresses unused in the future is the addition of storage by installing new drives into empty drive slots. In most configurations, the available SCSI address for a second host adapter is 6.

You can change the selected SCSI addresses for these host adapters by setting the scsi-initiator-id OpenBoot PROM property. You can set this property globally for a node or on a per-host-adapter basis. Instructions for setting a unique scsi-initiator-id for each SCSI host adapter are included in the chapter for each disk enclosure in the Sun Cluster Hardware Guides.
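The address conflict described above can be illustrated in a few lines of shell. The IDs here are hard-coded for illustration, not read from real OpenBoot PROMs; the point is simply that two initiators on one shared bus must not share an address.

```shell
#!/bin/sh
# Sketch: both host adapters on a shared SCSI bus default to address 7,
# which is the conflict the scsi-initiator-id property resolves.
node1_id=7   # left at the default
node2_id=7   # also the default -- must be changed (typically to 6)

if [ "$node1_id" -eq "$node2_id" ]; then
    status="conflict: both initiators use SCSI address $node1_id"
else
    status="ok: initiators use distinct SCSI addresses"
fi
echo "$status"
```

Setting node2_id=6, as the procedures below do with nvedit, removes the conflict.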
For the procedure on installing host adapters, refer to the documentation that shipped with your nodes.

3. Connect the cables to the device, as shown in Figure B-1 on page B-4.
Installing a Sun StorEdge Multipack Device

Make sure that the entire SCSI bus length to each device is less than 6 meters. This measurement includes the cables to both nodes, as well as the bus length internal to each device, node, and host adapter. Refer to the documentation that shipped with the device for other restrictions regarding SCSI operation.
Figure B-1  Multipack cabling: host adapters A and B on Node 1 and Node 2 connect through SCSI cables to the SCSI IN connectors of the two devices (target address ranges 1-6 and 9-14)
4. Connect the AC power cord for each device of the pair to a different power source.

5. Without allowing the node to boot, power on the first node. If necessary, abort the system to continue with OpenBoot PROM Monitor tasks.

6. Find the paths to the host adapters:

ok show-disks

Identify and record the two controllers that will be connected to the storage devices, and record these paths. Use this information to change the SCSI addresses of these controllers in the nvramrc script. Do not include the trailing /disk or /sd entries in the device paths.

7. Edit the nvramrc script to set the scsi-initiator-id for the host adapter on the first node. For a list of nvramrc editor and nvedit keystroke commands, see "The nvramrc Editor and nvedit Keystroke Commands" on page B-11.
The following example sets the scsi-initiator-id to 6. The OpenBoot PROM Monitor prints the line numbers (0:, 1:, and so on).

nvedit
0: probe-all
1: cd /pci@1f,4000/scsi@4
2: 6 encode-int " scsi-initiator-id" property
3: device-end
4: cd /pci@1f,4000/scsi@4,1
5: 6 encode-int " scsi-initiator-id" property
6: device-end
7: install-console
8: banner
<Control-C>
ok
Note Insert exactly one space after the first double quotation mark and before scsi-initiator-id.

8. Store the changes. The changes you make through the nvedit command are done on a temporary copy of the nvramrc script. You can continue to edit this copy without risk. After you complete your edits, save the changes. If you are not sure about the changes, discard them.
9. Verify the contents of the nvramrc script you created in step 7.

ok printenv nvramrc
nvramrc = probe-all
          cd /pci@1f,4000/scsi@4
          6 encode-int " scsi-initiator-id" property
          device-end
          cd /pci@1f,4000/scsi@4,1
          6 encode-int " scsi-initiator-id" property
          device-end
          install-console
          banner
ok
10. Instruct the OpenBoot PROM Monitor to use the nvramrc script.

ok setenv use-nvramrc? true
use-nvramrc? = true
ok

11. Without allowing the node to boot, power on the second node. If necessary, abort the system to continue with OpenBoot PROM Monitor tasks.

12. Verify that the scsi-initiator-id for the host adapter on the second node is set to 7.

ok cd /pci@1f,4000/scsi@4
ok .properties
scsi-initiator-id        00000007
...

13. Continue with the Solaris OS, Sun Cluster software, and volume management software installation tasks.
Installing a Sun StorEdge D1000 Array

Make sure that the entire bus length connected to each array is less than 25 meters. This measurement includes the cables to both nodes, as well as the bus length internal to each array, node, and the host adapter.
Figure B-2  D1000 cabling: host adapters A and B on Node 1 and Node 2 connect to disk array 1 and disk array 2

4. Connect the AC power cord for each array of the pair to a different power source.

5. Power on the first node and the arrays.

6. Find the paths to the host adapters:

ok show-disks

Identify and record the two controllers that will be connected to the storage devices, and record these paths. Use this information to change the SCSI addresses of these controllers in the nvramrc script. Do not include the /disk or /sd trailing parts in the device paths.

7. Edit the nvramrc script to change the scsi-initiator-id for the host adapter on the first node. For a list of nvramrc editor and nvedit keystroke commands, see "The nvramrc Editor and nvedit Keystroke Commands" on page B-11.
The following example sets the scsi-initiator-id to 6. The OpenBoot PROM Monitor prints the line numbers (0:, 1:, and so on).

nvedit
0: probe-all
1: cd /pci@1f,4000/scsi@4
2: 6 encode-int " scsi-initiator-id" property
3: device-end
4: cd /pci@1f,4000/scsi@4,1
5: 6 encode-int " scsi-initiator-id" property
6: device-end
7: install-console
8: banner
<Control-C>
ok
Note Insert exactly one space after the first double quotation mark and before scsi-initiator-id.

8. Store or discard the changes. The edits are done on a temporary copy of the nvramrc script. You can continue to edit this copy without risk. After you complete your edits, save the changes. If you are not sure about the changes, discard them.
9. Verify the contents of the nvramrc script you created in step 7.

ok printenv nvramrc
nvramrc = probe-all
          cd /pci@1f,4000/scsi@4
          6 encode-int " scsi-initiator-id" property
          device-end
          cd /pci@1f,4000/scsi@4,1
          6 encode-int " scsi-initiator-id" property
          device-end
          install-console
          banner
ok
10. Instruct the OpenBoot PROM Monitor to use the nvramrc script.

ok setenv use-nvramrc? true
use-nvramrc? = true
ok

11. Without allowing the node to boot, power on the second node. If necessary, abort the system to continue with OpenBoot PROM Monitor tasks.

12. Verify that the scsi-initiator-id for each host adapter on the second node is set to 7.

ok cd /pci@1f,4000/scsi@4
ok .properties
scsi-initiator-id        00000007
differential
isp-fcode                1.21 95/05/18
device_type              scsi
...

13. Continue with the Solaris OS, Sun Cluster software, and volume management software installation tasks. For software installation procedures, refer to the Sun Cluster 3.1 Installation Guide.
The nvramrc Editor and nvedit Keystroke Commands

Table B-2 lists more useful nvedit commands.
Table B-2  The nvedit Keystroke Commands

Keystroke   Description
^A          Moves to the beginning of the line
^B          Moves backward one character
^C          Exits the script editor
^F          Moves forward one character
^K          Deletes until end of line
^L          Lists all lines
^N          Moves to the next line of the nvramrc editing buffer
^O          Inserts a new line at the cursor position and stays on the current line
^P          Moves to the previous line of the nvramrc editing buffer
^R          Replaces the current line
Delete      Deletes previous character
Return      Inserts a new line at the cursor position and advances to the next line
Appendix C
Quorum Server
This appendix describes basic operation of the quorum server software.
The quorum server software consists of the following components:

- The actual quorum server daemon, /usr/cluster/lib/sc/scqsd
- The utility to start, stop, query, and clear the quorum server, /usr/cluster/bin/clquorumserver (short name clqs)
- A single configuration file, /etc/scqsd/scqsd.conf
- An rc script (Solaris 9) or SMF service (Solaris 10) to automatically start the quorum server daemon at boot time

Note Regardless of the OS that the quorum server is running on, you always use the clquorumserver (clqs) command if you need to start and stop the daemon manually. This includes starting the daemon the first time after installation, without having to reboot. See page 4.

- An entry in /etc/services for the default port:
  scqsd 9000/tcp # Sun Cluster quorum server daemon
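The /etc/services line above can be parsed back apart with standard tools. This sketch writes the line to a temporary file instead of reading the real /etc/services, so it can run anywhere; the line itself is the one quoted above.

```shell
#!/bin/sh
# Sketch: extract the port and protocol from the quorum server's
# /etc/services entry. A temp file stands in for the real /etc/services.
tmp=$(mktemp)
echo "scqsd 9000/tcp # Sun Cluster quorum server daemon" > "$tmp"

port=$(awk '$1 == "scqsd" { split($2, a, "/"); print a[1] }' "$tmp")
proto=$(awk '$1 == "scqsd" { split($2, a, "/"); print a[2] }' "$tmp")
echo "quorum server listens on port $port/$proto"
rm -f "$tmp"
```

On a live quorum server you would point the awk command at /etc/services itself to confirm the entry is present.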
Normally, you do not need to run these commands very often, because the boot and shutdown scripts do the job for you.
Starting and Stopping a Quorum Server Daemon

Node ID:            Registration key:
1                   0x448de82f00000001
For each cluster, the command displays the registrants (nodes that are currently not fenced out of the cluster) and the reservation holder. The reservation holder is equivalent to the last node ever to perform any quorum fencing in the cluster; it is really the registrants that are more interesting.
Note Be careful not to clear quorum server information that is actually still in use. Doing so is exactly equivalent to physically yanking out a JBOD disk that is serving as a quorum device. In other words, the cluster using the quorum server as a quorum device will not find out until it is too late!
Appendix D
Role
A role is like a user, except for the following:
- You cannot log in to it directly. You must log in as a user and then use the su command to switch to the role.
- Only certain users are allowed to switch to the role.
A role is meant to replace an old-style privileged user, where the intention is that multiple administrators assume this identity when they need to do privileged tasks. The advantage is that, to assume the identity, you must log in as a regular user, know the password for the role, and be allowed to assume the role. Roles have entries in the passwd and shadow databases just like regular users. Most often, roles are assigned one of the pf-shells so they can run privileged commands, but this is not a requirement.
Authorization
Authorizations are meant to convey certain rights to users or roles, but authorizations are interpreted only by RBAC-aware applications. In other words, having a particular authorization means nothing unless the software you are running is specifically instrumented to check for your authorization and grant you a particular right to do a task. Authorizations have names like solaris.cluster.modify and solaris.cluster.read. The names form a natural hierarchy, so that assigning solaris.cluster.* has the meaning that you expect.
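The hierarchy can be sketched with ordinary shell glob matching. This is only an illustration of why assigning solaris.cluster.* behaves as you expect, not the actual RBAC matching code; the has_auth function and its two test grants are invented for the example.

```shell
#!/bin/sh
# Sketch: match a granted authorization (possibly a wildcard) against a
# specific authorization name, using case's glob patterns.
has_auth() {
    granted=$1
    wanted=$2
    case "$wanted" in
        $granted) return 0 ;;   # unquoted: the grant acts as a glob pattern
        *)        return 1 ;;
    esac
}

has_auth "solaris.cluster.*" "solaris.cluster.modify" && r1=allowed || r1=denied
has_auth "solaris.cluster.read" "solaris.cluster.modify" && r2=allowed || r2=denied
echo "wildcard grant: $r1, exact grant: $r2"
```

The wildcard grant covers every name under solaris.cluster, while the exact grant covers only itself.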
Assigning Authorizations
Authorizations can be assigned to users or roles, or they can be part of rights profiles, which are collections of authorizations and command privileges. The rights profiles can then be assigned to users or roles, rather than the individual authorizations.
Command Privileges
Command privileges have been enhanced in the Solaris 10 OS:
Profiles
Profiles are collections of authorizations and command privileges. Authorizations can be assigned directly to a user or role, or through inclusion in a profile assigned to the user or role. Command privileges can be assigned to a user or role only through inclusion in a profile.
- Remove the offending authorizations from the Basic Solaris User profile.
- Create a custom profile that includes only the offending authorizations, and assign that profile to the users to whom you want to give those authorizations.
RBAC Relationships
Figure 11-2 shows the relationships between command privileges, authorizations, profiles, users, and roles.
Figure 11-2  Users switch to roles with su; profiles are assigned to users and roles; profiles contain authorizations and command privileges
- solaris.cluster.system.{read,admin,modify}

  For ease of notation, this denotes three separate authorizations:
  solaris.cluster.system.read
  solaris.cluster.system.admin
  solaris.cluster.system.modify

- solaris.cluster.node.{read,admin,modify}
- solaris.cluster.quorum.{read,admin,modify}
- solaris.cluster.device.{read,admin,modify}
- solaris.cluster.transport.{read,admin,modify}
- solaris.cluster.resource.{read,admin,modify}
- solaris.cluster.network.{read,admin,modify}
- solaris.cluster.install
- solaris.cluster.appinstall
- solaris.cluster.gui
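The {read,admin,modify} shorthand used in the list above can be expanded mechanically. This sketch prints the three concrete names for one prefix; the loop, not any real RBAC tool, does the expansion.

```shell
#!/bin/sh
# Sketch: expand the {read,admin,modify} notation into the three
# concrete authorization names for a given prefix.
prefix="solaris.cluster.system"
auths=""
for suffix in read admin modify; do
    auths="$auths $prefix.$suffix"
done
auths=${auths# }   # trim the leading space
echo "$auths"
```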