ibm.com/redbooks
International Technical Support Organization

SAN Volume Controller Best Practices and Performance Guidelines

December 2008
SG24-7521-01
Note: Before using this information and the product it supports, read the information in Notices on page xi.
Second Edition (December 2008) This edition applies to Version 4, Release 3, Modification 0 of the IBM System Storage SAN Volume Controller.
Copyright International Business Machines Corporation 2008. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.
Contents
Notices
Trademarks

Preface
The team that wrote this book
Become a published author
Comments welcome

Summary of changes
December 2008, Second Edition

Chapter 1. SAN fabric
1.1 SVC SAN topology
1.1.1 Redundancy
1.1.2 Topology basics
1.1.3 ISL oversubscription
1.1.4 Single switch SVC SANs
1.1.5 Basic core-edge topology
1.1.6 Four-SAN core-edge topology
1.1.7 Common topology issues
1.2 SAN switches
1.2.1 Selecting SAN switch models
1.2.2 Switch port layout for large edge SAN switches
1.2.3 Switch port layout for director-class SAN switches
1.2.4 IBM System Storage/Brocade b-type SANs
1.2.5 IBM System Storage/Cisco SANs
1.2.6 SAN routing and duplicate WWNNs
1.3 Zoning
1.3.1 Types of zoning
1.3.2 Pre-zoning tips and shortcuts
1.3.3 SVC intra-cluster zone
1.3.4 SVC storage zones
1.3.5 SVC host zones
1.3.6 Sample standard SVC zoning configuration
1.3.7 Zoning with multiple SVC clusters
1.3.8 Split storage subsystem configurations
1.4 Switch Domain IDs
1.5 Distance extension for mirroring
1.5.1 Optical multiplexors
1.5.2 Long-distance SFPs/XFPs
1.5.3 Fibre Channel: IP conversion
1.6 Tape and disk traffic sharing the SAN
1.7 Switch interoperability
1.8 TotalStorage Productivity Center for Fabric

Chapter 2. SAN Volume Controller cluster
2.1 Advantages of virtualization
2.1.1 How does the SVC fit into your environment
2.2 Scalability of SVC clusters
2.2.1 Advantage of multi-cluster as opposed to single cluster
2.2.2 Performance expectations by adding an SVC
2.2.3 Growing or splitting SVC clusters
2.3 SVC performance scenarios
2.4 Cluster upgrade

Chapter 3. SVC Console
3.1 SVC Console installation
3.1.1 Software only installation option
3.1.2 Combined software and hardware installation option
3.1.3 SVC cluster software and SVC Console compatibility
3.1.4 IP connectivity considerations
3.2 Using the SVC Console
3.2.1 SSH connection limitations
3.2.2 Managing multiple SVC clusters using a single SVC Console
3.2.3 Managing an SVC cluster using multiple SVC Consoles
3.2.4 SSH key management
3.2.5 Administration roles
3.2.6 Audit logging
3.2.7 IBM Support remote access to the SVC Console
3.2.8 SVC Console to SVC cluster connection problems
3.2.9 Managing IDs and passwords
3.2.10 Saving the SVC configuration
3.2.11 Restoring the SVC cluster configuration

Chapter 4. Storage controller
4.1 Controller affinity and preferred path
4.1.1 ADT for DS4000
4.1.2 Ensuring path balance prior to MDisk discovery
4.2 Pathing considerations for EMC Symmetrix/DMX and HDS
4.3 LUN ID to MDisk translation
4.3.1 ESS
4.3.2 DS6000 and DS8000
4.4 MDisk to VDisk mapping
4.5 Mapping physical LBAs to VDisk extents
4.5.1 Investigating a medium error using lsvdisklba
4.5.2 Investigating Space-Efficient VDisk allocation using lsmdisklba
4.6 Medium error logging
4.6.1 Host-encountered media errors
4.6.2 SVC-encountered medium errors
4.7 Selecting array and cache parameters
4.7.1 DS4000 array width
4.7.2 Segment size
4.7.3 DS8000
4.8 Considerations for controller configuration
4.8.1 Balancing workload across DS4000 controllers
4.8.2 Balancing workload across DS8000 controllers
4.8.3 DS8000 ranks to extent pools mapping
4.8.4 Mixing array sizes within an MDG
4.8.5 Determining the number of controller ports for ESS/DS8000
4.8.6 Determining the number of controller ports for DS4000
4.9 LUN masking
4.10 WWPN to physical port translation
4.11 Using TPC to identify storage controller boundaries
4.12 Using TPC to measure storage controller performance
4.12.1 Normal operating ranges for various statistics
4.12.2 Establish a performance baseline
4.12.3 Performance metric guidelines
4.12.4 Storage controller back end

Chapter 5. MDisks
5.1 Back-end queue depth
5.2 MDisk transfer size
5.2.1 Host I/O
5.2.2 FlashCopy I/O
5.2.3 Coalescing writes
5.3 Selecting LUN attributes for MDisks
5.4 Tiered storage
5.5 Adding MDisks to existing MDGs
5.5.1 Adding MDisks for capacity
5.5.2 Checking access to new MDisks
5.5.3 Persistent reserve
5.5.4 Renaming MDisks
5.6 Restriping (balancing) extents across an MDG
5.6.1 Installing prerequisites and the SVCTools package
5.6.2 Running the extent balancing script
5.7 Removing MDisks from existing MDGs
5.7.1 Migrating extents from the MDisk to be deleted
5.7.2 Verifying an MDisks identity before removal
5.8 Remapping managed MDisks
5.9 Controlling extent allocation order for VDisk creation
5.10 Moving an MDisk between SVC clusters

Chapter 6. Managed disk groups
6.1 Availability considerations for MDGs
6.1.1 Performance considerations
6.1.2 Selecting the MDisk Group
6.2 Selecting the number of LUNs per array
6.2.1 Performance comparison of one LUN compared to two LUNs per array
6.3 Selecting the number of arrays per MDG
6.4 Striping compared to sequential type
6.5 SVC cache partitioning
6.6 SVC quorum disk considerations
6.7 Selecting storage subsystems

Chapter 7. VDisks
7.1 New features in SVC Version 4.3.0
7.1.1 Real and virtual capacities
7.1.2 Space allocation
7.1.3 Space-Efficient VDisk performance
7.1.4 Testing an application with Space-Efficient VDisk
7.1.5 What is VDisk mirroring
7.1.6 Creating or adding a mirrored VDisk
7.1.7 Availability of mirrored VDisks
7.1.8 Mirroring between controllers
7.2 Creating VDisks
7.2.1 Selecting the MDisk Group
7.2.2 Changing the preferred node within an I/O Group
7.2.3 Moving a VDisk to another I/O Group
7.3 VDisk migration
7.3.1 Migrating with VDisk mirroring
7.3.2 Migrating across MDGs
7.3.3 Image type to striped type migration
7.3.4 Migrating to image type VDisk
7.3.5 Preferred paths to a VDisk
7.3.6 Governing of VDisks
7.4 Cache-disabled VDisks
7.4.1 Underlying controller remote copy with SVC cache-disabled VDisks
7.4.2 Using underlying controller PiT copy with SVC cache-disabled VDisks
7.4.3 Changing cache mode of VDisks
7.5 VDisk performance
7.5.1 VDisk performance
7.6 The effect of load on storage controllers

Chapter 8. Copy services
8.1 SAN Volume Controller Advanced Copy Services functions
8.1.1 Setting up FlashCopy services
8.1.2 Steps to making a FlashCopy VDisk with application data integrity
8.1.3 Making multiple related FlashCopy VDisks with data integrity
8.1.4 Creating multiple identical copies of a VDisk
8.1.5 Creating a FlashCopy mapping with the incremental flag
8.1.6 Space-Efficient FlashCopy (SEFC)
8.1.7 Using FlashCopy with your backup application
8.1.8 Using FlashCopy for data migration
8.1.9 Summary of FlashCopy rules
8.2 Metro Mirror and Global Mirror
8.2.1 Using both Metro Mirror and Global Mirror between two clusters
8.2.2 Performing three-way copy service functions
8.2.3 Using native controller Advanced Copy Services functions
8.2.4 Configuration requirements for long distance links
8.2.5 Saving bandwidth creating Metro Mirror and Global Mirror relationships
8.2.6 Global Mirror guidelines
8.2.7 Migrating a Metro Mirror relationship to Global Mirror
8.2.8 Recovering from suspended Metro Mirror or Global Mirror relationships
8.2.9 Diagnosing and fixing 1920 errors
8.2.10 Using Metro Mirror or Global Mirror with FlashCopy
8.2.11 Using TPC to monitor Global Mirror performance
8.2.12 Summary of Metro Mirror and Global Mirror rules

Chapter 9. Hosts
9.1 Configuration recommendations
9.1.1 The number of paths
9.1.2 Host ports
9.1.3 Port masking
9.1.4 Host to I/O Group mapping
9.1.5 VDisk size as opposed to quantity
9.1.6 Host VDisk mapping
9.1.7 Server adapter layout
9.1.8 Availability as opposed to error isolation
9.2 Host pathing
9.2.1 Preferred path algorithm
9.2.2 Path selection
9.2.3 Path management
9.2.4 Dynamic reconfiguration
9.2.5 VDisk migration between I/O Groups
9.3 I/O queues
9.3.1 Queue depths
9.4 Multipathing software
9.5 Host clustering and reserves
9.5.1 AIX
9.5.2 SDD compared to SDDPCM
9.5.3 Virtual I/O server
9.5.4 Windows
9.5.5 Linux
9.5.6 Solaris
9.5.7 VMware
9.6 Mirroring considerations
9.6.1 Host-based mirroring
9.7 Monitoring
9.7.1 Automated path monitoring
9.7.2 Load measurement and stress tools

Chapter 10. Applications
10.1 Application workloads
10.1.1 Transaction-based workloads
10.1.2 Throughput-based workloads
10.1.3 Storage subsystem considerations
10.1.4 Host considerations
10.2 Application considerations
10.2.1 Transaction environments
10.2.2 Throughput environments
10.3 Data layout overview
10.3.1 Layers of volume abstraction
10.3.2 Storage administrator and AIX LVM administrator roles
10.3.3 General data layout recommendations
10.3.4 Database strip size considerations (throughput workload)
10.3.5 LVM volume groups and logical volumes
10.4 When the application does its own balancing of I/Os
10.4.1 DB2 I/O characteristics and data structures
10.4.2 DB2 data layout example
10.4.3 SVC striped VDisk recommendation
10.5 Data layout with the AIX virtual I/O (VIO) server
10.5.1 Overview
10.5.2 Data layout strategies
10.6 VDisk size
10.7 Failure boundaries

Chapter 11. Monitoring
11.1 Configuring TPC to analyze the SVC
11.2 Using TPC to verify the fabric topology
11.2.1 SVC node port connectivity
11.2.2 Ensuring that all SVC ports are online
11.2.3 Verifying SVC port zones
11.2.4 Verifying paths to storage
11.2.5 Verifying host paths to the SVC
11.3 Analyzing performance data using TPC
11.3.1 Setting up TPC to collect performance information
11.3.2 Viewing TPC-collected information
11.3.3 Cluster, I/O Group, and node reports
11.3.4 Managed Disk Group, Managed Disk, and Volume reports
11.3.5 Using TPC to alert on performance constraints
11.3.6 Monitoring MDisk performance for mirrored VDisks
11.4 Monitoring the SVC error log with e-mail notifications
11.4.1 Verifying a correct SVC e-mail configuration

Chapter 12. Maintenance
12.1 Configuration and change tracking
12.1.1 SAN
12.1.2 SVC
12.1.3 Storage
12.1.4 General inventory
12.1.5 Change tickets and tracking
12.1.6 Configuration archiving
12.2 Standard operating procedures
12.3 Code upgrades
12.3.1 Upgrade code levels
12.3.2 Upgrade frequency
12.3.3 Upgrade sequence
12.3.4 Preparing for upgrades
12.3.5 SVC upgrade
12.3.6 Host code upgrades
12.3.7 Storage controller upgrades
12.4 SAN hardware changes
12.4.1 Cross-referencing the SDD adapter number with the WWPN
12.4.2 Changes that result in the modification of the destination FCID
12.4.3 Switch replacement with a like switch
12.4.4 Switch replacement or upgrade with a different kind of switch
12.4.5 HBA replacement
12.5 Naming convention
12.5.1 Hosts, zones, and SVC ports
12.5.2 Controllers
12.5.3 MDisks
12.5.4 VDisks
12.5.5 MDGs

Chapter 13. Cabling, power, cooling, scripting, support, and classes
13.1 Cabling
13.1.1 General cabling advice
13.1.2 Long distance optical links
13.1.3 Labeling
13.1.4 Cable management
13.1.5 Cable routing and support
13.1.6 Cable length
13.1.7 Cable installation
13.2 Power
13.2.1 Bundled uninterruptible power supply units
13.2.2 Power switch
13.2.3 Power feeds
13.3 Cooling
13.4 SVC scripting
13.4.1 Standard changes
13.5 IBM Support Notifications Service
13.6 SVC Support Web site
13.7 SVC-related publications and classes
13.7.1 IBM Redbooks publications
13.7.2 Courses

Chapter 14. Troubleshooting and diagnostics
14.1 Common problems
14.1.1 Host problems
14.1.2 SVC problems
14.1.3 SAN problems
14.1.4 Storage subsystem problems
14.2 Collecting data and isolating the problem
14.2.1 Host data collection
14.2.2 SVC data collection
14.2.3 SAN data collection
14.2.4 Storage subsystem data collection
14.3 Recovering from problems
14.3.1 Solving host problems
14.3.2 Solving SVC problems
14.3.3 Solving SAN problems
14.3.4 Solving back-end storage problems
14.4 Livedump

Chapter 15. SVC 4.3 performance highlights
15.1 SVC and continual performance enhancements
15.2 SVC 4.3 code improvements
15.3 Performance increase when upgrading to 8G4 nodes
15.3.1 Performance scaling of I/O Groups

Related publications
IBM Redbooks publications
Other resources
Referenced Web sites
How to get IBM Redbooks publications
Help from IBM

Index
Notices
This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. 
You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
1350, AIX, alphaWorks, Chipkill, DB2, DS4000, DS6000, DS8000, Enterprise Storage Server, FlashCopy, GPFS, HACMP, IBM, Redbooks, Redbooks (logo), ServeRAID, System p, System Storage, System x, System z, Tivoli Enterprise Console, Tivoli, TotalStorage
The following terms are trademarks of other companies: Disk Magic, and the IntelliMagic logo are trademarks of IntelliMagic BV in the United States, other countries, or both. NetApp, and the NetApp logo are trademarks or registered trademarks of NetApp, Inc. in the U.S. and other countries. Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates. QLogic, and the QLogic logo are registered trademarks of QLogic Corporation. SANblade is a registered trademark in the United States. VMware, the VMware "boxes" logo and design are registered trademarks or trademarks of VMware, Inc. in the United States and/or other jurisdictions. Solaris, Sun, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both. Active Directory, Internet Explorer, Microsoft, Visio, Windows NT, Windows Server, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Intel Xeon, Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both. UNIX is a registered trademark of The Open Group in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Preface
This IBM Redbooks publication captures several of the best practices based on field experience and describes the performance gains that can be achieved by implementing the IBM System Storage SAN Volume Controller. This book is intended for extremely experienced storage, SAN, and SVC administrators and technicians. Readers are expected to have an advanced knowledge of the SAN Volume Controller (SVC) and SAN environment, and we recommend these books as background reading:
- IBM System Storage SAN Volume Controller, SG24-6423
- Introduction to Storage Area Networks, SG24-5470
- Using the SVC for Business Continuity, SG24-7371
We extend our thanks to the following people for their contributions to this project. There are many people that contributed to this book. In particular, we thank the development and PFE teams in Hursley, England. Matt Smith was also instrumental in moving any issues along and ensuring that they maintained a high profile. Barry Whyte was instrumental in steering us in the correct direction and for providing support throughout the life of the residency.

The authors of the first edition of this book were:
Deon George
Thorsten Hoss
Ronda Hruby
Ian MacQuarrie
Barry Mellish
Peter Mescher

We also want to thank the following people for their contributions:
Trevor Boardman, Carlos Fuente, Gary Jarman, Colin Jewell, Andrew Martin, Paul Merrison, Steve Randle, Bill Scales, Matt Smith, and Barry Whyte, IBM Hursley
Tom Jahn, IBM Germany
Peter Mescher, IBM Raleigh
Paulo Neto, IBM Portugal
Bill Wiegand, IBM Advanced Technical Support
Mark Balstead, IBM Tucson
Dan Braden, IBM Dallas
Lloyd Dean, IBM Philadelphia
Dorothy Faurot, IBM Raleigh
Marci Nagel, IBM Rochester
Bruce McNutt, IBM Tucson
Glen Routley, IBM Australia
Dan C Rumney, IBM New York
Chris Saul, IBM San Jose
Brian Smith, IBM San Jose
Sharon Wang, IBM Chicago
Deanna Polm and Sangam Racherla, IBM ITSO
Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html
Comments welcome
Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:
- Use the online Contact us review IBM Redbooks publications form found at:
  ibm.com/redbooks
- Send your comments in an e-mail to:
  redbooks@us.ibm.com
- Mail your comments to:
  IBM Corporation, International Technical Support Organization
  Dept. HYTD Mail Station P099
  2455 South Road
  Poughkeepsie, NY 12601-5400
Summary of changes
This section describes the technical changes made in this edition of the book and in previous editions. This edition might also include minor corrections and editorial changes that are not identified. Summary of Changes for SG24-7521-01 for SAN Volume Controller Best Practices and Performance Guidelines as created or updated on December 7, 2008.
New information
New material:
- Space-Efficient VDisks
- SVC Console
- VDisk Mirroring
Chapter 1. SAN fabric
The IBM Storage Area Network (SAN) Volume Controller (SVC) has unique SAN fabric configuration requirements that differ from what you might be used to in your storage infrastructure. A quality SAN configuration can help you achieve a stable, reliable, and scalable SVC installation; conversely, a poor SAN environment can make your SVC experience considerably less pleasant. This chapter provides you with information to tackle this topic.

Note: As with any of the information in this book, you must check the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156, and IBM System Storage SAN Volume Controller Restrictions, S1003283, for limitations, caveats, updates, and so on that are specific to your environment. Do not rely on this book as the last word in SVC SAN design. Also, anyone planning for an SVC installation must be knowledgeable about general SAN design principles.
You must refer to the IBM System Storage SAN Volume Controller Support Web page for updated documentation before implementing your solution. The Web site is:
http://www.ibm.com/storage/support/2145

Note: All document citations in this book refer to the 4.3 versions of the SVC product documents. If you use a different version, refer to the correct edition of the documents.

As you read this chapter, remember that this is a best practices book based on field experience. Although many possible (and supported) SAN configurations do not meet the recommendations found in this chapter, we do not consider them ideal configurations.
1.1.1 Redundancy
One of the fundamental SVC SAN requirements is to create two (or more) entirely separate SANs that are not connected to each other over Fibre Channel in any way. The easiest way to do this is to construct two SANs that are mirror images of each other. Technically, the SVC supports using just a single SAN (appropriately zoned) to connect the entire SVC. However, we do not recommend this design in any production environment. In our experience, we do not recommend this design in development environments either, because a stable development platform is important to programmers, and an extended outage in the development environment can cause an expensive business impact. For a dedicated storage test platform, however, it might be acceptable.
Storage traffic and inter-node traffic must never transit an ISL, except during migration scenarios.
High-bandwidth-utilization servers (such as tape backup servers) must also be on the same SAN switches as the SVC node ports. Putting them on a separate switch can cause unexpected SAN congestion problems, and putting a high-bandwidth server on an edge switch is a waste of ISL capacity.

If at all possible, plan for the maximum size configuration that you ever expect your SVC installation to reach. As you will see later in this chapter, the design of the SAN can change radically for larger numbers of hosts. Modifying the SAN later to accommodate a larger-than-expected number of hosts either produces a poorly designed SAN or is difficult, expensive, and disruptive to your business. This does not mean that you need to purchase all of the SAN hardware initially, only that you need to lay out the SAN with the maximum size in mind.

Always deploy at least one extra ISL per switch. Not doing so exposes you to consequences ranging from complete path loss (this is bad) to fabric congestion (this is even worse). The SVC does not permit the number of hops between the SVC cluster and the hosts to exceed three, which is typically not a problem.
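To put ISL sizing in perspective, consider a purely hypothetical example (the port counts and speeds are illustrative only, not a statement about your environment): an edge switch with 28 host ports running at 4 Gbps that connects to the core over two 4 Gbps ISLs has an ISL oversubscription ratio of 28:2, or 14:1. Adding two more ISLs halves that ratio to 7:1, and the extra links also provide the redundancy recommended above. When you calculate this ratio for your own fabric, compare it against the ISL oversubscription limits stated in the SVC configuration and restrictions documents for your code level.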
The RPQ process involves a review of your proposed SAN design to ensure that it is reasonable for your proposed environment.
Figure 1-1 Core-edge topology (SVC nodes attach to the core switches; hosts attach to the edge switches)
Figure 1-2 Four-SAN core-edge topology (SVC nodes attach to four core switches; hosts attach to the edge switches)
While certain clients have chosen to simplify management by connecting the SANs together into pairs with a single ISL link, we do not recommend this design. With only a single ISL connecting fabrics together, a small zoning mistake can quickly lead to severe SAN congestion.

Using the SVC as a SAN bridge: With the ability to connect an SVC cluster to four SAN fabrics, it is possible to use the SVC as a bridge between two SAN environments (with two fabrics in each environment). This configuration can be useful for sharing resources between the SAN environments without merging them. Another use is if you have devices with different SAN requirements present in your installation. When using the SVC as a SAN bridge, pay special attention to any restrictions and requirements that might apply to your installation.
Figure 1-3 Spread out disk paths (SVC-to-storage traffic should be zoned so that it never travels over the inter-switch links; both SVC-attached and non-SVC-attached hosts are shown)
If you have this type of topology, it is extremely important to zone the SVC so that it will only see paths to the storage subsystems on the same SAN switch as the SVC nodes. Implementing a storage subsystem host port mask might also be feasible here.

Note: This type of topology means you must have more restrictive zoning than what is detailed in 1.3.6, Sample standard SVC zoning configuration on page 16.

Because of the way that the SVC load balances traffic between the SVC nodes and MDisks, the amount of traffic that transits your ISLs will be unpredictable and vary significantly. If you have the capability, you might want to use either Cisco Virtual SANs (VSANs) or Brocade Traffic Isolation to help enforce the separation.
Figure: Old and new switches on each fabric (SVC-to-storage traffic should be zoned and masked so that it never travels over the ISLs between the old and new switches, but those links should be zoned for intra-cluster communications)
This design is a valid configuration, but you must take certain precautions:
- As stated in Accidentally accessing storage over ISLs on page 7, zone and Logical Unit Number (LUN) mask the SAN and storage subsystems so that you do not access the storage subsystems over the ISLs. This design means that your storage subsystems need connections to both the old and new SAN switches.
- Have two dedicated ISLs between the two switches on each SAN with no data traffic traveling over them. The reason for this design is that if this link ever becomes congested or lost, you might experience problems with your SVC cluster if there are also issues at the same time on the other SAN.
- If you can, set a 5% traffic threshold alert on the ISLs so that you know if a zoning mistake has allowed any data traffic over the links.

Note: It is not a best practice to use this configuration to perform mirroring between I/O Groups within the same cluster. Also, you must never split the two nodes in an I/O Group between various SAN switches within the same SAN fabric.
of smaller switches also saves ISL capacity and therefore ports used for inter-switch connectivity. IBM sells and supports SAN switches from both of the major SAN vendors listed in the following product portfolios:
IBM System Storage b-type/Brocade SAN portfolio
IBM System Storage/Cisco SAN portfolio
Fabric Watch
The Fabric Watch feature found in newer IBM/Brocade-based SAN switches can be useful because the SVC relies on a properly functioning SAN. This is a licensed feature, but it comes pre-bundled with most IBM/Brocade SAN switches. With Fabric Watch, you can pre-configure thresholds on certain switch properties, which, when triggered, produce an alert. These attributes include:
Switch port events, such as link reset
Switch port errors (link quality)
Component failures
Another useful feature included with Fabric Watch is Port Fencing, which can exclude a switch port if the port is misbehaving.
IBM/Brocade SAN switches, FCR is an optionally licensed feature. With older generations, special hardware is needed. For more information about the IBM System Storage b-type/Brocade products, refer to the following IBM Redbooks publications:
Implementing an IBM/Brocade SAN, SG24-6116
IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation, SG24-7544
Port Channels
To ease the required planning efforts for future SAN expansions, ISLs/Port Channels can be made up of any combination of ports in the switch, which means that it is not necessary to reserve special ports for future expansions when provisioning ISLs. Instead, you can use any free port in the switch for expanding the capacity of an ISL/Port Channel.
Cisco VSANs
VSANs and inter-VSAN routing (IVR) enable port/traffic isolation in the fabric. This isolation can be useful, for instance, for fault isolation and scalability. It is possible to use Cisco VSANs, combined with inter-VSAN routes, to isolate the hosts from the storage arrays, but this arrangement provides little benefit for a great deal of added configuration complexity. However, VSANs with inter-VSAN routes can be useful for fabric migrations from non-Cisco vendors onto Cisco fabrics, or other short-term situations. VSANs can also be useful if you have hosts that access the storage directly, along with virtualizing part of the storage with the SVC. (In this instance, it is best to use separate storage ports for the SVC and the hosts. We do not advise using inter-VSAN routes to enable port sharing.)
1.3 Zoning
Because the SVC differs from traditional storage devices, properly zoning it into your SAN fabric is a common source of misunderstanding and errors. Despite this, zoning the SVC into your SAN fabric is not particularly complicated.
Note: Errors caused by improper SVC zoning are often fairly difficult to isolate, so create your zoning configuration carefully.
Here are the basic SVC zoning steps:
1. Create the SVC intra-cluster zone.
2. Create the SVC cluster.
3. Create the SVC back-end storage subsystem zones.
4. Assign back-end storage to the SVC.
5. Create the host-to-SVC zones.
6. Create host definitions on the SVC.
The zoning scheme that we describe next is slightly more restrictive than the zoning described in the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156. The Configuration Guide is a statement of what is supported, but this publication is a statement of our understanding of the best way to set up zoning, even if other ways are possible and supported.
Aliases
We strongly recommend that you use zoning aliases when creating your SVC zones if they are available on your particular type of SAN switch. Zoning aliases make your zoning easier to configure and understand and leave fewer possibilities for errors. One approach is to include multiple members in one alias, because zoning aliases can normally contain multiple members (just like zones). We recommend that you create aliases for:
One that holds all the SVC node ports on each fabric
One for each storage subsystem (or controller blade, in the case of DS4x00 units)
One for each I/O Group port pair (that is, it needs to contain the 1st node in the I/O Group, port 2, and the 2nd node in the I/O Group, port 2)
Host aliases can be omitted in smaller environments, as in our lab environment.
(Figure: I/O Group 0 with two SVC nodes, hosts Foo and Bar, and zones Bar_Slot2_SAN_A and Bar_Slot8_SAN_B.)
The IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, S7002156, discusses putting many hosts into a single zone as a supported configuration under certain circumstances. While this design usually works just fine, instability in one of your hosts can trigger all sorts of impossible-to-diagnose problems in the other hosts in the zone. For this reason, you need to have only a single host in each zone (single initiator zones). It is a supported configuration to have eight paths to each VDisk, but this design provides no performance benefit (indeed, under certain circumstances, it can even reduce performance), and it does not improve reliability or availability to any significant degree.
(Figure: Switch A and Switch B with hosts Peter, Barry, Jon, Ian, Thorsten, Ronda, Deon, and Foo.)
Aliases
Unfortunately, you cannot nest aliases, so several of these WWPNs appear in multiple aliases. Also, do not be concerned if none of your WWPNs looks like the example; we made a few of them up when writing this book. Note that certain switch vendors (for example, McDATA) do not allow multiple-member aliases, but you can still create single-member aliases. While creating single-member aliases does not reduce the size of your zoning configuration, it still makes it easier to read than a mass of raw WWPNs. For the alias names, we have appended SAN_A on the end where necessary to distinguish that these alias names are the ports on SAN A. This system helps if you ever have to perform troubleshooting on both SAN fabrics at one time.
SVC_Group0_Port1:
50:05:07:68:01:10:37:e5
50:05:07:68:01:10:37:dc
SVC_Group0_Port3:
50:05:07:68:01:30:37:e5
50:05:07:68:01:30:37:dc
SVC_Group1_Port1:
50:05:07:68:01:10:1d:1c
50:05:07:68:01:10:27:e2
SVC_Group1_Port3:
50:05:07:68:01:30:1d:1c
50:05:07:68:01:30:27:e2
Because the IBM System Storage DS8000 has no concept of separate controllers (at least, not from the viewpoint of a SAN), we put all the ports on the storage subsystem into a single alias. Refer to Example 1-3.
Example 1-3 Storage aliases
DS4k_23K45_Blade_A_SAN_A
20:04:00:a0:b8:17:44:32
20:04:00:a0:b8:17:44:33
DS4k_23K45_Blade_B_SAN_A
20:05:00:a0:b8:17:44:32
20:05:00:a0:b8:17:44:33
DS8k_34912_SAN_A
50:05:00:63:02:ac:01:47
50:05:00:63:02:bd:01:37
50:05:00:63:02:7f:01:8d
50:05:00:63:02:2a:01:fc
Zones
Remember when naming your zones that they cannot have the same names as aliases. Here is our sample zone set, utilizing the aliases that we have just defined.
SVC_Cluster_Zone_SAN_A: SVC_Cluster_SAN_A
WinPeter_Slot3: 21:00:00:e0:8b:05:41:bc SVC_Group0_Port1
WinBarry_Slot7: 21:00:00:e0:8b:05:37:ab SVC_Group0_Port3
WinJon_Slot1: 21:00:00:e0:8b:05:28:f9 SVC_Group1_Port1
WinIan_Slot2: 21:00:00:e0:8b:05:1a:6f SVC_Group1_Port3
AIXRonda_Slot6_fcs1: 10:00:00:00:c9:32:a8:00 SVC_Group0_Port1
AIXThorsten_Slot2_fcs0: 10:00:00:00:c9:32:bf:c7 SVC_Group0_Port3
AIXDeon_Slot9_fcs3: 10:00:00:00:c9:32:c9:6f SVC_Group1_Port1
AIXFoo_Slot1_fcs2: 10:00:00:00:c9:32:a8:67 SVC_Group1_Port3
clusters, create two zones with each zone to a separate cluster. The back-end storage zones must also be separate, even if the two clusters share a storage subsystem.
Fibre Channel standards are not rigorously enforced. Interoperability problems between switch vendors are notoriously difficult and disruptive to isolate, and it can take a long time to obtain a fix. For these reasons, we suggest only running multiple switch vendors in the same SAN long enough to migrate from one vendor to another vendor, if this setup is possible with your hardware. It is acceptable to run a mixed-vendor SAN if you have gained agreement from both switch vendors that they will fully support attachment with each other. In general, Brocade will interoperate with McDATA under special circumstances. Contact your IBM marketing representative for details (McDATA here refers to the switch products sold by the McDATA Corporation prior to their acquisition by Brocade Communications Systems. Much of that product line is still for sale at this time). QLogic/BladeCenter FCSM will work with Cisco. We do not advise interoperating Cisco with Brocade at this time, except during fabric migrations, and only then if you have a back-out plan in place. We also do not advise that you connect the QLogic/BladeCenter FCSM to Brocade or McDATA. When having SAN fabrics with multiple vendors, pay special attention to any particular requirements. For instance, observe from which switch in the fabric the zoning must be performed.
Chapter 2.
which is designed to allow the SVC to connect and operate at up to 4 Gbps SAN fabric speed. Each I/O Group contains 8 GB of mirrored cache memory. Highly available I/O Groups are the basic configuration element of an SVC cluster. Adding I/O Groups to the cluster is designed to linearly increase cluster performance and bandwidth. An entry level SVC configuration contains a single I/O Group. The SVC can scale out to support four I/O Groups, and the SVC can scale up to support 1 024 host servers. For every cluster, the SVC supports up to 8 192 virtual disks (VDisks). This configuration flexibility means that SVC configurations can start small with an attractive price to suit smaller clients or pilot projects and yet can grow to manage extremely large storage environments.
After you reach the performance or configuration maximum for an I/O Group, you can add additional performance or capacity by attaching another I/O Group to the SVC cluster. Table 2-1 on page 27 shows the current maximum limits for one SVC I/O Group.
Table 2-1 Maximum configurations for an I/O Group

Objects | Maximum number | Comments
SAN Volume Controller nodes | Eight | Arranged as four I/O Groups
I/O Groups | Four | Each containing two nodes
VDisks per I/O Group | 2048 | Includes managed-mode and image-mode VDisks
Host IDs per I/O Group | 256 (Cisco, Brocade, or McDATA); 64 (QLogic) | N/A
Host ports per I/O Group | 512 (Cisco, Brocade, or McDATA); 128 (QLogic) | N/A
Metro Mirror and Global Mirror VDisk capacity per I/O Group | 1024 TB | There is a per-I/O Group limit of 1024 TB on the quantity of primary and secondary VDisk address space that can participate in Metro Mirror and Global Mirror relationships. This maximum configuration consumes all 512 MB of bitmap space for the I/O Group and allows no FlashCopy bitmap space. The default is 40 TB.
FlashCopy VDisk capacity per I/O Group | 1024 TB | This is a per-I/O Group limit on the quantity of FlashCopy mappings using bitmap space from a given I/O Group. This maximum configuration consumes all 512 MB of bitmap space for the I/O Group and allows no Metro Mirror or Global Mirror bitmap space. The default is 40 TB.
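As a quick cross-check of the 1024 TB figures above, the arithmetic below assumes the commonly documented 256 KB bitmap grain for copy services relationships; the grain size is our assumption and is not stated in this table.

# Rough cross-check of the 1024 TB limit; 256 KB of VDisk space per bitmap bit is an assumption
bitmap_bytes = 512 * 1024 * 1024        # 512 MB of bitmap space per I/O Group
grain_bytes = 256 * 1024                # assumed 256 KB covered per bit
covered_tb = bitmap_bytes * 8 * grain_bytes / 2**40
print(covered_tb)                       # 1024.0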
Looking at Figure 2-3, you can see that throughput at a given response time scales nearly linearly as you add SVC nodes (I/O Groups) to the cluster.
Table 2-2 Maximum SVC cluster limits

Objects | Maximum number | Comments
SAN Volume Controller nodes | Eight | Arranged as four I/O Groups
MDisks | 4 096 | The maximum number of logical units that can be managed by SVC. This number includes disks that have not been configured into Managed Disk Groups.
VDisks per cluster | 8 192 | Includes managed-mode VDisks and image-mode VDisks. The maximum requires an 8-node cluster.
Addressable storage capacity | 8 PB | If the maximum extent size of 2048 MB is used
Host IDs per cluster | 1 024 (Cisco, Brocade, and McDATA fabrics); 155 (CNT); 256 (QLogic) | A Host ID is a collection of worldwide port names (WWPNs) that represents a host. This Host ID is used to associate SCSI LUNs with VDisks.
Host ports per cluster | 2048 (Cisco, Brocade, and McDATA fabrics); 310 (CNT); 512 (QLogic) | N/A
If you exceed one of the current maximum configuration limits for the fully deployed SVC cluster, you then scale out by adding a new SVC cluster and distributing the workload to it. Because the current maximum configuration limits can change, use the following link to get a complete table of the current SVC restrictions:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283
Splitting an SVC cluster or having a secondary SVC cluster provides you with the ability to implement a disaster recovery option in the environment. Having two SVC clusters in two locations allows work to continue even if one site is down. With the SVC Advanced Copy functions, you can copy data from the local primary environment to a remote secondary site. The maximum configuration limits apply here as well. Another advantage of having two clusters is that the SVC Advanced Copy functions license is based on:
The total amount of storage (in gigabytes) that is virtualized
The Metro Mirror and Global Mirror or FlashCopy capacity in use
In each case, the number of terabytes (TBs) to order for Metro Mirror and Global Mirror is the total number of source TBs and target TBs participating in the copy operations.
cluster. It is also necessary to adjust the zoning for each SVC node in the cluster to be able to see the same subsystem storage arrays. After you make the zoning changes, you can add the new nodes into the SVC cluster. You can use the guide for adding nodes to an SVC cluster in IBM System Storage SAN Volume Controller, SG24-6423-06.
For more details about replacing nodes non-disruptively or expanding an existing SVC cluster, refer to IBM System Storage SAN Volume Controller, SG24-6423-05.
During test 5, we disabled all of the ports for node 1 on the switches. Afterward, but still during the test, we enabled the switch ports again. SVC node 1 joined the cluster with a cleared cache, and therefore, you see the spike at the end of the test. In this section, we show you the value of the SVC cluster in our environment. For this purpose, we only compare the direct-attached storage with a striped VDisk (test 1 and test 4). Figure 2-5 on page 34 shows the values for the total traffic: the read MBps and the write MBps. Similar to the I/O rate, we saw a 12% improvement for the I/O traffic.
For both parameters, I/Ops and MBps in Figure 2-5, we saw a performance improvement by using the SVC.
during the SVC code update. Refer to Chapter 9, Hosts on page 175 for more information about hosts. It is wise to schedule a time for the SVC code update during low I/O activity. Upgrade the Master Console GUI first. Allow the SVC code update to finish before making any other changes in your environment. Allow at least one hour to perform the code update for a single SVC I/O Group and 30 minutes for each additional I/O Group. In a worst case scenario, where the SVC code update also updates the BIOS, the service processor (SP), and the SVC service card, an update can take up to two hours.
Important: The Concurrent Code Upgrade (CCU) might appear to stop for a long time (up to an hour) if it is upgrading a low level BIOS. Never power off during a CCU unless you have been instructed to power off by IBM service personnel.
If the upgrade encounters a problem and fails, the upgrade will be backed out. New features are not available until all nodes in the cluster are at the same level. Features that depend on a remote cluster (Metro Mirror or Global Mirror) might not be available until the remote cluster is at the same level too.
Chapter 3.
SVC Console
In this chapter, we describe important areas of the IBM System Storage SAN Volume Controller (SVC) Console. The SVC Console is a Graphical User Interface (GUI) application installed on a server running a server version of the Microsoft Windows operating system.
While not a requirement, we recommend that adequate antivirus software is installed on the server together with software for monitoring the server health status. Whenever service packs or critical updates for the operating system become available, we recommend that they are applied. To successfully install and run the SVC Console software, the server must have adequate system performance. We suggest a minimum hardware configuration of:
Single Intel Xeon dual-core processor, minimum 2.1 GHz (or equivalent)
4 GB DDR memory
70 GB primary hard disk drive capacity using a disk mirror (for fault tolerance)
100 Mbps Ethernet connection
To minimize the risk of conflicting applications, performance problems, and so on, we recommend that the server is not assigned any other roles except for serving as the SVC Console server. We also do not recommend that you set up the server to be a member of any Microsoft Windows Active Directory domain.
Example 3-1 Checking the SVC cluster software version (lines removed for clarity)
IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1
cluster_IP_address 9.43.86.117
cluster_service_IP_address 9.43.86.118
code_level 4.3.0.0 (build 8.16.0806230000)
IBM_2145:itsosvccl1:admin>
You can locate the SVC Console version on the Welcome window (Figure 3-1), which displays after you log in to the SVC Console.
After you obtain the software versions, locate the appropriate SVC Console version. For an overview of SAN Volume Controller and SVC Console compatibility, refer to the Web site, which is shown in Figure 3-2. http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1002888
The SVC cluster supports both IP Version 4 (IPv4) and IP Version 6 (IPv6) connectivity and attaches to the physical network infrastructure using one 10/100 Mbps Ethernet connection per node. All nodes in an SVC cluster share the same two IP addresses (the cluster address and the service IP address). The cluster IP address dynamically follows the current config node, whereas the service IP address only becomes active when a node is put into service mode using the front panel. At that point, the service IP address becomes active for the node entering service mode, and it remains active until service mode is ended. It is imperative that all node Ethernet interfaces can access the IP networks where the SVC Console and other management stations reside, because the IP addresses for an SVC cluster are not statically assigned to any specific node in the SVC cluster. While everything will work with only the current config node having the correct access, access to the SVC cluster might be disrupted if the config node role switches to another node in the SVC cluster. Therefore, in order to allow seamless operations in failover and other state-changing situations, observe the following IP/Ethernet recommendations:
All nodes in an SVC cluster must be connected to the same layer 2 Ethernet segment. If Virtual LAN (VLAN) technology is implemented, all nodes must be on the same VLAN.
If an IP gateway is configured for the SVC cluster, it must not filter traffic based on Ethernet Media Access Control (MAC) addresses.
There can be no active packet filters or shapers for traffic to and from the SVC cluster.
No static (sticky) Address Resolution Protocol (ARP) caching can be active for the IP gateway connecting to the SVC cluster. When the SVC cluster IP addresses shift from one node to another node, the corresponding ARP entry must be updated with the new MAC address information.
If you get this error:
1. If you still have access to an SVC Console GUI session for the cluster, you can use the Service and Maintenance menu to start the Run Maintenance Procedures task to fix this error. This option allows you to reset all active connections, which terminates all SSH sessions and clears the login count.
2. If you have no access to the SVC cluster using the SVC Console GUI, there is now a direct maintenance link in the drop-down menu of the View cluster panel of the SVC Console. Using this link, you can get directly to the Service and Maintenance procedures. The following panels guide you to access and use this maintenance feature. Figure 3-4 on page 43 shows you how to launch this procedure.
Figure 3-4 Launch Maintenance Procedures from the panel to view the cluster
When analyzing the error code 2500, a window similar to the example in Figure 3-5 on page 44 will appear. From this window, you can identify which user has reached the 10 concurrent connections limit, which in this case is the admin user. Note that the service user has only logged in four times and therefore still has six connections left. From this window, the originating IP address of a given SSH connection is also displayed, which can be useful to determine which user opened the connection. Remember that if the connection originated from a different IP subnet than where the SVC cluster resides, it might be a gateway device IP address that is displayed, which is the case with the IP address of 9.146.185.99 in Figure 3-5 on page 44. If you are unable to close any SSH connections from the originator side, you can force the closure of all SSH connections from the maintenance procedure panel by clicking Close All SSH Connections.
You can read more information about the current SSH limitations and how to fix related problems at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCFKTH&context=STCFKTW&dc=DB500&uid=ssg1S1002896&loc=en_US&cs=utf-8&lang=en
Important: All SVC clusters to be managed by a given SVC Console must have the matching public key file installed, because an SVC Console instance can only load a single SSH certificate (the icat.ppk private SSH key) at a time. A single SVC Console can manage a maximum of four SVC clusters. As more testing is done, and more powerful hardware and software become available, this limit might change. For current information, contact your IBM marketing representative or refer to the SVC support site on the Internet: http://www.ibm.com/storage/support/2145 One challenge of using an SVC Console to manage multiple SVC clusters is that if one cluster is currently not operational, for example, the cluster shows No Contact for the SVC cluster state, ease of access to the other clusters is affected by the two minute timeout during the launch of SVC menus when the GUI is checking the status of both clusters. This timeout appears while the SVC Console GUI is trying to access the missing SVC cluster.
Redundancy: If one SVC Console is failing, you can use another SVC Console to continue managing the SVC clusters.
Manageability from multiple locations: If you have two or more physical locations with SVC clusters installed, have an SVC Console in each location to allow you to manage the local clusters even if connectivity to the other sites is lost. It is a best practice to have an SVC Console installed per physical location with an SVC cluster.
Managing multiple SVC cluster code level versions: For certain environments, it might be necessary to have multiple versions of the SVC Console GUI application running, because multiple versions of the SVC cluster code are in use.
Figure 3-7 MDisk group actions available to SVC Console user with Monitor role
Figure 3-8 MDisk group actions available to SVC Console user with Administrator role
Implementing role-based security at the SVC cluster level implies that different key pairs are used for the SSH communication. When establishing an SSH session with the SVC cluster, the available SVC CLI commands are determined by the role that is associated with the SSH key that established the session. When implementing role-based security at the SVC cluster level, it is important to understand that with SSH key pairs that have no associated password, anyone with access to the correct key can gain administrative rights on the SVC cluster. If a user with restricted rights can access the private key part of an SSH key pair that has administrative rights on the SVC cluster (such as the icat.ppk key used by the SVC Console), that user can elevate his or her own rights. To prevent this situation, it is important that users can only access the SSH keys to which they are entitled. Furthermore, PuTTYgen supports associating a password with generated SSH key pairs at creation time. In conjunction with access control to SSH keys, associating a password with user-specific SSH key pairs is the recommended approach.
Note: The SSH key pair used with the SVC Console software cannot have a password associated with it.
For more information about role-based security on the SVC and the commands that each user role can use, refer to IBM System SAN Volume Controller V4.3, SG24-6423-06, and IBM System Storage SAN Volume Controller Command-Line Interface Users Guide, S7002157.
Important: Any user with access to the file system on the SVC Console server (in general, all users who can interactively log in to the operating system) can retrieve the icat.ppk SSH key and thereby gain administrative access to the SVC cluster. To prevent this general access, we recommend that the SVC Console GUI is accessed through a Web browser from another host. Only allow experienced Microsoft Windows Server professionals to implement additional file level access control in the operating system.
IBM_2145:ITSOCL1:admin>svctask dumpauditlog
IBM_2145:ITSOCL1:admin>
The audit log file name is generated automatically in the following format:
auditlog_<firstseq>_<lastseq>_<timestamp>_<clusterid>
where:
<firstseq> is the audit log sequence number of the first entry in the log
<lastseq> is the audit sequence number of the last entry in the log
<timestamp> is the time stamp of the last entry in the audit log being dumped
<clusterid> is the cluster ID at the time the dump was created
Note: The audit log file names cannot be changed.
The audit log file that is created can be retrieved using either the SVC Console GUI or by using Secure Copy Protocol (SCP). The audit log provides the following information:
The identity of the user who issued the action command
The name of the action command
The time stamp of when the action command was issued by the configuration node
The parameters that were issued with the action command
Note: Certain commands are not logged in the audit log dump.
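Purely to illustrate the naming convention, the following sketch splits a made-up audit log file name into the four fields described above; the file name itself is hypothetical.

# Hypothetical file name that follows the documented pattern
name = "auditlog_0_175_20080727093059_000002006140ABCD"
_, firstseq, lastseq, timestamp, clusterid = name.split("_")
print(firstseq, lastseq, timestamp, clusterid)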
This list shows the commands that are not documented in the audit log:
svctask dumpconfig
svctask cpdumps
svctask cleardumps
svctask finderr
svctask dumperrlog
svctask dumpinternallog
svcservicetask dumperrlog
svcservicetask finderr
The audit log also tracks commands that failed. We recommend that audit log data is collected on a regular basis and stored in a safe location. This procedure must take into account any regulations regarding information systems auditing.
There are two possible problems that might cause an SVC cluster status of No Contact:
The SVC Console code level does not match the SVC cluster code level (for example, SVC Console code V2.1.0.x with SVC cluster code 4.2.0). To fix this problem, you need to install the corresponding SVC Console code that was mentioned in 3.1.3, SVC cluster software and SVC Console compatibility on page 39.
The CIMOM cannot execute the plink.exe command (PuTTY component). To test the connection, open a command prompt (cmd.exe) and go to the PuTTY installation directory. Common installation directories are C:\Support Utils\Putty and C:\Program Files\Putty. Execute the following command from this directory:
plink.exe admin@clusterIP -ssh -2 -i "c:\Program Files\IBM\svcconsole\cimom\icat.ppk"
This command is shown in Example 3-3.
Example 3-3 Command execution
C:\Program Files\PuTTY>plink.exe admin@9.43.86.117 -ssh -2 -i "c:\Program files\IBM\svcconsole\cimom\icat.ppk"
Using username "admin".
Last login: Sun Jul 27 11:18:48 2008 from 9.43.86.115
IBM_2145:itsosvccl1:admin>
In Example 3-3, we executed the command, and the connection was established. If the command fails, there are a few things to check:
The location of the PuTTY executable does not match the SSHCLI path in the setupcmdline.bat used when installing the SVC Console software.
The icat.ppk key needs to be in the C:\Program Files\IBM\svcconsole\cimom directory.
The icat.ppk file found in the C:\Program Files\IBM\svcconsole\cimom directory needs to match the public key uploaded to the SVC cluster.
The CIMOM can execute the plink.exe command, but the SVC cluster does not exist, it is offline, or the network is down. Check if the SVC cluster is up and running (check the front panel of the SVC nodes and use the arrow keys on the node to determine if the Ethernet port on the configuration node is active). Also, check that the IP address of the cluster matches the IP address that you have entered in the SVC Console. Then, check the IP/Ethernet settings on the SVC Console server and issue a ping to the SVC cluster IP address. If the ping command fails, check your IP/Ethernet network. If the SVC cluster still reports No Contact after you have performed all of these actions on the SVC cluster, contact IBM Support.
This option allows access to the cluster if the admin password is lost. If the password reset feature was not enabled during the cluster creation, use the svctask setpwdreset -enable CLI command to enable it. Example 3-4 shows how to determine the current status (a zero indicates that the password reset feature is disabled) and afterwards how to enable it (a one indicates that the password reset feature is enabled).
Example 3-4 Enable password reset by using CLI
IBM_2145:itsosvccl1:admin>svctask setpwdreset -show
Password status: [0]
IBM_2145:itsosvccl1:admin>svctask setpwdreset -enable
IBM_2145:itsosvccl1:admin>svctask setpwdreset -show
Password status: [1]
IBM_2145:itsosvccl1:admin>svcconfig backup
......
CMMVC6130W Inter-cluster partnership fully_configured will not be restored
..
CMMVC6112W controller controller0 has a default name
. . .
CMMVC6112W mdisk mdisk1 has a default name
................
CMMVC6136W No SSH key file svc.config.admin.admin.key
CMMVC6136W No SSH key file svc.config.test.admin.key
......................................
CMMVC6155I SVCCONFIG processing completed successfully
After the backup file is created, it can be retrieved from the SVC cluster using Secure Copy (SCP).
As in the case with the SVC CLI, a new svc.config.backup.xml_Node-1 file will appear in the List Dumps section.
Chapter 4.
Storage controller
In this chapter, we discuss the following topics:
Controller affinity and preferred path
Pathing considerations for EMC Symmetrix/DMX and HDS
Logical unit number (LUN) ID to MDisk translation
MDisk to VDisk mapping
Mapping physical logical block addresses (LBAs) to extents
Media error logging
Selecting array and cache parameters
Considerations for controller configuration
LUN masking
Worldwide port name (WWPN) to physical port translation
Using TotalStorage Productivity Center (TPC) to identify storage controller boundaries
Using TPC to measure storage controller performance
Refer to Chapter 14, Troubleshooting and diagnostics on page 269 for information regarding checking the back-end paths to storage controllers.
4.3.1 ESS
The ESS uses 14 bits to represent the LUN ID, which the ESS Storage Specialist displays in hexadecimal (that is, it is in the range 0x0000 to 0x3FFF). To convert this LUN ID to the SVC controller LUN number:
Add 0x4000 to the LUN ID.
Append 00000000.
For example, LUN ID 1723 on an ESS corresponds to SVC controller LUN 572300000000.
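A minimal sketch of this conversion, reproducing the LUN ID 1723 example (the helper function name is ours, not an SVC or ESS tool):

def ess_lun_to_ctrl_lun(lun_id_hex):
    # Add 0x4000 to the 14-bit ESS LUN ID, then append eight zeros
    return format(int(lun_id_hex, 16) + 0x4000, "04x") + "00000000"

print(ess_lun_to_ctrl_lun("1723"))      # 572300000000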
From the MDisk details panel in Figure 4-2, the Controller LUN Number field is 4011400500000000, which translates to LUN ID 0x1105 (represented in Hex). We can also identify the storage controller from the Controller Name as DS8K7598654, which had been manually assigned. Note: The command line interface (CLI) references the Controller LUN Number as ctrl_LUN_#.
IBM_2145:itsosvccl1:admin>svcinfo lsvdisklba -mdisk 6 -lba 0x00172001
vdisk_id vdisk_name copy_id type LBA vdisk_start vdisk_end mdisk_start mdisk_end
0 diomede0 0 allocated 0x00102001 0x00100000 0x0010FFFF 0x00170000 0x0017FFFF
This output shows:
This LBA maps to LBA 0x00102001 of VDisk 0.
The LBA is within the extent that runs from 0x00100000 to 0x0010FFFF on the VDisk and from 0x00170000 to 0x0017FFFF on the MDisk (so, the extent size of this Managed Disk Group (MDG) is 32 MB).
So, if the host performs I/O to this LBA, the MDisk goes offline.
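To make the extent arithmetic explicit, this small illustration (our own, not SVC code) computes which extent an MDisk LBA falls into for the 32 MB extent size and 512-byte blocks shown above:

blocks_per_extent = (32 * 1024 * 1024) // 512     # 0x10000 blocks in a 32 MB extent
mdisk_lba = 0x00172001
print(hex(mdisk_lba // blocks_per_extent))        # 0x17 -> extent starting at MDisk LBA 0x00170000
print(hex(mdisk_lba % blocks_per_extent))         # 0x2001 -> offset, giving VDisk LBA 0x00100000 + 0x2001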
vdisk_start 0x00000000 vdisk_end 0x0000003F
VDisk 0 is a fully allocated VDisk, so the MDisk LBA information is displayed as in Example 4-2 on page 62. VDisk 14 is a Space-Efficient VDisk to which the host has not yet performed any I/O; all of its extents are unallocated. Therefore, the only information shown by lsmdisklba is that it is unallocated and that this Space-Efficient grain starts at LBA 0x00 and ends at 0x3F (64 blocks of 512 bytes, that is, the grain size is 32 KB).
LABEL: SC_DISK_ERR2 IDENTIFIER: B6267342 Date/Time: Thu Aug 5 10:49:35 2008 Sequence Number: 4334 Machine Id: 00C91D3B4C00 Node Id: testnode Class: H Type: PERM Resource Name: hdisk34 Resource Class: disk Resource Type: 2145 Location: U7879.001.DQDFLVP-P1-C1-T1-W5005076801401FEF-L4000000000000 VPD: Manufacturer................IBM Machine Type and Model......2145 ROS Level and ID............0000 Device Specific.(Z0)........0000043268101002 Device Specific.(Z1)........0200604 Serial Number...............60050768018100FF78000000000000F6 SENSE DATA 0A00 2800 001C 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
From the sense byte decode: Byte 2 = SCSI Op Code (28 = 10-Byte Read) Bytes 4 - 7 = LBA (Logical Block Address for VDisk) Byte 30 = Key Byte 40 = Code Byte 41 = Qualifier
Error Log Entry 1965
Node Identifier       : Node7
Object Type           : mdisk
Object ID             : 48
Sequence Number       : 7073
Root Sequence Number  : 7073
First Error Timestamp : Thu Jul 24 17:44:13 2008 : Epoch + 1219599853
Last Error Timestamp  : Thu Jul 24 17:44:13 2008 : Epoch + 1219599853
Error Count           : 21
Error ID              : 10025 : A media error has occurred during I/O to a Managed Disk
Error Code            : 1320 : Disk I/O medium error
Status Flag           : FIXED
Type Flag             : TRANSIENT ERROR
11 80 02 03 00 00 00 00 40 00 00 11 00 00 00 00
02 00 02 0B 00 00 00 00 00 40 00 80 00 00 00 0B
00 00 00 6D 00 00 00 00 00 00 00 59 00 00 00 00
00 00 00 58 00 00 00 00 00 00 00 00 00 00 00 04
00 00 01 00 00 00 00 00 00 00 0A 00 00 00 00 00
02 00 00 00 00 00 00 00 28 00 00 08 00 00 00 10
00 00 80 00 00 00 00 00 58 80 00 C0 00 00 00 02
59 00 00 AA 00 00 00 01
40 6D 04 02 00 00 00 00
Where the sense byte decodes as: Byte 12 = SCSI Op Code (28 = 10-Byte Read) Bytes 14 - 17 = LBA (Logical Block Address for MDisk) Bytes 49 - 51 = Key/Code/Qualifier Important: Attempting to locate medium errors on MDisks by scanning VDisks with host applications, such as dd, or using SVC background functions, such as VDisk migrations and FlashCopy, can cause the Managed Disk Group (MDG) to go offline as a result of error handling behavior in current levels of SVC microcode. This behavior will change in future levels of SVC microcode. Check with support prior to attempting to locate medium errors by any of these means.
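As a simple illustration of the offsets listed above (byte 12 = op code, bytes 14 - 17 = MDisk LBA, bytes 49 - 51 = key/code/qualifier), the sketch below pulls those fields out of a list of sense data bytes; it is not an SVC tool, and only the field offsets are taken from the text.

def decode_mdisk_sense(sense):
    # sense: list of integer byte values from the error log sense data
    op_code = sense[12]                                  # 0x28 = 10-byte read
    lba = int.from_bytes(bytes(sense[14:18]), "big")     # MDisk LBA
    key, code, qualifier = sense[49], sense[50], sense[51]
    return op_code, lba, (key, code, qualifier)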
Notes: Medium errors encountered on VDisks will log error code 1320 Disk I/O Medium Error. If more than 32 medium errors are found while data is being copied from one VDisk to another VDisk, the copy operation will terminate and log error code 1610 Too many medium errors on Managed Disk.
4.7.3 DS8000
For the DS8000, you cannot tune the array and cache parameters. The arrays will be either 6+P or 7+P, depending on whether the array site contains a spare. The segment size (the contiguous amount of data that is written to a single disk) is 256 KB for fixed block volumes. Caching for the DS8000 is done on a 64 KB track boundary.
spread the disks across multiple controllers, as well as alternating slots, within the enclosures by using the manual method for array creation. Figure 4-3 shows a Storage Manager view of a 2+p array that is configured across enclosures. Here, we can see that each disk of the three disks is represented in a separate physical enclosure and that slot positions alternate from enclosure to enclosure.
Example 4-6 shows what this invalid configuration looks like from the CLI output of the lsarray and lsrank commands. The important thing to notice here is that arrays residing on the same DA pair contain the same group number (0 or 1), meaning that they have affinity to the same DS8000 server (server0 is represented by group0 and server1 is represented by group1). As an example of this situation, arrays A0 and A4 can be considered. They are both attached to DA pair 0, and in this example, both arrays are added to an even-numbered extent pool (P0 and P4). Doing so means that both ranks have affinity to server0 (represented by group0), leaving the DA in server1 idle.
Example 4-6 Command output
dscli> lsarray -l
Date/Time: Aug 8, 2008 8:54:58 AM CEST IBM DSCLI Version:5.2.410.299 DS: IBM.2107-75L2321
Array State Data RAID type arsite Rank DA Pair DDMcap(10^9B) diskclass
===================================================================================
A0 Assign Normal 5 (6+P+S) S1 R0 0 146.0 ENT
A1 Assign Normal 5 (6+P+S) S9 R1 1 146.0 ENT
A2 Assign Normal 5 (6+P+S) S17 R2 2 146.0 ENT
A3 Assign Normal 5 (6+P+S) S25 R3 3 146.0 ENT
A4 Assign Normal 5 (6+P+S) S2 R4 0 146.0 ENT
A5 Assign Normal 5 (6+P+S) S10 R5 1 146.0 ENT
A6 Assign Normal 5 (6+P+S) S18 R6 2 146.0 ENT
A7 Assign Normal 5 (6+P+S) S26 R7 3 146.0 ENT

dscli> lsrank -l
Date/Time: Aug 8, 2008 8:52:33 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0 Normal Normal A0 5 P0 extpool0 fb 779 779
R1 1 Normal Normal A1 5 P1 extpool1 fb 779 779
R2 0 Normal Normal A2 5 P2 extpool2 fb 779 779
R3 1 Normal Normal A3 5 P3 extpool3 fb 779 779
R4 0 Normal Normal A4 5 P4 extpool4 fb 779 779
R5 1 Normal Normal A5 5 P5 extpool5 fb 779 779
R6 0 Normal Normal A6 5 P6 extpool6 fb 779 779
R7 1 Normal Normal A7 5 P7 extpool7 fb 779 779
Figure 4-5 shows an example of a correct configuration that balances the workload across all four DA pairs.
Example 4-7 shows what this correct configuration looks like from the CLI output of the lsrank command. The configuration from the lsarray output remains unchanged. Notice that arrays residing on the same DA pair are now split between groups 0 and 1. Looking at arrays A0 and A4 once again now shows that they have different affinities (A0 to group0, A4 group1). To achieve this correct configuration, what has been changed compared to Example 4-6 on page 69 is that array A4 now belongs to an odd-numbered extent pool (P5).
Example 4-7 Command output
dscli> lsrank -l
Date/Time: Aug 9, 2008 2:23:18 AM CEST IBM DSCLI Version: 5.2.410.299 DS: IBM.2107-75L2321
ID Group State datastate Array RAIDtype extpoolID extpoolnam stgtype exts usedexts
======================================================================================
R0 0 Normal Normal A0 5 P0 extpool0 fb 779 779
R1 1 Normal Normal A1 5 P1 extpool1 fb 779 779
R2 0 Normal Normal A2 5 P2 extpool2 fb 779 779
R3 1 Normal Normal A3 5 P3 extpool3 fb 779 779
R4 1 Normal Normal A4 5 P5 extpool5 fb 779 779
R5 0 Normal Normal A5 5 P4 extpool4 fb 779 779
R6 1 Normal Normal A6 5 P7 extpool7 fb 779 779
R7 0 Normal Normal A7 5 P6 extpool6 fb 779 779
Table 4-2 shows the recommended number of ESS/DS8000 ports and adapters based on rank count.
Table 4-2 Recommended number of ports and adapters

Ranks | Ports | Adapters
2 - 48 | 8 | 4 - 8
> 48 | 16 | 8 - 16
The ESS and DS8000 populate Fibre Channel (FC) adapters across two to eight I/O enclosures, depending on configuration. Each I/O enclosure represents a separate hardware domain. Ensure that adapters configured to different SAN networks do not share the same I/O enclosure as part of our goal of keeping redundant SAN networks isolated from each other.
Best practices that we recommend:
Configure a minimum of eight ports per DS8000.
Configure 16 ports per DS8000 when > 48 ranks are presented to the SVC cluster.
Configure a maximum of two ports per four-port DS8000 adapter.
Configure adapters across redundant SAN networks from different I/O enclosures.
dscli> showvolgrp -dev IBM.2107-75ALNN1 V0
Date/Time: August 15, 2008 10:12:33 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name SVCVG0
ID V0
Type SCSI Mask
Vols 1000 1001 1004 1005
Example 4-9 shows lshostconnect output from the DS8000. Here, you can see that all 16 ports of the 4-node cluster are assigned to the same volume group (V0) and, therefore, have been assigned to the same four LUNs.
Example 4-9 The lshostconnect command output
dscli> lshostconnect -dev IBM.2107-75ALNN1
Date/Time: August 14, 2008 11:51:31 AM PDT IBM DSCLI Version: 5.0.4.43 DS: IBM.2107-75ALNN1
Name ID WWPN HostType Profile portgrp volgrpID ESSIOport
===============================================================================
svcnode 0000 5005076801302B3E SVC San Volume Controller 0 V0 all
svcnode 0001 5005076801302B22 SVC San Volume Controller 0 V0 all
svcnode 0002 5005076801202D95 SVC San Volume Controller 0 V0 all
svcnode 0003 5005076801402D95 SVC San Volume Controller 0 V0 all
svcnode 0004 5005076801202BF1 SVC San Volume Controller 0 V0 all
svcnode 0005 5005076801402BF1 SVC San Volume Controller 0 V0 all
svcnode 0006 5005076801202B3E SVC San Volume Controller 0 V0 all
svcnode 0007 5005076801402B3E SVC San Volume Controller 0 V0 all
svcnode 0008 5005076801202B22 SVC San Volume Controller 0 V0 all
svcnode 0009 5005076801402B22 SVC San Volume Controller 0 V0 all
svcnode 000A 5005076801102D95 SVC San Volume Controller 0 V0 all
svcnode 000B 5005076801302D95 SVC San Volume Controller 0 V0 all
svcnode 000C 5005076801102BF1 SVC San Volume Controller 0 V0 all
svcnode 000D 5005076801302BF1 SVC San Volume Controller 0 V0 all
svcnode 000E 5005076801102B3E SVC San Volume Controller 0 V0 all
svcnode 000F 5005076801102B22 SVC San Volume Controller 0 V0 all
fd11asys 0010 210100E08BA5A4BA VMWare VMWare 0 V1 all
fd11asys 0011 210000E08B85A4BA VMWare VMWare 0 V1 all
mdms024_fcs0 0012 10000000C946AB14 pSeries IBM pSeries - AIX 0 V2 all
mdms024_fcs1 0013 10000000C94A0B97 pSeries IBM pSeries - AIX 0 V2 all
parker_fcs0 0014 10000000C93134B3 pSeries IBM pSeries - AIX 0 V3 all
parker_fcs1 0015 10000000C93139D9 pSeries IBM pSeries - AIX 0 V3 all

Additionally, you can see from the lshostconnect output that only the SVC WWPNs are assigned to V0.
Important: Data corruption can occur if LUNs are assigned to both SVC nodes and non-SVC nodes, that is, direct-attached hosts.
Next, we show you how the SVC sees these LUNs if the zoning is properly configured. The Managed Disk Link Count represents the total number of MDisks presented to the SVC cluster. Figure 4-6 on page 74 shows the storage controller general details. To display this panel, we selected Work with Managed Disks → Disk Controller Systems → View General Details. In this case, we can see that the Managed Disk Link Count is 4, which is correct for our example.
Figure 4-7 shows the storage controller port details. To get to this panel, we selected Work with Managed Disks → Disk Controller Systems → View General Details → Ports.
Here, a path represents a connection from a single node to a single LUN. Because we have four nodes and four LUNs in this example configuration, we expect to see a total of 16 paths with all paths evenly distributed across the available storage ports. We have validated that
this configuration is correct, because we see eight paths on one WWPN and eight paths on the other WWPN for a total of 16 paths.
WWPN format for ESS = 5005076300XXNNNN
XX = adapter location within storage controller
NNNN = unique identifier for storage controller

Bay | Slot | XX
R1-B1 | H1 | C4
R1-B1 | H2 | C3
R1-B1 | H3 | C2
R1-B1 | H4 | C1
R1-B2 | H1 | CC
R1-B2 | H2 | CB
R1-B2 | H3 | CA
R1-B2 | H4 | C9
R1-B3 | H1 | C8
R1-B3 | H2 | C7
R1-B3 | H3 | C6
R1-B3 | H4 | C5
R1-B4 | H1 | D0
R1-B4 | H2 | CF
R1-B4 | H3 | CE
R1-B4 | H4 | CD
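The small helper below (our own illustration, based only on the table above) maps the XX byte of an ESS WWPN to its bay and slot; the example WWPN serial digits are made up.

# Map the XX field of an ESS WWPN (5005076300XXNNNN) to bay/slot, per the table above
ESS_XX_TO_PORT = {
    "C4": ("R1-B1", "H1"), "C3": ("R1-B1", "H2"), "C2": ("R1-B1", "H3"), "C1": ("R1-B1", "H4"),
    "CC": ("R1-B2", "H1"), "CB": ("R1-B2", "H2"), "CA": ("R1-B2", "H3"), "C9": ("R1-B2", "H4"),
    "C8": ("R1-B3", "H1"), "C7": ("R1-B3", "H2"), "C6": ("R1-B3", "H3"), "C5": ("R1-B3", "H4"),
    "D0": ("R1-B4", "H1"), "CF": ("R1-B4", "H2"), "CE": ("R1-B4", "H3"), "CD": ("R1-B4", "H4"),
}

def ess_wwpn_to_port(wwpn):
    xx = wwpn.replace(":", "").upper()[10:12]   # XX field of 5005076300XXNNNN
    return ESS_XX_TO_PORT.get(xx)

print(ess_wwpn_to_port("5005076300C51234"))     # ('R1-B3', 'H4')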
In Example 4-11, we show the WWPN to physical port translations for the DS8000.
Example 4-11 DS8000
WWPN format for DS8000 = 50050763030XXYNNN
XX = adapter location within storage controller
Y = port number within 4-port adapter
NNN = unique identifier for storage controller

IO Bay | Slot | XX
B1 | S1 S2 S4 S5 | 00 01 03 04
B2 | S1 S2 S4 S5 | 08 09 0B 0C
B3 | S1 S2 S4 S5 | 10 11 13 14
B4 | S1 S2 S4 S5 | 18 19 1B 1C
B5 | S1 S2 S4 S5 | 20 21 23 24
B6 | S1 S2 S4 S5 | 28 29 2B 2C
B7 | S1 S2 S4 S5 | 30 31 33 34
B8 | S1 S2 S4 S5 | 38 39 3B 3C

Port | Y
P1 | 0
P2 | 4
P3 | 8
P4 | C
interested in performing an application level risk assessment. Information learned from this type of analysis can lead to actions taken to mitigate risks, such as scheduling application downtime, performing VDisk migrations, and initiating FlashCopy. TPC allows the mapping of the virtualization layer to occur quickly, and using TPC eliminates mistakes that can be made by using a manual approach. Figure 4-8 shows how a failing disk on a storage controller can be mapped to the MDisk that is being used by an SVC cluster. To display this panel, click Physical Disk → RAID5 Array → Logical Volume → MDisk.
Figure 4-9 on page 77 completes the end-to-end view by mapping the MDisk through the SVC to the attached host. Click MDisk → MDGroup → VDisk → host disk.
Small block reads (4 KB to 8 KB) must have average response times in the 2 - 15 millisecond range.
Small block writes must have response times near 1 millisecond, because these small block writes are all cache hits. High response times with small block writes often indicate nonvolatile storage (NVS) full conditions.
With large block reads and writes (32 KB or greater), response times are insignificant as long as throughput objectives are met.
Read hit percentage can vary from 0% to near 100%. Anything lower than 50% is considered low; however, many database applications can run under 30%. Cache hit ratios are mostly dependent on application design. Larger cache always helps and allows back-end arrays to be driven at a higher utilization.
Storage controller back-end read response times must not exceed 25 milliseconds unless the cache read hit ratio is near 99%.
Storage controller back-end write response times can be high due to the RAID 5 and RAID 10 write penalties; however, they must not exceed 60 milliseconds.
Array throughput above 700 - 800 IOPS can start impacting front-end performance.
Port response times must be less than 2 milliseconds for most I/O; however, they can reach as high as 5 milliseconds with large transfer sizes.
Figure 4-10 is a TPC graph showing aggregate throughput for several ESS arrays. In this case, all arrays have throughput lower than 700 IOPS.
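If you want to script these rules of thumb against performance data exported from TPC, a rough sketch follows; the threshold values come straight from the guidance above, but the metric names and input format are our own assumptions.

# Illustrative threshold check; field names are assumptions, thresholds are from the text above
def check_array_metrics(m):
    warnings = []
    if m.get("backend_read_ms", 0) > 25:
        warnings.append("back-end read response time above 25 ms")
    if m.get("backend_write_ms", 0) > 60:
        warnings.append("back-end write response time above 60 ms")
    if m.get("array_iops", 0) > 700:
        warnings.append("array throughput above 700 IOPS; may impact front-end performance")
    if m.get("port_ms", 0) > 5:
        warnings.append("port response time above 5 ms")
    return warnings

print(check_array_metrics({"backend_read_ms": 31, "array_iops": 820}))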
Array response times depend on many factors, including disk RPM and the array configuration. However, in all cases when the number of IOPS is near, or exceeds 1 000 IOPS, the array is extremely busy. Table 4-3 shows the upper limit for several disk speeds and array widths. Remember that while these I/O rates can be achieved, they imply considerable queuing delays and high response times.
Table 4-3 Maximum IOPS for different DDM speeds

DDM speed | Single drive (IOPS) | 6+P array (IOPS) | 7+P array (IOPS)
10 K | 150 - 175 | 900 - 1050 | 1050 - 1225
15 K | 200 - 225 | 1200 - 1350 | 1400 - 1575
7.2 K (near-line) | 85 - 110 | 510 - 660 | 595 - 770
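Note that the array columns in Table 4-3 are simply the single-drive ranges multiplied by the 6 or 7 data drives; the short sketch below reproduces the table rows from the single-drive figures.

# The 6+P and 7+P columns are the single-drive IOPS range scaled by 6 and 7
single_drive = {"10 K": (150, 175), "15 K": (200, 225), "7.2 K (near-line)": (85, 110)}
for speed, (low, high) in single_drive.items():
    print(speed, "6+P:", (6 * low, 6 * high), "7+P:", (7 * low, 7 * high))
# 10 K -> 6+P: (900, 1050)  7+P: (1050, 1225), and so on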
These numbers can vary significantly depending on cache hit ratios, block size, and service time. Rule: 1 000 IOPS indicate an extremely busy array and can be impacting front-end response times.
Chapter 5.
MDisks
In this chapter, we discuss various MDisk attributes, as well as provide an overview of the process of adding and removing MDisks from existing Managed Disk Groups (MDGs). We discuss the following topics:
Back-end queue depth
MDisk transfer size
Selecting logical unit number (LUN) attributes for MDisks
Tiered storage
Adding MDisks to existing MDGs
Restriping (balancing) extents across an MDG
Remapping managed MDisks
Controlling extent allocation order for VDisk creation
Sequential writes
The SVC does not employ a caching algorithm for explicit sequential detect, which means that the coalescing of writes in SVC cache has a random component to it. For example, 4 KB writes to VDisks will translate to a mix of 4 KB, 8 KB, 16 KB, 24 KB, and 32 KB transfers to the MDisks, with reducing probability as the transfer size grows. Although larger transfer sizes tend to be more efficient, this varying transfer size has no effect on the controller's ability to detect and coalesce sequential content to achieve full stride writes.
Sequential reads
The SVC uses prefetch logic for staging reads based on statistics maintained on 128 MB regions. If the sequential content is sufficiently high enough within a region, prefetch occurs with 32 KB reads.
Remember that a single tier of storage can be represented by multiple MDGs. For example, if you have a large pool of tier 3 storage that is provided by many low-cost storage controllers, it is sensible to use a number of MDGs. Using a number of MDGs prevents a single offline MDisk from taking all of the tier 3 storage offline. When multiple storage tiers are defined, you need to take precautions to ensure that storage is provisioned from the appropriate tiers. You can ensure this through MDG and MDisk naming conventions, along with clearly defined storage requirements for all hosts within the installation.
Note: When multiple tiers are configured, it is a best practice to clearly indicate the storage tier in the naming convention used for the MDGs and MDisks.
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -filtervalue "mdisk_grp_name=itso_ds4500"
id name status mode mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_# controller_name UID
0 mdisk0 online managed 1 itso_ds45_18gb 18.0GB 0000000000000000 itso_ds4500 600a0b80001744310000011a4888478c00000000000000000000000000000000
1 mdisk1 online managed 1 itso_ds45_18gb 18.0GB 0000000000000001 itso_ds4500 600a0b8000174431000001194888477800000000000000000000000000000000
2 mdisk2 online managed 1 itso_ds45_18gb 18.0GB 0000000000000002 itso_ds4500 600a0b8000174431000001184888475800000000000000000000000000000000
3 mdisk3 online managed 1 itso_ds45_18gb 18.0GB 0000000000000003 itso_ds4500 600a0b8000174431000001174888473e00000000000000000000000000000000
4 mdisk4 online managed 1 itso_ds45_18gb 18.0GB 0000000000000004 itso_ds4500 600a0b8000174431000001164888472600000000000000000000000000000000
5 mdisk5 online managed 1 itso_ds45_18gb 18.0GB 0000000000000005 itso_ds4500 600a0b8000174431000001154888470c00000000000000000000000000000000
6 mdisk6 online managed 1 itso_ds45_18gb 18.0GB 0000000000000006 itso_ds4500 600a0b800017443100000114488846ec00000000000000000000000000000000
7 mdisk7 online managed 1 itso_ds45_18gb 18.0GB 0000000000000007 itso_ds4500 600a0b800017443100000113488846c000000000000000000000000000000000
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0 64 0
2 64 0
1 64 0
4 64 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk7
The balance.pl script was then run on the Master Console using the command: C:\SVCTools\examples\balance>perl balance.pl itso_ds45_18gb -k "c:\icat.ppk" -i 9.43.86.117 -r -e In this command: itso_ds45_18gb is the MDG to be rebalanced. -k "c:\icat.ppk" gives the location of the PuTTY private key file, which is authorized for administrator access to the SVC cluster. -i 9.43.86.117 gives the IP address of the cluster.
-r requires that the optimal solution is found. If this option is not specified, the extents can still be somewhat unevenly spread at completion, but not specifying -r will often require fewer migration commands and less time. If time is important, it might be preferable to not use -r at first, and then rerun the command with -r if the solution is not good enough. -e specifies that the script will actually run the extent migration commands. Without this option, it will merely print the commands that it might have run. This option can be used to check that the series of steps is logical before committing to migration. In this example, with 4 x 8 GB VDisks, the migration completed within around 15 minutes. You can use the command svcinfo lsmigrate to monitor progress; this command shows a percentage for each extent migration command issued by the script. After the script had completed, we checked that the extents had been correctly rebalanced. Example 5-2 shows that the extents had been correctly rebalanced. In a test run of 40 minutes of I/O (25% random, 70/30 R/W) to the four VDisks, performance for the balanced MDG was around 20% better than for the unbalanced MDG.
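If you prefer to watch the migrations issued by balance.pl without staying at the CLI, a polling loop such as the following is one option; it reuses the plink and key file conventions shown elsewhere in this book, and the IP address and key path are placeholders for your own values.

import subprocess, time

# Placeholder cluster IP address and key path; adjust to your environment
CMD = ["plink.exe", "admin@9.43.86.117", "-ssh", "-2", "-i", r"c:\icat.ppk", "svcinfo lsmigrate"]

while True:
    output = subprocess.run(CMD, capture_output=True, text=True).stdout
    if not output.strip():
        print("No active migrations")
        break
    print(output)
    time.sleep(60)     # poll once a minute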
Example 5-2 The lsmdiskextent output showing a balanced MDG
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk0
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk1
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 31 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk2
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk3
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk4
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 33 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk5
id number_of_extents copy_id
0 32 0
2 32 0
1 32 0
4 32 0
IBM_2145:itsosvccl1:admin>svcinfo lsmdiskextent mdisk6
Specify the -force flag on the svctask rmmdisk command, or check the corresponding checkbox in the GUI. Either action causes the SVC to automatically move all used extents on the MDisk to the remaining MDisks in the MDG. In most environments, where the extents were automatically allocated in the first place, moving all used extents off the MDisk in this manner will be fine. Alternatively, you might want to perform the extent migrations manually. For example, database administrators try to tune performance by arranging high workload VDisks on the outside of physical disks. To preserve this type of arrangement, the user must migrate all extents off the MDisk before deletion; otherwise, the automatic migration will randomly allocate extents to MDisks (and areas of MDisks). After all extents have been migrated, the MDisk removal can proceed without the -force flag.
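If you choose the manual route, the general shape of the commands is sketched below with placeholder IDs and names; confirm the exact svctask migrateexts syntax against your SVC code level before using it.

svcinfo lsmdiskextent mdisk5
(lists which VDisks hold extents on the MDisk that you want to empty; mdisk5 is a placeholder)
svctask migrateexts -source 5 -target 2 -exts 64 -vdisk 0
(moves 64 extents of VDisk 0 from MDisk 5 to MDisk 2; repeat per VDisk and extent count as required)
svctask rmmdisk -mdisk mdisk5 MDG1
(after lsmdiskextent shows that no extents remain, remove the MDisk from the MDG without -force)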
Figure 5-1 Controller LUN Number and UID fields from the SVC MDisk details panel
Figure 5-2 on page 95 shows an example of the Logical Drive Properties for the DS4000. Note that the DS4000 refers to UID as the Logical Drive ID.
Figure 5-2 Logical Drive properties for DS4000, including the LUN UID
When logical drive definitions are regenerated, the LUN will appear as a new LUN just as it does when it is created for the first time (the only exception is that the user data will still be present). In this case, restoring the UID on a LUN back to its prior value can only be done with the assistance of DS4000 support. Both the previous UID and the subsystem identifier (SSID) will be required, both of which can be obtained from the controller profile. To view the logical drive properties, click Logical/Physical View LUN Open Properties. Refer to Figure 5-2 on page 95 for an example of the Logical Drive Properties panel for a DS4000 logical drive. This panel shows Logical Drive ID (UID) and SSID.
To change extent allocation so that each extent alternates between even and odd extent pools, the MDisks can be renamed after being discovered and then added to the MDG in their new order. Table 5-2 on page 97 shows how the MDisks have been renamed so that when they are added to the MDG in their new order, the extent allocation will alternate between even and odd extent pools.
Table 5-2 MDisks renamed

LUN ID | MDisk ID | MDisk name (original/new) | Controller resource (DA pair/extent pool)
1000 | 1 | mdisk01/md001 | DA2/P0
1100 | 4 | mdisk04/md002 | DA0/P9
1001 | 2 | mdisk02/md003 | DA6/P16
1101 | 5 | mdisk05/md004 | DA4/P23
1002 | 3 | mdisk03/md005 | DA7/P30
1102 | 6 | mdisk06/md006 | DA5/P39
There are two options available for VDisk creation. We describe both options along with the differences between them; a small illustrative simulation follows the summary.
Option A: Explicitly select the candidate MDisks within the MDG that will be used (through the command line interface (CLI) or GUI). When explicitly selecting the MDisk list, the extent allocation will round-robin across the MDisks in the order that they appear on the list, starting with the first MDisk on the list:
Example A1: Creating a VDisk with MDisks from the explicit candidate list order: md001, md002, md003, md004, md005, and md006. The VDisk extent allocations begin at md001 and alternate round-robin around the explicit MDisk candidate list. In this case, the VDisk is distributed in the following order: md001, md002, md003, md004, md005, and md006.
Example A2: Creating a VDisk with MDisks from the explicit candidate list order: md003, md001, md002, md005, md006, and md004. The VDisk extent allocations begin at md003 and alternate round-robin around the explicit MDisk candidate list. In this case, the VDisk is distributed in the following order: md003, md001, md002, md005, md006, and md004.
Option B: Do not explicitly select the candidate MDisks within the MDG (through the CLI or GUI). When the MDisk list is not explicitly defined, the extents will be allocated across MDisks in the order in which they were added to the MDG, and the MDisk that receives the first extent is randomly selected.
Example B1: Creating a VDisk with MDisks from the candidate list order (based on the order in which the MDisks were added to the MDG): md001, md002, md003, md004, md005, and md006. The VDisk extent allocations begin at a random MDisk starting point (let us assume md003 is randomly selected) and alternate round-robin around the MDisk list based on the order in which the MDisks were originally added to the MDG. In this case, the VDisk is allocated in the following order: md003, md004, md005, md006, md001, and md002.
Summary: Independent of the order in which a storage subsystem's LUNs (volumes) are discovered by the SVC, recognize that renaming MDisks and changing the order in which they are added to the MDG will influence how the VDisk extents are allocated. Renaming MDisks into a particular order and then adding them to the MDG in that order allows the starting MDisk to be randomly selected for each VDisk created and, therefore, is the optimal method for balancing VDisk extent allocation across storage subsystem resources.
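The round-robin behavior just described is easy to picture with a tiny simulation; this is purely illustrative and is not SVC code.

import random

def allocation_order(mdisks, explicit=True):
    # Option A: start at the first MDisk of the explicit list
    # Option B: start at a randomly selected MDisk, then follow the MDG order
    start = 0 if explicit else random.randrange(len(mdisks))
    return [mdisks[(start + i) % len(mdisks)] for i in range(len(mdisks))]

mdg_order = ["md001", "md002", "md003", "md004", "md005", "md006"]
print(allocation_order(mdg_order, explicit=True))    # Option A, as in example A1
print(allocation_order(mdg_order, explicit=False))   # Option B, random starting MDisk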
When MDisks are added to an MDG based on the order in which the MDisks were discovered, the allocation order can still be explicitly specified at VDisk creation; however, the MDisk used for the first extent is always the first MDisk specified on the list. When creating VDisks from the GUI, recognize that you are not required to select the MDisks from the Managed Disk Candidates list and click Add; you can instead simply enter a capacity value in the "Type the size of the virtual disks" field and select whether you require formatting of the VDisk. With this approach, Option B is the methodology applied to allocate the VDisk's extents within the MDG. When a set or a subset of MDisks is selected and added (by clicking Add) to the "Managed Disks Striped in this Order" column, Option A is the methodology applied, and the VDisk's extents are explicitly distributed across the selected MDisks. Figure 5-3 shows the MDisk selection panel for creating VDisks.
Using storage controller-based copy services: If you use storage controller-based copy services, make sure that the VDisks containing the data are image mode and cache-disabled.

If none of these options is appropriate, follow these steps to move an MDisk to another cluster:
1. Ensure that the MDisk is in image mode rather than striped or sequential. If the MDisk is in image mode, the MDisk contains only the raw client data and none of the SVC metadata. If you want to move data from a non-image mode VDisk, first use the svctask migratetoimage command to migrate it to a single image-mode MDisk. For a Space-Efficient VDisk (SEV), image mode means that all metadata for the VDisk is present on the same MDisk as the client data; the MDisk will not be readable by a host, but it can be imported by another SVC cluster.
2. Remove the image-mode VDisk from the first cluster using the svctask rmvdisk command.

Note: You must not use the -force option of the rmvdisk command. If you use the -force option, data in the cache is not written to the disk, which might result in metadata corruption for an SEV.
3. Check, by using svcinfo lsvdisk, that the VDisk is no longer displayed. You must wait until it is removed to allow cached data to destage to disk.
4. Change the back-end storage LUN mappings to prevent the source SVC cluster from seeing the disk, and then make it available to the target cluster.
5. Perform an svctask detectmdisk command on the target cluster.
6. Import the MDisk to the target cluster. If it is not an SEV, use the svctask mkvdisk command with the -vtype image option. If it is an SEV, you also need two other options: -import instructs the SVC to look for SEV metadata on the specified MDisk, and -rsize indicates that the disk is Space-Efficient. The value given to -rsize must be at least the amount of space that the source cluster used on the Space-Efficient VDisk. If it is smaller, a 1862 error is logged. In this case, delete the VDisk and try the mkvdisk command again.
7. The VDisk is now online. If it is not, and the VDisk is Space-Efficient, check the SVC error log for an 1862 error; if an 1862 error is present, it indicates why the VDisk import failed (for example, metadata corruption). You might then be able to use the svctask repairsevdisk command to correct the problem.
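As an illustration of step 6, the import of a Space-Efficient image-mode VDisk on the target cluster might look like the following sketch. The cluster prompt, MDG name, I/O Group, MDisk name, VDisk name, and -rsize value are hypothetical; in particular, the -rsize value must be at least the real capacity that was in use on the source cluster:

IBM_2145:targetsvccl:admin>svctask detectmdisk
IBM_2145:targetsvccl:admin>svctask mkvdisk -mdiskgrp target_mdg -iogrp io_grp0 -vtype image -mdisk mdisk20 -import -rsize 10 -unit gb -name imported_sev

For a fully allocated VDisk, omit the -import and -rsize options.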
Chapter 6. Managed disk groups
Best practices for availability:
- Each storage subsystem must be used with only a single SVC cluster.
- Each array must be included in only one MDG.
- Each MDG must only contain MDisks from a single storage subsystem.
- Each MDG must contain MDisks from no more than approximately 10 storage subsystem arrays.

In the following sections, we examine the effects of these best practices on performance.
Note: If there is uncertainty about in which storage pool (MDG) to create a VDisk, initially use the pool with the lowest performance and then move the VDisk up to a higher performing pool later if required.
over-driving an array can occur. Additionally, placing these LUNs in multiple MDGs expands failure domains considerably as we discussed in 6.1, Availability considerations for MDGs on page 102. Table 6-1 provides our recommended guidelines for array provisioning on IBM storage subsystems.
Table 6-1 Array provisioning

Controller type                        LUNs per array
IBM System Storage DS4000              1
IBM System Storage DS6000              1
IBM System Storage DS8000              1-2
IBM Enterprise Storage Server (ESS)    1-2
6.2.1 Performance comparison of one LUN compared to two LUNs per array
The following example compares one LUN per array with two LUNs per array using DS8000 arrays. Because any performance benefit relies on both LUNs within an array being evenly loaded, the comparison was performed by placing both LUNs for each array within the same MDG. Testing was performed on two MDGs with eight MDisks per MDG. Table 6-2 shows the MDG layout for Config1 with two LUNs per array, and Table 6-3 on page 106 shows the MDG layout for Config2 with a single LUN per array.
Table 6-2 Two LUNs per array

DS8000 array   LUN1   LUN2
Array1         MDG1   MDG1
Array2         MDG1   MDG1
Array3         MDG1   MDG1
Array4         MDG1   MDG1
Array5         MDG2   MDG2
Array6         MDG2   MDG2
Array7         MDG2   MDG2
Array8         MDG2   MDG2
Table 6-3 One LUN per array

DS8000 array   LUN1
Array1         MDG1
Array2         MDG1
Array3         MDG1
Array4         MDG1
Array5         MDG2
Array6         MDG2
Array7         MDG2
Array8         MDG2
We performed testing using a four-node SVC cluster with two I/O Groups and eight VDisks per MDG. The following workloads were used in the testing:
- Ran-R/W-50/50-0%CH
- Seq-R/W-50/50-25%CH
- Seq-R/W-50/50-0%CH
- Ran-R/W-70/30-25%CH
- Ran-R/W-50/50-25%CH
- Ran-R/W-70/30-0%CH
- Seq-R/W-70/30-25%CH
- Seq-R/W-70/30-0%CH

Note: Ran = Random, Seq = Sequential, R/W = Read/Write, and CH = Cache Hit (25%CH means that 25% of all I/Os are read cache hits).

We collected the following performance metrics for a single MDG using IBM TotalStorage Productivity Center (TPC). Figure 6-1 on page 107 and Figure 6-2 on page 108 show the I/Os per second (IOPS) and response time comparisons between Config1 (two LUNs per array) and Config2 (one LUN per array).
Figure 6-1 IOPS comparison between two LUNs per array and one LUN per array
Figure 6-2 Response time comparison between two LUNs per array and one LUN per array
The test shows a small response time advantage to the two LUNs per array configuration and a small IOPS advantage to the one LUN per array configuration for sequential workloads. Overall, the performance differences between these configurations are minimal.
arrays per MDG that is appropriate for general cases. Again, when it comes to performance, there can always be exceptions.
Table 6-4 Recommended number of arrays per MDG

Controller type   Arrays per MDG
DS4000            4 - 24
ESS/DS8000        4 - 12
You can see from this design that if a single array fails, all four MDGs are affected, and all SVC VDisks that are using storage from this DS8000 fail.
Table 6-6 shows an alternative to this configuration. Here, each array is divided into two LUNs, and each MDG contains half as many arrays as in the first configuration. In this design, the failure boundary of an array failure is cut in half, because any single array failure affects only half of the MDGs.
Table 6-6 Configuration two: Each array is contained in two MDGs

DS8000 array   LUN1   LUN2
Array1         MDG1   MDG3
Array2         MDG1   MDG3
Array3         MDG1   MDG3
Array4         MDG1   MDG3
Array5         MDG2   MDG4
Array6         MDG2   MDG4
Array7         MDG2   MDG4
Array8         MDG2   MDG4
We collected the following performance metrics using TPC to compare these configurations. The first test was performed with all four MDGs evenly loaded. Figure 6-3 on page 111 and Figure 6-4 on page 112 show the IOPS and response time comparisons between Config1 (four LUNs per array) and Config2 (two LUNs per array) for varying workloads.
Figure 6-3 IOPS comparison of eight arrays/MDG and four arrays/MDG with all four MDGs active
Figure 6-4 Response time comparison between eight and four arrays/MDG with all four MDGs active
This test shows virtually no difference between using eight arrays per MDG compared to using four arrays per MDG, when all MDGs are evenly loaded (with the exception of a small advantage in IOPS for the eight array MDG for sequential workloads). We performed two additional tests to show the potential effect when MDGs are not loaded evenly. We performed the first test using only one of the four MDGs, while the other three MDGs remained idle. This test presents the worst case scenario, because the eight array MDG has the fully dedicated bandwidth of all eight arrays available to it, and therefore, halving the number of arrays has a pronounced effect. This test tends to be an unrealistic scenario, because it is unlikely that all host workload will be directed at a single MDG. Figure 6-5 on page 113 shows the IOPS comparison between these configurations.
Figure 6-5 IOPS comparison between eight and four arrays/MDG with a single MDG active
We performed the second test with I/O running to only two of the four MDGs, which is shown in Figure 6-6 on page 114.
Figure 6-6 IOPS comparison between eight arrays/MDG and four arrays/MDG with two MDGs active
Figure 6-6 shows the results from the test where only two of the four MDGs are loaded. This test shows no difference between the eight arrays per MDG configuration and the four arrays per MDG configuration for random workloads, and it shows a small advantage to the eight arrays per MDG configuration for sequential workloads. Our conclusions are:
- The performance advantage of striping across a larger number of arrays is not as pronounced as you might expect.
- You must consider the number of MDisks per array along with the number of arrays per MDG to understand aggregate MDG loading effects.
- You can achieve availability improvements without compromising performance objectives.
what is possible with striping. This situation is a rare exception given the unlikely requirement to optimize for FlashCopy as opposed to online workload. Note: Electing to use sequential type over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting system performance.
The effect of SVC cache partitioning is that no single MDG can occupy more than its upper limit of cache capacity with write data. The upper limit is the point at which the SVC cache starts to limit incoming I/O rates for VDisks created from the MDG. If a particular MDG reaches its upper limit, it experiences the same result as when a global cache resource is full: host writes are serviced on a one-out, one-in basis as the cache destages writes to the back-end storage. However, only writes targeted at the full MDG are limited; all I/O destined for other (non-limited) MDGs continues normally. Read I/O requests for the limited MDG also continue normally. However, because the SVC is destaging write data at a rate greater than the controller can actually sustain (otherwise, the partition would not reach the upper limit), reads are serviced just as slowly. The main point to remember is that partitioning limits only write I/Os. In general, a 70/30 or 50/50 ratio of read to write operations is observed. Of course, there are applications or workloads that perform 100% writes; however, write cache hits provide much less benefit than read cache hits. A write always hits the cache; if modified data already resides in the cache, it is overwritten, which might save a single destage operation. Read cache hits, in contrast, provide a much more noticeable benefit, saving seek and latency time at the disk layer.
In all benchmarking tests performed, even with single active MDGs, good path SVC I/O group throughput remains the same as it was before the introduction of SVC cache partitioning. For in-depth information about SVC cache partitioning, we recommend the following IBM Redpaper publication: IBM SAN Volume Controller 4.2.1 Cache Partitioning, REDP-4426-00
A significant consideration when comparing native performance characteristics between storage subsystem types is the amount of scaling that is required to meet the performance objectives. While lower performing subsystems can typically be scaled to meet performance objectives, the additional hardware that is required lowers the availability characteristics of the SVC cluster. Remember that all storage subsystems possess an inherent failure rate, and therefore, the failure rate of an MDG becomes the failure rate of the storage subsystem times the number of units. Of course, there might be other factors that lead you to select one storage subsystem over another storage subsystem, such as utilizing available resources or a requirement for additional features and functions, such as the System z attach capability.
Chapter 7. VDisks
In this chapter, we show the new features of SVC Version 4.3.0 and discuss Virtual Disks (VDisks). We describe creating them, managing them, and migrating them across I/O Groups. We then discuss VDisk performance and how you can use TotalStorage Productivity Center (TPC) to analyze performance and to help guide you to possible solutions.
You need to use the striping policy in order to spread SE VDisks across many MDisks.

Important: Do not use SE VDisks where high I/O performance is required.

SE VDisks only save capacity if the host server does not write to the whole VDisk. Whether a Space-Efficient VDisk works well depends partly on how the filesystem allocates space: certain filesystems (for example, NTFS (NT File System)) write to the whole VDisk before overwriting deleted files, while other filesystems reuse space in preference to allocating new space. Filesystem behavior can be moderated by tools, such as defrag, or by managing storage using host Logical Volume Managers (LVMs). The SE VDisk is also dependent on how applications use the filesystem; for example, certain applications only delete log files when the filesystem is nearly full.

Note: There is no single recommendation for SEVs and best performance or practice; as already explained, it depends on how the particular environment uses them. For the absolute best performance, use fully allocated VDisks instead of SE VDisks.
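For reference, an SE VDisk is created with the mkvdisk space-efficiency options. The following sketch assumes a hypothetical MDG named MDG_DS8K, I/O Group io_grp0, a 100 GB virtual capacity, and a 20% initially allocated real capacity; all names and values are illustrative only, and the appropriate grain size and warning threshold depend on your environment:

IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp MDG_DS8K -iogrp io_grp0 -vtype striped -size 100 -unit gb -rsize 20% -autoexpand -grainsize 32 -warning 80% -name sev_example

The -autoexpand option allows the real capacity to grow as data is written, and -warning generates an event when the specified percentage of real capacity is used.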
To change the VDisk name, use the svctask chvdisk command (refer to Example 7-1). This command changes the name of the VDisk Test_0 to Test_1.
Example 7-1 The svctask chvdisk command
IBM_2145:itsosvccl1:admin>svctask chvdisk -name Test_1 Test_0

Balance the VDisks across the I/O Groups in the cluster to balance the load across the cluster. At the time of VDisk creation, the workload to be put on the VDisk might not be known. In this case, if you are using the GUI, accept the system default of load balancing allocation. Using the command line interface (CLI), you must manually specify the I/O Group. In configurations with large numbers of attached hosts, where it is not possible to zone a host to multiple I/O Groups, it might not be possible to choose the I/O Group to which to attach the VDisks; the VDisk has to be created in the I/O Group to which its host belongs. For moving a VDisk across I/O Groups, refer to 7.2.3, "Moving a VDisk to another I/O Group" on page 125.

Note: Migrating VDisks across I/O Groups is a disruptive action. Therefore, it is best to specify the correct I/O Group at the time of VDisk creation.

By default, the preferred node, which owns a VDisk within an I/O Group, is selected on a load-balancing basis. At the time of VDisk creation, the workload to be put on the VDisk might not be known, but it is important to distribute the workload evenly across the SVC nodes within an I/O Group. The preferred node cannot easily be changed; if you need to change it, refer to 7.2.2, "Changing the preferred node within an I/O Group" on page 124.

The maximum number of VDisks per I/O Group is 2 048. The maximum number of VDisks per cluster is 8 192 (eight-node cluster).

The smaller the extent size that you select, the finer the granularity of the space that the VDisk occupies on the underlying storage controller. A VDisk occupies an integer number of extents, but its length does not need to be an integer multiple of the extent size; it does need to be an integer multiple of the block size. Any space left over between the last logical block in the VDisk and the end of the last extent in the VDisk is unused. A small extent size minimizes this unused space. The counter view is that the smaller the extent size, the smaller the total storage capacity that the SVC can virtualize. The extent size does not affect performance. For most clients, extent sizes of 128 MB or 256 MB give a reasonable balance between VDisk granularity and cluster capacity. There is no longer a default value; the extent size is set during Managed Disk (MDisk) Group creation.

Important: VDisks can only be migrated between Managed Disk Groups (MDGs) that have the same extent size, except for mirrored VDisks; the two copies can be in different MDisk Groups with different extent sizes.

As mentioned in the first section of this chapter, a VDisk can be created as Space-Efficient or fully allocated, in one of three modes (striped, sequential, or image) and with one or two copies (VDisk mirroring). With extremely few exceptions, you must always configure VDisks using striped mode.
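As noted above, both the I/O Group and the preferred node can be specified when the VDisk is created. The following sketch is illustrative only; the MDG name, I/O Group, node, size, and VDisk name are hypothetical:

IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp MDG_DS8K -iogrp io_grp0 -node 1 -vtype striped -size 50 -unit gb -name app_vdisk01

If the -node parameter is omitted, the SVC assigns the preferred node on a load-balancing basis.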
Note: Electing to use sequential mode over striping requires a detailed understanding of the data layout and workload characteristics in order to avoid negatively impacting the system performance.
FlashCopy the VDisk to a target VDisk in the same I/O Group with the preferred node that you want, using the auto-delete option. The steps to follow are:
a. Cease I/O to the VDisk.
b. Start FlashCopy.
c. When the FlashCopy completes, unmap the source VDisk from the host.
d. Map the target VDisk to the host.
e. Resume I/O operations.
f. Delete the source VDisk.
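A possible CLI sequence for this method is sketched below. It assumes a hypothetical source VDisk named app_vdisk01 of 50 GB in io_grp0 and creates the target with the desired preferred node; all names, sizes, and the copy rate are examples only:

IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp MDG_DS8K -iogrp io_grp0 -node 2 -vtype striped -size 50 -unit gb -name app_vdisk01_new
IBM_2145:itsosvccl1:admin>svctask mkfcmap -source app_vdisk01 -target app_vdisk01_new -autodelete -copyrate 50 -name move_prefnode
IBM_2145:itsosvccl1:admin>svctask startfcmap -prep move_prefnode

The -autodelete option removes the FlashCopy mapping automatically when the background copy completes; the host is then remapped to the new VDisk, and the original VDisk can be deleted.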
There is a fourth, non-SVC method of changing the preferred node within an I/O Group if the host operating system or logical volume manager supports disk mirroring. The steps are:
1. Create a VDisk, the same size as the existing one, on the desired preferred node.
2. Mirror the data to this VDisk using host-based logical volume mirroring.
3. Remove the original VDisk from the Logical Volume Manager (LVM).
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
...

Look for the FC_id and RC_id fields. If these fields are not blank, the VDisk is part of a mapping or a relationship. The procedure is:
1. Cease I/O operations to the VDisk.
2. Disconnect the VDisk from the host operating system. For example, in Windows, remove the drive letter.
3. Stop any copy operations.
4. Issue the command to move the VDisk (refer to Example 7-3). This command does not work while there is data in the SVC cache that is waiting to be written to the VDisk. After two minutes, the data automatically destages if no other condition forces an earlier destaging.
5. On the host, rediscover the VDisk. For example, in Windows, run a rescan, and then either mount the VDisk or add a drive letter. Refer to Chapter 9, "Hosts" on page 175.
6. Resume copy operations as required.
7. Resume I/O operations on the host.

After any copy relationships are stopped, you can move the VDisk across I/O Groups with a single command in an SVC:

svctask chvdisk -iogrp newiogrpname/id vdiskname/id

In this command, newiogrpname/id is the name or ID of the I/O Group to which you move the VDisk, and vdiskname/id is the name or ID of the VDisk. Example 7-3 shows the command to move the VDisk named Image_mode0 from its existing I/O Group, io_grp1, to PerfBestPrac.
Example 7-3 Command to move a VDisk to another I/O Group
IBM_2145:itsosvccl1:admin>svctask chvdisk -iogrp PerfBestPrac Image_mode0

Migrating VDisks between I/O Groups can cause a problem if the old definitions of the VDisks are not removed from the configuration prior to importing the VDisks to the host. Migrating VDisks between I/O Groups is not a dynamic configuration change; it must be done with the hosts shut down. Then, follow the procedure listed in Chapter 9, "Hosts" on page 175 for the reconfiguration of SVC VDisks to hosts. We recommend that you remove the stale configuration and reboot the host to reconfigure the VDisks that are mapped to a host. For details about how to dynamically reconfigure the IBM Subsystem Device Driver (SDD) for the specific host operating system, refer to Multipath Subsystem Device Driver: User's Guide, SC30-4131-01, where this procedure is described in depth.

Note: Do not move a VDisk to an offline I/O Group under any circumstances. You must ensure that the I/O Group is online before moving the VDisks to avoid any data loss.
This command will not work if there is any data in the SVC cache, which has to be flushed out first. There is a -force flag; however, this flag discards the data in the cache rather than flushing it to the VDisk. If the command fails due to outstanding I/Os, it is better to wait a couple of minutes after which the SVC will automatically flush the data to the VDisk. Note: Using the -force flag can result in data integrity issues.
IBM_2145:itsosvccl1:svctask migratevdisk -mdiskgrp itso_ds45_64gb -threads 4 -vdisk image_mode0

This command migrates our VDisk, image_mode0, to the MDG, itso_ds45_64gb, and uses four threads while migrating. Note that instead of using the VDisk name, you can use its ID number.
change the preferred node that is used by a VDisk. Refer to 7.2.2, Changing the preferred node within an I/O Group on page 124. The procedure of migrating a VDisk to an image type VDisk is non-disruptive to host I/O. In order to migrate a striped type VDisk to an image type VDisk, you must be able to migrate to an available unmanaged MDisk. The destination MDisk must be greater than or equal to the size of the VDisk. Regardless of the mode in which the VDisk starts, it is reported as managed mode during the migration. Both of the MDisks involved are reported as being in image mode during the migration. If the migration is interrupted by a cluster recovery, the migration will resume after the recovery completes. You must perform these command line steps: 1. To determine the name of the VDisk to be moved, issue the command: svcinfo lsvdisk The output is in the form that is shown in Example 7-5.
Example 7-5 The svcinfo lsvdisk output
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -delim : id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:t ype:FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count 0:diomede0:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018 381BF2800000000000024:0:1 1:diomede1:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018 381BF2800000000000025:0:1 2:diomede2:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018 381BF2800000000000026:0:1 3:vdisk3:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::600507680183 81BF2800000000000009:0:1 4:diomede3:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::60050768018 381BF2800000000000027:0:1 5:vdisk5:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::600507680183 81BF280000000000000B:0:1 6:vdisk6:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::600507680183 81BF280000000000000C:0:1 7:siam1:0:PerfBestPrac:online:4:itso_ds47_siam:70.0GB:striped:::::6005076801838 1BF2800000000000016:0:1 8:vdisk8:0:PerfBestPrac:online:many:many:800.0MB:many:::::60050768018381BF28000 00000000013:0:2 9:vdisk9:0:PerfBestPrac:online:2:itso_smallgrp:1.5GB:striped:::::60050768018381 BF2800000000000014:0:1 10:Diomede_striped:0:PerfBestPrac:online:0:itso_ds45_64gb:64.0GB:striped:::::60 050768018381BF2800000000000028:0:1 11:Image_mode0:0:PerfBestPrac:online:0:itso_ds45_64gb:18.0GB:image:::::60050768 018381BF280000000000002A:0:1 12:Test_1:0:PerfBestPrac:online:0:itso_ds45_64gb:8.0GB:striped:::::600507680183 81BF280000000000002B:0:1
2. In order to migrate the VDisk, you need the name of the MDisk to which you will migrate it. Example 7-6 shows the command that you use.
Example 7-6 The svcinfo lsmdisk command output
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim : id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_ name:UID 0:mdisk0:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000000:itso_ds4500:60 0a0b80001744310000011a4888478c000000000000000000000000 1:mdisk1:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000001:itso_ds4500:60 0a0b80001744310000011948884778000000000000000000000000 2:mdisk2:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000002:itso_ds4500:60 0a0b80001744310000011848884758000000000000000000000000 3:mdisk3:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000003:itso_ds4500:60 0a0b8000174431000001174888473e000000000000000000000000 4:mdisk4:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000004:itso_ds4500:60 0a0b80001744310000011648884726000000000000000000000000 5:mdisk5:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000005:itso_ds4500:60 0a0b8000174431000001154888470c000000000000000000000000 6:mdisk6:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000006:itso_ds4500:60 0a0b800017443100000114488846ec000000000000000000000000 7:mdisk7:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000007:itso_ds4500:60 0a0b800017443100000113488846c0000000000000000000000000 8:mdisk8:online:unmanaged:::64.0GB:0000000000000018:itso_ds4500:600a0b800017443 10000013a48a32b5400000000000000000000000000000000 9:mdisk9:online:unmanaged:::18.0GB:0000000000000008:itso_ds4500:600a0b800017443 10000011b4888aeca00000000000000000000000000000000 ... From this command, we can see that mdisk8 and mdisk9 are candidates for the image type migration, because they are unmanaged. 3. We now have enough information to enter the command to migrate the VDisk to image type, and you can see the command in Example 7-7.
Example 7-7 The migratetoimage command
IBM_2145:itsosvccl1:admin>svctask migratetoimage -vdisk Test_1 -threads 4 -mdisk mdisk8 -mdiskgrp itso_ds45_64gb

4. If there is no unmanaged MDisk to which to migrate, you can remove an MDisk from an MDisk Group. However, you can only remove an MDisk from an MDisk Group if there are enough free extents on the remaining MDisks in the group to migrate any used extents on the MDisk that you are removing.
the time of VDisk creation either manually by the user or automatically by the SVC. Because read miss performance is better when the host issues a read request to the owning node, you want the host to know which node owns a track. The SCSI command set provides a mechanism for determining a preferred path to a specific VDisk. Because a track is just part of a VDisk, the cache component distributes ownership by VDisk. The preferred paths are then all the paths through the owning node. Therefore, a preferred path is any port on a preferred controller, assuming that the SAN zoning is correct. Note: The performance can be better if the access is made on the preferred node. The data can still be accessed by the partner node in the I/O Group in the event of a failure.
By default, the SVC assigns ownership of even-numbered VDisks to one node of a caching pair and the ownership of odd-numbered VDisks to the other node. It is possible for the ownership distribution in a caching pair to become unbalanced if VDisk sizes are significantly different between the nodes or if the VDisk numbers assigned to the caching pair are predominantly even or odd. To provide flexibility in making plans to avoid this problem, the ownership for a specific VDisk can be explicitly assigned to a specific node when the VDisk is created. A node that is explicitly assigned as an owner of a VDisk is known as the preferred node. Because it is expected that hosts will access VDisks through the preferred nodes, those nodes can become overloaded. When a node becomes overloaded, VDisks can be moved to other I/O Groups, because the ownership of a VDisk cannot be changed after the VDisk is created. We described this situation in 7.2.3, Moving a VDisk to another I/O Group on page 125. SDD is aware of the preferred paths that SVC sets per VDisk. SDD uses a load balancing and optimizing algorithm when failing over paths; that is, it tries the next known preferred path. If this effort fails and all preferred paths have been tried, it load balances on the non-preferred paths until it finds an available path. If all paths are unavailable, the VDisk goes offline. It can take time, therefore, to perform path failover when multiple paths go offline. SDD also performs load balancing across the preferred paths where appropriate.
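On hosts that use SDD, you can verify how I/O is being distributed across the preferred and non-preferred paths with the SDD datapath commands, for example:

datapath query adapter
datapath query device

The exact output format varies by SDD level and operating system; the per-path select counts indicate which paths (normally the preferred paths through the owning node) are actually carrying the I/O.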
Before running the chvdisk command, run the svcinfo lsvdisk command against the VDisk that you want to throttle in order to check its parameters as shown in Example 7-8.
Example 7-8 The svcinfo lsvdisk command output
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
...

The throttle setting of zero indicates that no throttling has been set. Having checked the VDisk, you can then run the svctask chvdisk command. The complete syntax of the command is:

svctask chvdisk [-iogrp iogrp_name|iogrp_id] [-rate throttle_rate [-unitmb]] [-name new_name_arg] [-force] vdisk_name|vdisk_id

To just modify the throttle setting, we run:

svctask chvdisk -rate 40 -unitmb Image_mode0

Running the lsvdisk command now gives us the output that is shown in Example 7-9.
Example 7-9 Output of lsvdisk command
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
virtual_disk_throttling (MB) 40
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1
copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
type image
mdisk_id 10
mdisk_name mdisk10
fast_write_state empty
used_capacity 18.00GB
real_capacity 18.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize

This example shows that the throttle setting (virtual_disk_throttling) is 40 MBps on this VDisk. If we had set the throttle to an I/O rate by using the I/O parameter, which is the default setting, we do not use the -unitmb flag:

svctask chvdisk -rate 2048 Image_mode0

You can see in Example 7-10 that the throttle setting has no unit parameter, which means that it is an I/O rate setting.
Example 7-10 The svctask chvdisk command and svcinfo lsvdisk output
IBM_2145:itsosvccl1:admin>svctask chvdisk -rate 2048 Image_mode0
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode0
id 11
name Image_mode0
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 18.0GB
type image
formatted no
mdisk_id 10
mdisk_name mdisk10
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002A
throttling 2048
preferred_node_id 1
fast_write_state empty
cache readwrite
udid 0
fc_map_count 0
sync_rate 50
copy_count 1

Note: An I/O governing rate of 0 (displayed as virtual_disk_throttling in the CLI output of the svcinfo lsvdisk command) does not mean that zero IOPS (or MBs per second) can be achieved. It means that no throttle is set.
7.4.2 Using underlying controller PiT copy with SVC cache-disabled VDisks
Where point-in-time (PiT) copy is used in the underlying storage controller, the controller LUNs for both the source and the target must be mapped through the SVC as image mode disks with the SVC cache disabled as shown in Figure 7-2 on page 135. Note that, of course, it is possible to access either the source or the target of the FlashCopy from a host directly rather than through the SVC.
IBM_2145:itsosvccl1:admin>svctask migratetoimage -vdisk Test_1 -threads 4 -mdisk mdisk8 -mdiskgrp itso_ds45_64gb

2. Stop I/O to the VDisk.
3. Unmap the VDisk from the host.
4. Run the svcinfo lsmdisk command to check your unmanaged MDisks.
5. Remove the VDisk, which makes the MDisk on which it is created become unmanaged. Refer to Example 7-12.
Example 7-12 Removing the VDisk Test_1
IBM_2145:itsosvccl1:admin>svctask rmvdisk Test_1

6. Make an image mode VDisk on the unmanaged MDisk that was just released from the SVC. Check the MDisks by running the svcinfo lsmdisk command first. Refer to Example 7-13 on page 136.
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk -delim : id:name:status:mode:mdisk_grp_id:mdisk_grp_name:capacity:ctrl_LUN_#:controller_ name:UID 0:mdisk0:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000000:itso_ds4500:60 0a0b80001744310000011a4888478c000000000000000000000000 1:mdisk1:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000001:itso_ds4500:60 0a0b80001744310000011948884778000000000000000000000000 2:mdisk2:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000002:itso_ds4500:60 0a0b80001744310000011848884758000000000000000000000000 3:mdisk3:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000003:itso_ds4500:60 0a0b8000174431000001174888473e000000000000000000000000 4:mdisk4:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000004:itso_ds4500:60 0a0b80001744310000011648884726000000000000000000000000 5:mdisk5:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000005:itso_ds4500:60 0a0b8000174431000001154888470c000000000000000000000000 6:mdisk6:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000006:itso_ds4500:60 0a0b800017443100000114488846ec000000000000000000000000 7:mdisk7:online:managed:1:itso_ds45_18gb:18.0GB:0000000000000007:itso_ds4500:60 0a0b800017443100000113488846c0000000000000000000000000 8:mdisk8:online:unmanaged:::64.0GB:0000000000000018:itso_ds4500:600a0b800017443 10000013a48a32b5400000000000000000000000000000000 9:mdisk9:online:unmanaged:::18.0GB:0000000000000008:itso_ds4500:600a0b800017443 10000011b4888aeca00000000000000000000000000000000 ... IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp itso_ds45_64gb -size 5 -unit gb -iogrp PerfBestPrac -name Image_mode1 -cache none Virtual Disk, id [13], successfully created IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode1 id 13 name Image_mode1 IO_group_id 0 IO_group_name PerfBestPrac status online mdisk_grp_id 0 mdisk_grp_name itso_ds45_64gb capacity 5.0GB type striped formatted no mdisk_id mdisk_name FC_id FC_name RC_id RC_name vdisk_UID 60050768018381BF280000000000002D throttling 0 preferred_node_id 1 fast_write_state empty cache none udid fc_map_count 0 sync_rate 50 copy_count 1 136
copy_id 0
status online
sync yes
primary yes
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
type striped
mdisk_id
mdisk_name
fast_write_state empty
used_capacity 5.00GB
real_capacity 5.00GB
free_capacity 0.00MB
overallocation 100
autoexpand
warning
grainsize

7. If you want to create the VDisk with read/write cache, omit the -cache parameter, because cache-enabled is the default setting. Refer to Example 7-14.
Example 7-14 Removing VDisk and recreating with cache enabled
IBM_2145:itsosvccl1:admin>svctask rmvdisk Image_mode1
IBM_2145:itsosvccl1:admin>svctask mkvdisk -mdiskgrp itso_ds45_64gb -size 5 -unit gb -iogrp PerfBestPrac -name Image_mode1
Virtual Disk, id [13], successfully created
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk Image_mode1
id 13
name Image_mode1
IO_group_id 0
IO_group_name PerfBestPrac
status online
mdisk_grp_id 0
mdisk_grp_name itso_ds45_64gb
capacity 5.0GB
type striped
formatted no
mdisk_id
mdisk_name
FC_id
FC_name
RC_id
RC_name
vdisk_UID 60050768018381BF280000000000002D
throttling 0
preferred_node_id 1
fast_write_state empty
cache readwrite
...

8. You can then map the VDisk to the host and continue I/O operations after rescanning the host. Refer to Example 7-15 on page 138.
IBM_2145:itsosvccl1:admin>svctask mkvdiskhostmap -host Diomede_Win2k8 Image_mode1
Virtual Disk to Host map, id [5], successfully created

Note: Before removing the VDisk host mapping, it is essential that you follow the procedures in Chapter 9, "Hosts" on page 175 so that you can remount the disk with its access to data preserved.
The MDisk I/O limit depends on many factors. The primary factor is the number of disks in the RAID array on which the MDisk is built and the speed or revolutions per minute (RPM) of the disks. But when the number of IOPS to an MDisk is near or above 1 000, the MDisk is considered extremely busy. For 15 K RPM disks, the limit is a bit higher. But these high I/O rates to the back-end storage systems are not consistent with good performance; they imply that the back-end RAID arrays are operating at extremely high utilizations, which is indicative of considerable queuing delays. Good planning demands a solution that reduces the load on such busy RAID arrays. For more precision, we will consider the upper limit of performance for 10 K and 15 K RPM, enterprise class devices. Be aware that different people have different opinions about these limits, but all the numbers in Table 7-1 represent extremely busy disk drive modules (DDMs).
Table 7-1 DDM speeds

DDM speed   Maximum operations/second   6+P operations/second   7+P operations/second
10 K        150 - 175                   900 - 1050              1050 - 1225
15 K        200 - 225                   1200 - 1350             1400 - 1575
While disks might achieve these throughputs, these ranges imply a lot of queuing delay and high response times. These ranges probably represent acceptable performance only for batch-oriented applications, where throughput is the paramount performance metric. For online transaction processing (OLTP) applications, these throughputs might already come with unacceptably high response times. Because 15 K RPM DDMs are most commonly used in OLTP environments (where response time is at a premium), a simple rule is that if the MDisk does more than 1 000 operations per second, it is extremely busy, no matter what the drive's RPM is.
In the absence of additional information, we often assume, and our performance models assume, that 10 milliseconds (msec) response time is pretty high. But for a particular application, 10 msec might be too low or too high. Many OLTP environments require response times closer to 5 msec, while batch applications with large sequential transfers might run fine with 20 msec response time. The appropriate value can also change between shifts or on the weekend. A response time of 5 msec might be required from 8 a.m. until 5 p.m., while 50 msec is perfectly acceptable near midnight. It is all client and application dependent. What really matters is the average front-end response time, which is what counts for the users. You can measure the average front-end response time by using TPC for Disk with its performance reporting capabilities. Refer to Chapter 11, Monitoring on page 221 for more information. Figure 7-3 on page 140 shows the overall response time of a VDisk that is under test. Here, we have plotted the overall response time. Additionally, TPC allows us to plot read and write response times as distinct entities if one of these response times was causing problems to the user. This response time in the 1 - 2 msec range gives an acceptable level of performance for OLTP applications.
If we look at the I/O rate on this VDisk, we see the chart in Figure 7-4 on page 141, which shows us that the I/O rate to this VDisk was in the region of 2 000 IOPS, a rate that normally results in an unacceptably high response time for a LUN that is based on a single RAID array. However, in this case, the VDisk was striped across two MDisks, which gives us an I/O rate per MDisk in the order of 1 200 IOPS. This I/O rate is high and normally gives a high user response time; however, here, the SVC front-end cache mitigates the high latency at the back end, giving the user a good response time. Although there is no immediate issue with this VDisk, if the workload characteristics change and the VDisk becomes less cache friendly, you need to consider adding another MDisk to the MDG, making sure that it comes from another RAID array, and striping the VDisk across all three MDisks.
OLTP workloads
Probably the most important parameter as far as VDisks are concerned is the I/O response time for OLTP workloads. After you have established what VDisk response time provides good user performance, you can set TPC alerting to notify you if this number is exceeded by about 25%. Then, you check the I/O rate of the MDisks on which this VDisk is built. If there are multiple MDisks per RAID array, you need to check the RAID array performance. You can perform all of these tasks using TPC. The magic number here is 1 000 IOPS, assuming that the RAID array is 6+P. Refer to Table 7-1 on page 139. If one of the back-end storage arrays is running at more than 1 000 IOPS and the user is experiencing poor performance because of degraded response time, this array is probably the root cause of the problem.
If users complain of response time problems, yet the VDisk response as measured by TPC has not changed significantly, this situation indicates that the problem is in the SAN network between the host and the SVC. You can diagnose where the problem is with TPC. The best way to determine the location of the problem is to use the Topology Viewer to look at the host using Datapath Explorer (DPE). This view enables you to see the paths from the host to the SVC, which we show in Figure 7-5.
Figure 7-5 shows the paths from the disk as seen by the server through its host bus adapters (HBAs) to the SVC VDisk. By hovering the cursor over the switch port, you can see the throughput of that port. You can also use TPC to produce reports showing the overall throughput of the ports, which we show in Figure 7-6 on page 143.
TPC can present the throughput of the ports graphically over time as shown in Figure 7-7 on page 144.
From this type of graph, you can identify performance bottlenecks in the SAN fabric and make the appropriate changes.
Batch workloads
With batch workloads in general, the most important parameter is the throughput rate as measured in megabytes per second. The goal rate is harder to quantify than the OLTP response time figure, because throughput is heavily dependent on the block size. Additionally, high response times can be acceptable for these workloads. So, it is not possible to give a single metric to quantify performance; it really is a question of "it depends." The larger the block size, the greater the potential throughput to the SVC. Block size is often determined by the application. With TPC, you can measure the throughput of a VDisk and the MDisks on which it is built. The important measure for the user is the time that the batch job takes to complete. If this time is too long, the following steps are a good starting point. Determine the data rate that is needed for timely completion and compare it with the storage system's capability as documented in performance white papers and Disk Magic. If the storage system is capable of greater performance:
1. Make sure that the application transfer size is as large as possible.
2. Consider increasing the number of concurrent application streams, threads, files, and partitions.
3. Make sure that the host is capable of supporting the required data rate. For example, use tests, such as dd, and use TPC to monitor the results.
4. Check whether the flow of data through the SAN is balanced by using the switch performance monitors within TPC (extremely useful).
5. Check whether all switch and host ports are operating at the maximum permitted data rate of 2 Gbps or 4 Gbps.
6. Watch out for cases where the whole batch window stops on a single file or database getting read or written, which can be a practical exposure for obvious reasons. Unfortunately, sometimes there is nothing that can be done. However, it is worthwhile evaluating this situation to see whether, for example, the database can be divided into partitions, or the large file can be replaced by multiple smaller files. Or, the use of the SVC in combination with SDD might help with a combination of striping and added paths to multiple VDisks. These efforts can allow parallel batch streams to the VDisks and, thus, speed up batch runs.

The chart shown in Figure 7-8 gives an indication of what can be achieved by tuning the VDisk and the application. From point A to point B shows the normal steady-state running of the application on the VDisk built on a single MDisk. We then migrated the VDisk so that it spanned two MDisks. From point B to point C shows the drop in performance during the migration. When the migration was complete, the line from point D to point E shows that the performance had almost doubled. The application was one with 75% reads and 75% sequential access. The application was then modified so that it was 100% sequential. The resulting gain in performance is shown between point E and point F.
Figure 7-9 shows the performance enhancements that can be achieved by modifying the number of parallel streams flowing to the VDisk. The line from point A to point B shows the performance with a single stream application. We then doubled the size of the workload, but we kept it in single stream. As you can see from the line between point C and point D, there is no improvement in performance. We were then able to split the workload into two parallel streams at point E. As you can see from the graph, from point E to point F shows that the throughput to the VDisk has increased by over 60%.
Figure 7-9 Effect of splitting a large job into two parallel streams
Mixed workloads
As discussed in 7.2.1, "Selecting the MDisk Group" on page 124, we usually recommend mixing workloads so that the maximum resources are available to any workload when needed. When there is a heavy batch workload and there is no VDisk throttling, we recommend that the VDisks are placed in separate MDGs. This effect is illustrated by the chart in Figure 7-10 on page 147. VDisk 21 is running an OLTP workload, and VDisk 20 is running a batch job. Both VDisks were initially in the same MDG, sharing the same MDisks, which were spread over three RAID arrays. As you can see between point A and point B, the response time for the OLTP workload is extremely high, averaging 10 milliseconds. At point B, we migrated VDisk 20 to another MDG, using MDisks built on different RAID arrays. As you can see, after the migration completed, the response time (from point D to point E) dropped for both the batch job and, more importantly, the OLTP workload.
Then, as you define the VDisks and the FlashCopy mappings, calculate the maximum average I/O rate that the SVC will receive per VDisk before you start to overload your storage controller. This example assumes:
- An MDisk is defined from an entire array (that is, the array only provides one LUN, and that LUN is given to the SVC as an MDisk).
- Each MDisk that is assigned to an MDG is the same size and same RAID type and comes from a storage controller of the same type.
- MDisks from a storage controller are contained entirely in the same MDG.

The raw I/O capability of the MDG is the sum of the capabilities of its MDisks. For example, for five RAID 5 MDisks with eight component disks on a typical back-end device, the I/O capability is:

5 x (150 x 7) = 5250

This raw number might be constrained by the I/O processing capability of the back-end storage controller itself. FlashCopy copying contributes to the I/O load of a storage controller, and thus, it must be taken into consideration. The effect of a FlashCopy is effectively to add a number of loaded VDisks to the group, and thus, a weighting factor can be calculated to make allowance for this load. The effect of FlashCopy copies depends on the type of I/O taking place. For example, in a group with two FlashCopy copies and random reads and writes to those VDisks, the weighting factor is 14 x 2 = 28. The weighting factors for FlashCopy copies are given in Table 7-2.
Table 7-2 FlashCopy weighting

Type of I/O to the VDisk      Impact on I/O      Weight factor for FlashCopy
None/very little              Insignificant      0
Reads only                    Insignificant      0
Sequential reads and writes   Up to 2x I/Os      2 x F
Random reads and writes       Up to 15x I/O      14 x F
Random writes                 Up to 50x I/O      49 x F
Thus, to calculate the average I/O rate per VDisk before overloading the MDG, use this formula:

I/O rate = (I/O capability) / (number of VDisks + weighting factor)

So, using the example MDG as defined previously, if we added 20 VDisks to the MDG, that MDG was able to sustain 5 250 IOPS, and there were two FlashCopy mappings that also have random reads and writes, the maximum I/O rate per VDisk is:

5250 / (20 + 28) = 110

Note that this is an average I/O rate, so if half of the VDisks sustain 200 I/Os and the other half of the VDisks sustain 10 I/Os, the average is still 110 IOPS.
Conclusion
As you can see from the previous examples, TPC is an extremely useful and powerful tool for analyzing and solving performance problems. If you want a single parameter to monitor to gain an overview of your systems performance, it is the read and write response times for both VDisks and MDisks. This parameter shows everything that you need in one view. It is the key day-to-day performance validation metric. It is relatively easy to notice that a system that usually had 2 ms writes and 6 ms reads suddenly has 10 ms writes and 12 ms reads and is getting overloaded. A general monthly check of CPU usage will show you how the system is growing over time and highlight when it is time to add a new I/O Group (or cluster). In addition, there are useful rules for OLTP-type workloads, such as the maximum I/O rates for back-end storage arrays, but for batch workloads, it really is a case of it depends.
Chapter 8. Copy services
In this chapter, we discuss the best practices for using the Advanced Copy Services functions, such as FlashCopy services and Metro Mirror and Global Mirror. We also describe guidelines to obtain the best performance.
it is pointless if the operating system, or more importantly, the application, cannot use the copied disk.
Data stored to a disk from an application normally goes through these steps: 1. The application records the data using its defined application programming interface. Certain applications might first store their data in application memory before sending it to disk at a later time. Normally, subsequent reads of the block just being written will get the block in memory if it is still there. 2. The application sends the data to a file. The file system accepting the data might buffer it in memory for a period of time. 3. The file system will send the I/O to a disk controller after a defined period of time (or even based on an event). 4. The disk controller might cache its write in memory before sending the data to the physical drive. If the SVC is the disk controller, it will store the write in its internal cache before sending the I/O to the real disk controller. 5. The data is stored on the drive. At any point in time, there might be any number of unwritten blocks of data in any of these steps, waiting to go to the next step. It is also important to realize that sometimes the order of the data blocks created in step 1 might not be the same order that is used when sending the blocks to steps 2, 3, or 4. So it is possible, that at any point in time, data arriving in step 4 might be missing a vital component that has not yet been sent from step 1, 2, or 3. FlashCopy copies are normally created with data that is visible from step 4. So, to maintain application integrity, when a FlashCopy is created, any I/O that is generated in step 1 must make it to step 4 when the FlashCopy is started. In other words, there must not be any outstanding write I/Os in steps 1, 2, or 3. If there were outstanding write I/Os, the copy of the disk that is created at step 4 is likely to be missing those transactions, and if the FlashCopy is to be used, these missing I/Os can make it unusable.
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -delim : id:name:IO_group_id:IO_group_name:status:mdisk_grp_id:mdisk_grp_name:capacity:type :FC_id:FC_name:RC_id:RC_name:vdisk_UID:fc_map_count:copy_count 0:diomede0:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183 81BF2800000000000024:0:1 1:diomede1:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183 81BF2800000000000025:0:1 2:diomede2:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183 81BF2800000000000026:0:1 3:vdisk3:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::6005076801838 1BF2800000000000009:0:1 4:diomede3:0:PerfBestPrac:online:1:itso_ds45_18gb:8.0GB:striped:::::600507680183 81BF2800000000000027:0:1 5:vdisk5:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::6005076801838 1BF280000000000000B:0:1 6:vdisk6:0:PerfBestPrac:online:2:itso_smallgrp:500.0MB:striped:::::6005076801838 1BF280000000000000C:0:1 7:siam1:0:PerfBestPrac:online:4:itso_ds47_siam:70.0GB:striped:::::60050768018381 BF2800000000000016:0:1 8:vdisk8:0:PerfBestPrac:online:many:many:800.0MB:many:::::60050768018381BF280000 0000000013:0:2 9:vdisk9:0:PerfBestPrac:online:2:itso_smallgrp:1.5GB:striped:::::60050768018381B F2800000000000014:0:1 10:Diomede_striped:0:PerfBestPrac:online:0:itso_ds45_64gb:64.0GB:striped:::::600 50768018381BF2800000000000028:0:1 11:Image_mode0:0:PerfBestPrac:online:0:itso_ds45_64gb:18.0GB:image:::::600507680 18381BF280000000000002A:0:1
Figure 8-1 Using the SVC GUI to see the type of VDisks
The VDisk 11, which is used in our example, is an image-mode VDisk. In this example, you need to know its exact size in bytes. In Example 8-2, we use the -bytes parameter of the svcinfo lsvdisk command to find its exact size. Thus, the target VDisk must be created with a size of 19 327 352 832 bytes, not 18 GB. Figure 8-2 on page 155 shows the exact size of an image mode VDisk using the SVC GUI.
Example 8-2 Find the exact size of an image mode VDisk using the command line interface
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -bytes 11 id 11 name Image_mode0 IO_group_id 0 IO_group_name PerfBestPrac status online mdisk_grp_id 0 mdisk_grp_name itso_ds45_64gb capacity 19327352832 type image formatted no mdisk_id 10 mdisk_name mdisk10 FC_id FC_name RC_id RC_name vdisk_UID 60050768018381BF280000000000002A throttling 0 preferred_node_id 1 fast_write_state empty cache readwrite ...
Figure 8-2 Find the exact size of an image mode VDisk using the SVC GUI
3. Create a target VDisk of the required size, as identified from the source VDisk (refer to Figure 8-3 on page 163). The target VDisk can be an image, sequential, or striped mode VDisk; the only requirement is that it must be exactly the same size as the source VDisk. The target VDisk can be cache-enabled or cache-disabled.
4. Define a FlashCopy mapping, making sure that you have the source and target disks defined in the correct order. (If you use your newly created VDisk as the source and the existing host's VDisk as the target, you will destroy the data on the host's VDisk if you start the FlashCopy.)
5. As part of the define step, you can specify the copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the data from the source VDisk to the target VDisk. If you set the copy rate to 0 (NOCOPY), the SVC only copies blocks that change on the source VDisk (or on the target VDisk, if the target VDisk is mounted read/write to a host) after the mapping is started.
6. The prepare process for the FlashCopy mapping can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the source VDisks to the storage controller's disks. After the preparation completes, the mapping has a Prepared status, and the target VDisk behaves as though it were a cache-disabled VDisk until the FlashCopy mapping is either started or deleted.

Note: If you create a FlashCopy mapping where the source VDisk is a target VDisk of an active Metro Mirror relationship, you add additional latency to that existing Metro Mirror relationship (and possibly affect the host that is using the source VDisk of that Metro Mirror relationship as a result). The reason for the additional latency is that the FlashCopy prepare disables the cache on the source VDisk (which is the target VDisk of the Metro Mirror relationship), and thus, all write I/Os from the Metro Mirror relationship need to commit to the storage controller before completion is returned to the host.
7. After the FlashCopy mapping is prepared, you can quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process differs for each application and for each operating system. One guaranteed way to quiesce the host is to stop the application and unmount the VDisk from the host.
8. As soon as the host completes its flushing, you can then start the FlashCopy mapping. The FlashCopy starts extremely quickly (at most, a few seconds).
9. When the FlashCopy mapping has started, you can unquiesce your application (or mount the volume and start the application), at which point the cache is re-enabled for the source VDisks. The FlashCopy continues to run in the background and ensures that the target VDisk is an exact copy of the source VDisk as it was when the FlashCopy mapping was started.
You can perform step 1 on page 153 through step 5 on page 155 while the host that owns the source VDisk performs its typical daily activities (that is, no downtime). While step 6 on page 155 is running, which can last several minutes, there might be a delay in I/O throughput, because the cache on the VDisk is temporarily disabled. Step 7 must be performed when the application is down. However, these steps complete quickly, and application downtime is minimal.
The target FlashCopy VDisk can now be assigned to another host, and it can be used for read or write even though the FlashCopy process has not completed.
Note: If you intend to use the target VDisk on the same host as the source VDisk at the same time that the source VDisk is visible to that host, you might need to perform additional preparation steps to enable the host to access VDisks that are identical.
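The following command line sketch summarizes steps 3 through 8 for the single VDisk example. The target VDisk name, the mapping name, and the copy rate are hypothetical, and the exact syntax can vary between SVC code levels, so verify each command against your release before you use it:
IBM_2145:itsosvccl1:admin>svctask mkvdisk -iogrp 0 -mdiskgrp itso_ds45_64gb -size 19327352832 -unit b -name Image_mode0_tgt
IBM_2145:itsosvccl1:admin>svctask mkfcmap -source Image_mode0 -target Image_mode0_tgt -name fcmap_img0 -copyrate 0
IBM_2145:itsosvccl1:admin>svctask prestartfcmap fcmap_img0
(quiesce the host application, then)
IBM_2145:itsosvccl1:admin>svctask startfcmap fcmap_img0
The -copyrate 0 setting corresponds to the NOCOPY behavior described in step 5; use a higher value if you want a full background copy of the source.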
Here are the steps to ensure that data integrity is preserved when VDisks are related to each other:
1. Your host is currently writing to the VDisks as part of its daily activities. These VDisks will become the source VDisks in our FlashCopy mappings.
2. Identify the size and type (image, sequential, or striped) of each source VDisk. If any of the source VDisks is an image mode VDisk, you need to know its size in bytes. If any of the source VDisks are sequential or striped mode VDisks, their size as reported by the SVC Master Console or the SVC command line is sufficient.
3. Create a target VDisk of the required size for each source identified in the previous step. The target VDisks can be image, sequential, or striped mode VDisks; the only requirement is that each one must be exactly the same size as its source VDisk. The target VDisks can be cache-enabled or cache-disabled.
4. Define a FlashCopy Consistency Group. This Consistency Group will be linked to each FlashCopy mapping that you define, so that data integrity is preserved between the VDisks.
5. Define a FlashCopy mapping for each source VDisk, making sure that you have the source disk and the target disk defined in the correct order. (If you use any of your newly created VDisks as a source and the existing host's VDisk as the target, you will destroy the data on that VDisk when you start the FlashCopy.) When defining each mapping, make sure that you link it to the FlashCopy Consistency Group that you defined in the previous step. As part of defining the mapping, you can specify a copy rate from 0 to 100. The copy rate determines how quickly the SVC copies the source VDisks to the target VDisks. If you set the copy rate to 0 (NOCOPY), the SVC only copies blocks that change on the source VDisk or the target VDisk (if the target VDisk is mounted read/write to a host) after the Consistency Group is started.
6. Prepare the FlashCopy Consistency Group. This preparation process can take several minutes to complete, because it forces the SVC to flush any outstanding write I/Os belonging to the VDisks in the Consistency Group to the storage controller's disks. After the preparation process completes, the Consistency Group has a Prepared status, and all source VDisks behave as though they were cache-disabled VDisks until the Consistency Group is either started or deleted.
Note: If you create a FlashCopy mapping where the source VDisk is the target VDisk of an active Metro Mirror relationship, this mapping adds latency to that existing Metro Mirror relationship (and possibly affects the host that is using the source VDisk of that Metro Mirror relationship as a result). The reason for the additional latency is that the FlashCopy Consistency Group preparation process disables the cache on all source VDisks (which might be target VDisks of a Metro Mirror relationship), and thus all write I/Os from the Metro Mirror relationship must commit to the storage controller before the complete status is returned to the host.
7. After the Consistency Group is prepared, you can quiesce the host by forcing the host and the application to stop I/Os and flush any outstanding write I/Os to disk. This process differs for each application and for each operating system. One guaranteed way to quiesce the host is to stop the application and unmount the VDisks from the host.
8. As soon as the host completes its flushing, you can then start the Consistency Group. The FlashCopy start completes extremely quickly (at most, a few seconds).
9. When the Consistency Group has started, you can unquiesce your application (or mount the VDisks and start the application), at which point the cache is re-enabled. The FlashCopy continues to run in the background and preserves the data that existed on the VDisks when the Consistency Group was started.
Step 1 on page 157 through step 6 on page 157 can be performed while the host that owns the source VDisks is performing its typical daily duties (that is, no downtime). While step 6 on page 157 is running, which can take several minutes, there might be a delay in I/O throughput, because the cache on the VDisks is temporarily disabled. You must perform step 7 when the application is down; however, these steps complete quickly, so the application downtime is minimal.
The target FlashCopy VDisks can now be assigned to another host and used for read or write even though the FlashCopy processes have not completed.
Note: If you intend to use any of the target VDisks on the same host as their source VDisk at the same time that the source VDisk is visible to that host, you might need to perform additional preparation steps to enable the host to access VDisks that are identical.
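The sketch below shows the Consistency Group variant of the same procedure on the command line. The Consistency Group, VDisk, and mapping names are hypothetical, and the syntax can differ between SVC code levels, so verify it against your release:
IBM_2145:itsosvccl1:admin>svctask mkfcconsistgrp -name app_fccg
IBM_2145:itsosvccl1:admin>svctask mkfcmap -source app_vdisk0 -target app_vdisk0_tgt -consistgrp app_fccg -copyrate 0
IBM_2145:itsosvccl1:admin>svctask mkfcmap -source app_vdisk1 -target app_vdisk1_tgt -consistgrp app_fccg -copyrate 0
IBM_2145:itsosvccl1:admin>svctask prestartfcconsistgrp app_fccg
(quiesce the host application, then)
IBM_2145:itsosvccl1:admin>svctask startfcconsistgrp app_fccg
Because the mappings are started as one Consistency Group, all target VDisks represent the same point in time.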
You can use SE VDisks for cascaded FlashCopy and multiple target FlashCopy. It is also possible to mix SE VDisks with normal VDisks, and SE VDisks can be used for incremental FlashCopy too, but using SE VDisks for incremental FlashCopy only makes sense if both the source and the target are Space-Efficient.
The recommendations for Space-Efficient FlashCopy (SEFC) are:
The SEV grain size must be equal to the FlashCopy grain size.
The SEV grain size must be 64 KB for the best performance and the best space efficiency.
The exception is where the SEV target VDisk is going to become a production VDisk (that is, it will be subjected to ongoing heavy I/O). In this case, a 256 KB SEV grain size is recommended to provide better long-term I/O performance at the expense of a slower initial copy.
Note: Even if the 256 KB SEV grain size is chosen, it is still beneficial to keep the FlashCopy grain size at 64 KB. It is then possible to minimize the performance impact on the source VDisk, even though this size increases the I/O workload on the target VDisk.
Clients with extremely large numbers of FlashCopy/Remote Copy relationships might still be forced to choose a 256 KB grain size for FlashCopy due to constraints on the amount of bitmap memory.
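As a hedged illustration of these grain size recommendations, the following sketch creates a Space-Efficient target VDisk with a 64 KB grain size and a FlashCopy mapping that uses the same grain size. The VDisk names, size, and -rsize value are hypothetical, and the available parameters depend on your SVC 4.3 code level, so check the command reference before use:
IBM_2145:itsosvccl1:admin>svctask mkvdisk -iogrp 0 -mdiskgrp itso_smallgrp -size 1536 -unit mb -rsize 10% -autoexpand -grainsize 64 -name vdisk9_setgt
IBM_2145:itsosvccl1:admin>svctask mkfcmap -source vdisk9 -target vdisk9_setgt -grainsize 64 -copyrate 0 -name fcmap_se0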
If you intend to keep the target so that you can use it as part of a quick recovery process, you might choose one of the following options:
Create the FlashCopy mapping with NOCOPY initially. If the target is used and migrated into production, you can change the copy rate at the appropriate time to an appropriate rate to have all the data copied to the target disk. When the copy completes, you can delete the FlashCopy mapping and delete the source VDisk, thus freeing the space.
Create the FlashCopy mapping with a low copy rate. Using a low rate might enable the copy to complete without an impact to your storage controller, thus leaving bandwidth available for production work. If the target is used and migrated into production, you can change the copy rate to a higher value at the appropriate time to ensure that all data is copied to the target disk. After the copy completes, you can delete the source, thus freeing the space.
Create the FlashCopy with a high copy rate. While this copy rate might add an additional I/O burden to your storage controller, it ensures that you get a complete copy of the source disk as quickly as possible.
By using the target on a different Managed Disk Group (MDG), which, in turn, uses a different array or controller, you reduce your window of risk if the storage providing the source disk becomes unavailable.
With Multiple Target FlashCopy, you can now use a combination of these methods. For example, you can use the NOCOPY rate for an hourly snapshot of a VDisk combined with a daily FlashCopy that uses a high copy rate.
8. Quiesce or shut down the hosts so that they no longer use the old storage.
9. Change the masking on the LUNs on the old storage controller so that the SVC is now the only user of the LUNs. You can change this masking one LUN at a time so that you can discover them (in the next step) one at a time and not mix any LUNs up.
10. Use svctask detectmdisk to discover the LUNs as MDisks. We recommend that you also use svctask chmdisk to rename the LUNs to something more meaningful.
11. Define a VDisk from each LUN and note its exact size (to the number of bytes) by using the svcinfo lsvdisk command.
12. Define a FlashCopy mapping and start the FlashCopy mapping for each VDisk by using the steps in 8.1.2, Steps to making a FlashCopy VDisk with application data integrity on page 153.
13. Assign the target VDisks to the hosts and then restart your hosts. Your host sees the original data, with the exception that the storage is now an IBM SVC LUN.
With these steps, you have made a copy of the existing storage, and the SVC has not been configured to write to the original storage. Thus, if you encounter any problems with these steps, you can reverse everything that you have done, assign the old storage back to the host, and continue without the SVC.
By using FlashCopy in this example, any incoming writes go to the new storage subsystem, and any read requests for data that has not yet been copied to the new subsystem automatically come from the old subsystem (the FlashCopy source). You can alter the FlashCopy copy rate, as appropriate, to ensure that all the data is copied to the new controller. After the FlashCopy completes, you can delete the FlashCopy mappings and the source VDisks.
After all the LUNs have been migrated across to the new storage controller, you can remove the old storage controller from the SVC node zones and then, optionally, remove the old storage controller from the SAN fabric.
You can also use this process if you want to migrate to a new storage controller and not keep the SVC after the migration. At step 2 on page 160, make sure that you create LUNs that are the same size as the original LUNs. Then, at step 11, use image mode VDisks. When the FlashCopy mappings complete, you can shut down the hosts and map the storage directly to them, remove the SVC, and continue on the new storage controller.
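The following sketch illustrates steps 10 and 11 for one LUN. The MDisk name, VDisk name, and Managed Disk Group name are hypothetical, and the parameters can vary by SVC code level, so verify them before use:
IBM_2145:itsosvccl1:admin>svctask detectmdisk
IBM_2145:itsosvccl1:admin>svctask chmdisk -name old_ctrl_lun0 mdisk12
IBM_2145:itsosvccl1:admin>svctask mkvdisk -iogrp 0 -mdiskgrp old_ctrl_grp -vtype image -mdisk old_ctrl_lun0 -name migr_src0
IBM_2145:itsosvccl1:admin>svcinfo lsvdisk -bytes migr_src0
Creating the VDisk in image mode preserves the existing data on the LUN, which is what allows the host to see its original data through the SVC.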
The target VDisk must be the same size as the source VDisk; however, the target VDisk can be a different type (image, striped, or sequential mode) or have different cache settings (cache-enabled or cache-disabled).
If you stop a FlashCopy mapping or a Consistency Group before it has completed, you will lose access to the target VDisks. If the target VDisks are mapped to hosts, they will have I/O errors.
A VDisk cannot be a source in one FlashCopy mapping and a target in another FlashCopy mapping.
A VDisk can be the source for up to 16 targets.
A FlashCopy target cannot be used in a Metro Mirror or Global Mirror relationship.
8.2.1 Using both Metro Mirror and Global Mirror between two clusters
A Remote Copy (RC) Mirror relationship is a relationship between two individual VDisks of the same size. The management of RC Mirror relationships is always performed in the cluster where the source VDisk exists.
However, you must consider the performance implications of this configuration, because write data from all mirroring relationships is transported over the same inter-cluster links. Metro Mirror and Global Mirror respond differently to a heavily loaded, poorly performing link.
Metro Mirror usually maintains the relationships in a consistent synchronized state, which means that primary host applications start to see poor performance (as a result of the synchronous mirroring being used). Global Mirror, however, offers a higher level of write performance to primary host applications. With a well-performing link, writes are completed asynchronously. If link performance becomes unacceptable, the link tolerance feature automatically stops Global Mirror relationships to ensure that the performance for application hosts remains within reasonable limits.
Therefore, with active Metro Mirror and Global Mirror relationships between the same two clusters, Global Mirror writes might suffer degraded performance if the Metro Mirror relationships consume most of the inter-cluster link's capability. If this degradation reaches a level where hosts writing to Global Mirror experience extended response times, the Global Mirror relationships can be stopped when the link tolerance threshold is exceeded. If this situation happens, refer to 8.2.9, Diagnosing and fixing 1920 errors on page 170.
Important: The SVC only supports copy services between two clusters.
In Figure 8-3, the Primary Site uses SVC copy services (Global Mirror or Metro Mirror) to the secondary site. Thus, in the event of a disaster at the primary site, the storage administrator enables access to the target VDisk (from the secondary site), and the business application continues processing. While the business continues processing at the secondary site, the storage controller copy services replicate to the third site.
When defining LUNs in point-in-time copy or a remote mirror relationship, double-check that the SVC does not have visibility to the LUN (mask it so that no SVC node can see it), or if the SVC must see the LUN, ensure that it is an unmanaged MDisk. The storage controller might, as part of its Advanced Copy Services function, take a LUN offline or suspend reads or writes. The SVC does not understand why this happens; therefore, the SVC might log errors when these events occur. If you mask target LUNs to the SVC and rename your MDisks as you discover them and if the Advanced Copy Services function prohibits access to the LUN as part of its processing, the MDisk might be discarded and rediscovered with an SVC-assigned MDisk name.
If you use one of these extenders or routers, you need to test the link to ensure that the following requirements are met before you place SVC traffic onto the link: For SVC 4.1.0.x, the round-trip latency between sites must not exceed 68 ms (34 ms oneway) for Fibre Channel (FC) extenders or 20 ms (10 ms one-way) for SAN routers. For SVC 4.1.1.x and later, the round-trip latency between sites must not exceed 80 ms (40 ms one-way). The latency of long distance links is dependent on the technology that is used. Typically, for each 100 km (62.1 miles) of distance, it is assumed that 1 ms is added to the latency, which for Global Mirror means that the remote cluster can be up to 4 000 km (2485 miles) away. When testing your link for latency, it is important that you take into consideration both current and future expected workloads, including any times when the workload might be unusually high. You must evaluate the peak workload by considering the average write workload over a period of one minute or less plus the required synchronization copy bandwidth. SVC uses part of the bandwidth for its internal SVC inter-cluster heartbeat. The amount of traffic depends on how many nodes are in each of the two clusters. Table 8-1 shows the amount of traffic, in megabits per second, generated by different sizes of clusters. These numbers represent the total traffic between the two clusters when no I/O is taking place to mirrored VDisks. Half of the data is sent by one cluster, and half of the data is sent by the other cluster. The traffic will be divided evenly over all available inter-cluster links; therefore, if you have two redundant links, half of this traffic will be sent over each link during fault-free operation.
Table 8-1  SVC inter-cluster heartbeat traffic (megabits per second)
Local/remote cluster   Two nodes   Four nodes   Six nodes   Eight nodes
Two nodes              2.6         4.0          5.4         6.7
Four nodes             4.0         5.5          7.1         8.6
Six nodes              5.4         7.1          8.8         10.5
Eight nodes            6.7         8.6          10.5        12.4
If the link between the sites is configured with redundancy so that it can tolerate single failures, the link must be sized so that the bandwidth and latency statements continue to be accurate even during single failure conditions.
8.2.5 Saving bandwidth creating Metro Mirror and Global Mirror relationships
If you have a situation where you have a large source VDisk (or a large number of source VDisks) that you want to replicate to a remote site and your planning shows that the SVC mirror initial sync time will take too long (or will be too costly if you pay for the traffic that you use), here is a method of setting up the sync using another medium (that might be less expensive). Another reason that you might want to use these steps is if you want to increase the size of the VDisks currently in a Metro Mirror relationship or a Global Mirror relationship. To increase the size of these VDisks, you must delete the current mirror relationships and redefine the mirror relationships after you have resized the VDisks.
In this example, we use tape media as the source for the initial sync of the Metro Mirror or Global Mirror relationship target before using the SVC to maintain the Metro Mirror or Global Mirror. This example does not require downtime for the hosts using the source VDisks. Here are the steps:
1. The hosts are up and running and using their VDisks normally. There is no Metro Mirror relationship or Global Mirror relationship defined yet. You have identified all the VDisks that will become the source VDisks in a Metro Mirror relationship or a Global Mirror relationship.
2. You have already established the SVC cluster relationship with the target SVC.
3. Define a Metro Mirror relationship or a Global Mirror relationship for each source VDisk. When defining the relationship, ensure that you use the -sync option, which stops the SVC from performing an initial sync.
Note: If you fail to use the -sync option, all of these steps are redundant, because the SVC performs a full initial sync anyway.
4. Stop each mirror relationship by using the -access option, which enables write access to the target VDisks. We will need this write access later.
5. Make a copy of the source VDisk to the alternate media by using the dd command to copy the contents of the VDisk to tape. Another option might be to use your backup tool (for example, IBM Tivoli Storage Manager) to make an image backup of the VDisk.
Note: Even though the source is being modified while you are copying the image, the SVC is tracking those changes. The image that you create might already have some of the changes and is likely to have missed some of the changes as well. When the relationship is restarted, the SVC will apply all of the changes that occurred since the relationship was stopped in step 4. After all the changes are applied, you will have a consistent target image.
6. Ship your media to the remote site and apply the contents to the targets of the Metro Mirror or Global Mirror relationship. For example, you can mount the Metro Mirror and Global Mirror target VDisks on a UNIX server and use the dd command to copy the contents of the tape to the target VDisk. If you used your backup tool to make an image of the VDisk, follow the instructions for your tool to restore the image to the target VDisk. Do not forget to remove the mount if this is a temporary host.
Note: It does not matter how long it takes to get your media to the remote site and perform this step. The quicker you can get it to the remote site and loaded, the quicker the SVC is running and maintaining the Metro Mirror and Global Mirror.
7. Unmount the target VDisks from your host. When you start the Metro Mirror and Global Mirror relationship later, the SVC will stop write access to the VDisk while the mirror relationship is running.
8. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship catches up, the target VDisk is not usable at all. As soon as it reaches Consistent Copying, your remote VDisk is ready for use in a disaster.
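The following sketch shows the SVC commands behind steps 3, 4, and 8 for a single relationship. The relationship and VDisk names and the remote cluster name are hypothetical; add the -global option for Global Mirror, additional options (such as -primary or -force) might be required when restarting depending on the relationship state, and the syntax should be verified against your SVC code level:
IBM_2145:itsosvccl1:admin>svctask mkrcrelationship -master app_vdisk0 -aux app_vdisk0_remote -cluster REMOTE_SVC -name rcrel0 -sync
IBM_2145:itsosvccl1:admin>svctask stoprcrelationship -access rcrel0
(copy the source VDisk to tape, restore the tape onto the target VDisk at the remote site, unmount it, then)
IBM_2145:itsosvccl1:admin>svctask startrcrelationship -primary master rcrel0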
Each node of the remote cluster has a fixed pool of Global Mirror system resources for each node of the primary cluster. That is, each remote node has a separate queue for I/O from each of the primary nodes. This queue is a fixed size and is the same size for every node. If the preferred nodes for the VDisks of the remote cluster are set so that every combination of primary node and secondary node is used, Global Mirror performance is maximized.
Figure 8-4 shows an example of Global Mirror resources that are not optimized: all VDisks on the Local Cluster with a preferred node of Node 1 are replicated to target VDisks on the Remote Cluster that also have a preferred node of Node 1. With this configuration, the Remote Cluster Node 1 resources reserved for Local Cluster Node 2 are not used, and neither are the Remote Cluster Node 2 resources reserved for Local Cluster Node 1.
If the configuration is changed to the one shown in Figure 8-5, all Global Mirror resources for each node are used, and SVC Global Mirror operates with better performance than the configuration shown in Figure 8-4.
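You can check how the target VDisks are spread across nodes by looking at the preferred_node_id field in the detailed VDisk view, as in this hedged sketch (the remote cluster prompt and VDisk name are hypothetical):
IBM_2145:itsosvccl2:admin>svcinfo lsvdisk app_vdisk0_remote
...
preferred_node_id 1
...
Aim to distribute the target VDisks so that both nodes of each remote I/O Group appear as preferred nodes.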
Extremely important: If the relationship is not stopped in the consistent state, or if any host I/O takes place between stopping the old Metro Mirror or Global Mirror relationship and starting the new Metro Mirror or Global Mirror relationship, those changes will never be mirrored to the target VDisks. As a result, the data on the source and target VDisks is not exactly the same, and the SVC will be unaware of the inconsistency.
The remote link is overloaded. Using TPC, you can check the following metrics to see whether the remote link was the cause:
Look at the total Global Mirror auxiliary VDisk write throughput before the Global Mirror relationships were stopped. If this write throughput is approximately equal to your link bandwidth, it is extremely likely that your link is overloaded, which might be due to application host I/O or a combination of host I/O and background (synchronization) copy activity.
Look at the total Global Mirror source VDisk write throughput before the Global Mirror relationships were stopped. This write throughput represents only the I/O performed by the application hosts. If this number approaches the link bandwidth, you might need to upgrade the link's bandwidth, reduce the I/O that the application is attempting to perform, or choose to mirror fewer VDisks using Global Mirror. If, however, the auxiliary disks show much more write I/O than the source VDisks, this situation suggests a high level of background copy. Try decreasing the Global Mirror partnership's background copy rate parameter to bring the total application I/O bandwidth and background copy rate within the link's capabilities.
Look at the total Global Mirror source VDisk write throughput after the Global Mirror relationships were stopped. If the write throughput increases greatly (by 30% or more) when the relationships were stopped, the application host was attempting to perform more I/O than the link can sustain. While the Global Mirror relationships are active, the overloaded link causes higher response times to the application host, which decreases the throughput that it can achieve. After the relationships have stopped, the application host sees lower response times, and you can see the true I/O workload. In this case, the link bandwidth must be increased, the application host I/O rate must be decreased, or fewer VDisks must be mirrored using Global Mirror.
The storage controllers at the remote cluster are overloaded. Any MDisk on a storage controller that is providing poor service to the SVC cluster can cause a 1920 error if this poor service prevents application I/O from proceeding at the rate required by the application host. If you have followed the specified back-end storage controller requirements, it is most likely that the error has been caused by a decrease in controller performance due to maintenance actions or a hardware failure of the controller. Use TPC to obtain the back-end write response time for each MDisk at the remote cluster. A response time for any individual MDisk that exhibits a sudden increase of 50 ms or more, or that is higher than 100 ms, indicates a problem:
Check the storage controller for error conditions, such as media errors, a failed physical disk, or associated activity, such as RAID array rebuilding. If there is an error, fix the problem and restart the Global Mirror relationships.
If there is no error, consider whether the secondary controller is capable of processing the required level of application host I/O. It might be possible to improve the performance of the controller by:
Adding more physical disks to a RAID array
Changing the RAID level of the array
Changing the controller's cache settings (and checking that the cache batteries are healthy, if applicable)
Changing other controller-specific configuration parameters
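If a high background copy rate turns out to be the cause, you can lower the partnership bandwidth (the background copy limit) from the command line. This is a hedged sketch; the remote cluster name and the value (in MBps) are hypothetical, and the setting applies to the partnership as a whole:
IBM_2145:itsosvccl1:admin>svctask chpartnership -bandwidth 20 REMOTE_SVC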
The storage controllers at the primary site are overloaded. Analyze the performance of the primary back-end storage using the same steps that you use for the remote back-end storage. The main effect of bad performance is to limit the amount of I/O that can be performed by application hosts. Therefore, back-end storage at the primary site must be monitored regardless of Global Mirror. However, if bad performance continues for a prolonged period, it is possible that a 1920 error will occur and the Global Mirror relationships will stop.
One of the SVC clusters is overloaded. Use TPC to obtain the port to local node send response time and the port to local node send queue time. If the total of these statistics for either cluster is higher than 1 millisecond, the SVC might be experiencing an extremely high I/O load. Also, check the SVC node CPU utilization; if this figure is in excess of 50%, this situation might also contribute to the problem. In either case, contact your IBM service support representative (IBM SSR) for further assistance.
FlashCopy mappings are in the prepared state. If the Global Mirror target VDisks are the sources of a FlashCopy mapping, and that mapping is in the prepared state for an extended time, performance to those VDisks can be impacted, because the cache is disabled. Starting the FlashCopy mapping re-enables the cache and improves the VDisks' performance for Global Mirror I/O.
Note: Even though the source is being modified while you are copying the image, the SVC is tracking those changes. The image that you create might already have part of the changes and is likely to have missed part of the changes as well. When the relationship is restarted, the SVC will apply all changes that have occurred since the relationship was stopped in step 1. After all the changes are applied, you will have a consistent target image.
3. Ship your media to the remote site and apply the contents to the targets of the Metro Mirror or Global Mirror relationship. For example, you can mount the Metro Mirror and Global Mirror target VDisks on a UNIX server and use the dd command to copy the contents of the tape to the target VDisk. If you used your backup tool to make an image of the VDisk, follow the instructions for your tool to restore the image to the target VDisk. Do not forget to remove the mount if this is a temporary host.
Note: It does not matter how long it takes to get your media to the remote site and perform this step. The quicker you can get it to the remote site and loaded, the quicker the SVC is running and maintaining the Metro Mirror and Global Mirror.
4. Unmount the target VDisks from your host. When you start the Metro Mirror and Global Mirror relationship later, the SVC will stop write access to the VDisk while the mirror relationship is running.
5. Start your Metro Mirror and Global Mirror relationships. While the mirror relationship catches up, the target VDisk is not usable at all. As soon as it reaches Consistent Copying, your remote VDisk is ready for use in a disaster.
CPU Utilization Percentage: CPU utilization must be below 50%.
Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at the remote cluster: The time needs to be less than 100 ms. A longer response time can indicate that the storage controller is overloaded. If the response time for a specific storage controller is outside of its specified operating range, investigate that controller.
Sum of Backend Write Response Time and Write Queue Time for Global Mirror MDisks at the primary cluster: The time must also be less than 100 ms. If the response time is greater than 100 ms, application hosts might see extended response times if the SVC's cache becomes full.
Write Data Rate for Global Mirror MDisk groups at the remote cluster: This data rate indicates the amount of data that is being written by Global Mirror. If this number approaches either the inter-cluster link bandwidth or the storage controller throughput limit, be aware that further increases can cause overloading of the system, and monitor this number appropriately.
Chapter 9. Hosts
This chapter describes best practices for monitoring host systems attached to the SAN Volume Controller (SVC). A host system is an Open Systems computer that is connected to the switch through a Fibre Channel (FC) interface.
The most important part of tuning, troubleshooting, and performance work for a host attached to an SVC takes place in the host itself. There are three major areas of concern:
Using multipathing and bandwidth (the physical capability of the SAN and back-end storage)
Understanding how your host performs I/O and the types of I/O
Utilizing measurement and test tools to determine host performance and for tuning
This topic supplements the IBM System Storage SAN Volume Controller Host Attachment User's Guide Version 4.3.0, SC26-7905-02, at:
http://www-1.ibm.com/support/docview.wss?rs=591&context=STCCCXR&context=STCCCYH&dc=DA400&q1=english&q2=-Japanese&uid=ssg1S7002159&loc=en_US&cs=utf-8&lang=en
Table 9-1  Effect of multipathing on write performance
R/W test                          Four paths    Eight paths   Difference
Write Hit 512 b Sequential IOPS   81 877        74 909        -8.6%
Write Miss 512 b Random IOPS      60 510.4      57 567.1      -5.0%
70/30 R/W Miss 4K Random IOPS     130 445.3     124 547.9     -5.6%
70/30 R/W Miss 64K Random MBps    1 810.8138    1 834.2696    1.3%
50/50 R/W Miss 4K Random IOPS     97 822.6      98 427.8      0.6%
50/50 R/W Miss 64K Random MBps    1 674.5727    1 678.1815    0.2%
An invocation example:
svcinfo lshostvdiskmap -delim :
The resulting output:
id:name:SCSI_id:vdisk_id:vdisk_name:wwpn:vdisk_UID
2:host2:0:10:vdisk10:0000000000000ACA:6005076801958001500000000000000A
2:host2:1:11:vdisk11:0000000000000ACA:6005076801958001500000000000000B
2:host2:2:12:vdisk12:0000000000000ACA:6005076801958001500000000000000C
2:host2:3:13:vdisk13:0000000000000ACA:6005076801958001500000000000000D
2:host2:4:14:vdisk14:0000000000000ACA:6005076801958001500000000000000E
For example, VDisk 10 in this example has a unique device identifier (UID) of 6005076801958001500000000000000A, while the SCSI_id that host2 uses for access is 0.
svcinfo lsvdiskhostmap -delim : EEXCLS_HBin01
id:name:SCSI_id:host_id:host_name:wwpn:vdisk_UID
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938CFDF:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:109:HDMCENTEX1N1:10000000C938D01F:600507680191011D4800000000000466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D65B:600507680191011D4800000000000466
950:EEXCLS_HBin01:13:110:HDMCENTEX1N2:10000000C938D3D3:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D615:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:111:HDMCENTEX1N3:10000000C938D612:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CFBD:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:112:HDMCENTEX1N4:10000000C938CE29:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EE1D8:600507680191011D4800000000000466
950:EEXCLS_HBin01:14:113:HDMCENTEX1N5:10000000C92EDFFE:600507680191011D4800000000000466
If you are using IBM multipathing software (IBM Subsystem Device Driver (SDD) or SDDDSM), the command datapath query device shows the vdisk_UID (unique identifier) and so enables easier management of VDisks. The SDDPCM equivalent command is pcmpath query device.
Example 9-1 Host-VDisk mapping for one host from two I/O Groups
IBM_2145:ITSOCL1:admin>svcinfo lshostvdiskmap senegal
id  name     SCSI_id  vdisk_id  vdisk_name  wwpn              vdisk_UID
0   senegal  1        60        s-0-6-4     210000E08B89CCC2  60050768018101BF28000000000000A8
0   senegal  2        58        s-0-6-5     210000E08B89CCC2  60050768018101BF28000000000000A9
0   senegal  3        57        s-0-5-1     210000E08B89CCC2  60050768018101BF28000000000000AA
0   senegal  4        56        s-0-5-2     210000E08B89CCC2  60050768018101BF28000000000000AB
0   senegal  5        61        s-0-6-3     210000E08B89CCC2  60050768018101BF28000000000000A7
0   senegal  6        36        big-0-1     210000E08B89CCC2  60050768018101BF28000000000000B9
0   senegal  7        34        big-0-2     210000E08B89CCC2  60050768018101BF28000000000000BA
0   senegal  1        40        s-1-8-2     210000E08B89CCC2  60050768018101BF28000000000000B5
0   senegal  2        50        s-1-4-3     210000E08B89CCC2  60050768018101BF28000000000000B1
0   senegal  3        49        s-1-4-4     210000E08B89CCC2  60050768018101BF28000000000000B2
0   senegal  4        42        s-1-4-5     210000E08B89CCC2  60050768018101BF28000000000000B3
0   senegal  5        41        s-1-8-1     210000E08B89CCC2  60050768018101BF28000000000000B4
Example 9-2 shows the datapath query device output for this Windows host. Note that the order of the two I/O Groups' VDisks is reversed from the host-VDisk map. VDisk s-1-8-2 is first, followed by the rest of the LUNs from the second I/O Group, then VDisk s-0-6-4 and the rest of the LUNs from the first I/O Group. Most likely, Windows discovered the second set of LUNs first. However, the relative order within an I/O Group is maintained.
Example 9-2 Using datapath query device for the host VDisk map
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B5
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL  1342      0
2        Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL  1444      0
DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B1
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL  1405      0
1        Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL  1387      0
3        Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL  0         0
DEV#: 2 DEVICE NAME: Disk3 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B2
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk3 Part0  OPEN    NORMAL  1398      0
1        Scsi Port2 Bus0/Disk3 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk3 Part0  OPEN    NORMAL  1407      0
3        Scsi Port3 Bus0/Disk3 Part0  OPEN    NORMAL  0         0
DEV#: 3 DEVICE NAME: Disk4 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B3
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk4 Part0  OPEN    NORMAL  1504      0
1        Scsi Port2 Bus0/Disk4 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk4 Part0  OPEN    NORMAL  1281      0
3        Scsi Port3 Bus0/Disk4 Part0  OPEN    NORMAL  0         0
DEV#: 4 DEVICE NAME: Disk5 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B4
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk5 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk5 Part0  OPEN    NORMAL  1399      0
2        Scsi Port3 Bus0/Disk5 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk5 Part0  OPEN    NORMAL  1391      0
DEV#: 5 DEVICE NAME: Disk6 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A8
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk6 Part0  OPEN    NORMAL  1400      0
1        Scsi Port2 Bus0/Disk6 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk6 Part0  OPEN    NORMAL  1390      0
3        Scsi Port3 Bus0/Disk6 Part0  OPEN    NORMAL  0         0
DEV#: 6 DEVICE NAME: Disk7 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A9
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk7 Part0  OPEN    NORMAL  1379      0
1        Scsi Port2 Bus0/Disk7 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk7 Part0  OPEN    NORMAL  1412      0
3        Scsi Port3 Bus0/Disk7 Part0  OPEN    NORMAL  0         0
DEV#: 7 DEVICE NAME: Disk8 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AA
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk8 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk8 Part0  OPEN    NORMAL  1417      0
2        Scsi Port3 Bus0/Disk8 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk8 Part0  OPEN    NORMAL  1381      0
DEV#: 8 DEVICE NAME: Disk9 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000AB
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk9 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk9 Part0  OPEN    NORMAL  1388      0
2        Scsi Port3 Bus0/Disk9 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk9 Part0  OPEN    NORMAL  1413      0
DEV#: 9 DEVICE NAME: Disk10 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A7
=============================================================================
Path#    Adapter/Hard Disk             State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk10 Part0  OPEN    NORMAL  1293      0
1        Scsi Port2 Bus0/Disk10 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk10 Part0  OPEN    NORMAL  1477      0
3        Scsi Port3 Bus0/Disk10 Part0  OPEN    NORMAL  0         0
DEV#: 10 DEVICE NAME: Disk11 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000B9
=============================================================================
Path#    Adapter/Hard Disk             State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk11 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk11 Part0  OPEN    NORMAL  59981     0
2        Scsi Port3 Bus0/Disk11 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk11 Part0  OPEN    NORMAL  60179     0
DEV#: 11 DEVICE NAME: Disk12 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000BA
=============================================================================
Path#    Adapter/Hard Disk             State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk12 Part0  OPEN    NORMAL  28324     0
1        Scsi Port2 Bus0/Disk12 Part0  OPEN    NORMAL  0         0
2        Scsi Port3 Bus0/Disk12 Part0  OPEN    NORMAL  27111     0
3        Scsi Port3 Bus0/Disk12 Part0  OPEN    NORMAL  0         0
Sometimes, a host might discover everything correctly at initial configuration, but it does not keep up with the dynamic changes in the configuration. The SCSI ID is therefore extremely important. For more discussion about this topic, refer to 9.2.4, Dynamic reconfiguration on page 185.
among the VDisks. If the preferred node is offline, all I/O will go through the non-preferred node in write-through mode.
Certain multipathing software does not utilize the preferred node information, so it might balance the I/O load for a host differently. Veritas DMP is one example.
Table 9-2 contrasts the response time of 16 devices performing random read misses through the preferred node with the response time through the non-preferred node; Table 9-3 shows the corresponding effect on throughput. The effect is significant.
Table 9-2  The 16 device random 4 Kb read miss response time (4.2 nodes, usecs)
Preferred node (owner)   Non-preferred node   Delta
18 227                   21 256               3 029
Table 9-3 shows the corresponding change in throughput for the same 16 device random 4 Kb read miss test, using the preferred node as opposed to a non-preferred node.
Table 9-3  The 16 device random 4 Kb read miss throughput (IOPS)
Preferred node (owner)   Non-preferred node   Delta
105 274.3                90 292.3             14 982
In Table 9-4, we show the effect of using the non-preferred paths compared to the preferred paths on read performance.
Table 9-4  Random (1 TB) 4 Kb read response time (4.1 nodes, usecs)
Preferred node (owner)   Non-preferred node   Delta
5 074                    5 147                73
Table 9-5 shows the effect of using non-preferred nodes on write performance.
Table 9-5  Random (1 TB) 4 Kb write response time (4.2 nodes, usecs)
Preferred node (owner)   Non-preferred node   Delta
5 346                    5 433                87
IBM SDD software, SDDDSM software, and SDDPCM software recognize the preferred nodes and utilize the preferred paths.
Removing VDisks and then later allocating new VDisks to the host
The problem surfaces when a user removes a vdiskhostmap on the SVC during the process of removing a VDisk. After a VDisk is unmapped from the host, the device becomes unavailable, and the SVC reports that there is no such disk on this port. Running datapath query device after the removal shows a closed, offline, invalid, or dead state, as shown here.
Windows host:
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018201BEE000000000000041
============================================================================
Path#    Adapter/Hard Disk            State   Mode     Select   Errors
0        Scsi Port2 Bus0/Disk1 Part0  CLOSE   OFFLINE  0        0
1        Scsi Port3 Bus0/Disk1 Part0  CLOSE   OFFLINE  263      0
AIX host:
DEV#: 189 DEVICE NAME: vpath189 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk   State     Mode     Select   Errors
0        fscsi0/hdisk1654    DEAD      OFFLINE  0        0
1        fscsi0/hdisk1655    DEAD      OFFLINE  2        0
2        fscsi1/hdisk1658    INVALID   NORMAL   0        0
3        fscsi1/hdisk1659    INVALID   NORMAL   1        0
The next time that a new VDisk is allocated and mapped to that host, the SCSI ID will be reused if it is allowed to default, and the host can confuse the new device with the old device definition that is still left over in the device database or system memory. It is possible to get two devices that use identical device definitions in the device database, as in this example. Note that both vpath189 and vpath190 have the same hdisk definitions while they actually contain different device serial numbers. The path fscsi0/hdisk1654 exists in both vpaths.
DEV#: 189 DEVICE NAME: vpath189 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007E6
============================================================================
Path#    Adapter/Hard Disk   State   Mode     Select    Errors
0        fscsi0/hdisk1654    CLOSE   NORMAL   0         0
1        fscsi0/hdisk1655    CLOSE   NORMAL   2         0
2        fscsi1/hdisk1658    CLOSE   NORMAL   0         0
3        fscsi1/hdisk1659    CLOSE   NORMAL   1         0
DEV#: 190 DEVICE NAME: vpath190 TYPE: 2145 POLICY: Optimized
SERIAL: 600507680000009E68000000000007F4
============================================================================
Path#    Adapter/Hard Disk   State   Mode     Select    Errors
0        fscsi0/hdisk1654    OPEN    NORMAL   0         0
1        fscsi0/hdisk1655    OPEN    NORMAL   6336260   0
2        fscsi1/hdisk1658    OPEN    NORMAL   0         0
3        fscsi1/hdisk1659    OPEN    NORMAL   6326954   0
The multipathing software (SDD) recognizes that there is a new device, because at configuration time, it issues an inquiry command and reads the mode pages. However, if the user did not remove the stale configuration data, the Object Data Manager (ODM) entries for the old hdisks and vpaths still remain and confuse the host, because the SCSI ID to device serial number mapping has changed. You can avoid this situation if you remove the hdisk and vpath information from the device configuration database (rmdev -dl vpath189, rmdev -dl hdisk1654, and so forth) before mapping new devices to the host and running discovery. Removing the stale configuration and rebooting the host is the recommended procedure for reconfiguring the VDisks mapped to a host.
Another process that might cause host confusion is expanding a VDisk. The SVC notifies the host through the SCSI check condition "mode parameters changed", but not all hosts are able to automatically discover the change and might confuse LUNs or continue to use the old size. Review the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, SC23-6628, for more details and supported hosts:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156
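A minimal AIX cleanup sketch follows, using the device names from the preceding example. Your vpath and hdisk numbers will differ, and on an SDDPCM-managed (MPIO) host only the hdisk removal applies:
# rmdev -dl vpath189
# rmdev -dl hdisk1654
# rmdev -dl hdisk1655
# rmdev -dl hdisk1658
# rmdev -dl hdisk1659
# cfgmgr              (rediscover the newly mapped VDisks after the stale entries are gone)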
C:\Program Files\IBM\Subsystem Device Driver>datapath query device
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk1 Part0  OPEN    NORMAL  1873173   0
2        Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk1 Part0  OPEN    NORMAL  1884768   0
DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL  1863138   0
2        Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL  1839632   0
If you just quiesce the host I/O and then migrate the VDisks to the new I/O Group, you will get closed offline paths for the old I/O Group and open normal paths to the new I/O Group. However, these devices do not work correctly, and there is no way to remove the stale paths
without rebooting. Note the change in the pathing in Example 9-4 for device 0, SERIAL 60050768018101BF28000000000000A0.
Example 9-4 Windows VDISK moved to new I/O Group dynamically showing the closed offline paths
DEV#: 0 DEVICE NAME: Disk1 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF28000000000000A0
============================================================================
Path#    Adapter/Hard Disk            State    Mode     Select    Errors
0        Scsi Port2 Bus0/Disk1 Part0  CLOSED   OFFLINE  0         0
1        Scsi Port2 Bus0/Disk1 Part0  CLOSED   OFFLINE  1873173   0
2        Scsi Port3 Bus0/Disk1 Part0  CLOSED   OFFLINE  0         0
3        Scsi Port3 Bus0/Disk1 Part0  CLOSED   OFFLINE  1884768   0
4        Scsi Port2 Bus0/Disk1 Part0  OPEN     NORMAL   0         0
5        Scsi Port2 Bus0/Disk1 Part0  OPEN     NORMAL   45        0
6        Scsi Port3 Bus0/Disk1 Part0  OPEN     NORMAL   0         0
7        Scsi Port3 Bus0/Disk1 Part0  OPEN     NORMAL   54        0
DEV#: 1 DEVICE NAME: Disk2 Part0 TYPE: 2145 POLICY: OPTIMIZED
SERIAL: 60050768018101BF280000000000009F
============================================================================
Path#    Adapter/Hard Disk            State   Mode    Select    Errors
0        Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL  0         0
1        Scsi Port2 Bus0/Disk2 Part0  OPEN    NORMAL  1863138   0
2        Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL  0         0
3        Scsi Port3 Bus0/Disk2 Part0  OPEN    NORMAL  1839632   0
To change the I/O Group, you must first flush the cache within the nodes in the current I/O Group to ensure that all data is written to disk. The SVC command line interface (CLI) guide recommends that you suspend I/O operations at the host level.
The recommended way to quiesce the I/O is to take the volume groups offline, remove the saved configuration (AIX ODM) entries, such as hdisks and vpaths, for those that are planned for removal, and then gracefully shut down the hosts. Migrate the VDisk to the new I/O Group and power up the host, which will discover the new I/O Group. If the stale configuration data was not removed prior to the shutdown, remove it from the stored host device databases (such as ODM if it is an AIX host) at this point. For Windows hosts, the stale registry information is normally ignored after a reboot. Performing VDisk migrations in this way prevents stale configuration issues.
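Once the host I/O is quiesced, the I/O Group move itself is a single command. This is a hedged sketch; the VDisk and I/O Group names are hypothetical, and the parameter name differs on some SVC code levels, so confirm it against your CLI guide first:
IBM_2145:ITSOCL1:admin>svctask chvdisk -iogrp io_grp1 s-0-6-4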
parameters for the various storage devices (VDisks on the SVC). There are also algorithms within multipathing software, such as qdepth_enable.
Figure 9-1 IOPS compared to queue depth for 32 VDisk tests on a single host
Figure 9-2 shows another example of queue depth sensitivity for 32 VDisks on a single host.
Figure 9-2 MBps compared to queue depth for 32 VDisk tests on a single host
Persistent reserve refers to a set of Small Computer Systems Interface-3 (SCSI-3) standard commands and command options that provide SCSI initiators with the ability to establish, preempt, query, and reset a reservation policy with a specified target device. The functionality provided by the persistent reserve commands is a superset of the legacy reserve/release commands. The persistent reserve commands are incompatible with the legacy reserve/release mechanism, and target devices can only support reservations from either the legacy mechanism or the new mechanism. Attempting to mix persistent reserve commands with legacy reserve/release commands will result in the target device returning a reservation conflict error.
Legacy reserve and release mechanisms (SCSI-2) reserved the entire LUN (VDisk) for exclusive use down a single path, which prevents access from any other host or even access from the same host utilizing a different host adapter.
The persistent reserve design establishes a method and interface through a reserve policy attribute for SCSI disks, which specifies the type of reservation (if any) that the OS device driver will establish before accessing data on the disk. Four possible values are supported for the reserve policy:
No_reserve: No reservations are used on the disk.
Single_path: Legacy reserve/release commands are used on the disk.
PR_exclusive: Persistent reservation is used to establish exclusive host access to the disk.
PR_shared: Persistent reservation is used to establish shared host access to the disk.
When a device is opened (for example, when the AIX varyonvg command opens the underlying hdisks), the device driver checks the ODM for a reserve_policy and a PR_key_value and opens the device appropriately. For persistent reserve, it is necessary that each host attached to the shared disk use a unique registration key value.
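On AIX with SDDPCM-managed MPIO hdisks, the reserve policy is an hdisk attribute that you can check and change with the standard ODM commands. A short sketch follows; hdisk5 is a hypothetical device, the device must not be in use when it is changed (or use -P and reboot), and the attribute names apply to MPIO/SDDPCM configurations:
# lsattr -El hdisk5 -a reserve_policy -a PR_key_value
# chdev -l hdisk5 -a reserve_policy=no_reserve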
Clearing reserves
It is possible to accidently leave a reserve on the SVC VDisk or even the SVC MDisk during migration into the SVC or when reusing disks for another purpose. There are several tools available from the hosts to clear these reserves. The easiest tools to use are the commands lquerypr (AIX SDD host) and pcmquerypr (AIX SDDPCM host). There is also a Windows SDD/SDDDSM tool, which is menu driven.
The Windows Persistent Reserve Tool is called PRTool.exe and is installed automatically when SDD or SDDDSM is installed: C:\Program Files\IBM\Subsystem Device Driver>PRTool.exe It is possible to clear SVC VDisk reserves by removing all the host-VDisk mappings when SVC code is at 4.1.0 or higher. Example 9-5 shows how to determine if there is a reserve on a device using the AIX SDD lquerypr command on a reserved hdisk.
Example 9-5 The lquerypr command
[root@ktazp5033]/reserve-checker-> lquerypr -vVh /dev/hdisk5
connection type: fscsi0
open dev: /dev/hdisk5
Attempt to read reservation key...
Attempt to read registration keys...
Read Keys parameter
Generation : 935
Additional Length: 32
Key0 : 7702785F
Key1 : 7702785F
Key2 : 770378DF
Key3 : 770378DF
Reserve Key provided by current host = 7702785F
Reserve Key on the device: 770378DF
This example shows that the device is reserved by a different host. The advantage of using the vV parameter is that the full persistent reserve keys on the device are shown, as well as the errors if the command fails. An example of a failing pcmquerypr command to clear the reserve shows the error:
# pcmquerypr -ph /dev/hdisk232 -V
connection type: fscsi0
open dev: /dev/hdisk232
couldn't open /dev/hdisk232, errno=16
Use the AIX include file errno.h to find out what the 16 indicates. This error indicates a busy condition, which can indicate a legacy reserve or a persistent reserve from another host (or this host from a different adapter). However, certain AIX technology levels (TLs) have a diagnostic open issue, which prevents the pcmquerypr command from opening the device to display the status or to clear a reserve. The following hint and tip give more information about the AIX TL levels that break the pcmquerypr command:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&uid=ssg1S1003122&loc=en_US&cs=utf-8&lang=en
9.5.1 AIX
The following topics describe items specific to AIX.
Transaction-based settings
The host attachment scripts, devices.fcp.disk.IBM.rte or devices.fcp.disk.IBM.mpio.rte, set the default attribute values for the SVC hdisks. You can modify these values, but the defaults are an extremely good place to start. There are also HBA parameters that are useful to set for higher performance or for configurations with large numbers of hdisks. All changeable attribute values can be changed using the chdev command on AIX.
AIX settings that can directly affect transaction performance are the queue_depth hdisk attribute and num_cmd_elem in the HBA attributes.
AIX settings, which can directly affect throughput performance with large I/O block size, are the lg_term_dma and max_xfer_size parameters for the fcs device.
Throughput-based settings
In a throughput-based environment, you might want to decrease the queue depth setting to a smaller value than the default from the host attachment. In a mixed application environment, do not lower the num_cmd_elem setting, because other logical drives might need this higher value to perform. In a purely high throughput workload, this value has no effect.
Best practice: The recommended starting values for high throughput sequential I/O environments are lg_term_dma = 0x400000 or 0x800000 (depending on the adapter type) and max_xfer_size = 0x200000.
We recommend that you test your host with the default settings first and then make these possible tuning changes to the host parameters to verify whether the suggested changes actually enhance performance for your specific host configuration and workload.
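The following AIX sketch shows how these attributes can be set with chdev. The device names and the queue_depth and num_cmd_elem values are hypothetical examples, and the -P flag defers the change until the next boot because the devices are normally in use:
# chdev -l hdisk4 -a queue_depth=32 -P
# chdev -l fcs0 -a num_cmd_elem=1024 -a lg_term_dma=0x800000 -a max_xfer_size=0x200000 -P
# shutdown -Fr       (reboot so that the deferred changes take effect)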
Multipathing
When the AIX operating system was first developed, multipathing was not embedded within the device drivers. Therefore, each path to an SVC VDisk was represented by an AIX hdisk. The SVC host attachment script devices.fcp.disk.ibm.rte sets up the predefined attributes within the AIX database for SVC disks, and these attributes have changed with each iteration of host attachment and AIX technology levels. Both SDD and Veritas DMP utilize the hdisks for multipathing control. The host attachment is also used for other IBM storage devices. The Host Attachment allows AIX device driver configuration methods to properly identify and configure SVC (2145), DS6000 (1750), and DS8000 (2107) LUNs:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000106&loc=en_US&cs=utf-8&lang=en
SDD
IBM Subsystem Device Driver (SDD) multipathing software has been designed and updated consistently over the last decade and is an extremely mature multipathing technology. The SDD software also supports many other IBM storage types directly connected to AIX, such as the 2107.
SDD algorithms for handling multipathing have also evolved. There are throttling mechanisms within SDD that controlled overall I/O bandwidth in SDD releases 1.6.1.0 and lower. This throttling mechanism has evolved to be specific to a single vpath and is called qdepth_enable in later releases.
SDD utilizes the persistent reserve functions, placing a persistent reserve on the device in place of the legacy reserve when the volume group is varied on. However, if HACMP is installed, HACMP controls the persistent reserve usage depending on the type of varyon used: enhanced concurrent volume groups (VGs), which use varyonvg -c, have no reserves, while regular VGs, which use varyonvg, utilize the persistent reserve.
Datapath commands are an extremely powerful method for managing the SVC storage and pathing. The output shows the LUN serial number of the SVC VDisk and which vpath and hdisk represent that SVC LUN. Datapath commands can also change the multipath selection algorithm. The default is load balance, but the multipath selection algorithm is programmable. The recommended best practice when using SDD is load balancing with four paths. The datapath query device output shows a somewhat balanced number of selects on each preferred path to the SVC:
DEV#: 12 DEVICE NAME: vpath12 TYPE: 2145 POLICY: Optimized
SERIAL: 60050768018B810A88000000000000E0
====================================================================
Path#    Adapter/Hard Disk   State   Mode    Select    Errors
0        fscsi0/hdisk55      OPEN    NORMAL  1390209   0
1        fscsi0/hdisk65      OPEN    NORMAL  0         0
2        fscsi0/hdisk75      OPEN    NORMAL  1391852   0
3        fscsi0/hdisk85      OPEN    NORMAL  0         0
We recommend that you verify that the selects during normal operation occur on the preferred paths (use datapath query device -l). Also, verify that you have the correct connectivity.
SDDPCM
As Fibre Channel technologies matured, AIX was enhanced by adding native multipathing support called Multipath I/O (MPIO). This structure allows a storage manufacturer to create software plug-ins for its specific storage. The IBM SVC version of this plug-in is called SDDPCM, which requires a host attachment script called devices.fcp.disk.ibm.mpio.rte:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&dc=D410&q1=host+attachment&uid=ssg1S4000203&loc=en_US&cs=utf-8&lang=en

SDDPCM and AIX MPIO have been continually improved since their release. We recommend that you stay at the latest release levels of this software. The preferred path indicator for SDDPCM does not display until after the device has been opened for the first time, which differs from SDD, which displays the preferred path immediately after the device is configured.

SDDPCM features four types of reserve policies:
- No_reserve policy
- Exclusive host access single path policy
- Persistent reserve exclusive host policy
- Persistent reserve shared host access policy

The usage of the persistent reserve now depends on the hdisk attribute reserve_policy. Change this policy to match your storage security requirements.

There are three path selection algorithms:
- Failover
- Round-robin
- Load balancing

The latest SDDPCM code, 2.1.3.0 and later, has improvements in failed path reclamation by a health checker, a failback error recovery algorithm, Fibre Channel dynamic device tracking, and support for SAN boot devices on MPIO-supported storage devices.
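As a sketch of how these hdisk attributes can be inspected and changed with AIX commands (hdisk2 is a hypothetical disk name; the disk must be closed, for example by varying off its volume group, before chdev will succeed):

lsattr -El hdisk2 -a reserve_policy -a algorithm
chdev -l hdisk2 -a reserve_policy=no_reserve
chdev -l hdisk2 -a algorithm=load_balance

The first command shows the current reserve policy and path selection algorithm; the chdev commands set a no-reserve policy and the load balancing algorithm. Choose the values that match your own storage security and performance requirements.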
SDDPCM pathing
SDDPCM pcmpath commands are the best way to understand configuration information about the SVC storage allocation. The following example shows how much can be determined from this command, pcmpath query device, about the connections to the SVC from this host.
DEV#:   0  DEVICE NAME: hdisk0   TYPE: 2145   ALGORITHM: Load Balance
SERIAL: 6005076801808101400000000000037B
======================================================================
Path#    Adapter/Path Name    State     Mode      Select    Errors
    0       fscsi0/path0       OPEN    NORMAL    155009          0
    1       fscsi1/path1       OPEN    NORMAL    155156          0

In this example, both paths are being used for the SVC connections. These select counts are not the normal counts for a properly mapped SVC VDisk, and two paths are not an adequate number of paths. Use the -l option on pcmpath query device to check whether these paths are both preferred paths. If they are, one SVC node must be missing from the host view. Using the -l option shows an asterisk on both paths, indicating that only a single node is visible to the host (and it is the non-preferred node for this VDisk):

Path#    Adapter/Path Name    State     Mode      Select    Errors
    0*      fscsi0/path0       OPEN    NORMAL      9795          0
    1*      fscsi1/path1       OPEN    NORMAL      9558          0
This information indicates a problem that needs to be corrected. If zoning in the switch is correct, perhaps this host was rebooted while one SVC node was missing from the fabric.
Veritas
Veritas DMP multipathing is also supported for the SVC. Veritas DMP multipathing requires certain AIX APARs and the Veritas Array Support Library (ASL). It also requires a certain version of the host attachment script devices.fcp.disk.ibm.rte to recognize the 2145 devices as hdisks rather than MPIO hdisks. In addition to the normal ODM databases that contain hdisk attributes, there are several Veritas locations that contain configuration data:
- /dev/vx/dmp
- /dev/vx/rdmp
- /etc/vxX.info

Storage reconfiguration of VDisks presented to an AIX host requires cleanup of the AIX hdisks and these Veritas entries.
There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI hdisks and logical volume (LV) VSCSI hdisks.
PV VSCSI hdisks are entire LUNs from the VIOS point of view, and if you are concerned about failure of a VIOS and have configured redundant VIOSs for that reason, you must use PV VSCSI hdisks. So, PV VSCSI hdisks are entire LUNs that are VDisks from the virtual I/O client (VIOC) point of view. An LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM volume groups (VGs) on the VIOS and cannot span PVs in that VG, nor be striped LVs. Due to these restrictions, we recommend using PV VSCSI hdisks. Multipath support for SVC attachment to the Virtual I/O Server is provided by either SDD or MPIO with SDDPCM. Where Virtual I/O Server SAN boot or dual Virtual I/O Server configurations
are required, only MPIO with SDDPCM is supported. We recommend using MPIO with SDDPCM due to this restriction with the latest SVC-supported levels as shown by:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_Virtual_IO_Server
Details of the Virtual I/O Server-supported environments are at: http://www14.software.ibm.com/webapp/set2/sas/f/vios/home.html There are many questions answered on the following Web site for usage of the VIOS: http://www14.software.ibm.com/webapp/set2/sas/f/vios/documentation/faq.html One common question is how to migrate data into a VIO environment or how to reconfigure storage on a VIOS. This question is addressed in the previous link. Many clients want to know if SCSI LUNs can be moved between the physical and virtual environment as is. That is, given a physical SCSI device (LUN) with user data on it that resides in a SAN environment, can this device be allocated to a VIOS and then provisioned to a client partition and used by the client as is? The answer is no, this function is not supported at this time. The device cannot be used as is. Virtual SCSI devices are new devices when created, and the data must be put on them after creation, which typically requires a type of backup of the data in the physical SAN environment with a restoration of the data onto the VDisk.
A quick and simple method to determine whether a backup and restoration is necessary is to run the command lquerypv -h /dev/hdisk## 80 10 to read the PVID off the disk. If the PVID differs between the VIOS and the VIOC, you must use backup and restore.
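As a hedged illustration (the hdisk numbers are hypothetical; use the hdisk that backs the virtual device on the VIOS and the corresponding hdisk on the VIOC):

lquerypv -h /dev/hdisk10 80 10     (on the VIOS, against the backing hdisk)
lquerypv -h /dev/hdisk2 80 10      (on the VIOC, against the virtual SCSI hdisk)

Compare the PVID field at offset 0x80 in the two outputs; if the values differ, the data cannot be used as is and must be migrated with a backup and restore.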
9.5.4 Windows
There are two multipathing driver options released for Windows 2003 Server hosts. Windows 2003 Server device driver development has concentrated on the storport.sys driver, which has significant interoperability differences from the older scsiport driver set. Additionally, Windows has released a native multipathing I/O option with a storage-specific plug-in. SDDDSM was designed to support these newer methods of interfacing with Windows 2003 Server. In order to release new enhancements more quickly, the newer hardware architectures (64-bit EM64T and so forth) are only tested on the SDDDSM code stream; therefore, only SDDDSM packages are available for them. The older version of the SDD multipathing driver works with the scsiport drivers. This version is required for Windows 2000 Server hosts, because storport.sys is not available there. The SDD software is also available for Windows 2003 Server hosts when the scsiport HBA drivers are used.
Future enhancements will concentrate on SDDDSM within the Windows MPIO framework.
Tunable parameters
With Windows operating systems, the queue depth settings are the responsibility of the host adapters and are configured through the HBA BIOS settings. Configuring the queue depth settings varies from vendor to vendor. Refer to your manufacturer's instructions about how to configure your specific cards and to the IBM System Storage SAN Volume Controller Host Attachment User's Guide Version 4.3.0, SC26-7905-03, at:
http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159

Queue depth is also influenced by the Windows application program, which controls how many I/O commands it allows to be outstanding before waiting for their completion. For IBM FAStT FC2-133 (and other QLogic-based HBAs), the queue depth is known as the execution throttle, which can be set with either the QLogic SANSurfer tool or in the BIOS of the QLogic-based HBA by pressing Ctrl+Q during the startup process.
9.5.5 Linux
IBM has decided to transition SVC multipathing support from IBM SDD to the Linux native DM-MPIO multipathing. Refer to the V4.3.0 - Recommended Software Levels for SAN Volume Controller page to see which versions of each Linux kernel require SDD or DM-MPIO support:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278

If your kernel is not listed as supported, contact your IBM marketing representative to request a Request for Price Quotation (RPQ) for your specific configuration. Linux clustering is not supported, and the Linux OS does not use the legacy reserve function; therefore, no persistent reserves are used in Linux. Contact IBM marketing for RPQ support if you need Linux clustering in your specific environment.
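For kernels that use DM-MPIO, the SVC is typically described by a device stanza in /etc/multipath.conf. The following stanza is only a minimal sketch of what such an entry might look like; attribute names and recommended values vary by distribution and multipath-tools version, so always take the actual settings from the SVC host attachment documentation for your specific Linux level:

devices {
    device {
        vendor                "IBM"
        product               "2145"
        path_grouping_policy  group_by_prio
        failback              immediate
        no_path_retry         5
    }
}

The path_grouping_policy value groups paths by priority so that I/O is sent to the SVC preferred node, and failback returns I/O to the preferred paths as soon as they recover.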
Tunable parameters
Linux performance is influenced by HBA parameter settings and queue depth. Queue depth for Linux servers can be determined by using the formula specified in the IBM System Storage SAN Volume Controller V4.3.0 - Software Installation and Configuration Guide, SC23-6628-02, at: http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002156 Refer to the settings for each specific HBA type and general Linux OS tunable parameters in the IBM System Storage SAN Volume Controller V4.3.0 - Host Attachment Guide, SC26-7905-03, at: http://www-1.ibm.com/support/docview.wss?uid=ssg1S7002159 In addition to the I/O and OS parameters, Linux also has tunable file system parameters. You can use the command tune2fs to increase file system performance based on your specific configuration. The journal mode and size can be changed. Also, the directories can be indexed. Refer to the following open source document for details: http://swik.net/how-to-increase-ext3-and-reiserfs-filesystems-performance
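As a hedged sketch of the tune2fs options mentioned above (/dev/sdb1 is a hypothetical ext3 file system; test any change on a non-production file system first):

tune2fs -O dir_index /dev/sdb1                # enable hashed b-tree directory indexing
tune2fs -o journal_data_writeback /dev/sdb1   # change the default journaling mode to writeback
tune2fs -l /dev/sdb1                          # list the superblock settings to verify the changes

Running e2fsck -D against the unmounted file system afterward rebuilds the indexes for existing directories.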
9.5.6 Solaris
There are several options for multipathing support on Solaris hosts. Depending on the OS levels in the latest SVC software level matrix, you can choose among IBM SDD, Symantec/Veritas Volume Manager, and Solaris MPxIO. SAN boot support and clustering support are available with Symantec/Veritas Volume Manager, and SAN boot support is also available with MPxIO.
Solaris MPxIO
Releases of SVC code prior to 4.3.0 did not support load balancing with the MPxIO software. Configure your SVC host object with the type attribute set to tpgs if you want to run MPxIO on your Sun SPARC host. For example:

svctask mkhost -name new_name_arg -hbawwpn wwpn_list -type tpgs

In this command, -type specifies the type of host. Valid entries are hpux, tpgs, or generic. The tpgs option enables extra target port unit attentions. The default is generic.
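For example, a host definition for a Sun host might be created and then verified as follows (the host name and WWPNs shown are purely hypothetical placeholders):

svctask mkhost -name sunhost01 -hbawwpn 210000E08B05A123:210100E08B25A123 -type tpgs
svcinfo lshost sunhost01

The second command displays the host object so that you can confirm the host type and the logged-in WWPNs.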
Use the following commands to determine the basic configuration of a Symantec/Veritas server:
- pkginfo -l (lists all installed packages)
- showrev -p | grep vxvm (obtains the version of the volume manager)
- vxddladm listsupport (shows which ASLs are configured)
- vxdisk list
- vxdmpadm listctlr all (shows all attached subsystems and provides a type where possible)
- vxdmpadm getsubpaths ctlr=cX (lists paths by controller)
- vxdmpadm getsubpaths dmpnodename=cXtXdXs2 (lists paths by LUN)

The following commands determine whether the SVC is properly connected and show at a glance which ASL library is used (native DMP ASL or SDD ASL). Here is an example of what you see when the Symantec volume manager correctly sees our SVC using the SDD pass-through mode ASL:

# vxdmpadm listenclosure all
ENCLR_NAME       ENCLR_TYPE     ENCLR_SNO           STATUS
============================================================
OTHER_DISKS      OTHER_DISKS    OTHER_DISKS         CONNECTED
VPATH_SANVC0     VPATH_SANVC    0200628002faXX00    CONNECTED

Here is an example of what we see when the SVC is configured using the native DMP ASL:

# vxdmpadm listenclosure all
ENCLR_NAME       ENCLR_TYPE     ENCLR_SNO           STATUS
============================================================
OTHER_DISKS      OTHER_DISKS    OTHER_DISKS         CONNECTED
SAN_VC0          SAN_VC         0200628002faXX00    CONNECTED
Following the installation of a new ASL using pkgadd, you need to either reboot or issue vxdctl enable. To list the ASLs that are active, run vxddladm listsupport.
9.5.7 VMware
Review the V4.3.0 - Recommended Software Levels for SAN Volume Controller Web site for the various ESX levels that are supported:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003278#_VMWare

To get continued support, you must run at least the minimum supported VMware level; for example, if you use level 3.01, you must upgrade it to a minimum VMware level of 3.02. For more details, contact your IBM marketing representative and ask about the submission of an RPQ for support. The necessary patches and the procedures to apply them will be supplied after the specific configuration has been reviewed and approved.
9.7 Monitoring
A consistent set of monitoring tools is available when IBM SDD, SDDDSM, and SDDPCM are used for the multipathing software on the various OS environments. Examples earlier in this chapter showed how the datapath query device and datapath query adapter commands can be used for path monitoring. Path performance can also be monitored via datapath commands: datapath query devstats (or pcmpath query devstats) The datapath query devstats command shows performance information for a single device, all devices, or a range of devices. Example 9-6 shows the output of datapath query devstats for two devices.
Example 9-6 The datapath query devstats command output
C:\Program Files\IBM\Subsystem Device Driver>datapath query devstats

Total Devices : 2

Device #: 0
=============
                  I/O:        SECTOR:     Transfer Size:
Total Read      1755189      14168026     <= 4k         2337858
Total Write     1749581     153842715     > 64K               0
Active Read           0             0
Active Write          0             0
Maximum               3           256

Device #: 1
=============
                  I/O:        SECTOR:     Transfer Size:
Total Read     20353800     162956588     <= 512            296
Total Write     9883944     451987840     <= 4k        27128331
Active Read           0             0     <= 16K            215
Active Write          1           128     <= 64K        3108902
Maximum               4           256     > 64K               0
Also, an adapter level statistics command is available: datapath query adaptstats (also mapped to pcmpath query adaptstats). Refer to Example 9-7 for a two adapter example.
Example 9-7 The datapath query adaptstats output
C:\Program Files\IBM\Subsystem Device Driver>datapath query adaptstats

Adapter #: 0
=============
                  I/O:        SECTOR:
Total Read     11048415      88512687
Total Write     5930291     317726325
Active Read           0             0
Active Write          1           128
Maximum               2           256

Adapter #: 1
=============
                  I/O:        SECTOR:
Total Read     11060574      88611927
Total Write     5936795     317987806
Active Read           0             0
Active Write          0             0
Maximum               2           256
It is possible to clear these counters so that you can script the usage to cover a precise amount of time. The commands also allow you to choose devices to return as a range, single device, or all devices. The command to clear the counts is datapath clear device count.
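A simple measurement sequence, using only the commands shown above, might look like this (run it on the host at the start and end of the interval that you want to measure):

datapath clear device count
(run the workload, or wait for the measurement interval to elapse)
datapath query devstats
datapath query adaptstats

The first command resets the per-device counters; the two query commands then report only the I/O that occurred during the interval.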
Chapter 10. Applications
In this chapter, we provide information about laying out storage for the best performance for general applications, IBM AIX Virtual I/O (VIO) servers, and IBM DB2 databases specifically. While most of the specific information is directed to hosts running the IBM AIX operating system, the information is also relevant to other host types.
Generally, a smaller number of physical drives is needed to reach adequate I/O performance than with transaction-based workloads. For instance, 20 - 28 physical drives are normally enough to reach maximum I/O throughput rates with the IBM System Storage DS4000 series of storage subsystems. In a throughput-based environment, read operations make use of the storage subsystem cache to stage greater chunks of data at a time to improve the overall performance. Throughput rates are heavily dependent on the storage subsystem's internal bandwidth; newer storage subsystems with broader bandwidths are able to reach higher throughput rates.
Use SVC striped VDisks over several MDisks in an MDG in order to get the best performance and to eliminate trouble spots or hot spots.
back-end storage configurations and removes a significant data layout burden from the storage administrators.

Consider where the failure boundaries are in the back-end storage and take this into consideration when locating application data. A failure boundary is defined as what will be affected if we lose a RAID array (an SVC MDisk): all the VDisks and servers striped on that MDisk will be affected, together with all other VDisks in that MDG. Consider also that spreading the I/Os evenly across the back-end storage has both a performance benefit and a management benefit.

We recommend that an entire set of back-end storage is managed together, considering the failure boundary. If a company has several lines of business (LOBs), it might decide to manage the storage along each LOB so that each LOB has a unique set of back-end storage. So, for each set of back-end storage (a group of MDGs, or perhaps better, just one MDG), we create only striped VDisks across all the back-end storage arrays, which is beneficial because the failure boundary is limited to a LOB, and performance and storage management are handled as a unit for the LOB independently. What we do not recommend is to create striped VDisks that are striped across different sets of back-end storage, because doing so makes the failure boundaries difficult to determine, unbalances the I/O, and might limit the performance of those striped VDisks to the slowest back-end device.

For SVC configurations where SVC image mode VDisks must be used, we recommend that the back-end storage configuration for the database consists of one LUN (and therefore one image mode VDisk) per array, or an equal number of LUNs per array, so that the Database Administrator (DBA) can guarantee that the I/O workload is distributed evenly across the underlying physical disks of the arrays. Refer to Figure 10-2 on page 214.

Use striped mode VDisks for applications that do not already stripe their data across physical disks. Striped VDisks are the all-purpose VDisks for most applications. Use striped mode VDisks if you need to manage a diversity of growing applications and balance the I/O performance based on probability. If you understand your application storage requirements, you might take an approach that explicitly balances the I/O rather than a probabilistic approach. However, explicitly balancing the I/O requires either testing or a good understanding of the application and of the storage mapping and striping to know which approach works better. Examples of applications that stripe their data across the underlying disks are DB2, GPFS, and Oracle ASM. These types of applications might require additional data layout considerations, as described in 10.4, When the application does its own balancing of I/Os on page 216.
How the partitions are selected for use and laid out can vary from system to system. In all cases, you need to ensure that spreading the partitions is done in a manner to achieve maximum I/Os available to the logical drives in the group. Generally, large volumes are built across a number of different logical drives to bring more resources to bear. You must be careful when selecting logical drives when you do this in order to not use logical drives that will compete for resources and degrade performance.
Note that if we use SMS file system directories, it is important to have one file system (and underlying LV) per container; that is, do not put two SMS file system directory containers in the same file system. Also, for DMS file system files, it is important to have just one file per file system (and underlying LV) per container; in other words, we have only one container per LV. The reason for these restrictions is that we do not have control over where each container resides in the LV; thus, we cannot ensure that the LVs are balanced across physical disks. The simplest way to think of DB2 data layout is to assume that we are using many disks and that we create one container per disk. In general, each container has the same sustained IOPS bandwidth and resides on a set of physically independent physical disks, because each container will be accessed equally by DB2 agents.

DB2 also has multiple types of tablespaces and storage uses. For example, tablespaces can be created separately for table data, indexes, and DB2 temporary work areas. The principles of storage design for even I/O balancing among tablespace containers apply to each of these tablespace types. Furthermore, containers for different tablespace types can be shared on the same array, thus allowing all database objects an equal opportunity to use all of the I/O performance of the underlying storage subsystem and disks. Also note that different options can be used for each container type; for example, DMS file containers might be used for data tablespaces, and SMS file system directories might be used for DB2 temporary tablespace containers.

DB2 connects physical storage to DB2 tables and database structures through the use of DB2 tablespaces. Collaboration between a DB2 DBA and the AIX administrator (or storage administrator) to create the DB2 tablespace definitions can ensure that the guidance provided for the database storage design is implemented for optimal I/O performance of the storage subsystem by the DB2 database. Use of Automatic Storage bypasses LVM entirely; in this case, DB2 uses disks directly for containers, so each disk must have similar IOPS characteristics. We do not describe this option here.
A second approach to growth is to add another array, the LUNs, and the LVs, and allow DB2 to rebalance the data across the containers. This approach also increases the IOPS capacity available to DB2. A third approach to growth is to add one or two disks to each RAID array (for disk subsystems that support dynamic RAID array expansion), which increases IOPS bandwidth.

For DB2 data warehouses, or extremely high bandwidth DB2 databases on the SVC, utilizing sequential mode VDisks and DB2-managed striping might be preferred. But for other general applications, we generally recommend using striped VDisks to balance the I/Os. This recommendation also has the advantage of eliminating LVM data layout as an issue. We also recommend using SDDPCM instead of IBM Subsystem Device Driver (SDD). Growth can be handled for general applications by dynamically increasing the size of the VDisk and then using chvg -g for LVM to see the increased size, as shown in the sketch that follows. For DB2, growth can be handled by adding another container (a sequential or image mode VDisk) and allowing DB2 to restripe the data across the VDisks.
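A minimal sketch of this growth path for a general application, assuming a VDisk named app_vdisk01 mapped to an AIX host and belonging to volume group datavg (both names are hypothetical):

svctask expandvdisksize -size 10 -unit gb app_vdisk01
cfgmgr
chvg -g datavg

The first command (run on the SVC) grows the VDisk by 10 GB; cfgmgr and chvg -g (run on the AIX host) rediscover the device and make the additional capacity visible to the volume group, after which the file systems can be extended as needed.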
10.5 Data layout with the AIX virtual I/O (VIO) server
The purpose of this section is to describe strategies to get the best I/O performance by evenly balancing I/Os across physical disks when using the VIO Server.
10.5.1 Overview
In setting up storage at a VIO server (VIOS), a broad range of possibilities exists for creating VDisks and serving them up to VIO clients (VIOCs). The obvious consideration is to create sufficient storage for each VIOC. Less obvious, but equally important, is getting the best use of that storage: performance and availability are of paramount importance. There are typically internal Small Computer System Interface (SCSI) disks (usually used for the VIOS operating system) and SAN disks. Availability for disk is usually handled by RAID on the SAN or by SCSI RAID adapters on the VIOS. We assume here that any internal SCSI disks are used for the VIOS operating system and possibly for the VIOCs' operating systems. Furthermore, we assume that the applications are configured so that only limited I/O will occur to the internal SCSI disks on the VIOS and to the VIOCs' rootvgs. If you expect your rootvg to have a significant IOPS rate, you can configure it in the same fashion as we recommend for other application VGs later.
VIOS restrictions
There are two types of VDisks that you can create on a VIOS: physical volume (PV) VSCSI hdisks and logical volume (LV) VSCSI hdisks. PV VSCSI hdisks are entire LUNs from the VIOS point of view, and if you are concerned about failure of a VIOS and have configured redundant VIOSs for that reason, you must use PV VSCSI hdisks. So, PV VSCSI hdisks are entire LUNs that are VDisks from the VIOC point of view. An LV VSCSI hdisk cannot be served up from multiple VIOSs. LV VSCSI hdisks reside in LVM VGs on the VIOS and cannot span PVs in that VG, nor be striped LVs.
Chapter 11. Monitoring
The SAN Volume Controller (SVC) provides a range of data about how it performs and also about the performance of other components of the SAN. When you properly monitor performance, having the SVC in the SAN makes it easier to recognize and fix faults and performance problems. In this chapter, we first describe how to collect SAN topology and performance information using TotalStorage Productivity Center (TPC). We then show several examples of misconfiguration and failures, and how they can be identified in the TPC Topology Viewer and performance reports. Finally, we describe how to monitor the SVC error log effectively by using the e-mail notification function. The examples in this chapter were taken from TotalStorage Productivity Center (TPC) V3.3.2.79, which was released in June 2008 to support SVC 4.3. You must always use the latest version of TPC that is supported by your SVC code; TPC is often updated to support new SVC features. If you have an earlier version of TPC installed, you might still be able to reproduce the reports described here, but certain data might not be available.
2. When you click Save, TPC will validate the information that you have provided by testing the connection to the CIMOM. If there is an error, an alert will pop up, and you must correct the error before you can save the configuration again.
3. After the connection has been successfully configured, TPC must run a CIMOM Discovery (under Administrative Services → Discovery → CIMOM) before you can set up performance monitoring or before the SVC cluster will appear in the Topology Viewer.

Note: The SVC Config Node (which owns the IP address for the cluster) has a 10-session Secure Shell (SSH) limit. TPC will use one of these sessions while interacting with the SVC. You can read more information about the session limit in 3.2.1, SSH connection limitations on page 42.
Information: When we cabled our SVC, we intended to connect ports 1 and 3 to one switch (IBM_2109_F32) and ports 2 and 4 to the other switch (swd77). We thought that we were really careful about labeling our cables and configuring our ports. TPC showed us that we did not configure the ports this way and that we made two mistakes. Figure 11-2 shows that we:
- Correctly configured all four nodes with port 1 to switch IBM_2109_F32
- Correctly configured all four nodes with port 2 to switch swd77
- Incorrectly configured two nodes with port 3 to switch swd77
- Incorrectly configured two nodes with port 4 to switch IBM_2109_F32
Figure 11-2 Checking the SVC ports to ensure they are connected to the SAN fabric
TPC can also show us where our host and storage are in our fabric and which switches the I/Os will go through when I/Os are generated from the host to the SVC or from the SVC to the storage controller. For redundancy, all storage controllers must be connected to at least two fabrics, and those same fabrics must be the fabrics to which the SVC is connected. Figure 11-3 on page 225 shows our DS4500 is also connected to fabrics FABRIC-2GBS and FABRIC-4GBS as we planned. Information: Our DS4500 was shared with other users, so we were only able to use two ports of the available four ports. The other two ports were used by a different SAN infrastructure.
Figure 11-2 on page 224 shows an example where all the SVC ports are connected and the switch ports are healthy. Figure 11-4 on page 226 shows an example where the SVC ports are not healthy. In this example, the two ports that have a black line drawn between the switch and the SVC node port are in fact down. Because TPC knew where these two ports were connected on a previous probe (and, thus, they were previously shown with a green line), the probe discovered that these ports were no longer connected, which resulted in the green line becoming a black line.
If these ports had never been connected to the switch, no lines would show for them, and we would only see six of the eight ports connected to the switch.
Our SVC will also be used in a Metro Mirror and Global Mirror relationship with another SVC cluster. In order for this to be a supported configuration, we must make sure that every SVC node in this cluster is zoned so that it can see every port in the remote cluster. In each fabric, we made a zone set called SVC_MM_NODE with all the node ports for all of the SVC nodes. We can check each SVC to make sure that all of its ports are in fact in this zone set. Figure 11-6 on page 228 shows that we have correctly configured all ports for the SVC cluster ITSO_CL1.
Figure 11-7 Verifying the health between two objects in the SVC
Figure 11-8 Kanaga has two HBAs but is only zoned into one fabric
Using the Fabric Manager component of TPC, we can quickly fix this situation. The fixed results are shown in Figure 11-9 on page 231.
You can also use the Data Path Viewer in TPC to confirm path connectivity between a disk that an operating system sees and the VDisk that the SVC provides. Figure 11-10 on page 232 shows two diagrams for the path information relating to host KANAGA:
- The top (left) diagram shows the path information before we fixed our zoning configuration. It confirms that KANAGA has only one path to the SVC VDisk vdisk4. Figure 11-8 on page 230 confirmed that KANAGA has two HBAs and that they are connected to our SAN fabrics. From this panel, we can deduce that our problem is likely to be a zoning configuration problem.
- The lower (right) diagram shows the result after the zoning was fixed.

Figure 11-10 on page 232 does not show that you can hover over each component to also get health and performance information, which might also be useful when you perform problem determination and analysis.
Thus, traffic between a host, the SVC nodes, and a storage controller goes through these paths:
1. The host generates the I/O and transmits it on the fabric.
2. The I/O is received on the SVC node ports.
3. If the I/O is a write I/O:
   a. The SVC node writes the I/O to the SVC node cache.
   b. The SVC node sends a copy to its partner node to write to the partner node's cache.
   c. If the I/O is part of a Metro Mirror or Global Mirror relationship, a copy needs to go to the target VDisk of the relationship.
   d. If the I/O is part of a FlashCopy and the FlashCopy block has not been copied to the target VDisk, this action needs to be scheduled.
4. If the I/O is a read I/O:
   a. The SVC checks the cache to see whether the read I/O is already there.
   b. If the I/O is not in the cache, the SVC reads the data from the physical LUNs.
5. At some point, write I/Os are sent to the storage controller.
6. The SVC might also do some read-ahead I/Os to load the cache in case the next read I/O from the host is the next block.

TPC can help you report on most of these steps so that it is easier to identify where a bottleneck might exist.
An important metric in this report is the CPU utilization (in dark blue). The CPU utilization reports give you an indication of how busy the cluster CPUs are. A continually high CPU utilization rate indicates a busy cluster. If the CPU utilization remains constantly high, it might be time to grow the cluster by adding more resources. You can add cluster resources by adding another I/O Group (two nodes) to the cluster, up to the maximum of four I/O Groups per cluster. After there are four I/O Groups in a cluster and high CPU utilization is still indicated in the reports, it is time to build a new cluster and consider either migrating part of the storage to the new cluster or servicing new storage requests from it. We recommend that you plan additional resources for the cluster if your CPU utilization indicates a workload continually above 70%.

The cache memory resource reports provide an understanding of the utilization of the SVC cache. These reports give you an indication of whether the cache is able to service and buffer the current workload. In Figure 11-11, you will notice an increase in the Write-cache Delay Percentage and Write-cache Flush Through Percentage and a drop in the Write-cache Hits Percentage, Read Cache Hits, and Read-ahead percentage of cache hits. This change is noted about halfway through the graph. This change in these performance metrics, together with an increase in back-end response time, shows that the storage controller is heavily burdened with I/O, and at this time interval, the SVC cache is probably full of outstanding write I/Os. (We expected this result with our test run.) Host I/O activity will now be impacted by the backlog of data in the SVC cache and by any other SVC workload that is happening on the same MDisks (FlashCopy and Global/Metro Mirror).
If cache utilization is a problem, you can add additional cache to the cluster by adding an I/O Group and moving VDisks to the new I/O Group.
Figure 11-12 Port receive and send data rate for each I/O Group
Figure 11-12 and Figure 11-13 on page 238 show two versions of port rate reports. Figure 11-12 shows the overall SVC node port rates for send and receive traffic. With a 2 Gb per second fabric, these rates are well below the throughput capability of this fabric, and thus, the fabric is not a bottleneck here. Figure 11-13 on page 238 shows the port traffic broken down into host, node, and disk traffic. During our busy time as reported in Figure 11-11 on page 236, we can see that host port traffic drops while disk port traffic continues. This information indicates that the SVC is communicating with the storage controller, possibly flushing outstanding I/O write data in the cache and performing other non-host functions, such as FlashCopy and Metro Mirror and Global Mirror copy synchronization.
Figure 11-13 Total port to disk, host, and local node report
Figure 11-14 on page 239 shows an example TPC report looking at port rates between the SVC nodes, hosts, and disk storage controllers. This report shows low queue and response times, indicating that the nodes do not have a problem communicating with each other. If this report showed unusually high queue times and high response times, our write activity would be affected (because each node communicates with each other node over the fabric). Unusually high numbers in this report indicate:
- An SVC node or port problem (unlikely)
- Fabric switch congestion (more likely)
- Faulty fabric ports or cables (most likely)
Figure 11-14 Port to local node send and receive response and queue times
In Figure 11-15 on page 239, we see an unusual spike in back-end response time for both read and write operations, and this spike is consistent for both of our I/O Groups. This report confirms that we are receiving poor response from our storage controller and explains our lower than expected host performance. Our cache resource reports (in Figure 11-11 on page 236) also show an unusual pattern in cache usage during the same time interval. Thus, we can attribute the cache behavior to the poor back-end response time that the SVC is receiving from the storage controller. The cause of this poor response time must be investigated using all available information from the SVC and the back-end storage controller. Possible causes, which might be visible in the storage controller management tool, include:
- Physical drive failure, which can lead to an array rebuild that drives internal read/write workload in the controller while the rebuild is in progress. If the array rebuild is causing poor latency, it might be desirable to adjust the array rebuild priority to lessen the load. However, this priority must be balanced against the increased risk of a second drive failure during the rebuild, which can cause data loss in a Redundant Array of Independent Disks 5 (RAID 5) array.
- Cache battery failure, which can lead to the cache being disabled by the controller and can usually be resolved simply by replacing the failed battery.
By including a VDisk on a report, together with the LUNs from the storage controllers (which in turn are the MDisks over which the VDisks can be striped), you can see the performance that a host is receiving (through the VDisks) together with the impact on the storage controller (through the LUNs). Figure 11-16 shows a VDisk named IOTEST and the associated LUNs from our DS4000 storage controller. We can see which of the LUNs are being used while IOTEST is being used.
- Script: The script option enables you to run a defined set of commands that might help address this event, for example, simply opening a trouble ticket in your helpdesk ticket system.
- Notification by e-mail: TPC will send an e-mail to each person listed.
The graph from this report will show whether one MDisk group performs significantly worse than the other MDisk group. If there is a gap between the two MDisk groups, consider taking steps to avoid an adverse performance impact, which might include:
- Migrating other, non-mirrored VDisks from the poorly performing MDisk group to allow more bandwidth for the mirrored VDisks' I/O
- Migrating one of the mirrored VDisk copies to another MDisk group with spare performance capacity
- Accepting the current performance if the slower of the two MDisk groups is still reasonable
IBM_2145:itsosvccl1:admin>svcinfo lscluster itsosvccl1
email_server 9.43.86.82
email_server_port 25
email_reply noone@uk.ibm.com

IBM_2145:itsosvccl1:admin>svcinfo lsemailuser
id  name         address           user_type  err_type  inventory
0   admin_email  noone@uk.ibm.com  local      all       off
IBM_2145:itsosvccl1:admin>svctask testemail admin_email
CMMVC6280E Sendmail error EX_TEMPFAIL. The sendmail command could not create a connection to a remote system.

Possible causes include:
- Ethernet connectivity issues between the SVC cluster and the mail server. For example, the SVC might be behind a firewall protecting the data center network, or even on a separate network segment that has no access to the mail server. As with the Master Console or System Storage Productivity Center (SSPC), the mail server must be accessible by the SVC. SMTP uses TCP port 25 (unless you have configured an alternative port); if there is a firewall, enable this port outbound from the SVC.
- Mail server relay blocking. Many administrators implement filtering rules to prevent spam, which is particularly likely if you are sending e-mail to a user who is on a different mail server or outside of the mail server's own network. On certain platforms, the default configuration prevents mail forwarding to any other machine. Check the mail server log to see whether it is rejecting mail from the SVC. If it is, the mail server administrator must adjust the configuration to allow the forwarding of these e-mails.
- An invalid FROM address. Certain mail servers will reject e-mail if no valid FROM address is included. The SVC takes this FROM address from the email_reply field of lscluster. Therefore, make sure that a valid reply-to address is specified when setting up e-mail. You can change the reply-to address by using the command svctask chemail -reply address.

If you cannot find the cause of the e-mail failure, contact your IBM service support representative (IBM SSR).
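For example, correcting the reply-to address and then retesting the notification path might look like this (the address shown is a hypothetical placeholder):

svctask chemail -reply storage.admin@example.com
svctask testemail admin_email

The first command sets a FROM/reply-to address that your mail server will accept; the second resends the test e-mail to the defined e-mail user.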
Chapter 12. Maintenance
As with any piece of enterprise storage equipment, the IBM SAN Volume Controller (SVC) is not a completely hands-off device. It requires configuration changes to meet growing needs, updates to software for enhanced performance, features, and reliability, and the tracking of all the data that you used to configure your SVC.
12.1.1 SAN
Tracking how your SAN is configured is extremely important.
SAN diagram
The most basic piece of SAN documentation is the SAN diagram. If you ever call IBM Support asking for help with your SAN, you can be sure that the SAN diagram is likely to be one of the first things that you are asked to produce. Maintaining a proper SAN diagram is not as difficult as it sounds. It is not necessary for the diagram to show every last host and the location of every last port; this information is more properly collected (and easier to read) in other places. To understand how difficult an overly detailed diagram is to read, refer to Figure 12-1 on page 247.
Instead, a SAN diagram must only include every switch, every storage device, all inter-switch links (ISLs), along with how many there are, and a representation of which switches have hosts connected to them. An example is shown in Figure 12-2 on page 248. In larger SANs with many storage devices, the diagram can still be too large to print without a large-format printer, but it can still be viewed on a panel using the zoom feature. We suggest a tool, such as Microsoft Visio, to create your diagrams. Do not worry about finding fancy stencils or official shapes, because your diagram does not need to show exactly into which port everything is plugged. You can use your port inventory for that. Your diagram can be appropriately simple. You will notice that our sample diagram just uses simple geometric shapes and standard stencils to represent a SAN. Note: These SAN diagrams are just sample diagrams. They do not necessarily depict a SAN that you actually want to deploy.
Port inventory
Along with the SAN diagram, an inventory of what is supposed to be plugged in where is also quite important. Again, you can create this inventory manually or generate it with automated tools. Before using automated tools, remember that it is important that your inventory contains not just what is currently plugged into the SAN, but also what is supposed to be attached to the SAN. If a server has lost its SAN connection, merely looking at the current status of the SAN will not tell you where it was supposed to be attached. This inventory must exist in a format that can be exported and sent to someone else and retained in an archive for long-term tracking. The list, spreadsheet, database, or automated tool needs to contain the following information for each port in the SAN:
- The name of the attached device and whether it is a storage device, host, or another switch
- The port on the device to which the switch port is attached, for example, Host Slot 6 for a host connection or Switch Port 126 for an ISL
- The speed of the port
- If the port is not an ISL, the attached worldwide port name (WWPN)
- For host ports or SVC ports, the destination aliases to which the host is zoned

Automated tools, obviously, can do a decent job of keeping this inventory up-to-date, but even with a fairly large SAN, a simple database, combined with standard operating procedures, can be equally effective. For smaller SANs, spreadsheets are a time-honored and simple method of record keeping.
Zoning
While you need snapshots of your zoning configuration, you do not really need a separate spreadsheet or database just to keep track of your zones. If you lose your zoning configuration, you can rebuild the SVC parts from your zoning snapshot, and the host zones can be rebuilt from your port inventory.
12.1.2 SVC
For the SVC, there are several important components that you need to document.
12.1.3 Storage
Actually, for the LUNs themselves, you do not need to track anything outside of what is already in your configuration documentation for the MDisks, unless the disk array is also used for direct-attached hosts.
- Support contract numbers
- Warranty end dates
- Current running code level
- Date that the code was last checked for updates
Obviously, you do not need to pull DS4x00 profiles if the only thing you are modifying is SAN zoning.
Abstract: Request __ABC456__: Add new server __XYZ123__ to the SAN and allocate __200GB__ from SVC Cluster __1__
Date of Implementation: __08/01/2008__
Implementing Storage Administrator: Katja Gebuhr (x1234)
Server Administrator: Jon Tate (x5678)
Impact: None. This is a non-disruptive change.
Risk: Low.
Time estimate: __30 minutes__
Backout Plan: Reverse changes.

Implementation Checklist:
1. ___ Verify (via phone or e-mail) that the server administrator has installed all code levels listed on the intranet site http://w3.itsoelectronics.com/storage_server_code.html
2. ___ Verify that the cabling change request, __CAB927__, has been completed.
3. ___ For each HBA in the server, update the switch configuration spreadsheet with the new server using the information below.
   To decide on which SVC cluster to use: All new servers must be allocated to SVC cluster 2, unless otherwise indicated by the Storage Architect.
   To decide which I/O Group to use: These must be roughly evenly distributed. Note: If this is a high-bandwidth host, the Storage Architect might give a specific I/O Group assignment, which should be noted in the abstract.
   To select which node ports to use: If the last digit of the first WWPN is odd (in hexadecimal, B, D, and F are also odd), use ports 1 and 3; if even, use ports 2 and 4.
   HBA A:
     Switch: __McD_1__   Port: __47__
     WWPN: __00:11:22:33:44:55:66:77__
     Port Name: __XYZ123_A__
     Host Slot/Port: __5__
     Targets: __SVC 1, IOGroup 2, Node Ports 1__
   HBA B:
     Switch: __McD_2__   Port: __47__
     WWPN: __00:11:22:33:44:55:66:88__
     Port Name: __XYZ123_B__
     Host Slot/Port: __6__
     Targets: __SVC 1, IOGroup 2, Node Ports 4__
4. ___ Log in to EFCM and modify the Nicknames for the new ports (using the information above).
5. ___ Collect Data Collections from both switches and attach them to this ticket with the filenames of <ticket_number>_<switch name>_old.zip
6. ___ Add new zones to the zoning configuration using the standard naming convention and the information above.
7. ___ Collect Data Collections from both switches again and attach them with the filenames of <ticket_number>_<switch name>_new.zip
8. Log on to the SVC Console for Cluster __2__ and:
   ___ Obtain a config dump and attach it to this ticket under the filename <ticket_number>_<cluster_name>_old.zip
   ___ Add the new host definition to the SVC using the information above and setting the host type to __Generic__. Do not type in the WWPN. If it does not appear in the drop-down list, cancel the operation and retry. If it still does not appear, check zoning and perform other troubleshooting as necessary.
   ___ Create new VDisk(s) with the following parameters:
       To decide on the MDisk Group: For current requests (as of 8/1/08), use ESS4_Group_5, assuming that it has sufficient free space. If it does not have sufficient free space, inform the storage architect prior to submitting this change ticket and request an update to these procedures. Use Striped (instead of Sequential) VDisks for all requests, unless otherwise noted in the abstract.
       Name: __XYZ123_1__
       Size: __200GB__
       IO Group: __2__
       MDisk Group: __ESS4_Group_5__
       Mode: __Striped__
9. ___ Map the new VDisk to the host.
10. ___ Obtain a config dump and attach it to this ticket under <ticket_number>_<cluster_name>_new.zip
11. ___ Update the SVC Configuration spreadsheet using the above information and the following supplemental data: Request: __ABC456__  Project: __Foo__
12. Also update the entry for the remaining free space in the MDisk Group with the information pulled from the SVC console.
13. ___ Call the Server Administrator in the ticket header and request storage discovery. Ask them to obtain a path count to the new disk(s). If it is not 4, perform the necessary troubleshooting as to why there is an incorrect number of paths.
14. ___ Request that the storage admin confirm R/W connectivity to the paths.
15. Make notes on anything unusual in the implementation here: ____
Note that the example checklist does not contain pages upon pages of screen captures or click Option A, select Option 7.... Instead, it assumes that the user of the checklist understands the basic operational steps for the environment. After the change is over, the entire checklist, along with the configuration snapshots, needs to be stored in a safe place, not the SVC or any other SAN-attached location. You must use detailed checklists even for non-routine changes, such as migration projects, to help the implementation go smoothly and provide an easy-to-read record of what was done. Writing a one-use checklist might seem horribly inefficient, but if you have to review the process for a complex project a few weeks after implementation, you might discover that your memory of exactly what was done is not as good as you thought. Also, complex, one-off projects are actually more likely to have steps skipped, because they are not routine.
If possible, do not run software levels that are higher than the levels recommended on those lists. We recognize that there can be situations where you need a particular code fix that is only available in a level of code later than what appears on the support matrix. If that is the case, contact your IBM marketing representative and ask for a Request for Price Quotation (RPQ); this particular type of request usually does not cost you anything. These requests are relayed to IBM SVC Development and Test and are routinely granted. The purpose behind this process is to ensure that SVC Test has not run into an interoperability issue in the level of code that you want to run.
Check your switch logs for issues. Pay special attention to your SVC and storage ports. Things to look for are signal errors, such as link resets and cyclic redundancy check (CRC) errors, unexplained logouts, or ports in an error state. Also, make sure that your fabric is stable, with no ISLs going up and down often.

Examine the readme files or release notes for the code that you are preparing to upgrade. There can be important notes about required pre-upgrade dependencies, unfixed issues, necessary APARs, and so on. This requirement applies to all SAN-attached devices, such as your HBAs and switches, not just the SVC.

You must also expect a write performance hit during an SVC upgrade. Because node resets are part of the upgrade, the write cache will be disabled on the I/O Group currently being upgraded.
The steps are:
1. Run datapath query wwpn, which will return output similar to:

[root@abc]> datapath query wwpn
Adapter Name    PortWWN
fscsi0          10000000C925F5B0
fscsi1          10000000C9266FD1

As you can see, the adapter that we want is fscsi0.

2. Next, cross-reference fscsi0 with the output of datapath query adapter:

Active Adapters :4
Adpt#  Name    State    Mode     Select       Errors  Paths  Active
    0  scsi3   NORMAL   ACTIVE   129062051         0     64       0
    1  scsi2   NORMAL   ACTIVE    88765386       303     64       0
    2  fscsi2  NORMAL   ACTIVE   407075697      5427   1024       0
    3  fscsi0  NORMAL   ACTIVE   341204788     63835    256       0
From here, we can see that fscsi0 has the adapter ID of 3 in SDD. We will use this ID when taking the adapter offline prior to maintenance. Note how the SDD ID was 3 even though the adapter had been assigned the device name fscsi0 by the OS.
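As a hedged sketch (adapter ID 3 matches the example above; verify the syntax against the SDD level on your host), taking the adapter out of service before the maintenance and returning it afterward might look like:

datapath set adapter 3 offline
datapath query adapter
datapath set adapter 3 online

The query in the middle confirms that the paths on that adapter are no longer being selected before you proceed with the maintenance action.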
AIX
In AIX without the SDDPCM, if you do not properly manage a destination FCID change, running cfgmgr will create brand-new hdisk devices, all of your old paths will go into a defined state, and you will have difficulty removing them from your Object Data Manager (ODM) database. There are two ways of preventing this issue in AIX.
Dynamic Tracking
This is an AIX feature present in AIX 5.2 Technology Level (TL) 1 and higher. It causes AIX to bind hdisks to the WWPN instead of the destination FCID. However, this feature is not enabled by default, has extensive prerequisite requirements, and is disruptive to enable. For these reasons, we do not recommend that you rely on this feature to aid in scheduled changes. The alternate procedure is not particularly difficult, but if you are still interested in Dynamic Tracking, refer to the IBM System Storage Multipath Subsystem Device Driver Users Guide, SC30-4096, for full details. If you choose to use Dynamic Tracking, we strongly recommend that AIX is at the latest available TL. If Dynamic Tracking is enabled, no special procedures are necessary to change the FCID.
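If you do decide to use Dynamic Tracking, a hedged sketch of checking and enabling it on a Fibre Channel adapter (fscsi0 is an example device name) is:

lsattr -El fscsi0 -a dyntrk -a fc_err_recov
chdev -l fscsi0 -a dyntrk=yes -P

The -P flag stages the change in the ODM; it takes effect only after the device is unconfigured and reconfigured or the host is rebooted, which is why enabling this feature is disruptive. Many installations also set fc_err_recov=fast_fail on the same device; check the Multipath Subsystem Device Driver User's Guide referenced above before changing it.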
8. Plug the ISLs into the new switch and make sure that the new switch merges into the fabric successfully.
9. Attach the storage ports, making sure to use the same physical ports as on the old switch.
10. Attach the SVC ports and perform the appropriate maintenance procedures to bring the disk paths back online.
11. Attach the host ports and bring their paths back online.
12.5.2 Controllers
It is common to refer to disk controllers by part of their serial number, which helps facilitate troubleshooting by making the cross-referencing of logs easier. If you have a unique name, by all means, use it, but it is helpful to append the serial number to the end.
12.5.3 MDisks
The MDisks must most certainly be changed from the default names of mdiskX. The name must include the serial number of the controller, the array number/name, and the volume number/name. Unfortunately, you are limited to fifteen characters. This design builds a name similar to 23K45_A7V10 - Serial 23K45, Array 7, Volume 10.
12.5.4 VDisks
The VDisk name must indicate for what host the VDisk is intended, along with any other identifying information that might distinguish this VDisk from other VDisks.
12.5.5 MDGs
MDG names must indicate from which controller the group comes, the RAID level, and the disk size and type. For example, 23K45_R1015k300 is an MDG on 23K45, RAID 10, 15k, 300 GB drives. (As with the other names on the SVC, you are limited to 15 characters).
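A small sketch of applying these naming conventions from the SVC CLI (the object names are hypothetical and follow the patterns described above):

svctask chmdisk -name 23K45_A7V10 mdisk8
svctask chvdisk -name XYZ123_1 vdisk12
svctask chmdiskgrp -name 23K45_R1015k300 MDG1

Each command simply renames an existing object (an MDisk, a VDisk, and an MDG, respectively) to match the convention.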
Chapter 13. Cabling, power, cooling, scripting, support, and classes
13.1 Cabling
None of the advice in the following section is specific to the SVC. However, because cabling problems can produce SVC issues that will be troublesome and tedious to diagnose, reminders about how to structure cabling might be useful.
13.1.3 Labeling
All cables must be labeled at both ends with their source and destination locations. Even in the smallest SVC installations, cabling without labels quickly becomes an unusable mess when you are trying to trace problems. A small SVC installation consisting of a two-port storage subsystem, 10 hosts, and a single SVC cluster with two nodes will require 30 fiber-optic cables to set up.
To ensure that unambiguous information can be read from the labels, we recommend that you institute a standard labeling scheme for your environment. The labels at both cable ends must be identical. An example labeling scheme consists of three lines per label, with the following content:
- Line 1: Cable first end physical location <-> Cable second end physical location
- Line 2: Cable first end device name and port number
- Line 3: Cable second end device name and port number

For one of the SVC clusters that was used when writing this book, the label for both ends of the cable connecting SVC node 1, port 1 to the SAN switch, port 2 looks like:

NSJ1R2U14 <-> NSJ1R3U16
itsosvccl1_n1 p1
IBM_2005_B5K_1 p2

In line one, NSJ refers to the site name, Rn is the rack number, and Un is the rack unit number. Line two has the name of SVC cluster node 1 together with port 1, and line three has the name of the corresponding SAN switch together with port 2.

If your cabling installation includes patch panels in the cabling path, information about these patch panels must be included in the labeling. We recommend using a cable management system to keep track of cabling information and routing. For small installations, you can use a simple spreadsheet, but for large data centers, we recommend that you use one of the customized commercial solutions that are available.

Note: We strongly recommend that you use only cable labels that are made for this purpose, because they have a specific adhesive that works well with the cable jacket. Labels made for other purposes tend to lose their grip on the cable over time.
We do not recommend that you install more than 1 500 ports into a single rack cabinet.
Most SAN installations are far too dynamic for this idea to ever work. If you ever have to swap out a faulty line card/port blade, or even worse, a switch chassis, you will be presented with an inaccessible nightmare of cables. For this reason, we strongly advise you to use proper cable management trays and guides. As a general rule, cable management takes about as much space as your switches take.
Bend radius will become even more important as SAN speeds increase. You can expect well over twice the number of physical layer issues at 4 Gbps as you might have seen in a 2 Gbps SAN, and 8 Gbps will have even more stringent requirements. There are two major causes of insufficient bend radius:
- Incorrect use of server cable management arms. These hinged arms are extremely popular in racked server designs, including the IBM design. However, you must be careful to ensure that when these arms are slid in and out, the cables in the arm do not become kinked.
- Insufficient cable support. You cannot rely on the strain-relief boots built into the ends of the cable to provide support. Over time, your cables will inevitably sag if you rely on these strain-relief boots. A common scene in many data centers is a waterfall of cables hanging down from the SAN switch without any support other than the strain-relief boots. Use loosely looped cable ties or cable straps to support the weight of your cables. And, as stated elsewhere, make sure that you install a proper cable management system.
13.2 Power
Because the SVC nodes can be compared to standard one-unit rack servers, they have no particularly exotic power requirements. Nevertheless, power is often a source of field issues.
The most important consideration with the UPS units is to make sure that they are not cross-connected, which means that you must ensure that the serial cable and the power cable from a specific UPS unit connect to the same SVC node.
Also, remember that the function of the UPS units is solely to provide battery power to the SVC nodes long enough to copy the write cache from memory onto the internal disk of the nodes. The shutdown process begins immediately when power is lost, and it cannot be stopped by restoring power during the shutdown. The SVC nodes will restart
immediately when power is restored. Therefore, compare the UPS units to the built-in batteries found in most storage subsystem controllers, and do not think of them as substitutes for the normal data center UPS units. If you want continuous availability, you need to provide other sources of backup power to ensure that the power feed to your SVC cluster is never interrupted.
13.3 Cooling
The SVC has no extraordinary cooling requirements. From the perspective of a data center space planner, it can be compared to a group of standard one-unit rack servers. The most important considerations are:
- The SVC nodes cool front to back. When installing the nodes, make sure that the node front faces toward where the cold air comes in.
- Fill empty spaces in your rack with filler panels to help prevent hot exhaust air from recirculating back into the air intakes. The most common filler panels do not even require screws to mount.
- Data centers with rows of racks must be set up with hot and cold aisles. Air intakes must face the cold aisles, and hot air is then blown into the hot aisles. You do not want the hot air from one rack dumping into the intake of another rack.
- In a raised-floor installation, the vent tiles must be only in the cold aisles. Vent tiles in the hot aisles can cause air recirculation problems.
- If you need to deploy fans on the floor to fix hot spots, you need to reevaluate your data center cooling configuration. Fans on the floor are a poor solution that will almost certainly lead to reduced equipment life. Instead, engage IBM, or any one of a number of professional data center contractors, to evaluate your cooling configuration. It might be possible to fix your cooling by reconfiguring the existing airflow without having to purchase additional cooling units.
You can obtain notifications for the SVC from the System Storage support notifications section of this Web site. You need an IBM ID to subscribe. If you do not have an IBM ID, you can create one (for free) by following a link from the sign-on page.
There are many other IBM Redbooks publications available that describe TPC, SANs, and IBM System Storage Products, as well as many other topics. To browse all of the IBM Redbooks publications about Storage, go to: http://www.redbooks.ibm.com/portals/Storage
13.7.2 Courses
IBM offers several courses to help you learn how to implement the SVC:
- SAN Volume Controller (SVC) - Planning and Implementation (ID: SN821) or SAN Volume Controller (SVC) Planning and Implementation Workshop (ID: SN830). These courses provide a basic introduction to SVC implementation. The workshop course includes a hands-on lab; otherwise, the course content is identical.
- IBM TotalStorage Productivity Center Implementation and Configuration (ID: SN856). This course is extremely useful if you plan to use TPC to manage your SVC environment.
- TotalStorage Productivity Center for Replication Workshop (ID: SN880). This course describes managing replication with TPC. The replication part of TPC is virtually a separate product from the rest of TPC, and it is not covered in the basic implementation and configuration course.
Chapter 14.
A fast node reset is performed automatically if a software problem occurs in one of the SVC nodes. The fast node reset function means that SVC software problems can be recovered without the host experiencing an I/O error and without requiring the multipathing driver to fail over to an alternative path. The fast node reset is done automatically by the affected SVC node, which informs the other members of the cluster that it is resetting.

Besides SVC node hardware and software problems, failures in the SAN zoning configuration are a common cause of trouble. A misconfiguration in the SAN zoning might prevent the SVC cluster from working, because the SVC cluster nodes communicate with each other by using the Fibre Channel SAN fabrics.

You must check the following areas from the SVC perspective:
- The attached hosts. Refer to 14.1.1, Host problems on page 270.
- The SAN. Refer to 14.1.3, SAN problems on page 272.
- The attached storage subsystems. Refer to 14.1.4, Storage subsystem problems on page 272.

There are several SVC command line interface (CLI) commands with which you can check the current status of the SVC and the attached storage subsystems. Before starting the complete data collection or the problem isolation on the SAN or subsystem level, we recommend that you use the following commands first and check the status from the SVC perspective:
- svcinfo lscontroller controllerid: Check that multiple worldwide port names (WWPNs) matching the back-end storage subsystem controller ports are available. Check that the path_counts are evenly distributed across each storage subsystem controller, or that they are distributed correctly based on the preferred controller. Use the path_count calculation found in 14.3.4, Solving back-end storage problems on page 288. The total of all path_counts must add up to the number of managed disks (MDisks) multiplied by the number of SVC nodes.
- svcinfo lsmdisk: Check that all MDisks are online (not degraded or offline).
- svcinfo lsmdisk mdiskid: Check several of the MDisks from each storage subsystem controller. Are they online? Do they all have path_count = number of nodes?
- svcinfo lsvdisk: Check that all virtual disks (VDisks) are online (not degraded or offline). If VDisks are degraded, are there stopped FlashCopy jobs? Restart these stopped FlashCopy jobs or delete the mappings.
- svcinfo lshostvdiskmap: Check that all VDisks are mapped to the correct hosts. If a VDisk is not mapped correctly, create the necessary VDisk-to-host mapping.
- svcinfo lsfabric: Using the various options, such as -controller, you can check different parts of the SVC configuration to ensure that multiple paths are available from each SVC node port to an attached host or controller. Confirm that all SVC node port WWPNs are connected to the back-end storage consistently. A scripted sketch of these checks follows the list.
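As a convenience, these checks can be driven from a management host that has SSH access to the cluster. The following ksh fragment is only a sketch: the PuTTY plink session name svccluster and the controller ID 0 are assumptions that you need to adapt to your environment.

#!/bin/ksh
# Sketch: quick SVC status snapshot taken over SSH (assumes a saved
# PuTTY/plink session named "svccluster" with key-based authentication)
SVC="plink svccluster"

$SVC svcinfo lscontroller 0          # WWPNs and path_count distribution
$SVC svcinfo lsmdisk                 # are all MDisks online?
$SVC svcinfo lsvdisk                 # are all VDisks online?
$SVC svcinfo lshostvdiskmap          # VDisk-to-host mappings
$SVC svcinfo lsfabric -controller 0  # logins from every node port to controller 0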
Example 14-1 shows how to obtain this information using the commands svcinfo lscontroller controllerid and svcinfo lsnode.
Example 14-1 The svcinfo lscontroller 0 command
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id                   0
controller_name      controller0
WWNN                 200400A0B8174431
mdisk_link_count     2
max_mdisk_link_count 4
degraded             no
vendor_id            IBM
product_id_low       1742-900
product_id_high
product_revision     0520
ctrl_s/n
WWPN                 200400A0B8174433
path_count           4
max_path_count       12
WWPN                 200500A0B8174433
path_count           4
max_path_count       8

IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware
6  Node1 1000739007        50050768010037E5 online 0           io_grp0       no          20400001C3240007 8G4
5  Node2 1000739004        50050768010037DC online 0           io_grp0       yes         20400001C3240004 8G4
4  Node3 100068A006        5005076801001D21 online 1           io_grp1       no          2040000188440006 8F4
8  Node4 100068A008        5005076801021D22 online 1           io_grp1       no          2040000188440008 8F4
Example 14-1 shows that two MDisks are present for the storage subsystem controller with ID 0 and that there are four SVC nodes in the SVC cluster, which means that in this example the total path_count is 2 x 4 = 8. If possible, spread the paths across all storage subsystem controller ports, which is the case in Example 14-1 (four for each WWPN).
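This arithmetic can be checked mechanically. The following awk fragment is only a sketch, under the assumptions that the lscontroller output has been saved to a file named lscontroller_0.out and that the cluster has four nodes; adjust both to your environment.

#!/bin/ksh
# Sketch: verify total path_count = mdisk_link_count x number of SVC nodes
NODES=4                                  # adjust to your cluster
awk -v nodes=$NODES '
    /^mdisk_link_count/ { mdisks = $2 }
    /^path_count/       { total += $2 }
    END {
        expected = mdisks * nodes
        printf "path_count total %d, expected %d (%s)\n",
               total, expected, (total == expected ? "OK" : "MISMATCH")
    }' lscontroller_0.out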
Verify that the host has paths to both the preferred and the non-preferred SVC nodes. For more information, refer to Chapter 9, Hosts on page 175. Check that paths are open for both the preferred paths (with high select counts) and the non-preferred paths (marked with an asterisk and with zero or near-zero select counts). In Example 14-2, path 0 and path 2 are the preferred paths with a high select count. Path 1 and path 3 are the non-preferred paths, which show an asterisk (*) and 0 select counts.
Example 14-2 Checking paths
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#:   0  DEVICE NAME: Disk1 Part0   TYPE: 2145      POLICY: OPTIMIZED
SERIAL: 60050768018101BF2800000000000037
LUN IDENTIFIER: 60050768018101BF2800000000000037
============================================================================
Path#      Adapter/Hard Disk          State  Mode     Select   Errors
    0  Scsi Port2 Bus0/Disk1 Part0    OPEN   NORMAL   1752399       0
    1 *  Scsi Port3 Bus0/Disk1 Part0  OPEN   NORMAL         0       0
    2  Scsi Port3 Bus0/Disk1 Part0    OPEN   NORMAL   1752371       0
    3 *  Scsi Port2 Bus0/Disk1 Part0  OPEN   NORMAL         0       0
SDDPCM
SDDPCM has been enhanced to collect SDDPCM trace data periodically and to write the trace data to the system's local hard drive. SDDPCM maintains four files for its trace data:
- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
Starting with SDDPCM 2.1.0.8, the relevant data for debugging problems is collected by running sddpcmgetdata. The sddpcmgetdata script collects information that is used for problem determination and then creates a tar file at the current directory with the current date and time as a part of the file name, for example: sddpcmdata_hostname_yyyymmdd_hhmmss.tar When you report an SDDPCM problem, it is essential that you run this script and send this tar file to IBM Support for problem determination. Refer to Example 14-3.
Example 14-3 Use of the sddpcmgetdata script (output shortened for clarity)
If the sddpcmgetdata command is not found, collect the following files:
- pcm.log
- pcm_bak.log
- pcmsrv.log
- pcmsrv_bak.log
- The output of the pcmpath query adapter command
- The output of the pcmpath query device command
You can find the log files in the /var/adm/ras directory. A scripted sketch of this manual collection follows the list.
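The manual collection can be gathered with a small script. The following ksh fragment is only a sketch; the target directory and tar file name are assumptions and can be chosen freely.

#!/bin/ksh
# Sketch: manual SDDPCM data collection when sddpcmgetdata is not available
OUT=/tmp/sddpcm_manual_$(hostname)_$(date +%Y%m%d_%H%M%S)
mkdir -p $OUT
cp /var/adm/ras/pcm.log /var/adm/ras/pcm_bak.log $OUT
cp /var/adm/ras/pcmsrv.log /var/adm/ras/pcmsrv_bak.log $OUT
pcmpath query adapter > $OUT/pcmpath_query_adapter.out
pcmpath query device  > $OUT/pcmpath_query_device.out
tar -cvf $OUT.tar $OUT            # send this tar file to IBM Support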
SDDDSM
SDDDSM also provides the sddgetdata script to collect information to use for problem determination. SDDGETDATA.BAT is the batch file that generates the following files:
- The sddgetdata_%host%_%date%_%time%.cab file
- SDD\SDDSrv logs
- Datapath output
- Event logs
- Cluster log
- SDD-specific registry entries
- HBA information
Example 14-4 shows an example of this script.
Example 14-4 Use of the sddgetdata script for SDDDSM (output shortened for clarity)
C:\Program Files\IBM\SDDDSM>sddgetdata.bat
Collecting SDD trace Data
Collecting datapath command outputs
Collecting SDD and SDDSrv logs
Collecting Most current driver trace
Generating a CAB file for all the Logs
sdddata_DIOMEDE_20080814_42211.cab file generated

C:\Program Files\IBM\SDDDSM>dir
 Volume in drive C has no label.
 Volume Serial Number is 0445-53F4

 Directory of C:\Program Files\IBM\SDDDSM

06/29/2008  04:22 AM           574,130 sdddata_DIOMEDE_20080814_42211.cab
#!/bin/ksh
export PATH=/bin:/usr/bin:/sbin
echo "y" | snap -r                     # Clean up old snaps
snap -gGfkLN                           # Collect new; don't package yet
cd /tmp/ibmsupt/other                  # Add supporting data
cp /var/adm/ras/sdd* .
cp /var/adm/ras/pcm* .
cp /etc/vpexclude .
datapath query device > sddpath_query_device.out
datapath query essmap > sddpath_query_essmap.out
pcmpath query device > pcmpath_query_device.out
pcmpath query essmap > pcmpath_query_essmap.out
sddgetdata
sddpcmgetdata
snap -c                                # Package snap and other data
echo "Please rename /tmp/ibmsupt/snap.pax.Z after the"
echo "PMR number and ftp to IBM."
exit 0
IBM_2145:itsosvccl1:admin>svcinfo lsnode
id name  WWNN             status IO_group_id config_node
1  node1 50050768010037E5 online 0           no
2  node2 50050768010037DC online 0           yes
The output that is shown in Example 14-6 shows that the node with ID 2 is the config node. So, for all nodes except the config node, you must run the svctask cpdumps command. There is no feedback given for this command. Example 14-7 shows the command for the node with ID 1.
Example 14-7 Copy the dump files from the other nodes
IBM_2145:itsosvccl1:admin>svctask cpdumps -prefix /dumps 1

To collect all of the files, including the config.backup file, trace file, errorlog file, and more, you need to run the svc_snap dumpall command. This command collects all of the data, including the dump files. To ensure that there is a current backup of the SVC cluster configuration, run svcconfig backup before issuing the svc_snap dumpall command. Refer to Example 14-8 for an example run. It is often better to use svc_snap and request the dumps individually, which you do by omitting the dumpall parameter; this captures the data collection apart from the dump files.
Note: Dump files are extremely large. Only request them if you really need them.
Example 14-8 The svc_snap dumpall command
IBM_2145:itsosvccl1:admin>svc_snap dumpall
Collecting system information...
Copying files, please wait...
Copying files, please wait...
Dumping error log...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Waiting for file copying to complete...
Creating snap package...
Snap data collected in /dumps/snap.104603.080815.160321.tgz

After the data collection with the svc_snap dumpall command is complete, you can verify that the new snap file appears in your 2145 dumps directory by using the svcinfo ls2145dumps command. Refer to Example 14-9 on page 279.
IBM_2145:itsosvccl1:admin>svcinfo ls2145dumps
id 2145_filename
0  dump.104603.080801.161333
1  svc.config.cron.bak_node2
.
.
23 104603.trc
24 snap.104603.080815.160321.tgz

To copy the file from the SVC cluster, use secure copy (SCP). The PuTTY SCP function is described in more detail in IBM System Storage SAN Volume Controller V4.3, SG24-6423.
Information: If there is no dump file available on the SVC cluster or for a particular SVC node, you need to contact your next level of IBM Support. The support personnel will guide you through the procedure to take a new dump.
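The preceding sequence (configuration backup, snap, verification, and download) can also be driven from the SVC master console or any host with PuTTY installed. The following ksh fragment is only a sketch: the plink session name svccluster, the cluster IP address placeholder, and the local target directory are assumptions, and the snap file name must be taken from the ls2145dumps output of your own run.

#!/bin/ksh
# Sketch: back up the configuration, create a snap, and download it with PSCP
SVC="plink svccluster"                       # assumed saved PuTTY session name

$SVC svcconfig backup                        # make sure the config backup is current
$SVC svc_snap                                # omit "dumpall" unless dumps are needed
$SVC svcinfo ls2145dumps | tail -5           # identify the new snap file name

# Copy the snap file off the cluster (replace the address and file name with your own)
pscp admin@cluster_ip:/dumps/snap.104603.080815.160321.tgz /tmp/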
IBM_2005_B5K_1:admin> supportSave
This command will collect RASLOG, TRACE, supportShow, core file, FFDC data
and other support information and then transfer them to a FTP/SCP server
or a USB device. This operation can take several minutes.
NOTE: supportSave will transfer existing trace dump file first, then
automatically generate and transfer latest one. There will be two trace dump
files transfered after this command.
OK to proceed? (yes, y, no, n): [no] y
Host IP or Host Name: 9.43.86.133
User Name: fos
Password:
Protocol (ftp or scp): ftp
Remote Directory: /
Saving support information for switch:IBM_2005_B5K_1, module:CONSOLE0...
..._files/IBM_2005_B5K_1-S0-200808132042-CONSOLE0.gz: 5.77 kB 156.68 kB/s
Saving support information for switch:IBM_2005_B5K_1, module:RASLOG...
...files/IBM_2005_B5K_1-S0-200808132042-RASLOG.ss.gz: 38.79 kB 0.99 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:TRACE_OLD...
...M_2005_B5K_1-S0-200808132042-old-tracedump.dmp.gz: 239.58 kB 3.66 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:TRACE_NEW...
...M_2005_B5K_1-S0-200808132042-new-tracedump.dmp.gz: 1.04 MB 1.81 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:ZONE_LOG...
...les/IBM_2005_B5K_1-S0-200808132042-ZONE_LOG.ss.gz: 51.84 kB 1.65 MB/s
Saving support information for switch:IBM_2005_B5K_1, module:RCS_LOG...
..._files/IBM_2005_B5K_1-S0-200808132044-CONSOLE1.gz: 5.77 kB 175.18 kB/s
Saving support information for switch:IBM_2005_B5K_1, module:SSAVELOG...
..._files/IBM_2005_B5K_1-S0-200808132044-sslog.ss.gz: 1.87 kB 55.14 kB/s
SupportSave completed
IBM_2005_B5K_1:admin>
If you have the group manager license for EFCM, you can collect data from multiple switches in one run. Refer to Figure 14-2 on page 281.
To collect data when you are in the EFCM Group Manager, select Run Data Collection as the Group Action (Figure 14-3). From this point, a wizard will guide you through the data collection process. Name the generated zipped file to reflect your problem ticket number before uploading the file to IBM Support.
Figure 14-3 Selecting the data collection action in EFCM Group Manager
From this IBM Support Web page, you can obtain various types of support by following the links that are provided on this page. To review the SVC Web page for the latest flashes, the concurrent code upgrades, code levels, and matrixes, go to: http://www.ibm.com/storage/support/2145/
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#:   3  DEVICE NAME: Disk4 Part0   TYPE: 2145      POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path#      Adapter/Hard Disk          State  Mode     Select   Errors
    0  Scsi Port2 Bus0/Disk4 Part0    CLOSE  OFFLINE  218297        0
    1 *  Scsi Port2 Bus0/Disk4 Part0  CLOSE  OFFLINE       0        0
    2  Scsi Port3 Bus0/Disk4 Part0    OPEN   NORMAL   222394        0
    3 *  Scsi Port3 Bus0/Disk4 Part0  OPEN   NORMAL        0        0

Based on our field experience, we recommend that you check the hardware first:
- Check whether any connection error indicators are lit on the host or SAN switch.
- Check that all of the parts are seated correctly (cables securely plugged into the SFPs, and the SFPs plugged all the way into the switch port sockets).
- Ensure that there are no broken fiber-optic cables (if possible, swap the cables with cables that are known to work).
After the hardware check, continue with the software setup:
- Check that the HBA driver and firmware are at the recommended and supported levels.
- Check that the multipathing driver is at the recommended and supported level.
- Check for link layer errors reported by the host or the SAN switch, which can indicate a cabling or SFP failure.
- Verify your SAN zoning configuration.
- Check the general SAN switch status and health for all switches in the fabric.
In this case, we discovered that one of the HBAs was experiencing a link failure because of a fiber-optic cable that had been bent too far. After we changed the cable, the missing paths reappeared, as shown in Example 14-12.
Example 14-12 Output from datapath query device command after fiber optic cable change
C:\Program Files\IBM\Subsystem Device Driver>datapath query device -l

Total Devices : 1

DEV#:   3  DEVICE NAME: Disk4 Part0   TYPE: 2145      POLICY: OPTIMIZED
SERIAL: 60050768018381BF2800000000000027
LUN IDENTIFIER: 60050768018381BF2800000000000027
============================================================================
Path#      Adapter/Hard Disk          State  Mode     Select   Errors
    0  Scsi Port3 Bus0/Disk4 Part0    OPEN   NORMAL   218457        1
    1 *  Scsi Port3 Bus0/Disk4 Part0  OPEN   NORMAL        0        0
    2  Scsi Port2 Bus0/Disk4 Part0    OPEN   NORMAL   222394        0
    3 *  Scsi Port2 Bus0/Disk4 Part0  OPEN   NORMAL        0        0
When you click Sense Expert, the sense data is translated into data that is more clearly explained and more easily understood, as shown in Figure 14-6 on page 286.
Another common practice is to use the SVC CLI to find problems. The following list of commands provides you with information about the status of your environment:
- svctask detectmdisk (discovers any changes in the back-end storage configuration)
- svcinfo lscluster clustername (checks the SVC cluster status)
- svcinfo lsnode nodeid (checks the SVC nodes and port status)
- svcinfo lscontroller controllerid (checks the back-end storage status)
- svcinfo lsmdisk (provides a status of all the MDisks)
- svcinfo lsmdisk mdiskid (checks the status of a single MDisk)
- svcinfo lsmdiskgrp (provides a status of all the MDisk groups)
- svcinfo lsmdiskgrp mdiskgrpid (checks the status of a single MDisk group)
- svcinfo lsvdisk (checks if VDisks are online)

Important: Although the SVC raises error messages, most problems are not caused by the SVC. Most problems are introduced by the storage subsystems or the SAN.
If the problem is caused by the SVC and you are unable to fix it either with the Run Maintenance Procedure function or with the error log, you need to collect the SVC debug data as explained in 14.2.2, SVC data collection on page 277. If the problem is related to anything outside of the SVC, refer to the appropriate section in this chapter to try to find and fix the problem.
zone:
The correct zoning must look like the zoning shown in Example 14-14.
Example 14-14 Correct WWPN zoning
zone:
The following SVC error codes are related to the SAN environment:
- Error 1060: Fibre Channel ports are not operational.
- Error 1220: A remote port is excluded.
If you are unable to fix the problem with these actions, collect the SAN switch debugging data as described in 14.2.3, SAN data collection on page 279, and then contact IBM Support.
Typical problems for storage subsystem controllers include incorrect configuration, which results in a 1625 error code. Other problems related to the storage subsystem are failures pointing to the managed disk I/O (error code 1310), disk media (error code 1320), and error recovery procedures (error code 1370). However, not every message has just one explicit cause, so you have to check multiple areas and not just the storage subsystem. Determine the root cause of the problem by checking in the following order:
1. Run the maintenance procedures under SVC.
2. Check the attached storage subsystem for misconfigurations or failures.
3. Check the SAN for switch problems or zoning failures.
4. Collect all support data and involve IBM Support.
Now, we look at these steps sequentially:
1. Run the maintenance procedures under SVC. To run the SVC maintenance procedures, open the SVC Console GUI. Select Service and Maintenance → Run Maintenance Procedures. On the Maintenance Procedures panel that appears in the right pane, click Start Analysis (Figure 14-7).
For more information about how to use the SVC maintenance procedures, refer to IBM System Storage SAN Volume Controller V4.3, SG24-6423-06, or the SVC Service Guide, S7002158.
2. Check the attached storage subsystem for misconfigurations or failures:
a. Independent of the type of storage subsystem, first check whether there are any open problems on the system. Use the service or maintenance features provided with the storage subsystem to fix these problems.
b. Then, check that the LUN masking is correct. When attached to the SVC, you have to make sure that the LUN masking maps to the active zone set on the switch. Create a similar LUN mask for each storage subsystem controller port that is zoned to the SVC. Also, observe the SVC restrictions for back-end storage subsystems, which can be found at:
http://www-1.ibm.com/support/docview.wss?rs=591&uid=ssg1S1003283
c. Next, we show an example of a misconfigured storage subsystem, how this misconfiguration appears from the SVC's point of view, and how to fix the problem. By running the svcinfo lscontroller ID command, you get output similar to the output that is shown in Example 14-15 on page 290. As highlighted in the example, the MDisks, and therefore the LUNs, are not equally allocated. In our example, the LUNs provided by the storage subsystem are visible through only one path, that is, one storage subsystem WWPN.
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id                   0
controller_name      controller0
WWNN                 200400A0B8174431
mdisk_link_count     2
max_mdisk_link_count 4
degraded             no
vendor_id            IBM
product_id_low       1742-900
product_id_high
product_revision     0520
ctrl_s/n
WWPN                 200400A0B8174433
path_count           8
max_path_count       12
WWPN                 200500A0B8174433
path_count           0
max_path_count       8

This imbalance has two possible causes:
- If the back-end storage subsystem implements a preferred controller design, the LUNs might all be allocated to the same controller. This situation is likely with the IBM System Storage DS4000 series, and you can fix it by redistributing the LUNs evenly across the DS4000 controllers and then rediscovering the LUNs on the SVC. Because we used a DS4500 storage subsystem (type 1742) in Example 14-15, we need to check for this situation.
- Another possible cause is that the WWPN with the zero path_count is not visible to all of the SVC nodes, either because of the SAN zoning or because of the LUN masking on the storage subsystem. Use the SVC CLI command svcinfo lsfabric 0 to confirm.
If you are unsure which of the attached MDisks has which corresponding LUN ID, use the SVC CLI command svcinfo lsmdisk (refer to Example 14-16). This command also shows to which storage subsystem a specific MDisk belongs (the controller ID).
Example 14-16 Determine the ID for the MDisk
IBM_2145:itsosvccl1:admin>svcinfo lsmdisk
id name   status mode    mdisk_grp_id mdisk_grp_name capacity ctrl_LUN_#       controller_name UID
0  mdisk0 online managed 0            MDG-1          600.0GB  0000000000000000 controller0     600a0b800017423300000059469cf84500000000000000000000000000000000
2  mdisk2 online managed 0            MDG-1          70.9GB   0000000000000002 controller0     600a0b800017443100000096469cf0e800000000000000000000000000000000

The problem turned out to be the LUN allocation across the DS4500 controllers. After fixing this allocation on the DS4500, an SVC MDisk rediscovery fixed the problem from the SVC's point of view. Example 14-17 on page 291 shows the paths equally distributed again.
IBM_2145:itsosvccl1:admin>svctask detectmdisk
IBM_2145:itsosvccl1:admin>svcinfo lscontroller 0
id                   0
controller_name      controller0
WWNN                 200400A0B8174431
mdisk_link_count     2
max_mdisk_link_count 4
degraded             no
vendor_id            IBM
product_id_low       1742-900
product_id_high
product_revision     0520
ctrl_s/n
WWPN                 200400A0B8174433
path_count           4
max_path_count       12
WWPN                 200500A0B8174433
path_count           4
max_path_count       8

d. In our example, the problem was solved by changing the LUN allocation. If step 2 does not solve the problem, continue with step 3.
3. Check the SAN for switch problems or zoning failures. Many situations can cause problems in the SAN. Refer to 14.2.3, SAN data collection on page 279 for more information.
4. Collect all support data and involve IBM Support. Collect the support data for the involved SAN, SVC, or storage subsystems as described in 14.2, Collecting data and isolating the problem on page 274.
- The svcinfo lsmdisk command (are all MDisks online now?)
- The svcinfo lscontroller controllerid command (check that the path_counts are distributed somewhat evenly across the WWPNs)
Finally, run the maintenance procedures on the SVC to fix every error.
14.4 Livedump
SVC livedump is a procedure that IBM Support might ask your clients to run for problem investigation.

Note: Only invoke the SVC livedump procedure under the direction of IBM Support.

Sometimes, investigations require a livedump from the configuration node in the SVC cluster. A livedump is a lightweight dump from a node, which can be taken without impacting host I/O. The only impact is a slight reduction in system performance (due to reduced memory being available for the I/O cache) until the dump is finished. The instructions for a livedump are:
1. Prepare the node for taking a livedump:
svctask preplivedump <node id/name>
This command reserves the necessary system resources to take a livedump. The operation can take some time, because the node might have to flush data from the cache. System performance might be slightly affected after running this command, because part of the memory that is normally available to the cache is not available while the node is prepared for a livedump. After the command completes, the livedump is ready to be triggered, which you can see by looking at the output of svcinfo lslivedump <node id/name>. The status must be reported as prepared.
2. Trigger the livedump:
svctask triggerlivedump <node id/name>
This command completes as soon as the data capture is complete, but before the dump file has been written to disk.
3. Query the status, and copy the dump off when it is complete:
svcinfo lslivedump <node id/name>
The status shows dumping while the file is being written to disk and inactive after it is completed. After the status returns to inactive, you can find the livedump file in /dumps on the node with a file name of the format:
livedump.<panel_id>.<date>.<time>
You can then copy this file off the node, just as you copy a normal dump, by using the GUI or SCP. The dump must then be uploaded to IBM Support for analysis. A scripted sketch of these steps follows.
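Under the direction of IBM Support, the three steps can be wrapped in a small script run from a host with PuTTY installed. The following ksh fragment is only a sketch: the plink session name svccluster, the node name, and the 30-second polling interval are assumptions to adapt to your environment.

#!/bin/ksh
# Sketch: prepare, trigger, and wait for an SVC livedump on one node
SVC="plink svccluster"                      # assumed saved PuTTY session name
NODE=node1                                  # node to dump

$SVC svctask preplivedump $NODE             # reserve resources; can take a while
until $SVC svcinfo lslivedump $NODE | grep -q prepared; do sleep 30; done

$SVC svctask triggerlivedump $NODE          # capture the data
until $SVC svcinfo lslivedump $NODE | grep -q inactive; do sleep 30; done

# The livedump.<panel_id>.<date>.<time> file is now in /dumps on the node;
# copy it off with the GUI or SCP as you do for a normal dump.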
Chapter 15.
In Figure 15-2 on page 296, we show the improvement for throughput. Because the SPC-2 benchmark was only introduced in 2006, this graph is of necessity over a shorter time span.
Figure 15-3 Comparison of a software only upgrade to a full upgrade of an 8F4 node (variety of workloads, I/O rate times 1000)
As you can see in Figure 15-3, significant gains can be achieved with the software-only upgrade. The 70/30 miss workload, consisting of 70 percent read misses and 30 percent write misses, is of special interest. This workload contains a mix of both reads and writes, which we ordinarily expect to see under production conditions. Figure 15-4 on page 298 presents another view of the effect of moving to the latest level of software and hardware.
Figure 15-5 presents a more detailed view of performance for this specific workload. It shows that the SVC 4.2 software-only upgrade boosts the maximum throughput for the 70/30 workload by more than 30%. Thus, a significant portion of the overall throughput gain achieved with a full hardware and software replacement comes from the software enhancements alone.
Figure 15-5 Comparison of a software only upgrade to a full upgrade of an 8F4 node 70/30 miss workload
Figure 15-6 OLTP workload performance with two, four, six, or eight nodes
Figure 15-7 on page 300 presents the database scalability results at a higher level by pulling together the maximum throughputs (observed at a response time of 30 milliseconds or less) for each configuration. The latter figure shows that SVC performance scales nearly linearly with the number of nodes.
As Figure 15-6 on page 299 and Figure 15-7 show, the tested SVC configuration is capable of delivering over 270 000 I/Os per second (IOPS) for the OLTP workload. You are encouraged to compare this result against any other disk storage product currently posted on the SPC Web site at: http://www.storageperformance.org
Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.
Other resources
These publications are also relevant as further information sources:
- IBM System Storage Open Software Family SAN Volume Controller: Planning Guide, GA22-1052
- IBM System Storage Master Console: Installation and User's Guide, GC30-4090
- IBM System Storage Open Software Family SAN Volume Controller: Installation Guide, SC26-7541
- IBM System Storage Open Software Family SAN Volume Controller: Service Guide, SC26-7542
- IBM System Storage Open Software Family SAN Volume Controller: Configuration Guide, SC26-7543
- IBM System Storage Open Software Family SAN Volume Controller: Command-Line Interface User's Guide, SC26-7544
- IBM System Storage Open Software Family SAN Volume Controller: CIM Agent Developer's Reference, SC26-7545
- IBM TotalStorage Multipath Subsystem Device Driver User's Guide, SC30-4096
- IBM System Storage Open Software Family SAN Volume Controller: Host Attachment Guide, SC26-7563
- IBM System Storage SAN Volume Controller V4.3, SG24-6423-06
- Implementing the SVC in an OEM Environment, SG24-7275
- IBM TotalStorage Productivity Center V3.1: The Next Generation, SG24-7194
- TPC Version 3.3 Update Guide, SG24-7490
- Implementing an IBM/Brocade SAN, SG24-6116
- Implementing an IBM/Cisco SAN, SG24-7545
- IBM System Storage/Brocade Multiprotocol Routing: An Introduction and Implementation, SG24-7544
- IBM System Storage/Cisco Multiprotocol Routing: An Introduction and Implementation, SG24-7543
- Considerations and Comparisons between IBM SDD for Linux and DM-MPIO, which is available at:
http://www-1.ibm.com/support/docview.wss?rs=540&context=ST52G7&q1=linux&uid=ssg1S7001664&loc=en_US&cs=utf-8&lang=en
- TotalStorage Productivity Center User Guide, which is located at:
http://publib.boulder.ibm.com/infocenter/tivihelp/v4r1/topic/com.ibm.itpc.doc/tpcugd31389.htm