Agenda
Quick Basics of Virtualization
Storage Network
Volumes
[Diagram: cluster of SVC nodes grouped into I/O Groups A and B, each a pair of nodes in a control enclosure, with Managed Disks presented to the cluster]
Cluster: max 4 I/O Groups, built from 4 Storwize V7000 control enclosures or 8 SVC nodes
Managed Disks (MDisks): internally or externally provided; max 4096 MDisks per system
Storage Pools: max 128 storage pools; max 128 MDisks per pool
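As a quick sanity check, the limits above can be encoded directly; a minimal sketch (the constants mirror this slide, the function is illustrative and not a product API):

```python
# Illustrative sanity check of the clustered-system limits quoted above.
# The constants mirror this slide; the function is not a product API.
MAX_IO_GROUPS = 4            # 4 Storwize V7000 control enclosures or 8 SVC nodes
MAX_MDISKS_PER_SYSTEM = 4096
MAX_STORAGE_POOLS = 128
MAX_MDISKS_PER_POOL = 128

def check_limits(io_groups, pool_mdisk_counts):
    """Return a list of limit violations for a proposed configuration."""
    problems = []
    if io_groups > MAX_IO_GROUPS:
        problems.append("too many I/O Groups")
    if len(pool_mdisk_counts) > MAX_STORAGE_POOLS:
        problems.append("too many storage pools")
    if sum(pool_mdisk_counts) > MAX_MDISKS_PER_SYSTEM:
        problems.append("too many MDisks in the system")
    problems += ["pool %d exceeds 128 MDisks" % i
                 for i, n in enumerate(pool_mdisk_counts) if n > MAX_MDISKS_PER_POOL]
    return problems

print(check_limits(4, [128, 128, 64]))   # [] -> within limits
print(check_limits(5, [200]))            # two violations
```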
[Diagram: MDisk groups MDG1 and MDG3 / storage pools Pool 1, Pool 2, Pool 3]
Cluster
An I/O Group is a control enclosure and its associated SAS-attached expansion enclosures. A clustered system can consist of 2-4 I/O Groups.
SCORE approval for > 2
Expand
No interconnection of SAS chains between control enclosures: control enclosures communicate via FC and must use all 8 FC ports on the enclosures.
Expand
Scale Storage
Add up to 4x the capacity
Add up to 4x the throughput
Cluster
Non-disruptive upgrades
From smallest to largest configurations
Purchase hardware only when you need it
An I/O Group is a control enclosure and its associated SAS-connected expansion enclosures
NOTE: A Storwize V7000 Clustered System with greater than two I/O Groups/Frames requires SCORE/RPQ approval
Virtualize storage arrays behind Storwize V7000 for even greater capacity and throughput
Control Enclosure
Storwize V7000 Unified can scale disk capacity by adding up to nine expansion enclosures to the standard control enclosure. Virtualize external storage arrays behind Storwize V7000 Unified for even greater capacity.
CIFS not supported currently with externally virtualized storage
Expand
Expansion Enclosures
CANNOT horizontally scale out by adding additional Unified systems, or even just another Storwize V7000 control enclosure and associated expansion enclosures, at this time
If a customer has a clustered Storwize V7000 system today, they will not be able to upgrade to a Unified system in 2012 when the MES is available
Clustered system GA support is for up to 480 SFF disk drives, 240 LFF disk drives, or a mix thereof
Up to 480 TB raw capacity in one 42U rack. Enables Storwize V7000 to compete effectively against larger EMC, NetApp, and HP systems.
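The raw-capacity figure is simple arithmetic; a short worked check, assuming 24-drive 2U SFF enclosures and 1 TB per SFF drive (which is what the 480 TB claim implies):

```python
# Back-of-envelope raw capacity for a fully clustered system (illustrative
# assumptions: 24 SFF drives per 2U enclosure, 1 TB per SFF drive).
sff_drives = 480
drives_per_enclosure = 24
rack_units_per_enclosure = 2
tb_per_drive = 1

enclosures = sff_drives // drives_per_enclosure       # 20 enclosures
rack_units = enclosures * rack_units_per_enclosure    # 40U -> fits in a 42U rack
raw_tb = sff_drives * tb_per_drive                    # 480 TB raw

print(enclosures, rack_units, raw_tb)                 # 20 40 480
```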
Both node canisters in a given control enclosure are part of the same I/O Group
Cannot create an I/O Group with one node from each of 2 different control enclosures. Adding one node in a control enclosure to an I/O Group will automatically add the other. The Storwize V7000 clustered system does not support split I/O Group configurations (also known as stretch cluster).
Only 1 control enclosure can appear on a given SAS chain. Only 1 node canister can appear on a single strand of a SAS chain.
Key to realize: there is no access by one control enclosure (I/O Group) to the SAS-attached expansion enclosures of another control enclosure (I/O Group) other than via the SAN
If a pool owns the exact same number of MDisks from each I/O Group, then its volumes will be owned by IOG-0
Expansion enclosures only communicate with their owning control enclosure, meaning that if host I/Os come into IOG-0 but the data is on MDisks behind IOG-1, the I/O is forwarded to IOG-1 over FC
Similar process to SVC accessing external storage systems
Does not go through cache on the owning I/O Group but directly to the MDisk
Uses the very lowest layer of the I/O stack to minimize any additional latency
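A minimal sketch of the forwarding behaviour just described, using hypothetical names rather than product code: the volume's owning I/O Group reaches an MDisk either over its local SAS chain or, if the MDisk sits behind the other control enclosure, by forwarding the back-end I/O over the FC fabric without passing through that enclosure's cache.

```python
# Illustrative model of intra-cluster back-end I/O routing (not product code).
# Each MDisk is SAS-attached behind exactly one control enclosure (I/O Group);
# each volume is serviced and cached by exactly one owning I/O Group.

MDISK_BEHIND = {"mdisk0": "IOG-0", "mdisk1": "IOG-1"}   # hypothetical layout

def backend_path(volume_owner, mdisk):
    """How the volume-owning I/O Group reaches the MDisk holding the data."""
    mdisk_owner = MDISK_BEHIND[mdisk]
    if mdisk_owner == volume_owner:
        return "local SAS chain"
    # Forwarded over the SAN at the lowest layer of the I/O stack,
    # bypassing the cache of the I/O Group that owns the MDisk.
    return "forwarded over FC to %s, straight to the MDisk" % mdisk_owner

print(backend_path("IOG-0", "mdisk0"))   # local SAS chain
print(backend_path("IOG-0", "mdisk1"))   # forwarded over FC to IOG-1, ...
```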
[Diagram (all cabling shown is logical): Control Enclosure #1 with its Expansion Enclosures and MDisks; I/O Group #1 and I/O Group #2]
Expansion enclosures are connected through one control enclosure and can be part of only one I/O Group. All MDisks are part of only one I/O Group.
Storage pools can contain MDisks from more than one I/O Group. Inter-control enclosure communication happens over the SAN. A volume is serviced by only one I/O Group.
Storwize V7000: One I/O Group Single Frame System to a 2-4 I/O Group Clustered System
An I/O Group is a control enclosure and its associated SAS attached expansion enclosures
[Diagram: four Control Enclosures, each with its own Expansion Enclosures, combined into one clustered system]
NOTE: A Storwize V7000 Clustered System with greater than 2 I/O Groups/Frames requires SCORE/RPQ approval
Host
A High Availability clustered system similar to an SVC Split I/O Group configuration is not possible, since we cannot split a control enclosure in half and install it at two different sites
One I/O Group will be at each site, unlike SVC, where each node in an I/O Group can be installed at a different site
Expansion Enclosures
So if you lose a site, you lose access to all volumes owned by that I/O Group. There is no automatic failover of a volume from one I/O Group to another.
Control Enclosure
Expansion Enclosures
Control Enclosure
Expansion Enclosures
Volume mirroring does allow a single host volume to have pointers to two sets of data, which can be on different I/O Groups in a clustered system; but again, if you lose a site you lose the entire I/O Group, so any volumes owned by that I/O Group will be offline
You can migrate volume ownership from the failed IOG to the other IOG, but data may be lost: unwritten data still in cache on the offline IOG is discarded in the process of migration, or could have been lost if the IOG failed hard without saving cached data
Production Site A
Production Site B
Can start very small and grow to a very large storage system with a single management interface
Helps to compete with larger midrange systems from other vendors
Can virtualize external storage too, providing the same virtualization features across the entire Clustered System
Just like an SVC cluster, so desirable for the same reasons large SVC clusters are
However, there is nothing wrong with going with 1-4 separate systems versus a Clustered System if the customer prefers
System management isn't that hard anyway. If the customer will lose sleep over a possible complete failure of a control enclosure, no matter how unlikely that is, then go with separate systems.
Q&A
Agenda
Terminology
SVC Split I/O Group Review
Long distance refresh: WDM devices, Buffer-to-Buffer credits
SVC Quorum disk
Split I/O Group without ISLs between SVC nodes: supported configurations, SAN configuration for long distance
Split I/O Group with ISLs between SVC nodes: supported configurations, SAN configuration for long distance
Terminology
SVC Split I/O Group = SVC Stretched Cluster = SVC Split Cluster
Two independent SVC nodes in two independent sites + one independent site for quorum. Acts just like a single I/O Group with distributed high availability.
Site 1 Site 2
I/O Group 1
I/O Group 1
Distributed I/O Groups: NOT an HA configuration and not recommended. If one site fails: manual volume move required; some data still in cache of the offline I/O Group.
Site 1
I/O Group 1 I/O Group 1 I/O Group 2
Site 2
I/O Group 2
Storwize V7000 Split I/O Group is not an option: a single enclosure includes both nodes, so physical distribution across two sites is not possible.
Site 1 Site 2
[Diagram: SVC Node 1 and Node 2 connected by ISL 1 and ISL 2, with volume mirroring and SVC quorum disks 1, 2, and 3]
Loss of Failure Domain 2: active quorum lost, half of the nodes lost, loss of cluster majority
[Diagram: Node 1 and Node 2 after loss of Failure Domain 2; volume mirroring shown; no active quorum]
Node 1 cannot utilize a quorum candidate to recover and survive. Node 1 shuts down and the cluster is stopped. May not be recoverable and may require a cluster rebuild and data restore from backups.
Failure Domain 3
Automated failover, with SVC handling the loss of:
[Diagram: Node 1 and Node 2 with SVC quorum disks 1 and 3; the active quorum is reassigned to a surviving quorum candidate]
Cluster Status
Operational, optimal
Operational, write cache disabled
Operational, write cache disabled
Operational, write cache enabled, but with a different active quorum disk; whichever node accesses the active quorum disk first survives and the partner node goes offline
Stopped
Stopped
Operational
Failed at the same time as Site 2 / Failed at the same time as Site 1
Enhancements designed to help us compete more effectively with EMC VPLEX at longer distances. Note that this information is all very new, even to ATS, and some requirements could change prior to GA. Highly recommend engaging ATS for a solution design review.
w3.ibm.com/support/techxpress
Storwize V7000 does not provide any sort of split I/O group, split cluster, stretch cluster HA configurations
A clustered Storwize V7000 provides the ability to grow system capacity and scale performance within a localized single system image
SAN
SVC + UPS
SAN
Two ports per SVC node attached to local SANs
Two ports per SVC node attached to remote SANs via DWDM
Hosts and storage attached to SANs via ISLs sufficient for workload
3rd site quorum (not shown) attached to SANs
[Diagram: public and private SANs at each site, with SVC + UPS at both sites]
Two ports per SVC node attached to public SANs
Two ports per SVC node attached to private SANs
Hosts and storage attached to public SANs
3rd site quorum (not shown) attached to public SANs
[Diagram: public and private VSANs at each site, with SVC + UPS at both sites]
Note: ISLs/trunks for the private VSANs are dedicated rather than shared, to guarantee dedicated bandwidth is available for node-to-node traffic
[Diagram: public and private SANs at each site, with SVC + UPS at both sites]
Note: ISLs/trunks for the private SANs are dedicated rather than shared, to guarantee dedicated bandwidth is available for node-to-node traffic
The new Split I/O Group configurations will support distances of up to 300 km (the same recommendation as for Metro Mirror). However, for the typical deployment of a split I/O group, only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used. The following charts explain why.
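A rough feel for the numbers behind that recommendation (a sketch assuming roughly 5 microseconds of fibre propagation delay per km in each direction, ignoring equipment latency):

```python
# Back-of-envelope added write latency vs. distance (illustrative only).
# Light in fibre travels roughly 5 microseconds per km in each direction.
US_PER_KM_ONE_WAY = 5.0

def added_latency_ms(distance_km, round_trips):
    """Extra latency from link propagation for a given number of round trips."""
    return 2 * distance_km * US_PER_KM_ONE_WAY * round_trips / 1000.0

# 1 round trip: SVC node-to-node messaging (Metro Mirror / cache mirroring)
# 2 round trips: a plain SCSI write to a remote node (xfer-ready + data)
# 3 round trips: a remote host write plus the cache-mirroring hop
for km in (100, 150, 300):
    print(km, "km:", [added_latency_ms(km, rt) for rt in (1, 2, 3)], "ms")
```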
Metro Mirror
The same code is used for all inter-node communication: Global Mirror, Metro Mirror, Cache Mirroring, Clustering. SVC's proprietary SCSI protocol only has 1 round trip.
In practice, applications are not designed to support a write I/O latency of 80 ms
Hence Metro Mirror is deployed for shorter distances (up to 300km) and Global Mirror is used for longer distances
Server Cluster 1
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 6) Write completed to host
Server Cluster 2
1 round trip
4) Metro Mirror Data transfer to remote site 5) Acknowledgment
SVC Cluster 1
7a) Write request from SVC 8a) Xfer ready to SVC 9a) Data transfer from SVC 10a) Write completed to SVC
SVC Cluster 2
7b) Write request from SVC 8b) Xfer ready to SVC 9b) Data transfer from SVC 10b) Write completed to SVC
Data center 1
Data center 2
Split I/O Group splits the nodes in an I/O group across two sites
SVC will tolerate a round trip delay of up to 80 ms. Cache Mirroring traffic, rather than Metro Mirror traffic, is sent across the inter-site link.
Data is written by the 'preferred' node to both the local and remote storage. The SCSI Write protocol results in 2 round trips. This latency is generally hidden from the application by the write cache.
Server Cluster 1
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 6) Write completed to host
Server Cluster 2
1 round trip
4) Cache Mirror Data transfer to remote site 5) Acknowledgment
Node 1
Node 2
7b) Write request from SVC 8b) Xfer ready to SVC 9b) Data transfer from SVC 10b) Write completed to SVC
2 round trips but SVC write cache hides this latency from the host
Data center 1
Data center 2
Steps 1 to 6 affect application latency; steps 7 to 10 should not affect the application
Split I/O Group is also often used to move workload between servers at different sites. VMotion or equivalent can be used to move applications between servers. Applications no longer necessarily issue I/O requests to the local SVC nodes. SCSI Write commands from hosts to remote SVC nodes result in an additional 2 round trips worth of latency that is visible to the application.
Server Cluster 1
Server Cluster 2
2 round trips
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 6) Write completed to host
1 round trip
4) Cache Mirror Data transfer to remote site 5) Acknowledgment
Node 1
Node 2
7b) Write request from SVC 8b) Xfer ready to SVC 9b) Data transfer from SVC 10b) Write completed to SVC
2 round trips but SVC write cache hides this latency from the host
Data center 1
Data center 2
Steps 1 to 6 affect application latency; steps 7 to 10 should not affect the application
Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips worth of latency for SCSI Write commands
These devices are already supported for use with SVC. No benefit or impact to inter-node communication. Does benefit host to remote SVC I/Os. Does benefit SVC to remote storage controller I/Os.
Server Cluster 1
Server Cluster 2
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 12) Write completed to host
1 round trip
11) Write completion to remote site
1 round trip
8) Cache Mirror Data transfer to remote site 9) Acknowledgment
Node 1
Node 2
17) Write request to storage 18) Xfer ready from storage 19) Data transfer to storage 20) Write completed from storage
Data center 1
Data center 2
Steps 1 to 12 affect application latency; steps 13 to 22 should not affect the application
Active WDM
Power required
Can use CWDM or DWDM technology
Changes incoming/outgoing wavelengths
Adds negligible latency because of the signal change
Consolidates multiple wavelengths in one cable
No dedicated link required
Customers can rent some frequencies
High equipment cost
Longer distances supported
[Diagram: active WDM (FSP 3000) with transponders (TXP) and TDM aggregation of 2G/4G/8G/10G client links onto wavelengths of up to 100G]
FSP 3000: higher capacity (more channels per fiber), higher aggregate bandwidth (up to 100G per wavelength), higher distance (up to 200 km without mid-span amplifier)
More secure (automated fail over, NMS, optical monitoring tools, embedded encryption)
Buffer-to-Buffer (B2B) credits are used as a flow control method by Fibre Channel technology and represent the number of frames a port can store
Sufficient credits provide best performance
Light must cover the distance 2 times: submit data from Node 1 to Node 2, then submit the acknowledgement from Node 2 back to Node 1
The B2B calculation depends on link speed and distance (see the sketch below)
The number of frames in flight increases in line with the link speed
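A common rule of thumb for sizing credits with full-size (about 2 KB) frames is roughly one credit per 2 km of link per Gbit/s of speed; a sketch of that calculation (illustrative only, not a vendor formula, though it matches the node figures quoted later):

```python
# Rough B2B credit sizing for full-size (~2 KB) FC frames: enough frames must
# be in flight to cover the round-trip propagation delay of the link.
# Rule of thumb: about 1 credit per 2 km of link per Gbit/s (illustrative).
import math

def b2b_credits_needed(distance_km, link_gbps):
    return math.ceil(distance_km * link_gbps / 2.0)

print(b2b_credits_needed(10, 8))   # 40 -> the 41 credits on CF8/CG8 nodes suffice
print(b2b_credits_needed(4, 4))    # 8  -> matches earlier nodes' 8 credits at 4 km
print(b2b_credits_needed(40, 2))   # 40 -> why 40 km is paired with 2 Gb/s links
```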
SVC 5.1 and later: SVC is able to handle quorum disk management in a very flexible way, but in a Split I/O Group configuration a well-defined setup is required. -> Disable the dynamic quorum feature using the override flag for V6.2 and later
svctask chquorum -MDisk <mdisk_id or name> -override yes
This flag is currently not configurable in the GUI
Split brain situation: SVC uses the quorum disk to decide which SVC node(s) should survive. No access to the active quorum disk: in a standard situation (no split brain), SVC will select one of the other quorum candidates as the active quorum; in a split brain situation, SVC may take mirrored volumes offline.
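A minimal sketch of the tie-break just described (illustrative logic only, not the actual SVC quorum protocol): whichever node reaches the active quorum first carries on, while a node that cannot reach any quorum stops.

```python
# Illustrative tie-break when the two halves of a split I/O Group lose sight
# of each other (not the actual SVC quorum protocol).

def resolve_split_brain(can_reach_quorum, first_to_reach):
    """can_reach_quorum: {node: bool}; first_to_reach: node name or None."""
    outcome = {}
    for node, reachable in can_reach_quorum.items():
        if reachable and node == first_to_reach:
            outcome[node] = "survives (won the race to the active quorum)"
        elif reachable:
            outcome[node] = "stops (lost the race to the active quorum)"
        else:
            outcome[node] = "stops (no access to any active quorum)"
    return outcome

# Inter-site link fails; both nodes still see Site 3 and node1 gets there first.
print(resolve_split_brain({"node1": True, "node2": True}, "node1"))
# The active quorum is lost as well: neither node can win the race.
print(resolve_split_brain({"node1": False, "node2": False}, None))
```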
SVC 6.3: similar to the support statement in SVC 6.2. Additional: support for active WDM devices. Quorum disk requirement similar to Remote Copy (MM/GM) requirements: max. 80 ms round trip delay time, 40 ms each direction. FCIP connectivity supported. No support for iSCSI storage systems.
Link speed   Minimum distance   Maximum distance
8 Gbps       >= 0 km            10 km
4 Gbps       > 10 km            20 km
2 Gbps       > 20 km            40 km
[Diagram: Split I/O Group without ISLs: SVC node 1 with Switches 1 and 2 at Site 1, SVC node 2 with Switches 3 and 4 and Server 2 at Site 2, storage at both sites, active quorum at Site 3]
[Diagram: Site 1 (Server 1, Switch 1, Switch 2) and Site 2 (Server 2, Switch 3, Switch 4, SVC node 2), connected by server ISLs]
Active/Passive WDM devices can be used to reduce the number of required FC links between both sites. Distance extension to max. 40 km with WDM devices.
[Diagram continued: SVC node 1, Storage 2 and Storage 3, Switch 5 and Switch 6, active quorum at Site 3]
Maximum distance: 10 km / 20 km / 40 km, depending on link speed
[Diagram: Site 1 (Server 1, Switch 1, Switch 2) and Site 2 (Server 2, Switch 3, Switch 4), connected by server ISLs]
Active/Passive WDM devices can be used to reduce the number of required FC links between both sites. Distance extension to max. 40 km with WDM devices.
SVC node 1
SVC node 2
Quorum devices with an active/passive controller without I/O rerouting (for example DS3/4/5K) must be connected to both controllers from each site
[Diagram continued: Storage 2 and Storage 3, Switch 5 and Switch 6]
Maximum distance: 10 km / 20 km / 40 km, depending on link speed
[Diagram continued: DS4700 at Site 3 with controllers A and B holding the active quorum; links at 8, 4, or 2 Gbps]
SVC Buffer-to-Buffer credits:
2145-CF8 / CG8 nodes have 41 B2B credits, enough for 10 km at 8 Gb/s with a 2 KB payload
All earlier models use 1/2/4 Gb/s Fibre Channel adapters and have 8 B2B credits, which is enough for 4 km at 4 Gb/s
Recommendation 1: use CF8 / CG8 nodes for more than 4 km distance for best performance
Recommendation 2: SAN switches do not auto-negotiate B2B credits, and 8 B2B credits is the default setting, so change the B2B credits in the switch to 41 as well
Split I/O Group with ISLs between SVC nodes, supported with SVC 6.3:
Supports Metro Mirror distances between nodes
Third site required for the quorum disk
ISLs with max. 1 hop can be used for quorum traffic and SVC node-to-node communication
Requires a dedicated private SAN used only for inter-node traffic (which can be a Brocade virtual fabric or a Cisco VSAN)
Requires one ISL for each I/O Group between the private SANs at each site
[Diagram: public and private SANs at Site 1 and Site 2 (with Server 4), linked over WDM ISLs; SVC-01 and SVC-02; storage at each site acting as quorum candidates; Site 3 holds the third quorum]
Maximum distances: 100 km for live data mobility (150 km with distance extenders); 300 km for fail-over / recovery scenarios
SVC supports up to 80 ms latency, far greater than most application workloads would tolerate
The two sites can be connected using active or passive technologies such as CWDM / DWDM if desired
Supported infrastructure: WDM equipment similar to Metro Mirror; link requirements similar to Metro Mirror
[Diagram: Site 3 switches and a storage controller (Ctl. A / Ctl. B) holding the active quorum]
[Diagram: the same Split I/O Group with ISLs configuration as on the previous slide]
Distances: support of up to 300 km (the same recommendation as for Metro Mirror). For a typical deployment of Split I/O Group, only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used.
Example 1) Configuration with live data mobility: VMware ESX with VMotion or AIX with live partition mobility
Distance between sites: 12 km
-> SVC 6.3: configurations with or without ISLs are supported
-> SVC 6.2: only the configuration without ISLs is supported
Example 2)
Configuration with live data mobility: VMware ESX with VMotion or AIX with live partition mobility
Distance between sites: 70 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported
Example 3)
Configuration without live data mobility: VMware ESX with SRM, AIX HACMP, or MS Cluster
Distance between sites: 180 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported, or
-> Metro Mirror configuration (because of the long distance: only in an active / passive configuration)
Summary
SVC Split I/O Group:
Is a very powerful solution for automatic and fast handling of storage failures
Transparent for servers
A perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)
Transparent for all OS-based clusters
Distances of up to 300 km (SVC 6.3) are supported
Two possible scenarios:
Without ISLs between SVC nodes (classic SVC Split I/O Group): up to 40 km distance, with support for active (SVC 6.3) and passive (SVC 6.2) WDM
With ISLs between SVC nodes: up to 100 km distance for live data mobility (150 km with distance extenders), up to 300 km for fail-over / recovery scenarios
Long distance performance impact can be optimized by: load distribution across both sites; appropriate SAN Buffer-to-Buffer credits
Q&A