IBM Storwize V7000 Clustering and SVC Split I/O Group Deeper Dive

Bill Wiegand - ATS Senior I/T Specialist Storage Virtualization

Copyright IBM Corporation, 2011

Agenda
Quick Basics of Virtualization
Scaling Storwize V7000 via Clustering
Scaling Storwize V7000 Unified
Q&A 1
SVC Split I/O Group V6.3
Q&A 2


Virtualization - The Big Picture


Designed to be a redundant, modular, and scalable solution

Cluster consisting of one to four I/O Groups managed as a single system

Two nodes make up an I/O Group and own given volumes

(Diagram: hosts see Volumes presented over the Storage Network by pairs of Nodes, which are built on the Managed Disks)


Virtualization - The Big Picture


Volumes: Max 8192 volumes, 2048 per I/O Group, each up to 256TB in size and each assigned to:
A specific I/O Group
Built from a specific Storage Pool

Cluster: Max 4 I/O Groups, built from 4 Storwize V7000 control enclosures or 8 SVC nodes

Managed Disks (MDisks): Internally or externally provided; max 4096 MDisks per system

Storage Pools: Max 128 storage pools, max 128 MDisks per pool

(Diagram: an SVC Cluster or Storwize V7000 Clustered System with I/O Group A and I/O Group B, each a pair of nodes/a control enclosure, presenting volumes built from Storage Pools 1-3 of MDisks)
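To make these maximums concrete, here is a minimal Python sketch (my own illustration, not an IBM tool; the only inputs are the limits quoted on this slide) that checks a proposed configuration against them:

```python
# Hypothetical helper: validate a proposed configuration against the
# SVC / Storwize V7000 V6.x limits quoted on this slide.
LIMITS = {
    "io_groups": 4,            # max I/O Groups per clustered system
    "volumes_total": 8192,     # max volumes per system
    "volumes_per_iogrp": 2048, # max volumes per I/O Group
    "mdisks_total": 4096,      # max MDisks per system
    "pools": 128,              # max storage pools
    "mdisks_per_pool": 128,    # max MDisks per pool
}

def check_config(io_groups, volumes_per_iogrp, mdisks, pools):
    """volumes_per_iogrp: list of volume counts, one entry per I/O Group.
    pools: list of MDisk counts, one entry per storage pool."""
    problems = []
    if io_groups > LIMITS["io_groups"]:
        problems.append("too many I/O Groups")
    if sum(volumes_per_iogrp) > LIMITS["volumes_total"]:
        problems.append("too many volumes in the system")
    if any(v > LIMITS["volumes_per_iogrp"] for v in volumes_per_iogrp):
        problems.append("an I/O Group exceeds 2048 volumes")
    if mdisks > LIMITS["mdisks_total"]:
        problems.append("too many MDisks")
    if len(pools) > LIMITS["pools"]:
        problems.append("too many storage pools")
    if any(m > LIMITS["mdisks_per_pool"] for m in pools):
        problems.append("a pool exceeds 128 MDisks")
    return problems or ["configuration is within the published limits"]

print(check_config(io_groups=2, volumes_per_iogrp=[1500, 900],
                   mdisks=64, pools=[10, 20, 34]))
```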


Scale the Storwize V7000 Multiple Ways


Storwize V7000 one I/O Group system: a single control enclosure and its expansion enclosures
Storwize V7000 2-4 I/O Group clustered system: two to four control enclosures, each with its own expansion enclosures

An I/O Group is a control enclosure and its associated SAS-attached expansion enclosures
A clustered system can consist of 2-4 I/O Groups (SCORE approval required for more than 2)

No interconnection of SAS chains between control enclosures; control enclosures communicate via FC and must use all 8 FC ports on the enclosures

Scale storage:
Add up to 4x the capacity
Add up to 4x the throughput

Non-disruptive upgrades:
From smallest to largest configurations
Purchase hardware only when you need it

NOTE: Storwize V7000 Clustered System with greater than two I/O Groups/Frames requires SCORE/RPQ approval

Virtualize storage arrays behind Storwize V7000 for even greater capacity and throughput

Storwize V7000 Unified Scaling


Storwize V7000 Unified one I/O Group system is supported; a Storwize V7000 Unified 2-4 I/O Group clustered system is NOT SUPPORTED

Storwize V7000 Unified can scale disk capacity by adding up to nine expansion enclosures to the standard control enclosure
Virtualize external storage arrays behind Storwize V7000 Unified for even greater capacity
CIFS is not currently supported with externally virtualized storage

Cannot horizontally scale out by adding additional Unified systems, or even by adding just another Storwize V7000 control enclosure and its associated expansion enclosures, at this time
If a customer has a clustered Storwize V7000 system today, they will not be able to upgrade it to a Unified system in 2012 when the MES is available


Clustered System Facts


A clustered system provides the ability to independently grow capacity and performance
Add expansion enclosures for more capacity
Add a control enclosure for more performance
No extra feature to order and no extra charge for a clustered system
Configure one system using the USB stick and then add the second using the GUI

Clustered system GA support is for up to 480 SFF disk drives, or 240 LFF disk drives, or a mix thereof
Up to 480TB raw capacity in one 42U rack
Enables Storwize V7000 to compete effectively against larger EMC, NetApp, and HP systems

Support for a larger system can be requested by submitting a SCORE/RPQ
E.g. eight Storwize V7000 node canisters in four control enclosures
Up to 960TB raw capacity in two 42U racks

Clustered System Facts


Adding additional control enclosures to existing V6.2+ system is non-disruptive
Requires new control enclosures be loaded with V6.2.x minimum

Control enclosures can be any combination of models


2076-112, 124, 312, 324

Clustered system operates as a single storage system


Managed via one IP address

Both node canisters in a given control enclosure are part of the same I/O Group
Cannot create an I/O Group with one node from each of two different control enclosures
Adding one node in a control enclosure to an I/O Group will automatically add the other
Storwize V7000 clustered system does not support split I/O group configurations (also known as stretch cluster)

Clustered System Facts


Inter-control enclosure communication is provided by a Fibre Channel (FC) SAN
Must use all 4 FC ports on each node canister and zone them all together
All FC ports on a node canister must have at least one path to every node canister in the clustered system that is not in the same control enclosure
Node canisters in the same control enclosure have connectivity via the PCIe link of the midplane and don't require their FC ports to be zoned together
However, the recommended guideline is to zone them together as it provides a secondary path should the PCIe link have issues

Only 1 control enclosure can appear on a given SAS chain; only 1 node canister can appear on a single strand of a SAS chain
Key to realize is that there is no access by one control enclosure (I/O Group) to the SAS-attached expansion enclosures of another control enclosure (I/O Group) other than via the SAN


Clustered System Facts


Currently, volumes built on internal MDisks in a storage pool will be owned by the same I/O group (IOG) that owns the majority of the MDisks in that storage pool
E.g. if Pool-1 has 3 MDisks from IOG-0 and 4 from IOG-1, then by default IOG-1 will own all volumes created
Default GUI behavior can be overridden using the Advanced option in the GUI

If a pool owns exactly the same number of MDisks from each I/O group, then volumes will be owned by IOG-0
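A minimal sketch (my own illustration, not product code) of the default ownership rule just described: the I/O group contributing the majority of a pool's MDisks owns new volumes, and an even split goes to IOG-0:

```python
from collections import Counter

def default_owning_iogrp(mdisk_owners):
    """mdisk_owners: list of I/O group IDs, one per MDisk in the storage pool.
    Returns the I/O group that owns volumes created from this pool by default."""
    counts = Counter(mdisk_owners)
    best = max(counts.values())
    # If several I/O groups tie for the majority, the slide only states the
    # two-way case (ties go to IOG-0); here ties resolve to the lowest ID.
    candidates = sorted(iog for iog, c in counts.items() if c == best)
    return candidates[0]

# Pool-1 from the example: 3 MDisks from IOG-0, 4 from IOG-1 -> IOG-1 owns new volumes
print(default_owning_iogrp([0, 0, 0, 1, 1, 1, 1]))   # 1
# Even split -> IOG-0 owns new volumes
print(default_owning_iogrp([0, 0, 1, 1]))            # 0
```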

Expansion enclosures only communicate with their owning control enclosure, so if host I/Os come into IOG-0 but the data is on MDisks owned by IOG-1, the I/O is forwarded to IOG-1 over FC
Similar process to SVC accessing external storage systems
Does not go through the cache on the owning I/O group but directly to the MDisk
Uses the very lowest layer of the I/O stack to minimize any additional latency


Clustered System Example


(Diagram, all cabling shown is logical: two control enclosures, each with two node canisters, connected through the SAN; Control Enclosure #1 and its expansion enclosures form I/O Group #1, Control Enclosure #2 and its expansion enclosures form I/O Group #2; Storage Pools A, B, and C are built from MDisks across the two I/O Groups)

Expansion enclosures are connected through one control enclosure and can be part of only one I/O group
All MDisks are part of only one I/O group
Storage pools can contain MDisks from more than one I/O group
Inter-control enclosure communication happens over the SAN
A volume is serviced by only one I/O group

Storwize V7000 Clustered System DR


Production Site: Storwize V7000 2-4 I/O Group clustered system or one I/O Group single-frame system
Disaster Recovery Site: Storwize V7000 one to four I/O Group system

An I/O Group is a control enclosure and its associated SAS-attached expansion enclosures
A Clustered System can consist of 2-4 I/O Groups (SCORE approval for > 2)

(Diagram: control enclosures and expansion enclosures at each site, replicating between sites with Global Mirror or Metro Mirror)

Replication between clustered systems is via fibre channel ports only


Replication between up to four clustered systems is allowed
Requires 5639-RM1 license(s) at each site

NOTE: Storwize V7000 Clustered System with greater than 2 I/O Groups/Frames requires SCORE/RPQ approval

Storwize V7000 Clustered System HA


A High Availability clustered system similar to an SVC Split I/O Group configuration is not possible, since we cannot split a control enclosure in half and install it at two different sites
One I/O Group will be at each site, unlike SVC where each node in an I/O Group can be installed at a different site

So if you lose a site you lose access to all volumes owned by that I/O Group
There is no automatic failover of a volume from one I/O Group to another

Volume mirroring does allow a single host volume to have pointers to two sets of data which can be on different I/O Groups in a clustered system, but again, if you lose a site you lose the entire I/O Group, so any volumes owned by that I/O Group will be offline
You can migrate volume ownership from the failed IOG to the other IOG, but data may be lost, as unwritten data still in the cache of the offline IOG is discarded in the process of migration, or could have been lost if the IOG failed hard without saving cached data

(Diagram: a host with a mirrored volume across a clustered system separated by distance; I/O Group 1 of the Storwize V7000 clustered system at Production Site A and I/O Group 2 at Production Site B, each with its control enclosure and expansion enclosures)

So It Begs the Question: Why Cluster?


One reason it is offered is because we can
Runs the same software as SVC, which supports 1-4 I/O Groups

Can start very small and grow to a very large storage system with a single management interface
Helps to compete with larger midrange systems from other vendors

Can virtualize external storage too, providing the same virtualization features across the entire clustered system
Just like an SVC cluster, so desirable for the same reasons large SVC clusters are

However, there is nothing wrong with going with 1-4 separate systems versus a clustered system if the customer prefers
System management isn't that hard anyway
If the customer will lose sleep over a possible complete failure of a control enclosure, no matter how unlikely that is, then go with separate systems
Q&A

IBM System Storage

SVC Split I/O Group Update


Bill Wiegand/Thomas Vogel ATS System Storage


Agenda
Terminology
SVC Split I/O Group Review
Long distance refresh:
  WDM devices
  Buffer-to-Buffer credits
SVC Quorum disk
Split I/O Group without ISLs between SVC nodes:
  Supported configurations
  SAN configuration for long distance
Split I/O Group with ISLs between SVC nodes:
  Supported configurations
  SAN configuration for long distance


Terminology
SVC Split I/O Group = SVC Stretched Cluster = SVC Split Cluster

Split I/O Group: two independent SVC nodes of the same I/O Group in two independent sites, plus one independent site for the Quorum
Acts just like a single I/O Group with distributed high availability
(Diagram: I/O Group 1 split across Site 1 and Site 2)

Distributed I/O groups are NOT an HA configuration and are not recommended; if one site fails:
Manual volume move required
Some data is still in the cache of the offline I/O Group
(Diagram: I/O Group 1 at Site 1 and I/O Group 2 at Site 2)

Storwize V7000 Split I/O Group is not an option:
A single enclosure includes both nodes
Physical distribution across two sites is not possible


SVC - What is a Failure Domain?


Generally a failure domain will represent a physical location, but it depends on what type of failure you are trying to protect against
Could all be in one building on different floors/rooms, or just different power domains in the same data center
Could be multiple buildings on the same campus
Could be multiple buildings up to 300 km apart

Key is the quorum disk
If you only have two physical sites and the quorum disk has to be in one of them, then some failure scenarios won't allow the cluster to survive
Minimum is to have the active quorum disk system on a separate power grid in one of the two failure domains

SVC - How Quorum Disks Affect Availability (1)


1) Loss of the active quorum:
SVC selects another quorum
Continuation of operations

2) Loss of a storage system (loss of the active quorum):
SVC selects another quorum
Continuation of operations
Mirrored Volumes continue operation, but this may take 60 sec or more since the active quorum disk failed

Note: The loss of all quorum disks will not cause the cluster to stop as long as a majority of the nodes in the cluster are operational. However, mirrored Volumes will likely go offline. This is why you would manually configure the cluster so the quorum disk candidates are located on disk systems in both failure domains.

(Diagram: Failure Domain 1 and Failure Domain 2 connected by ISL 1 and ISL 2; Node 1 and Node 2 with Volume Mirroring between the storage systems; quorum disk candidates 1, 2, and 3 spread across the disk systems, one of them the active quorum)


SVC - How Quorum Disks Affect Availability (2)


Loss of Failure Domain 1:
Active quorum not affected
Continuation of operations

Loss of Failure Domain 2:
Active quorum lost
Half of the nodes lost - loss of cluster majority
Node 1 cannot utilize a quorum candidate to recover and survive
Node 1 shuts down and the cluster is stopped
May not be recoverable and may require a cluster rebuild and data restore from backups

(Diagram: the same two-failure-domain layout with ISL 1 and ISL 2, Node 1 and Node 2, and a mirrored Volume; the active quorum resides in Failure Domain 2, so its loss leaves no active quorum and no access to the data on disk there)


Current Supported Configuration for Split I/O Group


Automated failover, with SVC handling the loss of:
- SVC node
- Quorum disk
- Storage subsystem

Can incorporate MM/GM to provide disaster recovery (3-site-like capability)

(Diagram: Failure Domain 1 and Failure Domain 2 connected by ISL 1 and ISL 2, with Node 1 and Node 2 and Volume Mirroring between the storage systems; Failure Domain 3 holds the active quorum on a disk system that supports Extended Quorum, with quorum candidates also in Failure Domains 1 and 2)


SVC Split I/O Group

Site 1 SVC Node 1 / Site 2 SVC Node 2 / Site 3 Quorum disk -> Cluster Status

Operational / Operational / Operational -> Operational, optimal
Failed / Operational / Operational -> Operational, write cache disabled
Operational / Failed / Operational -> Operational, write cache disabled
Operational / Operational / Failed -> Operational, write cache enabled, but a different active quorum disk
Operational, link to Site 2 failed (split brain) / Operational, link to Site 1 failed (split brain) / Operational -> Whichever node accesses the active quorum disk first survives and the partner node goes offline
Operational / Failed at the same time as Site 3 / Failed at the same time as Site 2 -> Stopped
Failed at the same time as Site 3 / Operational / Failed at the same time as Site 1 -> Stopped


Advantages / Disadvantages of SVC Split I/O Group


Advantages:
No manual intervention required
Automatic and fast handling of storage failures
Volumes mirrored in both locations
Transparent for servers and host-based clusters
Perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)

Disadvantages:
Mix between an HA and DR solution, but not a true DR solution
Non-trivial implementation


SVC Split I/O Group V6.3 Enhancements


Split I/O Group Physical Configurations


The following charts show supported physical configurations for the new Split I/O Group support in V6.3
VSANs (CISCO) and Virtual Fabrics (Brocade) are not supported by all switch models from the respective vendors
Consult vendor for further information

Enhancements designed to help us compete more effectively with EMC VPLEX at longer distances
Note that this information is all very new, even to ATS, and some requirements could change prior to GA
Highly recommend engaging ATS for a solution design review
w3.ibm.com/support/techxpress

Storwize V7000 does not provide any sort of split I/O group, split cluster, or stretch cluster HA configuration
A clustered Storwize V7000 provides the ability to grow system capacity and scale performance within a localized single system image

Extension of Currently Supported Configuration


(Diagram: Server Cluster 1 and Server Cluster 2, each with local SANs and SVC + UPS, connected by active DWDM over shared single-mode fibre(s); the user chooses the number of ISLs on each SAN)

0-10 km: Fibre Channel distance supported up to 8 Gbps
11-20 km: Fibre Channel distance supported up to 4 Gbps
21-40 km: Fibre Channel distance supported up to 2 Gbps

Two ports per SVC node attached to the local SANs
Two ports per SVC node attached to the remote SANs via DWDM
Hosts and storage attached to the SANs via ISLs sufficient for the workload
3rd site quorum (not shown) attached to the SANs

Configuration With 4 Switches at Each Site


(Diagram: Server Cluster 1 and Server Cluster 2; each site has two public SANs and two private SANs plus SVC + UPS; the user chooses the number of ISLs on the public SANs; 1 ISL per I/O group, configured as a trunk, connects the private SANs between sites)

Two ports per SVC node attached to the public SANs
Two ports per SVC node attached to the private SANs
Hosts and storage attached to the public SANs
3rd site quorum (not shown) attached to the public SANs

Configuration Using CISCO VSANs


Switches are partitioned using VSANs

(Diagram: Server Cluster 1 and Server Cluster 2; each site's switches are partitioned into public and private VSANs, with SVC + UPS at each site)

Note: ISLs/trunks for the private VSANs are dedicated rather than being shared, to guarantee dedicated bandwidth is available for node-to-node traffic

Configuration Using Brocade Virtual Fabrics


Physical switches are partitioned into two logical switches

(Diagram: Server Cluster 1 and Server Cluster 2; each site's switches are partitioned into public and private logical SANs, with SVC + UPS at each site)

Note: ISLs/trunks for the private SANs are dedicated rather than being shared, to guarantee dedicated bandwidth is available for node-to-node traffic

Split I/O Group Distance

The new Split I/O Group configurations will support distances of up to 300 km (same recommendation as for Metro Mirror)
However, for the typical deployment of a split I/O group, only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used
The following charts explain why


Metro Mirror

Technically SVC supports distances up to 8000km

SVC will tolerate a round trip delay of up to 80ms between nodes

The same code is used for all inter-node communication: Global Mirror, Metro Mirror, Cache Mirroring, Clustering
SVC's proprietary SCSI protocol only has 1 round trip

In practice, applications are not designed to support a write I/O latency of 80 ms

Hence Metro Mirror is deployed for shorter distances (up to 300km) and Global Mirror is used for longer distances

Metro Mirror: Application Latency = 1 long distance round trip

(Diagram: Server Cluster 1 and SVC Cluster 1 in Data center 1; Server Cluster 2 and SVC Cluster 2 in Data center 2; Metro Mirror between the two SVC clusters)

1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Metro Mirror data transfer to the remote site (1 round trip)
5) Acknowledgment
6) Write completed to host
7a) Write request from SVC, 8a) Xfer ready to SVC, 9a) Data transfer from SVC, 10a) Write completed to SVC (SVC Cluster 1 destaging to its back-end storage)
7b) Write request from SVC, 8b) Xfer ready to SVC, 9b) Data transfer from SVC, 10b) Write completed to SVC (SVC Cluster 2 destaging to its back-end storage)

Steps 1 to 6 affect application latency
Steps 7 to 10 should not affect the application

Split I/O Group for Business Continuity

Split I/O Group splits the nodes in an I/O group across two sites

SVC will tolerate a round trip delay of up to 80 ms
Cache Mirroring traffic, rather than Metro Mirror traffic, is sent across the inter-site link

Data is mirrored to back-end storage using Volume Mirroring

Data is written by the 'preferred' node to both the local and remote storage
The SCSI write protocol results in 2 round trips
This latency is generally hidden from the application by the write cache

Split I/O Group Local I/O: Application Latency = 1 round trip

(Diagram: Server Cluster 1 in Data center 1 writes to Node 1 of the SVC Split I/O Group; Node 2 is in Data center 2)

1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Cache mirror data transfer to the remote site (1 round trip)
5) Acknowledgment
6) Write completed to host
7b) Write request from SVC, 8b) Xfer ready to SVC, 9b) Data transfer from SVC, 10b) Write completed to SVC (destage to the remote storage: 2 round trips, but the SVC write cache hides this latency from the host)

Steps 1 to 6 affect application latency
Steps 7 to 10 should not affect the application

Split I/O Group for Mobility

Split I/O Group is also often used to move workload between servers at different sites
VMotion or equivalent can be used to move applications between servers
Applications no longer necessarily issue I/O requests to the local SVC nodes
SCSI write commands from hosts to remote SVC nodes result in an additional 2 round trips' worth of latency that is visible to the application
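As a rough illustration of the round-trip arithmetic discussed on these charts (a back-of-the-envelope sketch, not an official sizing tool; it assumes the ~0.01 ms of round-trip time per km quoted later in this deck, and the base local latency figure is an arbitrary placeholder):

```python
# Rough sketch of host-visible write latency for an SVC Split I/O Group,
# based on the round-trip counts described on these charts.
RTT_MS_PER_KM = 0.01   # assumption from this deck: ~0.01 ms round trip per km

def host_visible_write_latency_ms(distance_km, host_local_to_owning_node=True,
                                  base_local_latency_ms=0.5):
    """Latency the application sees for a write, ignoring back-end destage
    (which the write cache hides). base_local_latency_ms is a placeholder
    for local fabric/node overhead, not a measured value."""
    rtt = distance_km * RTT_MS_PER_KM
    if host_local_to_owning_node:
        round_trips = 1   # local I/O: only the cache mirror crosses the link
    else:
        round_trips = 3   # remote I/O: 2 round trips for the SCSI write + 1 cache mirror
    return base_local_latency_ms + round_trips * rtt

for km in (10, 100, 300):
    local = host_visible_write_latency_ms(km)
    remote = host_visible_write_latency_ms(km, host_local_to_owning_node=False)
    print(f"{km:>3} km: local write ~{local:.2f} ms, remote write ~{remote:.2f} ms")
```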


Split I/O Group Remote I/O: Application Latency = 3 round trips

(Diagram: the application now runs on Server Cluster 2 in Data center 2, but its volume's preferred node, Node 1, is in Data center 1; Node 2 is in Data center 2)

1) Write request from host (to the remote SVC node: 2 round trips)
2) Xfer ready to host
3) Data transfer from host
4) Cache mirror data transfer to the remote site (1 round trip)
5) Acknowledgment
6) Write completed to host
7b) Write request from SVC, 8b) Xfer ready to SVC, 9b) Data transfer from SVC, 10b) Write completed to SVC (destage: 2 round trips, but the SVC write cache hides this latency from the host)

Steps 1 to 6 affect application latency
Steps 7 to 10 should not affect the application

Split I/O Group for Mobility

Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips' worth of latency for SCSI write commands

These devices are already supported for use with SVC
No benefit or impact to inter-node communication
Does benefit host to remote SVC I/Os
Does benefit SVC to remote storage controller I/Os


Split I/O Group Remote I/O: Application Latency = 2 round trips


(Diagram: the application runs on Server Cluster 2 in Data center 2; distance extenders sit on the inter-site links; Node 1 in Data center 1 is the preferred node)

1) Write request from host
2) Xfer ready to host
3) Data transfer from host
4) Write + data transfer to the remote site (1 round trip)
5) Write request to SVC
6) Xfer ready from SVC
7) Data transfer to SVC
8) Cache mirror data transfer to the remote site (1 round trip)
9) Acknowledgment
10) Write completed from SVC
11) Write completion to the remote site
12) Write completed to host

13) Write request from SVC
14) Xfer ready to SVC
15) Data transfer from SVC
16) Write + data transfer to the remote site (1 round trip, hidden from the host)
17) Write request to storage
18) Xfer ready from storage
19) Data transfer to storage
20) Write completed from storage
21) Write completion to the remote site
22) Write completed to SVC

Steps 1 to 12 affect application latency
Steps 13 to 22 should not affect the application


Long Distance Impact


Additional latency because of long distance
Speed of light in glass: ~200,000 km/sec, and 1 km of distance = 2 km round trip
Additional round trip time because of distance:
1 km = 0.01 ms
10 km = 0.10 ms
25 km = 0.25 ms
100 km = 1.00 ms
300 km = 3.00 ms

SCSI protocol:
Read: 1 I/O operation = 0.01 ms/km (initiator requests data and target provides data)
Write: 2 I/O operations = 0.02 ms/km (initiator announces the amount of data and the target acknowledges; the initiator sends the data and the target acknowledges)
SVC's proprietary SCSI protocol for node-to-node traffic has only 1 round trip

Fibre Channel frame:
User data per FC frame (Fibre Channel payload): up to 2048 bytes = 2KB
Even for very small user data (< 2KB) a complete frame is required
Large user data is split across multiple frames
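The same arithmetic expressed as a short sketch (illustrative only; it simply restates the 200,000 km/s figure and the round-trip counts from this chart):

```python
# Restating the chart's arithmetic: extra latency added per km of separation.
LIGHT_SPEED_IN_GLASS_KM_PER_S = 200_000   # roughly 2/3 of the speed of light in vacuum

def round_trip_ms(distance_km):
    """One round trip: the light covers the distance twice."""
    return (2 * distance_km / LIGHT_SPEED_IN_GLASS_KM_PER_S) * 1000

def scsi_overhead_ms(distance_km, operation):
    """A read needs 1 round trip, a standard SCSI write needs 2;
    SVC's node-to-node protocol needs only 1."""
    round_trips = {"read": 1, "write": 2, "svc_node_to_node": 1}[operation]
    return round_trips * round_trip_ms(distance_km)

for km in (1, 10, 25, 100, 300):
    print(f"{km:>4} km: round trip {round_trip_ms(km):.2f} ms, "
          f"SCSI write {scsi_overhead_ms(km, 'write'):.2f} ms")
```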


Passive/Active WDM devices


Passive WDM:
No power required
Can use CWDM or DWDM technology
Colored SFPs required; they create the different wavelengths
Customer must own the physical cable end to end; renting individual wavelengths from a service provider is not possible
Limited equipment cost
Max distance 70 km, depending on the SFP

Active WDM:
Power required
Can use CWDM or DWDM technology
Changes incoming/outgoing wavelengths
Adds negligible latency because of the signal change
Consolidates multiple wavelengths in one cable; no dedicated link required
Customers can rent some frequencies
High equipment cost
Longer distances supported


CWDM / DWDM Devices


WDM means Wavelength Division Multiplexing: parallel transmission of a number of wavelengths over a fiber

CWDM (Coarse Wavelength Division Multiplexing):
16 or 32 wavelengths into a fibre
Uses wide-range frequencies
Wider channel spacing - 20 nm (2.5 THz grid)

DWDM (Dense Wavelength Division Multiplexing):
32, 64, or 128 wavelengths into a fibre
Narrow frequencies
Narrow channel spacing - e.g. 0.8 nm (100 GHz grid)

(Diagrams: CWDM spectrum and DWDM spectrum)


WDM Optical Networking: Passive vs. Active Solutions


(Diagram: passive vs. active WDM solutions; the active example is an ADVA FSP 3000 with transponders (TXP) and TDM aggregation of 2G/4G/8G/10G client links onto wavelengths of up to 100G)

Active xWDM advantages:
Higher capacity (more channels per fiber)
Higher aggregate bandwidth (up to 100G per wavelength)
Higher distance (up to 200 km without a mid-span amplifier)
More secure (automated failover, NMS, optical monitoring tools, embedded encryption)
Advanced features through the use of active xWDM technology

Source: ADVA


SAN and Buffer-to-Buffer Credits

Buffer-to-Buffer (B2B) credits:
Are used as a flow control method by Fibre Channel technology and represent the number of frames a port can store
Enough credits must be available to provide the best performance
Light must cover the distance 2 times: submit data from Node 1 to Node 2, then submit the acknowledgment from Node 2 back to Node 1
The B2B calculation depends on link speed and distance
The number of frames in flight increases in proportion to the link speed


SVC Split I/O Group Quorum Disk


SVC creates three quorum disk candidates on the first three managed MDisks
One quorum disk is active

SVC 5.1 and later: SVC is able to handle quorum disk management in a very flexible way, but in a Split I/O Group configuration a well-defined setup is required
-> Disable the dynamic quorum feature using the override flag for V6.2 and later:
svctask chquorum -MDisk <mdisk_id or name> -override yes
This flag is currently not configurable in the GUI

Split brain situation: SVC uses the quorum disk to decide which SVC node(s) should survive
No access to the active quorum disk:
In a standard situation (no split brain): SVC will select one of the other quorum candidates as the active quorum
In a split brain situation: SVC may take mirrored Volumes offline


SVC Split I/O Group Quorum Disk


Quorum disk requirements:
Must be placed in a third, independent site
Must be Fibre Channel connected
ISLs with one hop to the quorum storage system are supported

Supported infrastructure:
WDM equipment similar to Metro Mirror
Link requirements similar to Metro Mirror
Max round trip delay time is 80 ms (40 ms each direction)
FCIP to the quorum disk can be used with the following requirements:
Max round trip delay time is 80 ms (40 ms each direction)
The fabrics are not merged, so routers are required
Independent long-distance equipment from each site to Site 3 is required
iSCSI storage is not supported

Requirement for active/passive storage devices (like DS3/4/5K): each quorum disk storage controller must be connected to both sites


Split I/O Group without ISLs between SVC nodes


Split I/O Group without ISLs between SVC nodes (classic Split I/O Group)

SVC 6.2 and earlier:
Two ports on each SVC node needed to be connected to the remote switch
No ISLs between SVC nodes
Third site required for the quorum disk
ISLs with max. 1 hop can be used for server traffic and quorum disk attachment

SVC 6.2 (late) update:
Distance extension to max. 40 km with passive WDM devices
Up to 20 km at 4 Gb/s, or up to 40 km at 2 Gb/s
LongWave SFPs required for long distances
LongWave SFPs must be supported by the switch vendor

SVC 6.3:
Similar to the support statement in SVC 6.2
Additional: support for active WDM devices
Quorum disk requirements similar to Remote Copy (MM/GM) requirements:
Max. 80 ms round trip delay time, 40 ms each direction
FCIP connectivity supported
No support for iSCSI storage systems

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps
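Expressed as a simple lookup (a sketch based only on the values in the table above):

```python
def max_link_speed_gbps_no_isl(distance_km):
    """Maximum supported FC link speed for the no-ISL Split I/O Group
    configuration, per the distance table above."""
    if distance_km <= 10:
        return 8
    if distance_km <= 20:
        return 4
    if distance_km <= 40:
        return 2
    raise ValueError("beyond 40 km the no-ISL configuration is not supported")

print(max_link_speed_gbps_no_isl(15))  # 4 Gbps
```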

Split I/O Group without ISLs between SVC nodes


Supported configuration:
Site 1 and Site 2 are connected via Fibre Channel connections
A third site is required for quorum disk placement
The quorum disk must be listed as Extended Quorum in the SVC Supported Hardware List
Two ports on each SVC node need to be connected to the remote switches
SVC Volume Mirroring between Site 1 and Site 2

(Diagram: Server 1, Switches 1 and 2, SVC node 1, and storage at Site 1; Server 2, Switches 3 and 4, SVC node 2, and storage at Site 2; the active quorum at Site 3)

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps


Split I/O Group without ISLs between SVC nodes


Supported configuration:
Site 1 and Site 2 are connected via Fibre Channel connections
A third site is required for quorum disk placement
The quorum disk must be listed as Extended Quorum in the SVC Supported Hardware List
Two ports on each SVC node need to be connected to the remote switch
SVC Volume Mirroring between Site 1 and Site 2

Active/passive WDM devices can be used to reduce the number of required FC links between both sites
Distance extension to max. 40 km with WDM devices

(Diagram: Server 1 and Server 2 with server ISLs between Switches 1/2 at Site 1 and Switches 3/4 at Site 2; SVC node 1 with Storage 3 at Site 1, SVC node 2 with Storage 2 at Site 2; the active quorum behind Switches 5 and 6 at Site 3)

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps


Split I/O Group without ISLs between SVC nodes


Supported configuration:
Site 1 and Site 2 are connected via Fibre Channel connections
A third site is required for quorum disk placement
The quorum disk must be listed as Extended Quorum in the SVC Supported Hardware List
Two ports on each SVC node need to be connected to the remote switch
SVC Volume Mirroring between Site 1 and Site 2

Active/passive WDM devices can be used to reduce the number of required FC links between both sites
Distance extension to max. 40 km with WDM devices

Quorum devices with an active/passive controller without I/O rerouting (for example DS3/4/5K) must have both controllers connected from each site

(Diagram: the same two-site layout as before; at Site 3 a DS4700 with controllers A and B holds the active quorum behind Switches 5 and 6)

Minimum distance | Maximum distance | Maximum link speed
>= 0 km          | 10 km            | 8 Gbps
> 10 km          | 20 km            | 4 Gbps
> 20 km          | 40 km            | 2 Gbps


Split I/O Group without ISLs: Long distance configuration

SVC Buffer-to-Buffer credits:
2145-CF8 / CG8 nodes have 41 B2B credits - enough for 10 km at 8 Gb/sec with a 2 KB payload
All earlier models use 1/2/4 Gb/sec Fibre Channel adapters and have 8 B2B credits, which is enough for 4 km at 4 Gb/sec
Recommendation 1: Use CF8 / CG8 nodes for more than 4 km distance for best performance
Recommendation 2: SAN switches do not auto-negotiate B2B credits and 8 B2B credits is the default setting, so change the B2B credits in the switch to 41 as well

Link speed | FC frame length | Required B2B credits for 10 km distance | Max distance with 8 B2B credits
1 Gb/sec   | 1 km            | 5                                       | 16 km
2 Gb/sec   | 0.5 km          | 10                                      | 8 km
4 Gb/sec   | 0.25 km         | 20                                      | 4 km
8 Gb/sec   | 0.125 km        | 40                                      | 2 km
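A small sketch of the rule of thumb behind this table (my own restatement, not an IBM sizing formula): the on-the-wire length of a full frame halves each time the link speed doubles, and one credit roughly covers one frame round trip.

```python
import math

# Frame lengths (2 KB payload) taken from the table above, keyed by link speed in Gb/s.
FRAME_LENGTH_KM = {1: 1.0, 2: 0.5, 4: 0.25, 8: 0.125}

def required_b2b_credits(distance_km, link_gbps):
    frame_km = FRAME_LENGTH_KM[link_gbps]
    # One credit per 2 * frame_length of distance (frame out + acknowledgement back).
    return math.ceil(distance_km / (2 * frame_km))

def max_distance_km(credits, link_gbps):
    return credits * 2 * FRAME_LENGTH_KM[link_gbps]

print(required_b2b_credits(10, 8))  # 40 -> within the 41 credits of a CF8/CG8 node
print(max_distance_km(8, 4))        # 4 km with the default of 8 credits
```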


Split I/O Group with ISLs between SVC nodes


Split I/O Group with ISLs between SVC nodes - support with SVC 6.3:
Supports Metro Mirror distances between nodes
Third site required for the quorum disk
ISLs with max. 1 hop can be used for:
Quorum traffic
SVC node-to-node communication
Requires a dedicated private SAN used only for inter-node traffic (which can be a Brocade virtual fabric or a Cisco VSAN)
Requires one ISL for each I/O Group between the private SANs at each site

Maximum distances:
100 km for live data mobility (150 km with distance extenders)
300 km for fail-over / recovery scenarios
SVC supports up to 80 ms latency, far greater than most application workloads would tolerate
The two sites can be connected using active or passive technologies such as CWDM / DWDM if desired

Supported infrastructure:
WDM equipment similar to Metro Mirror
Link requirements similar to Metro Mirror

(Diagram: Servers 1-4 across Site 1 and Site 2; each site has public and private SANs (Publ.SAN1/2, Priv.SAN1/2) linked by ISLs over WDM; SVC-01 at Site 1 and SVC-02 at Site 2, each with storage and a quorum candidate; Site 3 holds the active quorum on a controller pair (Ctl. A / Ctl. B) behind a switch)


Split I/O Group with ISLs between SVC nodes


Supported configuration:
Site 1 and Site 2 are connected via Fibre Channel connections
A third site is required for quorum disk placement
The quorum disk must be listed as Extended Quorum in the SVC Supported Hardware List
Two ports per SVC node attached to the public SANs
Two ports per SVC node attached to the private SANs
SVC Volume Mirroring between Site 1 and Site 2
Hosts and storage attached to the public SANs
3rd site quorum attached to the public SANs

Note 1: ISLs/trunks are dedicated to a Cisco VSAN to guarantee bandwidth, rather than being shared
Note 2: ISLs/trunks are dedicated to a Brocade logical switch to guarantee bandwidth, rather than being shared (i.e. ISLs are supported, LISLs and XISLs are not)
WDM devices: same link and device requirements as for Metro Mirror

Distances:
Support of up to 300 km (same recommendation as for Metro Mirror)
For a typical deployment of a Split I/O Group, only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used

(Diagram: Servers 1-4 across Site 1 and Site 2; public and private SANs at each site linked by ISLs over WDM; SVC-01 and SVC-02 with storage and quorum candidates; Site 3 holds the active quorum (Ctl. A / Ctl. B behind a switch))


Long distance with ISLs between SVC nodes


Technically, SVC supports distances up to 8000 km; SVC will tolerate a round trip delay of up to 80 ms between nodes
In practice, applications are not designed to support a write I/O latency of 80 ms
Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips' worth of latency for SCSI write commands
These devices are already supported for use with SVC
No benefit or impact to inter-node communication
Does benefit host to remote SVC I/Os
Does benefit SVC to remote storage controller I/Os
Consequences:
Metro Mirror is deployed for shorter distances (up to 300 km)
Global Mirror is used for longer distances
The supported Split I/O Group distance will depend on application latency restrictions:
100 km for live data mobility (150 km with distance extenders)
300 km for fail-over / recovery scenarios
SVC supports up to 80 ms latency, far greater than most application workloads would tolerate


Split I/O Group Configuration: Examples


Example 1) Configuration with live data mobility:
VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 12 km
-> SVC 6.3: configurations with or without ISLs are supported
-> SVC 6.2: only the configuration without ISLs is supported

Example 2) Configuration with live data mobility:
VMware ESX with VMotion or AIX with Live Partition Mobility
Distance between sites: 70 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported

Example 3) Configuration without live data mobility:
VMware ESX with SRM, AIX HACMP, or MS Cluster
Distance between sites: 180 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported, or
-> A Metro Mirror configuration; because of the long distance, only in an active/passive configuration

(Diagram: the same two-site Split I/O Group layout with public/private SANs over WDM and the active quorum at Site 3)
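The decision logic in these three examples can be summarized in a small sketch (my own simplification of the rules in this deck; the thresholds used are the 40 km no-ISL limit, the 100/150 km live-mobility limits, and the 300 km maximum):

```python
# Simplified restatement of the configuration guidance in the examples above.
def split_io_group_options(distance_km, live_data_mobility, distance_extenders=False):
    options = []
    if distance_km <= 40:
        options.append("Split I/O Group without ISLs "
                       "(passive WDM with SVC 6.2, active WDM added in 6.3)")
    mobility_limit = 150 if distance_extenders else 100
    if distance_km <= 300:
        if not live_data_mobility or distance_km <= mobility_limit:
            options.append("SVC 6.3 Split I/O Group with ISLs")
        if not live_data_mobility:
            options.append("Metro Mirror configuration (active/passive)")
    if not options:
        options.append("Global Mirror (beyond the Metro Mirror recommendation)")
    return options

print(split_io_group_options(12, live_data_mobility=True))    # Example 1
print(split_io_group_options(70, live_data_mobility=True))    # Example 2
print(split_io_group_options(180, live_data_mobility=False))  # Example 3
```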


Split I/O Group - Disaster Recovery

Split I/O groups provide distributed HA functionality


Usage of Metro Mirror / Global Mirror is recommended for disaster protection
Both major Split I/O Group sites must be connected to the MM/GM infrastructure
Without ISLs between SVC nodes: all SVC ports can be used for MM/GM connectivity
With ISLs between SVC nodes: only MM/GM connectivity to the public SAN network is supported
Only 2 FC ports per SVC node will be available for MM or GM, and they will also be used for host-to-SVC and SVC-to-disk-system I/O
This is going to limit the capabilities of the overall system, in my opinion


Summary
SVC Split I/O Group:
Is a very powerful solution for automatic and fast handling of storage failures
Transparent for servers
Perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)
Transparent for all OS-based clusters
Distances up to 300 km (SVC 6.3) are supported

Two possible scenarios:
Without ISLs between SVC nodes (classic SVC Split I/O Group): up to 40 km distance, with support for active (SVC 6.3) and passive (SVC 6.2) WDM
With ISLs between SVC nodes: up to 100 km distance for live data mobility (150 km with distance extenders), up to 300 km for fail-over / recovery scenarios

The long distance performance impact can be optimized by:
Load distribution across both sites
Appropriate SAN Buffer-to-Buffer credits

Q&A