Agenda
Quick Basics of Virtualization
Storage Network
Volumes
[Diagram: cluster of SVC nodes grouped into I/O Groups A and B, each a pair of nodes in a control enclosure, with Managed Disks presented to the cluster]
Cluster: max 4 I/O Groups, built from 4 Storwize V7000 control enclosures or 8 SVC nodes
Managed Disks (MDisks): internally or externally provided; max 4096 MDisks per system
Storage Pools: max 128 storage pools; max 128 MDisks per pool
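As a quick sanity check, the limits above can be encoded directly; a minimal sketch (the constants mirror this slide, the function is illustrative and not a product API):

```python
# Illustrative sanity check of the clustered-system limits quoted above.
# The constants mirror this slide; the function is not a product API.
MAX_IO_GROUPS = 4            # 4 Storwize V7000 control enclosures or 8 SVC nodes
MAX_MDISKS_PER_SYSTEM = 4096
MAX_STORAGE_POOLS = 128
MAX_MDISKS_PER_POOL = 128

def check_limits(io_groups, pool_mdisk_counts):
    """Return a list of limit violations for a proposed configuration."""
    problems = []
    if io_groups > MAX_IO_GROUPS:
        problems.append("too many I/O Groups")
    if len(pool_mdisk_counts) > MAX_STORAGE_POOLS:
        problems.append("too many storage pools")
    if sum(pool_mdisk_counts) > MAX_MDISKS_PER_SYSTEM:
        problems.append("too many MDisks in the system")
    problems += ["pool %d exceeds 128 MDisks" % i
                 for i, n in enumerate(pool_mdisk_counts) if n > MAX_MDISKS_PER_POOL]
    return problems

print(check_limits(4, [128, 128, 64]))   # [] -> within limits
print(check_limits(5, [200]))            # two violations
```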
[Diagram: MDisk groups MDG1 and MDG3 / storage pools Pool 1, Pool 2, Pool 3]
Cluster
An I/O Group is a control enclosure and its associated SAS-attached expansion enclosures. A clustered system can consist of 2-4 I/O Groups.
SCORE approval for > 2
Expand
No interconnection of SAS chains between control enclosures: control enclosures communicate via FC and must use all 8 FC ports on the enclosures.
Expand
Scale Storage
Add up to 4x the capacity
Add up to 4x the throughput
Cluster
Non-disruptive upgrades
From smallest to largest configurations
Purchase hardware only when you need it
An I/O Group is a control enclosure and its associated SAS-connected expansion enclosures
NOTE: A Storwize V7000 Clustered System with greater than two I/O Groups/Frames requires SCORE/RPQ approval
Virtualize storage arrays behind Storwize V7000 for even greater capacity and throughput
Control Enclosure
Storwize V7000 Unified can scale disk capacity by adding up to nine expansion enclosures to the standard control enclosure. Virtualize external storage arrays behind Storwize V7000 Unified for even greater capacity.
CIFS not supported currently with externally virtualized storage
Expand
Expansion Enclosures
CANNOT horizontally scale out by adding additional Unified systems, or even just another Storwize V7000 control enclosure and associated expansion enclosures, at this time
If a customer has a clustered Storwize V7000 system today, they will not be able to upgrade to a Unified system in 2012 when the MES is available
Clustered system GA support is for up to 480 SFF disk drives, 240 LFF disk drives, or a mix thereof
Up to 480 TB raw capacity in one 42U rack. Enables Storwize V7000 to compete effectively against larger EMC, NetApp, and HP systems.
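The raw-capacity figure is simple arithmetic; a short worked check, assuming 24-drive 2U SFF enclosures and 1 TB per SFF drive (which is what the 480 TB claim implies):

```python
# Back-of-envelope raw capacity for a fully clustered system (illustrative
# assumptions: 24 SFF drives per 2U enclosure, 1 TB per SFF drive).
sff_drives = 480
drives_per_enclosure = 24
rack_units_per_enclosure = 2
tb_per_drive = 1

enclosures = sff_drives // drives_per_enclosure       # 20 enclosures
rack_units = enclosures * rack_units_per_enclosure    # 40U -> fits in a 42U rack
raw_tb = sff_drives * tb_per_drive                    # 480 TB raw

print(enclosures, rack_units, raw_tb)                 # 20 40 480
```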
Both node canisters in a given control enclosure are part of the same I/O Group
Cannot create an I/O Group with one node from each of 2 different control enclosures. Adding one node in a control enclosure to an I/O Group will automatically add the other. The Storwize V7000 clustered system does not support split I/O Group configurations (also known as stretch cluster).
Only 1 control enclosure can appear on a given SAS chain. Only 1 node canister can appear on a single strand of a SAS chain.
Key to realize: there is no access by one control enclosure (I/O Group) to the SAS-attached expansion enclosures of another control enclosure (I/O Group) other than via the SAN
If a pool owns the exact same number of MDisks from each I/O Group, then its volumes will be owned by IOG-0
Expansion enclosures only communicate with their owning control enclosure, meaning that if host I/Os come into IOG-0 but the data is on MDisks behind IOG-1, the I/O is forwarded to IOG-1 over FC
Similar process to SVC accessing external storage systems
Does not go through cache on the owning I/O Group but directly to the MDisk
Uses the very lowest layer of the I/O stack to minimize any additional latency
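A minimal sketch of the forwarding behaviour just described, using hypothetical names rather than product code: the volume's owning I/O Group reaches an MDisk either over its local SAS chain or, if the MDisk sits behind the other control enclosure, by forwarding the back-end I/O over the FC fabric without passing through that enclosure's cache.

```python
# Illustrative model of intra-cluster back-end I/O routing (not product code).
# Each MDisk is SAS-attached behind exactly one control enclosure (I/O Group);
# each volume is serviced and cached by exactly one owning I/O Group.

MDISK_BEHIND = {"mdisk0": "IOG-0", "mdisk1": "IOG-1"}   # hypothetical layout

def backend_path(volume_owner, mdisk):
    """How the volume-owning I/O Group reaches the MDisk holding the data."""
    mdisk_owner = MDISK_BEHIND[mdisk]
    if mdisk_owner == volume_owner:
        return "local SAS chain"
    # Forwarded over the SAN at the lowest layer of the I/O stack,
    # bypassing the cache of the I/O Group that owns the MDisk.
    return "forwarded over FC to %s, straight to the MDisk" % mdisk_owner

print(backend_path("IOG-0", "mdisk0"))   # local SAS chain
print(backend_path("IOG-0", "mdisk1"))   # forwarded over FC to IOG-1, ...
```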
[Diagram (all cabling shown is logical): Control Enclosure #1 with its Expansion Enclosures and MDisks; I/O Group #1 and I/O Group #2]
Expansion enclosures are connected through one control enclosure and can be part of only one I/O Group. All MDisks are part of only one I/O Group.
Storage pools can contain MDisks from more than one I/O Group. Inter-control enclosure communication happens over the SAN. A volume is serviced by only one I/O Group.
Storwize V7000: One I/O Group Single Frame System to a 2-4 I/O Group Clustered System
An I/O Group is a control enclosure and its associated SAS attached expansion enclosures
[Diagram: four Control Enclosures, each with its own Expansion Enclosures, combined into one clustered system]
NOTE: A Storwize V7000 Clustered System with greater than 2 I/O Groups/Frames requires SCORE/RPQ approval
Host
A High Availability clustered system similar to an SVC Split I/O Group configuration is not possible, since we cannot split a control enclosure in half and install it at two different sites
One I/O Group will be at each site, unlike SVC, where each node in an I/O Group can be installed at a different site
Expansion Enclosures
So if you lose a site, you lose access to all volumes owned by that I/O Group. There is no automatic failover of a volume from one I/O Group to another.
Control Enclosure
Expansion Enclosures
Control Enclosure
Expansion Enclosures
Volume mirroring does allow a single host volume to have pointers to two sets of data, which can be on different I/O Groups in a clustered system; but again, if you lose a site you lose the entire I/O Group, so any volumes owned by that I/O Group will be offline
You can migrate volume ownership from the failed IOG to the other IOG, but data may be lost: unwritten data still in cache on the offline IOG is discarded in the process of migration, or could have been lost if the IOG failed hard without saving cached data
Production Site A
Production Site B
Can start very small and grow to a very large storage system with a single management interface
Helps to compete with larger midrange systems from other vendors
Can virtualize external storage too, providing the same virtualization features across the entire Clustered System
Just like an SVC cluster, so desirable for the same reasons large SVC clusters are
However, there is nothing wrong with going with 1-4 separate systems versus a Clustered System if the customer prefers
System management isn't that hard anyway. If the customer will lose sleep over a possible complete failure of a control enclosure, no matter how unlikely that is, then go with separate systems.
Q&A
Agenda
Terminology
SVC Split I/O Group Review
Long distance refresh: WDM devices, Buffer-to-Buffer credits
SVC Quorum disk
Split I/O Group without ISLs between SVC nodes: supported configurations, SAN configuration for long distance
Split I/O Group with ISLs between SVC nodes: supported configurations, SAN configuration for long distance
Terminology
SVC Split I/O Group = SVC Stretched Cluster = SVC Split Cluster
Two independent SVC nodes in two independent sites + one independent site for quorum. Acts just like a single I/O Group with distributed high availability.
Site 1 Site 2
I/O Group 1
I/O Group 1
Distributed I/O Groups: NOT an HA configuration and not recommended. If one site fails: manual volume move required; some data still in cache of the offline I/O Group.
Site 1
I/O Group 1 I/O Group 1 I/O Group 2
Site 2
I/O Group 2
Storwize V7000 Split I/O Group is not an option: a single enclosure includes both nodes, so physical distribution across two sites is not possible.
Site 1 Site 2
[Diagram: SVC Node 1 and Node 2 connected by ISL 1 and ISL 2, with volume mirroring and SVC quorum disks 1, 2, and 3]
Loss of Failure Domain 2: active quorum lost, half of the nodes lost, loss of cluster majority
[Diagram: Node 1 and Node 2 after loss of Failure Domain 2; volume mirroring shown; no active quorum]
Node 1 cannot utilize a quorum candidate to recover and survive. Node 1 shuts down and the cluster is stopped. May not be recoverable and may require a cluster rebuild and data restore from backups.
Failure Domain 3
Automated failover, with SVC handling the loss of:
[Diagram: Node 1 and Node 2 with SVC quorum disks 1 and 3; the active quorum is reassigned to a surviving quorum candidate]
Cluster Status
Operational, optimal
Operational, write cache disabled
Operational, write cache disabled
Operational, write cache enabled, but with a different active quorum disk; whichever node accesses the active quorum disk first survives and the partner node goes offline
Stopped
Stopped
Operational
Failed at the same time as Site 2 / Failed at the same time as Site 1
Enhancements designed to help us compete more effectively with EMC VPLEX at longer distances. Note that this information is all very new, even to ATS, and some requirements could change prior to GA. Highly recommend engaging ATS for a solution design review.
w3.ibm.com/support/techxpress
Storwize V7000 does not provide any sort of split I/O group, split cluster, stretch cluster HA configurations
A clustered Storwize V7000 provides the ability to grow system capacity and scale performance within a localized single system image
SAN
SVC + UPS
SAN
Two ports per SVC node attached to local SANs
Two ports per SVC node attached to remote SANs via DWDM
Hosts and storage attached to SANs via ISLs sufficient for workload
3rd site quorum (not shown) attached to SANs
[Diagram: public and private SANs at each site, with SVC + UPS at both sites]
Two ports per SVC node attached to public SANs
Two ports per SVC node attached to private SANs
Hosts and storage attached to public SANs
3rd site quorum (not shown) attached to public SANs
[Diagram: public and private VSANs at each site, with SVC + UPS at both sites]
Note: ISLs/trunks for the private VSANs are dedicated rather than shared, to guarantee dedicated bandwidth is available for node-to-node traffic
[Diagram: public and private SANs at each site, with SVC + UPS at both sites]
Note: ISLs/trunks for the private SANs are dedicated rather than shared, to guarantee dedicated bandwidth is available for node-to-node traffic
The new Split I/O Group configurations will support distances of up to 300 km (the same recommendation as for Metro Mirror). However, for the typical deployment of a split I/O group, only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used. The following charts explain why.
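A rough feel for the numbers behind that recommendation (a sketch assuming roughly 5 microseconds of fibre propagation delay per km in each direction, ignoring equipment latency):

```python
# Back-of-envelope added write latency vs. distance (illustrative only).
# Light in fibre travels roughly 5 microseconds per km in each direction.
US_PER_KM_ONE_WAY = 5.0

def added_latency_ms(distance_km, round_trips):
    """Extra latency from link propagation for a given number of round trips."""
    return 2 * distance_km * US_PER_KM_ONE_WAY * round_trips / 1000.0

# 1 round trip: SVC node-to-node messaging (Metro Mirror / cache mirroring)
# 2 round trips: a plain SCSI write to a remote node (xfer-ready + data)
# 3 round trips: a remote host write plus the cache-mirroring hop
for km in (100, 150, 300):
    print(km, "km:", [added_latency_ms(km, rt) for rt in (1, 2, 3)], "ms")
```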
Metro Mirror
The same code is used for all inter-node communication: Global Mirror, Metro Mirror, Cache Mirroring, Clustering. SVC's proprietary SCSI protocol only has 1 round trip.
In practice, applications are not designed to support a write I/O latency of 80 ms
Hence Metro Mirror is deployed for shorter distances (up to 300km) and Global Mirror is used for longer distances
Server Cluster 1
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 6) Write completed to host
Server Cluster 2
1 round trip
4) Metro Mirror Data transfer to remote site 5) Acknowledgment
SVC Cluster 1
7a) Write request from SVC 8a) Xfer ready to SVC 9a) Data transfer from SVC 10a) Write completed to SVC
SVC Cluster 2
7b) Write request from SVC 8b) Xfer ready to SVC 9b) Data transfer from SVC 10b) Write completed to SVC
Data center 1
Data center 2
Split I/O Group splits the nodes in an I/O group across two sites
SVC will tolerate a round trip delay of up to 80 ms. Cache Mirroring traffic, rather than Metro Mirror traffic, is sent across the inter-site link.
Data is written by the 'preferred' node to both the local and remote storage. The SCSI Write protocol results in 2 round trips. This latency is generally hidden from the application by the write cache.
Server Cluster 1
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 6) Write completed to host
Server Cluster 2
1 round trip
4) Cache Mirror Data transfer to remote site 5) Acknowledgment
Node 1
Node 2
7b) Write request from SVC 8b) Xfer ready to SVC 9b) Data transfer from SVC 10b) Write completed to SVC
2 round trips but SVC write cache hides this latency from the host
Data center 1
Data center 2
Steps 1 to 6 affect application latency; steps 7 to 10 should not affect the application
Split I/O Group is also often used to move workload between servers at different sites. VMotion or equivalent can be used to move applications between servers. Applications no longer necessarily issue I/O requests to the local SVC nodes. SCSI Write commands from hosts to remote SVC nodes result in an additional 2 round trips worth of latency that is visible to the application.
Server Cluster 1
Server Cluster 2
2 round trips
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 6) Write completed to host
1 round trip
4) Cache Mirror Data transfer to remote site 5) Acknowledgment
Node 1
Node 2
7b) Write request from SVC 8b) Xfer ready to SVC 9b) Data transfer from SVC 10b) Write completed to SVC
2 round trips but SVC write cache hides this latency from the host
Data center 1
Data center 2
Steps 1 to 6 affect application latency; steps 7 to 10 should not affect the application
Some switches and distance extenders use extra buffers and proprietary protocols to eliminate one of the round trips worth of latency for SCSI Write commands
These devices are already supported for use with SVC. No benefit or impact to inter-node communication. Does benefit host to remote SVC I/Os. Does benefit SVC to remote storage controller I/Os.
Server Cluster 1
Server Cluster 2
1) Write request from host 2) Xfer ready to host 3) Data transfer from host 12) Write completed to host
1 round trip
11) Write completion to remote site
1 round trip
8) Cache Mirror Data transfer to remote site 9) Acknowledgment
Node 1
Node 2
17) Write request to storage 18) Xfer ready from storage 19) Data transfer to storage 20) Write completed from storage
Data center 1
Data center 2
Steps 1 to 12 affect application latency; steps 13 to 22 should not affect the application
Active WDM
Power required
Can use CWDM or DWDM technology
Changes incoming/outgoing wavelengths
Adds negligible latency because of the signal change
Consolidates multiple wavelengths in one cable
No dedicated link required
Customers can rent some frequencies
High equipment cost
Longer distances supported
[Diagram: active WDM (FSP 3000) with transponders (TXP) and TDM aggregation of 2G/4G/8G/10G client links onto wavelengths of up to 100G]
FSP 3000: higher capacity (more channels per fiber), higher aggregate bandwidth (up to 100G per wavelength), higher distance (up to 200 km without mid-span amplifier)
More secure (automated fail over, NMS, optical monitoring tools, embedded encryption)
Buffer-to-Buffer (B2B) credits are used as a flow control method by Fibre Channel technology and represent the number of frames a port can store
Sufficient credits provide best performance
Light must cover the distance 2 times: submit data from Node 1 to Node 2, then submit the acknowledgement from Node 2 back to Node 1
The B2B calculation depends on link speed and distance (see the sketch below)
The number of frames in flight increases in line with the link speed
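A common rule of thumb for sizing credits with full-size (about 2 KB) frames is roughly one credit per 2 km of link per Gbit/s of speed; a sketch of that calculation (illustrative only, not a vendor formula, though it matches the node figures quoted later):

```python
# Rough B2B credit sizing for full-size (~2 KB) FC frames: enough frames must
# be in flight to cover the round-trip propagation delay of the link.
# Rule of thumb: about 1 credit per 2 km of link per Gbit/s (illustrative).
import math

def b2b_credits_needed(distance_km, link_gbps):
    return math.ceil(distance_km * link_gbps / 2.0)

print(b2b_credits_needed(10, 8))   # 40 -> the 41 credits on CF8/CG8 nodes suffice
print(b2b_credits_needed(4, 4))    # 8  -> matches earlier nodes' 8 credits at 4 km
print(b2b_credits_needed(40, 2))   # 40 -> why 40 km is paired with 2 Gb/s links
```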
SVC 5.1 and later: SVC is able to handle quorum disk management in a very flexible way, but in a Split I/O Group configuration a well-defined setup is required. -> Disable the dynamic quorum feature using the override flag for V6.2 and later
svctask chquorum -MDisk <mdisk_id or name> -override yes
This flag is currently not configurable in the GUI
Split brain situation: SVC uses the quorum disk to decide which SVC node(s) should survive. No access to the active quorum disk: in a standard situation (no split brain), SVC will select one of the other quorum candidates as the active quorum; in a split brain situation, SVC may take mirrored volumes offline.
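A minimal sketch of the tie-break just described (illustrative logic only, not the actual SVC quorum protocol): whichever node reaches the active quorum first carries on, while a node that cannot reach any quorum stops.

```python
# Illustrative tie-break when the two halves of a split I/O Group lose sight
# of each other (not the actual SVC quorum protocol).

def resolve_split_brain(can_reach_quorum, first_to_reach):
    """can_reach_quorum: {node: bool}; first_to_reach: node name or None."""
    outcome = {}
    for node, reachable in can_reach_quorum.items():
        if reachable and node == first_to_reach:
            outcome[node] = "survives (won the race to the active quorum)"
        elif reachable:
            outcome[node] = "stops (lost the race to the active quorum)"
        else:
            outcome[node] = "stops (no access to any active quorum)"
    return outcome

# Inter-site link fails; both nodes still see Site 3 and node1 gets there first.
print(resolve_split_brain({"node1": True, "node2": True}, "node1"))
# The active quorum is lost as well: neither node can win the race.
print(resolve_split_brain({"node1": False, "node2": False}, None))
```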
SVC 6.3: similar to the support statement in SVC 6.2. Additional: support for active WDM devices. Quorum disk requirement similar to Remote Copy (MM/GM) requirements: max. 80 ms round trip delay time, 40 ms each direction. FCIP connectivity supported. No support for iSCSI storage systems.
Link speed   Minimum distance   Maximum distance
8 Gbps       >= 0 km            10 km
4 Gbps       > 10 km            20 km
2 Gbps       > 20 km            40 km
[Diagram: Split I/O Group without ISLs: SVC node 1 with Switches 1 and 2 at Site 1, SVC node 2 with Switches 3 and 4 and Server 2 at Site 2, storage at both sites, active quorum at Site 3]
[Diagram: Site 1 (Server 1, Switch 1, Switch 2) and Site 2 (Server 2, Switch 3, Switch 4, SVC node 2), connected by server ISLs]
Active/Passive WDM devices can be used to reduce the number of required FC links between both sites. Distance extension to max. 40 km with WDM devices.
[Diagram continued: SVC node 1, Storage 2 and Storage 3, Switch 5 and Switch 6, active quorum at Site 3]
Maximum distance: 10 km / 20 km / 40 km, depending on link speed
[Diagram: Site 1 (Server 1, Switch 1, Switch 2) and Site 2 (Server 2, Switch 3, Switch 4), connected by server ISLs]
Active/Passive WDM devices can be used to reduce the number of required FC links between both sites. Distance extension to max. 40 km with WDM devices.
SVC node 1
SVC node 2
Quorum devices with an active/passive controller without I/O rerouting (for example DS3/4/5K) must be connected to both controllers from each site
[Diagram continued: Storage 2 and Storage 3, Switch 5 and Switch 6]
Maximum distance: 10 km / 20 km / 40 km, depending on link speed
[Diagram continued: DS4700 at Site 3 with controllers A and B holding the active quorum; links at 8, 4, or 2 Gbps]
SVC Buffer-to-Buffer credits:
2145-CF8 / CG8 nodes have 41 B2B credits, enough for 10 km at 8 Gb/s with a 2 KB payload
All earlier models use 1/2/4 Gb/s Fibre Channel adapters and have 8 B2B credits, which is enough for 4 km at 4 Gb/s
Recommendation 1: use CF8 / CG8 nodes for more than 4 km distance for best performance
Recommendation 2: SAN switches do not auto-negotiate B2B credits, and 8 B2B credits is the default setting, so change the B2B credits in the switch to 41 as well
Split I/O Group with ISLs between SVC nodes, supported with SVC 6.3:
Supports Metro Mirror distances between nodes
Third site required for the quorum disk
ISLs with max. 1 hop can be used for quorum traffic and SVC node-to-node communication
Requires a dedicated private SAN used only for inter-node traffic (which can be a Brocade virtual fabric or a Cisco VSAN)
Requires one ISL for each I/O Group between the private SANs at each site
[Diagram: public and private SANs at Site 1 and Site 2 (with Server 4), linked over WDM ISLs; SVC-01 and SVC-02; storage at each site acting as quorum candidates; Site 3 holds the third quorum]
Maximum distances: 100 km for live data mobility (150 km with distance extenders); 300 km for fail-over / recovery scenarios
SVC supports up to 80 ms latency, far greater than most application workloads would tolerate
The two sites can be connected using active or passive technologies such as CWDM / DWDM if desired
Supported infrastructure: WDM equipment similar to Metro Mirror; link requirements similar to Metro Mirror
[Diagram: Site 3 switches and a storage controller (Ctl. A / Ctl. B) holding the active quorum]
[Diagram: the same Split I/O Group with ISLs configuration as on the previous slide]
Distances: support of up to 300 km (the same recommendation as for Metro Mirror). For a typical deployment of Split I/O Group, only 1/2 or 1/3 of this distance is recommended, because there will be 2 or 3 times as much latency depending on what distance extension technology is used.
Example 1) Configuration with live data mobility: VMware ESX with VMotion or AIX with live partition mobility
Distance between sites: 12 km
-> SVC 6.3: configurations with or without ISLs are supported
-> SVC 6.2: only the configuration without ISLs is supported
Example 2)
Configuration with live data mobility: VMware ESX with VMotion or AIX with live partition mobility
Distance between sites: 70 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported
Example 3)
Configuration without live data mobility: VMware ESX with SRM, AIX HACMP, or MS Cluster
Distance between sites: 180 km
-> Only SVC 6.3 Split I/O Group with ISLs is supported, or
-> Metro Mirror configuration (because of the long distance: only in an active / passive configuration)
Summary
SVC Split I/O Group:
Is a very powerful solution for automatic and fast handling of storage failures
Transparent for servers
A perfect fit in a virtualized environment (like VMware VMotion, AIX Live Partition Mobility)
Transparent for all OS-based clusters
Distances of up to 300 km (SVC 6.3) are supported
Two possible scenarios:
Without ISLs between SVC nodes (classic SVC Split I/O Group): up to 40 km distance, with support for active (SVC 6.3) and passive (SVC 6.2) WDM
With ISLs between SVC nodes: up to 100 km distance for live data mobility (150 km with distance extenders), up to 300 km for fail-over / recovery scenarios
Long distance performance impact can be optimized by: load distribution across both sites; appropriate SAN Buffer-to-Buffer credits
Q&A