IBM Virtualization Engine TS7700 Series Best Practices
Understanding, Monitoring and Tuning the TS7700 Performance
Version 1.5
Jim Fisher, fisherja@us.ibm.com, IBM Advanced Technical Support - Americas
Carl Bauske, cabauske@us.ibm.com, IBM Advanced Technical Support - Americas
Introduction
The IBM Virtualization Engine TS7700 Series is the latest in the line of tape virtualization products that have revolutionized the way mainframe customers utilize their tape resources. Tape virtualization subsystems have become an essential part of most mainframe customers' operations. Massive amounts of key customer data are placed under the control of the subsystem. The IBM TS7700 Virtualization Engine, with its virtual tape drives, disk cache and integrated hierarchical storage management, is designed to perform its tasks with no customer involvement once it has been configured. The TS7700 has a set of parameters used to regulate the performance of the subsystem; the configuration and parameters can be altered by each customer, effectively customizing the performance of each TS7700 subsystem. Performance varies between stand-alone, 2, 3, 4, 5, and 6-cluster configurations. Make sure that this is factored into any planned change of your configuration. This document will help you understand the inner workings of the TS7700 so that you can make educated adjustments to the subsystem to achieve peak performance. It starts by describing the flow of data through the subsystem. Next, the various throttles used to regulate the subsystem are described. Performance monitoring is then discussed, finishing with how and when to tune the TS7700.
Version 1.5, August 2012:
Fixed typo in section 2.3; changed Management Class to Storage Class relative to PG0 and PG1
Major updates to reflect the Power7 engine (VEB and VEA) and Release R2.0 and beyond
Updated to include Synchronous Copy mode
[Figure: TS7700 data flow - host reads and writes are compressed/decompressed at the HBA; the disk cache feeds pre-migrate to tape, recall from tape, copies to/from other clusters over the grid, remote writes and reads, and Copy Export]
Recall (TS7740 only)
A recall transfers a logical volume from a physical tape volume into the cache to satisfy a mount.
Reclaim (TS7740 only)
Reclaim involves transferring logical volumes from one physical tape to another. This data passes through the CPU's memory; however, it does not pass through the disk cache. Reclaim is controlled by the Reclaim Threshold, the Inhibit Reclaim Schedule, the maximum number of reclaim tasks (set using the host console request RCLMMAX), and the number of available back-end drives.
Management Interface
The Management Interface (MI) is a task that consumes CPU power. The MI is used to configure, operate and monitor the TS7700.
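The reclaim controls named above can be thought of as a gate on starting another reclaim task. The sketch below is illustrative only (Python, with hypothetical function and parameter names); the actual microcode logic has more nuances than this:

```python
def reclaim_may_start(in_inhibit_window, active_reclaim_tasks,
                      rclmmax, idle_backend_drives):
    """Gate the start of one more reclaim task (illustrative sketch)."""
    # Honor the Inhibit Reclaim Schedule first.
    if in_inhibit_window:
        return False
    # Stay under the maximum reclaim task count (host console RCLMMAX).
    if active_reclaim_tasks >= rclmmax:
        return False
    # Reclaim moves data tape-to-tape, so it needs both a source drive
    # and a target drive to be free.
    return idle_backend_drives >= 2
```

For example, with the inhibit window closed, two of four allowed reclaim tasks running, and three idle back-end drives, a new reclaim could start; with only one idle drive it could not.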
There isn't a similar copy priority scheme for copies to other clusters that originate in a TS7720; copies from a TS7720 do not use a priority scheme based on the Storage Class construct.
2.4 Data Movement through Cache
This chapter describes the movement of data through the TS7700 cache in various configurations. This is to help you understand the various pieces of data movement that have to share the resources of the TS7700. The discussion does not include TS7740 reclaim activity since the data transferred from one tape drive to another does not pass through the cache.
For the TS7740, we add back-end tape drives for recalls and pre-migrates. If a read is requested and the logical volume does not exist in the cache, a stacked physical tape is mounted and the logical volume is read into the cache. The host then reads the logical volume from the TS7740 cache. Host data is written from the cache to the physical stacked volumes in a process called pre-migration.
[Figure: Stand-alone TS7740 data flow - disk cache with pre-migrate and recall]
[Figure: Multi-cluster data flow - disk cache and grid]
For the TS7740, we add back-end tape drives for recalls and pre-migrates. A write, with or without a copy to another cluster, includes the pre-migrate process. A read with a local cache miss will result in one of the following:
o A remote mount without recall
o A recall into the local cache from a local stacked volume
o A remote mount requiring a recall from a remote stacked volume
Host data is written from the cache to the physical stacked volumes in a process called pre-migration. This includes data written as a result of a local mount, volumes copied from other clusters, and a remote mount to this cluster for write (not shown).
[Figure: Multi-cluster TS7740 data flow - host I/O compressed/decompressed at the HBA, disk cache, grid, pre-migrate, recall]
[Figure: Two-cluster TS7720 data flow - disk cache with copy to cluster 1 over the grid]
[Figure: Two-cluster TS7740 data flow - disk cache, copy to cluster 1, pre-migrate, recall]
[Figure: Multi-cluster TS7720 data flow - copies to/from clusters 1 and 2, remote read from cluster 1]
For the TS7740, we add back-end tape drives for recalls and pre-migrates. A write with no copy and a write with copy both include the pre-migrate process. A read with a local cache miss will result in one of the following:
o A remote mount without recall
o A recall into the local cache from a local stacked volume
o A remote mount requiring a recall from a remote stacked volume
Host data is written from the cache to the physical stacked volumes in a process called pre-migration. This includes data written as a result of a local mount for write, volumes copied from other clusters, and a mount for write from the HA cluster (not shown).
[Figure: Multi-cluster TS7740 data flow - copies to/from clusters 1 and 2, remote read from cluster 1, pre-migrate, recall]
[Figure: Four-cluster grid - HA production site and DR Site 2, TS7700 clusters connected by WAN, copy mode NDND]
With Device Allocation Assist, the host will allocate a virtual device for a private mount on the best cluster. The best cluster is typically the cluster that contains the logical volume in its cache.
Figure 11 - Hybrid Grid - Three TS7720 Production Clusters, One TS7740 DR Cluster
[Figure: Cluster families - within-family copies in Family A and Family B, DR site across the WAN]
Cooperative replication adds another layer of consistency. A family is considered consistent when just one member of the family has a copy of a volume. Since only one copy is required to be transferred to the family, the family is consistent after that one copy is complete. Since a family member will prefer to get its copy from another family member instead of pulling the volume across the long grid link, the copy time is typically much shorter for the family member. Since each family member pulls a copy of a different volume, a consistent copy of all volumes arrives at the family more quickly. With cooperative replication, a family prefers retrieving a new volume that the family doesn't yet have a copy of over copying a volume within the family. When there are fewer than 20 new copies to be made from other families, the family clusters will copy amongst themselves. This means second copies of volumes within a family are deferred in preference to new volume copies into the family. When a copy within a family has been queued for 12 hours or more, it is given equal priority with copies from other families. This prevents family copies from stagnating in the copy queue. Without families, a source cluster attempts to keep the volume in its cache until all clusters needing a copy have gotten their copy. With families, a cluster's responsibility to keep the volume in cache is released once all families needing a copy have it. This allows PG0 volumes in the source cluster to be removed from cache sooner. Refer to the IBM Virtualization Engine TS7700 Series Best Practices - Hybrid Grid white paper on Techdocs for more details concerning Cluster Families.
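As a rough illustration of the copy-scheduling preferences just described, the sketch below (Python) picks which queue a family member services next. The 20-copy and 12-hour values come from the text; the function and parameter names are illustrative, and treating an aged intra-family copy as serviced next is a simplification of "equal priority":

```python
def next_copy_to_service(inter_family_pending, intra_family_pending,
                         oldest_intra_family_hours):
    """Return 'inter-family', 'intra-family', or None (illustrative sketch)."""
    # Intra-family copies queued 12 hours or more regain equal priority;
    # modeled here, as a simplification, by servicing them next.
    if intra_family_pending and oldest_intra_family_hours >= 12:
        return "intra-family"
    # Prefer pulling new volumes into the family while 20 or more
    # inter-family copies are waiting.
    if inter_family_pending >= 20:
        return "inter-family"
    # Below that backlog, the family copies amongst itself.
    if intra_family_pending:
        return "intra-family"
    return "inter-family" if inter_family_pending else None
```

So with 50 new volumes waiting from other families and a fresh intra-family queue, the cluster pulls inter-family copies first; once the inter-family backlog drops below 20, it turns to second copies within the family.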
As of this writing, JES3 does not support Device Allocation Assist, so 50% of the time the host will allocate to the cluster that doesn't have a copy in its cache. Without Retain Copy Mode, three or four copies of a volume will exist in the grid after the dismount instead of the desired two copies. In the case where host allocation picks the cluster that doesn't have the volume in cache, one or two additional copies are created on clusters 1 and 3, since the CCPs indicate the copies should be made to clusters 1 and 3. For a read operation, four copies remain. For a write append, three copies are created. This is illustrated below.
Figure 14 - Four-Cluster Grid without Device Allocation Assist, Retain Copy Mode Disabled
With the Retain Copy Mode option set, the original CCPs of a volume are honored instead of applying the CCPs of the mounting cluster. A mount of a volume to a cluster that does not have a copy in its cache will result in a cross-cluster (remote) mount instead. The cross-cluster mount uses the cache of the cluster that contains the volume, and the CCPs of the original mount are used. In this case, the result is that clusters 0 and 2 will have the copies and clusters 1 and 3 will not. This is illustrated below.
Figure 15 - Four-Cluster Grid without Device Allocation Assist, Retain Copy Mode Enabled
Another example of the need for Retain Copy Mode is when one of the production clusters is not available. All allocations are made to the remaining production cluster. When the volume only exists in
Figure 16 - Four-Cluster Grid - One Production Cluster Down, Retain Copy Mode Disabled
The example below shows Retain Copy Mode enabled with one of the production clusters down. In the scenario where the cluster containing the volume to be mounted is down, the host will allocate to a device on the other cluster, in this case cluster 1. A cross-cluster mount using cluster 2's cache occurs. The original two copies remain. If the volume is appended to, it is changed on cluster 2 only. Cluster 0 will get a copy of the altered volume when it rejoins the grid.
Figure 17 - Four-Cluster Grid - One Production Cluster Down, Retain Copy Mode Enabled
Refer to the IBM Virtualization Engine TS7700 Series Best Practices - Hybrid Grid white paper on Techdocs for more details concerning Retain Copy Mode.
This section examines the variety of throttles used by the TS7700 to control the flow of data through the subsystem. The discussion describes the throttling types and how they are triggered. Throttling, in general, is used to encourage or enforce the priorities of the various tasks and functions running within the TS7700. The subsystem has a limited set of resources (CPU, cache bandwidth, cache size, channel bandwidth, grid network bandwidth, physical tape drives, and so forth) that are shared by all the tasks moving data. The TS7700 uses a variety of explicit throttling methods to give the higher priority tasks more of the resources. The resources themselves will implicitly throttle items such as host bandwidth when a resource is 100% utilized. The following is a list of the normally running tasks that move data:
Immediate copies
Recalls
Copy Export
Host I/O (this includes Sync Mode Copy writes)
Reclaims
Pre-migration
Deferred copies
There are special case tasks that can occur, based on the state of the subsystem, that will consume resources and be granted a higher priority. Here are some examples:
Panic Reclaim - The TS7740 detects that the number of empty physical volumes has dropped below the minimum value, and reclaims need to be done immediately to increase the count.
Cache Fills with Copy Data - To protect against un-copied volumes being removed from cache, the TS7740 throttles data coming into the cache.
Cache Overfills - If no more data can be placed into the cache before data is removed, then other tasks trying to add to the cache are heavily throttled.
3.2 What Causes Host Write and Copy Throttle to be turned on?
Full Cache - Cache is full of data that needs to be copied to another cluster.
o The amount of data to be copied to another cluster is > 95% of the cache size AND the TS7700 has been up more than 24 hours.
o This is reported as Write Throttle and Copy Throttle in VEHSTATS.
Immediate Copy - Immediate copies to other clusters, where this cluster is the source, are taking too long or are predicted to take too long.
o The TS7700 evaluates the need for this throttle every two minutes.
o The depth of the immediate copy queue is examined, as well as the amount of time copies have been in the queue, to determine if the throttle should be applied.
o Looking at the age of the oldest immediate copy in the queue: if the oldest is 10 to 30 minutes old, the throttle is set between 0.00166 seconds and 2 seconds, a linear ramp from 10 to 30 minutes. The maximum throttle (2 seconds) is applied immediately if an immediate copy has been in the queue for 30 minutes or longer.
o Looking at the quantity of data, calculate how long the transfer will take: if > 35 minutes, the throttle is set to the maximum (2 seconds). If 5 to 35 minutes, the throttle is set between 0.001111 seconds and 2 seconds, a linear ramp from 5 to 35 minutes.
o This is reported as Write Throttle in VEHSTATS.
o Note: the time required for a 4000 MB immediate copy is 5 times longer than for an 800 MB immediate copy.
o Host Write Throttle due to immediate copies taking too long can be turned off using the Host Console Request. Refer to Section 5.3.3 - Disabling Host Write Throttle due to Immediate Copy on page 47 for more details.
Pre-Migrate - The amount of data to be pre-migrated is above a threshold (default 2000 GB).
o This is reported as Write Throttle and Copy Throttle in VEHSTATS.
o These throttle values will be equal if Pre-Migrate is the sole reason for throttling.
Free Space - Invoked when the cache is nearly full of any data.
o Used to make sure there is enough cache to handle the currently mounted volumes.
o This is reported as Write Throttle in VEHSTATS.
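The two immediate-copy ramps described above can be sketched as follows (Python; function names are illustrative, and the assumption that the larger of the two delays wins when both triggers apply is ours, not stated in the text):

```python
def _ramp(x, x0, x1, y0, y1):
    """Linear ramp from y0 at x0 to y1 at x1; zero below x0, capped at y1."""
    if x < x0:
        return 0.0
    if x >= x1:
        return y1
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

def immediate_copy_throttle_sec(oldest_age_min, predicted_transfer_min):
    """Delay (seconds) applied to host writes for immediate-copy pressure."""
    # Age ramp: 0.00166 s at 10 minutes up to 2 s at 30 minutes.
    age_delay = _ramp(oldest_age_min, 10.0, 30.0, 0.00166, 2.0)
    # Backlog ramp: 0.001111 s at 5 minutes up to 2 s at 35 minutes
    # of predicted transfer time.
    size_delay = _ramp(predicted_transfer_min, 5.0, 35.0, 0.001111, 2.0)
    # Assumption: the stronger (larger) throttle governs.
    return max(age_delay, size_delay)
```

For example, an oldest queued copy of 5 minutes produces no throttle, one of 30 minutes or more produces the full 2-second throttle, and a predicted 40-minute transfer also forces the maximum.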
3.3 What Causes Deferred Copy Throttle to be turned on?
CPU usage and the compressed host throughput are evaluated every 30 seconds. DCT is invoked when CPU usage is > 85% OR compressed host throughput is > 100 MB/sec. The 100 MB/sec threshold is the default and can be changed by the customer via the Host Console Request. DCT remains in effect for the subsequent 30 second interval, after which it is reevaluated. The default DCT value is 125 ms, which severely slows deferred copy activity (125 ms is added between each 32K block of data sent for a volume). The DCT value can be set using the Host Console Request with the SETTING, THROTTLE, DCOPYT keywords, and the DCT threshold with the SETTING, THROTTLE, DCTAVGTD keywords. Both are discussed in detail in the IBM Virtualization Engine TS7700 Series z/OS Host Command Line Request User's Guide, which is available on Techdocs.
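A minimal sketch of the DCT decision, using the defaults from the text (Python; function and parameter names are illustrative, not the product's terms):

```python
DEFAULT_DCT_MS = 125.0           # default deferred copy throttle value
DEFAULT_THRESHOLD_MBPS = 100.0   # default compressed-host-throughput trigger

def deferred_copy_throttle_ms(cpu_usage_pct, host_mbps,
                              dct_ms=DEFAULT_DCT_MS,
                              threshold_mbps=DEFAULT_THRESHOLD_MBPS):
    """Evaluated every 30 seconds; returns the delay inserted between
    each 32K block of deferred-copy data for the next interval."""
    if cpu_usage_pct > 85.0 or host_mbps > threshold_mbps:
        return dct_ms
    return 0.0

# Worked consequence: at the default 125 ms per 32 KB block, each deferred
# copy task moves at most roughly 32 KB / 0.125 s = 256 KB/s while throttled,
# which is why the default DCT severely slows deferred copy activity.
```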
3.4 How are Pre-Migrate Tasks Managed?
The TS7740 uses a variety of criteria to manage the number of pre-migration tasks. It looks at these criteria every 5 seconds to determine if one more pre-migration task should be added. Adding a pre-migration task is based on these and other factors:
Host compressed write rate
CPU activity
How much data needs to be pre-migrated per pool
How much data needs to be pre-migrated in total
A pre-migration task will not preempt a recall, reclaim or copy export task. There are four different algorithms working in concert to determine if another pre-migration task should be started. General details are described below; the actual algorithm has several nuances not described here.
Idle Pre-Migration
o If the CPU is idle more than 5% of the time, a pre-migrate task is started, if appropriate.
o The number of tasks is limited to six or the maximum pre-migration drives defined by pool properties, whichever is less.
Fast Host Write Pre-Migration Mode
o Compressed host write is > 30 MB/sec AND CPU idle is < 1%
Copyright IBM Corporation, 2012 Page 33 of 55
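The idle pre-migration check above can be sketched as follows (Python; illustrative names, simplified from one of the four cooperating algorithms and ignoring the unstated "if appropriate" nuances):

```python
def idle_premigration_adds_task(cpu_idle_pct, active_tasks, pool_max_drives):
    """Checked every 5 seconds: may one more pre-migrate task start?
    Cap is six tasks or the pool's maximum pre-migration drives,
    whichever is less (simplified sketch)."""
    cap = min(6, pool_max_drives)
    if active_tasks >= cap:
        return False
    # Idle pre-migration only kicks in when the CPU is idle > 5%.
    return cpu_idle_pct > 5.0
```

For example, a pool limited to four pre-migration drives never runs a fifth task even when the CPU is largely idle.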