ETA emc308914: VNX: Excessive trespassing causes I/O performance issues, which may adversely affect Virtual Provisioning.
Article Number: 000003274 Version: 15
Key Information
Audience: Level 30 = Customers
Channels: Customer, Internal App
Originally Created By: Gearoid Griffin
Original Create Date: Tue Nov 13 16:00:52 GMT 2012
First Published: Fri May 31 04:07:34 GMT 2013
Last Modified: Fri Sep 20 19:42:51 GMT 2013
Last Published: Fri Sep 20 19:42:51 GMT 2013
Article Type: ETA
Validation Status: Final Approved

Article Content
Impact: ETA emc308914: VNX: Excessive trespassing causes I/O performance issues, which may adversely affect Virtual Provisioning.
Issue: This issue is indicated by one or more LUNs being trespassed excessively by the Middle Redirector driver.
The Virtual Provisioning driver (MLU) compares the number of I/Os serviced by the local Storage Processor
(SP) with the number of I/Os serviced by the peer SP, and the outcome of that comparison triggers a trespass.
The 712d8107 event is a symptom of this issue, not the cause. See 86538 for an explanation of the 712d8107
event. If you are running VNX OE 05.32.000.5.006, 05.32.000.5.008, or 05.32.000.5.011 and you see excessive
712d8107 and 711b0001 messages in the SP Event Logs, EMC recommends upgrading to VNX Operating
Environment (OE) 05.32.000.5.015 or later, which contains a fix for this issue.
Examples of the Virtual Provisioning problems caused by this issue include:
Performance issues
Pool LUNs being marked for recovery
SP bugchecks
Virtual Provisioning driver becoming degraded

Environment: Product: VNX Series
EMC SW: VNX Operating Environment (OE) 05.32.000.5.006
EMC SW: VNX Operating Environment (OE) 05.32.000.5.008
EMC SW: VNX Operating Environment (OE) 05.32.000.5.011
This statement does not apply: EMC SW: VNX Operating Environment (OE) 05.31
This statement does not apply: EMC SW: VNX Operating Environment (OE) 05.32.000.5.015
Resolution: Workaround:
To temporarily alleviate this issue, disable Autotiering on a per-pool basis, reboot the SPs (one at a time),
and schedule an upgrade to VNX OE 05.32.000.5.015. If you are affected by one of the symptoms listed
above, please engage EMC Customer Support to resolve the issue.
Autotiering can be switched off by using:
Naviseccli autotiering -relocation -stop -all
Naviseccli autotiering -schedule -disable
After upgrading to VNX OE 05.32.000.5.015, Autotiering can be re-enabled by using:
Naviseccli autotiering -relocation -start -all
Naviseccli autotiering -schedule -enable
Permanent Fix:
This issue is addressed in VNX OE 05.32.000.5.015 (released December 2012).
Notes (Employees and Partners):
Symptom: A sample of what may be seen in the Triiage_SPlogs.txt:
A 12/07/12 08:36:05 MLU 712d014e Trespass Execute received on LUN 5 (object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 (object ID 300000003).
A 12/07/12 08:36:05 MidRedirect 711b0001 DynStrings:\Device\CLARiiON\mlu\000000095
A 12/07/12 08:36:05 MLU 712d0003 Operation Promote Replica started by 900000009 on 200000009.
A 12/07/12 08:36:05 MLU 712d0004 Operation Promote Replica completed on 900000009.
A 12/07/12 08:36:05 MLU 712d0004 Operation Mount FS completed on 200000009.
A 12/07/12 08:36:05 MLU 712d0003 Operation Mount FS started by 200000009 on 300000003.
B 12/07/12 08:36:05 MLU 712d014d Trespass Ownership Loss received on LUN 5 (object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 (object ID 300000003).
B 12/07/12 08:36:05 MLU 712d0d01 LUN 6006016010503000:f4e80d127384e111 is ready to service IO. LU OID A00000009 Pool OID 300000003. [ALU 5]
B 12/07/12 08:36:05 MLU 712d0004 Operation Unmount FS completed on 200000009.
B 12/07/12 08:36:05 MLU 712d0003 Operation Unmount FS started by 200000009 on 300000003.
A 12/07/12 08:37:52 MLU 712d014d Trespass Ownership Loss received on LUN 5 (object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 (object ID 300000003).
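As a rough aid for spotting a trespass storm in such a log, the sketch below counts MLU trespass events (codes 712d014d and 712d014e, as seen in the sample) per SP and LUN. It assumes each log entry sits on a single line in the form "SP date time source code message"; the function name, the regular expression, and the sample list are illustrative and are not part of any EMC tool.

```python
import re
from collections import Counter

# Event codes taken from the sample log in this article.
TRESPASS_CODES = {"712d014d", "712d014e"}

# Assumed single-line layout: "<SP> <date> <time> <source> <code> <message>".
ENTRY = re.compile(
    r"^(?P<sp>[AB]) \S+ \S+ (?P<source>\S+)\s+"
    r"(?P<code>[0-9a-f]{8}) .*?LUN (?P<lun>\d+)"
)


def count_trespass_events(lines):
    """Return a Counter keyed by (SP, LUN number) for MLU trespass events."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group("code") in TRESPASS_CODES:
            counts[(match.group("sp"), int(match.group("lun")))] += 1
    return counts


SAMPLE = [
    "A 12/07/12 08:36:05 MLU 712d014e Trespass Execute received on LUN 5 "
    "(object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 "
    "(object ID 300000003).",
    "A 12/07/12 08:36:05 MLU 712d0003 Operation Promote Replica started by "
    "900000009 on 200000009.",
    "B 12/07/12 08:36:05 MLU 712d014d Trespass Ownership Loss received on LUN 5 "
    "(object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 "
    "(object ID 300000003).",
    "A 12/07/12 08:37:52 MLU 712d014d Trespass Ownership Loss received on LUN 5 "
    "(object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 "
    "(object ID 300000003).",
]

counts = count_trespass_events(SAMPLE)
for (sp, lun), n in sorted(counts.items()):
    print(f"SP {sp} LUN {lun}: {n} trespass event(s)")
```

A LUN whose count keeps climbing over a short time window is a candidate for the checks described in the notes that follow; what counts as "excessive" is left to the reader, since this article does not give a number.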
Symptom: Continuous trespasses occur for no apparent reason. The trespassing itself is not the cause, but
excessive assignments in the Triiage_analysis file indicate which LUN is affected.
Symptom: Once a LUN has serviced 2,147,483,649 I/Os of any kind, every subsequent
64,000 I/O requests result in a Middle Redirector trespass of that LUN.
Cause: The trespass storm is caused by MLU storing the result of subtracting two "ULONGLONG" values in a
"LONG" data type. As a result, the stored value can be interpreted as positive even when the actual result
is negative. After every 64,000 I/Os to a file system (LUN or AdvanceSnap), MLU checks whether it would be
better for the LUN to be owned by the peer. It does this by querying the Middle Redirector
for volume statistics. The volume statistics provide a count of how many total I/Os were served locally and
how many were served on the peer. MLU uses this information to calculate:
VolumeStats.totalPeerIOServiced - VolumeStats.totalLocalIOServicedLocally;
If the result is negative, more I/Os are being served locally than on the peer (and therefore being
redirected). This is the normal case for a single-LUN file system (a LUN without AdvanceSnaps). The bug is
that the result of this subtraction is stored in a LONG. Therefore, if totalPeerIOServiced were 0 and
totalLocalIOServicedLocally had grown beyond the signed 32-bit range to 0x80000001 (decimal 2147483649),
the 32-bit result of the subtraction would be positive. MLU would then incorrectly assume that more I/Os
were being redirected than served locally and that the LUN would be better serviced by the peer SP, and it
would request a trespass of the LUN.
Since the statistics are not reset on a trespass, a similar problem occurs on the peer when MLU checks
after 64,000 I/Os. The trespass storm continues until the statistics are reset by a reboot.
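
The arithmetic behind this can be reproduced with a short sketch. It is not the MLU code; it only mimics, under the assumption that "ULONGLONG" is an unsigned 64-bit integer and "LONG" a signed 32-bit integer, what happens when the difference of two 64-bit counters is kept in a 32-bit signed variable. The helper names are made up for illustration.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def delta_as_long(total_peer: int, total_local: int) -> int:
    """Peer minus local, truncated to a signed 32-bit value (the faulty storage)."""
    return to_signed(total_peer - total_local, 32)


def delta_as_longlong(total_peer: int, total_local: int) -> int:
    """Peer minus local, kept as a signed 64-bit value (no truncation)."""
    return to_signed(total_peer - total_local, 64)


# Peer has served nothing; the local counter reaches the value quoted above.
local = 0x80000001  # decimal 2147483649
print(delta_as_long(0, local))      # 2147483647: looks positive, suggests a trespass
print(delta_as_longlong(0, local))  # -2147483649: the true, negative difference

# One I/O earlier the truncated result is still negative.
print(delta_as_long(0, 0x80000000))  # -2147483648
```

With a 64-bit signed result the sign stays correct for these counter values, which is why the truncation to 32 bits, rather than the comparison itself, is the source of the spurious trespass requests.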

Note: What to review if you are encountering this issue:


1. Review the triiage_analysis file to determine if any LUNs are indicating a lot of assigns, but not as
many trespasses.
LUN  RG  RGtype  Meta  Trespass  Assign  Shutdown  LwrPfail  LwrPrest  UprPfail  UprPrest  Storage Group
3        thin          0         4916    0         0         0         0         0         SGNAME
4        thin          0         2597    0         0         0         0         0         SGNAME
2. Review volume statistics in the SPX_redirector_debug.txt which is contained in the SPcollects (in
this example, SP B is the SP encountering the issue).
Searching for MR 3 you find the following information:
MR 3 VFFFFF88035EAC0C0 -- Initiator: mNumTrackersAllocated 0, mStats: mTotalRequests 2146620772, mMaxAllocatedTrackers 129
Target: mNumAllocatedTrackers 0 mStats: mTotalRequests 46394, mMaxAllocatedTrackers 14
G0xFFFFF88035B96440 Established
mStats.mTotalExplicitOwnershipReceives 0, mStats.mTotalOwnershipReceives 35352
mStats.remoteToLocalIoDelta -65536, mStats.totalLocalIOServicedLocally 4606171032
====> This is greater than 2147483649, which indicates that SP B was encountering the issue.
mStats.totalPeerIOServiced 46394, mStats.totalLocalIORedirected 2146620775
mStats.RetryableErrorsHandled 0, mStats.totalErrorsHandled 389906
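
The manual check above can be approximated with a small script. This is only a sketch: it assumes the statistics appear as "name value" pairs exactly as in the excerpt, and the function names are hypothetical. It applies the rule stated in this article, flagging a volume whose totalLocalIOServicedLocally is greater than 2147483649.

```python
import re

THRESHOLD = 2147483649  # value quoted in this article


def find_stat(text: str, name: str):
    """Return the integer that follows `name` in the text, or None if absent."""
    match = re.search(r"\b" + re.escape(name) + r"\s+(-?\d+)", text)
    return int(match.group(1)) if match else None


def exceeds_threshold(text: str) -> bool:
    """True if totalLocalIOServicedLocally is present and above the threshold."""
    local = find_stat(text, "totalLocalIOServicedLocally")
    return local is not None and local > THRESHOLD


# Two lines copied from the MR 3 excerpt above.
EXCERPT = (
    "mStats.remoteToLocalIoDelta -65536, "
    "mStats.totalLocalIOServicedLocally 4606171032\n"
    "mStats.totalPeerIOServiced 46394, "
    "mStats.totalLocalIORedirected 2146620775\n"
)

print(find_stat(EXCERPT, "totalLocalIOServicedLocally"))  # 4606171032
print(find_stat(EXCERPT, "totalPeerIOServiced"))          # 46394
print(exceeds_threshold(EXCERPT))                         # True
```

To use it on a real SPcollect, the text for one MR entry would first have to be cut out of the SPX_redirector_debug.txt file; how entries are delimited in the full file is not shown here, so that step is left out.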

Note: EMC Support can download the program MRv2.exe, which can assist in diagnosing the
issue. Extract the program into the relevant tools folder (e.g. c:\tools\). Then, from the folder
where the SPcollects are stored, run the command MRv2
Example of the Tool being used:
Codelevel = 05320005.011
The following stats are only valid if this is R32 below p15
Middle Redirector stats as per emc308914
========================================
lun SP %
=============
1 " A " 500 <-----
5 " A " 118 <-----
2 " B " 123 <-----
6 " B " 71
4 " B " 147 <------

Note: For more information, consult ARS defect numbers 507904, 507478, 513279, 521421, 524467,
526706, and 526071. ARS access is only available to authorized Customer Service Representatives.
Note: Please read! ETAs constitute formal notification from EMC to customers, partners, and EMC field
personnel. Changes to this solution require approval of the Customer Service ETA Approver and this approval
must be recorded in the Comments of the solution. To identify the Customer Service ETA Approver, go to the
Comments tab or the list of ETA Approvers located on the ETA web page of the Global Service web site.
Article Metadata
Product: VNX1 Series
Problem Code: EMC Software
Shared: Yes
RCA Status: Complete
External Source: Primus
Primus/Webtop solution ID: emc308914
Originally Created By: Gearoid Griffin

Legal Information
Please read! ETAs constitute formal notification from EMC to customers, partners, and EMC Customer Service
personnel. Changes to this article require approval of Customer Service ETA Approvers for the EMC products listed. This
approval must be recorded in the Internal Authoring Notes of the article. To identify the Customer Service ETA Approver,
refer to ETA Approvers List on KCS at EMC on EMC ONE.
EMC CONFIDENTIAL INFORMATION