
Ten steps to troubleshooting SAN/NAS performance problems

Learn how to isolate the cause of performance problems in your storage system, fix what's broken
and learn from your mistakes.

Both SAN and NAS architectures experience performance problems for a variety of reasons,
including increased workloads or the addition of new applications or tools. Some issues are
specific to a given storage environment, but many can be isolated using this step-by-step
approach:

1. Do you actually have a SAN/NAS-related performance problem?

To understand whether you really have a performance issue, you have to identify the precise
nature of the problem. Can you still access data and applications, just very slowly? Or are you
unable to access any data or applications at all, and receive an error message instead?
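
One way to make this distinction concrete is a small probe that reads a file on the suspect volume or share and reports whether the read failed outright, succeeded slowly, or succeeded normally. The Python sketch below is illustrative only: the function name `probe_read` and the 0.5-second threshold are assumptions, and a sensible threshold depends on your own baseline.

```python
import time
from pathlib import Path

# Hypothetical default: a read slower than this counts as "slow".
# Tune it from your own baseline measurements.
SLOW_THRESHOLD_S = 0.5


def probe_read(path, slow_threshold_s=SLOW_THRESHOLD_S):
    """Read a file once and classify the result as 'error', 'slow', or 'ok'.

    Returns a (status, detail) tuple.
    """
    start = time.perf_counter()
    try:
        with Path(path).open("rb") as f:
            # Read in 1 MiB chunks so large files need not fit in memory.
            while f.read(1024 * 1024):
                pass
    except OSError as exc:
        # The data could not be accessed at all: an availability problem,
        # not just a performance problem.
        return "error", str(exc)
    elapsed = time.perf_counter() - start
    if elapsed > slow_threshold_s:
        return "slow", f"{elapsed:.3f}s"
    return "ok", f"{elapsed:.3f}s"
```

Repeated reads of the same small file may be served from the client's cache, so a larger or freshly written test file (at a path of your choosing on the NAS mount or SAN volume) gives a more honest reading.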

2. What is the normal expected behavior of the SAN/NAS environment?

While not ideal, it may be normal for performance to slow down at certain times of the day,
similar to how performance on your home cable or DSL modem slows down in the late afternoon
shortly after school lets out for the day. While known slow-downs may be accepted, ultimately
you will want to know where and why these occur and have a plan to address them. A baseline
performance summary helps you tell what is normal and what is not.
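
A baseline can be as simple as per-hour statistics built from historical latency samples, which captures predictable time-of-day slowdowns. The sketch below assumes you already collect (hour of day, latency in milliseconds) pairs from a monitoring tool; the helper names and the three-standard-deviation rule are illustrative assumptions, not a standard.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(samples):
    """Build {hour: (mean_ms, stdev_ms)} from (hour_of_day, latency_ms) pairs."""
    by_hour = defaultdict(list)
    for hour, latency_ms in samples:
        by_hour[hour].append(latency_ms)
    baseline = {}
    for hour, values in by_hour.items():
        # stdev() needs at least two points; use 0.0 for a single sample.
        spread = stdev(values) if len(values) > 1 else 0.0
        baseline[hour] = (mean(values), spread)
    return baseline


def is_abnormal(baseline, hour, latency_ms, k=3.0):
    """True if latency is more than k standard deviations above that hour's mean.

    Returns None when there is no history for the hour, since nothing can be judged.
    """
    if hour not in baseline:
        return None
    avg, spread = baseline[hour]
    return latency_ms > avg + k * spread
```

With such a baseline, a 25 ms response at 17:00 may be normal if late afternoon is routinely busy, while the same value at 09:00 stands out.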

3. Can the performance problem be reproduced?

Is it a transitory performance issue, or is it consistent and capable of being reproduced? Can you
access data and perform work, albeit at a slower pace, or has everything come to a screeching
halt? Is this a first-time occurrence, or have the symptoms been seen before? Is this a seasonal
performance problem, for example, handling more transactions during the holiday season, that can
be addressed by spending money on more equipment, or is it something that can be dealt with as
an occasional nuisance?
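
To decide whether a problem is consistent or transitory, you can repeat the same operation several times and count how many trials exceed a latency threshold. In this sketch, `time_trials`, `classify_reproducibility` and the 80% cutoff for "consistent" are hypothetical choices for illustration.

```python
import time


def time_trials(operation, trials=10):
    """Run operation() `trials` times and return each run's latency in milliseconds."""
    latencies_ms = []
    for _ in range(trials):
        start = time.perf_counter()
        operation()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def classify_reproducibility(latencies_ms, threshold_ms, consistent_fraction=0.8):
    """Classify repeated trials as 'not reproduced', 'transitory', or 'consistent'.

    Returns (label, fraction_of_slow_trials). The cutoff is an assumption.
    """
    if not latencies_ms:
        raise ValueError("need at least one trial")
    slow = sum(1 for value in latencies_ms if value > threshold_ms)
    fraction = slow / len(latencies_ms)
    if fraction == 0:
        return "not reproduced", fraction
    if fraction >= consistent_fraction:
        return "consistent", fraction
    return "transitory", fraction
```

The `operation` passed to `time_trials` could be any callable that touches the affected storage, such as a read of a test file.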

4. Is everything functioning as it should, or has something failed?

Has any hardware failed or shown signs that it might be about to fail? What activity appears in
the error and event logs? Is the performance problem isolated to specific users, applications,
servers, files and data, or storage resources? Have any disk drives failed, triggering automatic
hot spare disk rebuilds, or has a controller or adapter failed over? Tools for monitoring and
collecting performance data include iostat, NetStat, nfstat, PerfMan, NTSMF from Demand
Technologies, ITR client and Intel Iometer, among other standard and vendor-provided products.
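
Part of the error and event log review can be automated by counting lines that mention failure indicators. This sketch assumes plain-text log lines. The category names and regular expressions are hypothetical, because real message text differs between vendors, operating systems and drivers.

```python
import re
from collections import Counter

# Hypothetical patterns: adjust them to match what your own logs actually contain.
FAILURE_PATTERNS = {
    "disk failure": re.compile(r"\b(disk|drive)\b.*\b(fail(ed|ure)?|offline)\b", re.I),
    "hot spare rebuild": re.compile(r"\b(rebuild|hot[- ]?spare)\b", re.I),
    "path/controller failover": re.compile(r"\bfail[- ]?over\b", re.I),
    "I/O error or timeout": re.compile(r"\bI/O error\b|\btime(d)? ?out\b", re.I),
}


def scan_log(lines):
    """Count log lines matching each failure category (a line may match several)."""
    counts = Counter()
    for line in lines:
        for category, pattern in FAILURE_PATTERNS.items():
            if pattern.search(line):
                counts[category] += 1
    return counts
```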

5. What has changed in the SAN/NAS environment since the problem started?

Have any changes been made to storage subsystems (expansion, reconfiguration and other
changes), NAS appliances or gateways, network and storage interfaces, servers, volume managers,
applications or databases? Do you have a change control process to help determine what will be
changed? Do you have fall-back procedures in case something does not work correctly? Have any
new security policies or access controls been applied? Were file system maintenance or virus
detection scans started before the performance problem was reported? Is any maintenance on data
or hardware/software components being performed?
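
If your change control process keeps timestamped records, listing the changes made shortly before the problem was reported is a quick first filter. This sketch assumes a simple list of (timestamp, description) records and a 48-hour look-back window; the record format, the window and the `recent_changes` helper are illustrative assumptions.

```python
from datetime import datetime, timedelta


def recent_changes(change_log, problem_reported_at, lookback=timedelta(hours=48)):
    """Return changes made within `lookback` before the report, newest first.

    change_log: iterable of (timestamp: datetime, description: str).
    """
    window_start = problem_reported_at - lookback
    hits = [
        (ts, desc)
        for ts, desc in change_log
        if window_start <= ts <= problem_reported_at
    ]
    return sorted(hits, key=lambda item: item[0], reverse=True)
```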

6. What other applications and workload are running?

Have any new applications been added or changed? Has new workload been added? Have any
applications changed that subsequently require more storage and I/O resources? Are any
applications misbehaving, for example, a database query taking out and excessively holding
locks on resources? Are any antivirus, anti-spyware, security auditing, disk defragmentation,
backup, data classification or performance monitoring tools running and performing I/O to the
storage devices whose performance is affected?
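
Backups, virus scans and defragmentation usually run on a schedule, so one quick check is whether any of their run windows overlap the period when performance degraded. The job list and the `overlapping_jobs` helper below are hypothetical.

```python
from datetime import datetime


def overlapping_jobs(jobs, slow_start, slow_end):
    """Return names of jobs whose run window overlaps [slow_start, slow_end).

    jobs: iterable of (name, start: datetime, end: datetime).
    """
    # Two half-open intervals overlap when each one starts before the other ends.
    return [name for name, start, end in jobs if start < slow_end and slow_start < end]
```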

7. What does a quick scan of your SAN/NAS environment show?

Do you have health status monitors that you can check to determine the general health of your
environment? What is the status of memory and CPU resources on servers? What are the busiest
processes and what resources are they consuming? What are the busiest storage volumes, and which
adapters and I/O paths do they use? What is the status and performance of interfaces, including
Ethernet for IP, Fibre Channel for open systems and FICON for mainframe attachment? What is the
performance of the storage subsystem, including cache hits, cache utilization, cache
effectiveness and device activity?
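
On Linux hosts, per-device busy time can be sampled from `/proc/diskstats`, the same source `iostat` uses for its `%util` column. This sketch assumes the standard layout (major number, minor number, device name, then the statistics fields, where the tenth statistics field is milliseconds spent doing I/O) and two snapshots taken a known interval apart. The helper names are hypothetical.

```python
def parse_io_ticks(diskstats_text):
    """Map device name -> cumulative milliseconds spent doing I/O (io_ticks)."""
    ticks = {}
    for line in diskstats_text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or truncated lines
        # parts[0:3] are major, minor, name; parts[12] is the 10th stats field.
        ticks[parts[2]] = int(parts[12])
    return ticks


def busiest_devices(before_text, after_text, interval_ms, top=5):
    """Rank devices by %util (busy time / elapsed time) over the sampling interval."""
    before = parse_io_ticks(before_text)
    after = parse_io_ticks(after_text)
    util = {}
    for dev, end_ticks in after.items():
        if dev in before:
            util[dev] = 100.0 * (end_ticks - before[dev]) / interval_ms
    return sorted(util.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

On a live system you could read `/proc/diskstats` twice, one second apart, and pass both snapshots with `interval_ms=1000`. A value near 100% suggests a saturated device, although for arrays and SSDs that service many requests in parallel, `%util` overstates how busy the device really is.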

8. Is it a local or remote performance problem?

Can you determine whether there are problems with your local LAN or SAN segments by using ping
to check network connectivity, or by issuing an I/O command to a storage device? To determine
whether the performance problem is local or remote, verify performance to local storage and then
compare it with performance to remote storage. For remote performance, examine the network
interfaces, using ping, NetStat or nfstat to check link errors, response times, timeouts,
retransmits and packet loss. What is the status of inter-switch links (ISLs), routers, bridges
and gateways? Are they functioning normally?
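
A rough local-versus-remote comparison can be scripted by timing connection setup to a local endpoint and a remote one. ICMP ping needs raw-socket privileges, so this sketch times a TCP connection instead. Keep in mind that this measures connection setup rather than ICMP round-trip time. The helper names and the factor-of-three rule are hypothetical.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout_s=2.0):
    """Time TCP connection setup to host:port in milliseconds; None on failure."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError:
        # Refused, unreachable, or timed out (socket timeouts are OSError subclasses).
        return None
    return (time.perf_counter() - start) * 1000.0


def compare_local_remote(local_ms, remote_ms, factor=3.0):
    """Rough verdict on where to look first; `factor` is an arbitrary assumption."""
    if local_ms is None or remote_ms is None:
        return "unreachable: check connectivity before performance"
    if remote_ms > factor * local_ms:
        return "remote path slower: check WAN links, ISLs, routers, gateways"
    return "remote comparable to local: look at local resources first"
```

For example, you might time connections to the NFS port (TCP 2049) of a local filer and of a remote one, then pass both results to `compare_local_remote`.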

9. Do you need outside help to determine and correct the problem?

Do you need to enlist the support of your vendors (hardware, software, networks) to provide
diagnostic and test tools or hands-on assistance? Your vendors may maintain knowledge bases on
troubleshooting performance and other problems that you can use as a source of information and
training.

10. Have you learned from the incident?

Have you documented the findings, resolution and symptoms to help others troubleshoot the
same problem in the future?