
Netapp performance monitoring : sysstat : Part I

Netapp sysstat is like vmstat and iostat rolled into one command. It reports filer performance statistics such as CPU utilization, disk traffic, and tape traffic. When run without options, sysstat prints a new line of basic statistics every 15 seconds; stop it with Control-C (^c), or set an iteration count (-c count) so it stops on its own. For more detailed information, use the -u option. For information specific to one particular protocol, you can use the other options. I’ll list them here, with a few example invocations after the list.

-f FCP statistics

-i iSCSI statistics

-b SAN (blocks) extended statistics

-u extended utilization statistics

-x extended output format. This includes all available output fields. Be aware that this produces output that is longer than 80 columns and is generally intended for “off-line” types of analysis and not for “real-time” viewing.

-m Displays multi-processor CPU utilization statistics. In addition to the percentage of time that one or more CPUs were busy (ANY), the average utilization (AVG) is displayed, as well as the utilization of each individual processor. This is only useful on multi-processor systems; it won’t work on single-processor machines.
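For example (the intervals and counts below are arbitrary; the interval is always given in seconds as the last argument), the flags can be combined like this:

sysstat 5 : basic output every 5 seconds, stopped with ^c
sysstat -c 10 -u 5 : extended utilization statistics, 10 iterations at a 5-second interval
sysstat -x 1 : all output fields at a 1-second interval (wider than 80 columns)
sysstat -m 1 : per-CPU utilization, only meaningful on multi-processor filers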

You can use the Netapp SIO tool to benchmark Netapp systems. SIO is a client-side workload generator that works with any target. It generates I/O load and reports basic statistics, so you can see how any type of storage performs under a given workload; a sample invocation is sketched below.
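As a rough sketch (the binary name, path, and values here are made up for illustration, and the argument order may differ between SIO builds, so check the usage banner of your copy), a run typically specifies the read percentage, random percentage, block size, file size, run time in seconds, thread count, and the target file:

./sio_ntap_linux 100 100 4k 500m 60 4 /mnt/netapp_vol/testfile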

Netapp performance monitoring : sysstat : Part II

Here are some explanations of the columns in the Netapp sysstat output.

Cache age : The age in minutes of the oldest read-only blocks in the buffer cache. This column indicates how fast read operations are cycling through system memory; when the filer is reading very large files, the buffer cache age will be very low. The cache age will also be low if reads are random. If you have a performance problem where read performance is poor, a low number here may indicate that you need a system with more memory, or that the application should be analyzed to reduce the randomness of its workload.

Cache hit : This is the WAFL cache hit rate percentage, i.e. the percentage of times WAFL tried to read a data block from disk and found the data already cached in memory. A dash in this column indicates that WAFL did not attempt to load any blocks during the measurement interval.

CP Ty : Consistency Point (CP) type is the reason that a CP started in that interval. The CP types are as follows:

- No CP started during the sampling interval (no data was written to disk during this period)

number Number of CPs started during sampling interval

B Back-to-back CPs (CP generated CP) (the filer is having a tough time keeping up with writes)

b Deferred back-to-back CPs (CP generated CP) (the back-to-back condition is getting worse)

F CP caused by full NVLog (one half of the nvram log was full, and so was flushed)

H CP caused by high water mark (rare to see this; one side of the NVRAM log was half full, so the filer decided to start writing to disk).

L CP caused by low water mark

S CP caused by snapshot operation

T CP caused by timer (every 10 seconds filer data is flushed to disk)

U CP caused by flush

: Continuation of a CP from the previous interval (a CP is still in progress; typically seen with 1-second intervals)

The type character is followed by a second character which indicates the phase of the CP at the end of the sampling interval. If the CP completed during the sampling interval, this second character will be blank. The phases are as follows:

0 Initializing

n Processing normal files

s Processing special files

f Flushing modified data to disk

v Flushing modified superblock to disk
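For example, a CP ty value of Tn means a timer-triggered CP that was still processing normal files when the interval ended, while F followed by a blank means a CP caused by a full NVLog that completed within that interval.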

CP util : The Consistency Point (CP) utilization, the percentage of time spent in a CP. 100% is a good thing: it means that all of the time dedicated to writing data was actually used. A value of 75% means only 75% of the time allocated to writing data was used, so 25% of it was wasted. A good CP percentage is at or near 100%.


Tuning Netapp snapmirror/snapvault speed

In Netapp, there is an option to limit the bandwidth of all SnapMirror and SnapVault transfers. The throttle can be applied system-wide or to a particular transfer. You can tune the transmit bandwidth on the source, the receive bandwidth on the destination, or both together. For a particular transfer, you set the throttle in snapmirror.conf, as sketched below.
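As a sketch (the filer and volume names are made up), a per-transfer throttle is set with the kbs argument in /etc/snapmirror.conf on the destination filer:

source-filer:vol_data destination-filer:vol_data_mirror kbs=2048 15 * * *

Here kbs=2048 limits this one relationship to 2048 kilobytes/sec, and "15 * * *" is the usual minute/hour/day-of-month/day-of-week schedule field.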

When both per-transfer and system-wide throttling are configured, the system-wide throttle is applied only if the combined bandwidth used by all the relationships goes above the system-wide throttling value. System-wide throttling is controlled by three options, set using the options command.

To list the tunable replication throttle values:

destination-filer*> options replication
replication.throttle.enable              off
replication.throttle.incoming.max_kbs    unlimited
replication.throttle.outgoing.max_kbs    unlimited
destination-filer*>

Enable throttling: This enables throttling of SnapMirror and SnapVault transfers.

destination-filer*> options replication.throttle.enable on

Set incoming bandwidth limit:

destination-filer*> options replication.throttle.incoming.max_kbs 1024

This Netapp option has to be applied on the destination system. It specifies the maximum total bandwidth used by all the incoming SnapMirror and SnapVault transfers, specified in kilobytes/sec. The default value is “unlimited”.

Set outgoing bandwidth limit:

source-filer*> options replication.throttle.outgoing.max_kbs 1024

This Netapp option has to be applied on the source system. It specifies the maximum total bandwidth used by all the outgoing SnapMirror and SnapVault transfers, specified in kilobytes/sec. The default value is “unlimited”.
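To remove a limit again, set the option back to its default, which the options listing above shows as “unlimited”:

source-filer*> options replication.throttle.outgoing.max_kbs unlimited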