
Avago 6Gb/s SAS and 12Gb/s SAS Performance

Tuning Guide
User Guide

Version 1.0

October 2014

DB15-001127-01

For a comprehensive list of changes to this document, see the Revision History.

Corporate Headquarters: San Jose, CA, 800-372-2447
Email: globalsupport.pdl@avagotech.com
Website: www.lsi.com

Avago Technologies, the A logo, LSI, Storage by LSI, DataBolt, MegaRAID, MegaRAID Storage Manager, and
Fusion-MPT are trademarks of Avago Technologies in the United States and other countries. All other brand and
product names may be trademarks of their respective companies.
Data subject to change. Copyright © 2014 Avago Technologies. All Rights Reserved.


Table of Contents

Chapter 1: Introduction
  1.1 Overview
  1.2 Performance Metrics
  1.3 Performance Measurement Characteristics
  1.4 Performance Testing Overview
  1.5 References

Chapter 2: Calculate Expected Performance
  2.1 Bottlenecks and Limitations
  2.2 Limitations
    2.2.1 Interface Connection Limitations
    2.2.2 Device Hardware Limitations
    2.2.3 Bottleneck Examples
      2.2.3.1 6Gb/s SAS Controller Bottleneck Example
      2.2.3.2 12Gb/s SAS Controller PCIe Bottleneck Example
      2.2.3.3 12Gb/s SAS Controller with PCIe and Drive Bottleneck Example
      2.2.3.4 12Gb/s SAS Controller Small Sequential IOPs Bottleneck Example
      2.2.3.5 12Gb/s SAS Controller Throughput Bottleneck Example
  2.3 Queue Depth and Expected Performance

Chapter 3: Build Your Test Setup
  3.1 Host System Considerations
    3.1.1 Processor Architecture and Core Organization
    3.1.2 Memory
    3.1.3 PCIe Slot Choice
    3.1.4 Non Uniform Memory Architecture
    3.1.5 BIOS Options
  3.2 Storage Components and Performance
    3.2.1 Initiators and Performance
      3.2.1.1 Initiator Features that Affect Performance
    3.2.2 Expanders and Performance
      3.2.2.1 Expanders and Latency
      3.2.2.2 DataBolt Technology
    3.2.3 Storage Drives and Performance
    3.2.4 Target-Mode Controllers and Performance
    3.2.5 SSD Preconditioning
      3.2.5.1 SNIA SSD Preconditioning
      3.2.5.2 Alternative SSD Preconditioning
  3.3 Storage Topology
    3.3.1 Direct Attached Topology
    3.3.2 Expander Attached Topology - Single
    3.3.3 Expander Attached Topology - Cascade
    3.3.4 Expander Attached Topology - Tree
    3.3.5 Multipath Topology
    3.3.6 Topology Guidelines for Better Performance

Chapter 4: Configure Your Test Parameters
  4.1 Operating System Environments
    4.1.1 Windows Operating System
      4.1.1.1 Windows Operating System Hotfixes
      4.1.1.2 MSI-X Interrupt Vectors
      4.1.1.3 Process Affinity
      4.1.1.4 Driver Version and Customization
      4.1.1.5 Disk Write Cache
    4.1.2 Linux Operating System
      4.1.2.1 Linux Kernel Version
      4.1.2.2 Linux Drivers
      4.1.2.3 MSI-X Interrupt Vectors
      4.1.2.4 I/O Scheduler
      4.1.2.5 Block Layer I/O Scheduler Queue
      4.1.2.6 SCSI Queue Depth
      4.1.2.7 Nomerges Setting
      4.1.2.8 Rotational Setting
      4.1.2.9 Add Random Setting
      4.1.2.10 Linux Write Cache
  4.2 Volume Configurations
    4.2.1 Volume Configurations and Performance
    4.2.2 Volume Type
    4.2.3 Strip Size
    4.2.4 Cache Policy
    4.2.5 Disk Cache Policy
    4.2.6 I/O Policy
    4.2.7 Consistency and Initialization
    4.2.8 Background Operations
    4.2.9 MegaRAID FastPath Software
    4.2.10 Guidelines on Volume Configurations for Better Performance
  4.3 Software Tools
    4.3.1 Linux Performance Monitoring Tools
      4.3.1.1 sar
      4.3.1.2 iostat
      4.3.1.3 blktrace
      4.3.1.4 blkparse
    4.3.2 Windows XPerf
    4.3.3 Windows Performance Monitor (Perfmon)

Chapter 5: Benchmark Resources
  5.1 Benchmarking Basics
  5.2 Iometer for Windows
    5.2.1 Run Iometer
    5.2.2 Iometer Tips and Tricks
    5.2.3 Interpret Iometer Results
    5.2.4 Iometer References
  5.3 Vdbench
    5.3.1 Install Vdbench
    5.3.2 Run Vdbench
    5.3.3 Sample Vdbench Script
    5.3.4 Interpret Vdbench Results
  5.4 Jetstress
    5.4.1 Install Jetstress
    5.4.2 Create your Jetstress Test
      5.4.2.1 Select Capacity and Throughput
      5.4.2.2 Select Test Type
      5.4.2.3 Define Test Run
      5.4.2.4 Configure Databases
      5.4.2.5 Select Database Source
    5.4.3 Start the Test
      5.4.3.1 Characterize the Jetstress Workload
    5.4.4 Interpret Jetstress Results
      5.4.4.1 Transactional I/O Performance
      5.4.4.2 Background Database Maintenance I/O Performance
  5.5 fio for Linux
    5.5.1 Get Started with fio
    5.5.2 fio Performance-Related Parameters
    5.5.3 Interpret fio Output
  5.6 Verify Benchmark Results for Validity

Chapter 6: Compare Measured Results with Expected Results
  6.1 Performance Result Examples for MegaRAID
    6.1.1 Eight Drive Direct Attached Example Results
    6.1.2 Twenty-four Drive Expander Attached Example Results
    6.1.3 Forty Drive Expander Attached Example Results
  6.2 Performance Results Examples for IT Controllers
    6.2.1 Eight Drive Direct Attached Example Results
    6.2.2 Twenty-four Drive Expander Attached Example Results
    6.2.3 Forty Drive Expander Attached Example Results

Chapter 7: Troubleshoot Performance Issues

Appendix A: Performance Testing Checklist

Revision History
  Version 1.0, October 2014
  Advance, Version 0.1, March 2014


Chapter 1: Introduction
Use this Performance Tuning Guide for Avago® 6Gb/s SAS and 12Gb/s SAS I/O controller, ROC controller, and expander
products. This document targets only the storage-specific performance of these products and aims to convey the
following ideas:
- Understand the performance measurement process
- Reach a desired performance goal of a storage topology
- Debug any unexpected results or bottlenecks that you might encounter during performance measurement
This document focuses on performance-related settings and configurations only. See the References section for
related documents. For initial and basic device bring-up, refer to the product documentation for your product.

1.1 Overview

In general, the performance measurement process might have the following steps:
1. Decide what to measure.
2. Understand what to expect.
3. Build your test configuration.
4. Configure different parameters that might influence your performance tests.
5. Run the performance benchmark test and capture the results.
6. Analyze and compare your results with the expected results.
7. If you have any unexpected results, troubleshoot issues until you achieve the expected results.
The performance measurement process can vary depending on your measurement objective. The objective might be
a benchmarking exercise for a new product or a debug effort to understand why a certain measurement is not
attaining expected results. This tuning guide organizes its chapters to match the performance measurement process.
Chapter 1, Introduction
Introduces the performance measurement process with commonly used metrics and methodologies.
Reviews factors to consider during the benchmarking process.
Chapter 2, Calculate Expected Performance
Introduces the bottlenecks and limitations that you might encounter during performance measurement. This
chapter helps you learn what to expect from a specific storage configuration with Avago 6Gb/s and 12Gb/s
SAS products.
Chapter 3, Build Your Test Setup
Helps to set up your storage topology and configure specific parameters that can affect what you try to
measure. Addresses settings and options that may not change between different runs of a specific
performance measurement project, such as storage topology.
Chapter 4, Configure Your Test Parameters
Helps with understanding different tunable hardware and software options after your system is set up.
Addresses options that may change between different runs of a specific performance measurement project,
such as different volume configurations of a specific storage topology.
Chapter 5, Benchmark Resources
Helps you choose the correct benchmarking tool or a system monitoring tool that best suits the metric that
you intend to measure. Different settings and tool tips are discussed to get reliable results from these tools
and to validate the results.


Chapter 6, Compare Measured Results with Expected Results


Helps you analyze the results and compare your results with the expected results. This chapter provides
example results from Avago standard performance runs to gauge your results.
Chapter 7, Troubleshoot Performance Issues
Reviews questions that you might ask or additional tests that you might run in the case of unexpected results.
In doing so, this chapter takes you through different debugging steps to isolate and root-cause the
issue more quickly.

1.2 Performance Metrics

This section lists the commonly used primary and secondary performance metrics for performance analysis. Primary
performance metrics include throughput and latency.
Throughput (MBPS and IOPs)
A rate at which the data can be transferred in a unit of time. Throughput is typically given in terms of I/Os per
second (IOPs) and Megabytes per second (MB/s or MBPS). IOPs generally measure data of a random nature.
MB/s generally measure data of a sequential nature.
Often, throughput of small I/O sizes is expressed in IOPs, whereas large I/O throughput is expressed in
MBPS. Both units represent the same quantity with a different scale factor (MBPS = IOPs x I/O size). A larger
throughput value indicates greater performance. When expressed as MB/s, throughput is often called bandwidth.

NOTE Avago uses binary base (1 KB = 1024 Bytes) when representing MBPS.
Be wary of tools that might represent MBPS in decimal base (1 KB =
1000 bytes).
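To make the scale factor and the base distinction concrete, the following minimal Python sketch (an illustration, not an Avago tool) converts an IOPs figure to MBPS for a given I/O size, in both binary and decimal bases:

# Convert an IOPs measurement to MBPS for a given I/O size.
# Avago uses binary base (1 MB = 1024 x 1024 bytes); some tools use decimal.
def iops_to_mbps(iops, io_size_bytes, binary=True):
    mb = 1024 * 1024 if binary else 1000 * 1000
    return iops * io_size_bytes / mb

# Example: 100,000 IOPs at 4 KB (4096 bytes) per I/O.
print(iops_to_mbps(100_000, 4096))         # 390.625 MBPS (binary base)
print(iops_to_mbps(100_000, 4096, False))  # 409.6 MBPS (decimal base)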
Latency
The time to complete an I/O. Under certain conditions latency is the inverse of throughput, and a tradeoff
exists between the two. Latency is generally lower on lightly loaded systems and higher on heavily loaded
systems that issue several I/Os simultaneously. Lower latencies are more desirable and many applications
have requirements around latency thresholds. Several different latency variations follow:
- Minimum Latency: Latency of the single fastest I/O measured.
- Maximum Latency: Latency of the single slowest I/O measured.
- Average Latency: Latency of all I/O measured and averaged together.
- Percentile Latency: Maximum latency of a certain percentage of all the I/O measured. Typical percentiles
used are 95%, 99%, and 99.9%. Use percentile latency to remove extreme, uncharacteristic I/O outliers
that skew the latency calculations.
- Histogram Latency: Distribution of latencies of all the I/O measured, by using predetermined
ranges (buckets).
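As a minimal illustration of these variations (not an Avago tool), the following Python sketch computes the minimum, maximum, average, and a nearest-rank percentile latency from a list of per-I/O latencies in milliseconds; benchmark tools may use other percentile conventions.

import math

def latency_summary(latencies_ms, pct=99.0):
    s = sorted(latencies_ms)
    # Nearest-rank percentile: the latency under which pct% of I/Os completed.
    rank = max(1, math.ceil(pct / 100.0 * len(s)))
    return {"min": s[0], "max": s[-1],
            "avg": sum(s) / len(s), "pct": s[rank - 1]}

# One 50 ms outlier dominates the maximum but barely moves the 99th percentile.
print(latency_summary([0.2] * 98 + [0.3, 50.0]))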

NOTE When you compare latencies of different products, make sure that the
throughput is the same. A storage controller might tune for its
maximum throughput, thus compromising on latency, or vice versa.
Less commonly used performance metrics, or secondary metrics, might prove useful in certain situations depending
on the point of interest. Secondary performance metrics include the following:
Utilization
Percent of time that a resource is used, such as a CPU, a storage link, or a disk.
Efficiency
A ratio, typically throughput divided by utilization. A commonly used efficiency metric is IOPs / % CPU.


Interrupt rate
Number of host driver interrupts per second or per I/O
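As a quick hypothetical illustration of the efficiency metric, the following Python lines compare two controllers by IOPs delivered per percent of host CPU consumed (all figures are invented for the example):

def iops_per_cpu(iops, cpu_utilization_pct):
    # Efficiency: throughput divided by utilization (IOPs / % CPU).
    return iops / cpu_utilization_pct

print(iops_per_cpu(500_000, 80))  # 6250.0 IOPs per % CPU
print(iops_per_cpu(450_000, 40))  # 11250.0: fewer IOPs, but more efficient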
The metrics outlined in this section describe storage performance at a fundamental level. Most applications and
third-party benchmarks have their own method to express performance by using different terminologies that depend on
what is measured. For example, database benchmarks use transactions as the base unit to measure performance. A
transaction might be a single I/O, or it might be more complex, with multiple read and write I/Os issued to complete a
task. Refer to the application or benchmark documentation before you analyze the metrics it produces.

1.3 Performance Measurement Characteristics

Any metrics that you use to characterize performance must have two common characteristics:
Reliability
Performance must be a measurement of the system or device in a deterministic, known state. Performance
measured when the storage system or device is in a state unknowingly influenced by external variables, such
as equipment failures or transient cache states, might result in inaccurate measurements.
Repeatability
Performance measurements of a storage system under the same configuration and environment conditions
must always provide the same results. Only then can you consider those results valid. Do not consider
measurements with a high level of variance valid; analyze them closely for any possible discrepancies
between runs. Such analysis helps determine any variables that were previously treated as constants.

1.4 Performance Testing Overview

This section provides a performance testing overview. Each topic is treated in more detail throughout this document.
As previously listed, any performance task uses the following steps:
1. Decide what to measure.
2. Understand what to expect.
3. Build your test configuration.
4. Configure different parameters that might influence your performance tests.
5. Run the performance benchmark test and capture the results.
6. Analyze and compare your results with the expected results.
7. If you have any unexpected results, troubleshoot issues until you achieve the expected results.
In performance analysis, always first ask, “What is being measured?”
You can quantify performance in many different ways, depending on the intended purpose of the storage system or a
specific device. Some devices focus on delivering as many I/Os as possible. Other devices might focus on delivering
fewer I/Os, but in the fastest manner possible. You must decide the operating conditions under which you measure
the performance of a device or a system. After you make that decision, you can easily decide the host system, storage
topology, media type, link speeds at different interfaces, operating system requirements, RAID configurations,
software tools, and so on.
For example, when you measure storage controller performance, you want to eliminate bottlenecks that the storage
controller does not cause. You also want to control variables that might affect your measurement. The following
high-level guidelines help you prepare your system for the performance test.


General Guidelines for Better Performance Measurements


- Use a host system with the latest high-performance processors and chipsets.
- Use the latest motherboards that allow use of more than one CPU socket, and populate all the CPU sockets
if possible.
- Use the latest system BIOS version for the motherboard.
- Tune the system BIOS settings for performance rather than for power-saving mode.
- Use an up-to-date operating system and implement any necessary patches or updates that might
affect performance.
- Set the interface speeds (such as PCI Express® (PCIe®) and SAS) to their maximum so the controller is the only
bottleneck. For example, to measure a PCIe Generation 3 (8 Gb/s) controller performance, you do not want to
configure your motherboard PCIe slot to PCIe Generation 2 (5 Gb/s) speed.
- Make sure sufficient drives are in place to exercise the maximum controller performance. For example, to measure
the maximum bandwidth of a SAS controller, you might need more than 20 hard drives. In such cases, a topology
that uses only direct-attached drives is bottlenecked by the drives.
- Make sure the cables and connectors are not prone to signal integrity issues. For example, use the appropriate
cable length to connect the controller and expander. Use cables and connectors that meet the specification
standards, such as SAS, SATA, and PCIe, of your storage devices.
- Make sure sufficient cooling is in place, so temperature variations do not affect your measurements.
- Choose the benchmarking tool or system monitoring tool that properly measures the metric of your interest. For
example, if latency is your prime metric, the Vdbench tool might be better than the v1.1.0 Iometer tool.
- Make sure performance-related features of other devices are in a known state. For example,
— A 12Gb/s SAS expander might enable a buffering feature that is advantageous for 6 Gb/s drives.
— Write cache on hard drives impacts performance.
- Update all devices in your systems with the latest firmware and software, such as BIOS, driver, tools, and utilities.
- Make sure to run workloads that represent your real-life scenarios.
Overlooking any of the basic guidelines can result in an unreliable or inconsistent performance measurement. The
following table lists such problems and their potential causes.

Problem: Performance measurement lower than expected
Potential causes:
- Insufficient disks
- Link not running up to expected speed
- CPU utilization at 100%
- Disk sees random I/O when sequential I/O is intended
- Unexpected file system or operating system influence
- Some component failed and generated errors
- Incorrect performance expectation
- One or more disks in the virtual drive have lower performance than the other drives (that is, a drive
becomes defective, has many reallocated sectors or media errors, and so on)
- Background tasks running, such as a consistency check or patrol read

Problem: Performance measurement higher than expected
Potential causes:
- Using more disks than expected
- Disk sees sequential I/O when random I/O is intended
- I/Os are serviced out of cache instead of reaching the disks
- Incorrect performance expectation

Problem: Unstable performance measurement results
Potential causes:
- System processes starting and stopping
- Intermittent component errors
- Problems with interrupt and process affinity
- Use of nonpreconditioned SSDs

Problem: Results not repeatable
Potential causes:
- System or other processes starting and stopping
- Intermittent errors
- Inconsistent test process
- RAID background operations starting and stopping
- Use of nonpreconditioned SSDs
- Too-short test time
- Thermal problems

Problem: Insensitivity to expected parameter changes
Potential causes:
- Parameter remained unchanged
- Incorrect expectation
- I/O not going to the expected target devices

Problem: Runtime hardware or software errors
Potential causes:
- Thermal problems
- Use of uninitialized volumes
- Illegal topology
- Use of broken or inappropriate cables and drive enclosures
- Insufficient drive power

For any measurement, first develop a baseline: a simple, stable test environment and measurement. Deviate
from the baseline by changing only one factor at a time to help isolate and root-cause any issue that
might occur.
When you have made sure that your test system is ready for measurement and your baseline proves no issues exist,
you may run your benchmark and obtain your results. If you are running your tests for the first time, it is a good
practice to rerun the same tests for repeatability. You might also monitor your results closely to check for any
anomalies such as errors, link failures, improper worker assignments, and so on. When your results are valid, compare
them with the expected results, results from other benchmarks, and benchmarks published by product vendors. If the
results match, use these results as your golden reference for further tests. If these results differ, revisit your test to
understand the bottleneck that stops you from reaching the expected results.

NOTE If you see any performance issues with the Avago products, capture all
information about your test to create a support request for Avago.
Work with your FAE to use the LSIGet tool (http://sae.lsi.com/ or
ftp://ftp0.lsil.com/outgoing_perm/CaptureScripts) to capture all
information. This tool captures the information about the host system,
storage topology, RAID volume information, and so on. Also provide
the benchmark-related information and any associated scripts that
you have used.

1.5 References

Refer to the following Avago documentation for product-specific information. Contact your FAE to obtain
documentation.
- LSI Scrutiny Tool User Guide
- LSI SAS-3 Architecture Guide
- StorCLI Reference Manual
- MegaRAID SAS Device Driver Installation User Guide
- MegaRAID SAS Software User’s Guide
- Linux Device Mapping in LSI Expander-Designed Backplanes SEN


- 12Gb/s SAS Controllers


— LSISAS3xxx PCI Express to 12Gb/s SAS Controller Datasheet
— LSISAS3xxx PCI Express to 12Gb/s SAS Controller Configuration Programming Guide
— LSISAS3108 PCI Express to 12Gb/s SAS ROC Controller Register Programming Guide
— LSISAS3108 PCI Express to 12Gb/s SAS/SATA ROC Controller SDK Programming Guide
— LSISAS3xxx Controller Reference Schematic
- 6Gb/s SAS Controllers
— LSISAS2xxx PCI Express to 6Gb/s SAS/SATA Controller Design Considerations SEN
— LSISAS2xxx PCI Express to 6Gb/s SAS/SATA ROC Controller Reference Manual
— LSISAS2208 PCI Express to 6Gb/s SAS/SATA ROC Controller Programming Guide
— LSISAS2208 PCI Express to 6Gb/s SAS/SATA ROC Controller SDK Programming Guide
- 12Gb/s SAS Expanders
— LSISAS3xXX 12Gb/s SAS/SATA Expander Family Register Reference Manual
— LSISAS3xXX-R 12Gb/s SAS/SATA Expander Family Register Reference Manual
— LSI 12Gb/s SAS/SATA Expander Software Development Kit Programming Guide
— 12Gb/s SAS/SATA Expander Firmware Configuration Programming Guide
— LSI 12Gb/s Expander Tools (Xtools) User Guide
— LSI 12Gb/s Expander Flash (g3Xflash) User Guide
— LSI 12Gb/s Expander Manufacturing Image (g3Xmfg) User Guide
— LSI 12Gb/s Expander Diagnostics Utility (g3Xutil) User Guide
— LSI 12Gb/s Expander IP Configuration Utility (g3Xip) User Guide
— Configuration Page Definition for 12Gb/s SAS/SATA Expander Firmware Application Note
- 6Gb/s SAS Expanders
— LSISAS2xXX Expander Design Considerations SEN
— LSISAS2xXX Expander Reference Manual
— LSI 6Gb/s SAS/SATA Expander SDK Programming Guide
— LSI Expander Flash Utility (Xflash) User Guide
— LSI Expander Tools (Xtools) User Guide
— Configuration Page Definition for 6Gb/s SAS/SATA Expander Firmware
- HBAs
— LSI SAS 9xxx-xx PCI Express to 12Gb/s Serial Attached SCSI (SAS) Host Bus Adapter User Guide
— PCI Express to xGb/s Serial Attached SCSI (SAS) Host Bus Adapters User Guide
— Quick Installation Guide LSI SAS 9xxx-xx PCI Express to 12Gb/s SAS Host Bus Adapter


Chapter 2: Calculate Expected Performance


This chapter explains how to calculate the expected performance of your system. To understand the expected
performance, it is important to understand the bottlenecks and limitations of the different devices and interfaces in
your system. This performance guide reviews the bottlenecks and limitations related to the following Avago
storage products:
- SAS Storage I/O Controllers (IOCs)
— LSISAS2008, LSISAS2308 (6Gb/s SAS)
— LSISAS3004, LSISAS3008 (12Gb/s SAS)
- RAID-on-Chip ICs (ROCs)
— LSISAS2108, LSISAS2116, LSISAS2208 (6Gb/s SAS)
— LSISAS3108 (12Gb/s SAS)
- Host Bus Adapters (HBAs)
— LSI SAS 92xx (6Gb/s SAS)
— LSI SAS 93xx (12Gb/s SAS)
- RAID Controllers
— LSI MegaRAID SAS 6Gb/s RAID (LSI MegaRAID SAS 92xx)
— LSI MegaRAID SAS 12Gb/s RAID (LSI MegaRAID SAS 93xx)
- SAS Expanders
— LSISAS2x36, LSISAS2x28, LSISAS2x24, LSISAS2x20 (6Gb/s SAS)
— LSISAS3x48, LSISAS3x40, LSISAS3x36, LSISAS3x36-R, LSISAS3x28-R, LSISAS3x24-R (12Gb/s SAS)
The performance of each product depends on the PCIe interface and SAS speeds, the processing power of the product’s
CPU and DMA engines, and the I/O routing capabilities of the hardware modules. To measure the product performance
capability, avoid other bottlenecks in the system as much as possible. The maximum system performance is the
minimum of the maximum device or interface performances in the system. After all, a chain is only as strong as its
weakest link.
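Because the system maximum is the minimum across the chain, you can estimate the expected ceiling with a few lines of code. The following Python sketch is illustrative only; the element names and throughput figures are placeholders that anticipate the worked examples later in this chapter, not measured Avago data.

def bottleneck(path_limits_mbps):
    # path_limits_mbps maps each I/O path element to its maximum MB/s.
    # The element with the smallest limit is the system bottleneck.
    element = min(path_limits_mbps, key=path_limits_mbps.get)
    return element, path_limits_mbps[element]

# Placeholder figures for a controller > expander > drives topology.
limits = {
    "PCIe x8 Gen3 (practical)": 6400,
    "SAS x8 6Gb/s (practical)": 4400,
    "40 drives x 120 MB/s": 4800,
}
print(bottleneck(limits))  # ('SAS x8 6Gb/s (practical)', 4400)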

2.1 Bottlenecks and Limitations

Consider the following figure as an example to explain the possible bottlenecks in a storage system. The DDR block
and DDR path do not apply to IOC products.

Figure 1 Example Storage Configuration

[The figure shows an example I/O path: host CPUs with DDR SDRAM connect through a PCIe link to the SAS controller
(with its own DDR, where applicable), which connects through a SAS link to an expander and then to drives. Drive
limitations apply at the drives.]


For an I/O read or write, the I/O path is as follows:


Operating system or Application > host CPU > PCIe interface > storage controller > SAS > Expander > Drives
The following table lists factors (including, but not limited to) that affect performance at each level of the previous
I/O path.

Table 1 I/O Path Elements that Affect Performance

Operating system or Application:
- Other applications using host resources
- Network loads
- Operating system type (Windows, Linux, and so on)
- Benchmark type (synthetic, application)
- Queue depth or number of outstanding I/Os per physical drive
- MSI-X interrupt vector support

Host CPU:
- Processor and I/O architecture
- CPU speed
- Number of CPU sockets
- Number of processor cores per CPU
- Hyper-threading
- Memory size and speed
- NUMA
- Host chipset

PCIe interface:
- Link rate (2.5 Gb/s, 5 Gb/s, 8 Gb/s)
- Link width
- Signal integrity (SI)

Storage controller (IOC and RAID):
- CPU core capability
- DMA and Fast Path engines
- I/O coalescing
- Interrupt coalescing
- Controller mode (initiator or target)

Storage controller (RAID only):
- DDR memory type, speed, and size
- RAID initialization or other background operations (Rebuild, Reconstruction, Patrol Read, Consistency Check)
- RAID volume configurations
- Number of drive groups and volumes

SAS:
- Link rate (3 Gb/s, 6 Gb/s, 12 Gb/s)
- Link width
- SI

Expander:
- Connection routing and arbitration
- DataBolt (end device buffering)

Drives:
- Individual drive performance
- Number of drives
- Protocol (SAS or SATA)
- Media type (HDD or SSD)
- Link rate (3 Gb/s, 6 Gb/s, 12 Gb/s)
- Write cache
- Preconditioning (SSD only)

NOTE The storage topology (how the storage components connect) can
affect performance. For ease of explanation, only a controller > expander
> drives topology is chosen here. A storage topology can be built
many different ways, and performance can differ between
topologies; see Section 3.3, Storage Topology, for more information.
Performance measurement can occur for a specific device or a specific topology.
- When you measure device performance, overprovision all other interfaces and devices so the device under test is
the only bottleneck and you measure the maximum device capabilities.
- When you measure a specific topology performance, keep all devices and interfaces at their maximum capability
so the measurement exposes any device or interface that performs lower than the others.

2.2 Limitations

This section presents maximum interface and drive limitations, including:
- Maximum theoretical and practical bottlenecks of SAS/SATA and PCIe interfaces in a storage topology
- Expected maximum performance for different drive types (SAS/SATA, SSD/HDD)
- Hardware limitations of storage controllers

2.2.1 Interface Connection Limitations

Interface Limitations

Table 2 Generation 2 Interface Connection Limitations

Technology      Phys   BW (Uni-Directional) MB/s
                       Theoretical   Practical
PCIe (5 Gb/s)   x1     500           400
                x4     2000          1600
                x8     4000          3200
SAS (6 Gb/s)    x1     600           550
                x4     2400          2200
                x8     4800          4400
SATA (6 Gb/s)   x1     300           260 (3 Gb/s), 520 (6 Gb/s)
                x4     1200          1040 (3 Gb/s), 2080 (6 Gb/s)
                x8     2400          2080 (3 Gb/s), 4160 (6 Gb/s)

Table 3 Generation 3 Interface Connection Limitations

Technology      Phys   BW (Uni-Directional) MB/s
                       Theoretical   Practical
PCIe (8 Gb/s)   x1     800           790
                x4     3200          3200
                x8     6400          6400
SAS (12 Gb/s)   x1     1200          1100
                x4     4800          4400
                x8     9600          8800
SATA (12 Gb/s)  x1     600           260 (3 Gb/s), 490 (6 Gb/s)
                x4     2400          820 (3 Gb/s), 1540 (6 Gb/s)
                x8     4800          1640 (3 Gb/s), 3080 (6 Gb/s)

Disk Drive Limitations

Table 4 Disk Drive Interface Limitations

Generation               Drive Type     Disk K IOPs   Sustained MB/s
Generation 2 (6 Gb/s)    SAS 2.5-in.    40 to 250     80 to 210
                         SAS 3.5-in.    40 to 250     90 to 220
                         SATA 2.5-in.   10 to 70      40 to 120
                         SATA 3.5-in.   10 to 70      80 to 150
Generation 3 (12 Gb/s)   SAS HDD        40 to 250     100 to 220
                         SATA HDD       10 to 80      50 to 150
                         SAS SSD        10 to 120     550
                         SATA SSD       10 to 100     550


The previous tables help you choose the right storage topology for your measurement. For example, to measure the
maximum IOPs of a controller expected to give 500,000 IOPs maximum, eight direct-attached SAS drives that give
100,000 IOPs maximum each might be sufficient because 8 x 100,000 = 800,000 IOPs > 500,000 IOPs. However, the
same topology is not sufficient to measure the maximum MBPS of the same controller if the controller is expected to
exceed 4000 MBPS and the drives are expected to give only 200 MBPS maximum each. The drives limit the
performance at 8 x 200 = 1600 MBPS, so the eight direct-attached drive topology is not suited to measure
the 4000 MBPS limit of the controller.
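The following minimal Python check reproduces the arithmetic of this example; the drive and controller figures come from the text above, and the helper itself is just an illustration:

def drives_sufficient(n_drives, per_drive_iops, per_drive_mbps,
                      controller_iops, controller_mbps):
    # The drive pool must exceed the controller ceiling in a metric,
    # or the drives (not the controller) become the bottleneck for it.
    return (n_drives * per_drive_iops > controller_iops,
            n_drives * per_drive_mbps > controller_mbps)

# Eight direct-attached SAS drives, 100,000 IOPs and 200 MB/s each,
# against a controller expected to reach 500,000 IOPs and 4000 MBPS:
print(drives_sufficient(8, 100_000, 200, 500_000, 4000))
# (True, False): enough drives for the IOPs test, too few for the MBPS test.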

2.2.2 Device Hardware Limitations

The following table lists the maximum performance for Avago controllers.

Table 5 Device Hardware Maximum Performance

6 Gb/s SAS, 5GT/s PCIe
— LSISAS2008: SATA maximum 245,000 IOPs at 0.5 KB; SAS maximum 350,000 IOPs at 0.5 KB; SAS maximum read
3100 MBiPS; SAS maximum write 2700 MBiPS
— LSISAS2108: SATA maximum 256,000 IOPs at 0.5 KB [4K SR RAID0 SATA 3Gb/s SSD]; SAS maximum 308,000 IOPs
at 0.5 KB [4K SR RAID0 6Gb/s SAS SSD]; SAS maximum read 1721 MBiPS [RAID0 6Gb/s SAS SSD]; SAS maximum
write 934 MBiPS [RAID0 6Gb/s SAS SSD]
6 Gb/s SAS, 8GT/s PCIe
— LSISAS2308: SATA maximum 460,000 IOPs at 4 KB; SAS maximum 640,000 IOPs at 4 KB; SAS maximum read
4320 MBiPS; SAS maximum write 4300 MBiPS
— LSISAS2208: SATA maximum 502,000 IOPs at 4 KB [4K SR RAID0]; SAS maximum 521,000 IOPs at 4 KB [4K SR
RAID0]; SAS maximum read 4315 MBiPS; SAS maximum write 4281 MBiPS
12 Gb/s SAS, 8GT/s PCIe
— LSISAS3008: SATA maximum 683,000 IOPs at 4 KB; SAS maximum 1.45 million IOPs at 4 KB [4K RR]; SAS
maximum read 5930 MBiPS; SAS maximum write 6590 MBiPS
— LSISAS3108: SATA maximum 653,000 IOPs at 4 KB; SAS maximum 1.43 million IOPs at 4 KB [4K RR]; SAS
maximum read 5930 MBiPS; SAS maximum write 6590 MBiPS

NOTE Do not expect RAID performance to equal JBOD performance. RAID
operations have additional I/O overheads that reduce
maximum capability.

2.2.3 Bottleneck Examples

The maximum throughput is the minimum of the maximum performances of all the interfaces and devices. As a
result, bottlenecks might be due to:
- Limitation of any interfaces, devices, or topology
- Number of devices and links
- Computational overheads, and so on
For example, when finding the maximum IOPs of the system, usually the processing capabilities of the storage
controller and host CPU cause the bottleneck. But if there are too few drives to meet the maximum IOPs of the
controller, the drives become the bottleneck.
When finding the maximum MBPS of the system, factors such as the controller DDR interface, the SAS and PCIe
interfaces, the host CPU, or the number of drives can cause the bottleneck. The factor with the lowest maximum MBPS
for a specific workload becomes the bottleneck.
Knowing the bottleneck or limitation of your system configuration helps you understand the expected maximum
performance of your system. The following examples discuss different bottlenecks related to storage controllers
and performance.
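All of the examples that follow apply the same rule: take the minimum across every interface and device limit in the data path. A minimal sketch of that calculation (the dictionary values are taken from the 6Gb/s SAS example that follows):

    def max_throughput_gbps(limits_gb_per_s: dict) -> tuple:
        """Return the bottleneck component and its limit, in GB/s.
        The achievable throughput is the minimum across all interface
        and device limits in the data path."""
        bottleneck = min(limits_gb_per_s, key=limits_gb_per_s.get)
        return bottleneck, limits_gb_per_s[bottleneck]

    limits = {"PCIe 8 Gb/s x8": 6.4, "SAS 6 Gb/s x8": 4.4,
              "40x drives": 40 * 0.120}
    print(max_throughput_gbps(limits))   # ('SAS 6 Gb/s x8', 4.4)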

2.2.3.1 6Gb/s SAS Controller Bottleneck Example


The following figure shows a SAS controller that uses an 8 Gb/s PCIe interface and 6Gb/s SAS. This setup uses a x8 link
to an expander with 40 drives. The drives are 6Gb/s SAS drives, and each drive is capable of 120 MB/s and
460 IOPs maximum for random I/Os.

Figure 2 6Gb/s SAS Controllers, Revision C1 and Later


-"S$RIVE-2227
 +)/0S0RACTICAL )/0S$RIVE+2227
'BS3!3 'BS3EAGATE+n!PPROX
'"S0EAK '"S0RACTICAL '"SOR +)/0S

$$2AT  
'BS3!3
-(Z %XPANDER
#ONTROLLER
'BS3!3


3!3"OTTLENECK'BS3!3CONTROLLER
'BS0#)E WITHOUTDRIVELIMITATION RESULTSIN'BS
'"S0RACTICAL SPEEDLIMIT

(OST#HIPSET

#05 #05

'"S$$2 3$2!-S '"S$$2 3$2!-S


 BYTEPRIMARYDATABUS  BYTEPRIMARYDATABUS

From the inherent nature of the SAS and PCIe interface used, the controller’s SAS connection becomes the
performance bottleneck for the case that follows.
Maximum random MB/s = Minimum (PCIe 8 Gb/s, SAS 6Gb/s, 40x drives)
= Minimum (6.4 GB/s, 4.4 GB/s, 40 x 120 MB/s)
= 4.4 GB/s (SAS bottleneck)

2.2.3.2 12Gb/s SAS Controller PCIe Bottleneck Example


Compared to Figure 2, 6Gb/s SAS Controllers, Revision C1 and Later, the following figure uses a 12Gb/s SAS controller
with 12Gb/s SAS instead of the 6Gb/s SAS.

Figure 3 12Gb/s SAS Controllers with PCIe Bottleneck


-"S$RIVE -2227
 -)/0S0RACTICAL )/0S$RIVE +2227
'BS3!3 'BS3EAGATE+n!PPROX
'"S0EAK '"S0RACTICAL '"SOR +)/0S

$$2AT  
'BS3!3
-(Z %XPANDER
#ONTROLLER
'BS3!3


0#)E"OTTLENECK,ARGE)/SAND
'BS0#)E CONTROLLERINTERFACESRUNNINGMAXIMUM
'"S0RACTICAL WIDTHANDSPEED WITHOUTDRIVELIMITATIONS

(OST#HIPSET

#05 #05

'"S$$2 3$2!-S '"S$$2 3$2!-S


 BYTEPRIMARYDATABUS  BYTEPRIMARYDATABUS

Now the drive throughput becomes the bottleneck for Random Read/Random Write. With 6Gb/s SAS, the 40x drives
gave 4.8 GB/s (40 x 120 MB/s):
Maximum Random MB/s = Minimum (PCIe 8 Gb/s, SAS 12Gb/s, 40x drives at 120 MB/s)
= Minimum (6.4 GB/s, 8.8 GB/s, 4.8 GB/s)
= 4.8 GB/s (drive bottleneck)
If the expander is a 12Gb/s expander with DataBolt, the expander can extract almost 12 Gb/s performance from
6 Gb/s drives. With DataBolt enabled, the same drives can reach up to 9.6 GB/s (2 x 4.8). Assuming the drives reach
7.2 GB/s for Random Read/Random Write, the PCIe interface becomes the bottleneck.
Maximum MB/s = Minimum (PCIe 8 Gb/s, SAS 12Gb/s, 40x drives at 6 Gb/s + DataBolt)
= Minimum (6.4 GB/s, 8.8 GB/s, 7.2 GB/s)
= 6.4 GB/s (PCIe bottleneck)

2.2.3.3 12Gb/s SAS Controller with PCIe and Drive Bottleneck Example
Compared to Figure 3, the following figure illustrates two cases: one with 40 drives and another with 24 drives.

Figure 4 12Gb/s SAS Controller with PCIe and Drive Bottleneck


[Block diagram: the same 12Gb/s SAS controller, PCIe link, and expander as in Figure 3; the callouts mark the PCIe interface as the bottleneck for the 40-drive case and the limit of the specified drives as the bottleneck for the 24-drive case.]

The 40-drive case behaves similarly to the previous example, where the bottleneck is the PCIe interface because the
drives can reach 7.2 GB/s sequential reads/writes.
For the 24-drive case, the drive performance falls to 4.3 GB/s so the number of drives causes the bottleneck:
Maximum sequential MB/s = Minimum (PCIe 8 Gb/s, SAS 12Gb/s, 24x drives)
= Minimum (6.4 GB/s, 8.8 GB/s, 4.3 GB/s)
= 4.3 GB/s (number-of-drives bottleneck)

2.2.3.4 12Gb/s SAS Controller Small Sequential IOPs Bottleneck Example


The following figure shows the SAS link width reduced to x4, instead of the x8 used in earlier examples.

Figure 5 IOPs Small Sequential Random Write


[Block diagram: the controller connects to the expander over a x4 6Gb/s SAS link (2.2 GB/s practical); the callout marks the controller IOPs limit and the narrower SAS link width, not the drives, as the limit for small sequential I/Os.]

Consider the small sequential IOPs bottlenecks for this case. For 4-KB I/O, the controller can give 600,000 IOPs, which
gives 600,000 x 4 KB = 2400 MB/s = 2.4 GB/s.
However, the SAS link is only x4 and is bottlenecked at 2.2 GB/s:
Maximum 4-KB sequential IOPs = Minimum (PCIe 8 Gb/s at x8 link, controller IOPs limit, SAS 6Gb/s at x4 link)
= Minimum (6.4 GB/s, 2.4 GB/s, 2.2 GB/s)
= 2.2 GB/s (SAS link width bottleneck)

2.2.3.5 12Gb/s SAS Controller Throughput Bottleneck Example


The following figure shows a SAS 12Gb/s controller that reaches 1,250,000 IOPs. However, the SAS link to the expander
is 3Gb/s, which limits performance to 2.2 GB/s even though the link is a x8 link. This scenario uses forty 6 Gb/s SSDs,
each capable of 20,000 IOPs for 0.5-KB Random Write I/Os.

Figure 6 12Gb/s SAS Controller Throughput Bottleneck


[Block diagram: the 12Gb/s SAS controller connects to the expander over a x8 3Gb/s SAS link (2.2 GB/s practical); 40 6 Gb/s SSDs, each capable of roughly 20,000 IOPs at 0.5-KB Random Writes, attach to the expander. The callout marks the drives' maximum small read/write IOPs as the throughput limit.]

For 0.5-KB Random Writes, the 40 drives can reach only 40 x 20,000 = 800,000 IOPs, which equals 400 MB/s = 0.4 GB/s.
For the controller, the limit is 1.25 million IOPs at 0.5-KB Random Writes = 1,250,000 IOPS x 0.5 KB = 625 MB/s = 0.625 GB/s.
Assuming the host chipset reaches 1.6 GB/s with a x8 PCIe 8 Gb/s link:
Maximum random IOPs at 0.5-KB Random Writes = Minimum (chipset, controller, SAS 3Gb/s at x8 link, 40x SSD with
20,000 IOPs each)
= Minimum (1.6 GB/s, 0.625 GB/s, 2.2 GB/s, 0.4 GB/s)
= 0.4 GB/s (the drives' random performance and the number of drives cause the bottleneck)
If the drives reach 40,000 IOPs instead of 20,000 IOPs, the next possible bottleneck is the controller IOPs limitation. The
equation becomes:
Maximum random IOPs at 0.5-KB Random Writes = Minimum (chipset, controller, SAS 3Gb/s at x8 link, 40x SSD with
40,000 IOPs each)
= Minimum (1.6 GB/s, 0.625 GB/s, 2.2 GB/s, 0.8 GB/s)
= 0.625 GB/s (controller IOPs bottleneck)

2.3 Queue Depth and Expected Performance

Queue depth (Qd) is the number of outstanding I/Os for each device. More outstanding I/Os let a device maintain its
workload without incurring idle times at the disk. Synthetic benchmarking tools let you directly control the Qd, so it is
easier to measure or compare Qd with synthetic benchmarks than with real-world applications.
Storage applications have many different queues, such as the following:
 Drive Queue Depth
 Adapter SAS Core Outstanding I/O Count
 Driver Maximum Outstanding I/O Count and Individual Count for each device presented
To understand your total expected performance, it is important to understand the effect of queue depth on your
specific media, adapter, and driver.
As shown in the following figure (which shows the Qd scaling for a 1-drive case and an 8-drive case), increasing the Qd
increases the performance until the drive is saturated. At saturation, the drive performs at its maximum capability.
Increasing the Qd past this level does not increase performance.

NOTE In some cases, too large a Qd increase can add overhead because the
drive queue is full and overloaded, and the drive might not perform at
its optimal operating conditions.

Figure 7 HGST Direct Attached Throughput of JBOD RAID Types for Sequential Workloads

[Chart: throughput (IOPs) versus queue depth for a 1-disk case and an 8-disk case; throughput rises with Qd until the drives saturate.]

The following figure illustrates the Qd at controller level and at driver level. This graph compares the Qd scaling of
IT/IR, MegaRAID, and iMR controllers with the Windows operating system in an 8x drive direct attached topology.
Maximum Qd constraints affect the actual Qd.
Adapters have a limit on the maximum outstanding I/Os (OIO) they can support:
 12Gb/s SAS IT/IR controllers have a hardcoded value of approximately 9000 I/Os, but practically they can reach
about 5000 I/Os to 7000 I/Os maximum.
 12Gb/s SAS MegaRAID Controllers have the maximum OIO set to approximately 920.
 12Gb/s SAS iMR controllers have the maximum OIO set to approximately 234.
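As a rough illustration of how these limits interact (illustrative only; each operating system driver divides the adapter's outstanding I/Os across the available disks with its own algorithm):

    def effective_per_drive_qd(requested_qd: int, adapter_oio_limit: int,
                               num_drives: int, driver_pd_cap: int = 32) -> int:
        """Rough effective queue depth per drive once the adapter OIO limit
        and a per-physical-drive driver cap are applied."""
        return min(requested_qd, adapter_oio_limit // num_drives, driver_pd_cap)

    # An iMR-class adapter (~234 OIO) with 8 drives at a requested Qd of 64:
    print(effective_per_drive_qd(64, 234, 8))   # 29 -> the adapter OIO limit wins
    # A MegaRAID-class adapter (~920 OIO), same drives and request:
    print(effective_per_drive_qd(64, 920, 8))   # 32 -> the driver PD cap wins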

Each controller generation might have a different maximum OIO setting, depending on the design considerations at
design time. When the adapter hits its OIO limit, there is no additional benefit in queuing more outstanding I/Os
from the benchmarking tool.
As the graph indicates, IT/IR controllers have the highest OIO support, so they scale up even after the Qd is greater than 32.
MegaRAID controllers show the next highest OIO support; however, the MegaRAID Windows driver limits the Qd per
physical drive to 32.

NOTE Each operating system driver might have slightly different algorithms
as to how the Max Adapter Outstanding IO is divided amongst the
available disks.
iMR controllers show the lowest maximum OIO support because of resource limitations; as a result, their scaling is the
lowest among the controllers.

Figure 8 JBOD Write 8 SAS SSDs Throughput of All RAID Types for Sequential, Random, and OLTP Workloads

[Chart: throughput (IOPs) versus queue depth for 8 SAS SSDs on IT/IR, MegaRAID, and iMR controllers. IT/IR controllers (Qd passed through to the device) scale highest, MegaRAID follows (the Windows driver limits each physical drive to a Qd of 32), and iMR scales lowest.]

NOTE The previous figure highlights a 12Gb/s SAS SSD under Windows 2008
R2 SP1, which is among the first devices to provide additional
performance benefits beyond 32 outstanding I/Os.

Chapter 3: Build Your Test Setup


Preparing your test setup for performance benchmarking presents challenges. Many variables can affect
performance, but keeping variables known and constant helps provide a reliable and repeatable measurement. This
chapter covers parameters that are not expected to change between different tests of the same performance test
project. Consider, for example, a performance test whose goal is to measure R0, R5, and R6 performance on 8 SAS HDDs
directly attached to a 12Gb/s SAS controller. In this example, the 8-drive direct-attached SAS topology is a fixed
configuration for all R0, R5, and R6 tests.
This chapter reviews set-up related parameters, such as the following:
 Host system considerations
 Storage topology
 Storage components

3.1 Host System Considerations

Many host-specific factors can affect the performance, which include (but are not limited to) the following factors:
 Processor Architecture
— Processor organization and architecture
— Processor count
— Processor generation
— Number of cores
— Hyperthreading status
— Processor clock speed
— Chipset
 Memory
— Memory type
— Memory speed
— Memory configuration
 PCIe slot
— Link speed
— Link width
— Location relative to a processor (on multiprocessor architecture)
 BIOS settings
The following sections discuss these factors in detail.

3.1.1 Processor Architecture and Core Organization

The CPU, memory, bridge, and PCI slot organization affects system efficiency. Newer multiprocessor systems with
an architecture where the memory and the PCI slots connect directly to the CPUs, as shown in the following
figure, can perform much better than older system architectures, such as the one in Figure 10.

Figure 9 Series 9 SMC


[Block diagram: a dual-socket Series 9 platform in which DDR memory and the PCIe slots attach directly to each CPU, with QPI links between the CPUs and a DMI link to the chipset.]

Figure 10 Series 7 SMC


62- ,'!?02/#%33/2 #+#,+

!$$2 #42, $!4!

&3"-(Z

!$$2 #42, $!4! 83!30/243

0#)EX
,3)3!3%
$$2
$)--?#(!#(% )NTEL-#( 0#)EX
0#)EX3,/4

$-)
.# 3)
3 !4!
3!4!0/243
0#)EX ',!.
2*
53" ,
53"0/243?^ )#(
0#)EX
',!.
2*
0#) ,
0#)?3,/4

30) 0#)
30)&,!3(-B
,0# 70#-2
,0#

2-))

7$(' 24,.
6'!0/24
,0#)/ 0(983" &

3%2 03 2*


&$$
3%2 +"-3 83" &
? 

Performance measurement on older system architectures might not yield the maximum results that Avago publishes.
Older systems are limited by memory, CPU clock, chipset, and mezzanine bus speed, which can all reduce the maximum
observed performance.
Processor Choice
In addition to the processor I/O architecture, the choice of each processor affects performance. Systems that
use Intel Xeon® E3 (or larger) or 4th Generation Intel Core processors do not need chipset components for
PCIe attachment because the chipset is part of the CPU. Avago recommends that LSISAS3008 and
LSISAS3108 Enterprise-class storage controller performance testing be done on systems with Intel
Enterprise processors.

Avago performance measurements and performance targets are based on the latest host computer
components, used in 2-socket systems based on Intel processors. Use of any different host computer system
likely results in lower measured performance.
Number of Processors and Cores
The more total cores, the better the performance can be, because the I/O load is shared across different
processor cores.
Hyper-Threading Technology
Hyper-Threading Technology (HTT) makes a physical processor appear as if it has more cores. For example, a
processor with 16 physical cores might appear as 32 logical cores to the operating system. Enabling HTT
usually gives better performance.
Process Affinity
The affinity of a certain process to run on a specific processor in a multiprocessor environment. If the
processes (applications) are not spread across the processor cores in a balanced manner, performance is
affected because certain cores might be overloaded and others might be unused. Therefore, it is important to
manage the affinity and spread the load evenly across the cores. Environments that do not manage this
affinity can reduce the performance, especially the IOPs of small I/Os.
Microsoft® Windows Server®, by default, does a good job of managing process-to-core assignments without
user intervention. Linux distributions require you to explicitly assign processes to cores for optimal load
balancing, as the following sketch illustrates.
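On Linux, one way to pin a process to chosen cores is the sched_setaffinity interface (a minimal sketch; shell tools such as taskset or numactl offer the same control):

    import os

    # Pin the current process (for example, a benchmark worker) to cores 0-3
    # so the I/O load stays on a known set of cores.
    os.sched_setaffinity(0, {0, 1, 2, 3})
    print(os.sched_getaffinity(0))   # {0, 1, 2, 3}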
CPU Clock
The CPU clock can affect performance. Higher clock rates improve system throughput, especially for 8-KB
and smaller I/Os. A higher CPU clock speed improves latency as well.
For testing Avago SAS 12 Gb/s controllers or newer, use CPUs with a clock of 2.6 GHz or higher. Performance
measurements with slower CPUs result in lower than optimal IOPs. Use enterprise computers for
performance measurement of enterprise-class storage controllers. Using older desktops or workstations for
performance measurement yields results lower than expected.

3.1.2 Memory

In addition to system architecture, the memory size, type, speed, bus width, and population affect system
performance. Servers might have limitations in all of these areas. The server manufacturer provides a User's Guide and
possibly other documents to provide specifications and guidance on selection and population. Refer to such
documents to identify the configurations that best suit your case and provide the best performance.
Additional limitations might exist beyond the number of slots, the maximum size for each DIMM, total size, and the
maximum speed that the system supports. The amount of memory and speed that a particular system supports can
vary depending on the population. Multi-channel support can increase performance if populated correctly. For
example, 12 DIMM slots (4 channels of 3 slots each) might be available. If a single DIMM is populated for each channel,
in the recommended slots, performance might increase. However, if more than one slot for each channel is used, the
bus speed and performance can decrease.

3.1.3 PCIe Slot Choice

The PCI slot location, PCIe link speed, link width, and relative location to a processor can affect performance, as
described in the list that follows. The slot location and number of devices attached to the same processor or bridge
can reduce throughput.

NOTE On Linux, you can use the built-in lspci command to find the controller's
bus, device, or function number. For a Windows operating system, use
the lspci utility available at http://eternallybored.org/misc/pciutils/.
PCIe Link Width
Performance is linearly proportional to the link width of the PCIe slot. For example, with a x8 link you can
achieve twice the performance of a x4 link. On motherboards the physical connector width of a PCIe slot
might be larger than the electrical connection. Be wary of such PCIe connectors and use the PCIe bus that
matches the maximum links that the storage controller supports. Before you run performance tests, confirm
that the negotiated link widths are as expected (x8 with PCIe x8 capable slot, x4 with PCIe x4 capable slot,
and so on).
PCIe Link Speed
PCIe slots with different link speeds might be available on any motherboard. Choose a PCIe bus designed for
the highest link speed for better performance. Before you run performance tests, confirm that the negotiated link
rates are as expected (8 Gb/s with a PCIe 8 Gb/s capable slot, 5 Gb/s with a PCIe 5 Gb/s slot, and so on).
Actual Link Speed versus Negotiated Link Speed
In an actual system, additional factors can cause the actual negotiated speed and width to be lower than the
maximum supported by the slots or the storage controller. Therefore, you must verify the negotiated speed
and width. Read the PCIe Configuration Space to verify the capabilities and currently negotiated speed and
width.

NOTE You can use tools such as lsiutil, Scrutiny, lspci, or MegaCLI to read the
PCIe Configuration space.
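On Linux, the kernel also exposes the link parameters in sysfs, so a quick check is possible without extra tools. A minimal sketch; the device address shown is a hypothetical placeholder for your controller's bus/device/function as reported by lspci:

    from pathlib import Path

    def pcie_link_status(bdf: str) -> dict:
        """Read negotiated vs. maximum PCIe link speed and width from sysfs."""
        dev = Path("/sys/bus/pci/devices") / bdf
        return {attr: (dev / attr).read_text().strip()
                for attr in ("current_link_speed", "current_link_width",
                             "max_link_speed", "max_link_width")}

    # "0000:03:00.0" is a hypothetical address; take yours from lspci output.
    print(pcie_link_status("0000:03:00.0"))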
PCIe Slot Location Relative to a Processor
The PCIe slot location relative to a processor in a multiprocessor environment can affect performance. See
Figure 9 and Figure 10. Consider a case where the benchmark application runs on CPU1, either by operating
system allocation or as forced by affinity settings.
 If the storage controller is placed on PCIe slots 1 through 3 (native to CPU1), the system can give higher
performance.
 If the storage controller is placed on PCIe slots 4 through 6 (native to CPU2), the system gives lower
performance.
These differences occur because of the additional latency caused by the access over the QPI bus between the
CPUs, which does not occur if the memory and PCIe slot are native to the CPU.

NOTE When you use an unfamiliar server, test each PCIe slot to find the slot
that gives the highest performance. Not all slots yield the same
data throughput.
For Better Performance Results
 Run the CPU at the maximum clock supported.
 Use a CPU that provides a higher number of cores.
 Use HTT.
 Manage affinity to spread the load across the cores if the operating system does not automatically
manage affinity.
 Use a PCIe slot that gives the best performance compared to all the other slots available.

3.1.4 Non Uniform Memory Architecture

Non-Uniform Memory Access (NUMA) is useful for multiprocessing, where a CPU can access the memory of
another CPU. A process resides in memory local to the CPU or in memory non-local to the CPU. Depending
on where the data resides, the performance varies. Local memory accesses are faster, so performance is higher.
Accessing non-local memory adds overhead because of the need to go over the QPI interprocessor
bus. This additional overhead increases latency and reduces performance. NUMA is proven to help
multiprocessing and handling processes across CPUs; however, the benefits are limited to particular workloads.
You can extend the PCIe slot location example in the previous section to NUMA.
A process might run on CPU1, but the process memory might be on the DDR3 native to the other CPU (CPU2). In this case,
CPU1 faces additional latency because of the QPI bus access, which is not present if the process uses the memory native
to CPU1.

3.1.5 BIOS Options

System BIOS can provide configurable options that can affect performance. The following settings might be
configurable in your system BIOS. Set them as follows for best performance:
 Choose high performance, rather than energy saving or Balanced options.
 Increase fan settings to run cooler.
 Enable hyperthreading to increase processor capabilities.
 Set any controllable PCIe slot width and speed options to the maximum setting.
 Set QPI speed to maximum.
 Set Maximum Read Request to Auto, or the largest possible value.
 Set Maximum Payload to Auto, 256 bytes, or larger.

NOTE The maximum payload size of the host system depends on the chipset.
Setting the payload size to the maximum supported value provides
maximum performance. Lower values cause higher overheads for
each I/O and so affect performance.

3.2 Storage Components and Performance

Initiators, expanders, and targets are the major components that make up storage systems. The following sections
discuss each component and its impact on performance. The following three basic elements comprise any storage
topology:
 Initiator
 Expander
 Target
Initiator
Initiators include host bus adapters, which might be an I/O controller or RAID-on-controller, and which might be
on a motherboard or on an HBA card that fits in any PCIe slot on a motherboard.
Expander
Expanders can be used as a simple JBOD or with a multitude of functions such as self-configuring, SCSI
enclosure services (SES), zoning, and DataBolt. SAS switches made of multiple expanders might also be
present.
Target
Targets can be any number of SAS or SATA drives that are HDD, SSD, or an HBA in target mode.

3.2.1 Initiators and Performance

Avago storage controller operation modes are divided into the following major modes:
 Initiator Target (IT)
 Integrated RAID (IR)
 MegaRAID
 Integrated MegaRAID (iMR)
Initiator Target (IT)
IT mode allows the controller to support only the raw JBOD mode and does not allow any RAID capabilities. IT
firmware also lets the controller operate in target mode, as explained later in a separate subsection.
Integrated RAID (IR)
IR mode allows the controller to support basic RAID modes such as R0, R1, and R10. However, the firmware
implements the RAID operations, as opposed to the hardware.

NOTE IR mode is defeatured and superseded by iMR from 12Gb/s SAS
onward; therefore, this document does not discuss IR. Further references to
RAID assume MegaRAID.
MegaRAID
MegaRAID mode uses the MegaRAID firmware stack, the hardware RAID modules, and has DDR caching
features. These controllers provide the best RAID capabilities compared to other modes. The RAID
performance of this mode is the highest among all these modes.
Integrated MegaRAID (iMR)
Integrated MegaRAID mode, commonly referred to as iMR, uses the MegaRAID stack; however, the firmware implements
the RAID functions instead of using the hardware RAID modules. The performance is significantly lower
compared to the MegaRAID mode.
The RAID (IR/iMR/MegaRAID) modes allow JBOD options. However, minor differences might exist between the JBOD
performance of RAID controllers and that of IT controllers because of programming differences in
the firmware and drivers. Avago controllers support the following configurations:
 JBOD
 RAID0
 RAID1
 RAID10
 RAID5
 RAID6
 RAID50
 RAID60

3.2.1.1 Initiator Features that Affect Performance


The following sections review initiator features that affect performance.
Interrupt Coalescing
Interrupt coalescing allows more than one interrupt to be coalesced before the interrupt is raised to
the CPU. This option decreases the interrupts to the host CPU per I/O, which improves the maximum IOPs for
small I/Os. For example, if the interrupt coalescing depth is set to 10, the host CPU is interrupted only
once every 10 I/Os.
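A sketch of the resulting interrupt rate (illustrative arithmetic only):

    def interrupt_rate(iops: float, coalescing_depth: int) -> float:
        """Approximate host interrupts per second when one interrupt is
        raised for every `coalescing_depth` completed I/Os."""
        return iops / max(coalescing_depth, 1)

    # 500,000 IOPs with a coalescing depth of 10 -> ~50,000 interrupts/s.
    print(interrupt_rate(500_000, 10))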

I/O Coalescing
I/O coalescing allows smaller I/Os to be grouped and processed together. This feature improves the
throughput of the small size I/Os as compared to handling the I/Os individually. This feature is useful for RAID
operations. For JBOD this feature is not used because the I/Os can be handled faster with the FastPath feature,
if the hardware supports FastPath. See the following FastPath section for more information.
Maximum I/O Size
The maximum storage controller capability might limit the maximum I/O size. Performance measurements
usually use 0.5-KB to 4-MB I/O sizes; however, the controllers might not natively support I/Os up to 4 MB.
For example, the MegaRAID controller's maximum I/O size limit is 252 KB. Larger I/Os are split into
multiple I/Os and processed. This strategy can impact performance because it effectively reduces the number
of outstanding I/Os per drive, and additional overheads might result from the split and join operations.
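A sketch of the split arithmetic, using the 252-KB limit cited above:

    import math

    def split_count(io_size_kb: int, max_io_kb: int = 252) -> int:
        """Number of child I/Os a host request becomes when it exceeds
        the controller's maximum I/O size."""
        return math.ceil(io_size_kb / max_io_kb)

    # A 1-MB host I/O against a 252-KB limit becomes 5 child I/Os, which
    # effectively reduces the outstanding I/Os available per drive.
    print(split_count(1024))   # 5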
FastPath
Storage controllers might support hardware FastPath, where a hardware I/O accelerator handles I/Os without
firmware involvement. FastPath helps improve performance for JBODs and certain RAID configurations. RAID
configurations that require parity calculations and RAID volumes that use cache for their reads or writes
require firmware involvement and cannot use FastPath. Use FastPath whenever your application permits.
Multipath and Drive Listing
IT and IR controllers do not support a multipath topology; MegaRAID controllers do support multipath. When
drives are used in a multipath topology, each drive is listed twice with IT/IR controllers, and the enclosure slot
mapping set in the controller NVDATA decides how the Target IDs for the drives are assigned. MegaRAID
controllers expose only one path, and MegaRAID balances the drives across multiple ports when both ports
of the controller are used, such that both controller ports can perform at maximum performance.

NOTE The MegaRAID controller allows multiple paths but only one path is
used at any time for the I/Os, not both paths at the same time. Current
designs do not support active-active I/Os on both paths.
Using both ports of a drive might help for SSDs, but not for HDDs. On
HDDs the data must go to a single media, unlike the SSDs. SSDs can
write the data in parallel so using multiple ports can scale
the performance.
DMA Engines – Single Context versus Dual Context
Storage controllers use DMA engines as part of their I/O processing, and the configuration of these DMA engines
can significantly impact performance. Avago 12Gb/s SAS controllers have eight Tx DMA engines. A DMA engine has two
modes, single context and dual context, and different configurations perform differently with these modes. Use
the EDFBFlags[6:5] field in Manufacturing Page 30 of the NVDATA to control these modes. Tune this field to the mode
that best suits your application.
For direct-attached devices, single context mode might suffice, but dual context mode is best suited for
12Gb/s SAS expanders with DataBolt enabled:
 The IT controller sets all 8 DMA engines to dual context mode when 12 Gb/s SAS expanders are detected in the
topology; otherwise the controller sets all 8 DMA engines to single context mode.
 MegaRAID sets four DMA engines in single context mode and four in dual context mode, by default.
When 24 or more devices are attached, MegaRAID sets all eight Tx DMA engines to dual context mode.

NOTE If the DMA context is not set correctly you might see issues such as the
performance not scaling with an odd number of drives, but scaling
with an even number of drives, or vice versa.

I/O Size Tuning for Expander Buffering Solutions


12Gb/s SAS storage controllers offer other tunable parameters to improve the buffering solutions that
12Gb/s SAS expanders provide. The parameters, located in Manufacturing Page 30 of the controller's
NVDATA, include:
 EDFBMaxGroupUnload
 EDFBThresholdSAS and EDFBThresholdSATA
EDFBMaxGroupUnload
Specifies the maximum number of entries that a specific DMA Group unloads to the DMA engines before
moving to another DMA Group. A value of 0 in this field uses the hardware default value. From Phase 4 of the
12Gb/s SAS firmware, this field is set to 4, which is the recommended value.
EDFBThresholdSAS and EDFBThresholdSATA
Specifies the maximum number of data frames (SAS and SATA, respectively) to transmit during
EDFB before switching to an alternate context. A value of 0x00 indicates that the firmware should program the
setting based on any values returned by the located EDFB expanders. If multiple EDFB expanders return
differing values, the firmware uses the lowest value found.
 For Avago 12Gb/s SAS expanders, set these parameters to 0, and the controller dynamically assigns the
value by using vendor-specific SMP commands to the Avago 12Gb/s SAS expanders.
 With non-Avago expanders, set these fields based on the buffer size.

3.2.2 Expanders and Performance

3.2.2.1 Expanders and Latency


Expander-attached topologies incur additional latency at each expander level because of arbitration and additional
I/O connection time. Arbitration can become a large component of the performance impact, especially in deeply
cascaded topologies, because each additional expander in the cascade adds to the time required to establish
a connection.
 In 12Gb/s SAS expanders, the arbitration process can take as little as 161.33 nanoseconds and as much as
401.66 nanoseconds.
 The 12Gb/s SAS expanders add a connection time of 53.33 nanoseconds for each expander in the topology.
These times are in the nanosecond range, which is not large compared to the time for
drives to process I/Os and return responses, but you must still understand the impact expanders can have on the
overall storage fabric and the impact each additional expander can add.
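A sketch of the added connection-setup latency in a cascade, using the figures above (a simplification that assumes one worst-case arbitration plus one connect time per expander hop; drive service time, which dominates, is ignored):

    def cascade_connect_latency_ns(hops: int, arbitration_ns: float = 401.66,
                                   connect_ns: float = 53.33) -> float:
        """Worst-case connection-setup latency added by cascaded expanders."""
        return hops * (arbitration_ns + connect_ns)

    # Three cascaded expanders add roughly 1.36 microseconds in the worst case.
    print(cascade_connect_latency_ns(3))   # ~1364.97 ns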

3.2.2.2 DataBolt Technology


12Gb/s SAS expanders support DataBolt® technology, or buffering (previously called end device frame buffering
(EDFB)). The DataBolt technology allows 3Gb/s and 6Gb/s SAS drives to transfer data at up to 6Gb/s and 12Gb/s
SAS rates, respectively, with the use of a buffering module on the expander phys. DataBolt technology removes rate
matching, so the performance per drive nearly doubles.
In expander-attached configurations, when DataBolt is disabled, the negotiated link rate for a 6 Gb/s drive is 6 Gb/s.
When DataBolt is enabled, the negotiated link rate for a 6 Gb/s drive is 12 Gb/s. The DataBolt feature does not affect
drives that support 12 Gb/s speeds. Refer to the LSI DataBolt Bandwidth Aggregation Technology: 12Gb/s SAS Performance
Test Results White Paper for more information.
The following concepts pertain to Databolt technology:
Rate Matching
Rate matching permits faster communication channels to run traffic to slower devices by slowing down the
faster channel with delete-able ALIGN primitives. As these primitives route through expanders, the primitives
are removed and the data is then spaced at the same rate as expected by the slower device. The problem
with rate matching is that, during communication to slower devices, the faster channel yields 50 percent to
75 percent less throughput during the connection.
DataBolt Technology
DataBolt technology can permit communication between channels that operate at different link rates
without using rate matching. DataBolt uses two 24-KB buffers dedicated to inbound and outbound
transactions, which permit read and write commands to be serviced at the same time in a nonblocking
memory fashion. This action is transparent to the attached devices which makes for seamless integration into
SAS domains. The DataBolt technology is T10 compliant.
HDD versus SSD
DataBolt technology functions with HDDs and SSDs. Consider how the performance characteristics of each
target device might impact DataBolt performance. The expander manufacturing page exposes several
tuning parameters that can impact the performance measured through the expander. The exact value of the
tuning parameters used for optimal performance depends on the implemented target devices. It is not
recommended to modify the default tuning parameters in the manufacturing page without extensive
testing. These values were tested with both HDDs and SSDs independently and found to be sufficient for
both drive types.
Enable DataBolt
The LSISAS3x36 or LSISAS3x48 expanders do not enable the DataBolt feature by default. Use the expander
manufacturing page 0xFF15 to manually enable the DataBolt feature. You must modify the XML file included in the
expander firmware package, build a new manufacturing page, and upload the page to the expander. Refer to the
12Gb/s SAS/SATA Expander Firmware Configuration Programming Guide. The following steps describe the
general process:
1. Create a separate copy of the default sas3xMfg.xml file. The exact file name depends on the expander version.
For example, the manufacturing page XML file for an evaluation LSISAS3x48 expander is
named sas3xMfgEval.xml.
2. Modify the XML file using the following changes:
— Set EDFBEnable to 00000001
— Set EDFBPhyEnablesLow to FFFFFFFF, to enable EDFB on PHY 0 to PHY 31
— Set EDFBPhyEnablesHigh to FF, to enable DataBolt on PHY 32 to PHY 39
— Set PhyMaskLow in EDFBPerfSettings to FFFFFFFF, to enable DataBolt performance tuning on PHY 0 to
PHY 31
— Set PhyMaskHigh in EDFBPerfSettings to 000000FF, to enable DataBolt performance tuning on PHY 32 to
PHY 39
3. Save the XML file.
4. Refer to LSI 12Gb/s Expander Tools (Xtools) User Guide to build (use g3xmfg) and upload (use g3xutil) the
manufacturing page.
5. Reset the expander for the new changes to take effect.
6. Use the edfbinfo command to verify the DataBolt status. Refer to the LSI 12Gb/s SAS/SATA Expander SDK
Programming Guide for more information.

3.2.3 Storage Drives and Performance

The storage targets can be any number of SAS or SATA drives of type HDD or SSD, or a controller in target mode (see
Section 3.2.4, Target-Mode Controllers and Performance for information about controllers in target mode). This
section discusses different features of these drives and the effect on performance.
SSD versus HDD
 Solid state drives (SSDs) perform especially well with random I/Os because they have no rotating parts.

 Brand new SSDs show very high performance compared to used SSDs because the performance of an SSD
depends on what was previously written. You must precondition the SSDs for their performance to be repeatable.
HDDs do not need any such preconditioning. See Section 3.2.5, SSD Preconditioning for preconditioning
information.
 HDDs perform better with disk write cache enabled, whereas SSDs do not gain much by using disk write cache.
 HDD performance varies based on where most data is located. Performance improves if the data is at the outer
sectors (short stroking), and is lower if the data is at the inner sectors. SSD performance is homogeneous because
the drives have zero seek time.
SAS versus SATA
 SAS provides better enterprise features than SATA.
 SAS drives usually perform better than SATA.
 SATA drives usually have larger density and slower rotational speed, and therefore perform slower than SAS drives.
 SATA performance is lower when attached to expanders, as additional translations occur because of the SATA
tunneling protocol (STP) that is not present on native SATA transfers.
SAS Nearline
 Nearline SAS drives have the benefit of larger density that comes with the SATA drives and the reliable interface
performance that comes with SAS.
 The rotational speed of SAS nearline drives is the same as that of SATA drives.
 The performance of SAS nearline drives falls between that of native SAS drives and SATA drives.
Link Speed
The faster the drive interface link rate, the better the performance. For example, 6Gb/s SAS drives perform better than
3Gb/s SAS drives.
Rotational Speed
The higher the rotational speed, the better the performance. For example, 15,000 RPM drives perform better than
7,200 RPM or 10,000 RPM drives.

3.2.4 Target-Mode Controllers and Performance

IT controllers can act as a SAS target and receive commands from SAS initiators through the SAS connection. To
complete read/write operations, a target-mode controller copies data between the host memory and the initiator
over SAS; because no drive media is involved, the target performance is high.
A target that performs better than the available SAS/SATA drives helps with loop back testing and other useful
performance tests. For example, an LSISAS3008 controller in target mode provides up to 300,000 IOPs for small I/Os
and up to 5600 MB/s for large I/Os, which is extremely high compared to any drive targets.

3.2.5 SSD Preconditioning

SSD performance and latency at an instant depend on what was written to the flash prior to that instant. It is important
to precondition the SSDs; that is, run I/Os to storage until a steady state is reached. If SSDs are not preconditioned, you
might see inconsistent performance results when the SSD enters its maintenance mode to erase used sectors so the
sectors can be rewritten. This process is known as garbage collection. Performance is not faster after preconditioning.
Preconditioning is designed to have benchmarks measure the steady state (slower) performance instead of the initial
(faster) performance.
The unique features of SSDs create distinct requirements for accurately measuring SSD performance. The SNIA Solid
State Storage Initiative's Performance Test Specification (PTS 1.1) clearly defines steady state performance. Avago uses
the SNIA definition as a benchmarking guideline. For more information on the SNIA benchmarking standardization,
access the document at the following link: http://snia.org/sites/default/files/SSS%20PTS%20Client%20-%20v1.1.pdf
Avago emphasizes that the SSD benchmarking goal is repeatable, accurate, consistent, and representative results. The
following sections describe two methods to accurately measure SSD performance. The first method uses the SNIA
methodology and generates the most precise and accurate benchmarking results, but can take considerable time to
execute. The second method provides comparably accurate results but takes significantly less time to complete. The
benchmarking tool used is irrelevant.

3.2.5.1 SNIA SSD Preconditioning


1. Purge the drive.
Prior to any artificial benchmarking, put the drive into a known state that emulates the state as received from the
manufacturer. This state is typically called the fresh-out-of-box (FOB) state. Most devices support a secure erase
command and others might have a proprietary method to put the drives into a known state.
2. Write the entire user capacity of the device twice with 128-KB sequential writes aligned on 4-KB boundaries. Set
the queue depth to the highest value supported by the device.
Avago generally uses 256 for RAID virtual devices. You can use a smaller value for HBA testing.
The amount of time it takes to write to the device twice depends on the capacity and the performance. You can
estimate this value by multiplying the capacity of the device in MB by 2, then dividing that number by the
steady-state megabytes per second obtained by the benchmark tool. The result is the number of seconds to write
to the entire device. For added security, you can add additional time.
3. Run the desired data point until steady state is achieved.
You can determine steady state using two methods: data excursion and slope excursion (the sketch after this
procedure illustrates both checks).
Data Excursion
The variation of y within the measurement window is within 20% of the average (Max(y) − Min(y) <= 0.20 x Average(y)).
Slope Excursion
A linear curve fit of the data within the measurement window is within 10% of the average within the
measurement window.
The measurement window is the anticipated area of steady state. You can often determine the measurement
window by simple observations.
4. Collect data immediately after you reach steady state.
Idle time can significantly change the performance numbers. Collect performance statistics for a long enough
time to assure precise averages. One to 5 minutes is generally sufficient depending on performance
characteristics.
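The two criteria in step 3 lend themselves to a direct check over a window of per-interval throughput samples. A minimal sketch, assuming evenly spaced samples (at least two) and a simple least-squares fit for the slope test:

    def is_steady_state(window: list[float]) -> bool:
        """Check a measurement window against both steady-state criteria:
        data excursion (max - min within 20% of the average) and slope
        excursion (total best-fit excursion within 10% of the average)."""
        n = len(window)
        avg = sum(window) / n
        # Data excursion: Max(y) - Min(y) <= 0.20 * Average(y).
        if max(window) - min(window) > 0.20 * avg:
            return False
        # Slope excursion: least-squares slope times the window length
        # must stay within 10% of the average.
        xbar = (n - 1) / 2
        slope = (sum((i - xbar) * (y - avg) for i, y in enumerate(window)) /
                 sum((i - xbar) ** 2 for i in range(n)))
        return abs(slope) * (n - 1) <= 0.10 * avg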

3.2.5.2 Alternative SSD Preconditioning


The alternative preconditioning method recognizes that the full matrix of data points and tests (I/O sizes, queue
depths, read:write mixtures) can number in the thousands, making it impractical to follow the SNIA-defined test flow
for each one. For that reason, Avago defined a shorter testing method that still provides repeatable and precise steady-state
performance with minimal preconditioning overhead. This method helps reduce excessive preconditioning while
maintaining accurate and consistent results run over run. Very large capacity or cMLC drives might require additional
preconditioning time, and data points should be verified using the SNIA method to ensure absolute steady state.
1. Purge the drive.
Prior to any artificial benchmarking, put the drive into a known state that emulates the state as received from the
manufacturer. This state is typically called the fresh-out-of-box (FOB) state. Most devices support a secure erase
command and others might have a proprietary method to put the drives into a known state.
2. Write the entire user capacity of the device twice with 128-KB sequential writes aligned on 4-KB boundaries. Set
the queue depth to the highest value supported by the device.
Avago generally uses 256 for RAID virtual devices. You can use a smaller value for HBA testing.

The amount of time it takes to write to the device twice depends on the capacity and the performance. You can
estimate this value by multiplying the capacity of the device in MB by 2, then dividing that number by the
steady-state megabytes per second obtained by the benchmark tool. The result is the number of seconds to write
to the entire device. For added security, you can add additional time (a sketch of this estimate follows the procedure).
3. Run 1-MB sequential writes at queue depth 256, for 2 hours.
4. Run all sequential I/O patterns in the following pattern:
a. All writes.
Run small to large I/O sizes, and low to high queue depths. That is, run all I/Os at the smallest Qd first, and run
all I/Os at the largest Qd last.
b. All reads.
Run small to large I/O sizes, and low to high queue depths.

5. Run 4-KB random writes at queue depth 256, for 4 hours.


6. Run all random I/O patterns in the following pattern:
a. All writes.
Run small to large I/O sizes, and low to high queue depths.
b. All reads.
Run small to large I/O sizes, and low to high queue depths.
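A sketch of the write-twice time estimate from step 2 of both preconditioning procedures (the capacity and rate values in the usage line are illustrative):

    def precondition_seconds(capacity_mb: float, steady_mb_per_s: float,
                             passes: int = 2) -> float:
        """Estimated seconds to write the full user capacity `passes` times
        at the steady-state sequential write rate from the benchmark tool."""
        return capacity_mb * passes / steady_mb_per_s

    # e.g., a 400-GB (409,600-MB) SSD at 400 MB/s needs about 2048 s (~35 min).
    print(precondition_seconds(409_600, 400))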

3.3 Storage Topology

The number and type of initiators, expanders, and targets present in a topology might vary, but all storage topologies
can be categorized as one of the following:
 Direct attached
 Expander attached - Single
 Expander attached - Cascade
 Expander attached - Tree
 Multipath topology

3.3.1 Direct Attached Topology

A direct attached topology has no expander, so the number of phys available on the initiator limits the number of
drives. Usually 8 or 16 drives are directly attached to a storage controller.

Figure 11 Direct Attached Topology Example


[Diagram: an I/O controller directly attached to SAS/SATA drives.]

If a single expander can support the number of drives you need, with sufficient ports available for the controller, use
that configuration. Consider the following setup examples:
1. Single SAS3x48 expander with x8 wide ports to a controller and 40 drives
2. Two SAS3x48 expanders with two x4 wide ports to a controller and 40 drives

The first topology gives better overall performance than the second one.
Direct Attached Topology Considerations
 Simple configuration.
 No added latency that might occur with expanders.
 Might be suitable for checking the storage controller's maximum IOPs performance and basic latency characteristics.
 Not suitable for checking the maximum bandwidth (MBPS) of a storage controller, because the MBPS is usually
drive limited.

3.3.2 Expander Attached Topology - Single

The expander attached topology - single has only one expander and is common for applications similar to Just a
Bunch of Disks (JBOD). The number of phys on the expander limits the number of drives, excluding the expander phys,
that connect to the initiators.

Figure 12 Single Expander Topology Example


[Diagram: an initiator connects to a single expander, and SAS/SATA drives attach to the expander through direct (D) routing.]

Expander Attached Topology - Single Considerations


 Relatively simple topology.
 Suitable for checking both the maximum MBPS and IOPs of a storage controller. However, the topology can still be
drive limited for the MBPS if the drives perform poorly.
 Expanders allow bandwidth aggregation. Using a x8 link to an initiator instead of a x4 link can reach up to twice the
x4 link performance.
 The latency is higher than in a direct attached topology, but lower than in multiple-expander topologies.
 The same expander chip can be configured differently (in terms of routing and other phy/connection attributes), so
the performance can vary between two platforms that use the same expander chip.

3.3.3 Expander Attached Topology - Cascade

The expander attached topology - cascade has two or more expanders connected in series (cascade) fashion. This
topology is used when more drives are needed than a single expander can support. An edge expander set is an example of
this topology. If n expanders are present, a maximum of n + 1 hops is present between the controller and the expanders.

Figure 13 Cascaded Expander Topology Example


[Diagram: two initiators connect to a cascaded set of expanders, each with its own SAS/SATA drives; drives attach through direct (D) routing, and expander-to-expander links use table (T) and subtractive (S) routing.]

Expander Attached Topology - Cascade Considerations


 Relatively complex, but suitable for scaling the topology with more drives.
 Suitable for checking both the maximum MBPS and IOPs of one or more storage controllers.
 Suitable for measuring the expander's capability to route the I/Os under different use cases.
 Use x4 or x8 connections between expanders and between a controller and an expander. Keep the link width
uniform throughout the cascade; a single narrower link introduces a bottleneck.
 Latency increases for devices farther down the cascade compared to devices near the controller. The drives at the
farther end must win the arbitration at each expander level, and the drives directly attached to the expanders at each
level have a higher probability of winning the arbitration. The more levels (or hops), the higher the latency.
Arbitration schemes different from the default schemes might reduce this impact.
 Less fault tolerant. If one expander fails, the whole storage behind it is unreachable.
 The same expander chip can be configured differently (in terms of routing and other phy/connection attributes), so
the performance can vary between two platforms that use the same expander chip.

3.3.4 Expander Attached Topology - Tree

The expander attached topology - tree has two or more expanders connected in a tree branching fashion to reduce
the maximum number of hops between controller and expanders. The number of phys on the controller limits the
number of expanders that you can connect in parallel.

Figure 14 Tree Expander Topology Example


[Diagram: initiators connect to expanders arranged in a tree; SAS/SATA drives attach through direct (D) routing, with table (T) and subtractive (S) routing between expander levels.]

Expander Attached Topology - Tree Considerations


 Relatively complex topology, but suitable for scaling the topology with more drives.
 Suitable for checking both maximum MBPS and IOPs of one or more storage controllers.
 Suitable for measuring the expander's capability to route the I/Os under different use cases.
 Latency is lower than in the cascade configuration.
 Fault tolerance is improved compared to the cascade configuration. If one expander fails, the storage on other
branches might still be available, depending on the topology.

3.3.5 Multipath Topology

The multipath topology is a more complex variation of the cascade and tree topologies. This topology usually uses two or
more initiators and allows multiple paths to each of the drives from multiple initiators, so the availability is higher than in
other topologies. The multipath topology requires SAS drives because they are dual ported, unlike SATA drives. The
following figure illustrates multipath in a simple manner. In a more complex example, multiple expanders could
replace each single expander, in either a cascade or tree fashion, to allow many more drives to connect while each
still has a path to both sides.

Figure 15 Path Redundancy Application Example

[Diagram: two domains; Initiator A and Initiator B each connect to their own expander, and dual-port SAS drives attach to both expanders so every drive has a path through either domain. Direct (D), table (T), and subtractive (S) routing types apply.]

Multipath Topology Considerations


 Very complex and suitable for large topologies and external storage enclosures.
 The highest fault tolerance of all the topologies.
 Suitable for checking both maximum MBPS and IOPs of one or more storage controllers in many different
use cases.
 Suitable for measuring the expander capability under different use cases.
 Adds multiple variables to the performance. For the performance to be deterministic, you must know the status of
all the devices. A minor glitch can affect the performance on a large scale.
 Hard to reproduce and debug performance issues.

3.3.6 Topology Guidelines for Better Performance

 Choose a topology that best suits your need, keeping in mind performance and latency.
 Make sure to use correct cables and that no signal integrity (SI) issues exist on these cables or connectors.
Otherwise, long debug times might occur.
 Make sure your drives and expanders are detected properly, using any tool of your preference (example tools
include MSM, Scrutiny, StorCLI/MegaCLI, sg_utils, device manager listing, storlibTest, or lsiutil).
 Make sure the link width between expanders, and between expanders and controllers, is wide enough
throughout the topology so you do not add additional bottlenecks to your application.
 Use the DataBolt technology with 3Gb/s and 6Gb/s SAS drives. Avoid using the phys with the DataBolt feature to
connect to initiators and expanders, so more DataBolt-capable phys are available for drives.

NOTE PHYS [40:47] of the SAS3x48 expander do not offer DataBolt capability.
 Excess SES polling can interfere with arbitration and indirectly affect the performance. Make sure your SES polling
intervals are not so short that they cause such interaction.

 Before you run performance tests, confirm that all connected links are up and that the negotiated link rates are as
expected (12 Gb/s with SAS 12Gb/s drives or expanders, 6 Gb/s with SAS 6Gb/s drives or expanders, and so on). You
can use the following tools to view SAS link information: the lsigetwin utility, the lsigetlin utility, LSIUTIL, or Scrutiny.

Chapter 4: Configure Your Test Parameters


This chapter discusses parameters that may change between your tests in the same performance testing project. For
example, volume configurations can change between tests. This chapter covers the following topics:
 Operating system environments
 Volume configurations
 Benchmarking and system monitoring tools

4.1 Operating System Environments

Operating systems behave differently when it comes to performance because each system's default settings,
configurations, and modes of operation can differ.

4.1.1 Windows Operating System

The parameters discussed in this section might apply for all Windows operating systems; however, this section uses
the Windows 2008 R2 operating system as an example.

4.1.1.1 Windows Operating System Hotfixes


In general, disable operating system automatic updates. Apply only the hotfixes or updates suggested for Windows
operating systems that fix known performance issues. Assess the possible performance risk before you install any
hotfix or update, because installing Windows hotfixes can alter your performance. For example, with the Windows
Server 2008 R2 hotfix KB2769701, the system can exhibit high variability of small I/O throughput.

4.1.1.2 MSI-X Interrupt Vectors

By default, the Windows operating system usually does a good job of using the available and supported MSI-X vectors on storage controllers. The exact number of MSI-X vectors depends on the number of CPU cores. For better performance, confirm that the total number of interrupt vectors assigned is greater than 16 and that the interrupts are balanced across all the CPU cores. Use the following steps to confirm that the total number of assigned interrupt vectors is greater than 16:
1. Navigate to Start > Control Panel > Device Manager.
2. Right-click the controller in the Storage Controllers device list.
3. Select Properties.
4. Click the Resources tab. The Resource settings box shows each IRQ entry paired with the number of interrupt
vectors currently assigned.

4.1.1.3 Process Affinity

Affinity refers to binding a process to specific processor cores on a multiprocessor system. You can force any application to run on specific processor cores by using the following steps:
1. Navigate to Windows Task Manager > Processes.
2. Right-click the process of interest (for example, Dynamo.exe if you run an Iometer benchmark).
3. Select Set Affinity and choose the CPUs or nodes on which you would like to run this process. You can select one or more.
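As a command-line alternative (a sketch; the affinity mask and executable name are illustrative), you can launch a process with a preset affinity by using the built-in start command from a Command Prompt:

start /affinity 0x3 dynamo.exe

The hexadecimal mask selects cores by bit position; 0x3 binds the process to CPUs 0 and 1. Adjust the mask to match the cores you want.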

4.1.1.4 Driver Version and Customization


Use the latest recommended driver version of the storage controller for better performance results.

Avago storage controllers ship with driver default settings tuned for best performance. In certain cases, your application or topology might require custom settings. For such needs, the Windows Driver Configuration Utility (WDCFG) is provided with the driver package. You can customize different driver parameters and choose the values that best match your needs. This utility provides run-time control over the various configuration parameters (registry entries) that configure the Avago host storage drivers used on the Windows operating system. The driver package contains the utility and its user guide. Use WDCFG to make all changes, rather than editing the registry manually, because WDCFG provides a number of safety checks and protections, including a history stack that permits a return to prior settings.

NOTE: Work with your FAE for any assistance with WDCFG and to choose the settings most suitable for your application.

The following table provides information about the parameters that can affect performance.

Table 6  Customizable 6Gb/s SAS, 12Gb/s SAS, and MegaRAID Driver Parameters

6Gb/s SAS, 12Gb/s SAS, and MegaRAID

NumberOfRequestMessageBuffers (Minimum: 10; Maximum: 9,999,999; Default: —; Stability Impact: Low)
Number of request message buffers to allocate at SOD. The driver makes sure that the value used is not greater than the IOC's reported request FIFO depth. The default value is the number of firmware Global Credits that the firmware issues to the host driver. Be extremely careful when you change this parameter, because it can cause starvation of I/O resources at lower levels, adversely affecting storage system operation. Also be careful with the maximum, because requests that exceed the firmware credits are dropped.

6Gb/s SAS and 12Gb/s SAS Only

DisableFwQueueFullHandling (Minimum: 0; Maximum: 1; Default: 0; Stability Impact: None)
For SAS devices only. If this registry entry is present and the value is nonzero, the firmware does not handle Queue Full returns by the target; they return to the host driver. If this registry entry is not present, or is present with a 0 value, the firmware handles the Queue Full return.

MaxSASQueueDepth (Minimum: 1; Maximum: 254; Default: 64; Stability Impact: None)
The maximum number of concurrent I/Os issued to a single target ID for SAS devices. Setting this value too high can cause multiple Queue Full returns back to the OS, which can cause Event 11 and Event 15 to appear in the Windows Event Log. A 0 value results in a queue depth of 1. Values greater than 254 are forced to 20 by StorPort; therefore, 254 is the maximum value. Setting this parameter to anything larger than the maximum target queue depth currently in use by target devices causes Queue Full status returns. The maximum target queue depth in use by target devices is a very nebulous number, which changes dynamically over time based on the present mixture of I/O sizes, the current workload, and the resources available in the target device. This parameter affects storage system performance; storage system stability should not be impacted.

MaxSATAQueueDepth (Minimum: 1; Maximum: 254; Default: 32; Stability Impact: None)
The maximum number of concurrent I/Os issued to a single target ID for SATA devices. Setting this value too high can cause multiple Queue Full returns back to the OS, which can cause Event 11 and Event 15 to appear in the Windows Event Log. A 0 value results in a queue depth of 1. Values greater than 254 are forced to 20 by StorPort; therefore, 254 is the maximum value. Setting this parameter to anything larger than the maximum target queue depth currently in use by target devices causes Queue Full status returns. The maximum target queue depth in use by target devices is a very nebulous number, which changes dynamically over time based on the present mixture of I/O sizes, the current workload, and the resources available in the target device. This parameter affects storage system performance; storage system stability should not be impacted.

MaxSGList (Minimum: 17 (64 KB); Maximum: 513 (2 MB); Default: 257 (1 MB); Stability Impact: High)
Controls the maximum I/O size that the driver handles; the maximum I/O size corresponds to (value - 1) 4-KB pages, so setting this parameter to 33 provides a 128-KB maximum I/O size. You can significantly impact storage system performance by setting this parameter, which can be tuned to optimize for specific I/O sizes. The default setting (257) provides good performance when I/O demands vary (the normal situation).

MegaRAID Only

balancecount (Minimum: 1; Maximum: 4,294,967,295; Default: —; Stability Impact: None)
The maximum number of I/Os sent to each disk in a RAID 1 volume before switching to the other disk.

busywaitcount (Minimum: 1; Maximum: 4,294,967,295; Default: —; Stability Impact: None)
The number of requests that the adapter must complete, while in a Queue Full state, before it resumes I/O requests to the miniport driver.

coalescedepth (Minimum: 1; Maximum: 4,294,967,295; Default: —; Stability Impact: None)
The maximum number of I/Os that the driver can coalesce.

coalescestart (Minimum: 2; Maximum: 4,294,967,295; Default: —; Stability Impact: None)
The minimum number of I/Os before the driver starts to coalesce.

fastpathoff (Minimum: 1; Maximum: 1; Default: not used (a); Stability Impact: None)
Disables the FastPath I/O algorithm for all I/Os. To enable FastPath, you must remove this parameter entirely.

limitsges (Minimum: 1; Maximum: 1; Default: not used (a); Stability Impact: None)
If the parameter exists, the maximum number of SGEs for MPT frames is limited.

maxnumrequests (Minimum: 1; Maximum: 1024; Default: —; Stability Impact: None)
Sets the maximum number of requests from the OS.

maxtransfersize (Minimum: 1; Maximum: 4,294,967,295; Default: —; Stability Impact: None)
Sets the maximum number of bytes to transfer.

msiqueues (Minimum: 1; Maximum: 16; Default: —; Stability Impact: None)
Sets the maximum number of MSI queues.

nobusywait (Minimum: 1; Maximum: 1; Default: not used (a); Stability Impact: None)
If this parameter exists, the driver returns the Queue Full status back to StorPort and does not use the StorportBusyWait mechanism.

nonuma (Minimum: 1; Maximum: 1; Default: not used (a); Stability Impact: None)
If this parameter exists, NUMA support is disabled in the driver.

Nosrbflush (Minimum: 1; Maximum: 1; Default: not used (a); Stability Impact: None)
If this parameter exists, the driver passes the SRB_FUNCTION_FLUSH command to the firmware when the command is received from the OS. By default, this command is not passed to the firmware and is completed back to the OS by the driver. Enabling this parameter causes a significant performance drop.

Qdepth (Minimum: 1; Maximum: 254; Default: —; Stability Impact: None)
Sets the device queue depth.

(a) By default, this registry entry is not present.
4.1.1.5 Disk Write Cache

Disk write cache (or write-back cache) permits your system to run faster by acknowledging a write while the data is still in the disk cache, rather than waiting for the data to commit to the storage media. However, if power is lost before the actual write, the data is lost. Therefore, not all drives permit write cache. Enabling disk write cache improves performance, but consider your specific situation before you decide to enable it. SSDs generally ignore the disk write cache setting because they might have their own internal caching algorithm.
You can enable write caching in the controller firmware (write-back cache for MegaRAID products) or in the operating system. From Disk Management, right-click the disk and select Properties > Policies. See the following example screen.
Figure 16 Write Cache Policies Example

- Write-cache buffer flushing is enabled by default (the second checkbox in the previous figure is unchecked). If this feature is enabled, an I/O must actually be written to the drive before it completes; that is, writing an I/O to the cache does not mean the I/O is complete.
- If you enable write-cache buffer flushing and you do not use unbuffered I/O, the performance is the same as disabling the drive's write cache (see the following table). Some benchmark tools, such as HD Tune Pro, use buffered I/O (not direct I/O). Iometer uses unbuffered I/O (direct I/O).
- Results indicate that Flushing Enabled is synonymous with no drive cache when you compare writes: everything must still reach the drive platters to complete. Reads are not affected.
The following table shows how enabling or disabling the write cache option and the write-cache buffer flushing option affects performance.

Table 7 Windows Write-Cache Buffer Flushing Comparison

Block Size (KB) Windows Flushing Enabled (IOPs) Windows Flushing Disabled (IOPs) Write Cache Disabled (IOPs)
0.5 250 23,486 250
1 250 22,331 250
2 250 21,468 250
4 249 20,098 249
8 248 17,855 248
16 245 11,984 246
32 240 5,997 241
64 232 2,977 232
128 215 1,499 215
256 189 747 188
512 151 377 152
1024 108 189 108
2048 69 94 69
4096 40 47 40
8192 21 24 22

4.1.2 Linux Operating System

Linux default settings might not be tuned for best performance and you might need to tune them manually. This
section assumes general Linux distributions. Any steps might differ from your Linux distribution. Refer to the
documentation from your distributor for equivalent commands and settings.

4.1.2.1 Linux Kernel Version

The Linux kernel version depends on the Linux distribution with which the kernel came. For better performance, adhere to the following guidelines:
- Choose a distribution with the latest kernel version that is free of any known performance issues.
- Make sure your Linux distribution supports the rq_affinity = 2 option.

4.1.2.2 Linux Drivers


Avago storage controllers ship with prebuilt drivers. Move to the latest recommended driver version of the storage controller for better performance results.
While applying operating system updates, the Linux kernel version might change, and the newer kernel might not pick up the latest driver installed on the system; the kernel might pick the in-box version instead. In such cases, build the latest driver from its source and install it. The source is available through dynamic kernel module support (DKMS). The instructions to build and install a Linux device driver by using DKMS are available with Avago product documentation, such as the MegaRAID SAS Device Driver Installation User Guide.

4.1.2.3 MSI-X Interrupt Vectors


Linux might not balance the MSI-X interrupt vectors across the CPU cores; you might have to assign the interrupts to the cores manually.

IT Controllers
To list all interrupts assigned to the controller, run cat /proc/interrupts | grep mptsas. Avago provides the set_affinity.sh script with its driver downloads to set the interrupt affinity automatically.

MegaRAID Controllers
Use one of the following methods to set the affinity:
- If you use Linux version 6.3 or newer and the rq_affinity setting is available, set rq_affinity to 2 for the device. Do not use the affinity steps in the following option.
- Use the command line as outlined in this section to assign the interrupt affinity manually.
Use the CPU ID of the core and the IRQ number of the interrupt vector in the /proc file system to assign interrupt vectors to specific cores:
echo "{CPU ID MASK}" > /proc/irq/{IRQ #}/smp_affinity
Run cat /proc/interrupts to identify the IRQ numbers assigned to the controller. Run grep with mptsas to filter the results.
Add manually assigned IRQs to the banned-interrupts list to prevent the system from rebalancing them across the CPUs:
export IRQBALANCE_BANNED_INTERRUPTS="{IRQ #}...{IRQ #}"
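The following is a minimal sketch that ties these steps together. It assumes a bash shell, 64 or fewer CPU cores, and that the controller's vectors appear as mptsas entries in /proc/interrupts; adjust the grep pattern (for example, to mpt2sas, mpt3sas, or megasas) to match your driver:

#!/bin/bash
# Pin each controller IRQ to a CPU core, round-robin across all cores.
core=0
ncores=$(nproc)
for irq in $(grep mptsas /proc/interrupts | cut -d: -f1 | tr -d ' '); do
    printf '%x\n' $((1 << core)) > /proc/irq/$irq/smp_affinity
    core=$(( (core + 1) % ncores ))
done

Alternatively, on kernels that support it, setting rq_affinity lets the block layer complete requests on the submitting CPU instead of pinning IRQs manually (sdb is an illustrative device name):
echo 2 > /sys/block/sdb/queue/rq_affinity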

4.1.2.4 I/O Scheduler


The Linux kernel uses I/O scheduling to control disk access. The 2.6 kernel lets applications select different I/O schedulers, depending on usage patterns, to optimize kernel I/O. The four I/O schedulers are Completely Fair Queuing (the default), Deadline, NOOP, and Anticipatory. Avago uses the Deadline and NOOP schedulers for performance tuning. Make sure to run the same I/O scheduler for all storage devices attached to the same controller.

Deadline (deadline)
This option uses a deadline algorithm aimed at minimizing I/O latency to provide near real-time behavior. The algorithm uses a round-robin policy among multiple I/O requests to prevent starvation. Avago uses the Deadline scheduler for storage systems with both rotating media and SSDs to reduce the I/O latency as much as possible.
You must change the I/O scheduler on each individual device:
Syntax: echo {Scheduler-Name} > /sys/block/{Device-Name}/queue/scheduler
Example: echo "deadline" > /sys/block/sdb/queue/scheduler
To verify the I/O scheduler for a block device, run:
Syntax: cat /sys/block/{Device-Name}/queue/scheduler
Example: cat /sys/block/sdb/queue/scheduler

NOOP (noop)
This option uses a basic FIFO queue and performs the minimum work required to complete an I/O. This algorithm assumes that I/O performance is optimized by the application or by another component in the system (block device, HBA, or externally attached controller).
You must change the I/O scheduler on each individual device:
Syntax: echo {Scheduler-Name} > /sys/block/{Device-Name}/queue/scheduler
Example: echo "noop" > /sys/block/sdb/queue/scheduler
To verify the I/O scheduler for a block device, run:
Syntax: cat /sys/block/{Device-Name}/queue/scheduler
Example: cat /sys/block/sdb/queue/scheduler

4.1.2.5 Block Layer I/O Scheduler Queue


The Linux I/O scheduler queue size can impact performance. The queue size determines how many incoming requests are stored in the I/O scheduler's request queue for the scheduler to optimize. Configure the queue size at the block layer for individual devices through the nr_requests variable:
Syntax: echo "{QUEUE SIZE}" > /sys/block/{DEVICE NAME}/queue/nr_requests
Example: echo "128" > /sys/block/sdb/queue/nr_requests
You can query the current I/O scheduler queue size in a similar manner:
cat /sys/block/{DEVICE NAME}/queue/nr_requests
The default I/O scheduler queue size in most Linux versions is 128; that is, 128 reads and 128 writes can be queued to the device at any instant before the process is put to sleep. You can increase or decrease the size. The queue size might impact system performance.
Latency-sensitive applications that use writeback I/O might consider lowering the nr_requests value to prevent filling the device queue with write I/Os. The exact queue size that yields optimal performance varies from system to system and is workload dependent. Test this setting on your system to decide what value yields the best performance.

4.1.2.6 SCSI Queue Depth


The SCSI queue depth defines the number of transfers that can be outstanding for a device at any given time. You can configure this limit at the block layer for individual devices through the queue_depth variable:
echo "{QUEUE SIZE}" > /sys/block/{DEVICE NAME}/device/queue_depth
You can query the current SCSI queue depth in a similar manner:
cat /sys/block/{DEVICE NAME}/device/queue_depth
The default SCSI queue depth for an I/O device in Linux varies by device. For example, a SAS hard drive might have a default SCSI queue depth of 32. You can increase or decrease this value. The SCSI queue depth might impact system performance. The exact SCSI queue depth that yields optimal performance varies from system to system and is workload dependent. Test this setting on your system to decide what value yields the best performance.

4.1.2.7 Nomerges Setting


The nomerges setting controls how contiguous I/Os are handled. This setting affects Write Back performance optimization. To optimize Write Back performance, set nomerges to 0 for HDDs or to 1 for SSDs.

NOTE: The nomerges option requires the device queue depth setting. If the device queue depth setting is less than the queue depth pushed from the benchmarking tool, the block layer performs merges even if nomerges is set to 1.

Syntax: echo "{NOMERGES}" > /sys/block/{DEVICE NAME}/queue/nomerges
Example: echo "0" > /sys/block/sda/queue/nomerges
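A corresponding example for SSDs (sdb is an illustrative device name):
Example: echo "1" > /sys/block/sdb/queue/nomerges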

4.1.2.8 Rotational Setting


The rotational setting states whether the device is rotational (1, HDD) or nonrotational (0, SSD). The driver sets this value properly unless you switch drives without performing a driver reload.
Syntax: echo "{rotational}" > /sys/block/{DEVICE NAME}/queue/rotational
Example for HDDs: echo "1" > /sys/block/sda/queue/rotational
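Example for SSDs (sdb is an illustrative device name): echo "0" > /sys/block/sdb/queue/rotational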

4.1.2.9 Add Random Setting


The add_random setting controls whether the device contributes to the kernel entropy pool. The default value is 1. Set add_random to 0 for SSDs, because contributing to the random entropy pool does not benefit SSD performance.
Syntax: echo "{add_random}" > /sys/block/{DEVICE NAME}/queue/add_random
Example: echo "0" > /sys/block/sdb/queue/add_random

4.1.2.10 Linux Write Cache


Use the hdparm tool to enable write cache on SATA drives. Use the sdparm tool to enable write cache on SAS drives. Refer to the Linux man page for each command for details.
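For example, assuming sda is a SATA drive and sdb is a SAS drive (both device names are illustrative), the following commands enable the write cache:
hdparm -W1 /dev/sda
sdparm --set WCE /dev/sdb
To disable the write cache, use hdparm -W0 and sdparm --clear WCE, respectively.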
4.2 Volume Configurations

4.2.1 Volume Configurations and Performance

Performance varies depending on how a volume is configured. Understanding the following concepts helps you interpret performance results:

Drive group
A group of one or more physical drives. Drive groups can be made of any simple RAID type, such as R0, R1, R5, or R6; or of a spanned RAID type, such as R10, R50, or R60.

Logical drives, virtual drives, or volumes
You can create logical drives, virtual drives, or volumes from drive groups. A virtual drive can be on a single drive or on a drive group.

When you create drive groups and virtual drives, you must choose certain parameters, such as the following, that can affect performance:
- Volume type (R0, R1, R10, R5, R6, R50, R60)
- Stripe size (64 KB or 256 KB)
- Read cache policy (read ahead, no read ahead)
- Write cache policy (write back, write through)
- I/O policy (direct I/O or cached I/O)
- Access policy
- Disk cache policy (enabled, disabled, unchanged)
- Consistency and initialization
- Background operations
- FastPath capability
The sections that follow review the effects of these parameters on performance.

4.2.2 Volume Type

JBOD
Just a Bunch of Drives (JBOD) indicates a raw mode without any RAID feature. This scenario is equivalent to IT mode or to the JBOD mode in MegaRAID/iMR. JBOD is the fastest mode in terms of performance per drive because JBOD does not have the RAID overhead. Though JBOD mode is simple to use, it lacks the redundancy, fault tolerance, and performance benefits that a RAID mode provides.
The hardware in the LSISAS2208, LSISAS2308, LSISAS3004, LSISAS3008, and LSISAS3108 controllers can completely run these I/Os without firmware involvement. This feature is called FastPath. FastPath is possible with some RAID modes as well, as Section 4.2.9, MegaRAID FastPath Software, reviews.

RAID
The RAID feature permits multiple drives to be configured as a RAID volume or virtual drive (VD) and exposed to the operating system as a single drive. Refer to Chapter 2 in the MegaRAID SAS Software User’s Guide for a RAID introduction. Avago MegaRAID firmware supports RAID levels 0, 1, 5, 6, 10, 50, and 60. The LSISAS3004 and LSISAS3008 iMR firmware supports RAID levels 0, 1, 5, 10, and 50. The MegaRAID SAS RAID controllers provide reliability, high performance, and fault-tolerant disk subsystem management.
Multiple volumes generally yield better performance. Performance varies between RAID levels and I/O types, with sequential I/Os on RAID 0 (striping) typically performing the best and sequential I/Os on RAID 1 (mirroring) performing the lowest. Random write I/Os with RAID 5 or RAID 6 typically have lower performance because of parity calculations.
RAID 0
RAID 0 stripes the data and uses more than one drive to write the stripes. By doing so, RAID 0 provides a performance improvement through parallel writes and reads. Performance scales with the number of drives present in the RAID 0 volume, which is advantageous compared to using a single drive to store all the data, because single-drive performance is limited to that drive's performance. With the same number of drives, RAID 0 performance is almost the same as JBOD performance.

RAID 1
RAID 1 mirrors the data of one drive to another drive. Two writes must occur for each write, so the write performance does not double with two drives. However, RAID 1 helps read performance: data can be read from either drive, and the reads can occur in parallel. With a proper load-balancing algorithm, reads scale to almost twice the performance of a single drive. Performance is impacted if the volume is undergoing a rebuild.

RAID 10
RAID 10 uses striping and mirroring, so the performance features of both RAID 0 and RAID 1 apply. Read performance scales almost up to the number of drives present in the RAID 10 volume; however, write performance scales only up to half the number of drives. Performance is impacted if the volume is undergoing a rebuild.

RAID 5
RAID 5 calculates and distributes parity across the drives. Write performance suffers because of these parity calculations. However, read performance scales almost up to the number of data drives present (total drives minus one parity drive). Using the hardware RAID accelerators in ROCs improves write performance. If the firmware or software handles the parity calculations, performance decreases. If the volume undergoes a rebuild, performance is affected.
Initialize RAID 5 volumes for better performance, because consistent volumes avoid the need to access each drive individually to do the read-modify-write.

RAID 6
RAID 6 uses dual distributed parity, similar to RAID 5. Dual parity calculations do not show significant overhead compared to single parity, so the performance is similar to RAID 5 volumes. Initialize the volumes for better performance. If the volume undergoes a rebuild, performance is affected.

RAID 50
RAID 50 is the span formed by RAID 5 and RAID 0, and thus combines the performance properties of both. That is, data is striped to use more than one drive at a time. This approach reduces rebuild times compared to a single large RAID 5 made of all the drives.

RAID 60
RAID 60 is the span formed by RAID 6 and RAID 0. RAID 60 is similar to RAID 50, previously described.

NOTE RAID50 and RAID60 performance results are almost the same as RAID5
and RAID6 performance results, respectively; this document does not
discuss RAID50 and RAID60 explicitly in detail.
Performance and Volume Type

Read Performance
Redundancy allows the same data to be present in more than one location, so reads can be load balanced across different drives. Read performance scales with the level of redundancy. For example, R1 has two drives with the same data. The data can be read simultaneously from both drives, so you can achieve up to almost twice the single-drive performance.

Write Performance
Striping (R0, R10, R50, or R60) allows data to be written to more than one drive in parallel. The write performance of the volume scales with the number of data drives present in a stripe.

Parity Generation
Parity generation in the R5, R6, R50, and R60 modes adds overhead to the writes and limits write performance. The ROC IC hardware RAID modules can compute these XOR (parity) calculations at a faster rate than firmware can, so the ROCs provide higher R5, R6, R50, and R60 write performance than the IOCs. However, the data still must be cached, which requires the firmware, so the performance is lower compared to hardware FastPath I/Os.

4.2.3 Strip Size

Strip size determines how much overhead is involved in a write operation. In general, the lower the strip size, the higher the number of striping operations per I/O, and the lower the performance. That is, the higher the strip size, the better the performance. However, performance is negatively impacted if host commands are larger than the strip size or if multiple random I/Os land within the same stripe (strip size × number of data drives). You can improve performance by matching the strip size to the expected I/O size. The 256-KB default size provides a compromise for general operation with small, random I/O and large, streaming I/O.
6Gb/s SAS MegaRAID controllers used a 64-KB strip size as the default. Newer MegaRAID controllers use a 256-KB strip size as the default. iMR controllers support only a 64-KB strip size. The maximum strip size that MegaRAID supports is 1024 KB.
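As a worked example (drive count chosen for illustration): with a 256-KB strip size on a 4-drive RAID 0 volume, a full stripe is 256 KB × 4 = 1024 KB, so a 1-MB sequential host I/O engages all four drives in parallel, while a 64-KB random I/O lands within a single strip on one drive.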

4.2.4 Cache Policy

Cache can improve read and write performance. However, if the cache throughput is the limiting factor when I/Os use the cache, the cache becomes the bottleneck. Cached I/Os cannot use the hardware FastPath engine, so performance might be lower.
- Read Ahead and Write Back modes use the cache for reads and writes, respectively. This combination suits HDD volumes, but not SSD volumes. Using the cache at the front end boosts performance by flushing the cache later in the background, because accessing rotational HDDs is significantly slower than cache accesses.
- No Read Ahead and Write Through modes avoid using the cache for both reads and writes. If no parity generation exists, this setting helps the I/Os use FastPath and improves performance. This combination suits SSD volumes, but not HDD volumes. Accessing SSDs directly gives better performance and latency than going through the cache.
- Read Ahead with Write Through, and No Read Ahead with Write Back, help only reads or only writes, respectively. Because these two modes use the cache, the I/Os cannot use FastPath and performance decreases.

4.2.5 Disk Cache Policy

Enabling disk write cache (physical drive cache for MegaRAID) helps write performance. However, enterprise servers might want to keep the disk write cache disabled to avoid the data integrity issues that can arise if a drive loses power abruptly. Using disk write cache might not be advantageous for SSDs. Disk write cache is advantageous for HDDs because cache writes are significantly faster than writes to the rotating media.

4.2.6 I/O Policy

The I/O policy allows Cached I/O and Direct I/O modes. Cached I/O retains the write data in the cache, whereas Direct I/O releases the cache line after the writes to the disks complete. For consistent performance, use the Direct I/O policy, because Cached I/O performance might vary over time depending on the availability of cache lines and on what I/O already resides in the cache as dirty data.

4.2.7 Consistency and Initialization

Initialization is the process of writing volumes with zeros. Initialization is important for consistent performance, especially for volumes that require parity generation. Consistent volumes do not need the additional read-modify-writes that inconsistent volumes require.
If initialization is running in the background, as explained in the following section, performance is affected. Wait until
the initialization finishes before you run the actual performance tests.

4.2.8 Background Operations

Background operations significantly impact performance because benchmarking tools do not account for such I/Os.
Background Initialization, Patrol Read, Consistency Check, Rebuild, and Reconstruction operations should not run
while measuring I/O performance. Make sure these operations are disabled or completed, and are not scheduled to
run during the performance measurement.
MegaRAID provides options to control the percentage rate at which these background operations are issued. However, these percentages do not represent the exact bandwidth that the background operations consume. For example, setting the rebuild rate to 30% does not mean that 70% of the bandwidth is used for normal I/O. It only means that the wait time before submitting the next Rebuild command is set to 70% of its maximum wait time. The options control only the submission of the commands to the drives; background operations are usually handled when the controller is idle. If many I/Os are already in progress, those I/Os might continue to proceed at almost the full rate.

4.2.9 MegaRAID FastPath Software

Avago MegaRAID FastPath software is a high-performance I/O accelerator for SSDs that can be enabled so that a hardware I/O accelerator handles I/Os without firmware involvement. The FastPath feature is available in some 6Gb/s Avago MegaRAID SAS controller cards and in all 12Gb/s Avago MegaRAID SAS controller cards through the purchase of a software license. Consult the product documentation for any Avago MegaRAID card to determine its FastPath capability.
Using FastPath benefits certain workloads, depending on the RAID type and volume configuration. Though the FastPath feature can be enabled and available for all configurations, it is not always possible to use FastPath for every configuration. Therefore, the MegaRAID firmware uses FastPath only for configurations where its use makes sense. The following table identifies which configurations can use FastPath.

Table 8  FastPath Software Capability Matrix

RAID Level                          HDDs, or HDDs and SSDs    SSDs Only
                                    Reads    Writes           Reads    Writes
IT Adapter                          Yes      Yes              Yes      Yes
MegaRAID JBOD, iMR JBOD             Yes      Yes              Yes      Yes
RAID 0                              Yes      Yes              Yes      Yes
RAID 1 (two drives)                 Yes      No               Yes      No
RAID 1 (more than two drives)       No       No               Yes      No
RAID 10                             No       No               Yes      No
RAID 5 and RAID 50                  Yes      No               Yes      No
RAID 6 and RAID 60                  Yes      No               Yes      No

The Avago MegaRAID controller uses FastPath under the following conditions:

- The virtual drive is configured with Write Through, No Read Ahead, and Direct I/O.
- Cut-through I/O is enabled in the controller.
- No background operations, such as consistency checks, volume initialization, patrol reads, or copyback, are running (verify by using MSM or StorCLI).
- The controller runs in non-degraded mode, for the best performance possible.
- I/O operations are within a single RAID strip.
Avago MegaRAID topologies can support a mix of FastPath-enabled volumes and non-FastPath volumes. The firmware evaluates the FastPath capability on a per-volume basis, performing a media check to determine whether the underlying storage consists of HDDs or SSDs. If FastPath is enabled on a volume, the firmware does not touch the I/O in normal cases; the firmware is involved only in error cases. With FastPath software, an Avago MegaRAID controller can show substantial performance gains compared to non-FastPath configurations.

4.2.10 Guidelines on Volume Configurations for Better Performance

The following are guidelines only. Choose your options based on what best suits your application.
- HDD volume parameter settings:
— Stripe Size: 256 KB (default)
— Read Policy: Always Read Ahead
— Write Policy: Write Back
— I/O Policy: Direct I/O
— Access Policy: Read Write
— Disk Cache Policy: Unchanged
- SSD volume parameter settings:
— Stripe Size: 256 KB (default)
— Read Policy: No Read Ahead
— Write Policy: Write Through
— I/O Policy: Direct I/O
— Access Policy: Read Write
— Disk Cache Policy: Unchanged
— If prompted to enable SSD caching (CacheCade), respond No.
- Make sure your volumes are consistent before you run performance tests.
- Make sure no background operations are running.
- Make sure sufficient queue depth (Qd) is set from the benchmarking tools. Set the queue depths in accordance with the number of physical drives. A Qd of 8 per drive for a JBOD is not the same as a Qd of 8 for an 8-drive RAID volume; set the Qd to (8 × 8 =) 64 for the 8-drive RAID volume to get the same performance.
- You might need to increase the number of volumes present in a drive group to get better performance. You might also need to increase the number of threads/workers to match the number of virtual drives (volumes). For example, on a RAID 0 drive group made of 8 physical drives, it is better to create 2, 4, or 8 volumes and assign them to different workers instead of creating one volume and assigning it to one worker.

4.3 Software Tools

After you complete your storage topology and install the necessary operating system, you need software tools to set up and monitor your configuration and to measure performance. The following table summarizes the tools used in the Avago performance lab. Refer to the documentation of the product or tool of your interest to use the tool that best suits your need.

Table 9  Tools to Program and Configure the Storage Controllers and Expanders

Sasflash: Programs SAS controllers. Not for use with MegaRAID products. Use the sas2flash tool for 6Gb/s SAS and the sas3flash tool for 12Gb/s SAS.
Sas2praser: Merges the NVDATA files of the SAS controllers with their firmware images. Useful for making custom changes in NVDATA, then merging and flashing the result to the controller.
Storcli: Command-line tool to program, monitor, and manage MegaRAID 6Gb/s SAS and 12Gb/s SAS controllers. Useful for scripting and automation.
MegaCli: Command-line tool to program, monitor, and manage MegaRAID 6Gb/s SAS controllers. Useful for scripting and automation.
MegaRAID Storage Manager (MSM): GUI-based tool to program, monitor, and manage MegaRAID 6Gb/s SAS and 12Gb/s SAS controllers. Easy to start with and to use to configure different parameters.
MegaREC: Recovery tool for MegaRAID controllers. Useful if the controller is bricked.
Lsiutil: Internal tool with many debugging options for Avago controllers and expanders. Not for use with MegaRAID products.
Scrutiny: Customer tool for debugging and configuring Avago controllers and expanders. Officially supported for 12Gb/s SAS controllers and expanders.
Xtools (Xflash/Xutil/Xmfg): Xutil and Xflash flash the firmware and manufacturing images of 6Gb/s SAS expanders (g3xutil and g3xflash are for 12Gb/s SAS expanders). Xmfg creates a manufacturing image from XML files.

The following table describes benchmarking tools. The next chapter describes the commonly used benchmarking tools in detail.

Table 10  Benchmarking Tools

IOmeter: Easy-to-use, GUI-based benchmarking tool with synthetic workloads. Works with Windows and Linux, and supports command-line options as well. Not suitable if latency must be analyzed in depth.
VDBench: Command-line tool that is powerful for measuring latency at greater granularity. Supports Windows and Linux. Java-based benchmark tool, so it is easy to run on any OS. Provides text- and HTML-based results.
Fio: Command-line tool suitable for benchmarking, QA, and verification purposes. Provides different I/O engines and various result formats.
JetStress: Benchmarking tool that simulates a Microsoft Exchange database workload without a full Exchange installation. Suitable as a real-world workload simulator. Not a general-purpose tool.
TPC-C (Transaction Processing Performance Council – C): An online transaction processing (OLTP) benchmark. Simulates a complete environment where a population of terminal operators executes transactions against a database. Transaction-based tool.
TPC-E: Simulates the OLTP workload of a brokerage firm. The focus of the benchmark is the central database that executes transactions related to the firm's customer accounts.
Orion: Oracle Orion is a tool for predicting the performance of an Oracle database without having to install Oracle or create a database.

The following table describes system tools.

Table 11  System Tools

Windows – Perfmon: Performance monitoring tool that comes with Windows. Allows creating different performance counters to measure any performance parameter of interest; for example, plot the MBPS of an SSD over time to verify whether the preconditioning is sufficient.
Windows – Xperf: Comes as part of the Windows Performance Toolkit. Needs Windows Performance Recorder (WPR) and Windows Performance Analyzer (WPA).
Windows – msinfo32: Windows tool to collect all the information about the system. Allows saving the complete configuration to a file.
Linux – Xtrace: eXtended trace utility, similar to strace, ptrace, and truss, but with extended functionality and unique features, such as dumping function calls (dynamically or statically linked), dumping the call stack, and more.
Linux – mdadm: Linux utility to manage software RAID devices. Allows creating software-level RAID volumes on any non-RAID storage controller as well.
Linux – sar: A Linux command that writes to standard output the contents of selected cumulative activity counters in the operating system.
Linux – iostat: Linux command to report CPU statistics and I/O statistics for devices, partitions, and network filesystems (NFS).
Linux – blktrace: Linux command to generate traces of the I/O traffic on block devices.
Linux – blkparse: Linux command to produce formatted output of event streams of block devices.
Windows – diskpart: Windows tool to manage objects (disks, partitions, or volumes) by using scripts or direct input at a command prompt.

4.3.1 Linux Performance Monitoring Tools

When you run a performance test under Linux, you can monitor the system during the test. This monitoring can give additional insight into the system that might not be available through the performance test alone. Linux offers several command-line tools, installed by default, to help you monitor a Linux system.
Many tools discussed in this chapter are part of the sysstat package. This package is installed by default on many common Linux distributions or is available in the package repositories.

4.3.1.1 sar
The sar command-line utility reports the values of cumulative activity counters in the Linux operating system. Invoke sar by using one of the following two methods:
- Standalone: Looks for the current day's data and shows the performance data recorded for the current day.
- sar file: Invoked by using the -f flag and passing in a sar file; shows the performance data stored in that sa file.
You must enable sar logging to use sar. Many systems enable sar by default. Debian®-based systems might require that you modify /etc/default/sysstat to set ENABLED to true. Red Hat®-based systems enable sar by default, set to log 7 days of statistics. Use the following syntax:
sar <command flags> <# of seconds between each run> <# of times to run sar>
For example, sar -u 5 3 reports the cumulative real-time CPU usage every five seconds, a total of three times.
sar generates a wide range of statistics based on the command flags provided. The following command flags are the most common:
- -u: real-time usage of all CPUs
- -P: real-time usage of individual CPUs or cores
- -r: memory statistics, including free and used memory
- -b: I/O statistics, including transactions and bytes, broken down by reads and writes
- -d: I/O statistics for individual block devices
- -w: number of context switches per second
- -q: run queue and load average
- -s: report the data starting from the specified start time
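For example, to log per-device I/O statistics for a 10-minute test run (the flag combination and file name are illustrative), you might run:
sar -d -p 5 120 > sar_disk.log
The -p flag prints persistent device names instead of devM-N identifiers.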
4.3.1.2 iostat
The iostat command-line utility reports system storage input and output statistics and observes the time the devices are active in relation to their average transfer rates. iostat generates the following three report types:
- CPU utilization: global averages among all processors
- Device utilization: statistics for each physical device or partition
- Network filesystem: statistics for each mounted network filesystem
A typical iostat use is to run iostat during a performance test to help monitor the system in near real time. You can use the following command-line interval options to make iostat display either a single report or a continuous report at fixed intervals:
- iostat -d 5: displays a continuous device report every five seconds
- iostat -d 5 3: displays three device reports at five-second intervals
iostat can report statistics for specific devices only if the devices are passed in on the command line, as the following example shows:
iostat -d sda sdb 5 3: displays three device reports at five-second intervals for device sda and device sdb
The report interval to use with iostat depends on the test and the type of behavior that you want to observe. Frequent iostat reports (such as every 2 seconds) might be better suited to picking up small events during the test, while less frequent iostat reports (such as every minute) might give a better overall view of the system. iostat output shows in Linux stdout and can be redirected to a text file if necessary by using the > operator.
The following command flags are the most common:
- -c: display the CPU use report
- -d: display the device use report
- -n: display the network filesystem report
- -k: display statistics in KB/s
- -m: display statistics in MB/s
- -t: display the time of each report
- -x: display extended statistics
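Combining these flags, a typical invocation during a test run (device and file names are illustrative) might be:
iostat -d -x -m sda sdb 5 3 > iostat.log
This logs three extended device reports, in MB/s, at five-second intervals for device sda and device sdb.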
4.3.1.3 blktrace
The blktrace block-layer I/O tracing mechanism provides detailed information about request queue operations up to user space. blktrace includes three major components: a kernel component, a utility component to record the I/O trace information from the kernel to user space, and a utilities component to analyze and view the trace information.
First, mount the debug filesystem: mount -t debugfs debugfs /sys/kernel/debug
Second, run: blktrace -d dev [-r debugfs_path] [-o output] [-k] [-w time] [-a action] [-A action_mask] [-v]
The parameter options include filter masks, buffer information, tracing information, network information, file, and versions. Refer to the blktrace Linux man page for details and examples: http://linux.die.net/man/8/blktrace.
4.3.1.4 blkparse
The blkparse utility interprets the blktrace output for metrics such as the queue depth (Qdepth), and so on.
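As a minimal end-to-end sketch (device and file names are illustrative), the following captures 30 seconds of traffic on /dev/sda and then formats it:
mount -t debugfs debugfs /sys/kernel/debug
blktrace -d /dev/sda -o trace -w 30
blkparse -i trace
blktrace writes per-CPU files named trace.blktrace.N, which blkparse merges into a readable event stream.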
4.3.2 Windows XPerf

Prerequisites
Before you can collect traces, you must install the .NET Framework 4.5 and the Windows SDK for Windows 8 or later. After installation, two new programs appear in the Start menu: Windows Performance Recorder and Windows Performance Analyzer.
The Windows SDK provides the XPerf tool, which collects various performance traces on Windows. The Windows SDK for Windows 8 provides a graphical user interface tool to record and analyze performance traces. Use the following steps to record performance traces using Windows Performance Recorder:
1. Run the Windows Performance Recorder tool by navigating to Start > Windows Performance Recorder.
2. Click More Options to see additional performance profiles.
3. Select CPU Usage and Disk I/O Activity, as shown in the following figure. Select additional profiles if necessary.

Figure 17 Profile Selection

4. Use the drop-down options to select the appropriate Performance scenario, Detail level, and Logging mode. General, Verbose, and File, respectively, are good choices for generic scenarios.
5. Click Start to begin recording.
6. Start any I/O generators or benchmarking tools to send I/Os to disk. For example, run IOmeter with workloads that might uncover the performance issues currently being debugged.
7. After the appropriate test runs finish, click Save on the Windows Performance Recorder tool. Save the trace to a convenient location when asked, as shown in the following figure.
Figure 18 Save Test Example

8. Run the Windows Performance Analyzer tool and load the trace file saved in the previous step. The Performance Analyzer might take a few minutes to load and analyze the trace.
9. Double-click Graph Explorer to see all available traces.
10. Right-click the traces of interest and select Add Graph to Analysis View to add them to the Analysis view for further analysis, as shown in the following figure.
Figure 19 Graph Explorer Example

11. After you add all traces of interest to the analysis view, open the View Editor and make appropriate changes to
customize the view. The following figure shows the View Editor icon.

Figure 20 Open the View Editor
4.3.3 Windows Performance Monitor (Perfmon)

The Windows operating system ships with a performance monitor to trace important performance counters and gain deeper insight into the storage subsystem. Perfmon has an easy-to-use GUI to create and run performance monitoring tasks. Follow these steps to create a data collector profile and start a monitoring task:
1. Run perfmon from the command line or the Windows Run dialog to launch the Perfmon GUI.
2. On the left of the Performance Monitor GUI, expand Data Collector Sets and right-click User Defined.
3. Select New > Data Collector Set.
4. In the screen that appears, enter a new name for the data collector set. Select Create manually (Advanced) and click Next.
5. In the screen that appears, select Create data logs and select Performance counter.
6. Click Next.
7. In the screen that appears, click Add to add performance counters of interest to the data collector set.
8. In the screen that appears, scroll to the Physical Disk section and click the down arrow to list the counter options.

Figure 21 Physical Disk Option

9. Select all Physical Disk counters and double-click or click Add. Apply them to either only the storage devices you
are measuring, or select <All instances>.
Figure 22 Add Performance Counters

10. After you add the required performance counters to the list, click OK.
11. Complete the remaining Data Collector Set wizard actions. The new data collector set appears on the left of the Performance Monitor GUI, under Data Collector Sets > User Defined.
12. Select the data collector set that you created and right-click DataCollector01 in the right portion of the GUI.
13. Select Properties from the pop-up menu.
14. In the screen that appears, select the desired Log format and Sample interval. The binary log format can be viewed by using the performance log viewer GUI that Windows provides. If you must process the data, the comma-separated values (CSV) format is recommended.
Figure 23 Properties

15. Click OK.
16. Right-click the data collector set and select Start from the pop-up menu to start the monitoring task.
Figure 24 Start Collector Set

Optionally, you can run the following commands to start and stop performance monitoring from the command line:
— logman start <Data Collector Set name>
— logman stop <Data Collector Set name>
Perfmon starts logging data to the output directory specified when you created the data collector set.

When the performance monitoring stops, the results are stored in a DataCollector.csv file that you can import into Excel for analysis and graphing.
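You can also create a data collector set entirely from the command line; the set name, counter path, sample interval, and output path below are illustrative:
logman create counter DiskPerf -c "\PhysicalDisk(*)\*" -si 5 -f csv -o C:\perflogs\DiskPerf
logman start DiskPerf
The -si option sets the sample interval in seconds, and -f selects the log format.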
Chapter 5: Benchmark Resources

After you set up and configure your test hardware, choose a benchmark that best suits what you want to measure. Your benchmark must measure the metrics of interest for a sufficient duration and with sufficient granularity.
Each benchmarking tool has its own merits and demerits. Evaluate different benchmarks and choose the tool that produces results with the set of metrics that best matches your real-world workloads.
After you choose the benchmark, configure the input parameters and output formats correctly so that it captures all the relevant metrics properly. It is good practice to have a simple, short test that is a good indicator of your actual runs. Before you run your actual test, run this sample test first to check the maximum IOPS and MBPS for different performance corners of your configuration. Compare the results of this sample test against your expected results to make sure they match. Sample test advantages include the following:
- Proves your topology is free of any obvious issues.
- Verifies that the input parameters of the benchmarking tool are as expected.
- Verifies that the output results format is as expected.
If your setup has any issues, sample tests catch the issues quickly, saving time. Without these tests, you might see a problem only after the actual performance test completes, which can be lengthy.
The following sections provide detailed explanations of how to install, run, and interpret results for select commonly used benchmark tools.

5.1 Benchmarking Basics

This section discusses basic parameters related to benchmarking and their impact on performance.

Workers or Threads
Benchmarking tools use threads to send I/Os and to measure the performance metrics. IOmeter uses workers, whereas Linux tools use threads.

Managers and Instances
A manager is one instance of the benchmark tool and can have one or more workers. You can have more than one manager; in Linux, you can run multiple instances of the benchmark. At the end, however, you must merge the results from all the managers or instances to get the complete result. Typically only one manager is needed, but you might require more than one manager when multiple controllers are benchmarked at the same time, or when multiple unrelated metrics are measured at the same time.

Queue Depth
The number of outstanding I/Os per drive. The benchmark tools allow direct modification. The Qd can be set for a physical drive, or for a logical drive in the case of RAID volumes.
Before running benchmarks, identify the minimum Qd at which the drives are saturated and provide maximum performance. Use a Qd that is equal to or greater than this minimum Qd for your benchmarking runs. Multiply this Qd by the number of physical drives to select the Qd for RAID volumes.
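For example (numbers illustrative): if a single SSD saturates at a Qd of 32, use a Qd of 32 per drive for JBOD runs and a Qd of 32 × 8 = 256 for a RAID volume built from 8 such drives.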
I/O Type
I/Os can be sequential, random, or a mix of sequential and random. For HDDs, random performance is usually drive limited; HDDs tend to give very high sequential performance and low random performance. For SSDs, sequential and random performance are at similar levels.
Real-world workloads are usually mixed I/O types. Synthetic benchmark tools such as IOmeter allow building complex workloads that mimic real-life workloads.
I/O Size
I/O size is the size of the I/O used to measure the performance metrics. Avago standard runs use I/Os from 0.5
KB to 1 MB or 4 MB. 0.5 KB, 4 KB, or 16 KB are better candidates to check the maximum IOPS. 256 KB and 1 MB
are better candidates to check MB/s limits.
The I/O size influences performance in different ways because the storage controllers might have features to
optimize performance for specific I/Os. For example,
 Small I/Os may be coalesced to get higher performance
 Large I/Os may be broken in to smaller ones because of the maximum I/O size limitation of the controller.
MegaRAID controllers can support up to 252 KB natively, I/Os above 252 KB are split in to multiple I/Os
which causes overhead and the performance might reduce a bit for larger I/Os.
 Storage controller may use Big Block Bypass, a feature that by passes larger I/Os from reaching to the
Cache to optimize the cache usage and gain larger MB/s.
I/O Direction
I/O direction means the direction in which the data flow. Direction can be read, write, or a mix of reads and
writes. Benchmarks usually let you modify the % of Reads/Writes to define the I/O direction. For HDDs, usually
the write performance reaches higher maximum levels compared to reads. For SSDs, the write performance is
lower as it can involve additional erasing and garbage collection.
Ramp Time
The duration when the I/Os are sent but the measurement is NOT made. This time is important to avoid the
transients that can occur at the start of the test. Allow sufficient ramp time before each actual test run.
Run Time
The duration when the I/Os are sent and performance metrics are measured. A run time that is too short
affects the consistency of the performance results; a run time that is too long increases the overall
measurement time. There must be a trade-off to find the right run time. SSDs might need a longer run time,
and tests that might have cache influence need longer ramp and run times.
Scaling
Scaling means stepping a certain parameter in either a serial or a parallel fashion. The parameter that
scales may be the number of drives, the number of workers, Qd, and so on. An example of serial scaling is
adding drives in steps, irrespective of workers. An example of parallel scaling is adding one drive per worker
at every step; if there are 8 workers, drives are added to all the workers at each step. The steps can be linear
or exponential. For example:
In a 60-drive configuration, for a specific I/O such as 64-KB SW, the performance can be monitored for
drive scaling with a linear step of 5 drives at a time. This helps ensure all the drives are used well, the
system scales well up to 60 drives, and it performs optimally for any number of drives.
Single-drive performance may be monitored for Qd scaling with exponential scaling from 2 to 256 in
steps of powers of 2. This helps choose the right Qd for each physical drive for the actual
performance runs.
Outliers
Outliers in performance are always possible. An outlier is a sample that behaves differently from the other
measurements. An outlier could show itself between runs, or between different tests of the same I/O.
Outliers usually indicate an issue with the device design or an unaccounted variable during the
measurement. Repeat the tests for the same configuration, or scale the performance for different I/O sizes,
drives, Qd, and so on, to find such outliers.


5.2 Iometer for Windows

Iometer is an I/O generator, measurement, and characterization tool for single and clustered systems. Iometer provides
an easy-to-use GUI, command line options, and the option to run in batch mode. Iometer is not a real
end-user application, but is a tool with which to probe storage performance. You can run benchmarking on a local
system or from a remote client over a network. This section discusses Iometer capabilities relative to storage
performance only.
Avago uses the latest version, Iometer 1.1.0, available at http://sourceforge.net/projects/iometer/, because it
provides varied entropy for the I/O data pattern. Entropy (randomness in data patterns) can impact some SSDs; some
SSDs tend to give very high performance when the same data is written again and again. Therefore, it is important to
test with high entropy so the measured performance better represents real-world performance.
Iometer works on a client-server model with two parts: Dynamo and the Iometer GUI. When you start the Iometer
GUI, Dynamo starts.
After you define your topology, you have managers (Dynamo instances) with one or more workers (threads), and each
worker is assigned a specific number of targets (physical drives or logical volumes). An I/O profile (access
specifications), which can be saved as a configuration file (*.icf), is run on these targets to obtain results
(results.csv) in comma-separated-value files. These results also show in the GUI when you run the tests.
In batch or command line mode, the command to obtain results might look like
iometer /c iometer.icf /r results.csv /t 100
Check everything in the GUI mode first, and create and save the configuration file before you run the tests in
batch mode.

5.2.1 Run Iometer

Prerequisites
Verify that the controller or expander is plugged in and functioning with the system, and that Iometer is installed on
the same system as the controller or expander.
Use the following steps and the Iometer User's Guide to set up and run an Iometer test.
1. Verify that the driver for the controller is installed.
2. Set up your storage controller in a topology of your interest, including the necessary drives.
3. Go to Start > Control Panel > Device Manager > Disk Drives and verify that all your drives are listed.
4. Go to Start > Control Panel > Administrative Tools > Computer Management > Disk Management.
Some of your drives might be listed as unknown/not initialized. Right-click the drive and select Initialize to
select and initialize all the drives. If not selected by default, select all the drives that you need to initialize. Use
MBR (Master Boot Record) as your partition style. Use GPT (GUID Partition Table) for Itanium-based systems or
if the disk is larger than 2 TB.
You can run I/Os on GPT and raw partitions. If Iometer does not detect the drives by default when you start
Iometer, you can manually start Dynamo with the /force_raw command line option, which forces Dynamo to
report all raw disks regardless of the partitions contained within them (see the example at the end of this step).
Now your disks should be listed as basic and online. When you reboot your system, you might have to reinitialize
some drives. Make sure all your drives are basic and online before you start your tests.
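For example, a minimal invocation (run from the Iometer installation folder; the path is assumed) is:
dynamo /force_raw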
5. Run iometer.exe.
6. Use the steps in Iometer User’s Guide to operate the GUI.
The following list calls out specific choices made by Avago during the Iometer set up:


— On the Disk Targets tab, set the number of outstanding I/Os in the # of Outstanding I/Os field. This value
corresponds to the queue depth (Qd).
— Leave the Maximum Disk Size field at 0, so all the virtual drive capacity is exercised.
— Do not enter changes in the Network Targets tab.
— To run small I/Os of sequential reads, choose the 512B; 100% Read; 0% random option from the Global
Access Specifications box and click Add to move the test into the left panel, Assigned Access
Specifications. You can also add additional workloads of 4 KB or 16 KB so that, for example, you have three
workloads for your test. You can create custom access specifications by using the New, Edit, or Edit Copy
option. For example, OLTP might require 70% reads and 30% writes, and such workloads are not defined by
default (see the sample specification after this list).
— In the Test Setup tab, set a proper Run Time (30 to 60 seconds) and Ramp Up Time (10 to 20 seconds). The
aim is to wait a sufficient amount of time for your results to stabilize and average out. Thus you compensate
for the transients that occur while switching between tests.
— In the Test Setup tab, keep the Cycling Options choice as Normal (run all selected targets for all
workers), or choose any other option that best suits your need.
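For example, a hypothetical OLTP-style access specification (the 8-KB size is an illustrative assumption) could
be defined as: 8 KB; 70% Read; 100% random.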
7. Click Start Tests, the green-flag button, to start the test.
8. In the Access Specifications tab, the currently running test shows in green. The test number and remaining time
are listed at the bottom-right of the Iometer GUI. If you must skip to a specific test, click Stop on the prior tests.

If you see errors, check your drives, cables, and connectors. Swap bad components from your setup with good
components before you run the actual tests.

5.2.2 Iometer Tips and Tricks

 Obtain an .icf file from your Avago FAE as a starting point for your Iometer testing.
 Always set the Ramp and Test times, because if you leave the default 0/0 setting, your tests will not progress. Only
the first test will run, and it runs until manually stopped.
Ramp time should be sufficiently long for the test that follows. For example, SSDs need preconditioning, after
which you might need to set the Ramp Time to one minute and the Run Time to two minutes.
Do not assign the same target to multiple workers. This action can give unexpected results when you run sequential
I/Os. When multiple workers send parallel I/Os sequentially, the end result might look similar to random I/Os
from one worker.
 Queue depth (Qd) is the maximum number of outstanding I/Os that can be queued for each drive. Qd is usually
set at eight for each drive. You can increase the Qd to decide the optimum Qd for your test.
 For RAID volumes, set the Qd based on the number of drives present in the volume. For example, for four drives
set Qd = 32 to compare the performance with four drives with 8 Qd in JBOD mode.
With Iometer 2006, systems with a CPU clock speed of 2 GHz or higher do not report accurate
performance metrics. Always use the most recent version, Iometer 1.1.0 or newer.

NOTE Avago does not report the results from the following test option in the
final test results. Use the following test to test your adapters, not for
actual performance reporting.
Set Max Disk Size close to the disk cache size to give an advantage with a specific I/O size and gain maximum
performance. For example, Seagate Savvio® 15K.3 HDDs tend to give higher performance (about 400 MB/s versus
about 190 MB/s normal) with 256-KB sequential read I/Os when the Max Disk Size is 1000 sectors. This setting
might be handy when you need to get more performance from fewer drives during initial setup or troubleshooting.
The I/Os are completely handled from the cache, not from the media. Do not treat these performance numbers as
actual HDD performance.


5.2.3 Interpret Iometer Results

Iometer displays the data in real time on the GUI and can save the data to a file in comma-separated values (CSV)
format. This CSV file can be post-processed to harvest the required information. Because CSV is an ASCII
format, it can be viewed with any standard text editor; however, the raw CSV file can be difficult to understand. The
following example is a section from an Iometer CSV output file.

Figure 25 Iometer CSV Output Example

You can import CSV files to Microsoft Excel worksheet for faster and easier post processing. Use the following steps to
import a CSV file to Microsoft Excel and format it for easy consumption.
1. In Microsoft Excel, select File > Open.
2. Locate the CSV file and click Open. If you do not see your CSV file, try changing the file type to All Files or Text
Files.
3. Select the row with the column headers and select Data > Filter to add filter options to the headers.


Figure 26 Filter Options Example

4. Filter the first column by MANAGER, as shown in the following figure.


Figure 27 Column Filter Example

5. Hide any columns that are of no interest to better organize the data.
In this example, only 10 columns with the most important data appear in the worksheet. The column names are
self-explanatory.
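As a command-line alternative to manual filtering, the following sketch (assuming the manager rows begin with
the literal string MANAGER, as in the filter step above) extracts those rows on Windows:
findstr /B MANAGER results.csv > managers.csv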


Figure 28 Iometer Worksheet Example

Continue to filter or review the worksheet. Save the file as an Excel file when you finish.

5.2.4 Iometer References

Download IOmeter 1.1.0, Most Recent Version


http://sourceforge.net/projects/iometer/
IOmeter User Guide
http://sourceforge.net/p/iometer/svn/HEAD/tree/trunk/IOmeter/Docs/Iometer.pdf

5.3 Vdbench

Vdbench is a command line program that generates disk I/O workloads to validate storage performance and data
integrity on direct-attached storage. Vdbench can also provide detailed latency information via a histogram, and
can generate workloads with varying intensities. When testing SSDs, Vdbench permits accurate specification of
data entropy (randomness of the data pattern).

NOTE Vdbench does not properly issue high queue depth, small block
sequential I/Os. Instead, Vdbench begins to randomize the I/Os and
reports very low performance if you test multiple devices in JBOD
mode where very high performance, such as 1-M IOPs, is expected.
RAID testing is not affected. Separate Linux limitations keep small
block sequential performance below the levels seen in Windows; this
behavior is not a bug.
Oracle develops and maintains the program. Use the following resources:


 Package download, User Guide, source code, and discussion forum:


http://www.oracle.com/technetwork/server-storage/vdbench-downloads-1901681.html
 SNIA™ Vdbench slide deck overview:
http://snia.org/sites/default/files/Emerald%20Training%20-%20VDBENCH%20Overview.pdf
Avago recommends using Vdbench 5.04, or no older than 5.03 rc11.

5.3.1 Install Vdbench

Use the following steps to install the Vdbench program.


1. Download the Vdbench package from the following location,
http://www.oracle.com/technetwork/server-storage/vdbench-downloads-1901681.html, into an empty folder.
2. Unzip the package.
3. Move the entire unzipped package to the system on which you will run the benchmark.
4. Select the folder that matches your operating system and copy the contents into the folder that contains all the
Vdbench files. These are the Vdbench system and operating-system specific components.
5. Install a Java® Runtime Environment on the system on which the benchmark will be run. You can download JREs
for various operating systems from many sources. For example, at:
http://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html
6. Get the JRE 7 SE package.
7. Test the installation by entering the java -version command using one of the following prompts:
— In a Windows operating system, open a CMD window.
— In Linux, open a Term window.

A returned version number confirms that the JRE is installed.


8. Next, move to the folder in which Vdbench is installed. In the command or term window, type vdbench -test.
If Vdbench installed correctly, a short test runs with output similar to the following output:
Vdbench distribution: vdbench503rc9
For documentation, see 'vdbench.pdf'.
17:09:00.028 input argument scanned: '-f/tmp/parmfile'
17:09:00.090 Starting slave: /root/Desktop/FB_Vdb_2E512-4x/vdbench SlaveJvm -m localhost
-n localhost-10-011231-17.08.59.982 -l localhost-0 -p 5570
17:09:00.429 All slaves are now connected
17:09:01.002 Starting RD=rd1; I/O rate: 100; elapsed=5; For loops: None
Dec 31, 2001   interval   i/o     MB/sec   bytes  read   resp   read   write  resp   resp    queue  cpu%   cpu%
                          rate    1024**2  i/o    pct    time   resp   resp   max    stddev  depth  sys+u  sys
17:09:02.085   1          87.00   0.08     1024   54.02  0.008  0.006  0.011  0.019  0.003   0.0    4.0    0.6
17:09:03.014   2          100.00  0.10     1024   52.00  0.008  0.005  0.010  0.018  0.004   0.0    1.9    1.1
17:09:04.053   3          71.00   0.07     1024   45.07  0.008  0.005  0.010  0.021  0.003   0.0    0.8    0.1
17:09:05.053   4          108.00  0.11     1024   52.78  0.008  0.005  0.010  0.019  0.003   0.0    0.3    0.1
17:09:06.058   5          92.00   0.09     1024   57.61  0.006  0.005  0.009  0.013  0.003   0.0    0.3    0.0
17:09:06.083   avg_2-5    92.75   0.09     1024   52.29  0.007  0.005  0.010  0.021  0.003   0.0    0.8    0.3
17:09:07.646 Vdbench execution completed successfully. Output directory:
/root/Desktop/FB_Vdb_2E512-4x/output


If both tests run (java -version and vdbench -test), Vdbench is properly installed.

5.3.2 Run Vdbench

Prior to running any benchmark, you must be certain that the target drives of the test are not drives that contain
critical system information, such as the OS.
1. Verify that Vdbench is properly installed.
2. Verify the system is set up for the environment that you wish to test.
3. Create the <test_name> folder.
4. Run Vdbench.
You can run Vdbench from the command line, but use of a parameter file enables you to build complex workload
and test descriptions.
Tests show that for extremely high IOPs testing with Vdbench, you can achieve the best performance by binding
Vdbench to a single CPU socket (not core) with the following execution line:
numactl --cpunodebind=x vdbench...
You can also force Vdbench to use more JVMs than required to avoid a CPU core bottleneck. For example, use
the recommended JVM count of 8 by entering -m 8 in the Vdbench execution line.
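For example, a combined invocation (the node number and parameter file name are illustrative; see the sample
script in Section 5.3.3) might look like:
numactl --cpunodebind=0 ./vdbench -f 4K_Parm.prm -o 4K_Out -m 8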

Vdbench creates many files as a result of its execution. The Vdbench Users Guide and Section 5.3.4, Interpret Vdbench
Results provide details.

5.3.3 Sample Vdbench Script

To run 4-KB sequential read I/O to two disks for a 60-second test in Linux, run
./vdbench -f 4K_Parm.prm -o 4K_Out
where the parameter file, 4K_Parm.prm, contains the following lines (a second storage definition is added here
so the script matches the two-disk description; adjust the device names to your configuration):
sd=s1,lun=/dev/sdb,openflags=o_direct
sd=s2,lun=/dev/sdc,openflags=o_direct
wd=wd1,sd=(s1,s2),xfersize=4K,rdpct=100,seekpct=0
rd=rd1,wd=wd1,iorate=max,forthreads=32,elapsed=60,interval=1
For this example, the results go to the 4K_Out folder.

5.3.4 Interpret Vdbench Results

Vdbench creates many files as a result of its execution. The flatfile.html and histogram.html output files
are the most important. flatfile.html contains the throughput, latency, and test condition information for each
output sample. You can import the file into Excel for post processing. histogram.html contains a latency
histogram for each test which is useful in analyzing components of the average and maximum latency metrics.
The other output files provide additional runtime and debug information.

5.4 Jetstress

Microsoft distributes the Jetstress tool that simulates a Microsoft Exchange database workload without a full
Exchange installation. Jetstress verifies the performance and stability of a system prior to a full installation of Microsoft
Exchange in a production environment. Avago uses Jetstress as a system level performance benchmark tool in the
Microsoft Windows environment to help simulate a real-world workload more closely than synthetic benchmark tools


such as Iometer. Jetstress does only one kind of I/O, simulating an Exchange email environment, so it is not a
general-purpose tool.
Jetstress is available on the Microsoft website for free. Microsoft releases a new version of Jetstress with each major
release of Microsoft Exchange Server. Jetstress 2013 is the current version, released to coincide with Microsoft
Exchange Server 2013. Each Jetstress version includes changes unique to the Microsoft Exchange Server version with
which it releases. Make sure to use the appropriate Jetstress version for the Microsoft Exchange version deployed in
your production environment.
Refer to the Jetstress 2013 Field Guide for Jetstress details, including detailed installation details,
http://gallery.technet.microsoft.com/Jetstress-2013-Field-Guide-2438bc12

5.4.1 Install Jetstress

1. Obtain the required Extensible Storage Engine (ESE) binaries. Use an installation or CD for Microsoft Exchange
Server, or download a trial version from the Microsoft website to get the necessary ESE binaries. Jetstress requires
the ESE binaries from its respective Microsoft Exchange Server install package.
Jetstress requires the following ESE binaries:
— ESE.DLL
— ESEPERF.DLL
— ESEPERF.HXX
— ESEPERF.INI
— ESEPERF.XML
2. Run the Jetstress.msi installer. Notice where Jetstress is actually installed on the system.
3. Follow the installation dialogs. Use the recommended default options that each step includes.
4. After the installation completes, copy the five ESE binary files from step 1 into the Jetstress installation folder.
5. Run the Jetstress tool so Jetstress can configure the performance library, objects, and counters.
This Jetstress configuration occurs on the first-run only.
6. Close Jetstress, then restart Jetstress to run your performance benchmarks.
7. Choose a test type. The remainder of this document focuses on the Disk Subsystem Throughput Test.
— Disk Subsystem Throughput Test. Determines the maximum performance for a storage solution when the
disks are filled close to capacity. Use for performance testing.
— Exchange Mailbox Profile Test. Determines whether a given storage solution can meet or exceed the
requirements of a given Exchange mailbox profile specified in terms of users, IOPs per mailbox, and quota
size. Use to reproduce a specific customer scenario.

Use the following sections and the Jetstress Field Guide to create your Jetstress test.

5.4.2 Create your Jetstress Test

5.4.2.1 Select Capacity and Throughput


Capacity and throughput control how much back-end storage is used and the intensity of the Exchange
workload applied to the storage.
Storage capacity
The percentage of the back-end storage that supports the Exchange database. Using at least 85 percent of the
storage capacity for a valid throughput test permits full-stroking of the back-end storage.


Throughput
The throughput capacity percentage used to achieve a target IOPs. It is recommended to leave this value at
100 percent to obtain the maximum IOPs possible from the storage subsystem.
Jetstress 2013 autotunes the benchmark for the maximum IOPs possible within the acceptable response time limits.
The application is tuned by varying the number of threads applying a workload to the storage subsystem. Additional
threads mean greater throughput, but come at the expense of increased latency metrics that might not be
acceptable for Microsoft Exchange. You have the option of not using the auto-tuning and manually specifying the
number of threads.
It is recommended that you use the auto-tuning feature the first time you run Jetstress to let the benchmark estimate
an optimal thread count. Subsequent runs of Jetstress can then use a manual thread count based on these results and
on whether a need exists to change the throughput level.

5.4.2.2 Select Test Type


Jetstress offers three test types:
 Performance (recommended)
 Database backup
 Soft recovery
You can enable or disable the following additional options:
Multi-host test
Only select the multihost test if you are on a shared storage platform with multiple servers.
Run background database maintenance
It is recommended to enable this option. The background database maintenance is an additional sequential
workload operating on the databases in addition to the Exchange operations from the worker threads.
Enable this option so Jetstress can more closely replicate a live Exchange deployment performing similar
background maintenance at all times.
Continue the test run despite encountering disk errors
If this option is enabled, the test report includes any disk errors.

5.4.2.3 Define Test Run


You can define how long to run the test. Use the following scenarios to determine your test length:
 When you attempt to adjust thread count manually or using the auto-tune feature, the test should be at least 30
minutes (specified as .50).
 When you execute a performance test, the test should be a minimum of 2 hours, with a recommended 8 hours.
 When you validate an Exchange deployment, perform a separate test of 24 hours.

5.4.2.4 Configure Databases


The database configuration that Jetstress uses should match the target Exchange deployment. If no target Exchange
deployment exists, adhere to the following recommendations to achieve the maximum performance in Jetstress for
the storage subsystem:
Number of databases
Match the number of databases to the number of volumes presented from the backend storage with each
database that resides on a unique volume.
Number of copies per database
It is expected that any live Exchange configuration has multiple copies of the Exchange database that reside
on the storage subsystem. The exact number of copies depends on the specific Exchange configuration.
Modifying this value in Jetstress only simulates additional Log I/O to mimic log shipping activity between


active and passive databases. It does not actually copy the logs. It is recommended to use at least three
copies for each database in Jetstress.
Database and log file location.
Assign paths for each database and log file. Place the database and its respective log file in the same location
unless the target Exchange deployment is configured differently.

5.4.2.5 Select Database Source


You can select from the following three options when you select a database source:
 Create new databases
 Attach existing databases
 Restore backup database
Typically the first run with a given configuration requires Jetstress to create new databases. Creating a new database
takes a long time; according to the Jetstress 2013 Field Guide, expect approximately 24 hours for each 10 TB of
data. Subsequent runs can attach existing databases because the databases are saved between individual Jetstress
runs. However, a chance exists that performance might degrade on a database with each additional run; therefore,
create a new database for each run if time permits.

5.4.3 Start the Test

Prerequisites
Complete Section 5.4.2.1, Select Capacity and Throughput through Section 5.4.2.5, Select Database Source.
On the Review & Execute Test dialog within Jetstress, take the following steps:
1. Click Save test.
This step saves the test parameters into an XML file so you can use or review the configuration for future
Jetstress tests.
2. Click Prepare test.
This step creates and initializes the databases if they do not yet exist, or checksums existing databases before you
use them for testing. The database initialization process can be lengthy depending on the database size.
According to the Jetstress 2013 Field Guide, expect approximately 24 hours for each 10 TB of data that
must be initialized.
3. On the screen that appears, click Execute test.
This step executes the test according to the specified configuration and stores the results in the specified
output directory.

5.4.3.1 Characterize the Jetstress Workload


Jetstress differs from other performance-oriented benchmarks like IOMeter in that Jetstress does not produce the
best performance numbers. Jetstress replicates an Exchange-type workload on a storage subsystem at a given
intensity level. The following distribution is the default for Exchange database operations in Jetstress:
 40 % insert
 35 % read
 20 % delete
 5 % update
The SluggishSessions variable also affects the workload. The SluggishSessions variable adds an additional
pause between each Jetstress task and permits additional tuning of the intensity level beyond the thread count. The
default SluggishSessions value is 1. Increase this value to decrease the number of IOPs achieved with the same
thread count.


You can modify all Jetstress parameters in the XML configuration file. Unless you have a very specific reason to
manually modify this file, do not do so.
Several streams exist in Jetstress during a test, each exercising a different workload pattern. While it is difficult to
duplicate the workload patterns exactly, you can approximate them. Database operations consist of 32-KB random
reads and writes using a mix of approximately two database reads for each database write. The circular log operations
are 4-KB sequential writes with 256-KB sequential reads to replicate the logs for each database instance. Background
database maintenance is a separate 256-KB sequential read workload.
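As a rough synthetic approximation (illustrative values inferred from the stream descriptions above, not an
official Jetstress profile), you could define three Iometer access specifications: a database stream of 32 KB;
67% Read; 100% random, a log stream of 4-KB sequential writes plus 256-KB sequential reads, and a maintenance
stream of 256 KB; 100% Read; 0% random.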

5.4.4 Interpret Jetstress Results

The output of a Jetstress test comes in several different files. For performance analysis, the
Performance_<date>.html file provides an easy way to read the test status in a single report. The report is
divided into the following sections:
 Test Summary
 Test Issues
 Database Sizing and Throughput
 Transactional I/O Performance
 Background Database Maintenance I/O Performance
 Log Replication I/O Performance

5.4.4.1 Transactional I/O Performance


The Transactional I/O Performance section displays the performance numbers for the Transactional I/O workloads
going to each Microsoft Exchange database instance. The following parameters are important:
 I/O Database Reads Average Latency (ms)
 I/O Database Writes Average Latency (ms)
 I/O Database Reads/sec
 I/O Database Writes/sec
 I/O Log Writes Average Latency (ms)
Note how close the actual latency metrics are to the latency requirements. Even if a test passes by meeting the
latency requirements, it can still be a concern if the latency metrics are so close to the limits that rerunning the
test could easily fail the criteria.
The sum of the I/O Database Reads/sec and Writes/sec adds up to the Achieved Transactional IOPs. If the storage
subsystem is allocated evenly between the databases, the read and write performance across databases should be
similar. Uneven performance might indicate a performance issue that requires further investigation
beyond Jetstress.

5.4.4.2 Background Database Maintenance I/O Performance


This section displays the background database maintenance for each database instance, but is not a factor in
determining whether a Jetstress test passes or fails. The Database Maintenance I/O Reads/sec value should be greater
than 0, which indicates that the Background Database Maintenance was active during the test.

5.5 fio for Linux

fio is an open source I/O tool from the Linux community for benchmarking and system stress tests. fio simulates
various I/O workload types with support for multiple I/O engines and system-level optimizations. fio interacts with
the Linux I/O layers, resulting in complex tuning methods constantly under improvement. The fio user interface is a
command line, so fio is not as visual as Iometer.
Download fio from http://freecode.com/projects/fio, which points you to the latest fio version. The tool is free and is
offered to the public under the GPLv2 license.
 Online fio Linux man page: http://linux.die.net/man/1/fio
 fio project Freecode site: http://freecode.com/projects/fio

5.5.1 Get Started with fio

Complete the following steps to get started with fio.


1. Install the libaio and libaio-devel libraries.
fio requires these libraries before fio is compiled in the following steps. Failure to install libaio and libaio-devel
libraries can cause fio to function incorrectly even if fio compiles cleanly.
2. Verify that you have a C compiler such as GCC with the necessary base libraries already installed and configured
on your machine.
3. Run ./configure, make, and make install to build and install fio.
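For example, on a Red Hat style system the sequence might look like the following (yum is an assumption; use
your distribution's package manager):
yum install libaio libaio-devel
./configure
make
make install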
4. Use the following guidelines when you create a job file.
— How you enter devices into the fio job file is crucial. You must adhere to the following format because
placing multiple devices on the same filename= line prevents fio from properly distributing I/O and results
in lower performance than expected.
[job1]
filename=/dev/sda
[job1]
filename=/dev/sdb
— Include the time_based parameter so fio adheres properly to the input run time (very important for long
run times, such as required for SSD preconditioning).
— For random I/O testing on SSDs, include the following global parameters because the fio random generator
repeats LBAs and remembers LBA locations, which falsely doubles the performance.
norandommap
use_os_rand=1
randrepeat=0
— Place the readwrite parameter in the global section if the same pattern goes to all devices.
— To achieve the high performance required for many small block I/O request size tests, you must execute fio
with the numactl command. See Section 5.3.2, Run Vdbench.
— The numactl command removes the need for the cpus_allowed option and is required for high
performance. The use of both options is not recommended.

The following sample shows a basic job file used to issue 4-KB sequential write workloads to three
volumes. This sample demonstrates some important features that fio can control outside of the workload pattern,
including I/O engine and CPU affinity. The full available parameter set is covered in the fio man page,
available at http://linux.die.net/man/1/fio for reference. The package also includes a HOWTO file. Verify the
options against the man page for your specific fio version.
[global]
numjobs=1
bs=4k
ramp_time=15
runtime=45
direct=1
iodepth=32

readwrite=write
ioengine=libaio
group_reporting

[job1]
filename=/dev/sda

[job1]
filename=/dev/sdb

[job1]
filename=/dev/sdc
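A minimal invocation of this job file (the file name job.fio is an assumption) might be:
fio job.fio --output=results.txt
or, to bind fio to a single CPU socket as recommended above:
numactl --cpunodebind=0 fio job.fio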

5.5.2 fio Performance-Related Parameters

fio includes two performance-related parameter categories. The first category is workload parameters similar to other
tools like IOMeter. The second category is unique to fio and lets you optimize fio for the system on which you run. The
following workload parameters are important:
readwrite, rw
Determines the workload pattern as either read/write/mix and random or sequential.
blocksize
Specifies the block size for each I/O request.
ramp_time
The amount of time, in seconds, to run the workload before logging any performance numbers.
runtime
The amount of time to run the workload and log performance numbers.
iodepth
The number of I/O requests to keep in flight against a target. This value equals the queue depth parameter in
other tools.
thinktime
The amount of time, in microseconds, between issuing individual I/O requests.
The following parameters are unique to fio:
ioengine
Defines which I/O library issues the I/O requests. The I/O engine can largely impact the performance measured
under fio. The recommended Linux I/O library is libaio, which is the Linux native asynchronous I/O. If you want
to simulate synchronous I/O, use either sync or vsync. vsync coalesces adjacent I/Os into a single request,
which might affect the performance measured. You may use other I/O engines, but only on a case-by-case
basis, and they should be understood fully before being implemented in any test.
direct
Determines whether non-buffered I/O is used by fio. This parameter is the equivalent of the O_DIRECT flag
when opening a file in Linux.


fsync
Sets how many I/Os to perform before flushing the dirty data to the drive. By default this parameter is
disabled and no syncing occurs.
cpus_allowed, cpumask
These variables control which CPUs can be used for a job.
zero_buffers, refill_buffers, scramble_buffers
These settings determine what data is actually written to the targets during the test. Even if the data itself is
meaningless when running a performance test, some drives (SSDs in particular) might use data patterns for
compression. The default setting is to fill the buffers with random data and scramble them. Do not adjust
these settings except for a specific purpose.
rate, rate_iops
fio can cap the workload intensity based on the bandwidth or IOPs specified.
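For example, a minimal global-section sketch that caps each job at 5,000 IOPs (the values are illustrative
assumptions):
[global]
rw=randread
bs=8k
iodepth=8
rate_iops=5000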

5.5.3 Interpret fio Output

fio outputs the results of a performance run to stdout in Linux. Capture this output in a text file by using the >
operator on the command line in Linux or the --output option on the command line when you run fio.
Additionally, you can use the group_reporting option in fio, which can be specified with the other workload
parameters in the job file or on the command line. If you set the group_reporting option, the results display on a
per-group basis rather than on a per-job basis. It is recommended not to enable group_reporting because it
hides the individual results of each job, which might be useful for debugging purposes later. Use the --minimal
option to keep results on a per-job basis, but semicolon delimited. This option enables you to import the data into
a spreadsheet.
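For example (the job file name is an assumption):
fio --minimal --output=results.txt job.fio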
Extract the following metrics from a fio output:
 bw: The bandwidth measured during the test. This value can be expressed as KB/s or MB/s.
 iops: The IOPs measured during the test.
 slat, clat, lat: The submission, completion, and overall latency respectively expressed in terms of
minimum, maximum, average, and standard deviation.
 cpu: CPU use in terms of percent use by the user and system.
The following sample fio output is for a test with one device with an 8-KB random read and write mixed workload at
an iodepth of 8. The metrics are separated for the read and write components of the workload.
/dev/sdb: (g=0): rw=randrw, bs=8K-8K/8K-8K, ioengine=libaio, iodepth=8
fio 1.58
Starting 1 process
/dev/sdb: (groupid=0, jobs=1): err= 0: pid=5608
read : io=38246MB, bw=130546KB/s, iops=16318 , runt=300001msec
slat (usec): min=3 , max=174 , avg= 4.83, stdev= 1.49
clat (usec): min=35 , max=20358 , avg=416.31, stdev=411.26
lat (usec): min=51 , max=20363 , avg=421.82, stdev=411.26
bw (KB/s) : min=41456, max=146416, per=100.03%, avg=130586.26, stdev=11257.34
write: io=16382MB, bw=55916KB/s, iops=6989 , runt=300001msec
slat (usec): min=3 , max=171 , avg= 5.22, stdev= 1.56
clat (usec): min=21 , max=20315 , avg=147.51, stdev=152.05
lat (usec): min=46 , max=20320 , avg=153.40, stdev=151.98
bw (KB/s) : min=19072, max=63520, per=100.05%, avg=55940.49, stdev=4872.83
cpu : usr=11.82%, sys=13.98%, ctx=2584617, majf=0, minf=18
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=100.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.1%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%

issued r/w/d: total=4895500/2096847/0, short=0/0/0


lat (usec): 40=0.02%, 60=7.29%, 80=2.61%, 100=1.65%, 200=9.31%
lat (usec): 400=61.86%, 600=9.75%, 800=2.34%, 1000=1.19%
lat (msec): 2=2.46%, 4=1.50%, 6=0.02%, 8=0.01%, 10=0.01%
lat (msec): 20=0.01%, 40=0.01%
Run status group 0 (all jobs):
READ: io=38246MB, aggrb=130546KB/s, minb=133679KB/s, maxb=133679KB/s,
mint=300001msec, maxt=300001msec
WRITE: io=16382MB, aggrb=55915KB/s, minb=57257KB/s, maxb=57257KB/s,
mint=300001msec, maxt=300001msec
Disk stats (read/write):
sdb: ios=4895030/2096620, merge=0/0, ticks=1997589/293869, in_queue=2290517,
util=100.00%

5.6 Verify Benchmark Results for Validity

After you gather the results from your benchmark tools, verify the results for validity. Some performance results
might look as expected, yet anomalies during the performance runs can make some of the results invalid. In
such cases, it is a good practice to rerun the tests. You can also run tests in multiple sets, take the average, and check
the standard deviation. Comparing your results between runs helps you identify variables that change during the run
or between runs. Use the following guidelines to verify your results:
 Look for errors. Valid results do not contain any errors.
— Check the operating system log files and controller log files for errors. Clear the logs before any
performance test so you can easily check for new errors after the test.
— Benchmark tools such as Iometer provide an Errors metric, which should be zero. If the value is non-zero, find
the reason for the errors and rerun your test after you resolve the cause of the errors.
 When running multiple sets of tests, look for outliers that affect the average and standard deviation.
 CPU usage is a good indicator of other applications that burden the processors. The CPU use should not be
too high.
— If you suspect any issues, use Start Task Manager > Performance to view the CPU Usage graphs. Rerun your
tests and make sure the CPUs are loaded uniformly and none of the CPUs approach 100%.
— The Number of workers used for the test might be too low. For example, when the test is for drive scaling, the
performance might not scale after a specific number of drives, which usually overloads some of the CPUs and
keeps the others unused.
A change in the number of workers/threads might cause low or inconsistent performance. This problem is common
with Iometer because the number of workers assigned in a saved configuration file (*.icf) can change when you
reload that file for the next run. This change can occur if your topology or volume configuration changes, so the
saved parameters do not match the new configuration.
A sudden drop in performance for a short interval, or performance that stays at a lower level for a long time,
indicates background operations or transient errors during your tests. For example, a drive failure might cause a
rebuild and decreased performance. Or, an unsupported command to a drive or an expander that always
fails can reduce the performance if the same command is issued periodically.
Insufficient ramp time can cause decreased and inconsistent performance. Some configurations that involve
cache can have a longer transient time, so a longer ramp time or pause between tests is important. Without such
an interval, the average performance is lower and inconsistent across runs.


Chapter 6: Compare Measured Results with Expected Results


After you verify the results as valid, you can compare the results with the expected results calculated with the help of
Chapter 2. You can also compare the results against what the product vendor publishes. The following sections
provide example results for a few configurations that Avago usually uses in its Performance Lab for its
regression runs.
Example best-case performance results for various configuration options in the Avago performance tuning lab are
presented in the following sections. Your results might differ.

NOTE The values in this chapter are examples. Your configurations might not
exactly match and you might need to evaluate the numbers for your
configurations to compare with your actual results. Contact your FAE
for specific and recent product performance results.

6.1 Performance Result Examples for MegaRAID

The following topologies were used as examples:


 8 drive direct attached
 24 drive expander attached
 40 drive expander attached
The following are additional test configuration inputs for these examples:
 Avago tests each topology with SAS and SATA HDDs and SSDs.
 Only R0, R1, R10, R5, R6 configurations are tested. R50 and R60 volumes are not tested. Avago expects that the R5
and R6 results are representative of R50 and R60 results, respectively.
— HDDs use the following RAID settings: 256-KB stripe size, Write Back, Read Ahead, Direct IO
— SSDs use the following RAID settings: 64-KB stripe size, Write Through, No Read Ahead, Direct IO
 The results come from Iometer 1.1.0 under Windows Server 2008 Enterprise.
 The 8 SAS SSD data uses 12Gb/s SAS SSD.
 Configurations with 8 and 24 drives are generally disk-limited, and do not maximize all performance metrics.
 The 24 and 40 drive configurations use a LSISAS3x48 expander with DataBolt enabled.
 Configurations with 40 drives have sufficient drives to enable saturation of all throughput metrics with large I/O.
 To provide a realistic test point, the 24 and 40 drive configurations use two RAID volumes with the total drives
evenly split between the volumes.
 4-KB I/O size is selected to showcase the maximum IOPS
 256-KB I/O size is selected to showcase the maximum MB/s

6.1.1 Eight Drive Direct Attached Example Results

The following table gives maximum throughput results for eight-drive configurations that use 256-KB I/O. At 256-KB
I/O, it is possible to demonstrate maximum throughput if enough drives are present.


Table 12 LSISAS3108 Performance Results for One RAID Volume in MB/s
(MegaRAID 6.4; 256 Q per volume; HDD columns use Write-Back, SSD columns use Write-Through)

                     HDD       HDD       HDD       HDD       SSD       SSD       SSD       SSD
                     256KB SR  256KB SW  256KB RR  256KB RW  256KB SR  256KB SW  256KB RR  256KB RW
8 Drives, One RAID Volume
RAID 0, 8x SAS       1511      1494      366       373       5920      3575      5878      1211
RAID 0, 8x SATA      1385      1363      167       187       3501      2792      3396      1071
RAID 10, 8x SAS      1234      747       375       202       3895      1722      3340      607
RAID 10, 8x SATA     1049      674       170       94        3616      1398      3545      551
RAID 5, 8x SAS       1074      1301      367       116       5882      2423      5862      553
RAID 5, 8x SATA      1216      1173      167       58        3615      2062      3525      549
RAID 6, 8x SAS       998       1116      366       72        5955      2051      5838      440
RAID 6, 8x SATA      877       991       165       38        3618      1919      3418      473

The following table gives maximum throughput results for eight-drive configurations that use 4-KB I/O.

Table 13 LSISAS3108 Controller Performance Results for One RAID Volume in K IOPs
(MegaRAID 6.4; 256 Q per volume; HDD columns use Write-Back, SSD columns use Write-Through)

                     HDD    HDD    HDD    HDD    HDD     SSD    SSD    SSD    SSD    SSD
                     4K SR  4K SW  4K RR  4K RW  4K 67Ra 4K SR  4K SW  4K RR  4K RW  4K 67Ra
8 Drives, One RAID Volume
RAID 0, 8x SAS       379    381    3.8    5.3    4.2     415    382    319    145    270
RAID 0, 8x SATA      335    348    1.4    2.2    1.6     392    389    294    129    238
RAID 10, 8x SAS      190    102    3.9    2.8    3.4     407    314    277    72     186
RAID 10, 8x SATA     174    170    1.5    1.3    1.3     400    295    599    64     162
RAID 5, 8x SAS       331    333    3.8    1.5    2.5     415    190    282    37     109
RAID 5, 8x SATA      291    299    1.5    0.7    1       407    272    284    37     104
RAID 6, 8x SAS       281    286    3.8    0.9    2.3     416    285    281    22     74
RAID 6, 8x SATA      218    256    1.5    0.5    1.2     394    287    271    25     65
a. Refers to a 4-KB random 67% read, 33% write I/O sequence.

6.1.2 Twenty-four Drive Expander Attached Example Results

The following table gives maximum throughput results for 24-drive configurations that use 256-KB I/O. At 256-KB I/O,
it is possible to demonstrate maximum throughput if enough drives are present.


Table 14 LSISAS3108 Performance Results for Two RAID Volumes in MB/s
(MegaRAID 6.2; 256 Q per volume; HDD columns use Write-Back, SSD columns use Write-Through)

                     HDD       HDD       HDD       HDD       SSD       SSD       SSD       SSD
                     256KB SR  256KB SW  256KB RR  256KB RW  256KB SR  256KB SW  256KB RR  256KB RW
24 Drives, Two RAID Volumes
RAID 0, 24x SAS      4444      4421      1021      998       5266      5267      4311      3544
RAID 0, 24x SATA     4031      3918      485       453       5210      5686      4187      3188
RAID 10, 24x SAS     3669      1928      1053      524       5992      2798      5912      1927
RAID 10, 24x SATA    3072      1843      496       261       4633      2067      4485      1465
RAID 5, 24x SAS      3815      3008      1020      296       4914      2410      2948      533
RAID 5, 24x SATA     3964      2482      484       144       4789      2386      3871      513
RAID 6, 24x SAS      3728      2817      1023      220       4909      1768      3985      538
RAID 6, 24x SATA     3353      1974      483       107       4803      1725      3887      503

The following table gives maximum throughput results for 24-drive configurations that use 4-KB I/O.

Table 15 LSISAS3108 Controller Performance Results for Two RAID Volumes in K IOPs
(MegaRAID 6.2; 256 Q per volume; HDD columns use Write-Back, SSD columns use Write-Through)

                     HDD    HDD    HDD    HDD    HDD     SSD    SSD    SSD    SSD    SSD
                     4K SR  4K SW  4K RR  4K RW  4K 67Ra 4K SR  4K SW  4K RR  4K RW  4K 67Ra
24 Drives, Two RAID Volumes
RAID 0, 24x SAS      713    678    10.3   13.7   11.3    689    686    488    415    485
RAID 0, 24x SATA     600    578    4.3    5.9    4.6     669    666    466    383    456
RAID 10, 24x SAS     564    443    10.8   7.5    9.4     745    644    468    93     238
RAID 10, 24x SATA    450    487    4.5    3.2    3.8     586    644    372    94     238
RAID 5, 24x SAS      735    713    10.3   3.9    6.7     716    170    539    38     111
RAID 5, 24x SATA     591    601    4.2    1.8    2.7     714    162    477    36     104
RAID 6, 24x SAS      729    689    10.3   2.7    5.4     694    195    534    13     39
RAID 6, 24x SATA     586    593    4.2    1.2    2.1     690    172    367    13     38
a. Refers to a 4-KB random 67% read, 33% write I/O sequence.

6.1.3 Forty Drive Expander Attached Example Results

The following table gives maximum throughput results for 40-drive configurations that use 256-KB I/O. At 256-KB I/O,
it is possible to demonstrate maximum throughput if enough drives are present.


Table 16 LSISAS3108 Performance Results for Two RAID Volumes in MB/s for 256 KB
(MegaRAID 6.4; 256 Q per volume; Write-Back)

                     HDD       HDD       HDD       HDD
                     256KB SR  256KB SW  256KB RR  256KB RW
40 Drives, Two RAID Volumes
RAID 0, 40x SAS      5808      6343      1545      1535
RAID 0, 40x SATA     5825      5358      740       695
RAID 10, 40x SAS     5792      3092      1606      810
RAID 10, 40x SATA    5019      2665      774       378
RAID 5, 40x SAS      5789      3111      1543      450
RAID 5, 40x SATA     5842      3051      736       221
RAID 6, 40x SAS      5787      2912      1541      326
RAID 6, 40x SATA     5610      2825      739       163

The following table gives maximum throughput results for 40-drive configurations that use 4-KB I/O.

Table 17 LSISAS3108 Controller Performance Results for Two RAID Volumes in K IOPs
(MegaRAID 6.2; 256 Q per volume; Write-Back)

                     HDD    HDD    HDD    HDD    HDD
                     4K SR  4K SW  4K RR  4K RW  4K 67R
40 Drives, Two RAID Volumes
RAID 0, 40x SAS      692    673    14.9   20.3   16.7
RAID 0, 40x SATA     588    580    6.2    8.9    6.8
RAID 10, 40x SAS     731    633    16     11.5   14.1
RAID 10, 40x SATA    577    523    6.7    4.9    5.8
RAID 5, 40x SAS      704    703    14.9   6.2    10.0
RAID 5, 40x SATA     599    612    6.2    2.7    4.1
RAID 6, 40x SAS      693    697    14.9   4.2    8.1
RAID 6, 40x SATA     591    608    6.0    1.9    3.3

6.2 Performance Results Examples for IT Controllers

The following topologies were used as examples:


 8 drive direct attached
 24 drive expander attached
 40 drive expander attached
The following are additional test configuration inputs for these examples:
 Each topology implements SAS and SATA HDDs and SSDs.
 The results come from Iometer 1.1.0 under Windows Server 2008 Enterprise.


 The 8 SAS SSD data uses 12Gb/s SAS SSD.


 The 24 and 40 drive configurations use a LSISAS3x48 expander with DataBolt enabled.
 Firmware phase 5.0.0.0.

6.2.1 Eight Drive Direct Attached Example Results

The following table gives maximum latency performance results for 8-drive configurations.

Table 18 LSISAS3008 Controller Maximum Latency Performance Results in msec

               HDD    HDD    HDD    HDD    SSD    SSD    SSD    SSD
4 Q            4K SR  4K SW  4K RR  4K RW  4K SR  4K SW  4K RR  4K RW
JBOD 8x SAS    5.5    25.2   167    35     2.2    16.3   2      13
JBOD 8x SATA   50.5   96.5   332    111    11.6   13.5   2      16

The following tables give maximum throughput results for 8-drive configurations.

Table 19 LSISAS3008 Controller Maximum Throughput Performance Results in K IOPs

               HDD      HDD      HDD    HDD    SSD    SSD    SSD    SSD
32 Q           0.5K SR  0.5K SW  4K SR  4K SW  4K SR  4K SW  4K RR  4K RW
JBOD 8x SAS    1286     1023     394    394    1018   767    921    147
JBOD 8x SATA   588      542      359    350    512    413    502    134

Table 20 LSISAS3008 Controller Maximum Throughput Performance Results in MB/s

               HDD      HDD      SSD      SSD      SSD      SSD
32 Q           256K SR  256K SW  256K SR  256K SW  256K RR  256K RW
JBOD 8x SAS    1541     1538     5870     3551     5875     1193
JBOD 8x SATA   1404     1391     3824     2913     3814     1089

6.2.2 Twenty-four Drive Expander Attached Example Results

The following table gives maximum latency performance results for 24-drive configurations.

Table 21 LSISAS3008 Controller Maximum Latency Performance Results in msec

               HDD    HDD     HDD     HDD     SSD    SSD    SSD    SSD
4 Q            4K SR  4K SW   4K RR   4K RW   4K SR  4K SW  4K RR  4K RW
JBOD 24x SAS   5.25   13.93   186.15  34.51   2.24   13.84  11.63  17.21
JBOD 24x SATA  58.23  101.85  363.55  138.32  2.12   5.33   2.06   15.01

The following tables give maximum throughput results for 24-drive configurations.


Table 22 LSISAS3008 Controller Maximum Throughput Performance Results in K IOPs

               HDD      HDD      HDD    HDD    SSD    SSD    SSD    SSD
32 Q           0.5K SR  0.5K SW  4K SR  4K SW  4K SR  4K SW  4K RR  4K RW
JBOD 24x SAS   1118     1048     1146   1033   994    1000   1097   440
JBOD 24x SATA  817      790      618    612    639    637    649    403

Table 23 LSISAS3008 Controller Maximum Throughput Performance Results in MB/s

               HDD      HDD      SSD      SSD      SSD      SSD
32 Q           256K SR  256K SW  256K SR  256K SW  256K RR  256K RW
JBOD 24x SAS   4596     4586     5865     6535     5866     3591
JBOD 24x SATA  4197     4167     5844     6095     5825     3255

6.2.3 Forty Drive Expander Attached Example Results

The following table gives maximum latency performance results for 40-drive configurations.

Table 24 LSISAS3008 Controller Maximum Latency Performance Results in msec

               HDD    HDD    HDD     HDD
4 Q            4K SR  4K SW  4K RR   4K RW
JBOD 40x SAS   36.33  15.27  166     36.53
JBOD 40x SATA  57.69  77.47  393.89  196.57

The following tables give maximum throughput results for 40-drive configurations.

Table 25 LSISAS3008 Controller Maximum Throughput Performance Results in K IOPs

               HDD      HDD      HDD    HDD
32 Q           0.5K SR  0.5K SW  4K SR  4K SW
JBOD 40x SAS   1113     1015     1008   1022
JBOD 40x SATA  925      890      786    773

Table 26 LSISAS3008 Controller Maximum Throughput Performance Results in MB/s

               HDD      HDD
32 Q           256K SR  256K SW
JBOD 40x SAS   5467     6527
JBOD 40x SATA  5850     6340


Chapter 7: Troubleshoot Performance Issues


If the measured performance test results do not match the expected results, many parameters could be the cause.
This chapter presents a few such parameters in accordance with the best practices and guidelines discussed earlier in
this document.
Understand the Issue
When you see a discrepancy in the results, you must first understand the discrepancy. This understanding requires
additional questioning and running debugging tests. Ask the following questions:
 Is the issue repeatable?
 Are the results reliable?
 Does the issue vary over time? Improve or worsen?
 Do system reboots affect the issue?
 Is the issue because of the drive scaling?
— Results are as expected with a lower number of drives, but not when you add drives?
— Results are as expected with an even number of drives, but not an odd number of drives?
 Is the issue an effect of Qd variation?
— Does increasing or decreasing the Qd affect the issue?
 Is the issue an effect of another operation occurring in parallel?
— Errors because of signal integrity?
— Links go up and down and cause discovery, and affect performance?
— Background operations running?
— Do any controller logs show errors?
— Do any operating system logs show errors?
— Are other devices using significant CPU resources?
 Is the issue an effect of cache?
— Does the issue vary over time and is not consistent?
— Does the issue go away if you change any read/write cache setting?
— Does performance return to normal after running the same I/O for longer duration?
— Does changing the I/O order solve the issue or result in different behaviors?
 Is the issue an effect of process affinity?
— Performance results are inconsistent, but the difference between the runs is always the same?
— Does running on a specific processor, or a set of processors, change the performance?
 Is the issue an effect of uninitialized volume?
— Does the issue go away after volume initialization?
 Is the issue a protocol bottleneck?
— Refer to Chapter 2 and re-evaluate your bottlenecks to see if any of the SAS, PCIe, or DDR bottleneck values
match your performance result maximum.
 Is the issue a link width issue?
— Change the slot or cable. Is the issue resolved?
 Is the issue a benchmarking tool issue?
— Does changing your benchmark tool to a different tool or a different version resolve the issue?
— Change different parameters of the benchmark. Is the issue resolved? For example, change the number of
threads/workers, sampling interval, ramp time, and run time.
 Is the issue an effect of insufficient preconditioning?
— Rerun the tests after longer preconditioning. Or run the same test for longer duration. Does the
performance improve?


 Is the issue a thermal issue?


— Was the server lid open? Does closing the lid resolve the issue?
— Are any server components too hot?
 Is the issue an effect of a bug in a specific software, hardware, or firmware version?
— Update or roll back the system BIOS. Is the issue resolved?
— Update or roll back the controller BIOS, firmware, and driver. Is the issue resolved?
Troubleshooting with such questions helps you identify the issue and it becomes easier to understand and then
resolve the issue. After you resolve the issue, rerun your tests to make sure the results meet the expectations. Fixing
one bottleneck might advance you to another hurdle, but the troubleshooting continues until you reach the
expected results.
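For the log, link, and scheduler questions above, the following is a minimal Linux triage sketch. It assumes a Linux host, an Avago controller (PCI vendor ID 1000), and standard tools; adjust device names for your system.

    #!/bin/sh
    # Quick triage sketch for common causes in the question list above.

    # Recent kernel messages that suggest link errors, resets, or discovery churn
    dmesg | grep -iE 'sas|reset|error' | tail -n 20

    # Negotiated PCIe link speed and width for the controller (vendor ID 1000)
    lspci -vv -d 1000: | grep -E 'LnkCap|LnkSta'

    # Per-disk I/O scheduler and request-queue depth
    for d in /sys/block/sd*; do
        echo "$d: $(cat "$d"/queue/scheduler), nr_requests=$(cat "$d"/queue/nr_requests)"
    done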


Appendix A: Performance Testing Checklist


The following checklist highlights the configuration and run-time settings to consider before you start a performance
run. See the related sections earlier in this document for details on each item.


Expected Results
•  Maximum MB/s = ______  Are there enough targets to support it?
•  Maximum MB/s = ______  Is there enough PCIe bandwidth to support it? (An example check follows this section.)
•  Maximum IOPs = ______  Are there enough targets to support it?
•  Maximum IOPs = ______  Is there enough CPU capacity to support it?
•  Latency = ______  Is the desired cache setting used to support the desired latency?
•  Scaling = ______  Are all the targets the same model and capacity for good scaling?
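As an illustration of the PCIe bandwidth item, the following sketch assumes a PCIe 3.0 x8 slot; the per-lane payload figure (roughly 985 MB/s at 8 GT/s with 128b/130b encoding) is standard, and the required figure is the highest value from Table 26.

    #!/bin/sh
    # Example feasibility check for the "enough PCIe bandwidth" item.
    LANES=8            # assumed PCIe 3.0 x8 slot
    PER_LANE_MBS=985   # ~8 GT/s with 128b/130b encoding, per lane
    REQUIRED_MBS=6527  # highest figure from Table 26
    AVAILABLE_MBS=$((LANES * PER_LANE_MBS))
    echo "Available ${AVAILABLE_MBS} MB/s vs. required ${REQUIRED_MBS} MB/s"
    # 7880 > 6527, so the x8 Gen3 link is not the limiter in this example.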

System
•  Slot, PCIe revision, and width = ___________________
•  PCIe slot is attached to the processor on which the controller driver will run (a locality check sketch
   follows this section).
•  PCIe slot supports the maximum expected MB/s.
•  Disable BIOS power-saving mode options.
•  Enable BIOS maximum cooling options.
•  BIOS settings are set for maximum PCIe performance (speed, write packet size, burst).
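One way to verify the slot-locality item on Linux is through sysfs. The following is a sketch; 0000:01:00.0 is a placeholder PCI address, so substitute your controller's address.

    # Sketch: confirm the controller's slot is local to the intended CPU.
    # 0000:01:00.0 is a placeholder; find your address with "lspci -d 1000:".
    cat /sys/bus/pci/devices/0000:01:00.0/numa_node   # NUMA node (-1 = none reported)
    lspci -s 01:00.0 -vv | grep -E 'LnkCap|LnkSta'    # slot speed and width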
Operating System
•  Operating system is a known version and revision.
•  Install necessary performance patches.
•  Use the correct file system. For maximum performance, use no file system (use the raw device).
•  Turn off any unnecessary background tasks.
•  Targets appear in the storage management tool.
•  Targets are all online, active, and initialized.
•  Event Log or Error Log is clear of I/O-related messages.
Controller
•  Controller is installed in the desired PCIe slot.
•  I/O controller chip revision is correct.
•  Controller heartbeat LED is flashing.
•  Controller has a unique WWID visible from the query tool.
Targets
•  All targets are visible and initialized via the configuration tool.
•  All targets are the same model number.
•  All targets use the same firmware revision.
•  Each target is negotiated to the desired SAS speed.
•  Linux: Set each target parameter (scheduler, queue, …) as desired; a sketch follows this section.
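On Linux, the per-target parameters in the last item are typically set through sysfs. A sketch follows; sdX is a placeholder device name, and the values shown are examples rather than recommendations.

    # Sketch: per-target settings on Linux (sdX is a placeholder; run as root).
    echo noop > /sys/block/sdX/queue/scheduler    # minimize scheduler overhead
    echo 1024 > /sys/block/sdX/queue/nr_requests  # deepen the request queue
    cat /sys/block/sdX/device/queue_depth         # verify the device queue depth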
Driver
•  Driver phase matches the controller firmware phase.
•  Driver is loaded and running. The controller is visible from the OS storage configuration tool.
•  Set the maximum outstanding I/O as desired (default is ).
•  Set the coalescing depth as desired (default is 4).
•  Set the coalescing timeout as desired (default is 10).
•  All MSI-X vectors are visible (see the sketch after this section).
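One way to confirm MSI-X vector visibility on Linux is shown below. It assumes the mpt3sas driver; substitute your driver name.

    # Sketch: confirm MSI-X vectors are allocated (mpt3sas assumed).
    grep -c mpt3sas /proc/interrupts      # number of vectors in use
    lspci -vv -d 1000: | grep -i 'msi-x'  # MSI-X capability and vector count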
Firmware
•  Firmware version is the latest GCA, or as desired.
•  Firmware phase matches the driver phase.
•  Firmware is loaded and running (heartbeat).
Benchmark
•  Benchmark tool is the recommended revision.
•  Benchmark tool can see all targets or volumes.
•  Set the Qd or thread count as desired, consistently across all targets.
•  Set the I/O size as desired, consistently across all targets.
•  Set run and ramp times as appropriate for the test and target type.
An example benchmark invocation follows this section.
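As an example of applying these items together, the following is a hypothetical fio invocation; /dev/sdX is a placeholder raw device and every value is illustrative, not a recommendation.

    # Hypothetical fio run: raw device, fixed I/O size, Qd, ramp, and run time.
    fio --name=4k-randread --filename=/dev/sdX --ioengine=libaio \
        --direct=1 --rw=randread --bs=4k --iodepth=32 --numjobs=1 \
        --ramp_time=30 --runtime=300 --time_based --group_reporting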
Configuration Information
•  Gather and record basic configuration information: firmware revision, driver revision, target model,
   target firmware revision, controller model, operating system name, operating system revision, benchmark
   tool name, benchmark tool revision, PCIe slot number, PCIe slot width, PCIe slot speed, CPU model, number
   of CPUs, number of active cores, CPU frequency, and so on.
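A minimal Linux capture sketch for this list follows. It assumes the mpt3sas driver and an Avago controller (vendor ID 1000); extend it to match your configuration.

    #!/bin/sh
    # Sketch: capture basic configuration details for the test record.
    uname -a                                       # OS name and revision
    lscpu | grep -E 'Model name|^CPU\(s\)|MHz'     # CPU model, count, frequency
    lspci -vv -d 1000: | grep -E 'LnkCap|LnkSta'   # PCIe slot speed and width
    modinfo mpt3sas | grep -i '^version'           # driver revision (mpt3sas assumed)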


Revision History

Version 1.0, October 2014

The following document changes were made.


•  Updated access information for the MegaCLI and StorCLI general debug tools.
•  Implemented clarifications throughout the document.
•  Added MegaRAID driver support for Windows Server 2012 R2.
•  Updated SCSI Queue Depth with specific setting information regarding MegaRAID.
•  Added the Nomerges Setting section.
•  Added VMware operating system optimization information.
•  Fixed values and units in the Throughput Snapshot for Drives table.
•  Updated terminology in the MegaRAID FastPath section.
•  Updated the tools used to configure RAID virtual drives.
•  Reorganized the document.

Advance, Version 0.1, March 2014

Initial document release.
