
Front cover

IBM Scale Out Network Attached Storage


Architecture, Planning, and Implementation Basics
Shows how to set up and customize the IBM Scale Out NAS

Details the hardware and software architecture

Includes daily administration scenarios

Mary Lovelace
Vincent Boucher
Shradha Nayak
Curtis Neal
Lukasz Razmuk
John Sing
John Tarella

ibm.com/redbooks

International Technical Support Organization

IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics

December 2010

SG24-7875-00

Note: Before using this information and the product it supports, read the information in Notices on page xiii.

First Edition (December 2010)

This edition applies to IBM Scale Out Network Attached Storage V1.1.1.

© Copyright International Business Machines Corporation 2010. All rights reserved.

Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents
Notices
Trademarks

Preface
  The team who wrote this book
  Now you can become a published author, too!
  Comments welcome
  Stay connected to IBM Redbooks publications

Chapter 1. Introduction to IBM Scale Out Network Attached Storage
  1.1 Marketplace requirements
  1.2 Understanding I/O
    1.2.1 File I/O
    1.2.2 Block I/O
    1.2.3 Network Attached Storage (NAS)
  1.3 Scale Out Network Attached Storage (SONAS)
    1.3.1 SONAS architecture
    1.3.2 SONAS scale out capability
    1.3.3 SONAS Software
    1.3.4 High availability design
  1.4 SONAS architectural concepts and principles
    1.4.1 Create, write, and read files
    1.4.2 Creating and writing a file
    1.4.3 Scale out more performance
    1.4.4 Reading a file
    1.4.5 Scale out parallelism and high concurrency
    1.4.6 Manage storage centrally and automatically
    1.4.7 SONAS logical storage pools for tiered storage
    1.4.8 SONAS Software central policy engine
    1.4.9 High performance SONAS scan engine
    1.4.10 High performance physical data movement for ILM / HSM
    1.4.11 HSM backup/restore to external storage
    1.4.12 Requirements for high performance external HSM and backup restore
    1.4.13 SONAS high performance HSM using Tivoli Storage Manager
    1.4.14 SONAS high performance backup/restore using Tivoli Storage Manager
    1.4.15 SONAS and Tivoli Storage Manager integration in more detail
    1.4.16 Summary: Lifecycle of a file using SONAS Software
    1.4.17 Chapter summary

Chapter 2. Hardware architecture
  2.1 Nodes
    2.1.1 Interface nodes
    2.1.2 Storage nodes
    2.1.3 Management nodes
  2.2 Switches
    2.2.1 Internal InfiniBand switch
    2.2.2 Internal private Ethernet switch
    2.2.3 External Ethernet switches
    2.2.4 External ports: 1 GbE / 10 GbE

  2.3 Storage pods
    2.3.1 SONAS storage controller
    2.3.2 SONAS storage expansion unit
  2.4 Connection between components
    2.4.1 Interface node connections
    2.4.2 Storage node connections
    2.4.3 Management node connections
    2.4.4 Internal POD connectivity
    2.4.5 Data InfiniBand network
    2.4.6 Management Ethernet network
    2.4.7 Connection to the external customer network
  2.5 SONAS configurations available
    2.5.1 Rack types: How to choose the correct rack for your solution
    2.5.2 Drive types: How to choose between various drive options
    2.5.3 External ports: 1 GbE / 10 GbE
  2.6 SONAS with XIV storage overview
    2.6.1 Differences between SONAS with XIV and standard SONAS system
    2.6.2 SONAS with XIV configuration overview
    2.6.3 SONAS base rack configuration when used with XIV storage
    2.6.4 SONAS with XIV configuration and component considerations


Chapter 3. Software architecture
  3.1 SONAS Software
  3.2 SONAS data access layer: File access protocols
    3.2.1 File export protocols: CIFS
    3.2.2 File export protocols: NFS
    3.2.3 File export protocols: FTP
    3.2.4 File export protocols: HTTPS
    3.2.5 SONAS locks and oplocks
  3.3 SONAS Cluster Manager
    3.3.1 Introduction to the SONAS Cluster Manager
    3.3.2 Principles of SONAS workload allocation to interface nodes
    3.3.3 Principles of interface node failover and failback
    3.3.4 Principles of storage node failover and failback
    3.3.5 Summary
    3.3.6 SONAS Cluster Manager manages multi-platform concurrent file access
    3.3.7 Distributed metadata manager for concurrent access and locking
    3.3.8 SONAS Cluster Manager components
  3.4 SONAS authentication and authorization
    3.4.1 SONAS authentication concepts and flow
    3.4.2 SONAS authentication methods
  3.5 Data repository layer: SONAS file system
    3.5.1 SONAS file system scalability and maximum sizes
    3.5.2 Introduction to SONAS file system parallel clustered architecture
    3.5.3 SONAS file system performance and scalability
  3.6 SONAS data management services
    3.6.1 SONAS: Using the central policy engine and automatic tiered storage
    3.6.2 Using and configuring Tivoli Storage Manager HSM with SONAS basics
  3.7 SONAS resiliency using Snapshots
    3.7.1 SONAS Snapshots
    3.7.2 Integration with Windows
  3.8 SONAS resiliency using asynchronous replication
  3.9 SONAS and Tivoli Storage Manager integration


    3.9.1 General Tivoli Storage Manager and SONAS guidelines
    3.9.2 Basic SONAS to Tivoli Storage Manager setup procedure
    3.9.3 Tivoli Storage Manager software licensing
    3.9.4 How to protect SONAS files without Tivoli Storage Manager
  3.10 SONAS system management services
    3.10.1 Management GUI
    3.10.2 Health Center
    3.10.3 Command Line Interface
    3.10.4 External notifications
  3.11 Grouping concepts in SONAS
    3.11.1 Node grouping
    3.11.2 Node grouping and async replication
  3.12 Summary: SONAS Software
    3.12.1 SONAS features
    3.12.2 SONAS goals

Chapter 4. Networking considerations
  4.1 Review of network attached storage concepts
    4.1.1 File systems
    4.1.2 Redirecting I/O over the network to a NAS device
    4.1.3 Network file system protocols
    4.1.4 Domain Name Server
    4.1.5 Authentication
  4.2 Domain Name Server as used by SONAS
    4.2.1 Domain Name Server configuration best practices
    4.2.2 Domain Name Server balances incoming workload
    4.2.3 Interface node failover / failback
  4.3 Bonding
    4.3.1 Bonding modes
    4.3.2 Monitoring bonded ports
  4.4 Network groups
  4.5 Implementation networking considerations
    4.5.1 Network interface names
    4.5.2 Virtual Local Area Networks
    4.5.3 IP address ranges for internal connectivity
    4.5.4 Use of Network Address Translation
    4.5.5 Management node as NTP server
    4.5.6 Maximum transmission unit
    4.5.7 Considerations and restrictions
  4.6 The impact of network latency on throughput

Chapter 5. SONAS policies
  5.1 Creating and managing policies
    5.1.1 File policy types
    5.1.2 Rule overview
    5.1.3 Rule types
    5.1.4 SCAN engine
    5.1.5 Threshold implementation
  5.2 SONAS CLI policy commands
    5.2.1 mkpolicy command
    5.2.2 Changing policies using chpolicy command
    5.2.3 Listing policies using the lspolicy command
    5.2.4 Applying policies using the setpolicy command


    5.2.5 Running policies using runpolicy command
    5.2.6 Creating policies using mkpolicytask command
  5.3 SONAS policy best practices
    5.3.1 Cron job considerations
    5.3.2 Policy rules
    5.3.3 Peered policies
    5.3.4 Tiered policies
    5.3.5 HSM policies
    5.3.6 Policy triggers
    5.3.7 Weight expressions
    5.3.8 Migration filters
    5.3.9 General considerations
  5.4 Policy creation and execution walkthrough
    5.4.1 Creating a storage pool using the GUI
    5.4.2 Creating a storage pool using the CLI
    5.4.3 Creating and applying policies using the GUI
    5.4.4 Creating and applying policies using the CLI
    5.4.5 Testing policy execution

Chapter 6. Backup and recovery, availability, and resiliency functions
  6.1 High availability and data protection in base SONAS
    6.1.1 Cluster Trivial Database
    6.1.2 DNS performs IP address resolution and load balancing
    6.1.3 File sharing protocol error recovery
  6.2 Backup and restore of file data
    6.2.1 Tivoli Storage Manager terminology and operational overview
    6.2.2 Methods to back up a SONAS cluster
    6.2.3 Tivoli Storage Manager client and server considerations
    6.2.4 Configuring interface nodes for Tivoli Storage Manager
    6.2.5 Performing Tivoli Storage Manager backup and restore operations
    6.2.6 Using Tivoli Storage Manager HSM client
  6.3 Snapshots
    6.3.1 Snapshot considerations
    6.3.2 VSS snapshot integration
    6.3.3 Snapshot creation and management
  6.4 Local and remote replication
    6.4.1 Synchronous versus asynchronous replication
    6.4.2 Block level versus file level replication
    6.4.3 SONAS cluster replication
    6.4.4 Local synchronous replication
    6.4.5 Remote async replication
  6.5 Disaster recovery methods
    6.5.1 Backup of SONAS configuration information
    6.5.2 Restore data from a traditional backup
    6.5.3 Restore data from a remote replica

Chapter 7. Configuration and sizing
  7.1 Tradeoffs between configurations
    7.1.1 Rack configurations
    7.1.2 InfiniBand switch configurations
    7.1.3 Storage Pod configuration
    7.1.4 Interface node configuration
    7.1.5 Rack configurations


  7.2 Considerations for sizing your configuration
  7.3 Inputs for SONAS sizing
    7.3.1 Application characteristics
    7.3.2 Workload characteristics definition
    7.3.3 Workload characteristics impact
    7.3.4 Workload characteristics measurement
  7.4 Powers of two and powers of ten: The missing space
  7.5 Sizing the SONAS appliance
    7.5.1 SONAS disk drives and capacities
    7.5.2 SONAS disk drive availabilities over time
    7.5.3 Storage subsystem disk type
    7.5.4 Interface node connectivity and memory configuration
    7.5.5 Base rack sizing
  7.6 Tools
    7.6.1 Workload analyzer tools

Chapter 8. Installation planning
  8.1 Physical planning considerations
    8.1.1 Space and floor requirements
    8.1.2 Power consumption
    8.1.3 Noise
    8.1.4 Heat and cooling
  8.2 Installation checklist questions
  8.3 Storage considerations
    8.3.1 Storage
    8.3.2 Async replication considerations
    8.3.3 Block size
    8.3.4 File system overhead and characteristics
    8.3.5 SONAS master file system
    8.3.6 Failure groups
    8.3.7 Setting up storage pools
  8.4 SONAS integration into your network
    8.4.1 Authentication using AD or LDAP
    8.4.2 Planning IP addresses
    8.4.3 Data access and IP address balancing
  8.5 Attachment to customer applications
    8.5.1 Redundancy
    8.5.2 Share access
    8.5.3 Caveats
    8.5.4 Backup considerations

Chapter 9. Installation and configuration
  9.1 Pre-installation
  9.2 Installation
    9.2.1 Hardware installation
    9.2.2 Software installation
    9.2.3 Checking health of the node hardware
    9.2.4 Additional hardware health checks
  9.3 Post installation
  9.4 Software configuration
  9.5 Sample environment
    9.5.1 Initial hardware installation
    9.5.2 Initial software configuration


    9.5.3 Understanding the IP addresses for internal networking
    9.5.4 Configuring the Cluster Manager
    9.5.5 Listing all available disks
    9.5.6 Adding a second failure group
    9.5.7 Creating the GPFS file system
    9.5.8 Configuring the DNS Server IP addresses and domains
    9.5.9 Configuring the NAT Gateway
    9.5.10 Configuring authentication: AD and LDAP
    9.5.11 Configuring Data Path IP Addresses
    9.5.12 Configuring Data Path IP address group
    9.5.13 Attaching the Data Path IP Address Group
  9.6 Creating Exports for data access
  9.7 Modifying ACLs to the shared export
  9.8 Testing access to the SONAS

Chapter 10. SONAS administration
  10.1 Using the management interface
    10.1.1 GUI tasks
    10.1.2 Accessing the CLI
  10.2 SONAS administrator tasks list
    10.2.1 Tasks that can be performed only by the SONAS GUI
    10.2.2 Tasks that can be performed only by the SONAS CLI
    10.2.3 Tasks that can be performed by the SONAS GUI and SONAS CLI
  10.3 Cluster management
    10.3.1 Adding or deleting a cluster to the GUI
    10.3.2 Viewing cluster status
    10.3.3 Viewing interface node and storage node status
    10.3.4 Modifying the status of interface nodes and storage nodes
  10.4 File system management
    10.4.1 Creating a file system
    10.4.2 Listing the file system status
    10.4.3 Mounting the file system
    10.4.4 Unmounting the file system
    10.4.5 Modifying the file system configuration
    10.4.6 Deleting a file system
    10.4.7 Master and non-master file systems
    10.4.8 Quota management for file systems
    10.4.9 File set management
  10.5 Creating and managing exports
    10.5.1 Creating exports
    10.5.2 Listing and viewing status of exports created
    10.5.3 Modifying exports
    10.5.4 Removing service/protocols
    10.5.5 Activating exports
    10.5.6 Deactivating exports
    10.5.7 Removing exports
    10.5.8 Testing accessing the exports
  10.6 Disk management
    10.6.1 List Disks and View Status
    10.6.2 Changing properties of disks
    10.6.3 Starting disks
    10.6.4 Removing disks
  10.7 User management


    10.7.1 SONAS administrator
    10.7.2 SONAS end users
  10.8 Services Management
    10.8.1 Management Service administration
    10.8.2 Managing services on the cluster
  10.9 Real-time and historical reporting
    10.9.1 System utilization
    10.9.2 File System utilization
    10.9.3 Utilization Thresholds and Notification
  10.10 Scheduling tasks in SONAS
    10.10.1 Listing tasks
    10.10.2 Removing tasks
    10.10.3 Modifying the schedule tasks
  10.11 Health Center
    10.11.1 Topology
    10.11.2 Default Grid view
    10.11.3 Event logs
  10.12 Call home

Chapter 11. Migration overview
  11.1 SONAS file system authentication
    11.1.1 SONAS file system ACLs
    11.1.2 File sharing protocols in SONAS
    11.1.3 Windows CIFS and SONAS considerations
  11.2 Migrating files and directories
    11.2.1 Data migration considerations
    11.2.2 Metadata migration considerations
    11.2.3 Migration tools
  11.3 Migration of CIFS shares and NFS exports
  11.4 Migration considerations
    11.4.1 Migration data collection
    11.4.2 Types of migration approaches
    11.4.3 Sample throughput estimates
    11.4.4 Migration throughput example

Chapter 12. Getting started with SONAS
  12.1 Quick start
  12.2 Connecting to the SONAS appliance
    12.2.1 Connecting to the SONAS appliance using the GUI
    12.2.2 Connecting to the SONAS appliance using the CLI
  12.3 Creating SONAS administrators
    12.3.1 Creating a SONAS administrator using the CLI
    12.3.2 Creating a SONAS administrator using the GUI
  12.4 Monitoring your SONAS environment
    12.4.1 Topology view
    12.4.2 SONAS logs
    12.4.3 Performance and reports
    12.4.4 Threshold monitoring and notification
  12.5 Creating a filesystem
    12.5.1 Creating a filesystem using the GUI
    12.5.2 Creating a filesystem using the CLI
  12.6 Creating an export
    12.6.1 Configuring exports using the GUI


    12.6.2 Configuring exports using the CLI
  12.7 Accessing an export
    12.7.1 Accessing a CIFS share from Windows
    12.7.2 Accessing a CIFS share from a Windows command prompt
    12.7.3 Accessing a NFS share from Linux
  12.8 Creating and using snapshots
    12.8.1 Creating snapshots with the GUI
    12.8.2 Creating snapshots with the CLI
    12.8.3 Accessing and using snapshots
  12.9 Backing up and restoring data with Tivoli Storage Manager

Chapter 13. Hints, tips, and how to information
  13.1 What to do when you receive an EFSSG0026I error message
    13.1.1 EFSSG0026I error: Management service stopped
    13.1.2 Commands to use when management service not running
  13.2 Debugging SONAS with logs
    13.2.1 CTDB health check
    13.2.2 GPFS logs
    13.2.3 CTDB logs
    13.2.4 Samba and Winbind logs
  13.3 When CTDB goes unhealthy
    13.3.1 How CTDB manages services
    13.3.2 Master file system unmounted
    13.3.3 CTDB manages GPFS
    13.3.4 GPFS unable to mount

Appendix A. Additional component details
  CTDB
    Introduction to Samba
    Cluster implementation requirements
    Clustered Trivial Database
    CTDB architecture
    How CTDB works to synchronize access to data
    Providing high availability for node failure
    CTDB features
    CTDB Node recovery mechanism
    IP failover mechanism
    How CTDB manages the cluster
    CTDB tunables
    CTDB databases
  File system concepts and access permissions
    Permissions and access control lists
    Traditional UNIX permissions
    Access control lists
    Permissions and ACLs in Windows operating systems
  GPFS overview
    GPFS architecture
    GPFS file management
    GPFS performance
    GPFS High Availability solution
    GPFS failure group
    Other GPFS features
  Tivoli Storage Manager overview


    Tivoli Storage Manager concepts
    Tivoli Storage Manager architectural overview
    Tivoli Storage Manager storage management
    Policy management
    Hierarchical Storage Management

Related publications
  IBM Redbooks publications
  Other publications
  Online resources
  How to get Redbooks publications
  Help from IBM


Index


Notices
This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.


Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:
AFS
AIX
BladeCenter
DB2
Domino
Enterprise Storage Server
eServer
FlashCopy
GPFS
HACMP
IBM
Lotus
PowerVM
pSeries
Redbooks
Redbooks (logo)
System i
System p5
System Storage
System x
Tivoli
XIV
xSeries
z/OS

The following terms are trademarks of other companies:

Snapshot, and the Network Appliance logo are trademarks or registered trademarks of Network Appliance, Inc. in the U.S. and other countries.

Java, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Microsoft, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.


Preface
IBM Scale Out Network Attached Storage (IBM SONAS) is a Scale Out NAS offering designed to manage vast repositories of information in enterprise environments that require very large capacities, high levels of performance, and high availability.

IBM SONAS provides a range of reliable, scalable storage solutions for a variety of storage requirements. Data is accessed through network file access protocols such as NFS, CIFS, HTTP, and FTP. Using built-in RAID technologies, all data is well protected, with options to add further protection through mirroring, replication, snapshots, and backup. These storage systems are also characterized by simple management interfaces that make installation, administration, and troubleshooting uncomplicated and straightforward.

In this IBM Redbooks publication, we give you details of the hardware and software architecture that make up the SONAS appliance, along with configuration, sizing, and performance considerations. We provide information about the integration of the SONAS appliance into an existing network. We demonstrate the administration of the SONAS appliance through the GUI and CLI, as well as showing backup and availability scenarios. Using a quick start scenario, we take you through common SONAS administration tasks to familiarize you with the SONAS system.

The team who wrote this book


This book was produced by a team of specialists from around the world working at the International Technical Support Organization, San Jose Center.

Mary Lovelace is a Consulting IT Specialist at the International Technical Support Organization. She has more than 20 years of experience with IBM in large systems, storage, and storage networking product education, system engineering and consultancy, and systems support. She has written many Redbooks publications about IBM Tivoli Storage Productivity Center, Tivoli Storage Manager, and IBM z/OS storage products.

Vincent Boucher is an IT Specialist and a member of the EMEA Products and Solutions Support Center (PSSC) in Montpellier, France. His role within the Storage Benchmark team is to demonstrate the efficiency of IBM solutions and their added value to customers. He holds an Engineering degree in Mathematics and Computer Science from the ENSEEIHT Engineering school in Toulouse. Vincent's areas of expertise include Linux, IBM System x, mid-range IBM Storage, and IBM GPFS, from both his past High Performance Computing and new Storage benchmark experiences.

Shradha Nayak is a Staff Software Engineer working with IBM India Software Labs in Pune, India. She holds a Bachelor of Computer Science Engineering degree and has around 6.5 years of experience. She has been working in the storage domain since then and has good expertise in Scale out File Service (SoFS) and Scale Out Network Attached Storage (SONAS). Prior to this, she worked as a Level-3 developer for Distributed File Service (DFS) and also worked for IBM AFS. Shradha is focusing on storage products and cloud storage and is currently part of the Level-3 development team for SONAS. Being a part of the SONAS development and testing team, she has developed a thorough knowledge of SONAS, its components, and its functions. In this book, she has mainly focused on the installation, configuration, and administration of SONAS. Shradha is also interested in social media and social networking tools and methodologies.

Curtis Neal is an Executive IT Specialist working for the IBM System Storage Group in San Jose, California. He has over 25 years of experience in various technical capacities, including mainframe and open system test, design, and implementation. For the past eight years, he has led the Open Storage Competency Center, which helps customers and IBM Business Partners with the planning, demonstration, and integration of IBM System Storage solutions.

Lukasz Razmuk is an IT Architect at IBM Global Technology Services in Warsaw, Poland. He has six years of IBM experience in designing, implementing, and supporting solutions in IBM AIX, Linux, IBM pSeries, virtualization, high availability, General Parallel File System (GPFS), Storage Area Network (SAN), storage for open systems, and IBM Tivoli Storage Manager. He also acts as a Technical Account Advocate for Polish clients. He holds a Master of Science degree in Information Technology from the Polish-Japanese Institute of Information Technology in Warsaw, as well as many technical certifications, including IBM Certified Advanced Technical Expert on IBM System p5, IBM Certified Technical Expert pSeries, IBM HACMP, Virtualization Technical Support, and Enterprise Technical Support AIX 5.3.

John Sing is an Executive IT Consultant with IBM Systems and Technology Group. John has specialties in large Scale Out NAS, in IT Strategy and Planning, and in IT High Availability and Business Continuity. Since 2001, John has been an integral member of the IBM Systems and Storage worldwide planning and support organizations. He started in the Storage area in 1994 while on assignment to IBM Hong Kong (S.A.R. of China) and IBM China. In 1998, John joined the IBM Enterprise Storage Server planning team for PPRC, XRC, and IBM FlashCopy. He has been the marketing manager for these products, and in 2002, began working in Business Continuity and IT Strategy and Planning. Since 2009, John has also added focus on IT Competitive Advantage strategy, including Scale Out NAS and Cloud Storage. John is the author of three Redbooks publications on these topics, and in 2007, celebrated his 25th anniversary of joining IBM.

John Tarella is a Senior Consulting IT Specialist who works for IBM Global Services in Italy. He has 25 years of experience in storage and performance management on mainframe and distributed environments. He holds a degree in Seismic Structural Engineering from Politecnico di Milano, Italy. His areas of expertise include IBM Tivoli Storage Manager and storage infrastructure consulting, design, implementation services, open systems storage, and storage performance monitoring and tuning. He is presently focusing on storage solutions for business continuity, information lifecycle management, and infrastructure simplification. He has written extensively on z/OS DFSMS, IBM Tivoli Storage Manager, SANs, storage business continuity solutions, content management, and ILM solutions. John is currently focusing on cloud storage delivery. He also has an interest in Web 2.0 and social networking tools and methodologies.

Thanks to the following people for their contributions to this project:
Mark Doumas, Desiree Strom, Sven Oehme, Mark Taylor, Alexander Saupp, Mathias Dietz, Jason Auvenshine, Greg Kishi, Scott Fadden, Leonard Degallado, Todd Neville, Warren Saltzman, Wen Moy, Tom Beglin, Adam Childers, Frank Sowin, Pratap Banthia, Dean Hanson, Everett Bennally, Ronnie Sahlberg, Christian Ambach, Andreas Luengen, Bernd Baeuml

Now you can become a published author, too!


Here's an opportunity to spotlight your skills, grow your career, and become a published author - all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base. Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html

Comments welcome
Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:
- Use the online Contact us review Redbooks publications form, found at:
ibm.com/redbooks
- Send your comments in an email to:
redbooks@us.ibm.com
- Mail your comments to:
IBM Corporation, International Technical Support Organization
Dept. HYTD Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Stay connected to IBM Redbooks publications


- Find us on Facebook:
http://www.facebook.com/IBMRedbooks
- Follow us on twitter:
http://twitter.com/ibmredbooks
- Look for us on LinkedIn:
http://www.linkedin.com/groups?home=&gid=2130806
- Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks publications weekly newsletter:
https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
- Stay current on recent Redbooks publications with RSS Feeds:
http://www.redbooks.ibm.com/rss.html


Chapter 1. Introduction to IBM Scale Out Network Attached Storage


SONAS is designed to address the new storage challenges posed by the continuing explosion of data. Leveraging mature technology from IBM's High Performance Computing experience, and based upon IBM's General Parallel File System (GPFS), SONAS is an easy-to-install, turnkey, modular, scale out NAS solution that provides the performance, clustered scalability, high availability, and functionality that are essential to meeting strategic Petabyte Age and cloud storage requirements. In this chapter, we consider how a high-density, high-performance SONAS solution can help organizations consolidate and manage data affordably, reduce crowded floor space, and reduce the management expense associated with administering an excessive number of disparate storage systems. With its advanced architecture, SONAS virtualizes and consolidates multiple filers into a single, enterprise-wide file system, which can translate into reduced total cost of ownership, reduced capital expenditure, and enhanced operational efficiency. SONAS uses Hierarchical Storage Management (HSM), a function of Tivoli Storage Manager that automatically distributes and manages data on disk, tape, or both by regarding devices of these types, and potentially others, as levels in a storage hierarchy.


1.1 Marketplace requirements


There are various factors driving the need for a new way of looking at information and the way we make decisions based on that information. Today, the changes in our world (the instrumentation, interconnectedness, and intelligence of our environments) combine to produce a massive glut of new information, from new sources, with new needs to utilize it. These pressures exacerbate a few of the challenges that we have been dealing with for a while now, but on a whole new scale. There is an explosion in the amount of data, of course, but also there are shifts in the nature of data (see Figure 1-1). Formerly, virtually all the information available to be processed was authored by someone. Now that kind of data is being overwhelmed by machine-generated data, spewing out of sensors, RFID, meters, microphones, surveillance systems, GPS systems, and all manner of animate and inanimate objects. With this expansion of the sources of information comes a large variance in the complexion of the available data (very noisy, with lots of errors) and no time to cleanse it in a world of real-time decision making. Also, consider that today's economic times require corporations and governments to analyze new information faster and make timely decisions for achieving business goals. As the volume, variety, and velocity of information and decision making increase, a larger burden is placed on organizations to effectively and efficiently distribute the right information, at the right time, to the people, processes, and applications that rely upon that information to make better business decisions. All of these situations are creating challenges, while providing an excellent opportunity for driving an information-led transformation.

Figure 1-1 Explosion of data demands an Information-led transformation


Today's businesses are demanding the ability to create, manage, retrieve, protect, and share business and social digital content or large rich media files over a broadband Internet that reaches every corner of the globe (Figure 1-2). Users are creating and using data that is redefining our business and social world in real time. Unlike traditional IT data, this rich digital content is almost entirely file-based or object-based, and it is growing ever larger in size, with highly diverse and unpredictable usage patterns.

Figure 1-2 Today's workloads demand a new approach to data access

Innovative applications in business analytics, digital media, medical data, and cloud storage are creating requirements for data access rates and response times to individual files that were previously unique to high-performance computing environments, and all of this is driving a continuing explosion of business data. While many factors are contributing to data growth, these trends are significant contributors:
- Digital representation of physical systems and processes
- Capture of digital content from physical systems and sources
- Deliveries of digital content to a global population
Additional trends are driven by the following kinds of applications:
- Product Life Cycle Management (PLM) systems, which include Product Data Management systems and mechanical, electronic, and software design automation
- Service Life Cycle Management (SLM) systems
- Information Life Cycle Management (ILM), including email archiving
- Video on demand: online, broadcast, and cable
- Digital Video Surveillance (DVS): government and commercial
- Video animation rendering
- Seismic modeling and reservoir analysis
- Pharmaceutical design and drug analysis
- Digital health care systems
- Web 2.0 and service-oriented architecture


When it comes to traditional IT workloads, traditional storage will continue to excel for the applications it was designed for. But solutions such as Network Attached Storage (NAS) were not intended to scale to the high levels and extremely challenging workload characteristics required by today's Internet-driven, Petabyte Age applications.

1.2 Understanding I/O


A major source of confusion regarding NAS is the concept of File I/O versus Block I/O. Understanding the difference between these two forms of data access is crucial to realizing the potential benefits of any SAN-based or NAS-based solution.

1.2.1 File I/O


When a partition on a hard drive is under the control of an operating system (OS), the OS will format it. Formatting of the partition occurs when the OS lays a file system structure on the partition. This file system is what enables the OS to keep track of where it stores data. The file system is an addressing scheme that the OS uses to map data on the partition. Now, when you want to get to a piece of data on that partition, you must request the data from the OS that controls it. For example, suppose that Windows 2000 formats a partition (or drive) and maps that partition to your system. Every time you request to open data on that partition, your request is processed by Windows 2000. Because there is a file system on the partition, it is accessed by File I/O. Additionally, you cannot request access to just the last 10 KB of a file. You must open the entire file, which is another reason that this method is referred to as File I/O. Using File I/O is like having an accountant. Accountants are good at keeping up with your money for you, but they charge you for that service. For your personal checkbook, you probably want to avoid that cost. On the other hand, for a corporation where many kinds of requests are made, having an accountant is a good idea, so that wrongful checks do not get written. A file I/O specifies the file. It also indicates an offset into the file (Figure 1-3). For instance, the I/O might specify Go to byte 1000 in the file (as if the file were a set of contiguous bytes), and read the next 256 bytes beginning at that position. Unlike block I/O, there is no awareness of a disk volume or disk sectors in a file I/O request. Inside the NAS appliance, the operating system keeps track of where files are located on disk. It is the NAS OS which issues a block I/O request to the disks to fulfill the client file I/O read and write requests that it receives. By default, a database application that is accessing a remote file located on a NAS device is configured to run with File System I/O. It cannot utilize raw I/O to achieve improved performance.


Figure 1-3 File I/O
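To make this concrete, the short Python sketch below issues exactly the kind of request just described: it names a file, seeks to byte 1000, and reads the next 256 bytes. The file name is hypothetical; the point is that the client addresses data by file and offset, never by disk volume or sector.

# File I/O: address data by file name and offset; the file system
# (local, or inside the NAS device) maps this to physical blocks.
with open("important_big_spreadsheet.xls", "rb") as f:  # hypothetical file
    f.seek(1000)        # offset into the file, not into a disk volume
    data = f.read(256)  # read the next 256 bytes from that position
print(len(data))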

1.2.2 Block I/O


Block I/O (raw disk) is handled in various ways (Figure 1-4). There is no OS format done to lay out a file system on the partition. The addressing scheme that keeps track of where data is stored is provided by the application using the partition. An example of this might be IBM DB2 using its tables to keep track of where data is located rather than letting the OS do that job. We do not mean to say that DB2 cannot use the OS to keep track of where files are stored. It is just more efficient for the database to bypass the cost of requesting the OS to do that work.

Figure 1-4 Block I/O


When sharing files across a network, something needs to control when writes can be done. The operating system fills this role. It does not allow multiple writes at the same time, even though many write requests are made. Databases are able to control this writing function on their own, so in general they run faster by bypassing the OS, although this depends on the efficiency of the file system and database implementations.
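As a contrasting sketch (purely illustrative: the device path is hypothetical, and raw device access normally requires root privileges), block I/O addresses the storage device itself by offset, with no file system in the path; the application keeps its own map of what is stored where:

import os

# Block I/O: read 4 KB directly from a raw device at a chosen offset.
# The application, not the OS file system, decides what that offset means.
fd = os.open("/dev/sdb", os.O_RDONLY)  # hypothetical raw device
os.lseek(fd, 8 * 512, os.SEEK_SET)     # position to sector 8 (512-byte sectors)
block = os.read(fd, 4096)              # read one 4 KB block
os.close(fd)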

1.2.3 Network Attached Storage (NAS)


Storage systems that optimize the concept of file sharing across the network have come to be known as Network Attached Storage (NAS). The NAS solutions utilize the mature Ethernet IP network technology of the LAN. Data is sent to and from NAS devices over the LAN using TCP/IP protocol. One of the key differences in a NAS appliance, compared to direct attached storage (DAS) or other network storage solutions such as SAN or iSCSI, is that all client I/O operations to the NAS use file level I/O protocols. File I/O is a high level type of request that, in essence, specifies only the file to be accessed, but does not directly address the storage device. This is done later by other operating system functions in the remote NAS appliance. By making storage systems LAN addressable, the storage is freed from its direct attachment to a specific server, and any-to-any connectivity is facilitated using the LAN fabric. In principle, any user running any operating system can access files on the remote storage device. This is done by means of a common network access protocol; for example, NFS for UNIX servers and CIFS for Windows servers. In addition, a task such as backup to tape can be performed across the LAN using software such as Tivoli Storage Manager, enabling sharing of expensive hardware resources (for example, automated tape libraries) between multiple servers.
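For example, a Linux or UNIX client typically mounts a NAS export over NFS, and a Windows-style share can be mounted over CIFS; the commands below are generic client-side examples, and the server name, export path, and share name are illustrative only:

# NFS (UNIX and Linux clients)
mount -t nfs nas.example.com:/export/home /mnt/home

# CIFS (Windows-style share, mounted here from a Linux client)
mount -t cifs //nas.example.com/home /mnt/home -o username=user1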

NAS file system access and administration


Network access methods such as NFS and CIFS can only handle file I/O requests to the remote file system. This is located in the operating system of the NAS device. I/O requests are packaged by the initiator into TCP/IP protocols to move across the IP network. The remote NAS file system converts the request to block I/O and reads or writes the data to the NAS disk storage. To return data to the requesting client application, the NAS appliance software repackages the data in TCP/IP protocols to move it back across the network. A storage device cannot just attach to a LAN. It needs intelligence to manage the transfer and the organization of data on the device. The intelligence is provided by a dedicated server to which the common storage is attached. It is important to understand this concept. NAS comprises a server, an operating system, and storage that is shared across the network by many other servers and clients. So a NAS is a specialized server or appliance, rather than a network infrastructure, and shared storage is attached to the NAS server.

NAS file system administration




However, NAS filers today do not scale to high capacities. When one filer was fully utilized, a second, third, and more filers were installed. The result was that administrators found themselves managing silos of filers. It was not possible to share capacity on individual filers. Various filers were heavily accessed, while others were mostly idle. Figure 1-5 shows a summary of traditional NAS limitations.

Figure 1-5 Network Attached Storage limitations

This situation is compounded by the fact that at hundreds of terabytes or more, conventional backup of such a large storage farm is difficult, if not impossible. There is also the issue that, even with incremental-only backup, scanning hundreds of terabytes to identify the changed files or changed blocks might in itself take too long and create too much overhead. In addition, there might not be any way to apply file placement, migration, deletion, and management policies automatically from one centrally managed, centrally deployed control point. Manual management of tens or hundreds of filers was proving to be neither timely nor cost-effective, and effectively prohibited any feasible way to globally implement automated tiered storage.

1.3 Scale Out Network Attached Storage (SONAS)


IBM Scale Out Network Attached Storage (SONAS) is designed to address the new storage challenges posed by the continuing explosion of data. IBM recognizes that a critical component of future enterprise storage is a scale-out architecture that takes advantage of industry trends to create a truly efficient and responsive storage environment, eliminating the waste created by the proliferation of scale-up systems and providing a platform for file server consolidation. That is where SONAS comes in, as shown in Figure 1-6.


Utilizing mature technology from the IBM High Performance Computing experience, and based upon the IBM flagship General Parallel File System (GPFS), SONAS is an easy-to-install, turnkey, modular, scale out NAS solution that provides the performance, clustered scalability, high availability, and functionality that are essential to meeting strategic Petabyte Age and cloud storage requirements. Simply put, SONAS is a scale out storage system combined with high-speed interface nodes interconnected with storage capacity and GPFS, which enables organizations to scale performance alongside capacity in an integrated, highly-available system. The high-density, high-performance SONAS can help your organization consolidate and manage data affordably, reduce crowded floor space, and reduce management expense associated with administering an excessive number of disparate storage systems

Figure 1-6 IBM SONAS overview

1.3.1 SONAS architecture


The SONAS system is available in as small a configuration as 30 terabytes (TB) usable in the base rack, up to a maximum of 30 interface nodes and 60 storage nodes in 30 storage pods. The storage pods fit into 15 storage expansion racks. The 60 storage nodes can contain a total of 7200 hard-disk drives when fully configured and you are using 96-port InfiniBand switches in the base rack. With its advanced architecture, SONAS virtualizes and consolidates multiple filers into a single, enterprise-wide file system, which can translate into reduced total cost of ownership, reduced capital expenditure, and enhanced operational efficiency.


Figure 1-7 provides a high level overview of the SONAS architecture.

Figure 1-7 SONAS architecture

Assuming 2 TB disk drives, such a system has 14.4 petabytes (PB) of raw storage (7,200 drives x 2 TB = 14,400 TB, or 14.4 PB) and billions of files in a single large file system. You can have as few as eight file systems in a fully configured 14.4 PB SONAS system, or as many as 256 file systems. SONAS provides automated policy-based file management that controls backups and restores, snapshots, and remote replication. It also provides:
- A single global namespace with logical paths that do not change because of physical data movement
- Support for Serial Attached SCSI (SAS), Nearline SAS, and in the past, Serial Advanced Technology Attachment (SATA) drives
- High availability and load balancing
- Centralized management
- Centralized backup
- An interconnected cluster of file-serving and network-interfacing nodes in a redundant high-speed data network
- Virtually no capacity limits
- Virtually no scalability limits
- IBM Call Home trouble reporting and IBM Tivoli Assist On Site (AOS) remote support capabilities
- Enhanced support for your Tivoli Storage Manager Server product, with a preinstalled Tivoli Storage Manager client

- Support for the cloud environment. A controlled set of end users, projects, and applications can perform the following functions:
  - Share files with other users within one or more file spaces
  - Control access to their files using access control lists (Microsoft Windows clients) and user groups
  - Manage each file space with a browser-based tool

Global namespace
SONAS provides a global namespace that enables your storage infrastructure to scale to extreme amounts of data, from terabytes to petabytes. Within the solution, centralized management, provisioning, control, and automated information life-cycle management (ILM) are integrated as standard features to provide the foundation for a truly cloud storage enabled solution.

Interface nodes
The high-performance interface nodes provide connectivity to your Internet Protocol (IP) network for file access and support both 1-gigabit Ethernet (GbE) and 10-GbE connection speeds. Each interface node can connect to the IP network with up to eight separate data-path connections. Performance and bandwidth scalability are achieved by adding interface nodes, up to the maximum of 30 nodes, each of which has access to all files in all file systems. Each interface node has its own cache memory, so adding an interface node increases caching memory, data paths, and file-serving processor capacity. If raw storage capacity is the prime constraint in the current system, the SONAS system scales out to as much as 14.4 petabytes (PB) with 2 terabyte (TB) disk drives, with up to 256 file systems that can each have up to 256 file-system snapshots. Most systems that a SONAS system typically displaces cannot provide clients with access to so much storage from a single file-serving head. Every interface node has access to all of the storage capacity in the SONAS system.

1.3.2 SONAS scale out capability


SONAS provides extreme scale out capability, a globally clustered NAS file system built upon IBM GPFS. The global namespace is maintained across the entire global cluster of multiple storage pods and multiple interface nodes. All interface nodes and all storage nodes share equally in the cluster to balance workloads dynamically and provide parallel performance to all users and storage, while also assuring high availability and automated failover. SONAS is a scalable virtual file storage platform that grows as data grows. It meets demanding performance requirements as new processors can be added independently or as storage capacity is added, eliminating a choke point found in traditional scale-up systems. SONAS is designed for high availability 24x7 environments with a clustered architecture that is inherently available and, when combined with the global namespace, allows for much higher utilization rates than found in scale-up environments.


1.3.3 SONAS Software


In this section we discuss the features and benefits of SONAS Software.

Storage management features


SONAS Software provides powerful cross-platform access to the same files, with locking for data integrity. In addition, SONAS provides highly available access for Linux, UNIX, and CIFS (Windows) sessions with no client-side changes (Figure 1-8). Deploying SONAS allows users to reduce the overall number of disk drives and file storage systems that need to be housed, powered, cooled, and managed relative to scale-up systems.

Figure 1-8 SONAS Software

Storage management benefits


SONAS also provides integrated support of policy-based automated placement and subsequent tiering and migration of data. Customers can provision storage pools and store file data according to its importance to the organization. For example, a user can define multiple storage pools with various drive types and performance profiles. They can create a higher performance storage pool with SAS drives and define a less expensive (and lower performance) pool with larger Nearline SAS drives. Rich, sophisticated policies are built into SONAS which can transparently migrate data between pools based on many characteristics, such as capacity threshold limits and age of the data. This helps to address business critical performance requirements. Leveraging automated storage tiering, users can finally realize the cost savings and business benefits of information lifecycle management (ILM) at an immense scale.


1.3.4 High availability design


SONAS provides a NAS storage platform for global access to your business critical data. Your business critical data can be secured with both information protection and business continuity solutions, giving you a high level of business continuity assurance. In the event of data corruption or an unexpected disaster that might harm corporate data, SONAS helps you to recover and quickly resume normal enterprise and data center operations (Figure 1-9). SONAS supports large enterprise requirements for remote replication, point-in-time copy (file system-level snapshots), and scalable automated storage tiering, all managed as a single instance within a global namespace. SONAS asynchronous replication is specifically designed to cope with connections that provide low bandwidth, high latency, and low reliability. The scheduled asynchronous process picks up the updates on the source SONAS system and writes them to the target SONAS system, which can be thousands of miles away. Security and information protection are enhanced in a consolidated SONAS environment. For example, users considering the implementation of security and protection solutions are concerned about maintaining data availability as systems scale, which is a key design point for SONAS. Its clustered architecture is designed for high availability at scale and 24 x 7 x Forever operation, complementing consolidated security and protection solutions to provide an always-on information infrastructure.

Figure 1-9 High availability and disaster recovery design


1.4 SONAS architectural concepts and principles


In this section we review the overall SONAS architecture and operational principles. We start with the logical diagram in Figure 1-10.

Figure 1-10 Logical diagram of IBM SONAS

In the top half of this diagram, we see the logical file directory structure as seen by the users. SONAS presents and preserves this same logical appearance to the users, no matter what we do to physically manage these files, and all files in the SONAS, from creation to deletion. The user sees only his global namespace, his user directories and files. As a SONAS expands, manages, and changes the physical data location and supporting physical infrastructure, the users will still have the unchanged appearance of one single logical global namespace, and maintain their logical file structure without change. In the lower half of this diagram, we see a representation of the SONAS internal architectural components. SONAS has interface nodes, which serve data to and from the users, over the network. SONAS also has storage nodes, which service the storage for the SONAS clustered file system. All SONAS nodes are in a global cluster, connected by InfiniBand. All interface nodes have full read/write access to all storage nodes. All storage nodes have full read/write access to all interface nodes. Each of the nodes runs a copy of IBM SONAS Software (5639-SN1), which provides all the functions of SONAS, including a Cluster Manager, which manages the cluster and dispatches workload evenly across the cluster.


Also included is the SONAS central storage policy engine, which runs in a distributed fashion across all the nodes in the SONAS. The SONAS policy engine provides central management of the lifecycle of all files, in a centrally deployed, centrally controlled, enforceable manner. The policy engine function is not tied to a particular node; it executes in a distributed manner across all nodes. Not shown are the SONAS management nodes, which monitor the health of the SONAS. IBM SONAS Software manages the cluster and maintains the coherency and consistency of the file system, providing file-level and byte-level locking, using a sophisticated distributed token (lock) management architecture that is derived from IBM General Parallel File System (GPFS) technology. As we shall see, the SONAS clustered grid architecture provides the foundation for automatic load balancing, high availability, and scale out high performance, with multiple parallel, concurrent writers and readers.

Physical disk drives are allocated to SONAS logical storage pools. Typically, we might allocate a high performance pool of storage (which uses the fastest disk drives), and a lower tier of storage for capacity (less expensive, slower spinning drives). In the previous example, we have allocated three logical storage pools.

1.4.1 Create, write, and read files


To understand the operation of SONAS Software and the interaction of the SONAS Software functions, it is best to follow the lifecycle of a file as it flows from creation through automated tiered storage management, to eventual destaging to backups, external storage, or deletion. We follow the lifecycle of reading and writing files as they traverse the SONAS Software and the SONAS central policy engine. In this way, we can see how the SONAS Software and policy engine are used to manage these files in three logical storage pools. With easy-to-write rules, a SONAS administrator can automate the storage management, file placement, and file migration within a SONAS. The central policy engine in SONAS is the one central location that provides an enforceable, powerful set of policies and rules to globally manage all physical storage, while preserving the appearance of a global namespace and an unchanging file system to the users.

1.4.2 Creating and writing a file


When a create file request comes into SONAS, it is directed to the SONAS Software central policy engine. The policy engine has file placement rules, which determine to which of the logical storage pools the file is to be written. SONAS Software works together with an external Domain Name Server to allocate an interface node to handle this client. The incoming workload is IP-balanced equally by the external network Domain Name Server, across all SONAS interface nodes.
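As a hedged illustration of what such placement rules look like (SONAS policies use the GPFS-derived policy rule language; the pool names and the file-matching criterion below are hypothetical, not a recommended configuration), a policy might send video files to a capacity pool and everything else to a fast pool:

/* Place new .mpg files on the capacity tier, everything else on the fast tier */
RULE 'videos'  SET POOL 'pool3' WHERE LOWER(NAME) LIKE '%.mpg'
RULE 'default' SET POOL 'pool1'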


As shown in Figure 1-11, we have selected an interface node and are determining the file placement by the policy engine.

Figure 1-11 Create and write file 1 - step 1 Policy

All incoming create file requests pass through the SONAS central policy engine in order to determine file placement. The interface node takes the incoming create file request, and based on the logical storage pool for the file, passes the write request to the appropriate storage nodes. A logical storage pool can and often does span storage nodes. The storage nodes, in parallel, perform a large data striped write into the appropriate logical storage pool, exploiting the parallelism of writing the data simultaneously across multiple physical disk drives. SONAS data writes are done in a wide parallel data stripe write, across all disk drives in the logical storage pool. In this way, SONAS Software architecture aggregates the file write and read throughput of multiple disk drives, thus providing high performance. SONAS Software will write the file in physical blocksize chunks, according to the blocksize specified at the file system level. The default blocksize for a SONAS file system is 256 KB, and this is a good blocksize for the large majority of workloads, especially where there will be a mix of small random I/Os and large sequential workloads within the same file system. You can choose to define the file system with other blocksizes. For example, where the workload is known to be highly sequential in nature, you can choose to define the file system with a large 1 MB or even 4 MB blocksize. See the detailed sizing sections of this book for further best settings. This wide data striping architecture has algorithms that determine where the data blocks must physically reside; this provides the SONAS Software the ability to automatically tune and equally load balance all disk drives in the storage pool. This is shown in Figure 1-12.


Figure 1-12 Create and write file 1 - step 2 - wide parallel data stripe write
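To see why wide striping helps, consider a back-of-the-envelope sketch in Python (the file size and drive count are purely illustrative, not SONAS sizing guidance): a 1 GiB file written with the default 256 KB block size into a pool of 40 drives is split into 4,096 blocks, so every drive in the pool services roughly 100 of them in parallel.

# Illustrative arithmetic only: wide striping spreads one file's blocks
# across all drives in the logical storage pool.
file_size  = 1 * 2**30       # 1 GiB file
block_size = 256 * 2**10     # 256 KB file system block size
drives     = 40              # drives in the logical storage pool (illustrative)

blocks = file_size // block_size      # 4096 blocks to write
blocks_per_drive = blocks / drives    # ~102 blocks handled by each drive
print(blocks, blocks_per_drive)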

Now, let us write another file to the SONAS. Another interface node is appropriately selected for this incoming work request by the Domain Name Server, and the file is passed to that interface node for writing, as shown in Figure 1-13.
Figure 1-13 Create and write file 2 - step 1 Policy


Notice that another interface node has been chosen; this illustrates the automatic balancing of the incoming workload across the interface nodes. The interface node is told by the policy engine that this file is to be written to another logical storage pool. In the same manner as previously described, the file is written in a wide data stripe, as shown in Figure 1-14.

Figure 1-14 Create and write file 2 - step 2 - wide parallel data stripe write

Finally, let us write a third file. As shown in Figure 1-15, a third interface node is selected by the Domain Name Server.

Figure 1-15 Create and write file 3 - step 1 Policy


The SONAS policy engine has specified that this file is to be written into logical storage pool 3. A wide data stripe parallel write is done as shown in Figure 1-16.

Figure 1-16 Create and write file 3 - step 2 - wide parallel data stripe write

With these illustrations, we can now see how the components of the SONAS Software use the SONAS policy engine, together with the Domain Name Server, to drive workload equally across interface nodes. The SONAS Software then appropriately distributes workload among the storage nodes and physical disk drives. This is summarized in Figure 1-17.

Figure 1-17 Create and write files - summary


In summary, SONAS Software will automatically balance workload across all interface nodes. SONAS Software will write all data in wide stripes, across all disks in the logical storage pool, providing high performance, automatic tuning, and automatic load balancing. Most importantly, notice that from the user's perspective, these three files can all reside in the same logical path and directory. Users do not know that their files are physically located on various classes of storage (or that the physical location can change over time). This provides the ability to implement automatic physical tiered storage without impact to users, and without necessitating time-consuming, process-intensive application-level changes and change control. The SONAS Software will continue to maintain this same logical file structure and path, regardless of physical file location changes, as the file is managed from creation through its life cycle, using SONAS automatic tiered storage.

1.4.3 Scale out more performance


Next, let us see how the SONAS Software architecture can be utilized to scale out increased performance. Performance in SONAS can be increased by simply adding more disk drives to a SONAS logical storage pool. With more drives, SONAS Software will write a wider parallel data stripe. The data stripe is not limited to any particular storage pod or storage node; a logical storage pool can span multiple storage nodes and storage pods, as shown in Figure 1-18.

Figure 1-18 Scale out more disk drives for more write performance


By simply adding more disk drives, the SONAS Software architecture provides the ability to scale out both the number of disk drives and the number of storage nodes that can be applied to support a higher amount of parallel physical data writes. The logical storage pool can be as large as the entire file system, and the SONAS file system can be as large as the entire SONAS machine. In this way, SONAS provides an extremely scalable and flexible architecture for serving large scale NAS storage. The SONAS Software architecture provides the ability to expand the scale and capacity of the system in any direction that is desired. The additional disk drives and storage nodes can be added non-disruptively to the SONAS. Immediately upon doing so, SONAS Software will start to automatically balance and tune new workload onto the additional disks, and automatically start taking advantage of the additional resources.

1.4.4 Reading a file


Reading data in SONAS applies the same principles of exploiting the wide data stripe for aggregating the performance of reading data in parallel, across multiple disk drives, as shown in Figure 1-19.

Figure 1-19 Read files - aggregate parallel data reads

Furthermore, the interface node is designed to utilize advanced algorithms that improve read-ahead and write-behind file functions, and recognizes and does intelligent pre-fetch caching of typical access patterns such as sequential, reverse sequential, and random.


This process is shown in Figure 1-20.

Figure 1-20 Read files - read-ahead caching, intelligent pre-fetch
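The following toy Python sketch (a conceptual illustration only, not SONAS code) shows the basic idea behind recognizing a sequential access pattern and scheduling read-ahead for the next few blocks:

# Conceptual sketch of sequential-access detection and read-ahead.
class Prefetcher:
    def __init__(self, block_size=256 * 1024, window=4):
        self.block_size = block_size
        self.window = window      # how many blocks to read ahead
        self.last_block = None

    def on_read(self, offset):
        block = offset // self.block_size
        sequential = self.last_block is not None and block == self.last_block + 1
        self.last_block = block
        if sequential:
            # a real implementation would schedule these reads asynchronously
            return [(block + i) * self.block_size for i in range(1, self.window + 1)]
        return []

p = Prefetcher()
p.on_read(0)
print(p.on_read(256 * 1024))  # sequential access: prefetch the next four block offsets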

In the same way that write performance can be enhanced by simply adding more disk drives to the logical storage pool, read performance can be enhanced as well, as shown in Figure 1-21.

Figure 1-21 Scale out more disk drives for read performance - parallel data stripe read


Notice that the parallelism in the SONAS for an individual client is in the storage read/write; the connection from the interface node to the client is a single connection and single stream. This is done on purpose, so that any standard CIFS, NFS, FTP, or HTTPS client can access the IBM SONAS interface nodes without requiring any modification or any special code. Throughput between the interface nodes and the users is enhanced by sophisticated read-ahead and pre-fetching and large memories on each interface node, to provide very high capacity and throughput on the network connection to the user. As requirements for NAS storage capacity or performance increase, the SONAS Software scale out architecture provides linearly scalable, high performance, parallel disk I/O capabilities as follows:
- Striping data across multiple disks, across multiple storage nodes and storage pods
- Reading and writing data in parallel wide data stripes; increasing the number of disk drives in the logical storage pool can increase the performance
- Supporting a large block size, configurable by the administrator, to fit I/O requirements
- Utilizing advanced algorithms that improve read-ahead and write-behind file functions; SONAS recognizes typical access patterns like sequential, reverse sequential, and random, and optimizes I/O access for these patterns
This scale-out architecture of SONAS Software provides superb parallel performance, especially for larger data objects, and excellent performance for large aggregates of smaller objects.

1.4.5 Scale out parallelism and high concurrency


The SONAS Software scale out architecture is designed to utilize the ability to have many nodes active concurrently, thus providing the ability to scale to many tens or hundreds of thousands of users in parallel. Today's SONAS appliance is able to provide a scale out NAS architecture that can grow to 30 interface nodes and 60 storage nodes, all in a global active-active, share-everything cluster. The technology that allows SONAS to do this is derived from the IBM General Parallel File System (GPFS), which has proven, in large supercomputing environments, the ability to scale to many thousands of nodes. IBM is scaling down that GPFS capability and providing access to it in the SONAS appliance.


With the SONAS clustered node architecture, the larger the machine, the more parallel the capability to concurrently scale out capacity and performance for many individual nodes, many concurrent users, and their storage requests, in parallel, as shown in Figure 1-22.

Figure 1-22 SONAS Software parallel concurrent file access

The value of the SONAS scale out architecture is the ability to flexibly and dynamically add as many nodes as needed, to increase the amount of parallel concurrent users that can be supported. Each individual node works in parallel to service clients, as shown in Figure 1-23.

Figure 1-23 SONAS Software parallel concurrent service of multiple users


SONAS has the same operational procedures and read/write file system architectural philosophy whether you have a small SONAS with two interface nodes and two storage nodes, or a very large SONAS with 30 interface nodes and 60 storage nodes.

1.4.6 Manage storage centrally and automatically


Now that we have seen how SONAS Software can provide linear scale out performance and capacity for petabytes of storage, we next need to consider how the software provides tools to physically manage this storage when we are operating at this level of scale, for instance:
- How can you affordably automate the physical management of that much storage?
- What automated tools does SONAS provide to do this?
- Will these tools permit operation at this scale with fewer people and fewer resources?
The answer is a definite yes. Let us see how SONAS Software provides integrated, automated tools to help you accomplish these goals.

1.4.7 SONAS logical storage pools for tiered storage


SONAS Software is designed to help you to achieve data lifecycle management efficiencies through providing integrated policy-driven automation and tiered storage management, all as part of the base SONAS Software license. SONAS Software provides integrated logical storage pools, filesets and user-defined policies to provide the ability to do automated tiered storage, and therefore more efficiently match the cost of your storage to the value of your data. SONAS logical storage pools allow you to allocate physical storage hard drives to logical storage pools within the SONAS file system. Using logical storage pools, you can create tiers of storage by grouping physical disk storage based on performance, locality or reliability characteristics. Logical storage pools can span storage nodes and storage pods. You can have multiple logical storage pools (up to 8 per file system), the size of a storage pool can be as big as the entire file system, and the file system can be as big as the entire SONAS. SONAS automatically manages, load-balances, and balances storage utilization at the level of the entire logical storage pool.


Figure 1-24 shows a configuration in which multiple logical storage pools have been set up.

Figure 1-24 SONAS Logical Storage Pools

Logical storage pool #1 can be high performance SAS disks, and logical storage pool #2 might be more economical Nearline SAS large-capacity disk drives. Logical storage pool #3 might be another large-capacity drive storage pool defined with external HSM, for data that is intended to be staged in and out of the SONAS to external storage, such as external tape, tape libraries, or external data de-duplication technology.

Within the internal SONAS logical storage pools, all of the data management, from creation to physical data movement to deletion, is done by SONAS Software. In addition to internal storage pools, SONAS also supports external storage pools that are managed through an external Tivoli Storage Manager server. When moving data to an external pool, SONAS Software utilizes a high performance scan engine to locate and identify the files that need to be managed, and then hands the list of files either to the SONAS Software data movement functions (for moving data internally within the SONAS), or to Tivoli Storage Manager for backup and restore, or for HSM storage on alternate media such as tape, tape libraries, virtual tape libraries, or data de-duplication devices. If the data is moved for HSM purposes, a stub file is left on disk, and the HSM data can be retrieved from the external storage pool on demand, as a result of an application opening a file. HSM data can also be retrieved in a batch operation if desired.

Note that if all data movement is within a SONAS, there is no need for an external Tivoli Storage Manager server.


SONAS Software provides the file management concept of a fileset, which is a sub-tree of the file system namespace, providing a way to partition the global namespace into smaller, more manageable units. A fileset is basically a named collection of files and directories that you want to operate upon or maintain as a common unit. Filesets provide an administrative boundary that can be used to set quotas and can be specified in a user-defined policy to control initial data placement or data migration. Currently, up to 1,000 filesets can be defined per file system, and it is a known requirement to increase this number in the future.

Data and files in a single SONAS fileset can reside in one or more logical storage pools. As the data is physically migrated according to storage policy, the fileset grouping is maintained. Where the file data physically resides, and how and when it is migrated, is based on a set of rules in a user-defined policy that is managed by the SONAS Software policy engine. Let us next look at this SONAS central policy engine.

1.4.8 SONAS Software central policy engine


All files under the control of SONAS Software are managed by an integrated central storage policy engine. Within the central policy engine are rules that specify all aspects of file management. There are two types of rules:
- File placement rules, which control where a file is initially placed when it is created.
- File management rules, which control physical movement of the file between tiers of disk storage, and between disk storage and external tape, virtual tape library, or de-duplication storage. File management rules can also include backup/restore, global alteration of file characteristics according to any file system criteria, and deletion of files that have expired.
File placement policies determine which storage pool file data is initially placed in. File placement rules are based on attributes known when a file is created, such as the file name, the user, the group, or the fileset. Examples might be: place all files that end in .avi in the silver storage pool, place all files created by the performance-critical applications in the gold storage pool, or place all files in the fileset development in the copper pool. Files written to SONAS are physically placed according to these rules, which are contained in a SONAS storage policy. The SONAS administrator writes these rules, which are SQL-like statements.
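To make these placement examples concrete, the following is a minimal sketch of how they might look in the SQL-like rule syntax (compare the migration and deletion examples in Figure 1-25). The rule names, the numeric user ID standing in for the performance-critical applications, and the catch-all default rule are illustrative assumptions, not values defined by SONAS.

   /* illustrative placement rules; rule names and the user ID are assumptions */
   rule 'placeVideos'   set pool 'silver' where name like '%.avi'
   rule 'placeCritical' set pool 'gold'   where user_id = 4242
   rule 'placeDev'      set pool 'copper' for fileset ('development')
   rule 'default'       set pool 'silver'

A default rule of this kind is typically placed last, so that any file matching none of the earlier rules still receives a storage pool assignment.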


Examples of these rules are shown in Figure 1-25.

Automated tiered storage policy statement examples:

Migration policies, evaluated periodically:
   rule 'cleangold' migrate from pool 'TIER1' threshold(90,70) to pool 'TIER2'
   rule 'hsm' migrate from pool 'TIER3' threshold(90,85) weight(current_timestamp - access_time) to pool 'HSM' where file_size > 1024kb
   rule 'cleansilver' when day_of_week()=Monday migrate from pool 'silver' to pool 'bronze' where access_age > 30 days

Deletion policies, evaluated periodically:
   rule 'purgebronze' when day_of_month()=1 delete from pool 'bronze' where access_age > 365 days

There are also policies for file-based backup/archive and restore/retrieve, and many more options.

Figure 1-25 SONAS Software policy engine and storage policies

After files exist in a SONAS file system, SONAS Software file management policies allow you to move, change the replication status of, or delete files. You can use file management policies to move data from one pool to another without changing the file's location in the directory structure. The rules are very flexible; as an example, you can write a rule that says: replicate all files in /database/payroll which have the extension *.dat and are greater than 1 MB in size to storage pool 2. In addition, file management policies allow you to prune the file system, deleting files as defined by policy rules.

File management policies can use more attributes of a file than file placement policies, because after a file exists, more is known about the file. In addition to the file placement attributes, the policies can now utilize attributes such as last access time, size of the file, or a mix of user and file size. This can result in policy statements such as: delete all files with a name ending in .temp that have not been accessed in 30 days, move all files that are larger than 2 GB to pool2, or migrate all files owned by GroupID=Analytics that are larger than 4 GB to the SATA storage pool.
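As an illustrative sketch only, those prose examples might be written as rules similar to the following; the pool names, the access_age idiom borrowed from Figure 1-25, and the numeric group ID standing in for the Analytics group are assumptions.

   /* illustrative file management rules; pool names and the group ID are assumptions */
   rule 'purgeTemp'     delete where name like '%.temp' and access_age > 30 days
   rule 'moveLarge'     migrate to pool 'pool2' where file_size > 2097152kb
   rule 'moveAnalytics' migrate to pool 'SATA'  where group_id = 500 and file_size > 4194304kb

The sizes are expressed in kilobytes to stay consistent with the file_size > 1024kb form used in Figure 1-25.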


Rules can also include attributes related to a pool instead of a single file, using the threshold option. Using thresholds, you can create a rule that, for example, moves files out of the high performance pool when it is more than 80% full. The threshold option provides the ability to set high, low, and pre-migrate thresholds: SONAS Software begins migrating data when the high threshold is reached and continues until the low threshold is reached. If a pre-migrate threshold is set, SONAS Software continues copying data until the pre-migrate threshold is reached; the pre-migrated data can still be accessed in the original pool, and can be quickly deleted to free up space the next time the high threshold is reached. Policy rule syntax is based on the SQL 92 syntax standard and supports multiple complex statements in a single rule, enabling powerful policies. Multiple levels of rules can be applied, because the complete policy rule set is evaluated for each file when the policy engine executes.
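For example, a threshold-driven migration rule along the lines of the following sketch (the pool names and the 80/60/50 percentages are illustrative assumptions) starts emptying the high performance pool at 80% occupancy, stops at 60%, pre-migrates copies down to 50%, and prefers the least recently accessed files first:

   /* illustrative threshold rule; pools and percentages are assumptions */
   rule 'spillGold' migrate from pool 'gold' threshold(80,60,50) weight(current_timestamp - access_time) to pool 'silver'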

1.4.9 High performance SONAS scan engine


We apply these storage management rules to all files in the SONAS. However, as the number of files and the amount of storage grow from terabytes into petabytes, storage management faces a major new requirement: how can we scan the file systems fast enough to identify files that must be:
- Migrated to another storage pool
- Propagated to remote site(s)
- Backed up
- Restored
- Deleted
- Handled by any other storage management requirement
As the number of files continues to grow, the time required for this scan using the traditional walk the directory tree method becomes a major obstacle to effective storage management. Shrinking backup and storage management windows require scan times to stay small or even shrink further, even as the file systems continue to grow from hundreds of terabytes to the petabyte level. At petabyte scalability, it becomes unfeasible to use the traditional walk the directory tree method to identify files; that simply takes too long.

To address this essential requirement, the SONAS is specifically designed to provide a high performance, high speed scan engine. The SONAS scan engine is an integrated part of the file system. Also integrated into the SONAS file system is an internal database of file system metadata, which is specifically designed for the integrated scan engine. The goal of these two functions is to provide the ability to scan the file system very quickly, at any scale, extending to billions of files.


Let us see how this works in more detail. To begin a scan to identify files, we submit a job to the SONAS Software central policy engine to evaluate a set of policy rules, as shown in Figure 1-26.


Figure 1-26 High performance scan engine - start scan by reading policies

The SONAS Software high performance scan engine is designed to utilize the multiple hardware nodes of the SONAS in parallel to scan the internal file system metadata. The multiple nodes equally spread the policy engine rule evaluation, file scan identification, and subsequent data movement responsibilities over the multiple nodes in the SONAS cluster. If greater scan speed is required, more SONAS nodes can be allocated to the scan, and each node will scan only its equal portion of the total scan.


This architectural aspect of SONAS Software provides a very scalable, high performance, scale out rule processing engine with the speed and parallelism required to address petabyte file system scan requirements. This is shown in Figure 1-27.

In the parallel metadata scan, some or all nodes (both storage and interface nodes) participate, and the scan engine can process more than 10 million files per minute.

Figure 1-27 High performance scan engine - parallel scan of metadata by all nodes

The results of the parallel scan are aggregated, and returned as the actionable list of candidate files, as shown in Figure 1-28.


Figure 1-28 High performance scan engine - return results of parallel scan


Notice that the SONAS scan engine is not limited to tiered storage management. The scan engine can also be used to:
- Reset file attributes according to policy (change deletions, change storage pool allocation, and so on)
- Run reports on file system usage and user activities
- Identify changed data blocks for asynchronous replication to a remote site

Summary: SONAS Software high performance scan engine


The SONAS parallel scan engine offers the following functionality:
- Reads the policy engine policies
- Identifies files that need to be moved within the physically tiered storage, sent to remote sites, and so on
- Enables and makes feasible automated tiered storage at terabyte and petabyte scale
The SONAS high performance scan engine has the following capabilities:
- Does not need to read the file or directory tree
- Reads special metadata integrated and maintained by the file system
- Allows all nodes to participate in a parallel scan of the file system
- Delivers a very high performance scan with minimized impact on concurrent workloads
- Can perform scans on a frequent basis due to low overhead
As long as the data movement is within the SONAS, or between SONAS devices, all physical data movement is done solely by SONAS Software, with no involvement of any external servers or external software. This combination of the internal file system metadata and the SONAS scale out parallel grid software architecture enables SONAS to provide an architectural solution for scanning the file systems quickly and efficiently, at the level of millions and billions of files, in a short period of time. The SONAS Software integrated high performance scan engine and data movement engine work together to make feasible the management of automated tiered storage, with physical data movement transparent to the users, at the level of hundreds of terabytes to petabytes in the file systems.
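For illustration, a policy-driven scan of this kind is commonly expressed with list rules; the sketch below is an assumption-laden example (the list name, the external script path, and the size and age criteria are all made up, and the exact rule forms accepted by a given SONAS release may differ). It asks the scan engine to build a candidate list of large, long-unreferenced files and hand that list to an interface script for further action:

   /* illustrative list rules; list name, script path, and criteria are assumptions */
   rule 'extList'   external list 'largeStale' exec '/opt/admin/handle_candidates.sh'
   rule 'findStale' list 'largeStale' where file_size > 1048576kb and access_age > 90 days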


1.4.10 High performance physical data movement for ILM / HSM


Now that we have used the scan engine to identify candidate files for automated storage management, let us see how the parallel grid architecture of SONAS is used to scale out physical data movement. After the list of candidate files has been identified using the SONAS parallel scan engine, SONAS Software then performs physical data movement according to the outcome of the rules. Physical data movement is also performed using the multiple hardware nodes of the SONAS cluster in parallel. Figure 1-29 shows physically moving a file from Storage Pool 1 to Storage Pool 2.


Figure 1-29 High performance parallel data movement for ILM - pool 1 to pool 2

All files remain online and fully accessible during this physical data movement; the logical appearance of the file path and location to the user does not change. The user is unaware that the physical location of the file has moved. This is one of the design objectives of the SONAS.


Based on the results of the scan, SONAS continues with other physical file movement. According to policy, data can be up-staged as well as down-staged, as shown in Figure 1-30.


Figure 1-30 High performance parallel data movement for ILM - pool 2 to pool 1

As the SONAS grows in capacity over time, it is a straightforward matter to add additional nodes to the parallel cluster, thus maintaining the ability to perform and complete file system scans and physical data movement in a timely manner, even as the file system grows into hundreds of terabytes and petabytes.

1.4.11 HSM backup/restore to external storage


SONAS also supports the ability to extend physical data movement to external storage outside of SONAS. There are two types of operations to external storage:
- Hierarchical Storage Management (HSM): Migrate inactive files to external storage, while leaving a stub file on disk.
- Backup/restore (B/R): Back up or restore copies of files between SONAS and external storage.
Traditional software that performs these functions can do so on SONAS, using walk the directory tree methods, identifying candidate files through normal means, and performing normal LAN I/O to do data movement. In this case, the normal parameters of file system scan time apply.

Tip: IBM has an extensive Independent Software Vendor (ISV) certification program for SONAS. Enterprises use many ISV applications for their storage management to address business requirements. IBM has done extensive testing and intends to continue to ensure interoperability and compatibility of the leading ISV applications with SONAS to reduce deployment risks.


1.4.12 Requirements for high performance external HSM and backup restore
SONAS can support any standard HSM and backup/restore software. These conventional solutions use normal walk the directory tree methods to identify files that need to be managed and moved, and then copy these files using conventional methods. However, as file systems continue to grow to the hundreds of terabytes to petabyte level, the following requirements have arisen:
- The elapsed time for traditional scans to identify files that need to be moved for HSM or backup/restore purposes is becoming too long. In other words, due to the scale of the search, the time required to walk the directory tree is becoming excessive and incurs a very large number of small-block IOPS. These long scan times can severely inhibit the ability to manage a large amount of storage. In many cases, the scan time alone can be longer than the backup or tiered storage management window.
- In addition, after the files are identified, the large amount of data that they can represent often drives the need for very high data rates in order to accomplish the needed HSM or backup/restore data movement within a desired (and continually shrinking) time window.
Therefore, in order to address these issues and make automated tiered storage feasible at large scale, SONAS provides a specific set of technology exploitations to significantly reduce this overly long scan time and to perform efficient data movement, as well as HSM to external storage. SONAS does this by providing optional (yet highly desirable) exploitation of and integration with IBM Tivoli Storage Manager. SONAS Software has specific high performance integration with Tivoli Storage Manager to provide accelerated backup/restore and accelerated, more functional HSM to external storage.

1.4.13 SONAS high performance HSM using Tivoli Storage Manager


The SONAS scan engine and IBM Tivoli Storage Manager work together, combining the SONAS parallel grid architecture with software parallelism in Tivoli Storage Manager, to significantly enhance the speed, performance, and scale of both HSM and backup/restore processes. In a SONAS environment, Tivoli Storage Manager does not need to walk directory trees to identify files that need to be moved to external storage, backed up, or restored. Instead, the SONAS high performance scan engine is used to identify the files to be migrated, and the Tivoli Storage Manager servers are architected to exploit multiple SONAS interface nodes, in parallel, for data movement.


The architecture of the SONAS HSM to external storage is shown in Figure 1-31.

The scan engine results are passed to the Tivoli Storage Manager HSM server, which migrates inactive data over parallel data streams to tape, a tape library, or a de-duplication device; a stub file is left on disk and the remainder of the file is migrated to external storage.

Figure 1-31 HSM to external storage using Tivoli Storage Manager

In this SONAS plus Tivoli Storage Manager HSM scenario, a stub file is left on disk, allowing the file to still appear active in the file system. Many operations, such as listing files, are satisfied by the stub file without any need for a recall. You have flexible control over the HSM implementation, such as specifying the size of the stub file, the minimum size of a file to be eligible for migration, and so on. If a file that is resident only on external storage is accessed, it is transparently auto-recalled from the external storage through the Tivoli Storage Manager server.

Data movement to and from external storage is done in parallel through as many SONAS interface nodes as desired, maximizing throughput through parallelism. Data can be pre-migrated, re-staged, and de-staged according to policy. In this manner, SONAS provides the ability to store petabytes of data in the online file system, while staging only the desired portions of the file system on the actual SONAS disk. The external storage can be any Tivoli Storage Manager-supported storage, including external disk, tape, virtual tape libraries, or data de-duplication devices.
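At the policy level, this can be sketched as an external pool declaration plus an ordinary migration rule, much like the 'hsm' rule in Figure 1-25. In the sketch below, the pool names, the threshold values, and the HSMscript placeholder for the Tivoli Storage Manager HSM interface script are assumptions; the actual interface script is established as part of the SONAS and Tivoli Storage Manager integration.

   /* illustrative external HSM pool; the exec script name is a placeholder */
   rule 'declareHsm' external pool 'hsm' exec 'HSMscript'
   rule 'stageOut' migrate from pool 'silver' threshold(90,85) weight(current_timestamp - access_time) to pool 'hsm' where file_size > 1024kb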

1.4.14 SONAS high performance backup/restore using Tivoli Storage Manager


The same SONAS and Tivoli Storage Manager architecture is used for backup and restore acceleration. The first step in backup/restore is to identify the files that need to be backed up for Tivoli Storage Manager's incremental forever method of operation. The SONAS high performance scan engine is called by Tivoli Storage Manager to perform this task, and the scan engine then passes the list of identified changed files to be backed up to Tivoli Storage Manager. Rules for backup are submitted to be included in a SONAS policy engine scan of the file system. The high performance scan engine locates the files that need to be backed up, builds a list of these files, and then passes these results to the Tivoli Storage Manager server, as shown in Figure 1-32.


The scan engine results are passed to the Tivoli Storage Manager backup server, which moves the data over parallel data streams to any TSM-supported device, including ProtecTIER de-duplication appliances, virtual tape libraries, and tape.

Figure 1-32 Backup and restore acceleration using Tivoli Storage Manager

Now, let us examine how the SONAS and Tivoli Storage Manager exploitation works in a little more detail.

1.4.15 SONAS and Tivoli Storage Manager integration in more detail


The first step in either HSM or backup/restore is to identify the files that need to be migrated or backed up. We submit Tivoli Storage Manager rules to the SONAS central policy engine scan of the file system, specifying either an HSM operation to external storage or a high performance backup or restore. After scanning the file system as previously described in this chapter, SONAS passes the scan engine results back to the Tivoli Storage Manager server, as shown in Figure 1-33.


Figure 1-33 SONAS scan engine and Tivoli Storage Manager


Rather than using the walk the directory tree method, the SONAS scan engine uses multiple SONAS interface nodes to scan the file system in parallel and identify the list of changed files. SONAS then passes the list of changed files directly to the Tivoli Storage Manager server. In this way, we use the SONAS scan engine to avoid the need to walk the directory tree, and to avoid the associated, traditionally time-consuming small block directory I/Os. The results of the scan are divided up among multiple interface nodes. These multiple interface nodes then work in parallel with the Tivoli Storage Manager servers to initiate the HSM or backup/restore data movement, creating parallel data streams. The Tivoli Storage Manager software implements a virtual node function, which allows the multiple SONAS interface nodes to stream the data in parallel to a Tivoli Storage Manager server, as shown in Figure 1-34.


Figure 1-34 Parallel data streams between SONAS and Tivoli Storage Manager


In this way, the SONAS Software and Tivoli Storage Manager work together to exploit the SONAS scale out architecture to perform these functions at petabyte levels of scalability and performance. As higher data rates are required, more interface nodes can be allocated to scale out the performance in a linear fashion, as shown in Figure 1-35.


Figure 1-35 High performance parallel data movement at scale, from SONAS to external storage

SONAS scale out architecture combined with Tivoli Storage Manager can be applied to maintain desired time windows for automated tiered storage, HSM, and backup/restore, even as file systems grow into hundreds of terabytes to petabytes.

Integration: SONAS only requires external Tivoli Storage Manager servers to exploit:
- Accelerated HSM to external storage pools
- Accelerated backup/restore and HSM that exploit the SONAS Software scan engine
- Accelerated external data movement that exploits multiple parallel interface nodes to raise the backup/restore and HSM data rates
All internal data movement within a SONAS (between internal SONAS logical storage pools) or between SONAS machines (SONAS async replication) is done by the SONAS Software itself, and does not require any involvement of external Tivoli Storage Manager servers. Of course, SONAS also supports conventional external software that performs backup/restore and HSM through normal walk the directory tree methods and normal copying of files. You can find more information about SONAS and Tivoli Storage Manager integration in SONAS and Tivoli Storage Manager integration on page 119 and Backup and restore of file data on page 185.


1.4.16 Summary: Lifecycle of a file using SONAS Software


In this section, we have seen the lifecycle of a file in SONAS, and through that, an overview of the operational characteristics of SONAS Software. We have seen how SONAS Software performs the following functions:
- Creation of files
- Serving of files in a high performance manner, including providing scalability and parallel high performance using wide striping
- Automated tiered storage management, effecting physical storage movement using the central policy engine
- Migration of data to external storage for HSM and for backup, using an external Tivoli Storage Manager server
In the remainder of this chapter, we explore in more detail how SONAS Software provides a rich, integrated set of functions to accomplish these operational methods. As shown in Figure 1-36, this is the end result of SONAS automated tiered storage: the centralized data management capability to manage the lifecycle of a file.

In the figure, all three example files keep the same logical directory paths while their physical data is moved transparently between storage pools and, through stub files with auto-recall, to external Tivoli Storage Manager-managed storage.

Figure 1-36 End result - SONAS automated tiered storage

During all of these physical data movement and management operations, the user's logical file path and the appearance of the file remain untouched. Users are unaware that this large scale, high performance physical data management is being performed automatically on their behalf.


1.4.17 Chapter summary


SONAS is a highly desirable choice for organizations seeking to better manage their growing demand for file-based storage. SONAS is designed to consolidate data that is scattered across multiple storage locations and to allow it to be efficiently shared and managed. The solution helps improve productivity by providing automated ILM, automatic storage allocation, user management by storage quota, and universal reporting and performance management. In the following chapters, we begin a more detailed discussion of the SONAS Software architecture, components, and operational methodologies.


Chapter 2. Hardware architecture
In this chapter, we discuss the basic hardware structure of the SONAS appliance product. The configuration consists of a collection of interface nodes that provide file services to external application machines running standard file access protocols such as NFS or CIFS, a collection of storage nodes that provide a gateway to the storage, and at least one management node that provides a management interface to the configuration. In addition to the nodes, there are switches and storage pods. We describe the hardware components of the SONAS appliance, providing the basis for the configuration, sizing, and performance considerations discussed throughout the book.


2.1 Nodes
The SONAS system consists of three types of server nodes:
- A set of interface nodes that provide connectivity to your Internet Protocol (IP) network for file services to external application machines running standard file access protocols such as NFS or CIFS
- A management node that provides a management interface to the SONAS configuration
- Storage nodes that provide a gateway to the SONAS storage
The management node, the interface nodes, and the storage nodes all run the SONAS Software product in a Linux operating system. Product software updates to the management node are distributed and installed on each of the interface nodes and storage nodes in the system. The interface nodes, management node, and storage nodes are connected through a scalable, redundant InfiniBand fabric, allowing data to be transferred between the interface nodes providing access to the application and the storage nodes with direct attachments to the storage. InfiniBand was chosen for its low overhead and high speed, 20 Gbits/sec for each port on the switches. The basic SONAS hardware structure is shown in Figure 2-1.

Figure 2-1 Overview of SONAS hardware structure


2.1.1 Interface nodes


The interface node is a 2U server that provides the TCP/IP data network connectivity and the file services for the SONAS system. SONAS supports the following file-serving protocols:
- Common Internet File System (CIFS)
- Network File System v3 (NFSv3)
- File Transfer Protocol (FTP)
- HyperText Transfer Protocol Secure (HTTPS)
The interface node contains two redundant hot-swappable 300 GB 2.5-inch 10K RPM SAS hard disk drives (HDDs) with mirroring between the two HDDs for high availability. The HDDs contain the SONAS system software product, comprising the operating system and all other software needed for an operational SONAS system. These nodes can operate at up to 10 Gb speeds with optional adapters, providing extremely fast access to data. They are connected to the rest of the SONAS system by a redundant high speed InfiniBand data network.

Files can be accessed through each of the interface nodes, which provide a highly scalable capability for data access. Additional data access performance can be obtained by adding interface nodes up to the limits of SONAS. SONAS R1 allows a minimum of two interface nodes and a maximum of 30 interface nodes. Collections of files (file systems) are provided by storage nodes, which are gateways to storage controllers and disk drawers. All interface nodes can access all storage on all storage nodes. All storage nodes can send data to any interface node.

Two of the onboard Ethernet ports connect to the internal private management network within the SONAS system for health monitoring and configuration. The other two onboard Ethernet ports connect to the external IP network for network file serving capabilities. Two of the PCIe adapter slots are available for use to add more adapters for host IP interface connectivity. The InfiniBand Host Channel Adapters (HCAs) attach to the two independent InfiniBand switches in the system to interconnect the interface nodes with the management node and the storage nodes in an InfiniBand fabric.

Interface node components


An interface node contains the following components:
- Two Intel Nehalem EP quad-core processors
- 32 GB of Double Data Rate (DDR3) memory standard, with options to expand to 64 GB or 128 GB
- Four onboard 10/100/1000 Ethernet ports: two are used within the system for management features, and two are available for connectivity to the customer's external IP network
- Two 300 GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives with mirroring between the two HDDs (RAID 1) for high availability
- Four PCIe Gen 2.0 x8 adapter slots:
  - The top two adapter slots each contain a single-port 4X DDR InfiniBand Host Channel Adapter (HCA) card that connects to the SONAS InfiniBand fabric for use within the system
  - One of the bottom two adapter slots can optionally contain zero or one quad-port 10/100/1000 Ethernet NIC (FC 1100)


  - One of the bottom two adapter slots can optionally contain zero or one dual-port 10 Gb Converged Network Adapter (CNA) (FC 1101)
  - The bottom two adapter slots can hold zero, one, or two optional Ethernet cards, but only one of each kind of adapter, for a maximum of six additional ports
- Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)
- Two redundant hot-swappable power supplies
- Six redundant hot-swappable cooling fans

Optional features for interface nodes


Consider these optional features for interface nodes when planning for your SONAS device:
- Additional 32 GB of memory (FC 1000): Provides an additional 32 GB of memory in the form of eight 4 GB 1333 MHz double-data-rate three (DDR3) memory dual-inline-memory modules (DIMMs). The feature enhances system throughput performance, but it is optional. You can order only one FC 1000 per interface node. Installation of FC 1000 into an already installed interface node is a disruptive operation that requires you to shut down the interface node. However, the system continues to operate in the absence of the interface node being upgraded.
- 128 GB memory option (FC 1001): Provides 128 GB of memory in the interface node, in the form of sixteen 8 GB 1333 MHz DDR3 memory DIMMs. Only one FC 1001 can be ordered, and it is mutually exclusive with FC 1000 on the initial order of an interface node. Installation of FC 1001 in an already installed SONAS interface node is a disruptive operation requiring the interface node to be shut down. However, the SONAS system continues to operate in the absence of the interface node being upgraded with FC 1001.
- Quad-port 1 GbE NIC (FC 1100): Provides a quad-port 10/100/1000 Ethernet PCIe x8 adapter card. This NIC provides four RJ45 network connections for additional host IP network connectivity. This adapter supports a maximum distance of 100 m using Category 5 or better unshielded twisted pair (UTP) four-pair media. The customer is responsible for providing the network cables to attach the network connections on this adapter to their IP network. One FC 1100 can be ordered for an interface node. The manufacturer of this card is Intel; the OEM part number is EXPI9404PTG2L20.
- Dual-port 10 Gb Converged Network Adapter (FC 1101): Provides a PCIe 2.0 Gen 2 x8 low-profile dual-port 10 Gb Converged Network Adapter (CNA) with two SFP+ optical modules. The CNA supports short reach (SR) 850 nm multimode fiber (MMF). The customer is responsible for providing the network cables to attach the network connections on this adapter to their IP network. One FC 1101 can be ordered for an interface node. The manufacturer of this card is QLogic; the OEM part number is FE0210302-13.
- Cables: Cat 5e cable or better is required to support 1 Gb network speeds; Cat 6 cable provides better support for 1 Gbps network speeds. The 10 GbE data-path connections support short reach (SR) 850 nanometer (nm) multimode fiber (MMF) optic cables, which can typically connect equipment reliably up to a maximum of 300 meters (m) using 50/125 (2000 MHz*km BW) OM3 fiber.


2.1.2 Storage nodes


The storage node is a 2U server that connects the SONAS system to the InfiniBand cluster and also connects directly to the Fibre Channel attachment on the SONAS storage controller. Storage nodes are configured in high-availability (HA) pairs that are connected to one or two SONAS storage controllers. Two of the onboard Ethernet ports connect the storage node to the internal private management network, and two provide a direct Ethernet connection to the disk storage controller. All interface nodes can access all storage on all storage nodes. All storage nodes can send data to any interface node.

A SONAS system contains a minimum of two storage nodes and a maximum of 60 when using 96-port InfiniBand switches in the base rack. When using 36-port InfiniBand switches in the base rack, the maximum number of storage nodes is 28. The storage node contains two redundant hot-swappable 300 GB 2.5-inch 10K RPM SAS HDDs with mirroring between them for high availability. The hard disk drives contain the SONAS System Software product, which hosts the operating system and all other software needed for an operational SONAS system.

All of the PCIe x8 adapter slots in the storage node are already populated with adapters. Two of the PCIe adapter slots are populated with two single-port 4X DDR InfiniBand HCAs for attaching to the two InfiniBand switches in the SONAS system. The other two PCIe x8 adapter slots are populated with two dual-port 8 Gbps Fibre Channel Host Bus Adapters (HBAs) for attaching to the SONAS Storage Controller.

A storage node contains the following components:
- Two Intel Nehalem EP quad-core processors
- 8 GB of Double Data Rate (DDR3) memory
- Four onboard 10/100/1000 Ethernet ports: two are used within the system for management features, and two connect directly to the disk storage controllers
- Two 300 GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives with mirroring between the two HDDs (RAID 1) for high availability
- Four PCIe Gen 2.0 x8 adapter slots: the top two slots each contain a single-port 4X DDR InfiniBand Host Channel Adapter (HCA) card that connects to the SONAS InfiniBand fabric, and the bottom two slots each contain a dual-port 8 Gbps Fibre Channel Host Bus Adapter (HBA) for attaching to the SONAS Storage Controller
- Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)
- Two redundant hot-swappable power supplies
- Six redundant hot-swappable cooling fans

2.1.3 Management nodes


The management node is a 2U server that provides a central point for the system administrator to configure, monitor, and manage the operation of the SONAS cluster. The management node supports both a browser-based graphical user interface (GUI) and a command line interface (CLI). It also provides a System Health Center for monitoring the overall health of the system.

A single management node is required. The SONAS system continues to operate without a management node, but configuration changes can only be performed from an active management node.

Nodes: SONAS administrators interact with the storage and interface nodes directly only for the purpose of debug under the guidance of IBM service. You have no need to access the underlying SONAS technology components for SONAS management functions, and no need to directly access the interface or storage nodes.

The management node contains two hot-swappable 300 GB 2.5-inch 10K RPM SAS hard disk drives with mirroring between the two HDDs for high availability. These hard disk drives contain the SONAS System Software product, containing the operating system and all other software needed for an operational SONAS system. A third hot-swappable 300 GB 2.5-inch 10K RPM SAS hard disk drive stores the logging and trace information for the entire SONAS system. Two of the PCIe x8 adapter slots are already populated with two single-port 4X Double Data Rate (DDR) InfiniBand Host Channel Adapters (HCAs). The two HCAs attach to the two independent InfiniBand switches in the SONAS system and interconnect the management node with the other components of the SONAS system.

A management node contains the following components:
- Two Intel Nehalem EP quad-core processors
- 32 GB of Double Data Rate (DDR3) memory standard
- Four onboard 10/100/1000 Ethernet ports: two connect to your Internet Protocol (IP) network for health monitoring and configuration, and two connect to the internal private management network within the SONAS system for health monitoring and configuration
- Two 300 GB 2.5-inch Small Form Factor (SFF) 10K RPM SAS Slim-HS hard disk drives with mirroring between the two HDDs (RAID 1) for high availability
- One non-mirrored 300 GB 2.5-inch SFF 10K RPM SAS Slim-HS hard disk drive for centralized collection of log files and trace data
- Four PCIe Gen 2.0 x8 adapter slots: two each contain a single-port 4X Double Data Rate (DDR) InfiniBand Host Channel Adapter (HCA) for use within the system, and two are available for your use to add more adapters for host IP interface connectivity
- Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)
- Two redundant hot-swappable power supplies
- Six redundant hot-swappable cooling fans
The management node comes with all of the cables that are required to connect it to the switches within the base rack. The management node is assumed to be in the SONAS base rack with the two InfiniBand switches.


2.2 Switches
The SONAS system contains internal InfiniBand switches, internal Ethernet switches, and external, customer-supplied Ethernet switches.

2.2.1 Internal InfiniBand switch


All major components of a SONAS system, such as interface nodes, storage nodes, and the management node, are interconnected by a high-performance, low-latency InfiniBand 4X Double Data Rate (DDR) fabric. Two redundant InfiniBand switches are included inside each SONAS system. For small and medium configurations, a 1U 36-port 4X DDR InfiniBand switch is available; for larger configurations, a 7U 96-port 4X DDR InfiniBand switch is available. If you choose the larger InfiniBand switch, you can start with a basic configuration with just the default 24-port InfiniBand line card in each switch and, as demand for your system grows, order additional 24-port cards until you scale out the system to the maximum of 96 InfiniBand ports in each switch. Two identical InfiniBand switches must be ordered for a SONAS system, either two 36-port InfiniBand switches or two 96-port InfiniBand switches.

Important: At the time of ordering the SONAS, you must choose between the smaller 36-port InfiniBand switch and the larger 96-port InfiniBand switch. It is not possible to field upgrade the 36-port InfiniBand switch to the 96-port InfiniBand switch. Your only option might be to purchase a new base rack with the 96-port InfiniBand switch, and the migration can require a file system outage for the move to the new base rack. It is important to take future growth in your environment into consideration and order appropriately.

The SONAS InfiniBand network supports high bandwidth, low latency file system data and control traffic among the nodes of the system. The data carried by the InfiniBand fabric includes low level file system data, as well as the TCP/IP-based locking and management messaging traffic. The locking and management traffic occurs on IP over InfiniBand, which bonds a single IP address across all of the InfiniBand adapters. There are two switch sizes: the total backplane bandwidth is 144 GBytes/sec for the smaller 36-port InfiniBand switch, and 384 GBytes/sec for the larger 96-port InfiniBand switch. Cabling within a rack is provided as part of the rack configuration and is done by IBM. You must order InfiniBand cable features for inter-rack cabling after determining the layout of your multi-rack system.

36-port InfiniBand switch


The SONAS 36-port InfiniBand switch (2851-I36) is a 1U 4X DDR InfiniBand switch that provides 36 QSFP ports that each operate at 20 gigabits per second (Gbps). This InfiniBand switch provides a maximum backplane bandwidth of 1.44 terabits per second (Tbps) and contains an embedded InfiniBand fabric manager. The switch provides two redundant hot-swappable power supplies. The 36-port InfiniBand switch (2851-I36) has no options or upgrades.

96-port InfiniBand switch


The SONAS 96-port InfiniBand switch (2851-I96) is a 7U Voltaire ISR2004 4X DDR InfiniBand switch providing up to 96 4X DDR CX4 switch ports. The 96-port InfiniBand switch is intended for large SONAS system configurations. The InfiniBand switch provides a maximum backplane bandwidth of 3.84 Tbps.


The 96-port InfiniBand switch comes standard with the following components:
- Two Voltaire sFB-2004 Switch Fabric boards
- One Voltaire sLB-2024 24-port 4X DDR InfiniBand Line Board
- One Voltaire sMB-HM Hi-memory Management board containing an embedded InfiniBand fabric manager
- Two sPSU power supplies
- All fan assemblies
The 96-port switch comes standard with one 24-port 4X DDR InfiniBand line board providing 24 4X DDR (20 Gbps) InfiniBand ports. Up to three additional sLB-2024 24-port 4X DDR InfiniBand line boards can be added for a total of 96 ports. The 96-port InfiniBand switch comes standard with two sFB-2004 Switch Fabric boards; up to two additional sFB-2004 Switch Fabric boards can be added to provide additional backplane bandwidth. The 96-port InfiniBand switch comes standard with two power supplies; up to two additional sPSU power supplies can be added for redundancy. The two standard power supplies are capable of powering a fully configured 96-port InfiniBand switch with the following components:
- Four sFB-2004 Switch Fabric boards
- Four sLB-2024 24-port 4X DDR line boards
- Two sMB-HM Hi-memory Management boards
You can upgrade the 96-port switch non-disruptively.

InfiniBand backplane bandwidth:
- InfiniBand switches = 20 Gbit/sec per port (2 GBytes/sec per port)
- InfiniBand 36-port switch backplane = 1.44 Tbits/sec (144 GBytes/sec total)
- InfiniBand 96-port switch backplane = 3.84 Tbits/sec (384 GBytes/sec total)
The InfiniBand switches have sufficient bandwidth capability to handle a fully configured SONAS solution.

2.2.2 Internal private Ethernet switch


The management network is a network internal to the SONAS components. Each SONAS rack has a pair of 50-port Ethernet switches, which are interconnected to form this network. The management network is designed to carry low bandwidth messages in support of the configuration, health, and monitoring of the SONAS subsystem. All major components of a SONAS system, such as interface nodes, storage nodes, the management node, and the InfiniBand switches, are connected to the internal Ethernet network by internal, private Ethernet switches. The adapters connecting to the Ethernet switches are bonded with one another so that they share a single IP address; this requires that the Ethernet switches in each rack be interconnected with one another so that they form a single network. For multi-site sync configurations, the management network needs to be extended between the sites, connecting the internal switches of the base racks at each site. The internal Ethernet switches cannot be shared with external customer connections; they can be used only for internal communication within the SONAS cluster.


Internal IP addresses
During installation you can choose one of the IP address ranges listed next. The range that you select must not conflict with the IP addresses used for the customer Ethernet connections to the management node(s) or interface nodes. These are the available IP address ranges:
- 172.31.*.*
- 192.168.*.*
- 10.254.*.*

Integrated Management Module


The Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM) on all interface nodes, storage nodes, and the management node can be connected to this internal management network. For additional details about the Baseboard Management Controller, see IBM eServer xSeries and BladeCenter Server Management, SG24-6495. Ethernet cabling within a rack is provided as part of the rack and rack components order. You must order inter-rack Ethernet cables to connect the Ethernet switches in a rack to the Ethernet switches in the base rack.

2.2.3 External Ethernet switches


You have to provide an external TCP/IP infrastructure for the external connections (data and management) in your installation. This infrastructure cannot be shared with the internal SONAS Ethernet switches. The network infrastructure has to support 1 Gb/s or 10 Gb/s Ethernet links, depending on the NIC or CNA installed in the interface nodes. These switches and cables are not provided as part of the SONAS appliance. All interface nodes have to be connected to your external network infrastructure for data serving. The management node has to be connected to your external network infrastructure for managing the operation of the SONAS cluster through the browser-based graphical user interface (GUI) and the command line interface (CLI).

2.2.4 External ports: 1 GbE / 10 GbE


SONAS supports up to 30 interface nodes, which connect to the customer Ethernet network for data over up to two 10 Gb/s ports or up to six 1 Gb/s ports per interface node. SONAS is designed as a parallel grid architecture. Every node is a balanced modular building block, with sufficient main memory and PCI bus capacity to provide full throughput with the adapters configured to that node. Therefore, it is not possible, for example, for the 10 Gb/s Ethernet card to overrun the PCI bus of the interface node, because the PCI bus has more capacity than the 10 Gb/s Ethernet card. The InfiniBand switches have sufficient bandwidth capability to handle a fully configured SONAS. This means that you can choose between 1 Gb/s and 10 Gb/s communication between SONAS clients and SONAS storage; the internal SONAS network will not be a bottleneck in either case. The choice depends on the required speed between SONAS clients and SONAS storage and on your current network infrastructure. By default, all 10 Gb Ethernet ports in the interface nodes work in a bonding configuration as active-backup interfaces (only one slave in the bond is active), and all 1 Gb ports work in a load-balancing bonding configuration, so the transfer rate even with 1 Gb/s adapters can be sufficient in certain cases (see Bonding on page 150).


2.3 Storage pods


Each storage pod consists of a pair of storage nodes and at least one storage controller with high density storage (at least 60 disks). A pod can be expanded with a second storage controller and with one storage expansion unit attached per controller, each adding 60 disks. Each storage pod provides dual paths to all storage for reliability. Each interface node and each storage pod operate in parallel with each other. There is no fixed relationship between the number of interface nodes and the number of storage pods.

Storage pod scaling


Storage pods can be scaled to increase storage capacity and bandwidth independently of the interface nodes. Capacity can be added to the system in two ways: by adding disks to existing storage pods, or by adding storage pods. Each storage pod supports a maximum of 240 hard disk drives, as shown in the short calculation that follows.
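As a simple cross-check of these limits, the following Python arithmetic derives the per-pod and per-system maximums from the drawer sizes stated in this chapter (60 drives per controller or expansion unit, up to two controllers and two expansion units per pod, and up to 30 pods per system).

DRIVES_PER_DRAWER = 60      # one drawer per storage controller or expansion unit
DRAWERS_PER_POD = 4         # 2 storage controllers + 2 disk storage expansion units
MAX_PODS = 30

max_drives_per_pod = DRIVES_PER_DRAWER * DRAWERS_PER_POD        # 240
max_drives_per_system = max_drives_per_pod * MAX_PODS           # 7200

print(max_drives_per_pod, max_drives_per_system)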

Storage pod expansion


Figure 2-2 shows two possible ways to expand a storage pod, and Figure 2-3 shows how to expand the storage pod further. A maximum of 30 storage pods is supported, for a maximum of 7200 hard disk drives in a single system. Because SONAS stripes data across all disks in a storage pool, storage pod performance can be increased by adding more disks to the pod. The highest performance is achieved by adding a new storage controller rather than expanding an existing one with a storage expansion unit. You can mix disk drive types within a storage pod, but all the drives within an enclosure (that is, a drawer of 60 drives) have to be the same type.

Figure 2-2 Adding additional storage controllers and expansion units to a storage pod. The figure shows a storage pod that starts with two storage nodes (2851-SS1) and one storage controller (2851-DR1) containing one drawer of 60 disks, and two possible ways to expand it: (2a) add a disk storage expansion unit (2851-DE1) with another drawer of 60 disks to the existing controller, giving approximately 1.85x the performance of the 60-disk pod, or (2b) add a second storage controller (2851-DR1) with another drawer of 60 disks, giving approximately 2x the performance of the 60-disk pod.


Figure 2-3 Adding additional storage controllers and expansion units to a storage pod (continued). The figure shows how to expand the storage pods from Figure 2-2 further: add a second storage controller (2851-DR1) with a drawer of 60 drives, add a disk storage expansion unit (2851-DE1) to either storage controller, and then completely fill the storage pod by adding the final 2851-DE1 drawer of 60 drives, for a fully populated pod of two storage controllers and two expansion units.

Pods: The storage within the IBM SONAS is arranged in storage pods, where each storage pod contains:
Two storage nodes
One or two high-density storage controllers
Zero, one, or two high-density disk storage expansion units

2.3.1 SONAS storage controller


The high-density SONAS storage controller (2851-DR1) is a 4U enclosure that connects the SONAS system to its storage. It contains dual redundant, hot-swappable, active/active RAID controllers, dual redundant hot-swappable power supplies and cooling modules, and 60 hard disk drives. Each SONAS RAID controller supports up to 60 hard disk drives. The storage controller is configured to use either of the following configurations:
RAID 5 with SAS hard disk drives
RAID 6 with Nearline SAS or SATA hard disk drives


All 60 disk drives in the storage controller must be the same type and capacity; that is, a drawer of 60 drives must be all SAS or all Nearline SAS, and all 60 drives must be the same capacity. You cannot mix drive types or sizes within an enclosure. The controller and its attached expansion drawer can contain different disk types. You can order one high-density disk storage expansion unit to attach to each storage controller. The expansion unit also contains a drawer of 60 disk drives, and these 60 drives must be the same size and type; however, the size and type used in the expansion unit drawer can differ from the size and type used in the storage controller drawer.

Each SONAS storage controller supports up to four Fibre Channel host connections, two per RAID controller. Each connection is auto-sensing and supports 2 Gb/s, 4 Gb/s, or 8 Gb/s. Each RAID controller contains:
4 GB of cache
Two 8 Gbps Fibre Channel host ports
One drive-side SAS expansion port

The storage controller is configured by default to work only with RAID 5 or RAID 6 arrays, according to the hard disk drive type. Currently you cannot change the predefined RAID levels. The storage controller automatic drive failure recovery procedures ensure that absolute data integrity is maintained while operating in degraded mode. Both full and partial (fractional) rebuilds are supported in the storage controller, and rebuilds are done at the RAID array level. Partial rebuilds reduce the time to return the RAID array to full redundancy. A timer begins when a disk in the RAID array is declared missing. If the disk reappears before the timer expires, a fractional rebuild is done; otherwise, the disk is declared failed and replaced by a spare, and a full rebuild begins to return the array to full redundancy. The default partial rebuild timer (Disk Timeout) setting is 10 minutes. The controller supports a limit between 0 and 240 minutes, but currently only the default value is supported. Under heavy write workloads, it is possible that the number of stripes that need to be rebuilt will exceed the system's internal limits before the timer expires; when this happens, a full rebuild is started automatically instead of waiting for the partial rebuild timeout (see Table 2-1 on page 52).
Table 2-1 Configured and supported RAID arrays

Disk drive type               RAID level   RAID arrays per controller or expansion unit   RAID configuration   Total drives   Raw usable capacity (bytes)
1 TB 7.2K RPM SATA            RAID 6       6                                              8+P+Q                60             46 540 265 619 456
2 TB 7.2K RPM SATA            RAID 6       6                                              8+P+Q                60             93 956 704 567 296
2 TB 7.2K RPM Nearline SAS    RAID 6       6                                              8+P+Q                60             93 956 704 567 296
450 GB 15K RPM SAS            RAID 5       6                                              8+P+Spare            60             20 564 303 413 248
600 GB 15K RPM SAS            RAID 5       6                                              8+P+Spare            60             27 419 071 217 664
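The raw usable capacities in Table 2-1 follow from the six RAID arrays per drawer and the eight data drives per array. The following Python sketch approximates them using nominal decimal drive capacities (an assumption for illustration); the exact values in the table reflect the formatted capacity of the specific drive models, so small differences from this estimate are expected.

ARRAYS_PER_DRAWER = 6
DATA_DRIVES_PER_ARRAY = 8   # 8+P+Q (RAID 6) and 8+P+Spare (RAID 5) both use 8 data drives

def approx_usable_bytes(drive_capacity_bytes):
    """Approximate raw usable capacity of one 60-drive drawer."""
    return ARRAYS_PER_DRAWER * DATA_DRIVES_PER_ARRAY * drive_capacity_bytes

print(approx_usable_bytes(2 * 10**12))     # ~96.0 TB for 2 TB drives (table: ~93.96 TB)
print(approx_usable_bytes(450 * 10**9))    # ~21.6 TB for 450 GB drives (table: ~20.56 TB)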


Figure 2-4 shows the layout of drives in a SONAS storage controller.

Figure 2-4 SONAS Storage controller drive layout

2.3.2 SONAS storage expansion unit


The high-density disk storage expansion unit (2851-DE1) is a 4U enclosure containing redundant connections, redundant power and cooling modules, and 60 hard disk drives. One disk storage expansion unit can be attached to each storage controller. The storage controller and the disk expansion unit support both high-performance 15K RPM SAS disk drives and high-capacity 7.2K RPM Nearline SAS or SATA disk drives. Note that all the drives within an enclosure (that is, a drawer of 60 drives) have to be the same type and capacity.

2.4 Connection between components


This section describes the connections between the SONAS components.

2.4.1 Interface node connections


A single interface node has five 1 Gigabit Ethernet (GbE) connections (ports) on the system board. Two of the onboard Ethernet ports connect to the internal private management network within the SONAS system for health monitoring and configuration. Two other onboard Ethernet ports are used for connectivity to your external IP network for network file serving, and the remaining port is used for connectivity to the Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM), which enables the user to manage and control the server remotely.


Two of the PCIe adapter slots are available for customer use to add more adapters for host IP interface connectivity. Up to six additional Ethernet connections to the customer TCP/IP data network are possible for each interface node: a four-port Ethernet adapter card feature provides four 1 GbE connections, and a two-port Ethernet adapter card feature provides two 10 GbE connections. You can have zero or one of each feature in a single interface node.

Interface node connectivity


Table 2-2 lists the possible data path port configurations for a single interface node; Figure 2-5 shows the physical connectivity for each card in an interface node.

Table 2-2 Possible data port configurations (number of ports in various configurations of a single interface node)

Onboard 1 GbE connectors (always 2)   Feature Code 1100, quad-port 1 GbE NIC   Feature Code 1101, dual-port 10 GbE CNA   Total data path connectors
2                                     0                                        0                                         2
2                                     0                                        1 (with 2 ports)                          4
2                                     1 (with 4 ports)                         0                                         6
2                                     1 (with 4 ports)                         1 (with 2 ports)                          8

Figure 2-5 Interface node connectivity


Interface node rear view


In Figure 2-6 you can see the rear view of an interface node.

Figure 2-6 Interface node

1. PCI slot 1 (SONAS single-port 4X DDR InfiniBand HCA)
2. PCI slot 2 (SONAS quad-port 1 GbE or dual-port 10 GbE feature for additional TCP/IP data path connectors)
3. PCI slot 3 (SONAS single-port 4X DDR InfiniBand HCA)
4. PCI slot 4 (SONAS quad-port 1 GbE or dual-port 10 GbE feature for additional TCP/IP data path connectors)
5. Ethernet 2 (SONAS GbE management network connector)
6. Ethernet 1 (SONAS GbE management network connector)
7. Ethernet 4 (TCP/IP data path connector)
8. Ethernet 3 (TCP/IP data path connector)
9. Integrated Baseboard Management Controller (iBMC) with an Integrated Management Module (IMM)

Failover: If you are using only Ethernet ports 3 (item 8) and 4 (item 7) for external network connections to an interface node, then that daughter card is a single point of failure for that one node. If an entire network card fails, the interface node with that network card is taken offline, and the workload running on that interface node is failed over, by SONAS Software, to another interface node. Failure of a single interface node is therefore not a significant concern, although very small systems can see a performance impact. As an example, in a system with the minimum of two interface nodes, the workload on the remaining interface node can double if one interface node fails.


2.4.2 Storage node connections


A storage node has two 1 Gigabit Ethernet (GbE) connections, one to each of the storage controllers that the storage node uses, plus an additional integrated Baseboard Management Controller (iBMC) port (the service maintenance port) connected to one of the GbE switches. If only one storage controller is present, only one cable is installed. Two of the four 8 Gb Fibre Channel (FC) connections attach to each storage controller; if only one storage controller is present, only two Fibre Channel cables are present. Figure 2-7 shows the physical connectivity for each card in a storage node.

Figure 2-7 Storage node connectivity


2.4.3 Management node connections


The management node connects to the two Gigabit Ethernet (GbE) switches at the top of the base rack and to the two InfiniBand switches in the base rack. Two of the onboard Ethernet ports connect to the internal private management network within the SONAS system for health monitoring and configuration. The other two onboard Ethernet ports connect to the customer network for GUI and CLI access. There is also an additional integrated Baseboard Management Controller (iBMC) port (the service maintenance port) connected to one of the GbE switches in the base rack. Figure 2-8 shows the physical connectivity for each card in a management node.

Figure 2-8 Management node connectivity


2.4.4 Internal POD connectivity


Figure 2-9 shows the internal connectivity within a storage pod. The two storage nodes in the storage pod are connected to the two separate InfiniBand switches (InfiniBand fabrics) within the SONAS rack and are configured as a high-availability pair. The two storage nodes in the HA pair are directly attached through Fibre Channel links to one or two storage controllers. Disk storage expansion units are directly attached to the storage controllers through a 3 Gbps SAS interface. All connections within the storage pod are redundant, are configured in a dual-fabric configuration, and are installed by an IBM Customer Engineer (CE).

Figure 2-9 Internal storage pod cabling


2.4.5 Data InfiniBand network


Figure 2-10 shows all the internal InfiniBand connections within a SONAS cluster. There are two InfiniBand switches within SONAS, and all nodes (management, interface, and storage) are connected to these switches. All SONAS nodes have a bonded IP address to communicate with each other; this means that in the case of a link failure, traffic is moved to another available interface. The SONAS InfiniBand network carries file system data and low-level control and management traffic. Expansion racks can be separated from each other by up to the length of the InfiniBand cables; currently the longest cable available is 50 m.

Figure 2-10 SONAS internal InfiniBand connections


2.4.6 Management Ethernet network


The management Ethernet network carries SONAS administration traffic, such as monitoring and configuration through the GUI and CLI. Figure 2-11 shows all the internal Ethernet connections within a SONAS cluster. The blue connections are for low-bandwidth management messages, and the green connections are for Integrated Management Module connections. Internal Ethernet switches are always installed in all racks. All SONAS nodes have a bonded IP address to communicate with each other; this means that in the case of a link failure, traffic is moved to another available interface.

Figure 2-11 SONAS internal Ethernet connections

2.4.7 Connection to the external customer network


Figure 2-12 shows an illustration of all external Ethernet connections between the SONAS cluster and your network. Interface nodes have a bonded IP address to communicate with external SONAS clients. This means that in case of link failure, traffic will be moved to another available interface. By default, interface nodes work in active-backup configuration for 10 Gb ports and in load balancing for 1 Gb ports. Currently the management node does not offer IP bonding for external administrator connectivity (see 4.3, Bonding on page 150).


Figure 2-12 SONAS external customer Ethernet connections

2.5 SONAS configurations available


In this section we look at the available SONAS configurations.

2.5.1 Rack types: How to choose the correct rack for your solution
A SONAS system can consist of one or more racks, into which the components of the system are installed. A 42U enterprise class rack is available. Note that installation of SONAS components in customer-supplied racks is not permitted. The rack can have two or four power distribution units (PDUs) mounted inside of it. The PDUs do not consume any of the rack's 42U of space. The first pair of PDUs is mounted in the lower left and lower right sidewalls; the second pair is mounted in the upper left and upper right sidewalls. The rack supports either base PDUs or intelligent PDUs (iPDUs). The iPDUs can be used with the Active Energy Manager component of IBM Systems Director to monitor the energy consumption of the components in the rack. When installed in the rack, the iPDUs are designed to collect energy usage information about the components in the rack and report the information to IBM Active Energy Manager over an attached customer-provided local area network (LAN). Using iPDUs and IBM Systems Director Active Energy Manager, you can gain a more complete view of energy use within the data center. There are three variations of the SONAS rack:
Base rack
Interface expansion rack
Storage expansion rack

Base rack
The Scale Out Network Attached Storage (SONAS) system always contains a base rack, which contains the management node, InfiniBand switches, a minimum of two interface nodes, and a keyboard, video, and mouse (KVM) unit. The capacity of the SONAS system that you order affects the number of racks in your system and the configuration of the base rack. Figure 2-13 shows the three basic SONAS base racks.

Figure 2-13 SONAS base rack options

There are three available options for the SONAS base rack: 2851-RXA feature codes 9003, 9004, and 9005. Your first base rack depends on how you plan to scale out the SONAS system in the future.

SONAS base rack feature code 9003


See the left rack in Figure 2-13. This base rack has two Gigabit Ethernet (GbE) switches located at the top of the rack, a management node, two smaller 36-port InfiniBand switches, and at least two interface nodes. There is no storage in this rack, so you have to add storage in additional storage expansion racks. You can also add interface expansion racks to the base rack. This rack is limited to 2 x 36 InfiniBand ports; these switches cannot be expanded or exchanged. The rack can be expanded with a maximum of 14 additional interface nodes.


Rack specifications:
The rack must have two 50-port 10/100/1000 Ethernet switches for the internal IP management network.
The rack must have at least one management node installed.
The rack must have a minimum of two interface nodes, with the rest of the interface node bays being expandable options for a total of 16 interface nodes.

SONAS base rack feature code 9004


See the middle rack in Figure 2-13. This base rack has two Gigabit Ethernet (GbE) switches at the top, a management node, two larger 96-port InfiniBand switches, and at least two interface nodes. There is no storage in this rack, so you have to add storage in additional storage expansion racks. You can also add interface expansion racks to the base rack. The rack can be expanded with a maximum of eight additional interface nodes. This rack allows you to scale to the maximum system size, because it has the large 96-port InfiniBand switches installed.

Rack specifications:
The rack must have two 50-port 10/100/1000 Ethernet switches for the internal IP management network.
The rack must have at least one management node installed.
The rack must have a minimum of two interface nodes, with the rest of the interface node bays being expandable options for a total of 10 interface nodes.

SONAS base rack feature code 9005


See the right rack in Figure 2-13. This base rack has two Gigabit Ethernet (GbE) switches at the top, a management node, two smaller 36-port InfiniBand switches, at least two interface nodes, and at least one storage pod, which consists of two storage nodes and a RAID storage controller.

Rack specifications:
The rack must have two 50-port 10/100/1000 Ethernet switches for the internal IP management network.
The rack must have at least one management node installed.
The rack must have a minimum of two interface nodes, with the rest of the interface node bays being expandable options.
The rack must have two storage nodes.
The rack must have a minimum of one storage controller, and it can be extended with disk storage expansion units up to a total of two disk storage expansion units and two storage controllers.


Interface expansion rack


The IBM SONAS interface expansion rack extends the number of interface nodes in an already existing base rack by providing up to 20 additional interface nodes. The total number of interface nodes cannot exceed 30. The two 50-port 10/100/1000 Ethernet switches and at least one interface node are mandatory in each interface expansion rack. Figure 2-14 on page 64 shows the interface expansion rack.

Figure 2-14 Interface Expansion Rack

Rack:
The rack must have two 50-port 10/100/1000 Ethernet switches for an internal IP management network.
The rack must have a minimum of one interface node, with the rest of the interface node bays being expandable options for a total of 20 interface nodes.


Storage expansion rack


The storage expansion rack extends the storage capacity of an already existing base rack, by adding up to four additional storage nodes, up to four additional storage controllers, and up to four disk storage expansion units. The eight possible disk storage expansion and controller units can hold a maximum of 480 hard-disk drives. Up to 30 Storage Pods can exist in a SONAS system. Figure 2-15 shows the storage expansion rack.

Figure 2-15 Storage expansion rack


Rack specifications:
The rack must have two 50-port 10/100/1000 Ethernet switches for the internal IP management network.
The rack must have two storage nodes.
The rack must have a minimum of one storage controller.

It can be expanded as follows:
Add a disk storage expansion unit #1.2 to the first storage controller #1.1.
Add a second storage controller #2.1 to the first storage pod.
Add a disk storage expansion unit #2.2 to the second storage controller in the first storage pod, if the first storage controller #1.1 also has a disk storage expansion unit.
Add the start of a second storage pod, which includes two storage nodes (#3 and #4) and another storage controller #3.1 attached to these storage nodes.
Add a disk storage expansion unit #3.2 to storage controller #3.1.
Add a second storage controller #4.1 to the second storage pod.
Add a disk storage expansion unit #4.2 to storage controller #4.1, if storage controller #3.1 also has a disk storage expansion unit.

A power limitation affects the number of SAS drives per rack. At the current time, SONAS does not yet support a 60 A service option, which can limit the total amount of hardware that can be installed in the storage expansion rack, 2851-RXB. Because of the power consumption of a storage controller (MTM 2851-DR1) fully populated with sixty (60) 15K RPM SAS hard disk drives, and the power consumption of a disk storage expansion unit (MTM 2851-DE1) fully populated with sixty (60) 15K RPM SAS hard disk drives, a storage expansion rack (2851-RXB) is limited to a combined total of six storage controllers and disk storage expansion units when they are fully populated with 15K RPM SAS hard disk drives.

Tip: There is a known requirement to provide a 60 amp power option. When that option becomes available, it will provide enough electrical power to fully populate a SONAS storage expansion rack with SAS drives.

2.5.2 Drive types: How to choose between various drive options


The lowest hardware storage component of SONAS is the physical disk. Disks are grouped in sets of 10, and each enclosure holds six of these 10-packs of a single drive type: six 10-packs of 1 TB drives, six 10-packs of 2 TB drives, or six 10-packs of SAS drives. You cannot mix drive types or sizes within an enclosure. The SONAS storage controller uses RAID 5 with SAS hard disk drives and RAID 6 with Nearline SAS and SATA hard disk drives. The entire capacity of a RAID array of 10 disk drives is mapped into a single LUN, and this LUN is mapped to all hosts. Each LUN is presented across a Fibre Channel interface and detected as a multipath device on each storage node. The SONAS code assigns a unique multipath alias to each LUN based on its WWID.


SAS disk drives require more power than SATA drives; at the current time, each storage expansion rack can hold up to 360 SAS drives or up to 480 Nearline SAS or SATA drives (see Figure 8-4 on page 253). This restriction will be lifted when a 60 amp power option becomes available for the SONAS storage expansion rack. Nearline SAS or SATA drives are always configured within a storage controller or a storage expansion unit as RAID 6 arrays. There are eight data drives and two parity drives per array, which means that within a storage controller or a storage expansion unit there are 48 data drives. SAS drives are always configured within the storage controller or storage expansion unit as RAID 5 arrays. There are eight data drives, one parity drive, and one spare drive per array, which again means that within a storage controller or a storage expansion unit there are 48 data drives. Table 2-3 shows a summary of the possible configurations. These configurations are preset and cannot be changed.
Table 2-3 Drive types configuration summary

Drive type     Drive capacity   RAID array   Total drives   Data drives   Parity drives   Spare drives
SATA           1 or 2 TB        RAID 6       60             48            12              0
Nearline SAS   2 TB             RAID 6       60             48            12              0
SAS            450 GB           RAID 5       60             48            6               6
SAS            600 GB           RAID 5       60             48            6               6

Generally speaking, SAS drives have a smaller seek time, a larger data transfer rate, and a higher Mean Time Between Failures (MTBF) than the cheaper but higher capacity Nearline SAS or SATA drives. The SONAS internal storage performs disk scrubbing, as well as isolation of failed disks for diagnosis and attempted repair. The storage can conduct low-level formatting of drives, power-cycle individual drives if they become unresponsive, correct data using checksums on the fly, rewrite corrected data back to the disk, and use smart diagnostics on Nearline SAS or SATA disks to determine whether the drives need to be replaced. SONAS supports drive intermix: within the same storage pod, it is possible to have one storage enclosure with high-performance SAS disks and another storage enclosure with high-capacity Nearline SAS or SATA disks. Non-enterprise class application data or rarely used data can be automatically migrated within SONAS from the faster, but smaller and more expensive, SAS disks to the slower, but larger and cheaper, Nearline SAS or SATA disks.

2.5.3 External ports: 1 GbE / 10 GbE


SONAS supports up to 30 interface nodes, each of which connects to the customer Ethernet network for data over up to two 10 Gb/s ports or up to six 1 Gb/s ports. SONAS is designed as a parallel grid architecture. Every node is a balanced modular building block, with sufficient main memory and PCI bus capacity to provide full throughput with the adapters configured to that node. Therefore, it is not possible for the 10 Gb/s Ethernet card to overrun the PCI bus of the interface node, because the PCI bus has more capacity than the 10 Gb/s Ethernet card. The InfiniBand switches have sufficient bandwidth capability to handle a fully configured SONAS. This means that you can choose either 1 Gb/s or 10 Gb/s communication between SONAS clients and SONAS storage; in either case, the internal SONAS network will not be the bottleneck. Base the choice on the required throughput between SONAS clients and SONAS storage and on your existing network infrastructure. By default, all 10 Gb Ethernet ports in the interface nodes are bonded as active-backup interfaces (only one slave in the bond is active), and all 1 Gb ports are bonded in a load-balancing configuration, so throughput even with 1 Gb/s adapters can be sufficient in certain cases.


2.6 SONAS with XIV storage overview


IBM offers a specialized version of the IBM Scale Out NAS (SONAS), which uses the IBM XIV Storage System as the storage. This section describes how the specialized SONAS configuration attaches XIV storage and outlines general considerations. IBM SONAS with XIV storage is a specialized SONAS configuration, available under special bid only. This offering modifies the SONAS base rack to attach only XIV storage. In this configuration, the SONAS has one storage pod with two storage nodes and no integrated SONAS storage. To these two storage nodes, you can attach one or two external XIV storage systems. The XIV storage systems provide the necessary usable disk storage for the SONAS file systems. The XIVs can be shared with other Fibre Channel or iSCSI hosts that are attached to the XIVs, provided that all LUNs allocated to SONAS are hard allocated (no thin provisioning). All of the normal functionality of the SONAS system is available when XIV is used as the back-end storage system, including:
Network file serving by NFS, CIFS, FTP, and HTTPS
Quotas
Snapshots
Tivoli Storage Manager backup and recovery
Information Lifecycle Management (ILM) and Hierarchical Storage Management (HSM)

Storage pools: Because no intermix with regular SONAS integrated storage is supported, and because XIV supports only SATA disk, SONAS HSM is limited in the sense that no tiered disk storage pooling is necessary. However, if desired, multiple logical storage pools can be defined for management purposes. SONAS HSM can be used as normal for transparent movement of data out to external storage, such as tape, tape libraries, or data de-duplication devices.

2.6.1 Differences between SONAS with XIV and standard SONAS system
The SONAS with XIV configuration is similar to a standard SONAS system, with the following exceptions:
It is limited to the SONAS base rack (2851-RXA) configuration #3 (FC 9005), with two 36-port InfiniBand switches (2851-I36), a management node (2851-SM1), a keyboard-video-mouse (KVM) unit, and two 10/100/1000 Level 2 50-port Ethernet switches.
It can have between two and six SONAS interface nodes (2851-SI1) within the base rack.
It has one pair of storage nodes (2851-SS1) within the base rack.
The SONAS Software management GUI and Health Center do not provide monitoring or management functionality for the XIV, or for the SAN switches by which it is connected.
The requirement for built-in SONAS storage is removed as part of this specialized configuration.
A mixed storage environment is not supported for this specialized configuration; there cannot be a combination of internal SONAS storage and external XIV storage.


2.6.2 SONAS with XIV configuration overview


The SONAS with XIV offering allows a SONAS system to be ordered without any SONAS storage controllers (2851-DR1) and without any disk storage expansion units (2851-DE1). In other words, this is a SONAS storage pod without any integrated SONAS storage. The storage pod consists of two standard SONAS storage nodes (2851-SS1), ordered and placed into a SONAS rack 2851-RXA. The SONAS with XIV storage pod, connecting to XIV storage, requires the customer to supply:
Two external SAN switches currently supported by XIV for attachment, such as (but not limited to) the IBM SAN24B 24-port 8 Gbps Fibre Channel (FC) switches (2498-B24)
One or two XIV storage systems; existing XIVs can be used, as long as they meet the minimum required XIV microcode level
The diagram in Figure 2-16 shows a high-level representation of a single storage pod connecting to XIV storage.

Figure 2-16 SONAS storage pod attachment to XIV storage


2.6.3 SONAS base rack configuration when used with XIV storage
Figure 2-17 shows the maximum configuration of the SONAS Base Rack (2851-RXA) when ordered with specify code #9006 (indicating configuration #3) and the SONAS i-RPQ number #8S1101. Note that to mitigate tipping concerns, the SONAS interface nodes will be moved to the bottom of the rack. Also notice that components that are not part of the SONAS appliance (including SAN switches) cannot be placed in the empty slots.

Figure 2-17 Maximum configuration for SONAS base rack for attaching XIV storage

2.6.4 SONAS with XIV configuration and component considerations


The SONAS with XIV offering supports one or two XIV systems, as well as a pair of XIV supported SAN switches. The XIV systems can be existing XIVs or new XIV systems. This is in addition to the SONAS base rack order shown before in Figure 2-17. When all of these components have been delivered to your site, the IBM Customer Engineer service representative will connect, power up, and perform initial configuration on the SONAS and XIV components. SAN switches must follow the normal setup rules for that SAN switch.


This specialized SONAS configuration is available for original order plant manufacture only; it is not available as a field MES. The SAN switches must be mounted in a customer-supplied rack and cannot be mounted in the SONAS rack. The open slots in the SONAS base rack are covered with filler panels for aesthetic and airflow reasons. Components that are not part of the SONAS (including SAN switches) cannot be placed in the empty slots.

One or two XIV storage systems can be attached. Any model of XIV storage can be used, and all available XIV configurations starting from six modules are supported. The firmware code level on the XIV must be version 10.2 or higher. Larger SAN switches, such as the SAN40B, can also be used; any switch on the XIV supported switch list can be used, provided there are sufficient open, active ports with SFPs to support the required connectivity. Connectivity for external block device users sharing the XIVs is beyond the scope of this specialized offering, and must be planned and set up by the end user so as not to interfere with the connectivity requirements for the IBM SONAS and XIV.

The following SONAS file system settings have been tested and are intended to be used:
256 KB block size
Scatter block allocation
One failure group, if only one XIV is present
Two failure groups (supports the metadata replication requirement), if two XIVs are present
Metadata replication: if two XIVs are present, the SONAS file system metadata is replicated across the XIV systems

It is supported to add more storage to the XIV system, provision LUNs on that additional storage, and have the LUNs recognized by the SONAS system and made available to be added to an existing file system or a new file system in the SONAS system. It is supported to share the XIV systems between SONAS and other block storage applications, provided that the LUNs allocated to SONAS are hard allocated (no thin provisioning).

While the current specialized offering supports only one storage pod with one pair of storage nodes, and only one or two XIV systems, this is not an architectural limitation; it is only a testing and support limitation. IBM requires the use of IBM Services to install this specialized configuration, which assures that the proper settings and configuration are done on both the XIV and the SONAS appliance for this offering. The intent of this specialized SONAS configuration offering is to allow existing or aspiring users of the IBM XIV Storage System to attach XIV to an IBM SONAS appliance.


Chapter 3.

Software architecture
This chapter provides a description of the software architecture, operational characteristics, and components of the IBM Scale Out Network Attached Storage (IBM SONAS) software. We review the design and concepts of the SONAS Software licensed program product that operates the SONAS parallel clustered architecture. We present an overview of the SONAS Software functionality stack, the file access protocols, the SONAS Cluster Manager, the parallel file system, central policy engine, scan engine, automatic tiered storage, workload allocation, availability, administration, snapshots, asynchronous replication, and system management services. This is an excellent chapter in which to gain an overview of all of these SONAS Software concepts, and provide the base knowledge for further detailed discussions of these topics in subsequent chapters.


3.1 SONAS Software


The functionality of IBM SONAS is provided by IBM SONAS Software (5639-SN1). Each node of the IBM SONAS appliance is licensed and pre-installed with one copy of SONAS Software. According to the role of the node (interface node, storage node, or management node), the appropriate functions are called upon out of the common software code load. Each node and each copy of SONAS Software operate together in a parallel grid cluster architecture, working in parallel to provide the functions of IBM SONAS. SONAS Software provides multiple elements and integrated components that work together in a coordinated manner to provide the functions shown in the diagram in Figure 3-1. In this chapter, we give you an overview of each of these components.

Figure 3-1 SONAS Software functional components. The figure shows the SONAS Software stack: the CIFS, NFS, FTP, and HTTPS (and future) file access protocols on top of the SONAS Cluster Manager; the parallel file system with its policy engine and scan engine; HSM and ILM, backup and restore, and snapshots and replication; monitoring agents, GUI/CLI management interfaces, and security; all running on Enterprise Linux on IBM servers.

This chapter describes the following SONAS components:
SONAS data access layer: the CIFS, NFS, FTP, and HTTPS file protocols
SONAS Cluster Manager for workload allocation and high availability
SONAS authentication and authorization
SONAS data repository layer: the parallel clustered file system
SONAS data management services: automated data placement and management, that is, Information Lifecycle Management (ILM) and Hierarchical Storage Management (HSM); backup and restore data protection and HSM, using integration with Tivoli Storage Manager as discussed in SONAS and Tivoli Storage Manager integration on page 119; snapshots for local data resiliency; and remote async replication for remote recovery
SONAS system management services: GUI, Health Center, CLI, and management interfaces
Monitoring agents, security, and access control lists

We review the functions of each of the SONAS Software components as shown in Figure 3-1, starting at the top and working our way down.

3.2 SONAS data access layer: File access protocols


We begin by examining the SONAS data access layer, and the file access protocols that are currently supported, as shown in Figure 3-2.
Figure 3-2 SONAS Software: file access protocols. The figure highlights the data access layer of the SONAS Software stack, where network storage users reach the CIFS, NFS, FTP, and HTTPS protocols over Ethernet, on top of the SONAS Cluster Manager and the parallel file system.

The network file access protocols that are supported by SONAS today are CIFS, NFS, FTP, and HTTPS. These file access protocols provide the mapping of the client file requests onto the SONAS parallel file system: the file requests are translated from the network file access protocol to the SONAS native file system protocol. The SONAS Cluster Manager provides cross-node and cross-protocol locking services for the file serving functions in CIFS, NFS, FTP, and HTTPS. The CIFS file serving function maps CIFS semantics and security onto the POSIX-based parallel file system with native NFSv4 Access Control Lists.


Following this section, we discuss the role the SONAS Cluster Manager plays in concurrent access to a file from multiple platforms (for example, accessing a file concurrently from both CIFS and NFS). For additional information about creating exports for file sharing protocols, see Creating and managing exports on page 378.

3.2.1 File export protocols: CIFS


SONAS CIFS support has been explicitly tested for data access from clients running Microsoft Windows (2000, XP, Vista 32-bit, Vista 64-bit, Windows 2008 Server, and Windows 7), Linux with smbclient, and Mac OS X 10.5. The base SONAS file system is a fully POSIX-compliant, UNIX/Linux-style file system. SONAS communicates with CIFS clients and Microsoft Windows clients by emulating CIFS Windows file system behavior over this POSIX-compliant SONAS file system. For Windows clients, the SONAS system maps UNIX/Linux Access Control Lists (ACLs) to Windows security semantics. A multitude of appropriate file access concurrency and cross-platform mapping functions are done by the SONAS Software, especially in the SONAS Cluster Manager. This support includes the following capabilities, which allow Windows users to interact transparently with the SONAS file system:
The full CIFS data access and transfer capabilities are supported with normal locking semantics.
User authentication is provided through Microsoft Active Directory or through LDAP.
NTFS Access Control Lists (ACLs) are enforced on files and directories; they can be modified using the standard Windows tools.
Semi-transparent failover is provided, if the CIFS application supports network retry (see 3.3.3, Principles of interface node failover and failback on page 83, for details).
Consistent central Access Control List (ACL) enforcement applies across all platforms. ACLs are enforced on files and directories, and they can be modified (with proper authority and ownership) using the standard Windows tools.
The win32 share modes for opening and creating files are supported.
Case-insensitive file lookup is supported.
DOS attributes on files and directories are supported: the Archive bit, ReadOnly bit, System bit, and other semantics not requiring POSIX file attributes.
MS-DOS / 16-bit Windows short file names are supported, including generation of 8.3 character file names.
Notification of changes to file semantics is sent to all clients in session with the file.
Consistent locking across platforms is provided by supporting mandatory locking mechanisms and strict locking.
Opportunistic locks and leases are supported, including lease management for enabling client-side caching.
Offline or destaged file support: Windows files that have been destaged to external tape storage, using the SONAS HSM function through Tivoli Storage Manager, are displayed as offline within the Windows Explorer because they are marked with an hourglass symbol, the offline bit. Users and applications can see in advance that a file is offline.


Recall to disk is transparent to the application, so no additional operation besides opening the file is needed. Directory browsing using the Windows Explorer supports file property display without the need to recall offline or migrated files. SONAS Snapshots are integrated into the Windows Explorer VSS (Volume Shadow Copy Service) interface, allowing users with proper authority to recall files from SONAS Snapshots; this file version history support is for versions created by SONAS Snapshots. The standard CIFS timestamps are made available:
Created timestamp: The time when the file was created in the current directory. When the file is copied to a new directory, a new value is set.
Modified timestamp: The time when the file was last modified. When the file is copied elsewhere, the same value is carried over to the new directory.
Accessed timestamp: The time when the file was last accessed. This value is set by the application program that sets or revises it (this is application dependent; unfortunately, various applications do not revise this value).

3.2.2 File export protocols: NFS


NFSv2 and NFSv3 are supported by SONAS, and any standard NFS client is supported. NFSv4 is not currently supported by SONAS; this is a known requirement. The following characteristics apply to NFS exports:
Normal NFS data access functions are supported, with NFS consistency guarantees.
Authorization and ACLs are supported.
Client machine authorization through NFS host lists is supported.
Enforcement of Access Control Lists (ACLs) is supported.
Reading and writing of the standard NFSv3 / POSIX bits are supported.
The NFSv3 advisory locking mechanism is supported.
Semi-transparent node failover is provided (the application must support network retry; see 3.3.3, Principles of interface node failover and failback on page 83, for details).
Note that the SONAS Software file system implements NFSv4 Access Control Lists (ACLs) for security, regardless of the actual network storage protocol used. This provides the strength of the NFSv4 ACLs even to clients that access SONAS by the NFSv2, NFSv3, CIFS, FTP, and HTTPS protocols. Do not mount the same NFS share on one client from two SONAS interface nodes, because data corruption might occur. Also, do not mount the same export twice on the same client.

3.2.3 File export protocols: FTP


SONAS provides FTP support from any program supporting the FTP protocol. The following characteristics apply:
Data transfer to and from any standard FTP client is supported.
User authentication through Microsoft Active Directory and through LDAP is supported.
Enforcement of Access Control Lists (ACLs) and retrieval of POSIX attributes are supported. ACLs cannot be modified using FTP, because there is no support for the chmod command.
On node failover, SONAS supports FTP resume for applications that support network retry.
Characters for file names and directory names are UTF-8 encoded.
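To illustrate the FTP resume behavior described above, the following Python sketch downloads a file with the standard library ftplib and restarts the transfer at the size of any partial local copy. The host name, credentials, and file names are hypothetical placeholders for illustration only.

import os
from ftplib import FTP

HOST, USER, PASSWORD = "sonas.example.com", "user01", "secret"   # hypothetical values
REMOTE_FILE, LOCAL_FILE = "reports/data.csv", "data.csv"         # hypothetical values

def download_with_resume():
    """Download REMOTE_FILE, resuming from a partial local copy if one exists."""
    offset = os.path.getsize(LOCAL_FILE) if os.path.exists(LOCAL_FILE) else 0
    with FTP(HOST) as ftp, open(LOCAL_FILE, "ab") as out:
        ftp.login(USER, PASSWORD)
        # The rest argument asks the server to restart the transfer at the offset.
        ftp.retrbinary("RETR " + REMOTE_FILE, out.write, rest=offset or None)

if __name__ == "__main__":
    download_with_resume()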


3.2.4 File export protocols: HTTPS


SONAS supports simple read-only data access to files through the HTTPS protocol from any web browser. All web downloads from SONAS are by HTTPS; if you try to connect to SONAS by HTTP, you are automatically redirected to an HTTPS connection. The reason for this design is security, because SONAS must provide a secure logon mechanism for access authorization. The following features are supported through this protocol:
Read access to appropriately formatted files.
Enforcement of Access Control Lists (ACLs); ACLs cannot be modified or viewed using this protocol.
User authentication through Microsoft Active Directory or LDAP.
If a node fails during a file transfer, the transfer is cancelled and must be retried at another node. Partial retrieve is supported, minimizing duplicate transfers in a failover situation.
Characters for file names and directory names are UTF-8 encoded.
The Apache daemon provides HTTPS access to the SONAS file system. SONAS supports secure access only, so the credentials are always SSL encrypted. SONAS uses HTTP aliases as the vehicle to emulate the share concept; for example, share XYZ is accessible as https://server.domain/XYZ. Note that Web-based Distributed Authoring and Versioning (WebDAV) and the Representational State Transfer (REST) API are not supported at this time in SONAS; they are known requirements.
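The following Python sketch illustrates the partial retrieve capability by requesting only the remainder of a file over HTTPS with a standard Range header. The host name, share name, credentials, and the use of HTTP basic authentication are assumptions made for illustration; the actual logon mechanism depends on how authentication is configured in your environment.

import os
import urllib.request

URL = "https://sonas.example.com/XYZ/bigfile.bin"   # hypothetical share URL
LOCAL = "bigfile.bin"

# Basic authentication is assumed here purely for illustration.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, URL, "user01", "secret")
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))

# Resume from whatever was already downloaded before a node failover.
offset = os.path.getsize(LOCAL) if os.path.exists(LOCAL) else 0
request = urllib.request.Request(URL, headers={"Range": "bytes=%d-" % offset})
with opener.open(request) as response, open(LOCAL, "ab") as out:
    out.write(response.read())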

3.2.5 SONAS locks and oplocks


POSIX byte-range locks set by NFS clients are stored in the SONAS file system, and Windows clients accessing the cluster using CIFS honor these POSIX locks. The mapping of CIFS locks to POSIX locks is updated dynamically on each locking change. Unless the applications specifically know how to handle byte-range locks on a file, or are architected for multiple concurrent writes, concurrent writes to a single file are not desirable in any operating system. To maintain data integrity, locks are used to guarantee that only one process can write to a file, or to a byte range in a file, at a time. Although operating systems and file systems traditionally locked the entire file, newer ones such as SONAS support locking a range of bytes within a file. Byte-range locking is supported for both CIFS and NFS, but it does require the application to know how to exploit this capability. Byte-range locks are handled by the SONAS parallel file system. If another process attempts to write to a file, or a section of one, that is already locked, it receives an error from the operating system and waits until the lock is released. SONAS Software supports the standard DOS and NT file system (deny-mode) locking requests, which allow only one process to write to an entire file on a server at a given time, as well as byte-range locking. In addition, SONAS Software supports the Windows locking known as opportunistic locking, or oplock. CIFS byte-range locks set by Windows clients are stored both in the SONAS interface node cluster-wide database and, by mapping, as POSIX byte-range locks in the SONAS file system. This mapping ensures that NFS clients see relevant CIFS locks as POSIX advisory locks, and NFS clients honor these locks.
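As a hedged illustration of the POSIX byte-range (advisory) locking that an NFS client can use, the following Python sketch locks only the first 4 KB of a file on a mounted export. The mount point and file name are assumptions for illustration, and because the lock is advisory, cooperating applications must also use locking for it to be effective.

import fcntl
import os

PATH = "/mnt/sonas/shared/data.bin"   # hypothetical path on an NFS-mounted SONAS export

fd = os.open(PATH, os.O_RDWR | os.O_CREAT, 0o644)
try:
    # Exclusive advisory lock on bytes 0..4095 only; other ranges stay unlocked.
    fcntl.lockf(fd, fcntl.LOCK_EX, 4096, 0, os.SEEK_SET)
    os.write(fd, b"update within the locked byte range")
finally:
    fcntl.lockf(fd, fcntl.LOCK_UN, 4096, 0, os.SEEK_SET)
    os.close(fd)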


3.3 SONAS Cluster Manager


Next, we examine the SONAS Cluster Manager, as shown in Figure 3-3. The SONAS cluster manager is a core SONAS component that coordinates and orchestrates SONAS functions.

Figure 3-3 SONAS Cluster Manager. The figure highlights the SONAS Cluster Manager layer of the SONAS Software stack, sitting between the CIFS, NFS, FTP, and HTTPS file access protocols and the parallel file system with its policy and scan engines.

The SONAS Cluster Manager has the following responsibilities:
1. Coordinates the mapping of the various file sharing protocols onto the SONAS parallel file system. The CIFS file serving function maps CIFS semantics and security onto the POSIX-based parallel file system and NFSv4 Access Control Lists.
2. Provides the clustered implementation and management of the interface nodes, including tracking and distributing record updates across the interface nodes in the cluster.
3. Controls the interface nodes in the cluster. The SONAS Cluster Manager controls the public IP addresses used to publish the NAS services, and moves them as necessary between nodes. Through monitoring scripts, the SONAS Cluster Manager monitors and determines the health state of each individual interface node. If an interface node has problems, such as hardware failures or software failures (for example, broken services or network links), or otherwise becomes unhealthy, the SONAS Cluster Manager dynamically migrates the affected public IP addresses and in-flight workloads to healthy interface nodes, and uses tickle-ACK technology with the affected user clients so that they reestablish their connections to the new interface node.
4. Provides the interface to manage cluster IP addresses, add and remove nodes, and ban and disable nodes.


5. Coordinates advanced functions, such as the byte-range locking available in the SONAS parallel file system. The SONAS Cluster Manager manages the interface nodes and coordinates the multiple file sharing protocols to work in conjunction with the SONAS parallel file system base technology, allowing concurrent, parallel read and write access for multiple protocols and multiple platforms across multiple SONAS interface nodes. It is the key to guaranteeing full data integrity for all files, anywhere within the file system.

For information about how to administer the SONAS Cluster Manager, see Cluster management on page 351.

3.3.1 Introduction to the SONAS Cluster Manager


The SONAS global Cluster Manager provides workload allocation and high availability. SONAS provides high availability through a sophisticated implementation of a global cluster of active-active peer nodes. Each SONAS node is a peer to all other SONAS nodes and is in an active-active relationship with them, and incoming workload can be evenly distributed among all SONAS nodes of the same type. If a node fails, the SONAS Software automatically fails over the workload to another healthy node of the same type. We discuss the following topics:
The operational characteristics and types of SONAS nodes in the global cluster
Clustered node failover and failback for both CIFS and NFS
Dynamic insertion and deletion of nodes in the cluster
Let us start by reviewing the SONAS architecture, as shown in Figure 3-4.

Figure 3-4 SONAS architecture. The figure shows HTTP, NFS, CIFS, FTP, and other clients reaching the global namespace through the IP network; a management node and multiple interface nodes connected over the internal IP management network and the InfiniBand data network; and multiple storage pods, each with two storage nodes, storage controllers with disk, and storage expansion units, with tape attachment for external storage.

There are three types of nodes in a SONAS. The nodes are divided and configured according to one of three roles. All nodes are in a global cluster, and a copy of SONAS Software runs on each of the nodes.

A node performs only one of the three roles:

Interface node: Provides the connections to the customer IP network for file serving. These nodes establish and maintain the connections to CIFS, NFS, FTP, or HTTP users, and serve the file requests. All four of these protocols can and do coexist on the same interface node. Each interface node can talk to any of the storage nodes.

Storage node: Acts as a storage server, reading and writing data to and from the actual storage controllers and disks. Each storage node can talk to any of the interface nodes, and serves file and data requests from any requesting interface node. SONAS Software writes data in a wide stripe across multiple disk drives in a logical storage pool. If the logical storage pool is configured to span multiple storage nodes and storage pods, the data striping also spans storage nodes and storage pods.

Management node: Monitors and manages the SONAS global cluster of nodes, and provides the command-line interface (CLI) and GUI for administration. CLI commands come into the SONAS through the management node.

Notice that SONAS is a two-tier architecture: there are multiple clustered interface nodes in the interface tier and multiple clustered storage nodes in the storage tier. This is an important aspect of the design, because it allows interface nodes (user file serving throughput) to be scaled independently of storage pods and storage nodes (storage capacity and performance). Each SONAS node is an IBM System x commercial enterprise class 2U server, and each node runs a copy of the IBM SONAS Software licensed program product (5639-SN1). SONAS Software manages the global cluster of nodes, provides clustered auto-failover, and provides the following functions:

The IBM SONAS Software manages and coordinates each of these nodes running in a peer-to-peer global cluster, sharing workload equitably, striping data, running the central policy engine, and performing automated tiered storage.

The cluster of SONAS nodes is an all-active clustered design, based upon proven technology derived from the IBM General Parallel File System (GPFS). All interface nodes are active, serving file requests from the network and passing them to the appropriate storage nodes; any interface node can talk to any storage node. All storage nodes are active, serving file and data requests from any and all interface nodes; any storage node can respond to a request from any interface node. SONAS Software stripes data across disks, storage RAID controllers, and storage pods.

SONAS Software also coordinates automatic node failover and failback if necessary. From a maintenance or failover and failback standpoint, any node can be dynamically deleted from or inserted into the global cluster. Upgrades or maintenance can be performed by taking a node out of the cluster, upgrading it if necessary, and re-inserting it into the cluster. This is a normal mode of operation for SONAS, and it is the manner in which rolling upgrades of software and firmware are performed.

SONAS Software is designed with the understanding that, over time, various generations and speeds of System x servers will be used in the global SONAS cluster. SONAS Software understands this and is able to distribute workload equitably among various speeds of interface nodes and storage nodes within the cluster.

3.3.2 Principles of SONAS workload allocation to interface nodes


In this section we discuss how workload is allocated and distributed among the multiple interface nodes, and the role played within that by the SONAS Cluster Manager.


In order to cluster SONAS interface nodes so that they can serve the same data, the interface nodes must coordinate their locking and recovery. This coordination is done through the SONAS Cluster Manager, whose role is to manage all aspects of the SONAS interface nodes in the cluster.

Clusters usually cannot outperform a standalone server for a single client, due to cluster overhead. At the same time, clusters can outperform standalone servers in aggregate throughput to many clients, and clusters can provide superior high availability. SONAS is a hybrid design that provides the best of both approaches. From an incoming workload allocation standpoint, SONAS uses the Domain Name Server (DNS) to perform round-robin IP address balancing, spreading workload equitably on an IP address basis across the interface nodes, as shown in Figure 3-5.

(Figure: Client I, Client II, through Client n resolve the host name SONAS.virtual.com through a DNS server, which performs name resolution by rotating through the interface node IP addresses 10.0.0.10 to 10.0.0.15.)
Figure 3-5 SONAS interface node workload allocation
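To make the round-robin behavior concrete, the following minimal Python sketch simulates how an external DNS server might rotate through the interface node IP addresses, so that each client session lands on one interface node while the client population as a whole is spread across all of them. The host name and addresses are taken from Figure 3-5; the code is purely illustrative and is not part of SONAS or of any DNS implementation.

from itertools import cycle

# Interface node IP addresses registered in DNS for the SONAS host name
# (addresses taken from Figure 3-5; illustrative only).
interface_node_ips = ["10.0.0.10", "10.0.0.11", "10.0.0.12",
                      "10.0.0.13", "10.0.0.14", "10.0.0.15"]

rotation = cycle(interface_node_ips)   # round-robin rotation maintained by DNS

def resolve(hostname):
    """Return the next address in the rotation, as a round-robin DNS would."""
    return next(rotation)

# Each client resolves SONAS.virtual.com once and then keeps that interface
# node for the duration of its session.
sessions = {f"client{i}": resolve("SONAS.virtual.com") for i in range(1, 13)}
for client, ip in sessions.items():
    print(client, "->", ip)

With six interface nodes and twelve clients, each interface node ends up serving two client sessions, which is the equitable spreading of aggregate workload described above.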

SONAS allocates a single user network client to a single interface node, to minimize cluster overhead. SONAS Software does not rotate a single client's workload across interface nodes; doing so is not supported by DNS or CIFS, and it would also decrease performance, because caching and read-ahead are done in the SONAS interface node. For this reason, any individual client is assigned to one interface node, for the duration of its session, at the time it authenticates and accesses the SONAS. At the same time, workload from many users, potentially numbering in the thousands or more, is spread equitably across as many SONAS interface nodes as are available. If more user network capacity is required, you simply add more interface nodes; the SONAS scale out architecture thus provides linear scalability as the number of users grows.

Independently of the application and the interface nodes, SONAS Software always stripes data across disks, storage RAID controllers, and storage pods, thus providing wide data striping performance and parallelism for any file serving request, from any interface node. This is shown in Figure 3-6.


(Figure: a single client connection writes a file through one interface node, for ease of attach; at the storage level, the write is striped for parallelism and high performance, agnostic to the application.)
Figure 3-6 SONAS interface node workload allocation - parallelism at storage level

SONAS provides a single high performance NFS, CIFS, FTP, or HTTPS connection for any one individual network client. In aggregate, multiple users are IP-balanced equitably across all the interface nodes, providing scale out capability: the more interface nodes, the more user capacity is available. SONAS was designed to make the connection a standard CIFS, NFS, FTP, or HTTP connection, in order to allow attachment by as wide a range of standard clients as possible, and to avoid requiring the installation of any client-side code.

3.3.3 Principles of interface node failover and failback


In the event that the redundancy within an interface node is exhausted and the node fails, for example because of a fatal error, or if the interface node simply needs to be upgraded or maintained, interface nodes can be dynamically removed from and later re-inserted into the SONAS cluster. The normal method of upgrading or repairing an interface node is to take it out of the cluster, as described in Modifying the status of interface nodes and storage nodes on page 352. The SONAS Cluster Manager manages the failover of the workload to the remaining healthy interface nodes in the SONAS cluster. The offline interface node can then be upgraded or repaired and re-inserted into the SONAS cluster, and workload is automatically rebalanced across the interface nodes in the SONAS. The SONAS Software component that performs this interface node monitoring, failover, and failback is the SONAS Cluster Manager.

Whenever an interface node is removed from the cluster, or if there is an interface node failure, healthy interface nodes take over the load of the failed node as shown in Figure 3-7. In this case, the SONAS Cluster Manager automatically performs these actions:

- Terminates old network connections and moves them to a healthy interface node. IP addresses are automatically re-allocated to a healthy interface node.
- Uses the session and state information kept in the Cluster Manager to support re-establishment of the session, maintaining IP addresses, ports, and so on.


This state and session information, and the metadata for each user and connection, is stored in memory in each node in a high performance clustered design, along with appropriate shared locking and any byte-range locking requests, as well as other information needed to maintain cross-platform coherency between CIFS, NFS, FTP, and HTTP users. Notification technologies called "tickle ACKs" are used to tickle the application and cause it to reset the network connection.

(Figure: after a failover, the DNS name SONAS.virtual.com still resolves to the same set of IP addresses, 10.0.0.10 through 10.0.0.15, but the addresses previously owned by the failed interface node have been taken over by the surviving interface nodes serving Client I, Client II, through Client n.)

Figure 3-7 SONAS interface node failover

At the time of the node failover, if the session or application is not actively transferring data over a connection, the failover can usually be transparent to the client. If the client is transferring data, the failover might or might not be transparent, depending on the protocol, the nature of the application, and what is occurring at the time of the failover. In particular, if the client application, in response to the SONAS failover and SONAS notifications, automatically retries the network connection, then it is possible that the user will not see an interruption of service. Examples of software that behaves this way include many NFS-based applications, as well as Windows applications that retry the network connection, such as the Windows XCOPY utility. If the application does not automatically retry the network connection, or if the protocol in question is stateful (as CIFS is), then a client-side reconnection might be necessary to re-establish the session; for most CIFS connections, this is the likely case. For more information about interface node cluster failover, see Chapter 6, Backup and recovery, availability, and resiliency functions on page 181.
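The difference between applications that retry the connection and those that do not can be illustrated with a small sketch. The retry loop below is a generic client-side pattern (host, port, and request are placeholders; this is not SONAS or NFS client code): a client that simply reconnects after a connection reset rides through the interface node failover, whereas a client that treats the reset as fatal surfaces the error to the user.

import socket
import time

def read_with_retry(host, port, request, retries=3, backoff=2.0):
    """Generic client-side retry pattern, similar in spirit to what NFS
    clients and utilities such as XCOPY do after a connection reset."""
    for attempt in range(1, retries + 1):
        try:
            with socket.create_connection((host, port), timeout=10) as sock:
                sock.sendall(request)
                return sock.recv(65536)
        except (ConnectionResetError, OSError):
            if attempt == retries:
                raise               # no more retries: the user sees the failure
            time.sleep(backoff)     # wait for failover; a healthy node now owns the IP

During the pause, the SONAS Cluster Manager has already moved the IP address to a healthy interface node, so the retried connection succeeds and the interruption can be invisible to the user.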

3.3.4 Principles of storage node failover and failback


We previously discussed interface node failover and failback in SONAS. A similar principle applies to storage node failover and failback, except that the SONAS Cluster Manager does not directly participate: it is the SONAS parallel file system that manages storage node failover.

In SONAS, the storage pod is the modular building block of storage, as illustrated in Figure 3-4 on page 80. Each storage pod contains between 60 and 240 disk drives, arranged in groups of 60 drives, and each storage pod contains two active-active storage nodes. The two storage nodes provide resiliency and backup for each other in the storage pod; if a storage node fails, the remaining healthy storage node in the pod takes over the load of the failed node. An individual storage node is high in capacity and throughput, to allow good operation in the event of a failed partner node.

Furthermore, as described in Chapter 1, Introduction to IBM Scale Out Network Attached Storage on page 1, SONAS can be configured to perform storage load balancing by striping data across any of the following components:

- Disks
- Storage RAID controllers
- Storage pods

Logical storage pools in SONAS can be, and usually are, defined to span disks, storage RAID controllers, and storage pods. The data striping means that files are spread in blocksize chunks across these components in order to achieve parallel performance and balanced utilization of the underlying storage hardware. One purpose of this dispersion of SONAS data is to mitigate the effect of a failed storage node: because files are spread across multiple storage nodes and storage pods, only a small portion of any file is affected by a storage node failure, and only in terms of performance; data availability is maintained. As the SONAS grows and scales out to more and more storage nodes, the failure of any one storage node becomes a smaller and smaller percentage of the overall aggregate storage node capacity. The SONAS scale out architecture thus reduces the impact of a storage node failure as the SONAS grows.

Just as with interface nodes, storage nodes can be dynamically removed from and re-inserted into the cluster. Similar to the interface node methodology, the method of upgrading or repairing a storage node is to take the storage node out of the cluster; the remaining storage node in the storage pod dynamically assumes the workload of the pod. The offline storage node can then be upgraded or repaired and re-inserted into the cluster, and workload is automatically rebalanced across the storage nodes in the storage pod. During all of these actions, the file system stays online and available, and file access for the users is maintained.

3.3.5 Summary
We have seen that the IBM SONAS provides equitable workload allocation to a global cluster of interface nodes, including high availability through clustered auto-failover. In summary:

- All SONAS nodes operate in a global cluster.
- Workload allocation to the interface nodes is done in conjunction with external Domain Name Servers.
- The global SONAS cluster offers dynamic failover and failback, and if the application supports network connection retries, can provide transparent failover of the interface nodes.
- Normal upgrade and maintenance of SONAS nodes is performed by dynamically removing nodes from, and re-inserting them into, the cluster.


We now proceed to discuss in more detail the SONAS Software components that provide the functionality to support these capabilities.

3.3.6 SONAS Cluster Manager manages multi-platform concurrent file access


One of the primary functions of the SONAS Cluster Manager is to support concurrent access to many files by concurrent users, spread across multiple network protocols and platforms. SONAS Software also supports, with proper authority, concurrent read and write access to the same file, including byte-range locking. Byte-range locking means that two users can access the same file concurrently, and each user can lock and update a subset of the file, with full integrity among updaters. All file accesses from the users to the SONAS parallel file system logically traverse the SONAS Cluster Manager, as shown in Figure 3-8. The Cluster Manager handles metadata and locking, but it does not handle data transfer; in other words, the Cluster Manager is not in-band with regard to data transfer.

(Figure: CIFS, NFS, FTP, and HTTP accesses, including concurrent access to a file, pass through the SONAS Cluster Manager to the SONAS file system, which uses NFSv4 access control lists.)

Figure 3-8 All file accesses traverse the SONAS Cluster Manager including concurrent accesses

The SONAS Cluster Manager is logically positioned in the file access path, because it is the SONAS Cluster Manager that maps the multiple protocols onto the SONAS parallel file system, simultaneously managing the locking necessary to guarantee data integrity across all the interface nodes. Finally, the SONAS Cluster Manager provides the failover and failback capabilities if an interface node enters an unhealthy or failed state. The SONAS Cluster Manager works together with the SONAS parallel file system to provide concurrent file access from multiple platforms in the following way:

- SONAS Cluster Manager: Provides the mapping and concurrency control across multiple interface nodes, and when multiple protocols access the same file, and provides locking across users across the interface nodes.
- SONAS parallel file system: Provides concurrent access control at the level of the physical file management, the ability to manage and perform parallel access, NFSv4 access control list security, and the foundational file system data integrity capabilities.

We discuss the parallel file system in more detail later. First, let us explain how the IBM SONAS Cluster Manager provides concurrent interface node file serving with data integrity across the following network protocols at the same time:

- CIFS (typically Windows users)
- NFS (typically UNIX or Linux users)
- FTP
- HTTPS

SONAS Cluster Manager functionality supports multiple exports and shares of the file system, over multiple interface nodes, by providing distributed lock, share, and lease support. The SONAS Cluster Manager is transparent to the NFS, CIFS, FTP, and HTTPS clients; these clients are unaware of, and do not need to know, that the SONAS Cluster Manager is servicing and managing these multiple protocols concurrently. When sharing files and directories, SONAS reflects changes made by one authorized user to all other users that are sharing the same files and directories. For example, if a SONAS-resident file is renamed, changed, or deleted, this change is immediately reflected to all SONAS-attached clients on other platforms, including those using other protocols, as shown in Figure 3-9.

(Figure: SONAS User1 deletes files at 01:00; by 01:01, SONAS User2 and other users sharing the same directory see that the files are deleted.)

Figure 3-9 SONAS concurrent access to a shared directory from multiple users

SONAS Software employs sophisticated distributed cluster management, metadata management, and a scalable token management system to provide data consistency while supporting concurrent file access from thousands of users. All read and write locking types are kept completely coherent between NFS and CIFS clients, globally, across the cluster. SONAS Cluster Manager provides the capability to export data from a collection of nodes using CIFS, NFSv2, NFSv3, FTP, and HTTPS.


3.3.7 Distributed metadata manager for concurrent access and locking


In order to assure data consistency, SONAS Software provides a sophisticated multi-platform interface node locking capability that works in conjunction with the token (lock) management capability in the file system, which is derived from the IBM General Parallel File System. This capability coordinates shared-everything global access from any and all interface nodes to any and all disk storage, assuring the consistency of file system data and metadata when various nodes access the same file. SONAS Software is designed to provide a flexible, multi-platform environment, as shown in Figure 3-10.
(Figure: the same logical file, for example /home/appl/data/web/writing_reading_the_file.dat, is accessed through the global namespace from Windows, UNIX/Linux, or any other platform over CIFS or NFS; the scale out tiers of interface nodes and storage nodes, together with the policy engine, map the logical namespace onto physical storage Tiers 1 through 3, with multiple concurrent readers and writers from multiple platforms.)

Figure 3-10 SONAS provides concurrent access and locking from multiple platforms

SONAS Software has multiple facilities to provide scalability. These include the distributed ability for multiple nodes to act as token managers for a single file system. SONAS Software also provides scalable metadata management through a distributed metadata management architecture, allowing all nodes of the cluster to dynamically share in performing file metadata operations while accessing the file system. This distinguishes SONAS from other clustered NAS filer architectures that have a centralized metadata server handling fixed regions of the file namespace. A centralized metadata server can become a performance bottleneck for metadata intensive operations and can represent a scalability limitation and a single point of failure. SONAS solves this problem by managing metadata at the node which is using the file or, in the case of parallel access to the file, at a dynamically selected node which is using the file.

3.3.8 SONAS Cluster Manager components


The SONAS Cluster Manager provides services to the following file serving protocols:

- NFS file serving
- CIFS file serving by the SONAS CIFS component
- Clustered CIFS provided by the Clustered Trivial Data Base (CTDB), which clusters the SONAS CIFS component and monitors interface node services, including start, failover, and failback of public IP addresses
- FTP daemon
- HTTPS daemon

In the SONAS Cluster Manager, the software used includes the following components:

- SONAS CIFS component, which provides Windows CIFS file serving, including mapping CIFS semantics, userids, security identifiers, NTFS access control lists, and other required CIFS mappings, to the underlying SONAS parallel file system
- Clustered Trivial Data Base (CTDB), which in combination with the SONAS CIFS component provides a fully clustered CIFS capability

Working together, the SONAS Cluster Manager and these components provide true multi-protocol, active/active clustering within a single global namespace, spanning multiple interface nodes, all clustered transparently to applications.

SONAS CIFS component and CTDB in the SONAS Cluster Manager


The IBM SONAS Cluster Manager uses the SONAS CIFS component to provide the CIFS file serving capability on individual interface nodes. The SONAS Cluster Manager uses the open-source Clustered Trivial Data Base (CTDB) technology to store important cluster and locking information in small databases, each called a Trivial Data Base (TDB). The local TDB files contain the messaging, the locking details for files, and the information about open files that are accessed by many clients. Each TDB has metadata information about the POSIX to CIFS semantics mapping, and vice versa, across the cluster.

CTDB addresses the fact that a SONAS CIFS component process running by itself on an interface node does not know about the locking information held by the SONAS CIFS component processes running locally on the other interface nodes. CTDB provides the functionality to coordinate the SONAS CIFS component processes that run on the various SONAS interface nodes. To keep data access and writes consistent, CTDB provides the mechanism by which the SONAS CIFS component running on each interface node can communicate with the others and effectively share the information needed for proper locking, with high performance, assuring data integrity by avoiding shared data corruption.

An example of this operation is as follows: suppose that file1 has been accessed through multiple nodes by multiple end user clients. These nodes need to know about the locks and know that each of them has accessed the same file. CTDB provides the architecture and the function for lightweight, fast, scalable intercommunication between all the nodes in the cluster, to coordinate the necessary cross-node communication, and to intelligently minimize that cross-communication to assure scalability. If any of the nodes wants to write to the file, the SONAS Cluster Manager CTDB function assures that proper file integrity is maintained. CTDB provides the services and the high performance architecture for individual SONAS interface nodes (regardless of the protocol used to access the file) to take ownership of, and transfer ownership of, individual records as necessary, to assure data integrity.


CTDB assures data integrity by tracking and assuring that only the owning interface node has the most recent copy of a record and that only the proper owning node has the ability to update the record. When required, CTDB provides the high performance, lightweight messaging framework and trivial data bases to quickly cross-notify, cross-share, and properly pass ownership among requesting interface nodes, so that records are updated with integrity and high performance.
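The record ownership idea can be pictured with the following conceptual sketch. It is an illustration of the behavior just described, written in Python, and is not the actual CTDB implementation: only the node that currently owns a record holds its latest copy and may update it, and ownership is passed on request.

class RecordOwnershipMap:
    """Conceptual sketch of CTDB-style record ownership (not CTDB code)."""

    def __init__(self):
        self.owner = {}      # record key -> interface node that owns the record
        self.records = {}    # record key -> latest record contents

    def update(self, node, key, value):
        current_owner = self.owner.get(key)
        if current_owner not in (None, node):
            # Lightweight message to the current owner: ship the latest copy
            # of the record and pass ownership before the update proceeds.
            print(f"{key}: ownership moves {current_owner} -> {node}")
        self.owner[key] = node
        self.records[key] = value    # only the (new) owner writes the record

locks = RecordOwnershipMap()
locks.update("interface_node_1", "file1.lock", {"holder": "clientA"})
locks.update("interface_node_2", "file1.lock", {"holder": "clientB"})  # ownership transfers

Because only the owning node writes the record, the other nodes never work from a stale copy, which is the data integrity property described above.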

More information about SONAS CIFS component and CTDB


CTDB is a shared-TDB approach to distributing locking state in small, fast-access files called trivial data bases (TDBs), so called because they are designed to be very lightweight in message size and very fast to access. In this approach, all cluster nodes access the same TDB files. CTDB provides the same types of functions as TDB, but in a clustered fashion, providing a TDB-style database that spans multiple physical hosts in a cluster while preserving the high speed of access and the very small, lightweight message size. CTDB technology also provides the fundamental SONAS Cluster Manager failover mechanisms that ensure data integrity is not lost if any interface node goes down while serving data.

In summary, the CTDB functionality provides important capabilities for the SONAS Cluster Manager to present a global namespace virtual file server to all users from any protocol, in which all interface nodes appear as a single virtual file server that encloses all the interface nodes. CTDB also assures that the SONAS CIFS components on each interface node are able to talk to each other in a high performance, scalable manner, and to update each other about the locking and other information held by the other SONAS CIFS components.

SONAS Cluster Manager summary


The SONAS Cluster Manager provides SONAS interface node clustering through integrating, testing, and providing enterprise class support for the SONAS Software capabilities. The SONAS Cluster Manager provides the multi-protocol, cross-platform locking and control, interface node monitoring, IP address management, and interface node failover and failback. Here we summarize the functions of the SONAS Cluster Manager:

- It provides the ability for all user clients to connect to any interface node; all interface nodes appear to the users as a single large global namespace NAS server.
- It fully supports and exploits the internal SONAS parallel file system, from which all interface nodes can serve out various files, or the same set of files, with parallel high performance.
- It provides full data integrity across all interface nodes and across all concurrent users and applications, from multiple network storage access protocols.
- Interface nodes can fail, and clients are transparently reconnected to another interface node.
- All file changes are immediately seen on all interface nodes and by all other clients accessing the same file.
- It minimizes the latency and cross-communication required of interface nodes to check for proper file and data integrity.
- It provides the ability to scale in a linear, non-disruptive fashion by simply adding more interface nodes or storage nodes.


Here are specific enhancements that IBM has made in the SONAS Cluster Manager for CIFS:

- Clustering enhancements: multiple exports and shares of the same file system over multiple nodes, including distributed lock, share, and lease support
- Failover capabilities on the server
- Integration with the NFS, FTP, and HTTPS daemons with regard to locking, failover, and authorization
- Performance optimization with the SONAS file system (with GPFS)
- NTFS Access Control List (ACL) support in the SONAS CIFS component using the native GPFS NFSv4 ACL support
- HSM support within the SONAS CIFS component to allow destaging of files to tape and user-transparent recall
- VSS integration of SONAS Snapshots

In the following sections, we examine SONAS authentication and authorization.

3.4 SONAS authentication and authorization


SONAS requires an external service to provide authentication and authorization of client users. Authentication is the process of verifying a client user's identity, typically by verifying credentials such as a user ID and password. Authorization is the process of deciding which resources a user can access; for example, a user can have full control over one directory, allowing read, write, create, delete, and execute, and no access to another directory. SONAS supports the following authentication methods:

- Microsoft Active Directory (Active Directory itself provides the Kerberos infrastructure)
- Active Directory with SFU (Services for UNIX, RFC2307 schema)
- LDAP (Lightweight Directory Access Protocol), including LDAP with MIT Kerberos
- Samba Primary Domain Controller (PDC) / NT4 mode
- Network Information Service (NIS) with NFS netgroup support, for ID mapping only

The authentication server is external to SONAS and must have proper connectivity to SONAS. The authentication server is configured externally to the SONAS, using normal authentication server skills. Note that the external authentication server is an essential component in the SONAS data access flow: if the authentication server is unavailable, then data access is not available. At the current SONAS 1.1.1 release level, a single SONAS system supports only one of the foregoing authentication methods at a time, and in order to access a SONAS, the user must be authenticated using the authentication method that is configured on that particular SONAS system. Only the SONAS interface nodes are configured for authentication by users; SONAS storage nodes are not part of this configuration. In the current SONAS release, the number of groups per user is limited to approximately 1000.


Care must be taken with time synchronization: all of the SONAS nodes must have their time set by a network time protocol (NTP) server, and the same server must synchronize the time for the authentication server, such as an Active Directory (AD) server or Kerberos KDC server. Note that an Active Directory (AD) domain controller can be used as an NTP time source. To set up a SONAS system, obtain administrative information for the selected authentication server in advance; examples of the information required are the administrative account, password, SSL certificate, and Kerberos keytab file. Refer to the Managing authentication server integration chapter in the IBM Scale Out Network Attached Storage Administrator's Guide, GA32-0713, for the information required for each authentication protocol. Additional information can also be found in Authentication using AD or LDAP on page 266.

3.4.1 SONAS authentication concepts and flow


To access files, client users must authenticate with SONAS. How authentication operates depends on the file sharing protocol that is used.

CIFS authentication concepts


For the CIFS protocol, user authentication is performed using the challenge-response method, where the challenge asks for the password and the response is the correct password. In the case of CIFS, no password is transferred over the wire; only a password hash is sent by the client. For this reason, LDAP needs a special schema to store password hashes. With Kerberos, the CIFS client can also authenticate using a valid Kerberos ticket that has been granted by a trusted authority or KDC.

HTTP/FTP/SCP authentication concepts


When using HTTP, FTP, or SCP, user authentication is done by transferring the password to the protocol server; in the case of HTTP and SCP, the password is encrypted. The Linux Pluggable Authentication Module (PAM) system forwards the authentication request to the configured authentication system. These protocols do not support the use of Kerberos tickets.

NFS authentication concepts


In the case of NFS, authentication is performed only by host name and IP address; there is no user authentication concept. Authorization is based on UNIX user IDs (uids) and group IDs (gids). The NFS client sends the uid/gid of the current user to the NFS server inside SONAS. To guarantee consistent authorization, you must ensure that the client has the same ID mapping, that is, the same uid/gid mappings, as the NFS server in SONAS. How to do this is explained in SONAS authentication methods on page 93. With Kerberos, the NFS client can also authenticate using a valid Kerberos ticket that has been granted by a trusted authority or KDC.


The SONAS authentication of users occurs according to the diagram shown in Figure 3-11.

(Figure: clients without Kerberos send (1) a user authentication request to SONAS, which (2) verifies the request with the authentication server, receives (3) the response, and (4) replies to the client; clients with Kerberos (1) send their request to the authentication server with its Kerberos KDC, (2) are granted a Kerberos ticket, then (3) present the ticket to SONAS and (4) receive the response.)

Figure 3-11 SONAS authentication flow

Clients without Kerberos (1) send a user authentication request to SONAS, which (2) sends the authentication request to the external authentication server. The authentication server then (3) sends a response to SONAS, and SONAS (4) sends the response back to the client. In the case of Kerberos, the client (1) sends a user authentication request directly to the authentication server, which also hosts a Kerberos Key Distribution Center (KDC). The authentication server then (2) replies with a Kerberos ticket for the client. The client then (3) sends a request to SONAS with the granted Kerberos ticket, and SONAS (4) sends the response back to the client. Kerberos tickets have a lease time before expiring, so a client can access SONAS multiple times without re-authenticating with the KDC.

3.4.2 SONAS authentication methods


Here we discuss various considerations regarding the SONAS authentication process.

SONAS ID mapping
SONAS Software is designed to support multiple platforms and protocols, all accessing the SONAS concurrently. However, Windows CIFS systems use Security Identifiers (SIDs) internally to identify users and groups, whereas UNIX systems use a 32-bit user ID / group ID (uid/gid). To make both worlds work together in SONAS and provide full concurrent and consistent access from both platforms, SONAS performs a user mapping between the Windows SID and the UNIX uid/gid. Because the underlying SONAS data is stored in a POSIX-compliant, UNIX and Linux style file system based on IBM GPFS, all Access Control List (ACL) and access information is ultimately controlled using the uid/gid, which is the standard way of controlling user access in UNIX based environments. Therefore, when accessing SONAS data from UNIX or Linux systems using the NFS protocol, there are no issues, because their uid/gid directly maps to the UNIX system uid/gid.


However, when Windows clients access SONAS, the SONAS Software provides the mapping between the Windows user Security identifier (SID) and the internal file system UID to identify users. In SONAS, depending on the type of authentication used, various methods are applied to solve this UID to SID mapping requirement. The SONAS user ID mapping flow is shown in Figure 3-12.
(Figure: for CIFS, a Windows AD user or group name and its Microsoft Security ID (SID) are mapped through a shared ID map database to the Linux UID/GID used by the SONAS file system (GPFS); for NFS, the UID/GID is provided at the client level only, so the mapping must happen on the NFS client, either by creating users with the correct IDs manually or by using Microsoft AD Services for UNIX (SFU). SONAS maps user names and groups to UNIX user/group IDs consistently across all nodes.)

Figure 3-12 SONAS authentication - userid mapping

To solve the ID mapping issue, SONAS supports multiple authentication server integrations:

- LDAP, and LDAP with MIT Kerberos
- Samba primary domain controller (PDC) for Microsoft Windows NT version 4 (NT4)
- Active Directory Server (AD itself acts as the Kerberos infrastructure), and AD with Microsoft Windows Services for UNIX (SFU)
- Network Information Service (NIS) as an extension to AD or Samba PDC

SONAS Active Directory authentication


In the case of Active Directory (AD), SONAS generates a uid for each SID using auto-increment logic. This means that when a new user accesses SONAS, SONAS creates a uid at runtime and stores it in the SONAS Cluster Manager Trivial Data Base (TDB) that we discussed earlier in this chapter. Therefore, when using AD as authentication for SONAS, be aware of this behavior and plan to create the user on the UNIX/Linux machine with a uid that matches the uid created on SONAS.
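The auto-increment behavior can be modeled with a few lines of Python. This is a conceptual model of the behavior just described, not the actual SONAS implementation, and the starting uid value is an invented example:

class AutoIncrementIdMap:
    """Conceptual model of SID -> uid allocation by auto-increment."""

    def __init__(self, first_uid=10000000):     # starting value is illustrative
        self.next_uid = first_uid
        self.sid_to_uid = {}                    # persisted in a shared TDB in SONAS

    def uid_for_sid(self, sid):
        if sid not in self.sid_to_uid:
            self.sid_to_uid[sid] = self.next_uid    # allocate on first access
            self.next_uid += 1
        return self.sid_to_uid[sid]

idmap = AutoIncrementIdMap()
uid1 = idmap.uid_for_sid("S-1-5-21-1111111111-2222222222-3333333333-1105")
uid2 = idmap.uid_for_sid("S-1-5-21-1111111111-2222222222-3333333333-1105")
assert uid1 == uid2    # the same Windows user always resolves to the same uid

The important point is that the uid is generated by SONAS, not by the directory, which is why a matching user must be created manually on any UNIX/Linux NFS client that needs consistent access to the same files.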

SONAS Active Directory authentication with SFU


When using AD with Services for UNIX (SFU), SONAS uses AD SFU to read the SID to uid mapping. In this case, when users access SONAS, the SONAS Software fetches the SID to uid mapping from SFU. The uid/gid is stored in a dedicated field of the user/group object on the AD server; this requires the SFU schema extension or Windows Server 2003 R2.


SONAS with LDAP authentication


For SONAS authentication using a UNIX/Linux Lightweight Directory Access Protocol (LDAP) server, the SID to uid mapping is kept in the LDAP server itself: the uid/gid is stored in a dedicated field of the user/group object on the LDAP server. This is typically a very straightforward authentication environment, with little or no issue on ID mapping in LDAP.
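As an illustration of why the directory-based mapping is straightforward, the sketch below looks up the RFC2307 uidNumber and gidNumber attributes for a user. It assumes the third-party Python ldap3 package, and the server address, bind credentials, and directory layout are placeholders; it is not SONAS code, only a demonstration of where the uid/gid lives in the directory.

from ldap3 import Server, Connection, ALL

# Placeholder server, bind credentials, and directory base (assumptions).
server = Server("ldap://ldap.example.com", get_info=ALL)
conn = Connection(server, "cn=reader,dc=example,dc=com", "secret", auto_bind=True)

# RFC2307 attributes hold the UNIX identity alongside the directory entry.
conn.search("ou=people,dc=example,dc=com",
            "(uid=jdoe)",
            attributes=["uidNumber", "gidNumber"])

for entry in conn.entries:
    print("uid:", entry.uidNumber.value, "gid:", entry.gidNumber.value)

Because the uid/gid is stored centrally in the directory, SONAS and the NFS clients that consult the same directory see the same mapping, which is what makes this environment straightforward.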

SONAS NIS authentication extension for ID mapping


NIS is used in UNIX based environments for centralized management of users and other services. NIS keeps user, domain, and netgroup information. Using NIS for user and hostname management ensures that all machines have the same user information, which is useful when NAS data stores such as SONAS are accessed through the NFS protocol. The netgroup construct names a group of client machine IPs or hostnames; the netgroup name is then specified when creating NFS exports, instead of individual client machines. NIS is also used for user authentication for services such as ssh, ftp, and http.

NIS: SONAS does not support NIS as an authentication mechanism; NIS is used exclusively for netgroup support and ID mapping.

SONAS supports the following three modes of NIS configuration:

- Plain NIS without any authentication, for netgroup support only: In this case only the NFS protocol is supported, and the other protocols are disabled. This mode is used only at sites where there is only NFS client access without any authentication; any previously configured authentication is removed. SONAS uses the NIS default domain to resolve netgroups, even though multiple NIS domains are supported.
- NIS for netgroup support only, with Active Directory for authentication and AD auto-increment ID mapping logic: This mode is used by customers needing netgroup support for NFS clients while the other protocols use AD. It is an extension to the existing AD authentication: all protocols use AD for authentication, and ID mapping is done using auto-increment logic. SONAS must be configured with AD using cfgad and then configured for NIS using cfgnis.
- NIS for ID mapping as an extension to Active Directory, plus netgroup support: This configuration is used when you have both Windows and UNIX systems and you want to keep a known mapping of UNIX users to Windows users. SONAS must be configured with AD using cfgad, and then cfgnis is run to configure NIS. In this mode, NIS becomes an extension to the existing AD authentication: all protocols use AD for authentication, and ID mapping is done using NIS. For a user accessing SONAS, the SID to uid mapping is done by the NIS ID mapping logic with the help of domain map and user map rules.

Samba PDC is also supported with NIS; in the previous discussion, what is valid for AD is also valid for Samba PDC.


3.5 Data repository layer: SONAS file system


In this section, we describe in more detail the internal architecture of the SONAS file system, which is based upon the IBM General Parallel File System (GPFS). The parallel file system, which includes the central policy engine and the high performance scan engine, is at the heart of SONAS Software functionality, as illustrated in Figure 3-13.

(Figure: the SONAS Software stack; CIFS, NFS, FTP, HTTPS, and future protocols sit on top of the SONAS Cluster Manager, which works with HSM and ILM, backup and restore, snapshots and replication, monitoring agents, GUI/CLI management interfaces, and security, all built on the parallel file system with its policy engine and scan engine, running on Enterprise Linux on IBM servers.)

Figure 3-13 SONAS Software - parallel file system, policy engine, scan engine

We discuss core SONAS file system concepts, including the high-performance file system itself, the manner in which the policy engine and scan engine provide the foundation for SONAS Information Lifecycle Management (ILM) (discussed in detail in SONAS data management services on page 107), and the characteristics of the SONAS file system for configuration, performance, scalability, and storage management. As mentioned, the SONAS file system is based upon IBM GPFS, so if you are familiar with GPFS, you will be quite familiar with the concepts discussed in this section.

The SONAS file system offers more than a traditional file system; it is the core foundation for an end-to-end NAS file management infrastructure within SONAS. IBM utilizes GPFS technology to provide a proven high performance parallel grid file system architecture, with high reliability and high scalability. In addition to providing file storage capabilities, the SONAS file system also provides storage management and information lifecycle management tools, centralized administration, and facilities that, in conjunction with the SONAS Cluster Manager, allow shared high performance access from multiple NAS protocols simultaneously. IBM SONAS was designed to build on GPFS's long history as a high performance parallel file system, supporting many types of applications ranging from relational databases, to digital media, to high performance analytics, to scalable file serving.


The core GPFS technology is installed today across many industries, including financial, retail, and government applications. GPFS has been tested in very demanding, large environments for over 15 years, making it a solid foundation for use within SONAS as the central parallel file system. For more detailed information about configuring SONAS file systems, see File system management on page 354. We now discuss the SONAS file system in greater detail.

3.5.1 SONAS file system scalability and maximum sizes


The SONAS maximum file system size for standard support is 2 PB; larger file systems are possible by submitting a request to IBM for support. The SONAS file system is based upon IBM GPFS technology, which today runs on many of the world's largest supercomputers; the largest existing GPFS configurations run into the tens of thousands of nodes. IBM GPFS has been available on IBM AIX since 1998 and on Linux since 2001, and it has been field proven time and again on the world's most powerful supercomputers to provide efficient use of disk bandwidth. It is this technology that is packaged in a Scale Out NAS form factor, manageable with standard NAS administrator skills.

SONAS exploits the fact that GPFS was designed from the beginning to support extremely large, extremely challenging high performance computing environments. Today, SONAS uses that technology to build a single global namespace and a single file system over the entire 14.4 PB current maximum size of a physical SONAS system. The theoretical limits of the SONAS file system are shown in Table 3-1. The currently supported maximum number of files per file system is 2^31 - 1, approximately 2 billion.
Table 3-1 SONAS file system theoretical limits

  Attribute                                           Theoretical limit
  Maximum SONAS capacity                              134217728 Yobibytes (2^107 Bytes)
  Maximum size of a single shared file system         524288 Yobibytes (2^99 Bytes)
  Maximum number of file systems within one cluster   256
  Maximum size of a single file                       16 Exbibytes (2^64 Bytes)
  Maximum number of files per file system             2.8 quadrillion (2^48)
  Maximum number of snapshots per file system         256
  Maximum number of subdirectories per directory      2^16 (65536)

The SONAS cluster can contain up to 256 mounted file systems. There is no limit placed upon the number of simultaneously opened files within a single file system.


3.5.2 Introduction to SONAS file system parallel clustered architecture


The SONAS file system is built upon a collection of disks that contain the file system data and metadata. A file system can be built from a single disk, or it can contain thousands of disks storing petabytes of data. SONAS implements its file system upon a parallel grid architecture, in which every node runs a copy of SONAS Software and thus has a copy of the SONAS parallel file system code running on it. SONAS implements a two-tier global cluster, with SONAS interface nodes as the upper tier of network file serving nodes and SONAS storage nodes as the lower tier. On the interface nodes, the SONAS file system code acts as the file system storage requester, a Network Shared Disk (NSD) client in GPFS terminology. On the storage nodes, the SONAS file system code acts as the file system storage server, an NSD server in GPFS terminology. All SONAS interface, management, and storage nodes form a peer-to-peer global cluster. SONAS draws on the experience of GPFS customers worldwide who are using single file systems from 10 to 20 PB in size and growing; other GPFS user file systems contain hundreds of millions of files.

3.5.3 SONAS File system performance and scalability


The SONAS file system achieves high performance I/O by using the following techniques:

- Striping data across multiple disks attached to multiple nodes: All data in the SONAS file system is read and written in wide parallel stripes. The blocksize of the file system determines the size of the block writes; the blocksize is specified at the time the file system is defined, is global across the file system, and cannot be changed after the file system is defined.
- Optimizing for small block writes: a SONAS block is subdivided into 32 sub-blocks, so that multiple small application writes can be aggregated and stored in a single SONAS file system block without unnecessarily wasting space in the block (a short sketch of this arithmetic follows this list).
- Providing a high performance metadata (inode) scan engine, to scan the file system very rapidly in order to quickly identify data that needs to be managed or migrated in the automated tiered storage environment, or replicated to a remote site.
- Supporting a large block size, configurable by the SONAS administrator to fit I/O requirements. The typical blocksize is the default of 256 KB, which is good for most workloads, especially mixed small random and large sequential workloads. For large sequential workloads, the SONAS file system can optionally be defined with a blocksize of 1 MB or 4 MB.
- Utilizing advanced algorithms that improve read-ahead and write-behind caching in the interface node.
- Using block level locking, based on a very sophisticated, scalable token management system, to provide data consistency while allowing multiple application nodes concurrent access to the files.
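The sub-block arithmetic mentioned in the list works out as follows. The sketch below only restates the numbers given above (32 sub-blocks per block) for the supported blocksizes, and the small-file allocation example is a simplification for illustration:

import math

def subblock_size(blocksize_bytes, subblocks_per_block=32):
    """A file system block is divided into 32 sub-blocks, so small writes
    can be packed without consuming a full block."""
    return blocksize_bytes // subblocks_per_block

KiB, MiB = 1024, 1024 * 1024
for blocksize in (256 * KiB, 1 * MiB, 4 * MiB):
    print(f"blocksize {blocksize // KiB:>5} KiB -> sub-block {subblock_size(blocksize) // KiB} KiB")

# Simplified example: a 20 KiB file in a 256 KiB-blocksize file system
# occupies whole sub-blocks rather than a whole block.
sub = subblock_size(256 * KiB)                       # 8 KiB
allocated = math.ceil(20 * KiB / sub) * sub          # 24 KiB, not 256 KiB
print(f"a 20 KiB file allocates {allocated // KiB} KiB")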


Let us see how SONAS scalability is achieved using the expandability of the SONAS building block approach. Figure 3-14 shows a SONAS interface node performing a small, single read or write on a disk. Because this read or write is small, the resources of one path, one storage node, and one RAID controller and disk are sufficient to handle the I/O operation.

(Figure: an interface node reaches one storage node (NSD server) in a storage pod over the InfiniBand network, and the storage node performs the I/O through one RAID controller to one RAID disk.)

Figure 3-14 A small single read or write in the SONAS file system

The power of the SONAS file system, however, is in its ability to read or write files in parallel chunks of the defined blocksize, across multiple disks, controllers, and storage nodes inside a storage pod, as shown in Figure 3-15.

(Figure: an interface node drives parallel reads or writes, which can belong to a single file, over the InfiniBand network across both storage nodes (NSD servers) in a storage pod and across many RAID arrays (NSDs) behind the RAID controllers.)

Figure 3-15 A high parallel read or write in the SONAS file system; this can be one file
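The wide striping shown in Figure 3-15 can be pictured with the following sketch: a file is divided into blocksize chunks, the chunks are placed round-robin across the RAID arrays (NSDs) in the pod, and the reads for one file can therefore be issued against many NSDs in parallel. This is a conceptual model in Python with invented NSD names, not the GPFS implementation.

from concurrent.futures import ThreadPoolExecutor

BLOCKSIZE = 256 * 1024                     # file system blocksize
NSDS = [f"nsd{i}" for i in range(12)]      # RAID arrays behind one storage pod (illustrative)

def block_placement(file_size):
    """Map each blocksize chunk of a file round-robin onto the NSDs."""
    blocks = range((file_size + BLOCKSIZE - 1) // BLOCKSIZE)
    return [(b, NSDS[b % len(NSDS)]) for b in blocks]

def read_block(block, nsd):
    # Stand-in for an NSD read issued by the interface node through a storage node.
    return f"block {block} read from {nsd}"

placement = block_placement(file_size=10 * 1024 * 1024)     # a 10 MiB file
with ThreadPoolExecutor(max_workers=len(NSDS)) as pool:
    results = list(pool.map(lambda p: read_block(*p), placement))

print(len(placement), "blocks spread over", len({nsd for _, nsd in placement}), "NSDs")

A 10 MiB file at a 256 KiB blocksize produces 40 chunks spread over all 12 NSDs in this model, so every RAID array, and with it both storage nodes, contributes to serving the single file.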


The scalability of the SONAS file system does not stop with the very large parallel read and write capability of a single storage pod, shown in Figure 3-16.

(Figure: an interface node reading from and writing to one storage pod in parallel over the InfiniBand network, through both storage nodes and the high density storage arrays behind them.)

Figure 3-16 SONAS file system parallel read / write capability to one storage pod

If the file is big enough, or if the aggregate workload is big enough, the SONAS file system easily expands to multiple storage pods in parallel as shown in Figure 3-17.

(Figure: an interface node reading and writing in parallel over the InfiniBand network across four storage pods, each with two storage nodes and multiple high density storage arrays.)

Figure 3-17 SONAS file system parallel read/ write capability to multiple storage pods

We can see that the SONAS file system provides the capability for extremely high parallel performance. This is especially applicable to modern analytics-intensive data types, with their associated large data objects and unstructured data. The SONAS file system recognizes typical access patterns, such as sequential, reverse sequential, and random, and optimizes I/O access for these patterns.

Distributed metadata and distributed locking


The SONAS file system also implements the sophisticated GPFS-based token lock management, which coordinates access to shared disks ensuring the consistency of file system data and metadata when various nodes access the same file.


The SONAS file system implements a sophisticated distributed metadata server function, in which multiple nodes act as, share, acquire, and relinquish the role of token manager for a single file system. This distributed architecture avoids metadata server bottlenecks and has been proven to scale to very large file systems. Along with distributed token management, the SONAS file system provides scalable metadata management by allowing all nodes of the cluster accessing the file system to perform file metadata operations. This key feature distinguishes SONAS from other cluster file systems that have a centralized metadata server handling fixed regions of the file namespace. The SONAS file system design avoids a centralized metadata server, and with it the performance bottleneck for metadata intensive operations; this also improves availability, because the distributed metadata server function provides additional insulation against a metadata server single point of failure. SONAS implements the GPFS technology that solves this problem by managing metadata at the node which is using the file or, in the case of parallel access to the file, at a dynamically selected node which is using the file.

SONAS file system administration


The SONAS file system provides an administration model that is easy to use and consistent with standard Linux file system administration, while providing extensions for the clustering aspects of SONAS. These functions support cluster management and other standard file system administration functions such as user quotas, snapshots and extended access control lists. The SONAS file system provides functions that simplify cluster-wide tasks. A single SONAS command or GUI command can perform a file system function across the entire SONAS file system cluster. The distributed SONAS file system architecture facilitates a rolling upgrade methodology, to allow you to upgrade individual SONAS nodes in the cluster while the file system remains online. The SONAS file system also supports a mix of nodes running at current and new release levels, to enable dynamic SONAS Software upgrades. SONAS file system implements quotas to enable the administrator to control and monitor file system usage by users and groups across the cluster. The SONAS file system provides commands to generate quota reports including user, group and fileset inode and data block usage.

SONAS file system snapshots


In the current release, up to 256 read-only snapshots of an entire GPFS file system can be created, to preserve the file system's contents at a single point in time. The SONAS file system implements a space-efficient snapshot mechanism that generates a map of the file system at the time the snapshot is taken; it is space efficient because it maintains a copy of only the file system data that has been changed since the snapshot was created, using a copy-on-write technique. The snapshot function allows a backup program, for example, to run while the file system is in use and still obtain a consistent copy of the file system as it was when the snapshot was created. In addition, SONAS Snapshots provide an online backup capability that allows files to be recovered easily from common problems such as accidental deletion of a file. There is a known requirement to increase snapshot granularity to include filesets, directories, and individual files, and IBM intends to address these requirements in a future SONAS release.
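The space-efficient behavior can be sketched as a copy-on-write block map. The model below is conceptual (it is not the GPFS snapshot implementation): the snapshot initially shares every block with the live file system, and a block is copied into the snapshot area only when the live copy is about to be overwritten.

class CowSnapshot:
    """Conceptual copy-on-write snapshot of a block map (illustrative only)."""

    def __init__(self, live_blocks):
        self.live = live_blocks      # block number -> data, the active file system
        self.preserved = {}          # only the blocks changed since the snapshot

    def write(self, block, data):
        if block not in self.preserved:
            # First change to this block since the snapshot: keep the old copy.
            self.preserved[block] = self.live[block]
        self.live[block] = data

    def read_snapshot(self, block):
        # Unchanged blocks are read from the live file system; changed blocks
        # come from the preserved copies, so the snapshot stays consistent.
        return self.preserved.get(block, self.live[block])

file_system = {0: "aaa", 1: "bbb", 2: "ccc"}
snap = CowSnapshot(file_system)
snap.write(1, "BBB")                                  # only block 1 is copied
assert snap.read_snapshot(1) == "bbb" and snap.live[1] == "BBB"

Only the changed blocks consume additional space, which is why many snapshots of a largely static file system remain inexpensive.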


SONAS storage pools


SONAS storage pools are collections of storage resources that allow you to group storage LUNs taken from multiple storage subsystems into a single file system. SONAS storage pools allow you to perform complex operations such as moving, mirroring, or deleting files across multiple storage devices, providing storage virtualization and a single management context. Storage pools also provide a method to partition file system storage for considerations such as these:

- Storage optimization, by matching the cost of storage to the value of the data
- Improved performance, by:
  - Reducing the contention for premium storage
  - Reducing the impact of slower devices on critical applications
  - Allowing you to retrieve HSM-archived data when needed
- Improved reliability, by providing for:
  - Granular replication based on need
  - Better failure containment
  - Creation of new storage pools as needed

There are two types of storage pools: internal storage pools and external storage pools.

Internal storage pools


Internal storage pools are used for managing online storage resources. SONAS supports a maximum of eight internal storage pools per file system: a minimum of one pool is required, called the system storage pool, and up to seven optional user pools are supported. GPFS assigns file data to internal storage pools under these circumstances:

- During file creation, when the storage pool is determined by the file placement policy
- When attributes of the file, such as file size or access time, match the rules of a policy that directs the file to be migrated to another storage pool

A sketch of this rule-driven placement follows.
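The rule-driven placement can be pictured with a small evaluation sketch. The rules below are expressed in Python rather than in the actual SONAS/GPFS policy language, and the matching criteria and the nearline pool name are invented for illustration; only the system pool name comes from the text. The first rule whose condition matches a new file decides its pool.

# Each rule: (condition on file attributes, target pool). Criteria are invented.
placement_rules = [
    (lambda f: f["name"].endswith((".log", ".tmp")), "nearline_pool"),
    (lambda f: f["size"] > 1024**3,                  "nearline_pool"),   # files over 1 GiB
    (lambda f: True,                                 "system"),          # default: system pool
]

def place(file_attributes):
    """Return the pool chosen by the first matching placement rule."""
    for condition, pool in placement_rules:
        if condition(file_attributes):
            return pool

print(place({"name": "scratch.tmp", "size": 4096}))          # -> nearline_pool
print(place({"name": "results.dat", "size": 2 * 1024**3}))   # -> nearline_pool
print(place({"name": "report.doc",  "size": 65536}))         # -> system

Migration between pools works the same way conceptually, except that the rules are evaluated later against attributes such as access time, using the scan engine described below.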

External storage pools


External storage pools are intended for use as near-line storage and for archival HSM operations. External storage pools require the use of an external storage management application; SONAS supports Tivoli Storage Manager. The Tivoli Storage Manager external storage manager is responsible for moving files out of the SONAS file system and returning them upon the request of an application accessing the file system.

SONAS filesets
SONAS also utilizes a file system object called a fileset. A fileset is a directory subtree of a file system namespace that in many respects behaves like an independent file system. Filesets provide a means of partitioning the file system to allow administrative operations at a finer granularity than the entire file system. Filesets allow the following operations:

- Defining quotas on both data blocks and inodes
- Being specified in a policy to control initial data placement, migration, and replication of the file's data

SONAS supports a maximum of 1000 filesets per file system.


High performance scan engine


The most important step in file management operations is processing the file metadata. The SONAS high performance metadata scan interface allows you to efficiently process the metadata for billions of files. After the candidate list of files is identified, data movement operations can be done by multiple nodes in the cluster. SONAS can spread rule evaluation and data movement responsibilities over multiple nodes in the cluster, providing a very scalable, high performance rule processing engine. The SONAS file system implements a high performance scan engine that can rapidly identify files that need to be managed within the SONAS concept of logical tiered storage pools. The SONAS file system can transparently perform physical movement of data between pools of logical tiered storage, and can also perform HSM to external storage using an external Tivoli Storage Manager server.
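Conceptually, the scan divides the inode space among participating nodes and has each node evaluate the policy rules against its own slice of the metadata, as in the sketch below. This is a conceptual model with invented sample metadata and an invented rule, not the GPFS scan implementation.

from concurrent.futures import ThreadPoolExecutor

def scan_inode_slice(inodes, rule):
    """Each node evaluates the policy rule against its slice of the metadata
    and returns the matching candidate files."""
    return [meta["path"] for meta in inodes if rule(meta)]

def parallel_scan(all_inodes, rule, nodes=4):
    # Split the inode list into one slice per participating node.
    slices = [all_inodes[i::nodes] for i in range(nodes)]
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        results = pool.map(lambda s: scan_inode_slice(s, rule), slices)
    return [path for part in results for path in part]

# Invented sample metadata and rule: files not accessed for 90 days or more.
inodes = [{"path": f"/data/file{i}", "days_since_access": i % 200} for i in range(1000)]
candidates = parallel_scan(inodes, rule=lambda m: m["days_since_access"] >= 90)
print(len(candidates), "files identified for migration")

The resulting candidate list is then handed to the data movement machinery (or, for external pools, to Tivoli Storage Manager), again spread over multiple nodes.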

Access control
The SONAS file system uses NFSv4 enhanced access control to allow all SONAS users, regardless of the NAS protocol by which they access the SONAS, to take advantage of a robust level of central security and access control to protect directories and files. The SONAS file system implements NFSv4 access control lists (ACLs) in addition to traditional ACL support; the traditional ACLs are based on the POSIX model. Access control lists extend the base permissions, or standard file access modes, of read (r), write (w), and execute (x) beyond the three categories of file owner, file group, and other users, to allow the definition of additional users and user groups. In addition, SONAS introduces a fourth access mode, control (c), which can be used to govern who can manage the ACL itself.

Exporting or sharing the SONAS file system


The SONAS file system is exported so that it can be accessed through NFS, CIFS, FTP, or HTTPS by SONAS users, through the clustered capability of the SONAS Cluster Manager. The SONAS Cluster Manager function works in conjunction with the SONAS file system to provide clustered NFS, clustered CIFS, clustered FTP, and clustered HTTPS. With the SONAS Cluster Manager, SONAS provides a highly scalable, high performance file system capability with simultaneous access to a common set of data from multiple interface nodes. The SONAS file system works in conjunction with the rest of the SONAS Software to provide a comprehensive, integrated set of storage management tools, including monitoring of file services, load balancing, and IP address failover.

File system high availability


The SONAS clustered file system architecture provides highly available, parallel cluster fault tolerance. The SONAS file system provides continuous access to data, even if cluster nodes or storage systems fail. This is accomplished through robust clustering features together with internal or external data replication. The SONAS file system continuously monitors the health of the file system components. If failures are detected, appropriate recovery action is taken automatically. Extensive logging and recovery capabilities are provided which maintain metadata consistency when nodes holding locks or performing services fail. Internal data replication, SONAS Software RAID-1 mirroring, can optionally be configured to provide further protection over and above the SONAS hardware storage redundancy and RAID. In addition, the SONAS file system automatically self-replicates and internally mirrors file system journal logs and metadata to assure hot failover, redundancy, and continuous operation of the rest of the file system, even if all paths to a disk or storage pod fail.

SONAS file system Information Lifecycle Management (ILM)


The SONAS file system provides the foundation for the Data Management services that we discuss in SONAS data management services on page 107. The SONAS file system is designed to help achieve data lifecycle management efficiencies through policy-driven automation and tiered storage management.

Logical storage pools


The SONAS file system implements the concepts of logical storage pools, filesets, and user-defined policies to provide the ability to better match the cost of your storage to the value of your data. SONAS logical storage pools allow you to create groups of disks within a file system. Using logical storage pools, you can create tiers of storage by grouping disks based on performance, locality, or reliability characteristics. For example, one pool can be high performance SAS disks and another more economical Nearline SAS storage. These internal logical storage pools are the constructs within which all of the data management is done within SONAS.

In addition to internal storage pools, SONAS supports external storage pools, implemented by an external Tivoli Storage Manager HSM server. Standard, commonly available Tivoli Storage Manager skills and servers are used to provide this HSM function, which especially helps those who are already using Tivoli Storage Manager to further utilize their Tivoli Storage Manager investment. When moving data to an external pool, the SONAS file system handles all the metadata processing through the SONAS high performance scan engine, and then hands a list of the data to be moved to the Tivoli Storage Manager server for backup, restore, or HSM to external storage on any of the Tivoli Storage Manager supported external storage devices, for example external disk storage, de-duplication devices, VTLs, or tape libraries. Data can be retrieved from the external HSM storage pools on demand, as a result of an application opening a file.

Fileset
SONAS file system provides the concept of a fileset, which is a sub-tree of the file system namespace and provides a way to partition the global namespace into smaller, more manageable units. Filesets provide an administrative boundary that can be used to set quotas and be specified in a user defined policy to control initial data placement or data migration. Data in a single fileset can reside in one or more storage pools. Where the file data resides and how it is migrated is based on a set of SONAS file system rules in a user defined policy.

User defined policies


There are two types of user-defined policies in SONAS: file placement and file management. File placement policies determine which storage pool file data is initially placed in. File placement rules are evaluated using attributes known when a file is created, such as the file name, user, group, or fileset. Examples include: place all files that end in .avi in the platinum storage pool, place all files created by the CEO in the gold storage pool, or place all files in the fileset development in the bronze pool (see the sketch following this paragraph). After files exist in a file system, SONAS file management policies allow you to move files, change their replication status, or delete them. You can use file management policies to move data from one pool to another without changing the file's location in the directory structure.
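Expressed in the GPFS-style policy rule syntax shown later in Example 3-1, the placement examples above might look like the following sketch (the fileset-scoped form was sketched earlier). The pool names, the user ID, and the default rule are illustrative assumptions, not values taken from a real configuration:

RULE 'video'   SET POOL 'platinum' WHERE lower(NAME) LIKE '%.avi'
RULE 'ceo'     SET POOL 'gold'     WHERE USER_ID = 1001
RULE 'default' SET POOL 'system'

Placement rules are evaluated top to bottom at file creation time, and a catch-all default rule is normally placed last so that every new file matches at least one rule.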


File management policies can also be used to change the replication (mirroring) status at the file level, allowing fine grained control over the space used for data availability. You can use a policy that says:

replicate all files in /database/payroll which have the extension *.dat and are greater than 1 MB in size to storage pool #2.
In addition, file management policies allow you to prune the file system, deleting files as defined by policy rules. File management policies can use more attributes of a file than placement policies because after a file exists, more is known about the file. In addition to the file placement attributes, you can now utilize attributes such as last access time, size of the file, or a combination of user and file size. This can result in policies such as: delete all files with a name ending in .temp that have not been accessed in 30 days, move all files that are larger than 2 GB to pool2, or migrate all files owned by Sally that are larger than 4 GB to the Nearline SAS storage pool.

Rules can also include attributes related to a pool instead of a single file by using the threshold option. Using thresholds, you can create a rule that, for example, moves files out of the high performance pool when it is more than 80% full. The threshold option provides the ability to set high, low, and pre-migrate thresholds. This means that GPFS begins migrating data at the high threshold and continues until the low threshold is reached. If a pre-migrate threshold is set, GPFS continues to copy data to Tivoli Storage Manager until the pre-migrate threshold is reached. This allows the data to continue to be accessed in the original pool until it can be quickly deleted to free up space the next time the high threshold is reached.

SONAS file system policy rule syntax is based on the SQL 92 syntax standard and supports multiple complex statements in a single rule, enabling powerful policies. Multiple levels of rules can be applied because the complete policy rule set is evaluated for each file when the policy engine executes. All of these data management functions are described in more detail in SONAS data management services on page 107.
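As a sketch of how the management examples above (including the replication statement quoted earlier) might be written in the same GPFS-style rule syntax, consider the following rules. The pool names, threshold values, and size limits are illustrative assumptions:

RULE 'prune'   DELETE FROM POOL 'pool1'
               WHERE lower(NAME) LIKE '%.temp'
                 AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
RULE 'bigmove' MIGRATE FROM POOL 'pool1' THRESHOLD(80,60) TO POOL 'pool2'
               WHERE FILE_SIZE > 2147483648
RULE 'payroll' MIGRATE FROM POOL 'pool1' TO POOL 'pool2' REPLICATE(2)
               WHERE PATH_NAME LIKE '/database/payroll/%'
                 AND lower(NAME) LIKE '%.dat'
                 AND FILE_SIZE > 1048576

Such rules are typically run on a schedule or triggered by pool thresholds, as described in the threshold discussion above.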

SONAS cluster two-tier configuration


The SONAS file system is built in a two-tiered architecture, wherein the interface nodes are not directly attached to the storage. In this configuration, the SONAS file system makes use of the GPFS-based network block device capability. SONAS uses the GPFS-provided block level interface, called the Network Shared Disk (NSD) protocol, which operates over the internal SONAS InfiniBand network. In this configuration, the interface nodes are GPFS Network Shared Disk (NSD) clients, in that they make GPFS NSD storage read and write requests over the internal InfiniBand network to the storage nodes, which are GPFS Network Shared Disk (NSD) servers. The internal SONAS file system thus transparently handles I/O requests between the interface nodes and the storage nodes.

SONAS clusters use this GPFS-based Network Shared Disk (NSD) protocol to provide high speed data access from the interface nodes to the storage nodes. Storage is direct attached to the storage nodes (the GPFS NSD storage servers). Each storage node (NSD server) provides storage serving to its own particular section of the overall SONAS file system disk collection. Note that every SONAS storage pod has two storage nodes (two GPFS NSD servers) that provide dual paths to serve each disk, thus avoiding single points of failure in the disk hardware. The internal SONAS file system cluster uses the internal InfiniBand network for the transfer of file system control information between all nodes, as well as for all data transfer between the interface nodes (GPFS NSD clients) and the storage nodes (GPFS NSD servers).


The SONAS file system internal architecture is shown in Figure 3-18.

(Figure content: interface nodes acting as GPFS NSD clients communicate over the InfiniBand network interconnect with storage pods; each storage pod contains two storage nodes acting as GPFS NSD servers attached to high density storage arrays.)

Figure 3-18 SONAS file system two-tier architecture with internal GPFS NSD clients and NSD servers

As shown previously, the fact that the disks are remote to the interface nodes is transparent to the interface nodes themselves and to the users. The storage nodes (NSD server nodes) are responsible for serving disk data blocks across the internal InfiniBand network. The SONAS file system is thus composed of storage pod building blocks, in which a balanced number of storage nodes (NSD storage servers) are preconfigured within the SONAS storage pod to provide optimal performance from the disk storage. The SONAS cluster runs on enterprise class commercial Intel-based servers, which run a Linux-based kernel. The SONAS file system nodes use a native InfiniBand protocol built on Remote Direct Memory Access (RDMA) technology to transfer data directly between the interface node NSD client memory and the storage node NSD server memory, thus exploiting the 20 Gbit/sec per port data transfer rate of the current SONAS internal InfiniBand switches, maximizing throughput, and minimizing node CPU utilization.

SONAS file system summary


The SONAS file system is at the heart of the SONAS Software stack. Based on IBM GPFS, the SONAS file system is highly scalable:
- Symmetric, scalable software architecture
- Distributed metadata management
- Allows for incremental scaling of the system, in terms of nodes and disk space, with ease
- Based on GPFS technology, which today runs tens of thousands of nodes in a single cluster

The SONAS file system is a high performance file system:
- Large and tunable block size support with wide striping across nodes and disks
- Parallel access to files from multiple nodes
- Supports byte-range locking and distributed token locking management
- Efficient deep prefetching: read ahead, write behind
- Recognizes access patterns with adaptable mechanisms
- Highly multithreaded


The SONAS file system is highly available and fault tolerant:
- Data protection mechanisms include journaling, replication, and mirroring.
- An internal peer-to-peer global cluster heartbeat mechanism allows recovery from multiple disk, node, and connectivity failures.
- Recovery software mechanisms are implemented in all layers.
Let us now examine the SONAS data management services in more detail.

3.6 SONAS data management services


We now turn our attention to describing the operational concepts of SONAS data management, which uses the central policy engine to automatically place and move files on tiered storage using the integrated HSM and ILM capabilities. The SONAS Software functions that we examine in this section are shown in Figure 3-19. These services are supplied by the policy and scan engines in the parallel file system and by the data movement, copy, and replication functions such as HSM and ILM, backup, and replication.

(Figure content: the SONAS Software stack, showing CIFS, NFS, FTP, HTTPS, and future protocols on top of the SONAS Cluster Manager; HSM and ILM, backup and restore, snapshots and replication, monitoring agents, GUI/CLI management interfaces, and security services; the parallel file system with its policy engine and scan engine; all running on Enterprise Linux and IBM servers.)

Figure 3-19 SONAS Software data management services components

We also discuss the role and usage of Tivoli Storage Manager together with external Tivoli Storage Manager servers to provide accelerated backup and restore, and to provide HSM to external storage pools. Finally, we describe local data resiliency using Snapshots, and remote resiliency using asynchronous replication.

3.6.1 SONAS: Using the central policy engine and automatic tiered storage
SONAS uses policies to control the lifecycle of the files that it manages and consequently control the costs of storing data, by automatically aligning data to the appropriate storage tier based on the policy rules set up by the SONAS administrator.


Figure 3-20 illustrates a tiered storage environment that contains multiple storage tiers. Each tier has specific performance characteristics and associated costs; for example, poolfast contains fast and expensive disk, whereas pooltape contains relatively inexpensive tapes.

Figure 3-20 Policy-based storage tiering

Performance comes at a price, and is the main cost differentiator in storage acquisitions. For this reason, setting policies can help control costs by using the appropriate storage tier for specific sets of data and making room on the more expensive tiers for new data with higher performance requirements. The SONAS policy implementation is based on and uses the GPFS policy implementation. Files reside in SONAS storage pools; policies are assigned to files and control the placement and movement of files between storage pools. A SONAS policy consists of a collection of rules, and the rules control what actions are executed and against what files the actions are performed, so the smallest entity controlled by a rule is a file. SONAS policy rules are single statements that define an operation such as migrate or replicate a file. SONAS has three types of policies:

Initial file placement   These rules control the placement of newly created files in a specific storage pool.
File management          These rules control the movement of existing files between storage pools and the deletion of old files. Migration policies are used to transfer data between the SONAS storage pools and to the external HSM storage pool, and to control replication of SONAS data.
Restore of file data     These rules control what happens when data gets restored to a SONAS file system.

Policy rules are SQL-like statements that specify conditions that, when true, cause the rule to be applied. Conditions that cause GPFS to apply a rule include these:
- Date and time when the rule is evaluated, that is, the current date and time
- Date and time when the file was last accessed
- Date and time when the file was last modified
- Fileset name
- File name or extension
- File size
- User ID and group ID


SONAS evaluates policy rules in order, from first to last, as they appear in the policy. The first rule that matches determines what is to be done with that file. Example 3-1 shows sample rule syntax.
Example 3-1 Sample rule syntax

RULE 'mig1' MIGRATE FROM POOL 'pool_1' THRESHOLD(90,80,70) WEIGHT(KB_ALLOCATED) TO POOL 'pool_2'
RULE 'del1' DELETE FROM POOL 'pool_1' WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 30) AND lower(NAME) LIKE '%.tmp'

Each SONAS filesystem is mapped to storage pools. The default pool for a filesystem is the system pool, also called pool1. A file system can have one or more additional storage pools after the system pool. Each storage pool is associated with one or more NSDs or LUNs. SONAS also manages external storage pools. An external storage pool is not mapped to standard NSD devices; it is a mechanism for SONAS to store data in an external manager such as Tivoli Storage Manager. SONAS interfaces with the external manager using a standard protocol called the Data Management API (DMAPI) that is implemented in the SONAS GPFS filesystem. Policies control the location of files among storage pools in the same filesystem. Figure 3-21 shows a conceptual representation of a filesystem, pools, and NSDs:

Figure 3-21 SONAS filesystem and policies


A filesystem is managed by one active policy, policy1 in the example. The initial file placement policies control the placement of new files. File placement policies are evaluated and applied at file creation time. If placement policies are not defined, all new files are placed in the system storage pool. Migration and deletion rules, or file management rules, control the movement of files between SONAS disk storage pools and external storage pools such as Tivoli Storage Manager HSM, and the deletion of old files. Migration and deletion rules can be scheduled using the cron scheduler. File migration between pools can also be controlled by specifying thresholds. Figure 3-22 shows a conceptual representation of these rules.

Figure 3-22 File placement and migration rules

SONAS introduces the concepts of tiered and peered storage pools:

Tiered pools    The pools that NSDs are assigned to can be tiered in a hierarchy using GPFS file management policies. These hierarchies are typically used to transfer data between a fast pool and a slower pool (Pool1 to Pool2) using migration. When coupled with HSM, data flows in a hierarchy from Pool1 to Pool2 to Pool3 (HSM).

Peered pools    The pools that NSDs are assigned to can be operated as peers using GPFS initial file placement policies. These policies allow files to be placed according to rules in either the fast pool Pool1 or the slower pool Pool2. When coupled with HSM, data flows to either Pool1 or Pool2 based on initial file placement policies, and then from both Pool1 and Pool2 the data flows to Pool3 (HSM) based on file management policies.

To simplify implementation of HSM and storage pooling, SONAS provides templates for various standard usage cases. Customized cases can be created from the default templates by using the SONAS CLI. The standard usage cases, also called ILM profiles, are shown in the diagram in Figure 3-23.


(Figure content: the standard ILM profiles are: Default pool - all NSDs in the same pool; Peered pools - placement policies only, new files placed in pool1 or pool2; Tiered pools - files placed in pool1 and then moved to pool2; Default pool and HSM - files placed in pool1 then moved to the TSM HSM pool3; Peered pools and HSM - placement policies for pool1 and pool2, with migration from both to pool3; Tiered pools and HSM - files placed in pool1, then migrated to pool2 and then to the TSM HSM pool3.)

Figure 3-23 Standard ILM policy profiles

The standard ILM policy profiles are based on the assumption that pool1 is the fastest pool, using the fastest storage devices such as SAS disks, and that pool2 is based on less expensive disk such as Nearline SAS. SONAS GPFS metadata must always reside in the fastest storage pool, pool1 in our examples, because it is the data that has the highest I/O requirements when SONAS GPFS file system scan operations are performed. For additional information about configuration of SONAS policy rules, see SONAS policies on page 159.

3.6.2 Using and configuring Tivoli Storage Manager HSM with SONAS basics
The use of SONAS HSM provides the following advantages:
- It frees administrators and users from manual file system pruning tasks, and defers the need to purchase additional disk storage.
- It allows Tivoli Storage Manager HSM to extend the SONAS disk space and automates the movement of seldom-used files to and from external near-line storage.
- It allows pre-migration, a method that sends a copy of the file to be migrated to the Tivoli Storage Manager server prior to migration, allowing threshold migration to quickly provide space by simply stubbing the premigrated files.

To use the Tivoli Storage Manager HSM client, you must provide a Tivoli Storage Manager server external to the SONAS system; the server is accessed through the Ethernet connections on the interface nodes. See SONAS and Tivoli Storage Manager integration on page 119 for more information about the configuration requirements and connection of a SONAS and Tivoli Storage Manager server. The current version of SONAS requires that HSM be configured and managed using the CLI because, at the time of writing, GUI support is not present for HSM. HSM migration work can cause additional overhead on the SONAS interface nodes, especially in environments that regularly create large amounts of data and want to migrate it early, so take care when planning the timing and frequency of migration jobs.


When using HSM space management on a filesystem, each file in the filesystem can be in one of three states:

Resident       The file resides on disk in the SONAS appliance.
Premigrated    The file resides both on disk in the SONAS and in Tivoli Storage Manager HSM.
Migrated       The file resides only in Tivoli Storage Manager.


Files are created and modified on the SONAS filesystem, and when they are physically present in the filesystem they are said to be in the resident state. Files in an HSM managed filesystem can be migrated to Tivoli Storage Manager HSM storage for a variety of reasons, such as when a predefined file system utilization threshold is exceeded. Migrated files are copied to Tivoli Storage Manager and replaced by a stub file that has a preset size. Using a stub file leaves a specified amount of file data at the front of the file on the SONAS disk, allowing it to be read without triggering a recall.

In a SONAS GPFS environment, a small file that is less than 1/32 of the filesystem blocksize, or one subblock, can become larger after an HSM migration because SONAS GPFS adds meta information to the file during the migration. Because another block on the file system is allocated for the meta information, this increases the space allocated for the file. If a file system is filled to its maximum capacity with many small files, it is possible that the file system can run out of space during the file migration.

A recall is triggered when the first byte of storage not on the SONAS disk is accessed. When a migrated file is accessed, it is recalled from the external Tivoli Storage Manager storage into the SONAS storage. If you have files with headers that will be periodically accessed and do not want to trigger recalls on those header accesses, use the appropriate stub file size to ensure that an appropriate amount of file header stays on the SONAS disk.

Stub file size: At the time of writing, SONAS only supports a stub file size of zero, so migrated files are recalled as soon as the first byte of the file is accessed. Be careful with the kind of utilities you run on HSM enabled filesystems.

As data is accessed by CIFS or NFS, when a migrated file is opened and a byte of data that is not in the SONAS cache is accessed, that access triggers a Data Management API (DMAPI) event in the SONAS. That event is sent to the primary Tivoli Storage Manager client, which resides on one of the interface nodes, and it triggers a recall. If the primary Tivoli Storage Manager client is not overloaded, it issues the recall itself; otherwise it sends the recall to another Tivoli Storage Manager client node. In practice, most recalls are performed by the primary Tivoli Storage Manager client interface node. Because a recall from physical tape requires waiting for cartridge fetching, tape drive loading, and tape movement to the desired file, physical tape recalls can take a significant number of seconds to start, so the application needs to plan for this delay.

The Tivoli Storage Manager requirements for HSM are as follows:
- You must supply a Tivoli Storage Manager server that can be accessed by the SONAS interface nodes.


- You must ensure that sufficient network bandwidth and connectivity exists between the interface nodes selected to run HSM and the external Tivoli Storage Manager server.
- The Tivoli Storage Manager server has to be prepared for use by the SONAS system. A Tivoli Storage Manager storage pool to store the migrated data must be set up.
- Server time needs to be synchronized with the SONAS system, and both systems must access the same NTP server.
- Set Tivoli Storage Manager server authentication to ON (set auth on).

HSM can be added to a filesystem at the time of filesystem creation or at a later time.

Attention: HSM cannot be removed from a file system through CLI commands; IBM service needs to be engaged.

The diagram in Figure 3-24 shows the steps that need to be performed to add HSM to a SONAS filesystem using the SONAS CLI.

(Figure content: mkfs/chfs - create or change the file system; startbackup - verify the TSM connection; cfghsmnode - create the TSM parameters; mkpolicy - create a policy; cfghsmfs - connect the file system to HSM; mkpolicytask - schedule the policy; setpolicy - apply the policy to the file system.)

Figure 3-24 Steps for adding HSM to a filesystem

The mkfs and chfs commands are used to create a new filesystem or modify a filesystem for HSM usage, as these commands allow you to add multiple NSDs and storage pools to the filesystem. The cfghsmnode command validates the connection to Tivoli Storage Manager and sets up HSM parameters. The startbackup command can optionally be used to verify the Tivoli Storage Manager connection for a specific filesystem; if startbackup executes correctly, you know you have a valid connection to Tivoli Storage Manager for use by HSM. The cfghsmfs command adds HSM support for a given filesystem; it enables SONAS CIFS component HSM support and stores HSM configuration information in the CTDB registry. You then create a policy with the mkpolicy command and set the policy for a filesystem with the setpolicy command. For more information about creating and managing policies, see Figure 10-149, Call Home test on page 434.


After creation of the policy, you can schedule the policy execution with the SONAS scheduler by using the mkpolicyrule command. SONAS HSM also provides the lshsmlog command to view HSM errors, as well as the lshsmstatus command to verify HSM execution status.
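Putting the commands above together, enabling HSM on a file system follows roughly the sequence sketched below. This is an illustrative outline only; the angle-bracket arguments are placeholders, and the exact options of each SONAS CLI command must be taken from the command reference:

cfghsmnode <TSM server stanza> <interface node list>    (define the TSM connection)
startbackup <filesystem>                                (optionally verify the TSM connection)
cfghsmfs <filesystem>                                   (enable HSM on the file system)
mkpolicy <policy name> <migration rules>                (create the migration policy)
setpolicy <filesystem> <policy name>                    (make the policy active)
mkpolicytask <filesystem> <schedule>                    (schedule periodic policy runs)

The lshsmstatus and lshsmlog commands can then be used to confirm that migrations are running and to review any errors.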

3.7 SONAS resiliency using Snapshots


In this section, we overview how SONAS Software implements space-efficient Snapshots. Snapshots are a standard, included feature of the SONAS Software and do not require any additional licensing. SONAS Snapshot enables online backups to be maintained, providing near-instantaneous access to previous versions of data without requiring complete, separate copies or resorting to offline backups. SONAS Snapshots can be scheduled or performed by authorized users or by the SONAS administrator, with the capability of up to 256 active Snapshots per file system at any one time. SONAS Snapshot technology makes efficient use of storage by storing only block-level changes between each successive Snapshot. Only the changes made to the original file system consume additional physical storage, thus minimizing physical space requirements and maximizing recoverability.

3.7.1 SONAS Snapshots


At the current release level, a SONAS Snapshot is a read-only, point-in-time consistent version of an entire SONAS file system, frozen at a point in time:
- Each SONAS file system can maintain up to 256 Snapshots concurrently.
- Snapshots only consume space when the file system changes; a Snapshot uses no additional disk space when first taken.
- Snapshots are enforced to be consistent across the file system to a single point in time.
- Snapshots can be taken manually or automatically on a schedule.
- For CIFS users, SONAS Snapshots are readily accessible through the Microsoft Volume Shadow Copy Service (VSS) integration into the Windows Explorer interface.


Snapshots can be made by administrators with proper authority through the SONAS Management GUI or through the SONAS Command Line Interface (CLI). The Snapshot appears as a special directory called .snapshots, located in the filesystem root directory, as shown in Figure 3-25.

(Figure content: before the snapshot, the file system contains /fs1/file1, /fs1/file2, /fs1/subdir1/file3, /fs1/subdir1/file4, and /fs1/subdir2/file5. After the snapshot, a read-only copy of the directory structure and files appears under /fs1/.snapshots/snap1/, for example /fs1/.snapshots/snap1/subdir1/file3. Only changes to the original files consume disk space.)

Figure 3-25 SONAS Snapshot appears as a special directory in the file system

Snapshots of a SONAS file system are read-only; changes are made only to the active (that is, normal, non-snapshot) files and directories. Snapshots are only made of active file systems; you cannot make a Snapshot of an existing Snapshot. Individual files, groups of files, or entire directories can be restored or copied back from Snapshots. For additional information about configuring snapshots, see Snapshots on page 193.
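Because a Snapshot is exposed as a normal read-only directory tree, restoring a single file can be as simple as copying it back from the .snapshots directory. The paths below reuse the illustrative names from Figure 3-25 and are not tied to a real configuration:

cp /fs1/.snapshots/snap1/subdir1/file3 /fs1/subdir1/file3

CIFS users can achieve the same result from the previous versions dialog described in the next section, without needing direct access to the .snapshots directory.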


3.7.2 Integration with Windows


SONAS Snapshot supports the Microsoft Volume Shadow Copy Service (VSS) function to allow display of older file and folder versions from within Windows Explorer. Snapshots are exported to Windows CIFS clients by the Volume Shadow Copy Service (VSS) API. This means that SONAS Snapshot data can be accessed and copied back through the previous versions dialog in Microsoft Windows Explorer. Figure 3-26 shows the previous versions dialog.


Figure 3-26 SONAS Snapshots are accessible for Windows CIFS users by Windows Explorer

SONAS Snapshots are intended as a point-in-time copy of an entire SONAS file system, preserving the contents of the file system at a single point in time. The Snapshot function allows a backup or mirror program to run concurrently with user updates and still obtain a consistent copy of the file system as of the time that the Snapshot was created. SONAS Snapshots also provide an online backup capability that allows easy recovery from common problems such as accidental deletion of a file, and comparison with older versions of a file.

3.8 SONAS resiliency using asynchronous replication


In this section, we overview how SONAS asynchronous replication is designed to provide a bandwidth-friendly mechanism that is tolerant of telecommunication bandwidth shortages. This implementation is space efficient, transferring only the changed blocks of a file, not the whole file again. Resource efficiency and high performance are achieved by using multiple interface nodes in parallel to transfer the data. SONAS asynchronous replication can also be useful for backup-less disaster recovery, in other words, using direct disk-to-disk incremental change replication to a remote disaster recovery site. This is particularly important when the raw amount of data to back up and restore is so large that a tape restore at a disaster recovery site might be unfeasible from a time-to-restore standpoint.


Next, we discuss the SONAS asynchronous replication capability that is designed to address these requirements. At a high level, SONAS asynchronous replication works as follows: 1. The first step is to execute a central policy engine scan for async replication. The SONAS high performance scan engine is used for this scan. As part of the asynchronous replication, an internal snapshot will be made of both the source file system and the target file system. The first step is shown in Figure 3-27.

(Figure content: async replication begins by executing a policy. The policy engine on the IBM Scale Out NAS reads the policies across the global namespace, and internal snapshots are taken of the source file systems and of the target file systems on the remote Scale Out NAS.)

Figure 3-27 SONAS async replication step 1 - execute a policy, makes snapshots

2. The next step is to make a mathematical hash of the source and target snapshots, and compare them, as shown in Figure 3-28.

(Figure content: a mathematical hash is computed on both the source and target snapshots, and the hashes are compared during the scan to determine the incremental changes to send to the remote Scale Out NAS.)

Figure 3-28 SONAS async replication step 2 - compare mathematical hash of snapshots


3. The final step is to exploit the parallel data transfer capabilities of SONAS by having multiple nodes participate in the transfer of the async replication changed blocks to the target remote file systems, as shown in Figure 3-29.

(Figure content: multiple interface nodes transmit the changed data in parallel from the source IBM Scale Out NAS to the target file system snapshots on the remote Scale Out NAS.)

Figure 3-29 SONAS async replication step 3 - transfer data using multiple interface nodes

The internal snapshot at the source side assures that the data being transmitted is consistent and reflects a single point in time. The internal snapshot at the target provides a backout point-in-time capability if, for any reason, the drain of the changes from source to target fails before it is complete. Let us review a few more details about SONAS asynchronous replication. SONAS asynchronous replication is designed to cope with connections that provide low bandwidth, high latency, and low reliability. The basic steps of SONAS asynchronous replication are as follows:
1. Take a snapshot of both the local and remote file systems. This ensures that we are replicating a frozen and consistent state of the source file system.
2. Collect a file path list with corresponding stat information, by comparing the two with a mathematical hash, in order to identify changed blocks.
3. Distribute the changed file list to a specified list of source interface nodes.
4. Run a scheduled process that performs rsync operations on the set of interface nodes, for a given file list, to the destination SONAS. Rsync is a well-understood open source utility that picks up the changed blocks on the source SONAS file system, streams those changes in parallel to the remote site, and writes them to the target SONAS file system.
5. The snapshot at the remote SONAS system ensures that a safety fallback point is available if there is a failure in the drain of the new updates.
6. When the drain is complete, the remote file system is ready for use.
7. Both snapshots are automatically deleted after a successful replication run.

The target SONAS system is an independent SONAS cluster that might be thousands of miles away. At the current release level, SONAS R1.1.1, asynchronous replication is available for replicating incremental changes at the file system level to one other site. Asynchronous replication is done using an IBM enhanced and IBM supported version of the open source rsync utility. The enhancements include the ability to have multiple SONAS nodes work in parallel on the rsync transfer of the files.


The asynchronous replication is unidirectional; changes on the target site are not replicated back. The replication schedule is configured through the SONAS GUI or the CLI. The minimum practical replication interval varies with the amount of data and the number of files to be sent. For additional information about how to configure asynchronous replication, see Local and remote replication on page 198.

3.9 SONAS and Tivoli Storage Manager integration


In this section we provide more SONAS configuration details on how SONAS and Tivoli Storage Manager work together and are configured. You can choose to use the SONAS-specific integration and exploitation with Tivoli Storage Manager for either or both of these two functions:
- Protect the data in SONAS with backup and restore functionality to guarantee data availability in case of data corruption, accidental deletion, or hardware loss.
- Migrate low access data from SONAS to Tivoli Storage Manager managed storage devices, such as tape, to free up space inside the SONAS system.

The SONAS to Tivoli Storage Manager integration is designed to support accelerated file-level backups of entire GPFS filesystems using a Tivoli Storage Manager client to an external Tivoli Storage Manager server, and to provide a file-level restore capability. The SONAS to Tivoli Storage Manager integration also offers SONAS customers the ability to perform HSM to external Tivoli Storage Manager managed storage devices to free up space in the SONAS system. Files that have been moved by HSM to external, Tivoli Storage Manager managed storage are called migrated files. When a migrated file is accessed, SONAS initiates a recall operation to bring the file back from Tivoli Storage Manager storage to SONAS disk. This recall is transparent to the SONAS client accessing the file; the client only notices a delay proportional to the time required to recall the file from Tivoli Storage Manager.

The Tivoli Storage Manager to SONAS backup integration is file based, which means that Tivoli Storage Manager performs backup and restore at the file level and handles individual files as individual Tivoli Storage Manager objects. This offers the flexibility of incremental backup and the ability to restore individual files. With this architecture, the Tivoli Storage Manager database needs to be sized appropriately because it holds an entry for each file being backed up.

The SONAS system runs the Tivoli Storage Manager clients on all or a subset of interface nodes. These interface nodes connect to an external, customer supplied, Tivoli Storage Manager server through the customer LAN network. The Tivoli Storage Manager server contains a database that inventories all files that have been backed up or migrated to Tivoli Storage Manager and owns the storage devices where backed up and migrated data is stored.


Figure 3-30 shows a diagram of the SONAS and Tivoli Storage Manager configuration.

(Figure content: the Tivoli Storage Manager client code is preinstalled on the SONAS management and interface nodes; the interface nodes connect over Ethernet only to a Tivoli Storage Manager server that is external to SONAS and that owns its own disk and tape storage; the SONAS storage pods hold the file data.)

Figure 3-30 SONAS and Tivoli Storage Manager configuration

As compared to normal, conventional backup software, the SONAS and Tivoli Storage Manager integration is designed to provide significantly accelerated backup elapsed times and high performance HSM to external storage, by exploiting the following technologies:
- The fast SONAS scan engine is used to identify files for Tivoli Storage Manager to back up or migrate. This is much faster than standard Tivoli Storage Manager backups, or other conventional backup software, which need to traverse potentially large filesystems and check each file against the Tivoli Storage Manager server. The SONAS scan engine is part of the SONAS file system and knows exactly which files to back up and migrate; it builds a list of files, the filelist, to back up or migrate. The list is then passed to Tivoli Storage Manager for processing. In contrast, the standard operation of Tivoli Storage Manager requires that it traverse all files in the file system and send the information to the Tivoli Storage Manager server to determine which files need a backup and which files are already present on the server.
- Multiple SONAS interface nodes can be configured to work in parallel so that multiple Tivoli Storage Manager clients can stream data to the Tivoli Storage Manager server at an accelerated rate. The SONAS Software distributes parts of the filelist as backup jobs to multiple Tivoli Storage Manager clients configured on a given set of interface nodes. Each interface node then operates in parallel on its own subset of the files in the filelist. Each Tivoli Storage Manager process can establish multiple sessions to the Tivoli Storage Manager server.

Tivoli Storage Manager customers can make use of their existing Tivoli Storage Manager servers to back up SONAS, if the server has enough capacity to accommodate the new workload. Configuring SONAS to perform Tivoli Storage Manager functions requires only a few commands; these commands need to be issued both on the Tivoli Storage Manager server and on the SONAS system. These commands perform both the initial configuration and the scheduling of the backup operations. HSM migration operations are configured separately using the policy engine, as discussed in SONAS data management services on page 107. In SONAS, Tivoli Storage Manager backup is performed over the LAN through one or more interface nodes, and these connect to one or more Tivoli Storage Manager servers. It is not possible to do LAN-free backup at this time from SONAS directly to storage devices managed by the Tivoli Storage Manager server. For more information about how to configure SONAS with Tivoli Storage Manager, see Backup and restore of file data on page 185.

3.9.1 General Tivoli Storage Manager and SONAS guidelines


A SONAS system can accommodate large quantities of data, both in terms of space used and in terms of number of files. SONAS supports up to 256 filesystems, and each filesystem can have hundreds of millions of files. Whether you are considering backup or HSM of the SONAS primary space, you have to take into account your expected workload characteristics. The Tivoli Storage Manager server has a DB2 database that inventories all files that are stored in Tivoli Storage Manager, and each file requires around 1 KB of Tivoli Storage Manager database space. As a general rule, keep the Tivoli Storage Manager database limited to a size of 1 TB; it can consequently accommodate around 1 billion files.

Currently an individual SONAS filesystem can be backed up by a single Tivoli Storage Manager server; you cannot split a filesystem to back up to various Tivoli Storage Manager servers. You can configure multiple or all filesystems to use the same Tivoli Storage Manager server. If your filesystems have large numbers of files, in the order of a billion or more, plan the filesystem to Tivoli Storage Manager server association so as not to overwhelm the Tivoli Storage Manager server with files; you will probably require multiple Tivoli Storage Manager servers.

Also consider the required throughput of the backup system in terms of files and amount of data per unit of time. Assume that you have 10 TB of data, an average file size of 100 KB (that is, 100 million files), and a daily change rate of 10%. This gives 1 TB/day and 10 million files to back up. Assuming that you have a 4 hour backup window, your backup environment will need to accommodate something like 250 GB/h or 70 MB/sec, and 2.5 million files/hour or around 700 files/sec. The data rate can be easily accommodated, but the number of files to handle can be a challenge and might require multiple Tivoli Storage Manager servers.

Tivoli Storage Manager manages multiple storage devices; these can be disk and tape technologies. Disk has good random and sequential performance characteristics and low latency. Tivoli Storage Manager disk storage can accommodate multiple read and write streams in parallel. Also consider disk contention: multiple parallel streams can cause disk contention, and the aggregate throughput can be less than that of a single stream. Tape storage devices offer other characteristics: they can store data for long periods of time and are very energy efficient, as tapes consume no energy at rest. Current tape technologies have high sequential data rates in the order of 100-200 MB/sec. With Tivoli Storage Manager, each backup session uses an individual tape drive. Tapes are usually mounted automatically by a tape library; the mount time depends on the library model, but in general you can assume it to be between 40 and 120 seconds. Tapes then need to be positioned, which can take around 30 seconds depending on the drive technology. During this time the application using the tape sees a delay. In the case of backup, this delay is generally a small part of the total backup operation duration; in the case of HSM it is felt directly by the application that uses the file, because it waits until the data is recalled to SONAS disk.
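As a quick check of the backup sizing example above, the arithmetic works out as follows (the 10 TB, 100 KB average file size, 10% daily change rate, and 4 hour window are the assumptions stated in that example):

  number of files    = 10 TB / 100 KB          = 100 million files
  daily backup data  = 10 TB x 10%             = 1 TB/day
  daily backup files = 100 million x 10%       = 10 million files/day
  data throughput    = 1 TB / 4 h              = 250 GB/h, approximately 70 MB/sec
  file throughput    = 10 million files / 4 h  = 2.5 million files/h, approximately 700 files/sec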
Next, we give general guidelines regarding the use of SONAS with Tivoli Storage Manager. These guidelines have to be taken in the context of your specific data characteristics such as file size and amount of data, your workload requirements in terms of daily backup traffic, and your restore speed expectations:
- If you have many small files to back up on a daily basis and you need to send multiple backup streams to the Tivoli Storage Manager server, consider using a Tivoli Storage Manager disk pool as the primary pool to store data. If you configure the disk pool larger than the normal amount of data that gets backed up per backup run, so that all data first gets copied to disk, then no tape mount is required during a backup.


- Depending on the amount of data in SONAS, it might be necessary to have one dedicated Tivoli Storage Manager server per filesystem, considering that one SONAS filesystem can contain 2 billion files.
- If you need to back up large files to Tivoli Storage Manager, larger than 1 MB, you might consider sending them directly to tape without storing them in a disk storage pool. You will need to configure as many tape drives as the number of parallel sessions you have configured to Tivoli Storage Manager in SONAS.
- When using SONAS HSM, which migrates data outside the SONAS environment, consider using tape as the final destination of the data, because if you use disk, you defeat the purpose of migration. When using HSM to tape, remember to plan for the application delay in accessing the data because of the time required to mount and position the tape and then the time required to recall the data to SONAS disk.
- The Tivoli Storage Manager backup does not use the classical process of traversing the filesystem, comparing the client contents with those on the server, and identifying the changes, because this is time-consuming due to the interaction between the filesystem, the Tivoli Storage Manager client, and the remote Tivoli Storage Manager server. Instead, the SONAS Software uses the high performance scan engine and the policy engine to identify changes in the filesystem and to generate the list of files that need to be expired and the list of files that need to be backed up.
- Various scripts are provided with the SONAS Software to define the interface nodes involved in the backup, the relationship of which filesystem needs to be backed up to which Tivoli Storage Manager server, and to schedule, start, and stop backup and restore operations.
- Do not consider the use of SONAS HSM with Tivoli Storage Manager as a replacement for backups. HSM must be viewed as an external storage extension of local SONAS disk storage. A Tivoli Storage Manager backup implies two concepts: the first is that the backup is a copy of the original file, regardless of where the original file is, which can be either inside a SONAS filesystem or in Tivoli Storage Manager external storage. The second concept is that the backup file can exist in multiple versions inside Tivoli Storage Manager storage, based on the Tivoli Storage Manager backup policies you configure. Tivoli Storage Manager backups allow you to restore a file that has been damaged or lost, either because of deletion or logical corruption of the original file, or because of a media failure either in SONAS storage or in Tivoli Storage Manager storage. When a file is migrated by the Tivoli Storage Manager HSM server to the external Tivoli Storage Manager HSM storage, there is still only one copy of the file available, because the original is deleted on the SONAS file system itself and replaced by the Tivoli Storage Manager HSM stub file. Also, HSM with Tivoli Storage Manager maintains only the current copy of the file, giving no opportunity to store multiple versions. In comparison, Tivoli Storage Manager backup/archive (or typically any backup/archive software) gives you the full ability to store multiple backup versions of a file, and to track and manage these backup copies in an automated way.
It is a Tivoli Storage Manager best practice to back up a file before the file has been migrated by Tivoli Storage Manager HSM to external storage. With proper configuration, you can specify in the Tivoli Storage Manager management classes that a file is not eligible for HSM migration unless a backup has been made first with the Tivoli Storage Manager backup-archive capability. Generally, an HSM managed file lifecycle implies file creation and a backup of the file shortly after creation;
the file then stays on disk for a given amount of time and is later migrated to Tivoli Storage Manager HSM storage. If the file becomes a candidate for migration very shortly after creation, the following two scenarios can occur:
- If you specify in Tivoli Storage Manager that migration requires backup, the file is not migrated until a backup cycle has successfully completed for the file. The file is copied from SONAS to Tivoli Storage Manager two times: one time for backup and one time for migration.
- If you specify in Tivoli Storage Manager that migration does not require backup, the file is migrated, and a subsequent backup cycle causes the file to be copied inside Tivoli Storage Manager from Tivoli Storage Manager HSM storage to Tivoli Storage Manager backup storage. The file is copied from SONAS to Tivoli Storage Manager only one time, and the second copy is made by the Tivoli Storage Manager server.

Migration: If the ACL data of a premigrated file is modified, these changes are not written to the Tivoli Storage Manager server if the file is migrated after this change. To avoid losing the modified ACL data, use the option migraterequiresbackup yes. This setting does not allow the migration of files whose ACL data has been modified when no current backup version exists on the server.
You can back up and migrate your files to the same Tivoli Storage Manager server or to various Tivoli Storage Manager servers. If you back up and migrate files to the same server, the HSM client can verify that current backup versions of your files exist before you migrate them. For this purpose, the same server stanza must be used for backup and migration. For example, if you are using the defaultserver and migrateserver Tivoli Storage Manager options, they must both point to the same server stanza within the Tivoli Storage Manager dsm.sys file. You cannot point to different server stanzas, even if they are pointing to the same Tivoli Storage Manager server.

To restore stub files rather than backup versions of your files, for example, if one or more of your local file systems is damaged or lost, use the Tivoli Storage Manager backup-archive client restore command with the restoremigstate option. Your migrated and premigrated files remain intact on the Tivoli Storage Manager server, and you need only restore the stub files on your local system. However, you cannot use the backup-archive client to restore stub files for your migrated files if they have been backed up before the migration. Instead, use the Tivoli Storage Manager HSM dsmmigundelete command to recreate stub files for any migrated or premigrated files that are lost.

If you back up and migrate data to tape volumes in the same library, make sure that there are always a few tape drives available for space management. You can achieve this by limiting the number of tape drives which can be used simultaneously by backup and archive operations. Specify a number for the mountlimit that is less than the total number of drives available in the library (see the mountlimit option of the define devclass command in the IBM Tivoli Storage Manager Administrator's Reference for your operating system). Using disk storage as your primary storage pool for space management might, depending on the average size of your files, result in better performance than using tape storage pools.


Archiving and retrieving files


Tivoli Storage Manager archiving of files refers to the operation of storing a copy of the files in Tivoli Storage Manager that is then retained for a specific period of time, as specified in the Tivoli Storage Manager management class associated with the file. Tivoli Storage Manager archived files are not subject to versioning; the file exists in Tivoli Storage Manager regardless of what happens to the file in primary SONAS storage. Archiving is only used to retain a copy of the file for long periods of time. SONAS does not support archiving of files; this means that the SONAS Tivoli Storage Manager client interface does not allow you to specify archiving of files. If you want to use the Tivoli Storage Manager archiving function, install a Tivoli Storage Manager client on a datamover system, a server external to SONAS, mount the SONAS exported filesystems you want to archive on this server, and then initiate the archive operation using the Tivoli Storage Manager archive command on this server. The same process can be used to retrieve files archived to Tivoli Storage Manager. Note that the performance of the archive operation can be impacted if you need to archive large numbers of small files. Ensure that the user on the datamover system has the necessary authority to read and write the files.
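As a hedged illustration of that external-datamover approach (the host names, mount point, and management class are invented for the example), the sequence might look like this on the datamover server:

mount -t nfs sonas.example.com:/ibm/gpfs0/shares/projects /mnt/projects
dsmc archive "/mnt/projects/2010/*" -subdir=yes -archmc=LONGRETENTION
dsmc retrieve "/mnt/projects/2010/report.pdf" /mnt/projects/restored/

The dsmc archive and dsmc retrieve commands are standard Tivoli Storage Manager backup-archive client commands; the -archmc option selects the archive management class that defines the retention period.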

Restoring file systems overview


If you lose an entire file system and you attempt to restore backup versions of all your files, including those that are migrated and premigrated, plan carefully to avoid having your file system run out of space. If your file system runs out of space during the restore process, the HSM function must begin migrating files to storage to make room for additional restored files, thereby slowing the restore process. Evaluate the dsmmigundelete command to restore migrated files as stub files instead.

Tivoli Storage Manager manuals and information


You can find more information about Tivoli Storage Manager at the online Tivoli Storage Manager information center at: http://publib.boulder.ibm.com/infocenter/tsminfo/v6/index.jsp

3.9.2 Basic SONAS to Tivoli Storage Manager setup procedure


In a SONAS environment, the basic setup procedure for connecting SONAS to Tivoli Storage Manager is as follows:
1. Make sure that the Tivoli Storage Manager servers (Tivoli Storage Manager server V5.5 or above is supported) are connected to the network and reachable from the SONAS interface nodes.
2. Set up each Tivoli Storage Manager server used for SONAS with a backup pool to use the backup feature.
3. For each Tivoli Storage Manager server used for SONAS, have one node name registered for each interface node that is being used for backup from SONAS, plus an additional virtual proxy node to represent the SONAS system.
4. Set up to have one SONAS file system backed up against Tivoli Storage Manager server1, while another SONAS file system is backed up against Tivoli Storage Manager server2.
5. Create a proxy node name on the Tivoli Storage Manager server.
6. Grant access for each of the cluster nodes to back up data to this proxy node name, because the virtual proxy node can be used from more than one node in SONAS.


7. Set Tivoli Storage Manager server authentication to ON (set auth on).
8. Make sure that the Tivoli Storage Manager server date/time and the SONAS nodes' date/time are in sync.
9. Create a new SONAS backup schedule using the StartBackupTSM template.
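To make steps 3 through 7 more concrete, the following sketch shows how the corresponding definitions might be entered at a Tivoli Storage Manager administrative command line (for example, through dsmadmc). The node names and passwords shown here are examples only; adapt them to your environment and naming conventions.

   # Register one Tivoli Storage Manager node per SONAS interface node,
   # plus a virtual proxy node that represents the SONAS system (step 3)
   register node sonas_proxy proxypassword
   register node int_node1 password1
   register node int_node2 password2

   # Allow each interface node to store data under the common proxy node (steps 5 and 6)
   grant proxynode target=sonas_proxy agent=int_node1,int_node2

   # Turn on server authentication (step 7)
   set auth on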

3.9.3 Tivoli Storage Manager software licensing


The Tivoli Storage Manager client software code is pre-installed within SONAS and is resident in the code from the factory, so there is no need to order it and install it separately. If you are not using Tivoli Storage Manager functions, there is no charge for the fact that this Tivoli Storage Manager client code is present in the SONAS Software. The Tivoli Storage Manager client code is installed only on the interface nodes and not on the storage nodes and, as of the writing of this book, the Tivoli Storage Manager client version 6.2.2 is used internally in SONAS Software.

There are two separate Tivoli Storage Manager clients installed in SONAS: the Tivoli Storage Manager backup/archive client, used for backup and restore of SONAS files, and the Tivoli Storage Manager HSM client, used for space management by offloading files from SONAS storage to Tivoli Storage Manager storage and offering transparent recall. The Tivoli Storage Manager backup/archive client is supplied as part of the IBM Tivoli Storage Manager Standard Edition and the IBM Tivoli Storage Manager Extended Edition products. The Tivoli Storage Manager HSM client is part of the IBM Tivoli Storage Manager for Space Management product.

These Tivoli Storage Manager client components need to connect to a Tivoli Storage Manager server that is external to SONAS. The Tivoli Storage Manager server also needs to be licensed. You can use an existing Tivoli Storage Manager server if your installation already has one.

Licensing considerations
You are required to pay a license charge for the Tivoli Storage Manager client code only if you are using Tivoli Storage Manager functions, and you only pay the license charge for the interface nodes that are attached to Tivoli Storage Manager servers and actively run Tivoli Storage Manager client code. The Tivoli Storage Manager HSM client requires the Tivoli Storage Manager backup/archive client, so to use HSM functionality, both clients must be licensed for each interface node running the code. Even though Tivoli Storage Manager can be licensed for a subset of interface nodes, it is best to license the function on all interface nodes for the following reasons:
- A SONAS filesystem can be mounted on a subset of nodes or on all nodes. Mounting the file system on all nodes guarantees the maximum level of availability of the resource in case of failover.
- To manage a file system, Tivoli Storage Manager code must run on at least one of the nodes where the file system is mounted. It is best to run Tivoli Storage Manager code on multiple nodes where the filesystem is mounted to guarantee service during failover.
- The Tivoli Storage Manager client can execute parallel backup streams from multiple nodes for the same filesystem, thus increasing backup and restore throughput.
- When using Tivoli Storage Manager HSM, file recalls can occur on any node and need to be serviced by a local Tivoli Storage Manager HSM client.


Processor value units licensing


The Tivoli Storage Manager licensing is calculated on the processor value units (PVUs) of the SONAS interface node or group of nodes that run the Tivoli Storage Manager code. Tivoli Storage Manager client licensing is not calculated on the terabytes of storage that can be on the SONAS system. For a more detailed explanation of Tivoli Storage Manager PVU licensing, see the following website:
http://www-01.ibm.com/software/lotus/passportadvantage/pvu_licensing_for_customers.html
At the time of writing, each SONAS interface node has two sockets, each with a quad-core processor, for a total of 8 cores. The interface node has Xeon Nehalem EP processors, which imply a value of 70 PVUs per core. So the required Tivoli Storage Manager PVUs for each interface node running Tivoli Storage Manager code are 560 PVUs, which corresponds to 8 cores times 70 PVUs per core. If you choose to run Tivoli Storage Manager code on 3 interface nodes, you will be required to license 1680 PVUs. For additional information about Tivoli Storage Manager, Tivoli Storage Manager sizing guidelines, Tivoli Storage Manager performance optimization, and tuning knobs, see the many Redbooks publications and white papers that are available at the Redbooks publications website:
http://www.redbooks.ibm.com/

3.9.4 How to protect SONAS files without Tivoli Storage Manager


If you do not have Tivoli Storage Manager in your environment and want to back up the SONAS data, you need to use an external datamover system that can mount the SONAS file system exports. This is similar to the procedure discussed in Archiving and retrieving files on page 124. Install a backup client of your choice on the external datamover server. Ensure that the user on the datamover system has the necessary authority to read and write the files. You can then start the backup and restore operations using your backup software. Note that the performance of the backup operation can be impacted if you need to back up large filesystems with large numbers of files.

3.10 SONAS system management services


SONAS provides a comprehensive set of facilities for globally managing and centrally deploying SONAS storage. In this section we provide an overview of the Management GUI, the Command Line Interface, and the Health Center. For information about accessing the GUI and command line, see Using the management interface on page 314. The SONAS GUI and command line (CLI) connect to a server that runs on the SONAS management node, as illustrated in Figure 3-31. The server collects data from the interface and storage nodes and stores the data in a database. It can also run data collection tasks on the SONAS nodes, and this data also gets stored in the database. The data is then served to the CLI by the CLI server component and to the GUI through the ISC controller. Data displayed on the GUI and CLI is mainly retrieved from the database.


Figure 3-31 SONAS GUI and CLI back-end (the diagram shows the management node running the Lightweight Infrastructure Framework (LWI) with the CIM service, CLI server, business layer, GUI (ISC controller), database, and SSH client tasks, connected through SSH daemons and CIM agents to the interface and storage nodes of the SONAS back-end, and accessed from a web browser)

SONAS uses multiple specialized gatherer tasks to collect data and update the database, as shown in Figure 3-32. For example, clicking the Refresh button on the File Systems GUI page starts a File System gatherer task, which gets the needed information from the nodes attached to the cluster. The last time the gatherer was run is displayed on the bottom right button in the file system GUI window.

Figure 3-32 SONAS back-end gatherer tasks (the diagram shows gatherer tasks for file systems and exports using an SSH client to query the SSH daemons on the nodes of the SONAS back-end and store the results in the database)

3.10.1 Management GUI


SONAS provides a centralized web-based graphical user interface and Health Center for configuration and monitoring tasks. Users access the GUI / Health Center with a standard web browser. There is a command line interface (CLI) as well. The SONAS Management GUI server runs on the SONAS Management Node and is web-based; you can access it from a remote web browser using the https protocol. It provides role-based authorization for users, and enables the administrator to maintain the SONAS cluster. These roles are used to segregate GUI administrator users according to their working scope within the Management GUI. The defined roles are as follows:

Administrator: This role has access to all features and functions provided by the GUI. This role is the only one that can manage GUI users and roles.

Operator: The operator can do the following tasks:
- Check the health of the cluster.
- View the cluster configuration.
- Verify the system and file system utilization.
- Manage threshold and notification settings.

Export administrator: The export administrator is allowed to create and manage shares, plus perform the tasks the operator can execute.

Storage administrator: The storage administrator is allowed to manage disks and storage pools, plus perform the tasks the operator can execute.

System administrator: The system administrator is allowed to manage nodes and tasks, plus perform the tasks the operator can execute.

For additional information about administration roles and defining administrators, see User management on page 399. SONAS has a central database that stores configuration information and events. This information is used and displayed by the management node and collected to the management node from the other nodes in the cluster. The SONAS Management GUI and Health Center provide panels for most functions; a partial list follows:
- Storage management
- File system management
- Pool management
- Fileset management
- Policy management
- Access control list (ACL) management
- Synchronous replication management
- Hierarchical Storage Management
- Tivoli Storage Manager backup management
- Async replication management
- Snapshot management
- Quota management
- Cluster management
- Protocol management (CIFS, NFS, HTTPS, FTP)
- Export management
- Event log
- Node availability
- Node utilization (CPU, memory, I/O)
- Performance management (CPU, memory, I/O)
- File system utilization (capacity)
- Pool / disk utilization (capacity)
- Notifications / call-home
- Hardware monitoring
- File access services such as NFS, HTTPS, FTP, and CIFS
- File system services
- Nodes, including CPUs, memory DIMMs, VRM, disk drives, power supplies, fans, and onboard network interface ports
- I/O adapters, including storage and network access
- Storage utilization


Panels are available for most of the major functions, as shown in Figure 3-33.

Figure 3-33 SONAS Management GUI has panels for most aspects of SONAS


SONAS has a complete Topology Viewer that shows, in graphical format, the internal components of the SONAS system, reports on their activity, and provides a central place to monitor and display alerts. You can click an icon and drill down into the details of a particular component; this function is especially useful when drilling down to solve a problem. Figure 3-34 shows an example of the SONAS Management GUI Topology Viewer.

Figure 3-34 SONAS Management GUI - Topology Viewer (the callouts in the figure highlight exports / shares status, external interface network throughput, file systems status, an at-a-glance view of all interface node status, internal data network performance, and storage node status and performance)

Each of the icons is clickable and expands to show the status of an individual component. The SONAS Management GUI is the focal point for the extended monitoring facilities and the SONAS Health Center.


3.10.2 Health Center


The SONAS Health Center provides a central place to view the overall SONAS health, including examining the System Log, Alert Log, and System Utilization Reports and graphs. Through the SONAS Management GUI, repeating tasks can be set up, utilization thresholds set, notification settings refined and notification recipients defined. SONAS tracks historical performance and utilization information, and provides the ability to graphically display the current and historical trends. This is shown in Figure 3-35.
Figure 3-35 SONAS Health Center historical system utilization graphical reports (example historical reports for a SONAS interface node, showing interface node memory, CPU, and network utilization, and storage node disk utilization)

The length of time that can be reported is determined by the amount of log space set aside to capture data. For additional information about the Health Center, see Health Center on page 420.

3.10.3 Command Line Interface


The SONAS Command Line Interface (CLI) runs on the SONAS Management Node. The CLI provides the ability to perform SONAS administrative tasks and implements about 110 CLI commands. The focus is on enabling scripting of administrative tasks; the CLI is primarily intended for installation and setup commands, with additional configuration functionality. The CLI includes commands for all SONAS functions:
- Cluster configuration
- Authentication
- Network
- Files
- File Systems
- Exports

- File Sets
- Quotas
- Snapshots
- Replication
- ILM automatic tiered storage
- HSM
- Physical management of disk storage
- Performance and Reports
- System Utilization
- SONAS Console Settings
- Scheduled Tasks
The SONAS CLI is designed to be familiar to the standard UNIX, Windows, and NAS administrator.

3.10.4 External notifications


SONAS collects data and can send event information to external recipients. To get proactive notifications for events that need to be supervised, the administrator can configure thresholds, the events that trigger a notification, and the notification recipients. This ensures that the administrator is informed when an incident takes place. Figure 3-36 shows the SONAS notification monitoring architecture.

Figure 3-36 Notification monitoring architecture (event sources such as SNMP traps from hardware components, CIFS/Samba, CTDB, GPFS, and syslog feed the Health Center on the management node through the log, system checkout, CIM, and gatherer components; outbound notifications are sent through SNMP, SMTP email, the GUI, and call home)

SONAS supports the following kinds of notifications:
- Summary email that collects all messages and sends out the list on a regular basis.
- Immediate email and SNMP traps: Contents are the same for both email and SNMP. Log messages are instantly forwarded. A maximum number of messages can be defined; after that number is reached, further messages are collected and a summary is sent.


These messages originate from multiple sources, including syslog, the GUI gatherer (GPFS status, CTDB status), CIM messages from providers in the cluster, and SNMP messages from nodes in the cluster. SONAS also allows you to set utilization thresholds; when a threshold is reached, a notification is sent. Thresholds can be set for various resources, including:
- CPU usage
- File system usage
- GPFS usage
- Memory usage
- Network errors

3.11 Grouping concepts in SONAS


SONAS is a scale out architecture where multiple interface nodes and storage pods act together in a coordinated way to deliver service. SONAS commands and processes can be configured and run on all nodes or on a subset of nodes, a group of nodes. In this section we discuss these SONAS grouping concepts, present where they are applied, and discuss grouping dependencies.

In a SONAS cluster we have multiple interface nodes that can export CIFS and NFS shares for the same set of underlying SONAS filesystems. After creating a filesystem, it can be mounted on a subset of nodes or on all nodes using the mountfs command. The example in Figure 3-37 shows that filesys#1 is mounted on node#1, node#2 and node#3, whereas filesys#3 is mounted on nodes #2, #3 and #4. To make the filesystem available to users it is then exported using the mkexport command, which does not allow you to specify a given subset of nodes; it makes the export available on all nodes. This is why the diagram shows an exports box across all interface nodes with three exports for all the filesystems: exportfs#1 for filesys#1, and so on. When the CTDB manages exports, it gives a warning for nodes that do not have the filesystem mounted, because the export of a filesystem depends on the mount of that filesystem being present.

The network group concept represents a named collection of interface nodes; in our example in Figure 3-37 on page 134 we have three network groups: netwkgrp#1 associated with interface nodes #1, #2 and #3, netwkgrp#2 associated with interface nodes #2, #3 and #4, and lastly the default netwkgrp associated with all interface nodes. The network object is a collection of common properties that describe a network, such as the subnet mask, gateway, VLAN ID, and so on. A network aggregates a pool of IP addresses. These IP addresses are assigned to the raw interfaces or to the bonds of the associated interface nodes. The example in Figure 3-37 shows that network#A is associated with IP addresses IPA1, IPA2 and IPA3, and so on. Networks are connected to one single network group using the attachnw command. Our example in Figure 3-37 shows three networks: network#A, network#B and network#C, attached respectively to network group netwkgrp#1, netwkgrp#2 and the default netwkgrp.

A DNS alias can be created for each network that resolves to all the IP addresses in the given network. For example, if network#A has IP addresses IPA1, IPA2 and IPA3, we create a DNS alias called, for example, SONAS#A that resolves to the three IP addresses. We also create DNS aliases for network#B and network#C that resolve to the network#B and network#C IP addresses.
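As an illustration, the following sequence sketches how the configuration of Figure 3-37 might be built with the commands named above. The mountfs, mkexport, and attachnw commands exist in SONAS, but the argument layout and option names shown here are assumptions for illustration only; see the SONAS command reference for the exact syntax.

   # Mount filesys#1 on interface nodes 1, 2 and 3 only (illustrative syntax)
   mountfs filesys1 node1,node2,node3

   # Export the file system; mkexport makes the export available on all interface nodes
   mkexport exportfs1 /ibm/filesys1

   # Attach network#A (IP addresses IPA1, IPA2, IPA3) to network group netwkgrp#1
   attachnw networkA netwkgrp1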


Figure 3-37 Filesystems, exports and networks (the diagram shows networks #A, #B and #C with their DNS aliases SONAS#A, SONAS#B and SONAS#C, attached to network groups netwkgrp#1, netwkgrp#2 and the default netwkgrp, whose IP addresses are spread across interface nodes #1 through #6; exports exportfs#1, exportfs#2 and exportfs#3 span all interface nodes, and the underlying filesystems filesys#1, #2 and #3 are mounted on subsets of the nodes)

In the example in Figure 3-37, filesystem filesys#2 is accessible through the export exportfs#2 through all networks. Filesystem filesys#1, instead, is accessible only through network#A and the DNS alias SONAS#A. Accessing filesys#1 over network#B and DNS alias SONAS#B can cause problems because the DNS might return IP address IPB3, which is associated with node#4, which does not mount filesys#1. When creating network groups, take care to ensure that filesystems accessed through a given network are mounted on all of that network's network group nodes. In failover situations the IP address is taken over by another node in that specific network group, which ensures that in case of failover that specific IP address still allows you to access all filesystems associated with, or mounted on the nodes of, that network group. One way to ensure that there are no mismatches between mounted filesystems and network groups is to mount the share on all interface nodes and access it only on a given network group.

Network groups can be used for multiple reasons:
- When we limit client access to two or three nodes, we increase the probability of finding data in cache more than if the access were spread across many nodes, which gives a performance benefit.
- Another use is to segregate workloads, such as production and test, in the same SONAS cluster.

3.11.1 Node grouping


Grouping concepts also apply to SONAS backups with Tivoli Storage Manager, as illustrated in Figure 3-38. In this case we have three filesystems, filesys#1, #2 and #3, that we want to back up to two Tivoli Storage Manager servers: TSMs#1 for filesys#1 and TSMs#2 for filesys#2 and filesys#3. Because a SONAS GPFS parallel filesystem can be accessed by multiple interface nodes and also backed up from multiple interface nodes, we have to account for the fact that multiple Tivoli Storage Manager client requests are made to the Tivoli Storage Manager server for the same filesystem.


Node grouping and Tivoli Storage Manager


To accommodate this behavior, on the Tivoli Storage Manager server we have a grouping concept: we define or register individual node names, one for each interface node that will connect to the Tivoli Storage Manager server, and then we define a target node; the interface node definitions are proxies of the target node. In our example in Figure 3-38, we back up filesys#1 to Tivoli Storage Manager server TSMs#1. Because filesys#1 is accessed through interface nodes node#1, #2 and #3, in the Tivoli Storage Manager server we define the Tivoli Storage Manager client nodes node#1, #2 and #3, and these nodes act as proxies for the target node fs1tsm. In Tivoli Storage Manager terms, all data received from node#1, #2 and #3 is stored under the name of the proxy target fs1tsm. This allows data backed up on any agent node to be restored on any other agent node, because all requests to the Tivoli Storage Manager server from the foregoing three nodes are serviced through the common proxy target fs1tsm.

We then have Tivoli Storage Manager client configuration parameters, also called Tivoli Storage Manager stanzas, on the interface nodes. These are configured using the cfgtsmnode SONAS command; executing the command on multiple nodes creates the stanza on each of those nodes. This command configures a named Tivoli Storage Manager server instance for the interface node Tivoli Storage Manager client that contains the server connection parameters, the Tivoli Storage Manager client name and password, and the Tivoli Storage Manager server proxy target. For example, on node#2 we use the cfgtsmnode command twice, because we use this node to back up both filesys#1 and filesys#3. For filesys#1 we configure a stanza called tsms#1 that points to Tivoli Storage Manager server tsms#1 and proxy target fs1tsm, and for filesys#3 we configure a stanza called tsms#2 that points to Tivoli Storage Manager server tsms#2 and Tivoli Storage Manager proxy target fs3tsm.

Using these definitions, a Tivoli Storage Manager client on an interface node can connect to the Tivoli Storage Manager server; that is, a Tivoli Storage Manager client running on the interface node is enabled to connect to the assigned Tivoli Storage Manager server, but where the client actually runs is another matter. Backup execution is controlled using the cfgbackupfs command. The diagram in Figure 3-38 on page 136 shows that, for example, filesystem filesys#2 is enabled for backup on nodes node#2 to node#6, because there is a Tivoli Storage Manager server stanza defined on each of these interface nodes, but there is no stanza definition on node#1 even though this node has filesys#2 mounted. We could execute the backups for filesys#2 on nodes node#2 to node#6, but we decide to segregate backup operations to nodes node#5 and node#6 only, so we execute the cfgbackupfs command for filesys#2 specifying only nodes node#5 and node#6.
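The following sketch relates the commands just described to the filesys#1 example of Figure 3-38. The cfgtsmnode and cfgbackupfs commands are the SONAS commands discussed above, but the order and naming of the arguments shown here are illustrative assumptions, not exact syntax.

   # On nodes node#1, node#2 and node#3, define a stanza called tsms#1 that points to
   # Tivoli Storage Manager server TSMs#1 and proxy target fs1tsm (illustrative syntax)
   cfgtsmnode tsms1 tsmserver1.example.com node1 node1password fs1tsm
   cfgtsmnode tsms1 tsmserver1.example.com node2 node2password fs1tsm
   cfgtsmnode tsms1 tsmserver1.example.com node3 node3password fs1tsm

   # Enable and run backups of filesys#1 to the tsms#1 server on node#1 only
   cfgbackupfs filesys1 tsms1 node1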


Figure 3-38 Tivoli Storage Manager grouping concepts (the diagram shows Tivoli Storage Manager server TSMs#1 with proxy target fs1tsm and agent nodes node#1 to node#3, and server TSMs#2 with proxy target fs2tsm for agent nodes node#1 to node#6 and proxy target fs3tsm for nodes node#2 to node#4; the interface nodes hold stanzas tsms#1 and tsms#2, and cfgbackupfs assigns filesys#1 to node#1, filesys#3 to nodes #2 to #4, and filesys#2 to nodes #5 and #6)

We have multiple grouping concepts in action here. On the Tivoli Storage Manager server side we define one proxy target for each filesystem, and this proxy target is associated with multiple proxy agent nodes. You can define a subset of nodes as proxy agents, but this might lead to errors if a backup is run from a node that is not defined as a proxy agent; to avoid such errors, define all interface nodes as proxy agents for Tivoli Storage Manager. The cfgtsmnode command creates a Tivoli Storage Manager server definition, or stanza, on the node where the command is run, and running the command on multiple nodes creates a group of definitions for the same server. To avoid missing Tivoli Storage Manager server stanzas on a node, you can define all available Tivoli Storage Manager servers on all nodes. The cfgbackupfs command configures the backup to run on a subset group of nodes.

To execute the backup of a filesystem on a node, the following requirements must be met:
- The filesystem must be mounted on that node.
- A Tivoli Storage Manager server stanza must have been defined on the node for the target Tivoli Storage Manager server.
- Tivoli Storage Manager server proxy target and agent node definitions must be in place for that node.
- The interface node must have network connectivity to the Tivoli Storage Manager server.

The (green) arrows in Figure 3-38 show the data path for the backups. Data flows from the filesystem to the group of interface nodes defined with the cfgtsmnode and cfgbackupfs commands, and to the Tivoli Storage Manager server through the network. We see that groups of nodes can perform backup operations in parallel; for example, backups for filesys#3 are executed by nodes node#2, #3 and #4.

The network must be available to access the Tivoli Storage Manager servers. Because the network is accessed from the interface nodes using network groups, the Tivoli Storage Manager server used to back up a given filesystem must be accessible from the interface nodes where the filesystem is mounted and where the Tivoli Storage Manager server stanza has been defined.

Node grouping and HSM


Data movement to external storage devices in SONAS is managed by an integrated Tivoli Storage Manager HSM client that connects to a Tivoli Storage Manager server to migrate and recall data. Because a filesystem can be mounted on and exported by multiple interface nodes, the HSM component needs to be installed and active on all these interface nodes; this is because a recall request for a migrated file can be started from any interface node. SONAS uses the cfghsmnode CLI command to configure HSM connectivity to a given Tivoli Storage Manager server on a group of interface nodes as follows:
cfghsmnode <TSMserver_alias> <intNode1,intNode2,...,intNodeN>
So with HSM, you have a group of nodes that can connect to a Tivoli Storage Manager server as Tivoli Storage Manager HSM clients. HSM is then enabled for a given filesystem using the cfghsmfs command.
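For example, to prepare HSM for filesys#3 in the scenario of Figure 3-38, the commands might look as follows. The cfghsmnode invocation matches the form shown above; the cfghsmfs arguments are an assumption for illustration.

   # Enable the HSM client for Tivoli Storage Manager server alias tsms#2
   # on the interface nodes that mount filesys#3
   cfghsmnode tsms2 node2,node3,node4

   # Enable space management for the file system (illustrative arguments)
   cfghsmfs filesys3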

3.11.2 Node grouping and async replication


Asynchronous replication processes run on a group of one or more interface nodes that mount the filesystem to be replicated. You use the cfgrepl command to configure asynchronous replication and you specify one or more source-target interface node pairs that will run the replication operation. The number or group of source-destination pairs can be scaled up as the amount of data to replicate grows.
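A minimal sketch of such a configuration follows. The cfgrepl command is the SONAS command mentioned above, but the way the source-target interface node pairs are expressed here is purely hypothetical, for illustration only.

   # Configure asynchronous replication for a file system using two
   # source-target interface node pairs (hypothetical argument layout)
   cfgrepl filesys1 sourceNode1:targetNode1,sourceNode2:targetNode2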

3.12 Summary: SONAS Software


As we close this chapter, we have seen that SONAS Software provides a comprehensive, integrated software functionality stack that includes, in one software license, all capabilities required to manage a SONAS system, from the very small to the very large.

3.12.1 SONAS features


SONAS Software is designed to provide:
- A single software license that provides all capabilities required to manage a simple, expandable storage appliance, including ease of ordering, deployment, and management
- Centralized management of all files in a single namespace, which provides reduced administrative costs and faster response time to end users
- File placement policies, including automation, which provide optimized storage costs and reduced administrative costs
- No individually chargeable add-on software, which provides reduced TCO and a simpler procurement process
- Automated policy based HSM and ILM, which provide reduced administrative costs and optimized storage costs


- Independent scalability of storage and nodes, which provides simple but flexible configurations tailored to your specific workload characteristics, yet remains flexible and reconfigurable for the future
- Concurrent access to files from all nodes, distributed token management, automatic self-tuning and workload balancing, and high availability provided by the Cluster Manager; these combine to provide very high performance and reduced administrative costs related to migrating hot spot files
- Storage pool striping, which provides very high performance and fast access to data
- High performance metadata scanning across all available resources/nodes and integrated Tivoli Storage Manager clients, which provide the ability to perform HSM and automatic tiered storage at high scalability, as well as accelerate backups of files
- Snapshots and asynchronous replication, which provide robust data protection and disaster recovery

SONAS Software provides the ability for central management of storage, providing the functionality for a highly automated, extremely flexible, and highly scalable self-managing system. You can start with a small SONAS with less than 100 TB, and continue to seamlessly grow, linearly scaling capacity and increasing performance, using the SONAS Software to manage scalability to petabytes. SONAS Software supports the full capability of the current SONAS, which scales up to 30 interface nodes and 60 storage nodes. The current largest SONAS configuration is capable of supporting up to 14.4 petabytes of raw storage. One copy of SONAS Software runs on each node of a SONAS. A current maximum SONAS configuration is shown in Figure 3-39.

Figure 3-39 SONAS Software manages all aspects of a maximum size SONAS (the diagram shows a logical view of the global namespace, for example /home/appl/data/web containing large spreadsheet, presentation, and video files, mapped by the policy engine onto the physical IBM Scale Out NAS: rows of interface nodes and storage nodes organized into Storage Pool 1, Storage Pool 2, Storage Pool 3, and so on, all of which can scale out)

As storage needs continue to grow over time, the SONAS Software is designed to continue to scale out and support even larger configurations, while still maintaining all the storage management and high performance storage characteristics that we discussed in this chapter.


3.12.2 SONAS goals


In summary, SONAS Software provides the software foundation to meet the goals of IBM SONAS stated here:
- Unified management of petabytes of storage: automated tiered storage, centrally managed and deployed
- Global access to data, from anywhere: a single global namespace, across petabytes of data
- Based on standard, open architectures: not proprietary, avoids lock-ins, and utilizes worldwide open source innovative technology
- Provides and exceeds today's needed requirements for scale-out capacity, performance, and a global virtual file server, with extreme scalability and modular expansion
- High ROI: significant cost savings due to auto-tune, auto-balance, and automatic tiered storage
- Positioned to exploit the next generation of technology: a superb foundation for cloud storage and many other promising applications

In the remaining chapters of this book, we continue to explore all aspects of SONAS in more detail.



Chapter 4.

Networking considerations
In this chapter we provide information about networking as related to SONAS implementation and configuration. We begin with a brief review of Network Attached Storage concepts and terminology. Following that, we discuss technical networking implementation details for SONAS.


4.1 Review of network attached storage concepts


In this section we offer a brief review of network attached storage concepts as they pertain to SONAS, for those readers who are more familiar with block I/O SAN-attached storage and terminology.

4.1.1 File systems


A file system is the physical structure an operating system uses to store and organize files on a storage device. To manage how data is laid out on the disk, an operating system adds a hierarchical directory structure. Many file systems have been developed to operate with various operating systems. They reflect various OS requirements and performance assumptions. Certain file systems work well on small computers; others are designed to exploit large, powerful servers. An early PC file system is the File Allocation Table (FAT) file system used by the MS-DOS operating system. Other file systems include the High Performance File System (HPFS), initially developed for IBM OS/2, Windows NT File System (NTFS), Journal File System (JFS) developed for the IBM AIX OS, and General Parallel File System (GPFS), also developed by IBM for AIX. A file system does not work directly with the disk device. Rather, a file system works with abstract logical views of the disk storage. The file system maintains a map of the data on the disk storage. From this map the file system finds space which is available to store the file. The file system also creates metadata (data describing the file) which is used for systems and storage management purposes, and determines access rights to the file. The file system is usually tightly integrated with the operating system. However, in network attached storage, it is physically separated from the OS and distributed to multiple remote platforms. This is to allow a remote file system (or part of a file system) to be accessed as if it were part of a local file system. This is what happens with Network File System (NFS) and Common Internet File System (CIFS).

4.1.2 Redirecting I/O over the network to a NAS device


In the case of network-attached storage, input/output (I/O) is redirected out through the network interface card (NIC) attachment to the network.

Network file protocols


The NIC contains a network protocol driver in firmware, which describes the operations exchanged over the underlying network protocol (such as TCP/IP). Now one of the network file protocols (such as NFS or CIFS) comes into play. The I/O operation is transferred using this network protocol to the remote network attached storage. With Windows operating systems, the file protocol is usually CIFS; with UNIX and Linux, it is usually NFS. Or it might be File Transfer Protocol (FTP). When the remote server, or NAS appliance, receives the redirected I/O, the I/O requests are unbundled from their TCP/IP network protocols. The I/O request is submitted to the NAS appliance's operating system, which manages the scheduling of the I/O and the security processes to the local disk. From then on, the I/O is handled as a local I/O. It is routed by the appliance's file system, which establishes the file's identity and directory, and eventually converts the I/O request to a storage system protocol (that is, a block I/O operation). Finally, the I/O request is routed to the physical storage device itself to satisfy the I/O request.


The receiving NAS system keeps track of the initiating client's details, so that the response can be directed back to the correct network address. The route for the returning I/O follows, more or less, the reverse of the path outlined previously.

Network file I/O differences from local SAN I/O


One of the key differences of a NAS device, compared to direct attached storage or SAN storage, is that all I/O operations use file-level I/O protocols. The network access methods such as NFS and CIFS can only handle file I/O requests to the remote file system located in the operating system of the NAS device. This is because they have no knowledge of the characteristics of the remote storage device. I/O requests are transferred across the network, and it is the NAS OS file system that converts the request to block I/O and reads or writes the data to the NAS disk storage. Clearly, the network file I/O process involves many more steps than storage protocol (block) I/O, and it is this software stack overhead that is a factor when comparing the performance of NAS I/O to DAS or SAN-attached I/O. An example of network attached storage file I/O is shown in Figure 4-1.

Figure 4-1 Tracing the path of a network file I/O operation

It is important to note that a database application accessing a remote file located on a NAS device, by default, must be configured to run with file system I/O. As we can see from the previous diagram, it cannot use raw I/O to achieve improved performance (that is only possible with locally attached storage).


4.1.3 Network file system protocols


Network File System (NFS) is a network-based file protocol that is typically used by UNIX and Linux operating systems. NFS is designed to be machine-independent, operating system independent, and transport protocol independent. Common Internet File System (CIFS) is a network-based file protocol that was designed by Microsoft to work on Windows workstations. In the next section, we provide a high-level comparison of NFS and CIFS.

Making file systems available to clients


NFS servers make their file systems available to other systems in the network by exporting directories and files over the network. An NFS client mounts a remote file system from the exported directory location. NFS controls access by giving client-system level user authorization. The assumption is that a user who is authorized to the system must be trustworthy. Although this type of security is adequate for many environments, it can be abused by knowledgeable users who can access a UNIX system through the network. On the other hand, CIFS systems create file shares which are accessible by authorized users. CIFS authorizes users at the server level, and can use Windows domain controllers (Windows Active Directory is a common example) for this purpose. CIFS security can be generally considered to be stronger than NFS in this regard.

Stateless versus stateful


NFS is a stateless service. In other words, NFS is not aware of the activities of its clients. Any failure in the link will be transparent to both client and server. When the session is re-established, the two can immediately continue to work together again. CIFS is session-oriented and stateful. This means that both client and server share a history of what is happening during a session, and they are aware of the activities occurring. If there is a problem and the session has to be re-initiated, a new authentication process has to be completed.

Security
For directory and file level security, NFS uses UNIX concepts of User, Groups (sets of users sharing a common ID), and Other (meaning no associated ID). For every NFS request, these IDs are checked against the UNIX file systems security. However, even if the IDs do not match, a user can still have access to the files. CIFS, however, uses access control lists that are associated with the shares, directories, and files, and authentication is required for access.

Locking
The locking mechanism principles vary. When a file is in use, NFS provides advisory lock information to subsequent access requests. This informs subsequent applications that the file is in use by another application, and for what purpose it is being used. The later applications can decide whether they want to abide by the lock request or not. So UNIX or Linux applications can access any file at any time. The system relies on good neighbor responsibility, and proper system administration is clearly essential. CIFS, on the other hand, effectively locks the file in use. During a CIFS session, the lock manager has historical information concerning which client has opened the file, for what purpose, and in which sequence. The first access must complete before a second application can access the file.

4.1.4 Domain Name Server


At the machine level, IP network connections use numeric IP addresses. However, because these addresses are obviously much harder to remember (and manage) than just the name of a system, modern networking uses symbolic host names. For example, instead of typing:
http://10.12.7.14
You can type:
http://www.ibm.com
In this case, the network Domain Name Servers (DNS) handle the mappings between the symbolic name (www.ibm.com) and the actual IP address. The DNS takes www.ibm.com and translates it to the IP address 10.12.7.14. By using DNS, a powerful variety of network balancing and management functions can be achieved.

4.1.5 Authentication
SONAS supports the following authentication methods:
- Microsoft Active Directory
- LDAP (Lightweight Directory Access Protocol)
- NIS (Network Information Service)
- Samba PDC / NT4 mode
At the current release level, a single SONAS system can support only one of these authentication methods at a time. In order to access a SONAS system, the user must be authenticated using the authentication method that is implemented on a particular SONAS machine.

4.2 Domain Name Server as used by SONAS


SONAS uses the Domain Name Server (DNS) function to perform round-robin IP address balancing for spreading workload equitably, on an IP address basis, across the SONAS interface nodes. As shown in Figure 4-2, when a user requests SONAS.virtual.com, the Domain Name Server (DNS) must have been previously defined with a list of IP addresses that DNS is to balance across the SONAS interface nodes. The user request for SONAS.virtual.com is translated by DNS to a physical IP address, and DNS then allocates that user to the SONAS interface node associated with that IP address. Subsequent requests from other users for SONAS.virtual.com are allocated equitably (on an IP address basis) to other SONAS interface nodes. Each interface node can handle multiple clients on the same IP address; the unique pairing of the client IP address and the SONAS interface node IP address determines the connection. See Figure 4-2.


Figure 4-2 SONAS interface node workload allocation (clients I through n request SONAS.virtual.com, the DNS server resolves the name in round-robin fashion to the IP addresses 10.0.0.10 through 10.0.0.15, and each client is directed to the corresponding SONAS interface node)

As shown in Figure 4-2, in SONAS each network client is allocated to one and only one interface node, in order to minimize cluster overhead. SONAS Software does not rotate a single client's workload across interface nodes. That is not only unsupported by DNS and CIFS, but would also decrease performance, because caching and read-ahead are done per SONAS interface node. At the same time, workload from multiple users, numbering into the thousands or more, is equitably spread across as many SONAS interface nodes as are available. If more user network capacity is required, you simply add more interface nodes. The SONAS scale out architecture provides linear scalability as the number of users grows. SONAS requires an external server that runs an instance of the Domain Name Server (DNS). Based on DNS round robin, each incoming request for a file is directed to the next available public IP interface on an available interface node. SONAS serves multiple IP addresses, and a client gets one of these IP addresses in a round-robin manner. If one of the interface nodes goes down, another interface node starts serving the same IP address.
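As a sketch, the round-robin definition behind Figure 4-2 can be expressed as a set of identical host names with different A records in the external DNS server. The fragment below uses BIND-style zone file syntax; the exact format depends on the DNS product you use.

   ; One A record per interface node IP address; the DNS server
   ; rotates through them on successive lookups (BIND-style example)
   sonas.virtual.com.   IN A 10.0.0.10
   sonas.virtual.com.   IN A 10.0.0.11
   sonas.virtual.com.   IN A 10.0.0.12
   sonas.virtual.com.   IN A 10.0.0.13
   sonas.virtual.com.   IN A 10.0.0.14
   sonas.virtual.com.   IN A 10.0.0.15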

4.2.1 Domain Name Server configuration best practices


It is best to use multiple public IP addresses per interface node, for better load balancing in the event of an interface node outage. DNS round robin provides the IP address load balancing, while the workload failover is performed internally by SONAS Software. Using the host name in SONAS is best, but you can use IP addresses instead of the DNS host name; in this case, clients will be bound to the provided static IP address. SONAS offers a global namespace, so SONAS clients use one host name, which is spread across independent interface nodes and logical storage pools. One client is connected to one interface node until a reboot or other interruption; there is no IP address caching mechanism for clients at the DNS level. Connecting a client simultaneously to multiple interface nodes will decrease performance due to cache misses; moreover, this is not supported by DNS and CIFS. When you expand your SONAS system and add new interface nodes, add the new IP addresses to the DNS server and the load will be distributed across the newly configured nodes.

4.2.2 Domain Name Server balances incoming workload


In Figure 4-3, you can see the steps that are performed to balance an incoming SONAS client request.

Figure 4-3 DNS load balancing in SONAS

The external DNS server contains duplicate address records (A records) with various IP addresses. These IP addresses are configured on the interface nodes. The name server rotates addresses for a name that has multiple A records. The diagram in Figure 4-3 lists these DNS load balancing steps:
1. The first SONAS client sends a request to the external DNS server for the sonas.pl.ibm.com IP address.
2. The DNS server rotates the addresses for the name and provides the first IP address available at this moment to the client: 192.168.0.11.
3. Client 1 connects to its data through interface node 1.
4. SONAS client 2 sends a request to the external DNS server for the sonas.pl.ibm.com IP address.
5. The DNS server rotates the addresses for the name and provides the next available IP address to the client: 192.168.0.12.
6. The second client connects to its data through interface node 2.

4.2.3 Interface node failover / failback


Interface nodes can be dynamically removed and re-inserted into a cluster. The method for upgrade or repair of an interface node is to take the interface node out of the cluster. The remaining interface nodes assume the workload. The interface node can then be upgraded or repaired and re-inserted into the cluster, and the workload is then automatically rebalanced across the interface nodes in the SONAS. When an interface node is removed from the cluster, or if there is an interface node failure, healthy interface nodes take over the load of the failed node. In this case:
- The SONAS Software Cluster Manager automatically terminates old network connections and moves the network connections to a healthy interface node.
- The IP addresses are automatically re-allocated to a healthy interface node. Session and state information that was kept in the Cluster Manager is used to support re-establishment of the session and maintaining IP addresses, ports, and so on. This state and session information and metadata for each user and connection is stored in memory in each node in a high performance clustered design, along with appropriate shared locking and any byte-range locking requests, as well as other information needed to maintain cross-platform coherency between CIFS, NFS, FTP, and HTTP users.
- Notification technologies are used to tickle the application and cause a reset of the network connection.
This process is shown in Figure 4-4.

Figure 4-4 SONAS interface node failover (the DNS server still resolves SONAS.virtual.com to the full set of IP addresses 10.0.0.10 through 10.0.0.15; the IP addresses of the failed interface node are taken over by the remaining interface nodes)


At the time of the failover of the node, if the session or application is not actively in a connection transferring data, the failover can usually be transparent to the client. If the client is transferring data, then depending on the protocol and application, the application service failover might be transparent to the client, depending on the nature of the application and on what is occurring at the time of the failover. In particular, if the client application, in response to the SONAS failover and SONAS notifications, automatically retries the network connection, then it is possible that the user will not see an interruption of service. Examples of software that does this include many NFS-based applications, as well as Windows applications that retry the network connection, such as the Windows XCOPY utility. If the application does not do automatic network connection retries, or the protocol in question is stateful (that is, CIFS), then a client side reconnection might be necessary to re-establish the session. Unfortunately, for most CIFS connections this will likely be the case. In case of failure of an interface node, all IP addresses configured on the node are taken over and balanced by the remaining interface nodes. IP balancing is done by a round robin algorithm, which means that SONAS does not check which node is more loaded in terms of cache or bandwidth. This is illustrated in Figure 4-5: the IP addresses configured on interface node 2 are moved by SONAS to interface nodes 1 and 3. From the SONAS client point of view, the host name and IP address are still the same. The failure of the node is almost transparent to the client, which now accesses data through interface node 3, as indicated by Step 6.

Figure 4-5 Interface node failure - failover and load balancing


DNS host names: NFS consists of multiple separate services, protocols, and daemons that need to share metadata among each other. If, due to a client crash and reboot, the client is redirected to another interface node, there is a remote possibility that the locks might be lost from the client but still be present on the previous interface node, creating connection problems. Therefore, the use of DNS host names for mounting NFS shares is not supported. In order to balance the load on SONAS, it is best to mount shares using various IP addresses. This is an NFS limitation; CIFS, for example, uses only a single session, so DNS host names can be used.
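For example, two NFS clients can be pointed at different interface node IP addresses while mounting the same export. The export path /ibm/gpfs0 and the mount point are assumptions for illustration only.

   # Client 1 mounts the export through the IP address of one interface node
   mount -t nfs 10.0.0.10:/ibm/gpfs0 /mnt/sonas

   # Client 2 mounts the same export through a different interface node IP address
   mount -t nfs 10.0.0.11:/ibm/gpfs0 /mnt/sonas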

4.3 Bonding
Bonding is a method in which multiple network interfaces are combined to function as one logical bonded interface for redundancy or increased throughput. SONAS network ports can be bonded into two configurations using standard IBM SONAS bonding tools. Before creating a bond interface, you have to be sure that no network is assigned to the slaves and that there is no active IP address on any of the slaves. When network interfaces are bonded, a new logical interface is created, which consists of slave physical interfaces. The bonded devices can be monitored through the IBM SONAS GUI Topology pages. The MAC address of the bonding device is taken from the first added slave device; the MAC address is then passed to all following slaves and remains persistent until the bonding logical device is brought down or deconfigured. The bonding interface has a hardware address of 00:00:00:00:00:00 until the first slave is added.

4.3.1 Bonding modes


Currently SONAS supports the following two bonding modes; neither of them requires any specific configuration of your switch.

Mode 1 - active backup configuration: Only one slave in the bond configuration is active at a time. The other slaves remain inactive until the active, primary slave fails. To avoid switch confusion, the MAC address is externally visible only on one port. This mode provides fault tolerance. Currently, the 10 Gbit Converged Network Adapters (CNAs) in interface nodes for external data connectivity are configured to handle IP over InfiniBand in this mode. Moreover, all internal management Network Interface Cards (NICs) and internal data InfiniBand Host Channel Adapters (HCAs) are configured in SONAS by default in this active backup configuration. This means that all internal SONAS networks share a single IP address and work in a hot standby configuration.

Mode 6 - adaptive load balancing: The outgoing traffic is redistributed between all slaves working in the bond configuration according to the current load on each slave. The receive load balancing is achieved through ARP negotiation. The receive load is redistributed using a round robin algorithm among the group of slaves in the bond. Effectively, this configuration combines bandwidth into a single connection, so it provides fault tolerance and load balancing. Currently, by default, the 1 Gb Network Interface Cards (NICs) in interface nodes for external data connectivity are configured in this mode.

Important: The current SONAS version does not support bonding network interfaces in management nodes for external administrator connectivity. This means that in case of a link failure, the administrative IP address will not be moved automatically to a backup interface. In this case, SONAS will be serving data for clients, but the SONAS administrator will not be able to reconfigure the SONAS system through the GUI and CLI.
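For readers familiar with Linux bonding terminology, these two SONAS modes correspond to the standard Linux bonding modes active-backup (mode 1) and balance-alb (mode 6). The following fragment is a generic Linux illustration of the two option strings; it is not a SONAS configuration file, and SONAS sets these options through its own tools.

   # Mode 1 - active backup: one active slave, the others on hot standby
   BONDING_OPTS="mode=active-backup miimon=100"

   # Mode 6 - adaptive load balancing: traffic spread across all slaves via ARP negotiation
   BONDING_OPTS="mode=balance-alb miimon=100"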


4.3.2 Monitoring bonded ports


SONAS uses a tool for monitoring bonded ports. The tool periodically checks the carrier state of each slave, and in case of failure SONAS marks the device as down and takes appropriate actions. The tool monitors links from the devices to the nearest connected switch, so it is important to understand that this tool cannot detect a network failure if it occurs beyond the nearest switch, or if a switch is refusing to pass traffic while still maintaining carrier on. This issue is especially important for external data networks. Internally, SONAS always uses a two switch configuration, so in case of a failure of a single internal link or a switch, SONAS is still up and running. When you are planning external data connectivity, it is best to provide a multi-switch configuration to SONAS.

4.4 Network groups


A network group is a set of SONAS interface nodes that use the same network configuration. You can separate traffic between SONAS and external clients by using network groups. To do that, create a new global name space in DNS, with its own IP address ranges, that contains only the interface nodes belonging to the network group. You can use separate physical adapters or separate VLAN IDs. The network group concept is shown in Figure 4-6.

(The figure shows SONAS Client 1 asking the DNS server for sonas1.pl.ibm.com and receiving 10.0.0.3. The name sonas1.pl.ibm.com has round-robin A records 10.0.0.3, 10.0.0.4, 10.0.0.5, and 10.0.0.6, which belong to network group 1 (interface node1 with 10.0.0.3 and 10.0.0.4, interface node2 with 10.0.0.5 and 10.0.0.6). A second name, sonas2.pl.ibm.com, has A records 10.0.0.1 and 10.0.0.2 on interface node3, which forms network group 2.)

Figure 4-6 Network group concept in SONAS


In case of failure of an interface node in the network group, IP addresses will be taken over only by the remaining interface nodes in this network group. This is shown in Figure 4-7.

(The figure repeats the setup of Figure 4-6, but with interface node1 failed: its addresses 10.0.0.3 and 10.0.0.4 are taken over by interface node2, the only remaining node in network group 1, so the client query for sonas1.pl.ibm.com still resolves to 10.0.0.3. Network group 2, with interface node3 serving 10.0.0.1 and 10.0.0.2 for sonas2.pl.ibm.com, is unaffected.)

Figure 4-7 Failure of a node in a network group

This concept can be useful to separate traffic between production and test environments, or between two applications. It is important to understand that you can separate only network traffic; you cannot separate internal data traffic. All interface nodes have access to all exports, and file system data is accessible by the interface nodes through all storage pods. To limit data placement, you can use policies as described in SONAS: Using the central policy engine and automatic tiered storage on page 107, but it might still be impossible to effectively separate traffic between two environments. You can limit or separate only the network traffic to and from the SONAS front end (the interface nodes); all data can be written to and read from all storage pods, according to the logical storage pool configuration and the policy engine rules that are in effect.

By default, a single group contains all interface nodes; that group is called the default network group. You can configure and add nodes to custom network groups only when these nodes are detached from the default network group; a node cannot be in both the default and a custom network group at the same time. The default network group cannot be removed, but it can be empty.
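Because the separation is driven by name resolution, a quick way to check which addresses the clients of each network group will receive is to query DNS directly. The following sketch uses the standard dig utility and the hypothetical names and addresses from Figure 4-6:

# Round-robin A records handed out for each network group (hypothetical names)
dig +short sonas1.pl.ibm.com     # expected: 10.0.0.3 10.0.0.4 10.0.0.5 10.0.0.6
dig +short sonas2.pl.ibm.com     # expected: 10.0.0.1 10.0.0.2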


4.5 Implementation networking considerations


In this section we discuss network considerations when implementing your SONAS system.

4.5.1 Network interface names


In the SONAS installation process, network interfaces are created with preconfigured names, and when you create a new bond configuration, SONAS creates new predefined names for the new interfaces. In SONAS you can find the following interfaces:
ethX0...ethXn - bonded interfaces for the public network
ethXsl0...ethXsln - slave interfaces for the public network
ethXn.vlanid - VLAN interface created on top of an interface
eth0...ethn - interfaces without a bond configuration
mgmt0...mgmtn - bonded interfaces for the management network
mgmtsl0...mgmtsln - slave interfaces for the management network
data0...datan - bonded interfaces of the InfiniBand network
ib0...ibn - slave interfaces of the InfiniBand network

4.5.2 Virtual Local Area Networks


In SONAS it is possible to configure Virtual Local Area Networks (VLANs). Packet tagging is supported at the protocol level, which means that VLAN interfaces must be added on top of a bonding or a physical interface:
VLAN trunking is supported. Multiple networks, including VLAN IDs, can be defined in SONAS, and networks can be assigned to physical ports in an n:n relationship.
Overlapping VLANs are supported. Multiple VLANs can be assigned to a single adapter.
A VLAN ID must be in the range from 1 to 4095.
You can create many VLAN IDs and logically separate your traffic to and from the SONAS cluster. The VLAN concept can be useful with network group configurations: with VLANs, you can separate traffic between SONAS network groups and external clients. SONAS CLI commands can define various aggregates such as networks, port groups, and VLANs; these can be changed with single commands and mapped to each other. Ensure that all IP addresses that belong to a VLAN ID can communicate with your external clients. In addition, VLAN tagging can be used for backing up your SONAS system.
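As an aside, the tagged-interface model that SONAS uses is the same one found on any Linux host. The following iproute2 commands are not SONAS commands (on SONAS, VLANs are defined through the management CLI and GUI); they are only a generic sketch of a VLAN interface created on top of a hypothetical bond ethX0, using a documentation IP range:

# Generic Linux illustration only: VLAN ID 100 tagged on top of bond ethX0
ip link add link ethX0 name ethX0.100 type vlan id 100
ip addr add 192.0.2.21/24 dev ethX0.100
ip link set ethX0.100 up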

4.5.3 IP address ranges for internal connectivity


SONAS is preconfigured to use the following IP address ranges for internal connectivity:
172.31.*.*
192.168.*.*
10.254.*.*
You can choose the range during SONAS installation. The range you select must not conflict with the IP addresses used for the customer Ethernet connections to the management nodes and interface nodes (see Planning IP addresses on page 267).


4.5.4 Use of Network Address Translation


When a node becomes unhealthy, all public IP addresses are withdrawn by SONAS and thus all routes disappear. This might lead to a situation where the node cannot become healthy again, if a required service (such as winbind) needs a route to an external server (for example, Active Directory). One solution to this issue is to have static public addresses assigned to each node, so that a node is always able to route traffic to the external network. This is the simplest solution, but it uses up a large number of additional IP addresses. A more sophisticated solution, and the one used in SONAS, is to use Network Address Translation (NAT). In this mode, only one additional external IP address is needed. One of the nodes in the SONAS cluster is elected to host this IP address, so it can reach the external services; this node is called the NAT gateway. All other nodes, if their external IP address is not accessible, route data through the NAT gateway node to external networks for authentication and authorization purposes.

In this way, NAT is used in SONAS to remove the need for external access from the internal private SONAS network. NAT (a technique used with network routers) allows a single external IP address to be mapped to one or more private network IP address ranges. In SONAS, NAT allows a single customer IP address to be used to access the management node and interface nodes on their internal private network IP addresses. To the external network, SONAS is defined with IP addresses on the external customer network. These addresses are mapped to the internal private network addresses, and through the network address translation, authorized external users can then gain access to the management nodes and interface nodes in the internal network. A network router is configured to translate the external IP address and port on the customer network to a corresponding IP address and port on the internal SONAS private network. Only one IP address on the customer network is used for the whole SONAS cluster, while the port specifies which of the servers is accessed through that single IP address. The SONAS management node and interface nodes are assigned their private IP addresses on the internal SONAS network during SONAS installation.

Note that this external IP address is not a data path connection; it is not used to transfer data to and from the interface nodes. Rather, it provides a path from the management node and interface nodes to the customer network for authentication and authorization purposes. Even if a node has its data path ports disabled (for example, an interface node with a hardware problem can have its data path ports disabled under the control of the software), the node can still access the Active Directory or LDAP server for authentication and authorization. This mechanism assures that, in case of a hardware problem or an administrative mistake, a required service (such as winbind) that needs a route to an external server (for example, Active Directory) still has one. This provides additional assurance that SONAS keeps working in a healthy state and data clients are not affected.

4.5.5 Management node as NTP server


In SONAS, all nodes are configured by default to use the management node as the Network Time Protocol (NTP) server; this assures that all SONAS nodes operate on a common time. You can assign an external NTP server on the management node, and that configuration is propagated to the whole SONAS cluster.

4.5.6 Maximum transmission unit


A default of 1500 bytes is used for the maximum transmission unit (MTU). MTU sizes can be configured, and jumbo frames are supported with the 10 GbE ports (users can configure MTUs up to 9000). In the case of VLAN interfaces, 4 bytes are subtracted from the MTU of the corresponding regular interface to compensate for the additional 4-byte 802.1Q VLAN tag.
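The 4-byte relationship can be illustrated with generic Linux commands (again, not the SONAS configuration path; the interface names are hypothetical and jumbo frames must be supported end to end on the network):

# Generic Linux illustration only: jumbo frames on a bond and its VLAN interface
ip link set dev ethX0 mtu 9000
ip link set dev ethX0.100 mtu 8996   # 9000 minus the 4-byte 802.1Q tag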

4.5.7 Considerations and restrictions


Note that SONAS does not yet support the following networking functions; this is not an all-inclusive list. These are known requirements for SONAS:
IP v6
NFS v4
LDAP signing

4.6 The impact of network latency on throughput


Network latency can impact throughput negatively, and the impact grows with the bandwidth of the network link, so adverse latency effects will be felt more with 10 GigE links than with 1 GigE links. The effect discussed here applies to a single client sending requests to a file server such as the SONAS server. Figure 4-8 illustrates a typical IO request from an application client to the file server. We have the following times:
t_lat - the time spent in network latency getting the request from the application client to the file server
t1 - the time to transfer the request over the network
t2 - the latency inside the file server (0 in our example)
t3 - the time to transfer the response back over the network

Figure 4-8 Schematic of an IO request with latency

So the total time taken for an IO request is given by the sum of t_lat, t1, t2, and t3; we call this sum t_sum. Figure 4-9 shows the time it takes to transfer requests and responses over the network links. For example, a 61440 byte response requires 0.457764 msec over a 1 GigE link that can transfer 134217728 bytes/second, and 10 times less, or 0.045776 msec, on a 10 GigE link.

IO type       Size (bytes)   1 GigE ms/req   10 GigE ms/req
(link bytes/sec)             134217728       1342177280
t1 request    117            0.000872        0.000087
t3 response   61440          0.457764        0.045776

Figure 4-9 Request time on network link


The faster the request transfer time over the link, the more requests (requests/sec or IO/sec) you can get over the link per unit of time, and consequently the greater the amount of data that can be transferred over the link per unit of time (MB/sec). Now introduce network latency into the equation: each IO is delayed by a given number of latency milliseconds, t_lat, and so each request from the application client has periods of data transfer, t1 and t3, and idle periods measured by t_lat. During the t_lat periods the network bandwidth is not used by the application client and so it is effectively wasted; the bandwidth really available to the application client is thus diminished by the sum of the idle periods.

The table shown in Figure 4-10 calculates how the reduction of effective bandwidth is correlated with increasing network latency, and how this differs between 1 GigE and 10 GigE links. The last four lines show a 10 GigE link, with latency (t_lat) increasing from 0 to 0.001, 0.01, and 0.1 msec; t1 and t3 are the times spent on the network link, a function of the bandwidth in bytes/sec, and t2, the internal latency in the server, is assumed to be zero. The t_sum value is the sum of t_lat+t1+t2+t3, representing the request response time. So, for the 10 GigE case with 0.01 msec t_lat, we have a response time t_sum of 0.055864 msec, and so we can drive 17901 IO/sec. Each IO transfers a 117 byte request plus a 61440 byte response, or 61557 bytes in total, and at 17901 IO/sec we can drive a throughput of 61557 x 17901, or 1051 MB/sec (tot). Considering only the effective data transferred back in the responses, 61440 bytes per IO, we have 61440 x 17901, or 1049 MB/sec.

Network link  t_lat ms  t1 ms     t2 ms  t3 ms     t_sum     IO/sec  MB/sec (tot)  MB/sec (resp)
1g            0         0.000872  0      0.457764  0.458635  2180    128           128
1g            0.001     0.000872  0      0.457764  0.459635  2176    128           127
1g            0.01      0.000872  0      0.457764  0.468635  2134    125           125
1g            0.1       0.000872  0      0.457764  0.558635  1790    105           105
10g           0         0.000087  0      0.045776  0.045864  21804   1280          1278
10g           0.001     0.000087  0      0.045776  0.046864  21339   1253          1250
10g           0.01      0.000087  0      0.045776  0.055864  17901   1051          1049
10g           0.1       0.000087  0      0.045776  0.145864  6856    402           402

Figure 4-10 Latency to throughput correlation

We can see that with a latency value of 0 on a 10 GigE link we can get a throughput of 1278 MB/sec, and adding in a network latency of 0.1 msec we get a throughput of 402 MB/sec, which represents a 69% reduction in effective bandwidth. This reduction might appear surprising given the theoretical bandwidth available.
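The arithmetic behind Figure 4-10 is simple enough to reproduce yourself. The following minimal sketch (a shell snippet using awk, with the same 117 byte request, 61440 byte response, and the 10 GigE byte rate from Figure 4-9) recalculates the 0.01 msec latency row:

# Reproduces one Figure 4-10 row: 10 GigE link, 0.01 msec latency, t2 assumed 0
awk 'BEGIN {
  link = 1342177280; req = 117; resp = 61440; t_lat = 0.01   # bytes/s, bytes, bytes, ms
  t1 = req  / link * 1000                                    # request transfer time, ms
  t3 = resp / link * 1000                                    # response transfer time, ms
  t_sum = t_lat + t1 + t3
  iops  = 1000 / t_sum
  printf "t_sum=%.6f ms  IO/sec=%.0f  MB/sec(tot)=%.0f  MB/sec(resp)=%.0f\n",
         t_sum, iops, (req + resp) * iops / 1048576, resp * iops / 1048576
}'
# Prints approximately: t_sum=0.055864 ms  IO/sec=17901  MB/sec(tot)=1051  MB/sec(resp)=1049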


The charts in Figure 4-11 show how bandwidth decreases for a single client accessing a server as latency increases. The first chart shows that the drop is much greater at higher bandwidth values: the 10 G MB/s line drops much more sharply as latency increases than does the 1 G MB/s line. This means that the adverse effect of latency is more pronounced the greater the link bandwidth. The second chart shows the effect of latency on a workload with a smaller blocksize or request size: 30720 bytes instead of 61440 bytes. The chart shows that at 0.1 msec latency, the throughput drops to just over 200 MB/sec with a 30720 byte response size, instead of the 400 MB/sec that we get with the same latency of 0.1 msec but a response size of 61440 bytes.
(The two charts plot throughput in MB/sec against latency in msec, from 0 to 0.1 msec, for 1 GigE and 10 GigE links. The first chart is titled "Effect of network latency on throughput: 117 byte requests and 61440 byte responses"; the second is titled "Effect of network latency on throughput: 117 byte requests and 30720 byte responses".)

Figure 4-11 Effect of latency on network throughput

To summarize, evaluate your network latency to understand the effect that it can have on the expected throughput for single client applications. Latency has a greater impact with larger network bandwidth links and smaller request sizes. These adverse effects can be offset by having multiple clients access the server in parallel so that they can take advantage of the unused bandwidth.



Chapter 5.

SONAS policies
In this chapter we provide information about how you can create and use SONAS policies. We discuss the following topics:
Creating and managing policies
Policy command line syntax
Policy rules and best practices
Sample policy creation walkthrough

Copyright IBM Corp. 2010. All rights reserved.

159

5.1 Creating and managing policies


In this section we discuss what policies and rules consist of, including examples of policies and rules, and we discuss the SONAS commands that manage policies and rules. We illustrate how to create a storage pool and extend a filesystem to use the storage pool. We then show how to create and apply data allocation policies.

5.1.1 File policy types


File placement policies for a filesystem are set using the setpolicy command and are evaluated when a file is created. If no file placement rule is in place, GPFS stores data in the system pool, also called pool1. File management policies are used to control the space utilization of online storage pools. They can be tied to file attributes such as age and size, and also to pool utilization thresholds. The file management rules are evaluated periodically when the runpolicy command is executed or when a task scheduled with the mkpolicytask command is executed.

5.1.2 Rule overview


A policy consists of a list of one or more policy rules. Each policy rule, or rule for short, is an SQL-like statement that instructs SONAS GPFS what to do with a file in a specific storage pool if the file meets specific criteria. A rule can apply to a single file, a fileset, or a whole filesystem. A rule specifies conditions that, when true, cause the action stated in the rule to be applied. A sample file placement rule statement to put all text files in pool2 looks like this:
RULE 'textfiles' SET POOL 'pool2' WHERE UPPER(name) LIKE '%.TXT'
A rule can specify many different types of conditions, for example (see the sketch after this list):
File creation, access, or modification date and time
Date and time when the rule is evaluated
Fileset name
File name and extension
File size and attributes such as user and group IDs
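As an illustration of how several of these conditions can be combined in a single rule, the following hypothetical migration rule (the rule, pool, and fileset names are made up for this example and are not taken from a SONAS template) selects large .log files in a given fileset that have not been accessed for a month:

RULE 'oldlogs' MIGRATE FROM POOL 'system' TO POOL 'silver'
     WHERE FILESET_NAME = 'logs'
       AND UPPER(NAME) LIKE '%.LOG'
       AND KB_ALLOCATED > 1024
       AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30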

5.1.3 Rule types


SONAS supports eight kinds of rules:
File placement rule - Controls the allocation pool of new files
File migration rule - Controls file movement between pools
File deletion rule - Controls file deletion
File exclusion rule - Excludes files from placement in a pool
File list rule - Generates a list of files that match a criteria
File restore rule - Controls where to restore files
External list definition rule - Creates a list of files
External storage pool definition rule - Creates a list of files for the Tivoli Storage Manager server


Rules must adhere to a specific syntax as documented in the Managing Policies chapter of the IBM Scale Out Network Attached Storage Administrators Guide, GA32-0713. This syntax is similar to the SQL language because it contains statements such as WHEN (TimeBooleanExpression) and WHERE SqlExpression. Rules also contain SQL expression clauses that allow you to reference various file attributes as SQL variables and combine them with SQL functions and operators. Depending on the clause, an SQL expression must evaluate to either true or false, a numeric value, or a character string. Not all file attributes are available to all rules.

5.1.4 SCAN engine


Pool selection and error checking for file-placement policies are performed in the following phases:
When you install a new policy, the basic syntax of all the rules in the policy is checked, and all references to storage pools are checked. If a rule in the policy refers to a storage pool that does not exist, the policy is not installed and an error is returned.
When a new file is created, the rules in the active policy are evaluated in order. If an error is detected, an error is written to the log, all subsequent rules are skipped, and an EINVAL error code is returned to the application. Otherwise, the first applicable rule is used to store the file data.
File management policies are executed and evaluated by the runpolicy command. A sample file management template policy is shown in Example 5-1. This policy migrates files from the silver storage pool to the HSM storage pool if the silver pool is more than 80% full, and stops at 70% pool utilization. It excludes migrated files and predefined excludes from the migration and performs the migration in the order established by the weight expression.
Example 5-1 SONAS HSM template policy
[root@plasma]# lspolicy -P TEMPLATE-HSM
Policy Name   Declaration Name   Default  Declarations
TEMPLATE-HSM  stub_size          N        define(stub_size,0)
TEMPLATE-HSM  is_premigrated     N        define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED > stub_size))
TEMPLATE-HSM  is_migrated        N        define(is_migrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED == stub_size))
TEMPLATE-HSM  access_age         N        define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
TEMPLATE-HSM  mb_allocated       N        define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
TEMPLATE-HSM  exclude_list       N        define(exclude_list,(PATH_NAME LIKE '%/.SpaceMan/%' OR NAME LIKE '%dsmerror.log%' OR PATH_NAME LIKE '%/.ctdb/%'))
TEMPLATE-HSM  weight_expression  N        define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END))
TEMPLATE-HSM  hsmexternalpool    N        RULE 'hsmexternalpool' EXTERNAL POOL 'hsm' EXEC 'HSMEXEC'
TEMPLATE-HSM  hsmcandidatesList  N        RULE 'hsmcandidatesList' EXTERNAL LIST 'candidatesList' EXEC 'HSMLIST'
TEMPLATE-HSM  systemtotape       N        RULE 'systemtotape' MIGRATE FROM POOL 'silver' THRESHOLD(80,70) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)


A file can be a potential candidate for only one migration or deletion operation during one runpolicy run; only one action will be performed. The SONAS runpolicy command uses the SONAS scan engine to determine the files on which to apply specific actions. The SONAS scan engine is based on the GPFS mmapplypolicy command in the background, and mmapplypolicy runs in three phases.

mmapplypolicy command phases: Phase one


Phase one selects candidate files. All files in the selected filesystem device are scanned and all policy rules are evaluated in order for each file. Files are either excluded or made candidates for migration or deletion, and each candidate file is assigned a weight or priority. Thresholds are also determined and all the candidate files are sent as input to the next phase.

mmapplypolicy command phases: Phase two


Phase two chooses and schedules files. It takes the output of phase one and orders it so that candidates with higher weights are chosen before those with lower weights. Files are grouped into batches for processing, generally according to weight, and the process is repeated until the threshold objectives are met or until the file list is finished. Generally, files are no longer chosen in this phase after the occupancy level of the source pool falls below the low threshold, or when the occupancy of the target pool is above the limit or 99% of total capacity.

mmapplypolicy command phases: Phase three


Phase three performs the actual file migration and deletion: the candidate files that were chosen and scheduled by the second phase are migrated or deleted, each according to its applicable rule. For migrations, if the applicable rule had a REPLICATE clause, the replication factors are also adjusted accordingly. It is also possible for the source and destination pools to be the same, because a migration rule can be used to adjust the replication factors of files without necessarily moving them from one pool to another.

5.1.5 Threshold implementation


For SONAS R1.1.1, the threshold implementation is a single policy per filesystem. This means that a single threshold is managed at a time. For example, in a Pool1 -> Pool2 -> Pool3 setup, data can transfer from Pool1 to Pool2, causing Pool2 to exceed its threshold. In that case, Pool2 will not start transferring data to Pool3 until Pool1 has finished its transfer to Pool2. Sufficient headroom must be provided in secondary and lower tiers of storage to deal with this delayed threshold management.
Policies: The policy rules, examples, and tips section in the IBM Scale Out Network Attached Storage Administrator's Guide, GA32-0713 contains good advice and tips on how to get started in writing policies and rules.

5.2 SONAS CLI policy commands


The SONAS CLI has multiple commands to create and manage policies. Policies are created using the mkpolicy and mkpolicyrule commands.


Figure 5-1 shows the CLI policy commands and their interaction.

(The figure shows the policy commands and where they act. Against the SONAS database: mkpolicy creates a policy, chpolicy changes a policy, and rmpolicy removes a policy; policies such as policy1 and policy7 hold their rules. lspolicy lists all policies defined in the database, the details of a specific policy, or the policies applied to all filesystems. Against the SONAS cluster: setpolicy applies a policy to a filesystem for new files, runpolicy executes a policy on a filesystem for existing files, and mkpolicytask and rmpolicytask schedule and remove a scheduled (cron) run of the default applied filesystem policy.)

Figure 5-1 CLI policy commands and their interaction

5.2.1 mkpolicy command


The mkpolicy command creates a new policy template with a name and a list of one or more rules. The policy and rules are stored in the SONAS management database; validation of the rules is not performed at this time. The command is invoked as follows:
mkpolicy policyName [-CP <policyName> | -R <rules>] [-D]
The policy has a name and a set of rules specified with the -R switch. The -D switch sets the policy as the default policy for a filesystem. Optionally, a policy can be created by copying an existing policy or a predefined policy template with the mkpolicy command and the -CP oldpolicy option. The policy is later applied to a SONAS filesystem.
The rules for a policy must be entered as a single string and separated by semicolons, and there must be no leading or trailing blanks surrounding the semicolons. This can be accomplished in one of two ways. The first method is to enter the rule as a single long string:
mkpolicy ilmtest -R "RULE 'gtktosilver' SET POOL 'silver' WHERE NAME LIKE '%gtk%';RULE 'ftktosystem' SET POOL 'system' WHERE NAME LIKE '%ftk%';RULE 'default' SET POOL 'system'"
The second method uses the Linux line continuation character (backslash) to enter rules:
mkpolicy ilmtest -R "\
> RULE 'gtktosilver' SET POOL 'silver' WHERE NAME LIKE '%gtk%';\
> RULE 'ftktosystem' SET POOL 'system' WHERE NAME LIKE '%ftk%';\
> RULE 'default' SET POOL 'system'"
Here we show sample uses of the mkpolicy command. Create a policy with the name test with two rules assigned:
mkpolicy test -R "set pool 'system';DELETE WHERE NAME LIKE '%temp%'"

Create a policy with the name test_copy as a copy of the existing policy test:
mkpolicy test_copy -CP test
Create a policy with the name default with two rules assigned and mark it as the default policy:
mkpolicy default -R "set pool 'system';DELETE WHERE NAME LIKE '%temp%'" -D

5.2.2 Changing policies using chpolicy command


The chpolicy command modifies an existing policy by adding, appending, or deleting rules. The rmpolicy command removes a policy from the SONAS database, but it does not remove a policy from a filesystem. The chkpolicy command allows you to check policy syntax and to test the policy as follows:
chkpolicy device [-c <cluster name or id>] -P <policyName> [-T]

Where <device> specifies the filesystem and <policyName> the policy contained in the database to be tested. Without the -T option, the policy will only be checked for correctness against the file system. Using the -T option will do a test run of the policy, outputting the result of applying the policy to the file system and showing which files will be migrated, as shown in Example 5-2.
Example 5-2 Checking policies for correctness
[root@plasma.mgmt001st001 ~]# chkpolicy gpfs0 -P HSM_external -T
...
WEIGHT(inf) MIGRATE /ibm/gpfs0/mike/fset2/sonaspb26/wv_4k/dir1/test184/f937.blt TO POOL hsm SHOW()
...
[I] GPFS Policy Decisions and File Choice Totals:
Chose to migrate 311667184KB: 558034 of 558039 candidates;
Chose to premigrate 0KB: 0 candidates;
Already co-managed 0KB: 5 candidates;
Chose to delete 0KB: 0 of 0 candidates;
Chose to list 0KB: 0 of 0 candidates;
0KB of chosen data is illplaced or illreplicated;
Predicted Data Pool Utilization in KB and %:
silver  4608      6694109184  0.000069%
system  46334172  6694109184  6.921624%
EFSSG1000I Command successfully completed.

5.2.3 Listing policies using the lspolicy command


Multiple named policies can be stored in the SONAS database. Policies can be listed with the lspolicy command. Using lspolicy without arguments returns the name of all the policies stored in the SONAS database. Specifying -P policyname lists all the rules in a policy and specifying lspolicy -A lists filesystems with applied policies. Example 5-3 shows examples of the list command:
Example 5-3 Listing policies
[root@plasma.mgmt001st001 ~]# lspolicy
Policy Name               Declarations (define/RULE)
TEMPLATE-HSM              stub_size,is_migrated,access_age,weight_expression,hsmexternalpool,hsmcandidatesList,systemtotape
TEMPLATE-ILM              stub_size,is_premigrated,is_migrated,access_age,mb_allocated,exclude_list,weight_expression
gtkilmhack                gtktosilver,ftktosystem,default
gtkpolicyhsm              stub_size,is_premigrated,is_migrated
gtkpolicyhsm_flushat2000  stub_size,is_premigrated,is_migrated,access_age


[root@plasma]# lspolicy -P TEMPLATE-HSM
Policy Name   Declaration Name   Default  Declarations
TEMPLATE-HSM  stub_size          N        define(stub_size,0)
TEMPLATE-HSM  is_premigrated     N        define(is_premigrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED > stub_size))
TEMPLATE-HSM  is_migrated        N        define(is_migrated,(MISC_ATTRIBUTES LIKE '%M%' AND KB_ALLOCATED == stub_size))
TEMPLATE-HSM  access_age         N        define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
TEMPLATE-HSM  mb_allocated       N        define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
TEMPLATE-HSM  exclude_list       N        define(exclude_list,(PATH_NAME LIKE '%/.SpaceMan/%' OR NAME LIKE '%dsmerror.log%' OR PATH_NAME LIKE '%/.ctdb/%'))
TEMPLATE-HSM  weight_expression  N        define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END))
TEMPLATE-HSM  hsmexternalpool    N        RULE 'hsmexternalpool' EXTERNAL POOL 'hsm' EXEC 'HSMEXEC'
TEMPLATE-HSM  hsmcandidatesList  N        RULE 'hsmcandidatesList' EXTERNAL LIST 'candidatesList' EXEC 'HSMLIST'
TEMPLATE-HSM  systemtotape       N        RULE 'systemtotape' MIGRATE FROM POOL 'silver' THRESHOLD(80,70) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)

[root@plasma.mgmt001st001 ~]# lspolicy -A
Cluster                        Device   Policy Set Name                 Policies                        Applied Time      Who applied it?
plasma.storage.tucson.ibm.com  testsas  gtkpolicyhsm_flushat_4_12_20hr  gtkpolicyhsm_flushat_4_12_20hr  4/26/10 11:17 PM  root

5.2.4 Applying policies using the setpolicy command


A named policy stored in the SONAS database can be applied to a filesystem using the setpolicy command. Policies set with the setpolicy command become the active policy for a filesystem. The active policy controls the allocation and placement of new files in the filesystem. The setpolicy -D command can also be used to remove an active policy for a filesystem.
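The following minimal usage sketch is modeled on the invocation shown later in Example 5-13; the policy and filesystem names are hypothetical, and the exact option order should be verified against the Administrator's Guide:

# Apply the database policy 'mypolicy' as the active policy for filesystem gpfs0
setpolicy -P "mypolicy" -d gpfs0
# Remove the active policy from gpfs0; new files then follow default placement
setpolicy -d gpfs0 -D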

5.2.5 Running policies using runpolicy command


The runpolicy command executes or runs a policy on a filesystem. Either the default policy, the one set on the filesystem using the setpolicy command, can be run by specifying the -D option, or another policy stored in the SONAS database can be run by specifying the -P option. The runpolicy command executes migration and deletion rules.
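A hedged usage sketch follows; the names are hypothetical, the -D and -P options are as described above, and the exact argument order should be checked against the Administrator's Guide:

# Run the policy currently set on the filesystem (migration and deletion rules)
runpolicy gpfs0 -D
# Run a specific policy stored in the SONAS database against the filesystem
runpolicy gpfs0 -P "mypolicy"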

5.2.6 Scheduling policies using the mkpolicytask command


The mkpolicytask command creates a SONAS cron job, a scheduled operation, which applies the currently applied policy on a filesystem at a specified time. The mkpolicytask command takes the filesystem as an argument. To remove scheduled policy tasks from a filesystem you can use the rmpolicytask command with filesystem as the argument.

5.3 SONAS policy best practices


We now introduce policy considerations and best practices to keep in mind when developing and coding SONAS policies.

5.3.1 Cron job considerations


Here we analyze a sample scenario, assuming that the filesystem has already been set up for ILM with tiered or peered pools, with or without HSM. The cron jobs here are used to move data between storage tiers. In the tiered case, Pool1 to Pool2 to Pool3 (Tivoli Storage Manager HSM), a cron job is typically used on either the Pool1 to Pool2 or the Pool2 to Pool3 migration. In the peered case, where we have Pool1 and Pool2 (peered) that then migrate to Pool3 (Tivoli Storage Manager HSM), a cron job is typically used on the Pool1 to Pool3 and Pool2 to Pool3 migrations.


A typical use case for a cron job is to transfer large amounts of data at known periods of low activity, so that the migration thresholds set in the filesystem policy are rarely activated. If the files being transferred are going to external Tivoli Storage Manager storage and will be accessed at a later time, the files can be premigrated by the cron job; otherwise they can be migrated by the cron job.

Now assume that the filesystem is a single pool called Pool1 and migrates data to an external Tivoli Storage Manager pool called Pool3. The thresholds for this pool are 80,75, so that if the filesystem is over 80% full, HSM migrates data until the pool is 75% full. Assume for discussion a usage pattern of heavy write activity from 8 AM to 12 PM, then heavy mixed activity (reads and writes) from 12 PM to 6 PM; then activity tapers off, and the system is essentially idle at 10 PM. With normal threshold processing, the 80% threshold is most likely to be hit between 8 AM and 12 PM when Pool1 is receiving the most new data. Hitting this threshold causes the filesystem to respond by migrating data to Pool3. The read activity associated with this migration competes with the current host activity, slowing down the host jobs and lengthening the host processing window. If the daily write activity is 10-20% of the size of the disk pool, migration will not be required during the host window if the pool started at 80%-20%=60% full. A 5% margin might be reasonable to ensure that the threshold is never hit in normal circumstances.

A reasonable cron job for this system is to have a migration policy set for 10 PM with a migration threshold of 60,55, so that if the filesystem is over 60% full, it migrates down to 55%. In addition, a cron job must be registered to trigger the policy at 10 PM. The cron job activates the policy that is currently active on the filesystem. The policy needs to include two migration clauses to implement these rules: a standard threshold migration rule using threshold 80,75:
RULE 'defaultmig' MIGRATE FROM POOL 'system' THRESHOLD (80,75) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated)
And a specific 10 PM migration rule using threshold 60,55:
RULE 'deepmig' MIGRATE FROM POOL 'system' THRESHOLD (60,55) WEIGHT(weight_expression) TO POOL 'hsm' WHERE NOT (exclude_list) AND NOT (is_migrated) AND HOUR(CURRENT_TIMESTAMP)=22

This scenario has an issue: the SONAS filesystem uses the lower of the two configured thresholds to trigger its lowDiskSpace event:
RULE 'defaultmig' MIGRATE FROM POOL 'system' THRESHOLD (80,75)
RULE 'deepmig' MIGRATE FROM POOL 'system' THRESHOLD (60,55)
In this case the SONAS filesystem triggers a policy scan at 60%, and this operation happens every 2 minutes, and generally it will not be 10 PM. The scan traverses all files in the filesystem and, as it is not 10 PM, it does not find any candidates, but it creates a lot of wasted metadata activity; the policy works, it just burns lots of CPU and disk IOPs.

How can this behavior be avoided? There are two solutions: either avoid the threshold in the cron job call, or use backup coupled with HSM. To avoid the threshold, consider your storage usage and determine a time period that accomplishes your goal without using a threshold, for example, with a rule that migrates all files that have not been accessed in the last 3 days, using a statement like this:
(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 2

This method has the advantage that it avoids threshold spin, but has the disadvantage that it cannot premigrate files. To work around the current cron job limitation that only allows you to run the active filesystem policy (the one that was put there with setpolicy), you can use an external scheduler to execute a SONAS command using ssh and do a runpolicy <mySpecificPolicy>, using a command similar to the following example:
ssh <SONASuser@mgmtnode.customerdomain.com> runpolicy <mySpecificPolicy>

5.3.2 Policy rules


We now illustrate the mechanics of SONAS policy rules, explain the rule syntax, and give rule coding best practices. For a detailed explanation of SONAS policy rules, see Chapter 2, Information Lifecycle Management for GPFS, in the GPFS Advanced Administration Guide Version 3 Release 3, SC23-5182. You can start creating policies from the SONAS supplied templates called TEMPLATE-ILM and TEMPLATE-HSM. You can list the HSM template using the following command:
lspolicy -P TEMPLATE-HSM
You will see a policy similar to the one shown in Figure 5-2. Make sure that you use the HSMEXEC and HSMLIST statements as coded in the templates, and ensure that you keep the file exclusion rules that were in the sample policy.
(The figure shows the TEMPLATE-HSM policy text annotated with editing guidance: keep the stub_size, is_premigrated, is_migrated, access_age, mb_allocated, and exclude_list defines; modify the weight_expression define to suit your needs; keep the HSMEXEC and HSMLIST external pool and list rules; tweak the THRESHOLD(80,70) values of the migration rule but keep its WHERE NOT (exclude_list) AND NOT (is_migrated) clause; and add a default placement rule such as RULE 'default' set pool 'system'.)

Figure 5-2 Sample policy syntax constructs

Placement rules

Placement rules
All SONAS policies must end with a default placement rule. If you are running with HSM, consider using default = system to set the system pool as the default. Data will probably be configured to cascade, so put most of the data in the fastest pool, then let it cascade through the tiers, using the following statement:
RULE 'default' set pool 'system'


If you are running with ILM and tiered storage, consider default = pool2, where pool2 is a slower pool. The files then default to the slower pool, and you select files for the faster pool explicitly. That way, if you forget a filter, a file goes into the slower, and hopefully larger, pool. Use a statement such as this one:
RULE 'default' set pool 'pool2'
Remember that placement rules only apply to files created after the placement rule is applied, and that placement rules do not affect recalled files, as they return to the pool they migrated from.

Macro defines
Policies can be coded using defines, also called macro defines. These are essentially named variables used to make rules easier to read. For example, the following statement creates a define named mb_allocated and sets it to the size of the file in MB:
define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
Defines offer a convenient way to encapsulate weight expressions so as to provide common definitions across the policy. These are typical common exclusions:
Special file migration exclusion definition: always use this when migrating.
Migrated file migration exclusion definition: always use this when migrating.
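A small sketch of a define being reused in a rule follows; the rule itself is hypothetical, while the two defines are taken from the TEMPLATE-HSM policy shown earlier:

define(mb_allocated,(INTEGER(KB_ALLOCATED / 1024)))
define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))

/* Hypothetical rule: move files larger than 100 MB and idle for two weeks */
RULE 'largecold' MIGRATE FROM POOL 'system' TO POOL 'silver'
     WHERE mb_allocated > 100 AND access_age > 14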

Summary
A policy is a set of rules; macros can be used to make rules easier to read. Rules determine what the policy does, and the first rule matched applies to a file, so order will matter. There are two major types of rules: placement rules determine what pool a file is placed in when it first appears in the filesystem, and migration rules specify the conditions under which a file that exists in the filesystem is moved to another pool. Migration policies must include the special file exclusion clause and migrated exclusion clause.

5.3.3 Peered policies


Peered policies contain placement rules only. Defines are generally not required for peered ILM policies. Placement rules select files by a user defined criterion or policy, for example:
RULE 'P1' set pool 'system' where upper(name) like '%SO%'
RULE 'P2' set pool 'system' where upper(name) like '%TOTALLY%'
Peered policies must contain a default placement rule that puts files in the lower performance pool, and then select groups of files using rules for placement into the higher performance pool. For example:
RULE 'default' set pool 'slowpool'
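Putting these pieces together, a complete peered placement policy might look like the following sketch (the pool names, patterns, and user ID are hypothetical):

RULE 'hotbyname'  SET POOL 'fastpool' WHERE UPPER(NAME) LIKE '%.DB'
RULE 'hotbyowner' SET POOL 'fastpool' WHERE USER_ID = 1001
RULE 'default'    SET POOL 'slowpool'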

5.3.4 Tiered policies


Tiered policies contain both migration rules and optional placement rules. This type of policy requires the defines contained in the sample TEMPLATE-ILM policy. You can also encapsulate the weight expression as a define. Optional placement rules select files by policy. Here we list best practices for migration rules:
Make sure at least one threshold exists as a safety net, even if you are using other rules.
Include exclusion clauses for migrated and special files in migration rules even if you are not using HSM, so that HSM can be added later.


Non-threshold migration needs an associated cron job to trigger it, as discussed later for migration filters.
The policy is terminated by the default placement rule:
RULE 'default' set pool 'system'
We used a higher performance pool as the default because subsequent tiering cascades data from high performance to low performance pools. A minimal tiered policy sketch follows.
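The sketch below combines a safety-net threshold migration with the exclusion defines from TEMPLATE-ILM and a terminating default placement rule; the pool names are hypothetical:

/* assumes exclude_list, is_migrated, and weight_expression are defined as in TEMPLATE-ILM */
RULE 'systemtosilver' MIGRATE FROM POOL 'system' THRESHOLD(80,70)
     WEIGHT(weight_expression) TO POOL 'silver'
     WHERE NOT (exclude_list) AND NOT (is_migrated)
RULE 'default' SET POOL 'system'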

5.3.5 HSM policies


Use the defines from the TEMPLATE-HSM rules. You can again encapsulate the weight expression as a define and optionally have placement rules to select files by policy. Follow these best practices for the migration rules:
External pool rules: Use the rules from the template (HSMEXEC, HSMLIST).
Threshold: Make sure at least one exists as a safety net, even if you are using other rules.
Always include exclusion clauses (migrated, special files) in migration rules.
Non-threshold migration: This needs an associated cron job to trigger it; you might want to have a time clause to prevent it from running on a threshold trigger.
Define at least one rule for each migration level (system->pool2, pool2->hsm).
Remember to terminate the policy with a default placement rule.

5.3.6 Policy triggers


Policies can be applied to a filesystem or only reside in the SONAS database:
Filesystem policy (active): one per filesystem, loaded from the database with setpolicy.
Database policies (inactive): they are not running; marking one as default is simply a quick path to recalling a policy, and this is a database state only.
Triggers control when policies are activated; policies only do something if triggered. We have the following kinds of triggers:
Manual trigger: The runpolicy command allows a database policy to be run.
Automated triggers, also referred to as callbacks, triggered by a threshold: The SONAS GPFS file system manager detects that disk space is running below the low threshold specified in the current policy rule and raises a lowDiskSpace event. The lowDiskSpace event initiates a SONAS GPFS migration callback procedure, the callback executes the SONAS script defined for it, and the script executes the active filesystem policy.
Cron: In SONAS R1.1.1, cron activates the default filesystem policy. Later releases might allow another database policy to be selected, not just the default policy for the filesystem.

When SONAS identifies that a threshold has been reached, it triggers a new lowDiskSpace event every two minutes as long as the fill level of the filesystem is above the threshold. SONAS knows that a migration was already triggered, so it ignores the new trigger and does not do any additional processing; the migration that started earlier continues execution.

5.3.7 Weight expressions


Weight expressions are used with threshold migration rules. The threshold limits the amount of data moved, and the weight expression determines the order of the files being migrated, so that files with the highest weight are moved first, until the threshold is satisfied. Code the weight expression as a define because it makes the rule easier to read, as the following rule shows:
RULE 'systemtosilver' MIGRATE FROM POOL 'system' THRESHOLD(15,10) WEIGHT(weight_expression) TO POOL 'silver' WHERE NOT (exclude_list) AND NOT (is_migrated)
Where weight_expression is:
define(weight_expression,(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END))
The previous two statements are simpler to read than the combined statement:
RULE 'systemtosilver' MIGRATE FROM POOL 'system' THRESHOLD(15,10) WEIGHT(CASE WHEN access_age < 1 THEN 0 WHEN mb_allocated < 1 THEN access_age WHEN is_premigrated THEN mb_allocated * access_age * 10 ELSE mb_allocated * access_age END) TO POOL 'silver' WHERE NOT (exclude_list) AND NOT (is_migrated)

5.3.8 Migration filters


Migration filters are used to control what gets migrated and when. Exclusion rules, or filters, need to cover the following files:
Migrated and special files: These exclusions must be taken from the templates.
Optionally, small files: Leave small files behind for efficiency if they can fit on disk (the threshold plus weight rule might do this anyway, so this might not be a useful rule). The fine print: this means that small files will not be migrated to offline storage, and cannot be recovered from the offline storage. Although HSM can be used to recover files, it is not desirable and is not supported as a customer action. Customers need to be using backup/restore; in that case, if they run coupled with backup, the small files are backed up, just not migrated.
Time filters can be useful when coupled with cron jobs, for example, running a cron job every Sunday at 4:05 AM, perhaps to flush a lot of files not accessed for a week. A sketch of such a time-filtered rule follows.
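The following is a hedged sketch of such a time-filtered migration rule; the rule name is hypothetical, DAYOFWEEK and HOUR are assumed to follow the SQL-style date functions used elsewhere in the policy language, and the exclude_list and is_migrated defines come from the templates:

/* Hypothetical weekly flush: files idle for more than a week, Sundays around 4 AM */
RULE 'weeklyflush' MIGRATE FROM POOL 'system' TO POOL 'hsm'
     WHERE NOT (exclude_list) AND NOT (is_migrated)
       AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 7
       AND DAYOFWEEK(CURRENT_TIMESTAMP) = 1
       AND HOUR(CURRENT_TIMESTAMP) = 4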

5.3.9 General considerations


Understand your SONAS and Tivoli Storage Manager throughputs and loads, and make sure your thresholds leave sufficient free space to finish without running out of disk space. Note that, depending on your configuration, migration to Tivoli Storage Manager might only reduce the rate at which the filesystem fills during peak usage, and not necessarily at a fast enough rate.


The filesystem high threshold must allow the peak use period to finish without filling the filesystem to 100%. Always use a threshold if you are using Information Lifecycle Management/HSM; even if you do not expect to hit the threshold, this provides a safety net in case your other policies have bugs, or in case your usage profile changes. Be aware that a cron job that exploits a low threshold rule will cause metadata spin. Migration rules with no threshold do not trigger automatically, but need a cron job for that. Tivoli Storage Manager can clone backups if HSM migration is done first; migration still takes the same amount of time to move data from SONAS to Tivoli Storage Manager, but backups might be faster depending on server throughput. The migrequiresbackup option can be set at the Tivoli Storage Manager server and can be used to prevent the following scenario: if the ACL data of a premigrated file is modified, these changes are not written to the Tivoli Storage Manager server if the file is migrated after this change. To avoid losing the modified ACL data, use the option migrequiresbackup yes. This setting does not allow you to migrate files whose ACL data has been modified and for which no current backup version exists on the server. When using migrequiresbackup, you must back up files, or you might run out of space because HSM will not move files.

5.4 Policy creation and execution walkthrough


We now illustrate the operational steps required to set up and execute SONAS policies, both using the SONAS GUI and with the SONAS CLI.

5.4.1 Creating a storage pool using the GUI


To create a storage pool using the GUI, connect to the SONAS GUI and navigate to Storage → Disks. You will see a display of all the NSD disks that are available, as shown in Figure 5-3. We see that all disks are in the system pool.

Figure 5-3 List of available NSD devices

You can also list the available storage pools for a specific filesystem by selecting Storage → Storage Pools, as shown in Figure 5-4. Note that we have only one storage pool, system, for our file system. The name system is the default storage pool name, and this pool cannot be removed.


Figure 5-4 Storage pools details for selected file system

To assign a disk to a filesystem, proceed to the Files → File Systems panel. Select the redbooks file system to which you want to assign a new NSD disk with another storage pool. After selecting that filesystem, you will see the File System Disks window, as shown in Figure 5-5.

Figure 5-5 File System Disks window

Click the Add a disk to the file system button and a panel like that in Figure 5-6 is shown. Select the disks to add, choose a disk type, specify a storage pool name, and click OK.

Figure 5-6 Add a disk to the file system panel


After the task completes, you will see that the filesystem now resides on two disks, with the file system and storage pool usage as shown in Figure 5-7.

Figure 5-7 File system disk and usage display

5.4.2 Creating a storage pool using the CLI


Connect to the SONAS CLI. You can list the NSD volumes using the lsdisk command as shown in Example 5-4.
Example 5-4 Listing NSDs
[sonas02.virtual.com]$ lsdisk
Name      File system  Failure group  Type             Pool    Status  Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs3nsd  redbook      1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs5nsd  redbook2     1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs4nsd               1              dataAndMetadata  system  ready                 4/21/10 10:50 PM
gpfs6nsd               1                               system  ready                 4/22/10 10:42 PM

Modify the storage pool assignment for the NSD called gpfs4nsd using the chdisk and lsdisk commands as shown in Example 5-5. Attributes such as pool name, usage type, and failure group cannot be changed for disks that are active in a filesystem.
Example 5-5 Change storage pool and data type assignment
[sonas02.virtual.com]$ chdisk gpfs4nsd --pool silver --usagetype dataonly
EFSSG0122I The disk(s) are changed successfully!
[sonas02.virtual.com]$ lsdisk
Name      File system  Failure group  Type             Pool    Status  Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs3nsd  redbook      1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs5nsd  redbook2     1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs4nsd               1              dataOnly         silver  ready                 4/21/10 10:50 PM
gpfs6nsd               1                               system  ready                 4/22/10 10:46 PM


To add the gpfs4nsd to the redbook2 filesystem use the chfs command as shown in Example 5-6.
Example 5-6 Add a disk to the redbook2 filesystem
[sonas02.virtual.com]$ chfs --add gpfs4nsd redbook2
The following disks of redbook2 will be formatted on node strg002st002.virtual.com:
gpfs4nsd: size 1048576 KB
Extending Allocation Map
Creating Allocation Map for storage pool 'silver'
31 % complete on Thu Apr 22 22:53:20 2010
88 % complete on Thu Apr 22 22:53:25 2010
100 % complete on Thu Apr 22 22:53:26 2010
Flushing Allocation Map for storage pool 'silver'
Disks up to size 24 GB can be added to storage pool 'silver'.
Checking Allocation Map for storage pool 'silver'
83 % complete on Thu Apr 22 22:53:32 2010
100 % complete on Thu Apr 22 22:53:33 2010
Completed adding disks to file system redbook2.
mmadddisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
EFSSG0020I The filesystem redbook2 has been successfully changed.

You can verify the storage pools and NSD assignment with the lspool command as shown in Example 5-7:
Example 5-7 Listing storage pools
[sonas02.virtual.com]$ lspool
Filesystem  Name    Size     Usage  Available fragments  Available blocks  Disk list
gpfs0       system  2.00 GB  4.2%   350 kB               1.91 GB           gpfs1nsd;gpfs2nsd
redbook     system  1.00 GB  14.7%  696 kB               873.00 MB         gpfs3nsd
redbook2    silver  1.00 GB  0.2%   14 kB                1021.98 MB        gpfs4nsd
redbook2    system  1.00 GB  14.7%  704 kB               873.00 MB         gpfs5nsd

Repeat the lsdisk command to confirm the correct filesystem to disk assignments as shown in Example 5-8:
Example 5-8 Listing NSD disks
[sonas02.virtual.com]$ lsdisk
Name      File system  Failure group  Type             Pool    Status  Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs3nsd  redbook      1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs4nsd  redbook2     1              dataOnly         silver  ready   up            4/21/10 10:50 PM
gpfs5nsd  redbook2     1              dataAndMetadata  system  ready   up            4/22/10 3:03 AM
gpfs6nsd               1                               system  ready                 4/22/10 10:59 PM


5.4.3 Creating and applying policies using the GUI


To create and apply a policy using the SONAS GUI, select Files → Policies. You will see the Policies List window with a list of file systems, as shown in Figure 5-8. Selecting a file system, you will see the Policy Details window for that filesystem below it.

Figure 5-8 Policy list window

In the policy details section of the window, type your policy. Note that you can also load the policy from a file on your computer by pressing the Load policy button. Click the Set policy button and choose apply at the prompt to set the policy. After this, click the Apply policy button and choose apply at the prompt to apply the policy. After applying the policy, you will see a panel, as shown in Figure 5-9, showing a summary of the policy that will be applied. The policy is now active.


Figure 5-9 Apply policy task progress window

5.4.4 Creating and applying policies using the CLI


We create a new policy called redpolicy using the CLI that contains the rules shown in Example 5-9.
Example 5-9 Policy rules
RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT'
RULE 'movepdf' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF'
RULE 'default' set POOL 'system'

Important: The CLI mkpolicy and mkpolicyrule commands do not accept the RULE statement, so the RULE statement must be removed from all policy statements.


We create the policy with the first rule using the mkpolicy command, and then append the remaining rules to the redpolicy policy using the mkpolicyrule command, as shown in Example 5-10.
Example 5-10 Create a new policy
[sonas02]# mkpolicy -P "redpolicy" -R " set POOL 'silver' WHERE UPPER(name) like '%.TXT' ;"
[sonas02]# mkpolicyrule -P "redpolicy" -R " MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ;"
[sonas02]# mkpolicyrule -P "redpolicy" -R " set POOL 'system' "

We list all policies defined using the CLI with the lspolicy -P all command (Example 5-11.)
Example 5-11 List all policies
[sonas02]# lspolicy -P all
Policy Name  Rule Number  Rule                                                                        Is Default
redpolicy    1            set POOL 'silver' WHERE UPPER(name) like '%.TXT'                            N
redpolicy    2            MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF'  N
redpolicy    3            set POOL 'system'                                                           N

Important: You cannot list policies created using the GUI with lspolicy.
Now we validate the policy using the chkpolicy command, as shown in Example 5-12.
Example 5-12 Validate the policy
[sonas02]# chkpolicy -P "redpolicy" -T validate -d redbook2 -c sonas02.virtual.com
No error found. All the placement rules have been validated.

After successful validation, we set the policy for filesystem redbook2 using the setpolicy command, as shown in Example 5-13. We then run the lspolicy -A command to verify which filesystems have policies applied.
Example 5-13 Set the policy
[sonas02]# setpolicy -P "redpolicy" -d redbook2 -c sonas02.virtual.com

[root@sonas02.mgmt001st002 ~]# lspolicy -A
Cluster             Device   Policy Name Applied Time     Who applied it?
sonas02.virtual.com redbook2 redpolicy   4/26/10 11:00 PM root
sonas02.virtual.com gpfs0    N/A
sonas02.virtual.com redbook  N/A

Attention: Policies created with the GUI do not appear in the output of the SONAS CLI lspolicy -A command.

The redbook filesystem does have a valid policy that was set using the GUI, as shown in Example 5-14. It was created using the GUI because it contains RULE statements and word comments, which are not accepted by the CLI.
Example 5-14 Policies applied to filesystems
[sonas02]# lspolicy -d redbook
Cluster             Device  Policy                                                                     Last update
sonas02.virtual.com redbook RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT' ; RULE 'movepdf' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ; RULE 'default' set POOL 'system'   4/26/10 10:59 PM

[sonas02]# lspolicy -d redbook2
Cluster             Device   Policy                                                                    Last update
sonas02.virtual.com redbook2 /* POLICY NAME: redpolicy */ ; RULE '1' set POOL 'silver' WHERE UPPER(name) like '%.TXT' ; RULE '2' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF' ; RULE '3' set POOL 'system'   4/26/10 11:10 PM


5.4.5 Testing policy execution


We now connect to the SONAS management node as root to run the GPFS mmlsattr command. Policies will be verified in the redbook filesystem. The policy being executed is shown in Example 5-15.
Example 5-15 Sample policy
RULE 'txtfiles' set POOL 'silver' WHERE UPPER(name) like '%.TXT'
RULE 'movepdf' MIGRATE FROM POOL 'system' TO POOL 'silver' WHERE UPPER(name) like '%.PDF'
/* This is a new policy for Lukasz & John */
RULE 'default' set POOL 'system'

We verify that files ending with the .txt extension are placed in the silver pool, that other files go to the system pool, and that .pdf files are allocated in the system pool and subsequently moved to the silver pool. We created three files; we list them with ls -la and then run the GPFS mmlsattr command to verify file placement, as shown in Example 5-16. The files are placed as follows:
- test1.mp3 on the system pool
- test2.txt on the silver pool
- test3.pdf on the system pool
Example 5-16 Files allocated and managed by policies
[root@sonas02.mgmt001st002 export2]# ls -la
drwxr-xr-x 2 VIRTUAL\administrator root  8192 Apr 23 04:52 .
drwxr-xr-x 4 root                  root 32768 Apr 22 02:32 ..
-rw-r--r-- 1 root                  root     0 Apr 23 04:51 test1.mp3
-rw-r--r-- 1 root                  root     0 Apr 23 04:51 test2.txt
-rw-r--r-- 1 root                  root     0 Apr 23 04:52 test3.pdf

[root@sonas02.mgmt001st002 export2]# mmlsattr -L test*
file name:            test1.mp3
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:

file name:            test2.txt
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    silver
fileset name:         root
snapshot name:

file name:            test3.pdf
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:


Now we apply the policy using the GUI by going to Files → Policies, selecting our file system, clicking the Apply policy button, and choosing Apply. Applying the policy causes the migration rule to be executed. After policy execution, we verify the correct placement of the files using the mmlsattr command, as shown in Example 5-17. The files are now placed on storage pools as follows:
- test1.mp3 remains on the system pool.
- test2.txt remains on the silver pool.
- test3.pdf has been moved to the silver pool.
Example 5-17 List file status
[root@sonas02.mgmt001st002 export2]# mmlsattr -L test*
file name:            test1.mp3
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:

file name:            test2.txt
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    silver
fileset name:         root
snapshot name:

file name:            test3.pdf
metadata replication: 1 max 2
data replication:     1 max 2
immutable:            no
flags:
storage pool name:    silver
fileset name:         root
snapshot name:

Important: The mmlsattr command is a GPFS command that must be run on SONAS with root authority. However, SONAS does not support running commands with root authority. SONAS development recognizes the need for an equivalent SONAS command to verify the placement of files in storage pools.


Chapter 6. Backup and recovery, availability, and resiliency functions


In this chapter we illustrate SONAS components and external products that can be used to guarantee data availability and resiliency. We also provide details of the Tivoli Storage Manager integration. We discuss the following topics:
- Backup and recovery of files in a SONAS cluster
- Configuring SONAS to use HSM
- Replication of SONAS data
- SONAS Snapshots


6.1 High availability and data protection in base SONAS


A SONAS cluster offers many high availability and data protection features that are part of the base configuration and do not need to be ordered separately. SONAS is a grid-like storage solution. By design, all the components in a SONAS cluster are redundant so that there is no single point of failure; for example, there are multiple interface nodes for client access, and data can be replicated across multiple storage pods. The software components included in the SONAS cluster also offer high availability functions; for example, the SONAS GPFS filesystem is accessed concurrently from multiple interface nodes and offers data protection through synchronous replication and snapshots. See Chapter 3, Software architecture on page 73 for more details. SONAS also includes Tivoli Storage Manager client software for data protection and backup to an external Tivoli Storage Manager server, and asynchronous replication functions to send data to a remote SONAS or file server.

Data is accessed through interface nodes, and interface nodes are deployed in groups of two or more to guarantee data accessibility in the case that an interface node is no longer accessible. The SONAS Software stack manages service availability and access failover between multiple interface nodes, which allows clients to continue accessing data when an interface node is unavailable. The SONAS Cluster Manager is comprised of three fundamental components for data access failover:
- The Cluster Trivial Database (CTDB) monitors services and restarts them on an available node, offering concurrent access from multiple nodes with locking for data integrity.
- DNS performs IP address resolution and round robin IP load balancing.
- The file sharing protocols include error retry mechanisms.
These three components, together with the retry mechanisms in the file sharing protocols, make SONAS a high availability file sharing solution. In this chapter we introduce the SONAS high availability and data protection functions and discuss how these features can be applied in your environment.

6.1.1 Cluster Trivial Database


CTDB is used for two major functions. First, it provides a clustered database that can scale well to large numbers of nodes. Second, it controls the cluster: CTDB controls the public IP addresses used to publish the NAS services and moves them between nodes. Using monitoring scripts, CTDB determines the health state of a node. If a node has problems, such as broken services or network links, the node becomes unhealthy. In this case, CTDB migrates all public IP addresses to healthy nodes and sends CTDB tickle-acks to the clients so that they reestablish their connections. CTDB also provides the API to manage cluster IP addresses, add and remove nodes, and ban and disable nodes.

CTDB must be healthy on each node of the cluster for SONAS to work correctly. When services are down for any reason, the state of CTDB might go down. CTDB services can be restarted on a node using either the SONAS GUI or the command line. It is also possible to change CTDB configuration parameters such as public addresses, log file information, and debug level.


Suspending and resuming nodes


You can use the SONAS administrator GUI or the command line to perform multiple operations on a node. The suspendnode and resumenode CLI commands provide control of the status of an interface node in the cluster. The suspendnode command suspends a specified interface node by banning the node at the CTDB level. A banned node does not participate in the cluster and does not host any records for the CTDB. The IP addresses of a suspended node are taken over by another node, and no services are hosted on the suspended node.
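The same operation can be driven from the CLI. The following is a minimal sketch only: the exact argument form (short node name versus fully qualified name, and any cluster option) is an assumption, so check the command help on your system before use.

# Suspend interface node int001st002; the node is banned at the CTDB level (argument form assumed)
suspendnode int001st002
# Check where the public IP addresses now reside
lsnwinterface -x
# Resume the node; CTDB rebalances the public IP addresses across the interface nodes
resumenode int001st002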

GUI example of suspendnode command


Following is an example of suspending and resuming a node from the GUI:
1. Select Clusters → Interface Nodes from the GUI.
2. Select the cluster from the Active Cluster pull-down menu.
3. Select cluster node int001st002.virtual.com by marking the check box and click the Suspend button to suspend the node.
4. After a short pause, the window shown in Figure 6-1 appears, showing that the status for node int001st002.virtual.com is stopped and that all active IP addresses are on node int002st002.virtual.com.

Figure 6-1 Suspended node display

5. To re-enable activity on node int001st002.virtual.com, we select it and click the Resume button. Figure 6-2 shows the resulting status. Note that the public IP addresses have been rebalanced across the nodes and that the status for the node is active.


Figure 6-2 Interface node IP addresses after node resume

6.1.2 DNS performs IP address resolution and load balancing


What happens when a problem occurs on a SONAS interface node, or on the network that connects the client to the SONAS interface node, depends on multiple factors, such as the file sharing protocol in use and specific SONAS configuration parameters. We illustrate various failover considerations here. All requests from a client to a SONAS cluster for data access are serviced through the SONAS public IP addresses. These public IP addresses are similar to virtual addresses because, in general, the client can access the same service, at various moments in time, over various public IP addresses. SONAS interface nodes can have multiple public IP addresses for load balancing and IP failover. For example, the lsnwinterface -x CLI command displays all public addresses on the interface nodes, as shown in Figure 6-3. This figure shows two interface nodes, int001st002 and int002st002, each with two public IP addresses assigned on interfaces eth1 and eth2. The management node is also shown, but it does not host any public IP addresses.
[SONAS]$ lsnwinterface -x
Node                     Interface MAC               Master/Slave Up/Down IP-Addresses
int001st002.virtual.com  eth0      02:1c:5b:00:01:01              UP
int001st002.virtual.com  eth1      02:1c:5b:00:01:02              UP      10.0.1.121
int001st002.virtual.com  eth2      02:1c:5b:00:01:03              UP      10.0.2.122
int002st002.virtual.com  eth0      02:1c:5b:00:02:01              UP
int002st002.virtual.com  eth1      02:1c:5b:00:02:02              UP      10.0.1.122
int002st002.virtual.com  eth2      02:1c:5b:00:02:03              UP      10.0.2.121
mgmt001st002.virtual.com eth0      02:1c:5b:00:00:01              UP
mgmt001st002.virtual.com eth1      02:1c:5b:00:00:02              UP
mgmt001st002.virtual.com eth2      02:1c:5b:00:00:03              UP

Figure 6-3 Public IP addresses before IP address failover

In Figure 6-3 we see that in normal operating conditions each interface node has two public IP addresses. Figure 6-4 shows that after a node failover, all public IP addresses have been moved to interface node int002st002, and node int001st002 is hosting no IP addresses.


[SONAS]$ lsnwinterface -x
Node                     Interface MAC               Master/Slave Up/Down IP-Addresses
int001st002.virtual.com  eth0      02:1c:5b:00:01:01              UP
int001st002.virtual.com  eth1      02:1c:5b:00:01:02              UP
int001st002.virtual.com  eth2      02:1c:5b:00:01:03              UP
int002st002.virtual.com  eth0      02:1c:5b:00:02:01              UP
int002st002.virtual.com  eth1      02:1c:5b:00:02:02              UP      10.0.1.121,10.0.1.122
int002st002.virtual.com  eth2      02:1c:5b:00:02:03              UP      10.0.2.121,10.0.2.122
mgmt001st002.virtual.com eth0      02:1c:5b:00:00:01              UP
mgmt001st002.virtual.com eth1      02:1c:5b:00:00:02              UP
mgmt001st002.virtual.com eth2      02:1c:5b:00:00:03              UP

Figure 6-4 Public IP addresses after IP address failover

6.1.3 File sharing protocol error recovery


Depending on the data access protocol, various behaviors might be observed. The FTP and SFTP protocols typically fail because they do not survive a failed TCP connection; in this case, the user has to restart the session, for example, by reconnecting to the file server using the same IP address to get access to the new node. The CIFS protocol behaves well and demonstrates failover using either DNS name resolution or static IP addressing. The NFS protocol is not always successful on failover. To avoid reconnection problems, it is best for NFS shares to be accessed using static IP addresses and not DNS address resolution. The reason is that when an NFS client uses DNS addresses, after a failure it might get another IP address when it reconnects to the SONAS cluster, but the NFS file locks are dependent on, and still held by, the NFS client's original IP address. In this situation the NFS client might hang indefinitely waiting for a lock on the file. To clean up the NFS lock situation, you can recover CTDB services on the failed node using the SONAS GUI Clusters → Interface Node → Restart option.
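As a hedged illustration of this recommendation, the following Linux client commands mount an NFS export through a static public IP address rather than the round-robin DNS name. The IP address, export path, and share name are taken from examples elsewhere in this chapter and may differ in your installation.

# NFS: mount through a fixed public IP address of the SONAS cluster
mount -t nfs 10.0.1.121:/ibm/gpfsjt /mnt/gpfsjt
# CIFS clients, by contrast, can safely use the cluster DNS name, for example (Windows client):
#   net use Z: \\sonas02.virtual.com\sonas21jt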

6.2 Backup and restore of file data


This section discusses the backup and restore methods and techniques for SONAS file data, but it does not address the protection of SONAS metadata. For a discussion of the latter topic see 6.5.1, Backup of SONAS configuration information on page 217.

6.2.1 Tivoli Storage Manager terminology and operational overview


IBM Tivoli Storage Manager, working together with IBM SONAS, provides an end-to-end comprehensive solution for backup/restore, archival, and HSM.

How IBM SONAS works with Tivoli Storage Manager


In order to best understand how IBM SONAS works together with IBM Tivoli Storage Manager, it is useful to review and compare the specific Tivoli Storage Manager terminology and processes involved with the following activities:
- Backing up and restoring files
- Archiving and retrieving them
- Migrating and recalling them (HSM)


Tivoli Storage Manager terminology


If you use Tivoli Storage Manager to back up files (which invokes the Tivoli Storage Manager backup/archive client code on the interface nodes), copies of the files are created on the Tivoli Storage Manager server external storage, and the original files remain in your local file system. To obtain a backed up file from Tivoli Storage Manager storage, for example, in case the file is accidentally deleted from the local file system, you restore the file.

If you use Tivoli Storage Manager to archive files to Tivoli Storage Manager storage, those files are removed from your local file system, and if needed later, you retrieve them from Tivoli Storage Manager storage.

If you use Tivoli Storage Manager to migrate SONAS files to external storage (which invokes the Tivoli Storage Manager HSM client code on the interface nodes), you move the files to external storage attached to the Tivoli Storage Manager server, and Tivoli Storage Manager replaces each file with a stub file in the SONAS file system. You can accept the default stub file size or, if you want, specify the size of your Tivoli Storage Manager HSM stub files to accommodate needs or applications that read headers or initial portions of the file. To users, the files appear to be online in the file system. If a migrated file is accessed, Tivoli Storage Manager HSM automatically initiates a recall of the full file from its migration location in external Tivoli Storage Manager-attached storage. The effect on the user is simply an elongated response time while the file is being recalled and reloaded into internal SONAS storage. You can also initiate recalls proactively if desired.

6.2.2 Methods to back up a SONAS cluster


SONAS is a storage device that stores your file data, so it is important to develop an appropriate file data protection and backup plan to be able to recover data in case of disaster, accidental deletion, or data corruption. We discuss how to back up the data contained in a SONAS cluster using either Tivoli Storage Manager or other ISV backup products; we do not discuss the backup of SONAS configuration information here. SONAS cluster configuration information is stored on the management node in multiple repositories, and SONAS offers the backupmanagementnode command to back it up. The use of this command is described in 6.5, Disaster recovery methods. SONAS clusters are preloaded with Tivoli Storage Manager to act as a Tivoli Storage Manager client to back up filesystems. The SONAS Tivoli Storage Manager client requires an external, customer supplied and licensed, Tivoli Storage Manager server.

6.2.3 Tivoli Storage Manager client and server considerations


The Tivoli Storage Manager client integrated into SONAS is at version 6.1, and this client version is compatible with Tivoli Storage Manager servers at versions 5.5, 6.1, and 6.2. The Tivoli Storage Manager client runs on the SONAS interface nodes. Each interface node can open up to eight sessions to the Tivoli Storage Manager server, and multiple interface nodes can initiate proportionally more sessions; for example, 10 interface nodes can initiate up to 80 Tivoli Storage Manager sessions. We suggest setting the Tivoli Storage Manager server maxsess parameter to a value of 100 for SONAS. If the Tivoli Storage Manager server cannot handle such a large number of sessions, it might be necessary to reduce the number of interface nodes involved in a backup, because server sessions that hang or are disconnected might result in incomplete or failed backups.


Mount requests: As each node can start up to eight parallel sessions, the Tivoli Storage Manager client maxnummp parameter must be set to eight. This means that a Tivoli Storage Manager client node can initiate up to eight mount requests for Tivoli Storage Manager sequential media on the server.
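The following Tivoli Storage Manager administrative commands, issued from a dsmadmc session on the Tivoli Storage Manager server, sketch how these two settings might be applied. The client node name follows the registration example in 6.2.4, and the exact option spelling should be verified against your server version.

/* Raise the server session limit to accommodate SONAS backup sessions */
setopt maxsessions 100
/* Allow a SONAS interface node (TSM client node) up to eight mount points */
update node int1st2node maxnummp=8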

LAN backup through Tivoli Storage Manager


SONAS currently supports LAN backup through the preinstalled Tivoli Storage Manager backup/archive client running on the interface nodes. Only LAN backup is supported; LAN-free backup is not supported or implemented. Only the Tivoli Storage Manager backup component is used; the archiving component is not used. All backup and restore operations are executed using the SONAS CLI commands; native Tivoli Storage Manager commands are not supported. The Tivoli Storage Manager client is configured to retry the backup of open files and to continue without backing up a file after a set number of retries. The Tivoli Storage Manager backup path length is limited to 1024 characters, including both the file and directory path length, and file names must not use the following characters: " or ' or linefeed (0x0A). Databases must be shut down or frozen before a backup occurs to put them into a consistent state. Backup jobs run serially; that is, only one backup job for one filesystem can run at any point in time.
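As a simple illustration of the path length and character restrictions, the following commands, run from a node that has the filesystem mounted, list candidate problem files. The filesystem path is an example, and these checks are only a sketch, not part of the SONAS backup procedure.

# Paths longer than the 1024-character limit
find /ibm/gpfsjt | awk 'length($0) > 1024'
# File names containing the unsupported " or ' characters
find /ibm/gpfsjt \( -name '*"*' -o -name "*'*" \)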

Tivoli Storage Manager database sizing


The Tivoli Storage Manager server and the Tivoli Storage Manager server database must be sized appropriately based on the number of files that will be backed up. Each file that is backed up is an entry in the Tivoli Storage Manager database, and each file entry uses between 400 and 600 bytes, or around 0.5 KB, so we can give a rough estimate of the size of the database by multiplying the number of files by the average file entry size. For example, a total of 200 million files consumes around 100 GB of Tivoli Storage Manager database space. As of Tivoli Storage Manager 6.1, the maximum preferred size for one Tivoli Storage Manager database is 1000 GB. When very large numbers of files need to be backed up, you might need to deploy multiple Tivoli Storage Manager servers. The smallest SONAS unit that can be handled by a Tivoli Storage Manager server is a file system, so only one Tivoli Storage Manager server can back up and restore files for a given filesystem. When you have n filesystems, you can have between 1 and n Tivoli Storage Manager servers.
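The following shell arithmetic illustrates the rule of thumb; the 500-byte figure is simply the midpoint of the 400 to 600 byte range quoted above.

FILES=200000000        # 200 million files, the example used in the text
BYTES_PER_ENTRY=500    # midpoint of the 400-600 byte range
echo "$(( FILES * BYTES_PER_ENTRY / 1024 / 1024 / 1024 )) GiB"   # prints 93 GiB, that is, roughly 100 GB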

6.2.4 Configuring interface nodes for Tivoli Storage Manager


You must set up the interface nodes to work with Tivoli Storage Manager before you can configure and perform backup and restore operations. Before starting the configuration, the following information is required:
- The Tivoli Storage Manager server name, IP address, and port for each Tivoli Storage Manager server to be configured
- The host names of the interface nodes that will run the backups
The following procedure also assumes that the Tivoli Storage Manager server configuration elements, such as the policy domain, management class, and storage pools, have been set up beforehand. For additional information about Tivoli Storage Manager configuration, see the IBM Tivoli Storage Manager Implementation Guide, SG24-5416, which can be downloaded at:
http://www.redbooks.ibm.com/abstracts/sg245416.html?Open


Set up the SONAS client definitions on the Tivoli Storage Manager servers. You must execute these steps on all the Tivoli Storage Manager servers (a short verification sketch follows this procedure):
1. Connect to the first Tivoli Storage Manager server to be configured as a Tivoli Storage Manager administrator with the Tivoli Storage Manager command line interface (CLI) client, by running the dsmadmc command on a system with the Tivoli Storage Manager administrative interface installed.
2. Register a virtual node name for the SONAS cluster. You can choose any name you like, providing it is not already registered to Tivoli Storage Manager. For example, you can choose the SONAS cluster name sonas1 with password sonas1secret and register the node to a Tivoli Storage Manager domain called sonasdomain, using the Tivoli Storage Manager register node command as follows:
register node sonas1 sonas1secret domain=sonasdomain
3. Register one Tivoli Storage Manager client node for each SONAS interface node that will run the Tivoli Storage Manager client. Assuming we have the three interface nodes int1st2, int2st2, and int3st2, we register a separate Tivoli Storage Manager node and password for each one using the Tivoli Storage Manager register node command as follows:
register node int1st2node int1st2pswd domain=sonasdomain
register node int2st2node int2st2pswd domain=sonasdomain
register node int3st2node int3st2pswd domain=sonasdomain
4. Grant all the Tivoli Storage Manager client nodes representing the interface nodes proxy access to the Tivoli Storage Manager virtual node representing the SONAS cluster, using the Tivoli Storage Manager grant proxynode administrator command. Assuming the three interface node Tivoli Storage Manager clients int1st2node, int2st2node, and int3st2node, and a cluster called sonas1, we run the following Tivoli Storage Manager administrator command:
grant proxynode target=sonas1 agent=int1st2node,int2st2node,int3st2node
5. Now we create a Tivoli Storage Manager server stanza, an entry in the Tivoli Storage Manager configuration file, on all the SONAS interface nodes. Assume the Tivoli Storage Manager server is called tsmsrv1, has IP address tsmsrv1.com with port 1500, and that the three interface nodes to configure for backup are int1st2, int2st2, and int3st2.
6. Connect to node int1st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int1st2node sonas1 int1st2 int1st2pswd
7. Connect to node int2st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int2st2node sonas1 int2st2 int2st2pswd
8. Connect to node int3st2 using the SONAS CLI and issue the following command:
cfgtsmnode tsmsrv1 tsmsrv1.com 1500 int3st2node sonas1 int3st2 int3st2pswd
9. Repeat steps 1 to 8 for all the Tivoli Storage Manager servers that you want to configure.
Now the Tivoli Storage Manager servers are configured on all interface nodes. You can verify this by issuing the SONAS lstsmnode command without arguments to see the Tivoli Storage Manager stanza information for all interface nodes.
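A minimal verification sketch follows. The query proxynode command is a standard Tivoli Storage Manager administrative command, the lstsmnode command is the SONAS CLI command mentioned above, and the node names are the ones used in the example steps.

/* From dsmadmc on the Tivoli Storage Manager server: confirm the proxy relationships */
/* Expected: target node SONAS1 with agents INT1ST2NODE, INT2ST2NODE, and INT3ST2NODE */
query proxynode

From the SONAS CLI, issue lstsmnode with no arguments to list the server stanzas defined on every interface node.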


6.2.5 Performing Tivoli Storage Manager backup and restore operations


A prerequisite to performing backup and restore operations is that all SONAS interface nodes have been configured to connect to the Tivoli Storage Manager servers, as outlined in 6.2.4, Configuring interface nodes for Tivoli Storage Manager. We now configure individual filesystems to back up to a specific Tivoli Storage Manager server using the cfgbackupfs command, which defines which filesystem to back up, where to back it up to, and where to run the backup operation. This command does not perform the actual backup operation; it only configures the SONAS cluster for backing up a specific filesystem to a Tivoli Storage Manager server. For example, to back up filesystem gpfsjt to Tivoli Storage Manager server tsmsrv1 and execute the backup operation on the two nodes int1st2 and int2st2, issue the following command from the SONAS CLI:
cfgbackupfs gpfsjt tsmsrv1 int1st2,int2st2
The possible interface nodes are those supplied when configuring the Tivoli Storage Manager server stanza with the cfgtsmnode SONAS command, as described in 6.2.4, Configuring interface nodes for Tivoli Storage Manager on page 187. More than one interface node can be specified in a comma separated list, providing each has already been defined with the cfgtsmnode command. You can use the lsbackupfs command to list the configured backups, as shown in Example 6-1.
Example 6-1 List filesystem backup configuration and status

# lsbackupfs
File system TSM server List of nodes   Status      Start time End time
gpfsjt      tsmsrv1    int1st2,int2st2 NOT_STARTED N/A        N/A

Now that the backup is fully configured, we can run our first backup operation using the SONAS startbackup CLI command. The command accepts a list of one or more filesystems; specifying no arguments makes the command back up all filesystems with configured backup destinations. For example, to start backing up the file system gpfsjt, issue:
startbackup gpfsjt
The command starts backup execution as a background operation and returns control to the caller. You have to monitor the status and completion of the backup operation for the specific filesystem using the lsbackup SONAS command, as shown in Example 6-2.
Example 6-2 lsbackup command output
# lsbackup gpfsjt
Filesystem Date                Message
gpfsjt     20.01.2010 02:00:00 EFSSG0300I The filesys gpfsjt backup started.
gpfsjt     19.01.2010 12:30:52 EFSSG0702I The filesys gpfsjt backup was done successfully.
gpfsjt     18.01.2010 02:00:00 EFSSG0300I The filesys gpfsjt backup started.

You can also list the Tivoli Storage Manager server and backup interface node associations and the status of the latest backup, and validate the backup configuration, by using the lsbackupfs -validate SONAS command, as shown in Example 6-3.
Example 6-3 Listing backup configuration and status
# lsbackupfs -validate
File system TSM server List of nodes   Status                 Start time
gpfsjt      tsmsrv1    int1st2,int2st2 COMPLETED_SUCCESSFULLY 1/21/10 04:26
(.. continuation of the line above ..)
End time      Message                 Validation            Last update
1/21/10 04:27 INFO: backup ok (rc=0). Node is OK,Node is OK 1/21/10 04:27


Tivoli Storage Manager backups can be scheduled using the CLI or GUI through the scheduled task called StartBackupTSM. To schedule a backup of all SONAS filesystems at 4:15 AM, use mktask as shown next:
mktask StartBackupTSM --parameter sonas02.virtual.com --minute 15 --hour 4
Files backed up to Tivoli Storage Manager can be restored using the startrestore SONAS CLI command. The startrestore command takes a filename or pattern as an argument, so you need to know the names of the files or directories to restore; you can also specify a restore date and time. Specifying no date and time filters returns the most recent backup data. The files are restored to the original location, or to another location if desired, and you can choose whether to replace the original files. An example of the restore command with the replace option follows:
startrestore "/ibm/gpfsjt/dirjt/*" -R
The lsbackupfs command shows whether a restore is currently running by displaying RESTORE_RUNNING in the message field.

6.2.6 Using Tivoli Storage Manager HSM client


SONAS offers an HSM integration to send data to external storage devices managed by Tivoli Storage Manager. The Tivoli Storage Manager HSM clients run on the SONAS interface nodes and use the Ethernet connections of the interface nodes to connect to the external, customer-provided, Tivoli Storage Manager server. The primary goal of the HSM support is to provide a high performance HSM link between a SONAS subsystem and an external tape subsystem. SONAS HSM support has the following requirements:
- One or more external Tivoli Storage Manager servers must be provided, and the servers must be accessible through the external Ethernet connections on the interface nodes.
- The SONAS cfgtsmnode command must be run to configure the Tivoli Storage Manager environment.
- SONAS GPFS policies drive migration, so Tivoli Storage Manager HSM automigration needs to be disabled.
Every interface node has a Tivoli Storage Manager HSM client installed alongside the standard Tivoli Storage Manager backup/archive client. An external Tivoli Storage Manager server is attached to the interface node through the interface node Ethernet connections. The Tivoli Storage Manager HSM client supports the SONAS GPFS filesystem through the use of the Data Management API (DMAPI).
Before configuring HSM for a filesystem, you must complete the Tivoli Storage Manager initial setup using the cfgtsmnode command, as illustrated in 6.2.4, Configuring interface nodes for Tivoli Storage Manager. SONAS HSM uses the same Tivoli Storage Manager server that was configured for the SONAS Tivoli Storage Manager backup client; using the same server allows Tivoli Storage Manager to clone data between the Tivoli Storage Manager server backup storage pools and HSM storage pools. With the SONAS Tivoli Storage Manager client, one Tivoli Storage Manager server stanza is provided for each GPFS filesystem. Therefore, one GPFS filesystem can be connected to one single Tivoli Storage Manager server. Multiple GPFS filesystems can use either the same or different Tivoli Storage Manager servers. Multiple Tivoli Storage Manager servers might be needed when you have a large number of files in a filesystem.
Attention: At the time of writing, you cannot remove SONAS HSM without help from IBM.


The SONAS HSM client must be configured to run on all the interface nodes in the SONAS cluster: because migrated files can be accessed from any node, the Tivoli Storage Manager HSM client needs to be active on all nodes. All SONAS HSM configuration commands are run using the SONAS CLI, not the GUI.

Configuring SONAS HSM


To configure SONAS HSM, use the cfghsmnode command to validate the connection to Tivoli Storage Manager and set up the HSM parameters. The command validates the connection to the provided Tivoli Storage Manager server and registers the migration callback. It is invoked as follows:
cfghsmnode <TSMserver_alias> <intNode1,intNode2,...,intNodeN> [ -c <clusterId | clusterName> ]
Where <TSMserver_alias> is the name of the Tivoli Storage Manager server set up by the backup/archive client, <intNode1,intNode2,...> is the list of interface nodes that will run HSM to the attached Tivoli Storage Manager server, and <clusterId> or <clusterName> is the cluster identifier.
You then use the cfghsmfs SONAS command as follows:
cfghsmfs <TSMserv> <filesystem> [-P pool] [-T(TIER/PEER)] [-N <ifnodelist>] [-S stubsize]
Where <TSMserv> is the name of the Tivoli Storage Manager server set up with the cfgtsmnode command, <filesystem> is the name of the SONAS filesystem to be managed by HSM, <pool> is the name of the user pool, TIER/PEER specifies whether the system pool and the specified user pool are set up as tiered or peered, <ifnodelist> is the list of interface nodes that will interface with the Tivoli Storage Manager server for this filesystem, and <stubsize> is the HSM stub file size in bytes.
For debugging purposes, two commands are available: lshsmlog shows the HSM error log output (/var/log/dsmerror.log) and lshsmstatus shows the current HSM status.
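Substituting concrete values into the syntax above gives the following sketch. The server alias, interface node names, filesystem, and pool name are taken from examples elsewhere in this chapter, and the stub size is an arbitrary illustration value, so treat this as an assumption rather than a recipe.

# Validate the TSM connection and register the migration callback on two interface nodes
cfghsmnode tsmsrv1 int1st2,int2st2
# Enable HSM on filesystem gpfsjt, tiering between the system pool and the userpool user pool
cfghsmfs tsmsrv1 gpfsjt -P userpool -T TIER -N int1st2,int2st2 -S 16384
# Check the HSM status and error log if something goes wrong
lshsmstatus
lshsmlog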

SONAS HSM concepts


Using SONAS Hierarchical Storage Manager, new and most frequently used files remain on your local file systems, while those you use less often are automatically migrated to storage media managed by an external Tivoli Storage Manager server. Migrated files still appear local and are transparently migrated to and retrieved from the Tivoli Storage Manager server. Files can also be prioritized for migration according to their size and/or the number of days since they were last accessed, which allows users to maximize local disk space. Enabling space management for a file system can provide the following benefits:
- Extends local disk space by utilizing storage on the Tivoli Storage Manager server
- Takes advantage of lower-cost storage resources that are available in your network environment
- Allows for automatic migration of old and/or large files to the Tivoli Storage Manager server
- Helps to avoid out-of-disk space conditions on client file systems


To migrate a file, HSM sends a copy of the file to a Tivoli Storage Manager server and replaces the original file with a stub file on the local file system. A stub file is a small file that contains the information required to locate and recall a migrated file from the Tivoli Storage Manager server. It also makes it appear as though the file still resides on your local file system. Similar to backups and archives, migrating a file does not change the access time (atime) or permissions for that file.
SONAS storage management policies control and automate the migration of files between storage pools and external storage. A feature of automatic migration is the premigration of eligible files. When file system utilization exceeds the defined threshold, the HSM client detects this condition and begins to automatically migrate eligible files to the Tivoli Storage Manager server. This migration process continues until the file system utilization falls below the defined low threshold value. At that point, the HSM client begins to premigrate files.
To premigrate a file, HSM copies the file to Tivoli Storage Manager storage and leaves the original file intact on the local file system (that is, no stub file is created). An identical copy of the file resides both on the local file system and in Tivoli Storage Manager storage. The next time migration starts for this file system, HSM can quickly change premigrated files to migrated files without having to spend time copying the files to Tivoli Storage Manager storage. HSM verifies that the files have not changed since they were premigrated and replaces the copies of the files on the local file system with stub files. When automatic migration is performed, premigrated files are processed before resident files because this allows space to be freed in the file system more quickly.
A file managed by HSM can be in one of the following states:
- Resident: a resident file resides on the local file system. For example, a newly created file is a resident file.
- Migrated: a migrated file has been copied from the local file system to Tivoli Storage Manager storage and replaced with a stub file.
- Premigrated: a premigrated file has been copied from the local file system to Tivoli Storage Manager storage but has not been replaced with a stub file; an identical copy of the file resides both on the local file system and in Tivoli Storage Manager storage. A file can be in the premigrated state after premigration, and a file that is recalled but not modified is also in the premigrated state.

To return a migrated file to your workstation, access the file in the same way as you might access a file that resides on your local file system. The HSM recall daemon automatically recalls the migrated file from Tivoli Storage Manager storage. This process is referred to as transparent recall.


6.3 Snapshots
SONAS offers filesystem level snapshots that allow you to create a point in time copy of all the user data in a filesystem. System data and currently existing snapshots are not copied with the snapshot operation. The snapshot function allows other programs, such as backups, to run concurrently with user updates and still obtain a consistent copy of the file system at the time the snapshot copy was created. Snapshots also provide an online backup capability that allows easy recovery from common problems such as accidental deletion of a file, and comparison with older versions of a file. One SONAS cluster supports a maximum of 256 snapshots for each filesystem. When you exceed the 256 snapshot limit you will not be able to create new snapshots and will receive an error until you remove one or more existing snapshots. The SONAS snapshots are space efficient because they only keep a copy of data blocks that have subsequently been changed or have been deleted from the filesystem after the snapshot has been taken.

6.3.1 Snapshot considerations


Snapshots are not copies of the entire file system, so they must not be used as protection against media failure. A snapshot file is independent from the original file: it only contains the user data and user attributes of the original file. For Data Management API (DMAPI) managed file systems, the snapshot is not DMAPI managed, regardless of the DMAPI attributes of the original file, because the DMAPI attributes are not inherited by the snapshot. For example, consider a base file that is a stub file because the file contents have been migrated by Tivoli Storage Manager HSM to offline media: the snapshot copy of the file is not managed by DMAPI because it has not inherited any DMAPI attributes, and consequently referencing a snapshot copy of a Tivoli Storage Manager HSM managed file does not cause Tivoli Storage Manager to initiate a file recall.


6.3.2 VSS snapshot integration


Snapshots can be integrated into a Microsoft Windows environment using the Volume Shadow Copy Service (VSS). For seamless SONAS integration with VSS, the following snapshot naming convention must be followed:
@GMT-yyyy.MM.dd-HH.mm.ss
where the letter groupings indicate a unique date and time. Snapshots created using the CLI automatically adhere to this naming convention. Snapshots that are created with this name are visible in the Previous Versions window of Windows Explorer, as illustrated in Figure 6-5. Note that Windows displays the date and time based on the user's date and time settings.

Figure 6-5 Example Windows Explorer folder previous versions tab
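The snapshot name simply encodes the creation time in GMT. On a Linux system, the same string format can be produced with the standard date command, shown here only to illustrate the naming convention; mksnapshot generates the name automatically.

date -u +@GMT-%Y.%m.%d-%H.%M.%S
# Example output: @GMT-2010.04.09-00.32.43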

6.3.3 Snapshot creation and management


In this section, we show how to create and manage SONAS snapshots using both the command line and the GUI. SONAS snapshot commands create a snapshot of the entire file system at a specific point in time. Snapshots appear in a hidden subdirectory of the root directory called .snapshots.


Creating snapshots from the GUI


To create a snapshot of a sample filesystem called gpfsjt through the SONAS GUI, proceed as follows:
1. Log in to the SONAS management GUI.
2. Select Files → Snapshots.
3. Select the active cluster and the filesystem you want to snapshot, as shown in Figure 6-6.

Figure 6-6 Select cluster and filesystem for snapshot

4. Click the Create new snapshot button.
5. You are prompted for a name for the new snapshot; accept the default name if you want the snapshot to be integrated with Windows VSS previous versions, and click OK to proceed.
6. A task progress indicator window appears, as shown in Figure 6-7. You can monitor task progression using this window.

Figure 6-7 Snapshot task progress indicator

7. You can close the task progress window by clicking the Close button.


8. You will now be presented with the list of available snapshots as shown in Figure 6-8.

Figure 6-8 List of completed snapshot

Creating and listing snapshots from the CLI


You can create snapshots from the SONAS CLI command line using the mksnapshot command, as shown in Figure 6-9.
[SONAS]$ mksnapshot gpfsjt
EFSSG0019I The snapshot @GMT-2010.04.09-00.32.43 has been successfully created.

Figure 6-9 Create a new snapshot

To list all snapshots from all filesystems, you can use the lssnapshot command as shown in Figure 6-10. The command retrieves data regarding the snapshots of a managed cluster from the database and returns a list of snapshots:
[SONAS]$ lssnapshot
Cluster ID Device name Path                     Status Creation                Used (metadata) Used (data) ID Timestamp
72..77     gpfsjt      @GMT-2010.04.09-00.32.43 Valid  09.04.2010 02:32:43.000 16              0           5  20100409023246
72..77     gpfsjt      @GMT-2010.04.08-23.58.37 Valid  09.04.2010 01:59:06.000 16              0           4  20100409023246
72..77     gpfsjt      @GMT-2010.04.08-20.52.41 Valid  08.04.2010 22:52:56.000 64              1           1  20100409023246

Figure 6-10 List all snapshots for all filesystems

Note that the Timestamp field is the same for all snapshots; it indicates the timestamp of the last SONAS database refresh. The lssnapshot command with the -r option forces a refresh of the snapshot data in the SONAS database by scanning all cluster snapshots before retrieving the data for the list from the database.

Removing snapshots
Snapshots can be removed using the rmsnapshot command or from the GUI. For example, to remove a snapshot for filesystem gpfsjt using the command line, proceed as shown in Figure 6-11 using the following steps:
1. Issue the lssnapshot command for filesystem gpfsjt and choose a snapshot to remove by noting that snapshot's name, for example, @GMT-2010.04.08-23.58.37.
2. Issue the rmsnapshot command with the name of the filesystem and the name of the snapshot.
3. To verify that the snapshot has been removed, issue the lssnapshot command again and check that the removed snapshot is no longer present.


[SONAS]$ lssnapshot -d gpfsjt
ClusID Devname Path                     Status Creation                Used (metadata) Used (data) ...
72..77 gpfsjt  @GMT-2010.04.09-00.32.43 Valid  09.04.2010 02:32:43.000 16              0           ...
72..77 gpfsjt  @GMT-2010.04.08-23.58.37 Valid  09.04.2010 01:59:06.000 16              0           ...
72..77 gpfsjt  @GMT-2010.04.08-20.52.41 Valid  08.04.2010 22:52:56.000 64              1           ...

[SONAS]$ rmsnapshot gpfsjt @GMT-2010.04.08-23.58.37

[SONAS]$ lssnapshot -d gpfsjt
ClusID Devname Path                     Status Creation                Used (metadata) Used (data) ...
72..77 gpfsjt  @GMT-2010.04.09-00.32.43 Valid  09.04.2010 02:32:43.000 16              0           ...
72..77 gpfsjt  @GMT-2010.04.08-20.52.41 Valid  08.04.2010 22:52:56.000 64              1           ...

Figure 6-11 Removing snapshots

Scheduling snapshots at regular intervals


To automate the task of creating snapshots at regular intervals, you can create a repeating SONAS task based on the snapshot task template called MkSnapshotCron. For example, to schedule a snapshot every 5 minutes on filesystem gpfsjt, issue the command shown in Figure 6-12.
[SONAS]$ mktask MkSnapshotCron --parameter "sonas02.virtual.com gpfsjt" --minute */5
EFSSG0019I The task MkSnapshotCron has been successfully created.

Figure 6-12 Create a task to schedule snapshots

Note that to create scheduled cron tasks, you must issue the mktask command from the CLI; it is not possible to create cron tasks from the GUI. To list the snapshot task that you have created, you can use the lstask command as shown in Figure 6-13.
[SONAS]$ lstask -t cron
Name           Description                                Status Last run Runs on   Schedule
MkSnapshotCron This is a cronjob for scheduled snapshots. NONE   N/A      Mgmt node Runs at every 5th minute.

Figure 6-13 List scheduled tasks

To verify that snapshots are being taken correctly, you can use the lssnapshot command as shown in Figure 6-14.
[SONAS]$ lssnapshot
Cluster ID Device name Path                     Status Creation                Used (metadata) Used (data) ID
72..77     gpfsjt      @GMT-2010.04.09-03.15.06 Valid  09.04.2010 05:15:08.000 16              0           9
72..77     gpfsjt      @GMT-2010.04.09-03.10.08 Valid  09.04.2010 05:10:11.000 16              0           8
72..77     gpfsjt      @GMT-2010.04.09-03.05.03 Valid  09.04.2010 05:05:07.000 16              0           7
72..77     gpfsjt      @GMT-2010.04.09-03.00.06 Valid  09.04.2010 05:00:07.000 16              0           6
72..77     gpfsjt      @GMT-2010.04.09-00.32.43 Valid  09.04.2010 02:32:43.000 16              0           5
72..77     gpfsjt      @GMT-2010.04.08-20.52.41 Valid  08.04.2010 22:52:56.000 64              1           1

Figure 6-14 List snapshots


Microsoft Windows: viewing previous versions


Snapshots created with a naming convention like @GMT-yyyy.MM.dd-HH.mm.ss are visible in the Previous Versions window of Windows Explorer, as illustrated in Figure 6-15. The snapshots are only visible at the export level. To see the previous versions for an export, follow these steps:
1. Open a Windows Explorer window to see the share for which you want previous versions displayed; in our example, \\10.0.0.21 is the server and sonas21jt is the share.
2. Right-click the sonas21jt share name to bring up the sonas21jt share properties window, as shown in step (1) in the diagram.
3. Double-click a timestamp for which you want to see the previous versions, Today, April 09, 2010, 12:15 PM in our case, as shown in step (2) in the diagram.
4. You are now presented with a panel (3) showing the previous versions of files and directories contained in the sonas21jt folder.

Figure 6-15 Microsoft Windows - viewing previous versions

6.4 Local and remote replication


Data replication functions create a second copy of the file data and are used to offer a certain level of protection against data unavailability. Replication generally offers protection against component unavailability, such as a missing storage device or storage pod, but does not offer protection against logical file data corruption. When we replicate data, we usually want to send it a reasonable distance away as protection against a hardware failure or a disaster event that makes the primary copy of the data unavailable; in the case of disaster protection, this usually means sending data to a remote site at a reasonable distance from the primary site.


6.4.1 Synchronous versus asynchronous replication


Data replication can occur in two ways, depending on when the acknowledgement is returned to the writing application: it can be synchronous or asynchronous. With synchronous replication, both copies of the data are written to their respective storage repositories before an acknowledgement is returned to the writing application. With asynchronous replication, one copy of the data is written to the primary storage repository and an acknowledgement is returned to the writing application; only subsequently is the data written to the secondary storage repository. Asynchronous replication can be further broken down into continuous or periodic replication, depending on the frequency with which batches of updates are sent to the secondary storage. The replication taxonomy is illustrated in Figure 6-16.

Figure 6-16 Replication types (diagram: replication is either synchronous or asynchronous; asynchronous replication is further divided into periodic and continuous)

Asynchronous replication is normally used when the additional latency due to distance becomes problematic because it causes an unacceptable elongation of response times for the primary application.

6.4.2 Block level versus file level replication


Replication can occur at various levels of granularity: it is block level when we replicate a disk or LUN, and file level when we replicate files or a portion of a file system, such as a directory or a fileset. File level replication can then be either stateless or stateful. Stateless file replication occurs when we replicate a file to a remote site and then lose track of it, whereas stateful replication tracks and coordinates updates made to the local and remote files so as to keep the two copies of the file in sync.

6.4.3 SONAS cluster replication


Replication can occur inside one single SONAS cluster or between a local SONAS cluster and a remote SONAS cluster. The term intracluster replication refers to replication between storage pods in the same SONAS cluster, whereas intercluster replication occurs between one SONAS cluster and a remote destination that can be a separate SONAS cluster or a file server. With intracluster replication, the application does not need to be aware of the location of the file, and failover is transparent to the application itself. With intercluster replication, the application needs to be aware of the file's location and needs to connect to the new location to access the file.


Figure 6-17 shows two SONAS clusters with file1 replicated using intracluster replication and file2 replicated with intercluster replication.
Figure 6-17 Replication options (diagram: file1 is replicated between storage pods within SONAS cluster#1 using intracluster replication over a local/campus distance; file2 is replicated from SONAS cluster#1 to the remote SONAS cluster#2 using intercluster replication over a geographic distance)

Table 6-1 shows the possible SONAS replication scenarios.


Table 6-1 SONAS replication solutions
Type         Intracluster or intercluster Stateful or stateless Local or remote distance
synchronous  intracluster                 stateful              local
asynchronous intercluster                 stateless             remote

6.4.4 Local synchronous replication


Local synchronous replication is implemented within a single SONAS cluster, so it is defined as intracluster replication. Synchronous replication protects against the total loss of a whole storage building block or storage pod, and it is implemented by writing all data blocks to two storage building blocks that are part of two separate GPFS failure groups. Currently, synchronous replication applies to an entire filesystem, not to individual filesets.
Because the writes are acknowledged to the application only when both writes have completed, write performance is dictated by the slower storage building block. High latencies degrade performance, and therefore this is a short distance replication mechanism. Synchronous replication requires an InfiniBand connection between both sites, and an increase in distance decreases performance.
Another use case is protection against the total loss of a complete site. In this scenario a complete SONAS cluster (including interface and storage nodes) is split across two sites. The data is replicated between both sites, so that every block is written to a building block on both sites. For proper operation the administrator has to define correct failure groups; for the two-site scenario we need one failure group for each site. As of release 1.1 this use case is not completely applicable because all InfiniBand switches reside in the same rack, and unavailability of this rack stops SONAS cluster communications.


Synchronous replication does not distinguish between the two storage copies: both are peers. SONAS does not have a preferred failure group concept where it sends all reads to one copy; reads are sent to disks in both failure groups. Synchronous replication in the SONAS filesystem offers the following replication choices:
- No replication at all
- Replication of metadata only
- Replication of data and metadata
It is best that metadata replication always be used for file systems within a SONAS cluster. Synchronous replication can be established at file system creation time or later, when the filesystem already contains data. Depending on when replication is applied, various procedures must be followed to enable synchronous replication.
Synchronous replication requires that the disks belong to two distinct failure groups so that the data and metadata are not replicated to the same physical disks. It is best that the failure groups be defined on separate storage enclosures and storage controllers to guarantee the possibility of failover in the case that a physical disk component becomes unavailable. Synchronous replication has the following prerequisites:
- Two separate failure groups must be present.
- The two failure groups must have the same number of disks.
- The same number of disks from each failure group, with the same disk usage type, must be assigned to the filesystem.

Establishing synchronous replication at filesystem creation


Synchronous replication across failure groups can be established as an option at filesystem creation time using either the GUI or the mkfs CLI command with the -R option. This option sets the level of replication used in the file system and can have one of the following values:
- none, which means no replication at all
- meta, which indicates that the file system metadata is synchronously mirrored
- all, which indicates that the file system data and metadata are synchronously mirrored
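A minimal sketch follows. Only the -R option is described here, so the positional device name and mount point arguments are assumptions based on the naming used elsewhere in this chapter; check the mkfs command help before use.

# Create a filesystem with data and metadata synchronously replicated from the start (arguments assumed)
mkfs gpfsjt /ibm/gpfsjt -R all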

Establishing synchronous replication after filesystem creation


Establishing synchronous replication after file system creation cannot be done using the GUI; it requires the CLI. To enable synchronous replication, carry out the following two steps:
1. Enable synchronous replication with the change filesystem chfs command, specifying the -R option.
2. Redistribute the filesystem data and metadata using the restripefs command.
The following section shows how to enable synchronous replication on an existing filesystem called gpfsjt.


We use lsdisk to see the available disks and lsfs to see the filesystems as shown in Figure 6-18.
[SONAS]$ lsdisk
Name     File system Failure group Type            Pool     Status Availability Timestamp
gpfs1nsd gpfs0       1             dataAndMetadata system   ready  up           4/12/10 3:03 AM
gpfs2nsd gpfs0       1             dataAndMetadata system   ready  up           4/12/10 3:03 AM
gpfs3nsd gpfsjt      1             dataAndMetadata system   ready  up           4/12/10 3:03 AM
gpfs4nsd             1             dataAndMetadata userpool ready               4/13/10 1:55 AM
gpfs5nsd             1             dataAndMetadata system   ready               4/13/10 1:55 AM
gpfs6nsd             2             dataAndMetadata userpool ready               4/13/10 1:55 AM

[SONAS]$ lsfs
Cluster Devicename Mountpoint  .. Data replicas Metadata replicas Replication policy Dmapi
sonas02 gpfs0      /ibm/gpfs0  .. 1             1                 whenpossible       F
sonas02 gpfsjt     /ibm/gpfsjt .. 1             1                 whenpossible       T

Figure 6-18 Disks and filesystem before replication

Using the example in Figure 6-18, we verify the number of disks currently assigned to the gpfsjt filesystem in the lsdisk output and see that only one disk, gpfs3nsd, is used. To create the synchronous replica, we need the same number of disks as are currently assigned to the filesystem. From the lsdisk output, we also verify that there is a sufficient number of free disks that are not assigned to any filesystem. We use the disk called gpfs5nsd to create the data replica. The disk gpfs5nsd is currently in failure group 1, the same failure group as the primary disk, so we must assign it to a separate failure group 2 using the chdisk command, as shown in Figure 6-19, and then verify the disk status with lsdisk. Also verify that the new disk, gpfs5nsd, is in the same pool as the current disk gpfs3nsd.
[SONAS]$ chdisk gpfs5nsd --failuregroup 2
EFSSG0122I The disk(s) are changed successfully!

[SONAS]$ lsdisk
Name      File system  Failure group  Type             Pool      Status  Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system    ready   up            4/12/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system    ready   up            4/12/10 3:03 AM
gpfs3nsd  gpfsjt       1              dataAndMetadata  system    ready   up            4/12/10 3:03 AM
gpfs4nsd               1              dataAndMetadata  userpool  ready                 4/13/10 2:15 AM
gpfs5nsd               2              dataAndMetadata  system    ready                 4/13/10 2:15 AM
gpfs6nsd               2              dataAndMetadata  userpool  ready                 4/13/10 2:15 AM

Figure 6-19 Assign a new failure group to a disk


At this point we add the new disk to file system gpfsjt using the chfs -add command as illustrated in Figure 6-20 and verify the outcome using the lsdisk command.
[SONAS]$ chfs gpfsjt -add gpfs5nsd
The following disks of gpfsjt will be formatted on node mgmt001st002.virtual.com:
    gpfs5nsd: size 1048576 KB
Extending Allocation Map
Checking Allocation Map for storage pool 'system'
  52 % complete on Tue Apr 13 02:22:03 2010
 100 % complete on Tue Apr 13 02:22:05 2010
Completed adding disks to file system gpfsjt.
mmadddisk: Propagating the cluster configuration data to all affected nodes.
This is an asynchronous process.
EFSSG0020I The filesystem gpfsjt has been successfully changed.

[SONAS]$ lsdisk
Name      File system  Failure group  Type             Pool      Status  Availability  Timestamp
gpfs1nsd  gpfs0        1              dataAndMetadata  system    ready   up            4/12/10 3:03 AM
gpfs2nsd  gpfs0        1              dataAndMetadata  system    ready   up            4/12/10 3:03 AM
gpfs3nsd  gpfsjt       1              dataAndMetadata  system    ready   up            4/12/10 3:03 AM
gpfs5nsd  gpfsjt       2              dataAndMetadata  system    ready   up            4/13/10 2:26 AM
gpfs4nsd               1              dataAndMetadata  userpool  ready                 4/13/10 2:26 AM
gpfs6nsd               2              dataAndMetadata  userpool  ready                 4/13/10 2:26 AM

[SONAS]$ lsfs
Cluster  Devicename  Mountpoint   ..  Data replicas  Metadata replicas  Replication policy  Dmapi
sonas02  gpfs0       /ibm/gpfs0   ..  1              1                  whenpossible        F
sonas02  gpfsjt      /ibm/gpfsjt  ..  1              1                  whenpossible        T

Figure 6-20 Add a disk to a filesystem

From the lsdisk output, we can see that gpfs5nsd is assigned to filesystem gpfsjt, and from the lsfs output, we notice that we still only have one copy of data and metadata as shown in the Data replicas and Metadata replicas columns. To activate data and metadata replication, we need to execute the chfs -R command as shown in Figure 6-21.
[SONAS]$ chfs gpfsjt -R all
EFSSG0020I The filesystem gpfsjt has been successfully changed.

[SONAS]$ lsfs
Cluster  Devicename  Mountpoint   ..  Data replicas  Metadata replicas  Replication policy  Dmapi
sonas02  gpfs0       /ibm/gpfs0   ..  1              1                  whenpossible        F
sonas02  gpfsjt      /ibm/gpfsjt  ..  2              2                  whenpossible        T

Figure 6-21 Activate data replication

The lsfs command now shows that there are two copies of the data in the gpfsjt filesystem. Now we perform the restripefs command with the replication switch to redistribute data and metadata as shown in Figure 6-22.


[SONAS]$ restripefs gpfsjt --replication
Scanning file system metadata, phase 1 ...
Scan completed successfully.
Scanning file system metadata, phase 2 ...
  64 % complete on Thu Apr 15 23:11:00 2010
  85 % complete on Thu Apr 15 23:11:06 2010
 100 % complete on Thu Apr 15 23:11:09 2010
Scan completed successfully.
Scanning file system metadata, phase 3 ...
Scan completed successfully.
Scanning file system metadata, phase 4 ...
Scan completed successfully.
Scanning user file metadata ...
EFSSG0043I Restriping of filesystem gpfsjt completed successfully.
[root@sonas02.mgmt001st002 dirjt]#

Figure 6-22 Restripefs to activate replication

SONAS does not offer any command to verify that the file data is actually being replicated. To verify the replication status, connect to SONAS as a root user and issue the mmlsattr command with the -L switch as illustrated in Figure 6-23. The report shows the metadata and data replication status; we can see that we have two copies for both metadata and data.

[root@sonas02.mgmt001st002 userpool]# mmlsattr -L *
file name:            f1.txt
metadata replication: 2 max 2
data replication:     2 max 2
immutable:            no
flags:
storage pool name:    system
fileset name:         root
snapshot name:

file name:            f21.txt
metadata replication: 2 max 2
data replication:     2 max 2
immutable:            no
flags:
storage pool name:    userpool
fileset name:         root
snapshot name:

Figure 6-23 Verify that file data is replicated

Filesystem synchronous replication can also be disabled using the chfs command, as shown in the following example:
chfs gpfsjt -R none
After changing the filesystem attributes, the restripefs command must be issued to remove the replicas of the data, as shown in the following example:
restripefs gpfsjt --replication


6.4.5 Remote async replication


The ability to continue operations in the face of a regional disaster is handled through the async replication mechanism of the SONAS appliance. Async replication allows one or more file systems within a SONAS file name space to be defined for replication to another SONAS system over the customer network infrastructure. As the name async implies, files created, modified, or deleted at the primary location are propagated to the remote system some time after the change of the file in the primary system. The async replication process looks for files in a defined file system of the source SONAS that have changed since the last replication cycle was started against it, and uses the rsync tool to efficiently move only the changed portions of a file from one location to the other. In addition to the file contents, all extended attribute information about the file is also replicated to the remote system. Async replication is defined in a single direction, such that one site is considered the source of the data and the other is the target, as illustrated in Figure 6-24. The replica of the file system at the remote location must be used in read-only mode until it needs to become usable in the event of a disaster.

Figure 6-24 Async replication source and target (file tree A on SONAS cluster #1 at the local site is replicated by rsync over a geographic distance to a file tree A replica on SONAS cluster #2)

The SONAS interface nodes are the elements that perform the replication functions. When using async replication, the SONAS system detects the modified files on the source system and moves only the changed contents of each file to the remote destination to create an exact replica. By moving only the changed portions of each modified file, the network bandwidth is used very efficiently. The file-based movement allows the source and destination file trees to be of differing sizes and configurations, as long as the destination file system is large enough to hold the contents of the files from the source. Async replication allows all or portions of the data of a SONAS system to be replicated asynchronously to another SONAS system; in the event of an extended outage or loss of the primary system, the data kept by the backup system is accessible in R/W mode by the customer applications. Async replication also offers a mechanism to replicate the data back to the primary site after the outage ends or a new system is restored.


The backup system also offers concurrent R/O access to the copy of the primary data for testing and validation of the disaster recovery mirror. The data at the backup system can be accessed by all of the protocols in use on the primary system. You can take an R/W snapshot of the replica, which can be used to allow for full function disaster recovery testing against the customer's applications. Typically, the R/W snapshot is deleted after the disaster recovery test has concluded. File shares defined at the production site are not automatically carried forward to the secondary site and must be manually redefined by the customer for the secondary location. These shares must be defined as R/O until production work needs to run against the remote system in full R/W, for example, for business continuance in the face of a disaster. Redefinition to R/W shares can be done by using the CLI or GUI. The relationship between the primary and secondary site is on a 1:1 basis: one primary and one secondary site. The scope of an async replication relationship is a file system. Best practices need to be followed to ensure that the HSM systems are configured and managed to avoid costly performance impacts during the async replication cycles; these impacts can occur when a file has been migrated to offline storage before being replicated and must be recalled from offline storage for replication to occur.

User authentication and mapping requirements


Async replication requires coordination of the customer's Windows SID domain information with the UID/GID mapping internal to the SONAS cluster, because the ID mapping from the Windows domain to the UNIX UID/GID is not exchanged between the SONAS systems. As the mappings are held external to the SONAS system in LDAP, NIS, or AD with Microsoft SFU, the external customer servers hold the mapping information and must have coordinated resolution between the primary and secondary sites. Async replication is only usable for installations that use LDAP, NIS, or AD with the SFU extensions. Note that standard AD, without SFU, is not sufficient. The reason is that async replication can only move the files and their attributes from one site to the other. Therefore, the UID/GID information which GPFS maintains is carried forward to the destination. However, Active Directory only supplies a SID (Windows authentication ID), and the CIFS server inside the SONAS maintains a mapping table of this SID to the UID/GID kept by GPFS. This CIFS server mapping table is not carried forward to the destination SONAS. Given this, when users attempt to talk to the SONAS at the remote site, they do not have a mapping from their Active Directory SID to the UID/GID of the destination SONAS, and their authentication does not work properly; for example, users might map to the wrong users' files. LDAP, NIS, and AD with SFU maintain the SID to UID/GID mapping external to the SONAS; therefore, as long as the authentication mechanism is visible to the SONAS at both the source and the destination site, there is no conflict with the users and groups. The following assumptions are made for the environment supporting async replication:
- One of the following authentication mechanisms is in place: an LDAP or AD with SFU environment which is resolvable across the sites, or is mirrored/consistent across the sites, such that the SONAS at each site is able to authenticate from each location.
- The authentication mechanism is the same across both locations.
- The time synchronization across both sites is sufficient to allow for successful authentication.


Async replication considerations


This section highlights key considerations of async replication design and operation that need to be well understood:
- At release 1.1.1, it is best to limit the files in a filesystem that uses async replication to approximately 60 million files, to limit scan time and avoid scalability issues.
- Replication is done on a filesystem basis, and filesets on the source SONAS cluster do not retain their fileset information on the destination SONAS cluster. The file tree on the source is replicated to the destination, but the fact that it is a fileset, or any quota information, is not carried forward to the destination cluster's file tree.
- The path to the source and destination locations given to the underlying cnreplicate CLI command must not contain ':', '\', '\n', or any white space characters. The underlying paths within the directory tree being replicated are allowed to have them.
- Moving large amounts of data, such as the first async replication of a large existing file system or the failback to an empty SONAS after a disaster, takes large amounts of time and network bandwidth. Other means of restoring the data, such as a physical restore from a backup, are a preferred means of populating the destination cluster to greatly reduce the restore time and reduce the burden on the network.
- Disk I/O performance is driven by GPFS and its ability to load balance across the nodes participating in the file system. Async replication performance is driven by metadata access for the scan part, and customer data access for the rsync movement of data. The number and classes of disks for metadata and customer data are an important part of the overall performance.
- Tivoli Storage Manager HSM stub files are replicated as regular files, and an HSM recall is performed for each file, so they can be omitted using the command line.

HSM in an async replication environment


Async replication can co-exist with SONAS file systems being managed by the Tivoli Storage Manager HSM software, which seamlessly moves files held within a SONAS file system to and from a secondary storage media such as tape. The key concept is that the Tivoli Storage Manager HSM client hooks into the GPFS file system within the SONAS to replace a file stored within the SONAS with a stub file, which appears to the end user as though the file still exists within the SONAS GPFS file system on disk, but has actually been moved to the secondary storage device. Upon access to the file, the Tivoli Storage Manager HSM client suspends the GPFS request for data within the file until it can retrieve the file from the secondary storage device and place it back within the SONAS primary storage, at which point the file can be accessed directly again by the end users through the SONAS. The primary function of this is to allow the capacity of the primary storage to be less than the actual amount of data it is holding, using the secondary (cheaper/slower) storage to retain the overflow of data. The following list has key implications of using the HSM functionality with file systems being backed up for disaster recovery purposes with async replication:

Source and destination primary storage capacities


The primary storage on the source and destination SONAS systems needs to be reasonably balanced in terms of capacity. Because HSM allows for the retention of more data than the primary storage capacity, and async replication is a file-based replication, planning must be done to ensure the destination SONAS system has enough storage to hold the entire contents of the source data (both primary and secondary storage).


HSM management at destination


If the destination system uses HSM management of the SONAS storage, enough primary storage at the destination needs to be provided to ensure that the change delta can be replicated into its primary storage as part of the DR process. If the movement of the data from the destination location's primary to secondary storage is not fast enough, the replication process can outpace this movement, causing a performance bottleneck in completing the disaster recovery cycle. Therefore, the capacity of the destination system to move data to the secondary storage needs to be sufficiently configured to ensure that enough data has been pre-migrated to the secondary storage to account for the next async replication cycle, so that the amount of data to be replicated can be accommodated without waiting for movement to secondary storage. For example, enough Tivoli Storage Manager managed tape drives need to be allocated and operational, with enough media, to ensure that enough data can be moved from the primary storage to tape to make space available for the next wave of replicated data.

Replication intervals with HSM at source location


Planning needs to be done to ensure that the frequency of the async replication is such that the changed data at the source location is still in primary storage when the async process is initiated. This requires a balance between the source primary storage capacity, the change rate of the data, and the frequency of the async replication scan intervals. If changed data is moved from primary to secondary storage before the async process can replicate it to the destination, the next replication cycle needs to recall it from the secondary storage back to the primary in order to copy it to the destination. The number of files that need to be recalled back into primary storage, and the time to move them back, directly impact the time which the async process needs to finish replicating.

SONAS async replication configurations


For business continuance in a disaster, SONAS supports asynchronous replication between two SONAS systems in a 1:1 relationship. The SONAS systems are distinct from one another, such that they are independent clusters with a non-shared InfiniBand infrastructure, separate interface, storage, and management nodes, and so on. The connectivity between the systems is over the customer network, between the customer-facing network adapters in the interface nodes. The local and remote SONAS systems do not require the same hardware configuration in terms of nodes or disks; only the space at the secondary site needs to be sufficient to contain the data replicated from the primary site. The systems must be capable of routing network traffic between one another using the customer supplied IP addresses or fully qualified domain names on the interface nodes.

Async replication in a single direction


There are two primary disaster recovery topologies for a SONAS system. The first is where the second site is a standby disaster recovery site, such that it maintains a copy of file systems from the primary location only. It can be used for testing purposes, for continuing production in a disaster, or for restoring the primary site after a disaster.


Figure 6-25 illustrates the relationship between the primary and secondary sites for this scenario.

Figure 6-25 Async replication with single active direction (users at the local site access file tree A on SONAS cluster #1, which is replicated by rsync to a file tree A replica on SONAS cluster #2 at the remote site for DR users; both sites share AD with SFU, LDAP, or NIS authentication)

Async replication in two active directions


The second scenario, shown in Figure 6-26, is when the second site exports shares of a file system in addition to holding mirrors of a file tree from the primary site. In this scenario the SONAS at both sites is used for production I/O, in addition to being the target mirror for the other SONAS system's file structure. This can be in both directions, such that both SONAS systems have their own file trees in addition to having the file tree of the other; or it might be that both have their own file tree, and only one has the mirror of the other.

Figure 6-26 Bidirectional async replication and snapshots (user group A works against file tree A on SONAS cluster #1 and user group B against file tree B on SONAS cluster #2; each cluster holds a replica and replica snapshot of the other cluster's file tree, with rsync running in both directions under a common AD with SFU, LDAP, or NIS authentication)

Async replication configuration


The asynchronous replication code runs on the management and interface nodes. The configuration of async replication must be coordinated between the destination SONAS system and the source SONAS system. Asynchronous replication processes run on one or more nodes in the cluster.


This is done through administration commands; you start on the destination SONAS system:
1. Define the source SONAS system to the destination SONAS:
   cfgrepl sourcecluster -target
   Where sourcecluster is the hostname or IP address of the source cluster's management node.
2. Define the file tree target on the destination SONAS to hold the source SONAS file tree. This creates the directory on the destination SONAS to be used as the target of the data for this replication relationship:
   mkrepltarget path sourcecluster
   Where path is the file system path on the destination SONAS to be used to hold the contents of the source SONAS file tree, and sourcecluster is the hostname or IP address of the source cluster's management node (matching the one provided to the cfgrepl command).
After the destination SONAS system is defined, the source SONAS needs to be configured through the following administrative actions:
1. Configure the async relationship on the source SONAS cluster:
   cfgrepl targetcluster {-n count | --pairs source1:target1[,source2:target2 ...]} --source
   Where:
   - targetcluster is the hostname or IP address of the target cluster's management node.
   - count is the number of node pairs to use for replication.
   - pairs is the explicit mapping of the source/destination node pairs to use for replication.
2. Define the relationship of the source file tree to the target file tree:
   cfgreplfs filesystem targetpath
   Where filesystem is the source file tree to be replicated to the destination, and targetpath is the full path on the destination where the replica of the source is to be made.
The configuration of the async replication determines how the system performs the mirroring of the data for disaster recovery. The configuration step identifies which SONAS nodes participate in the replication for the source and destination systems. At least one source and target pair must be specified with the cfgrepl CLI command; multiple pairs can be entered separated by commas. When setting up replication using this command, the following restrictions are in place:
- All source nodes must be in the same cluster.
- The IP addresses of the source nodes must be the internal IP addresses associated with the InfiniBand network within the SONAS.
- All target nodes must be in the same cluster.
- The IP addresses of the target nodes must be the public IP addresses of the interface nodes that CTDB controls.
- Source and target cannot be in the same cluster.
- The first source node specified controls the replication, and is considered the replication manager node.
- Multiple source nodes can replicate to the same destination.
An end-to-end configuration sketch follows.
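The following is a minimal sketch of this configuration sequence from end to end, using the command and option forms described above. The hostnames, the node pair addresses, the file system name, and the target path are hypothetical placeholders; substitute the values for your own clusters and remember that the source address must be an internal InfiniBand IP and the target address a CTDB-controlled public interface node IP.

# On the destination (DR) SONAS system -- source management node hostname is a placeholder
cfgrepl sonasprod-mgmt.example.com -target
mkrepltarget /ibm/gpfsdr/asyncjt sonasprod-mgmt.example.com

# On the source (production) SONAS system -- target hostname and node pair are placeholders
cfgrepl sonasdr-mgmt.example.com --pairs 10.254.1.11:203.0.113.21 --source
cfgreplfs gpfsjt /ibm/gpfsdr/asyncjt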


The cfgrepl command creates a configuration file, /etc/asnc_repl/arepl_table.conf, which contains the information provided, with the following internal structure:
src_addr1   dest_addr1
src_addr2   dest_addr2
src_addr3   dest_addr3

Part of the async configuration needs to ensure that the source cluster can communicate with the destination cluster without being challenged with SSH/scp password requests. To achieve this, the SSH key from the id_rsa.pub file of the destination SONAS system needs to be added to the authorized_keys file of the source nodes participating in the async operation. A sketch of this key exchange follows.
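A minimal sketch of that key exchange is shown below. It assumes root access on both clusters, as noted above; the node names and file paths are placeholders, and the principle is simply appending the destination system's public key to each participating source node's authorized_keys file.

# Run from the destination SONAS management node (node names are placeholders).
for node in int001st001 int002st001; do
    cat /root/.ssh/id_rsa.pub | \
        ssh root@${node} 'mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys'
done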

Async replication operation


The primary function of the async replication is to make a copy of the customer data, including file system metadata, from one SONAS system to another over a standard IP network. The design also attempts to minimize network bandwidth usage by moving only the portions of each file which have been modified to the destination system. The primary elements of the async replication operation include:
- SONAS code, which performs key replication tasks such as scanning for changed files, removing files on the destination which are deleted at the source, and recovery and retry of failures.
- The UNIX rsync replication tool, which compares the source and destination files for differences and moves and writes only the delta information to the destination, to ensure that the destination matches the source.

Async replication process


The main steps involved in the async replication process are enumerated here:
1. Create a local snapshot of the source filesystem.
2. Scan and collect a full file path list with the stat information.
3. Build a new, changed, and deleted file and directory list, including hard links.
4. Distribute rsync tasks among the defined nodes configured to participate in async replication.
5. Remove deleted files and create hard links on the remote site.
6. Create a remote snapshot of the replica file system if indicated in the async command.
7. Remove the local snapshot if created from the specified async command.
Async replication, by default, creates a local snapshot of the file tree being replicated and uses the snapshot as the source of the replication to the destination system. This is the preferred method because it creates a well defined point-in-time of the data being protected against a disaster. The scan and resulting rsync commands must be invoked against a stable, non-changing file tree, which provides a known state of the files to be coordinated with the destination. Async replication does have a parameter which tells the system to skip the creation of the snapshot of the source, but then the scan and following rsync are performed on changing files. This has the following implications:
- Inconsistent point-in-time value of the destination system, as changes to the tree during the async process might cause files scanned and replicated first to be from an earlier state than the files later in the scan.
- Files changed after the scan cycle has taken place are omitted from the replication.
- A file can be in flux during the rsync movement.


The name of the snapshot is based on the path to the async replication directory on the destination system, with the extension _cnreplicate_tmp appended to it. For example, if the destination file tree for async is /ibm/gpfsjt/async, then the resulting snapshot directory is created in the source file system as /ibm/gpfs0/.snapshots/ibm_gpfsjt_async_cnreplicate_tmp. These snapshots exist alongside any other snapshots created by the system as a part of a user request. The async replication tool ensures that it only operates on snapshots it created with its own naming convention. These snapshots do count towards the 256 snapshot limit per file system, and must therefore be accounted for with the other snapshots used by the system. After the successful completion of async replication, the snapshot created in the source file system is removed. After the completion of the async replication, a snapshot of the filesystem containing the replica target is also taken. The name of this snapshot is based on the destination path to the async replication directory with the extension _cnreplicate_tmp appended to it. As with the source snapshots, these snapshots exist alongside any other snapshots created by the system as a part of a user request, the async replication tool only operates on snapshots it created with this naming convention, and they count towards the 256 snapshot limit per file system.

Replication frequency and Recovery Point Objective considerations


To ensure that data at the remote SONAS site is as current as possible and has a small Recovery Point Objective (RPO), it seems natural to run the async replication as frequently as possible. The frequency of the replication needs to take into account a number of factors, including these:
- The change rate of the source data
- The number of files contained within the source file tree
- The network between SONAS systems, including bandwidth, latency, and sharing aspects
- The number of nodes participating in the async replication
A replication cycle has to complete before a new cycle can be started. The key metric in determining the time it takes for a replication cycle to complete is the time it takes to move the changed contents of the source to the destination, based on the change rate of the data and the network capabilities. For example, a 10 TB file tree with a 5% daily change rate needs to move 500 GB of data over the course of a day (5.78 MB/s average over the day). Note that actual daily change rates are probably not consistent over the 24 hour period, so calculations must be based on the maximum change rate per hour over the day. The required network bandwidth to achieve this is based on the RPO. With an RPO of 1 hour, enough network bandwidth is needed to ensure that the maximum change rate over the day can be replicated to the destination in under an hour. Part of the async replication algorithm is the determination of the changed files, which can be a CPU and disk intensive process that must be accounted for as part of the impact. Continually running replications below the required RPO can cause undue impact to other workloads using the system. A small sizing sketch based on this example follows.
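The following is a rough sizing sketch of that calculation, using the 10 TB / 5% figures from the example above. The peak-hour multiplier is an assumption introduced for illustration only, not a SONAS parameter; adjust it to your own measured change profile.

# Back-of-the-envelope bandwidth estimate for async replication sizing.
# Inputs: file tree size (TB), daily change rate (%), assumed peak-hour factor, RPO (hours)
awk -v tree_tb=10 -v change_pct=5 -v peak_factor=2 -v rpo_h=1 'BEGIN {
    changed_gb = tree_tb * 1000 * change_pct / 100         # ~500 GB changed per day
    avg_mbps   = changed_gb * 1000 / 86400                 # ~5.8 MB/s if spread evenly
    peak_gb_h  = changed_gb / 24 * peak_factor             # assumed busiest-hour change
    rpo_mbps   = peak_gb_h * 1000 / (rpo_h * 3600)         # MB/s needed to meet the RPO
    printf "daily change: %.0f GB, daily average: %.1f MB/s, needed for %g h RPO: %.1f MB/s\n",
           changed_gb, avg_mbps, rpo_h, rpo_mbps
}'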


Async replication scenarios


Before performing async replication, verify that the following conditions are met:
- Ensure you have consistent Active Directory with SFU, or LDAP, authentication across the sites participating in the disaster recovery environment. Mapping of users across both sites needs to be consistent from the Windows SID domain to the UNIX UID/GID.
- Ensure sufficient storage at the destination for holding the replica of the source file tree and the associated snapshots.
- The network between source and destination needs to be capable of supporting SSH connections and rsync operations.
- The network between the source and destination interface nodes needs sufficient bandwidth to account for the change rate of the data being modified at the source between replicas, and the required RTO/RPO objectives to meet the disaster recovery criteria.
- Define the async relationship between the interface nodes of the source and destination, define the target filesystem, and create the source/destination file system relationship with the cfgrepl, mkrepltarget, and cfgreplfs commands.

Performing async replications


The following are the considerations and actions to protect the data against an extended outage or disaster at the primary location. The protection is accomplished by carrying out async replications between the source and destination systems:
- Perform async replication between the source and destination SONAS systems. Replication can be carried out manually or by a scheduled operation.
- Manually invoke the startrepl command to initiate an async replication cycle against the directory tree structure specified in the command for the source and destination locations.
- Define an automated schedule for the async replication to be carried out by the system on the defined directory tree structures.
- Monitor the status of the current and previous async replication processes to ensure a successful completion. Async replication raises a CIM indication to the Health Center, which can be configured to generate SMTP and/or SNMP alerts.
A minimal invocation sketch follows.
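As a minimal sketch, a manual replication cycle for the file system configured earlier might be started as shown below. The startrepl command name and its -S (scan-only) parameter are taken from this chapter, but the exact argument form is not documented here, so treat the file system argument as an assumption and confirm the syntax in the command reference.

# Hypothetical invocation -- argument form assumed, see the note above
startrepl gpfsjt
# Scan-only baseline pass, as used in the recovery procedures later in this chapter
startrepl -S gpfsjt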

Disaster recovery testing


- Define shares as R/O to the destination file tree for accessing file resources at the destination.
- Modification of the destination file tree as part of the validation of data or testing of DR procedures must not be done. Changes to the destination file tree are not tracked, and cause the destination to differ from the source.
- FTP, HTTP, and SCP shares cannot be created R/O, and are a risk factor in being able to modify the target directory tree. Note that modifications to the target directory tree are not tracked by the DR recovery process, and can lead to discrepancies between the source and target file tree structures.
- You must access the disaster recovery location file structure as read-only.
- You must create the shares at the destination site which are to be used to access the data from the disaster recovery location.


Business continuance
The steps for enabling the recovery site involve the following major components:
1. Perform a baseline file scan of the file tree replica used as the target for the async replication.
2. Define shares/exports to the file tree replica.
3. Continue production operation against the remote system.
The baseline scan establishes the state of the remote system files which was last received from the production site, and tracks the changes made from this point forward. For the configuration where the secondary site was strictly a backup for the production site, establishing the defined shares for the replica to enable it for production is the primary consideration. Figure 6-27 illustrates this scenario.

Figure 6-27 Business continuance, active-passive, production site failure (SONAS cluster #1 at the local site has failed; DR users access the file tree A replica and its snapshot on SONAS cluster #2 at the remote site, with common AD with SFU, LDAP, or NIS authentication)

If the second site contained its own production file tree in addition to replicas, then the failure also impacts the replication of its production file systems back to the first site as illustrated in Figure 6-28.
Figure 6-28 Business continuance, active-active, production site failure (the failed SONAS cluster #1 held file tree A plus the replica of file tree B; user group B continues on SONAS cluster #2, which holds file tree B and the replica of file tree A, under a common AD with SFU, LDAP, or NIS authentication)

214

IBM Scale Out Network Attached Storage: Architecture, Planning, and Implementation Basics

The steps to recover at the disaster recovery site are as follows:
1. Run the startrepl command with the -S parameter to run a scan only on the destination system, to establish a point in time of the current file tree structure. This allows the system to track changes to the destination file tree in order to assist in the delta file update back to the original production system.
2. Define shares to the destination file systems as R/W using the mkexport command, or change existing R/O shares used for validation/testing to R/W using the chexport command.
3. Proceed with R/W access to data at the disaster recovery location against the file tree.

Recovery from disaster


The recovery of a SONAS system at a site following an extended outage depends on the scope of the failure. The following primary scenarios result from the outage:
- The failing site was completely lost, such that no data was retained.
- The failing site had an extended outage, but data was retained.
- The failing site had an extended outage, and an unknown amount of data has been lost.

Recovery to an empty SONAS system


If the failing site was completely lost, the recovery must take place against an empty system: either a new site location with a new SONAS system, or the previous SONAS system restored but containing none of the previously stored data. For the purposes of this document, we assume that the SONAS system has been installed and configured with IP addresses and connections to authentication servers, so that the system can be brought to an online state. The recovery steps for an active-passive configuration are as follows:
1. Configure the async replication policies such that the source to destination relationship moves from the secondary site to the new primary site. For the new primary site, enable it to be the destination of an async relationship and create the target file tree for async replication. For the secondary site, configure it as an async source, and define the async relationship with its file tree as the source and the one configured on the new primary site as the target.
2. Perform async replication back to the new primary site. Note that it can take a long time to transfer the entire contents electronically; the time is based on the amount of data and the network capabilities.
3. Halt production activity to the secondary site, then perform another async replication to ensure that the primary and secondary sites are identical.
4. Perform a baseline scan of the primary site file tree.
5. Define exports/shares to the primary site.
6. Begin production activity to the primary site.
7. Configure async replication of the source/destination nodes to direct replication back from the new primary site to the secondary site.
8. Resume the original async replication of primary to secondary site as previously defined before the disaster.


Figure 6-29 illustrates disaster failback to an empty SONAS.

Figure 6-29 Disaster failback to an empty SONAS (the file tree A replica on SONAS cluster #2 is replicated by rsync back to the rebuilt, empty SONAS cluster #1 at the local site, with common AD with SFU, LDAP, or NIS authentication)

In the scenario where the second site was used for both active production usage and as a replication target, the recovery is as illustrated in Figure 6-30.

Figure 6-30 Failback to an empty SONAS in an active-active environment (after the loss of SONAS cluster #1, both file tree A and the replica of file tree B must be rebuilt on the new cluster #1 from the copies held on SONAS cluster #2, before the original bidirectional replication relationships are re-established)

The loss of the first site also lost the replica of the second site's file systems, which need to be replicated back to the first site. The recovery steps for an active-active configuration are outlined as follows:
1. Configure the async replication policies such that the source to destination relationship moves from the secondary site to the new primary site for file tree A.
2. Perform async replication, with the full replication parameter, of file tree A back to the new primary site; the time to transfer the entire contents electronically can be long, based on the amount of data and the network capabilities.
3. Halt production activity to the secondary site, then perform another async replication to ensure that the primary and secondary sites are identical.
4. Perform a baseline scan of file tree A at site 1.


5. Define exports and shares to file tree A at site 1.
6. Begin production activity to file tree A at site 1.
7. Configure async replication of the source/destination nodes to direct replication back from the new primary site to the secondary site for file tree A.
8. Resume the original async replication of file tree A from the new primary site to the secondary site.
9. For the first async replication of file tree B from the secondary site to the new primary site, ensure that the full replication parameter is invoked, so that all contents of file tree B are sent from the secondary site to the new primary site.

6.5 Disaster recovery methods


To rebuild a SONAS cluster after a disaster that makes the whole SONAS cluster unavailable, two types of data are required:
- The data contained on the SONAS cluster
- The SONAS cluster configuration files
The data contained in the SONAS cluster can be backed up to a backup server, such as Tivoli Storage Manager or another supported backup product, or it can be recovered from a remote replica of the data on a remote cluster or file server.

6.5.1 Backup of SONAS configuration information


SONAS configuration information can be backed up using the backupmanagementnode SONAS CLI command. This command makes a backup of the local management node, on which the command is running, and stores it on another remote host or server. This command allows you to back up one or more of the following SONAS configuration components: auth, callhome, cim, cron, ctdb, derby, misc, role, sonas, ssh, user, and yum. The command allows you to specify how many previously preserved backup versions must be kept; older backups are deleted. The default value is three versions. You can also specify the target host name where the backup is stored (by default, the first found storage node of the cluster) and the target directory path within the target host where the backup is stored (by default, /var/sonas/managementnodebackup). The example in Figure 6-31 shows the backupmanagementnode command used to back up management node configuration information for the components auth, ssh, ctdb, and derby.


[root@sonas02 bin]# backupmanagementnode --component auth,ssh,ctdb,derby
EFSSG0200I The management node mgmt001st002.virtual.com(10.0.0.20) has been successfully backuped.
[root@sonas02 bin]# ssh strg001st002.virtual.com ls /var/sonas/managementnodebackup
mgmtbak_20100413041835_e2d9a09ea1365d02ac8e2b27402bcc31.tar.bz2
mgmtbak_20100413041847_33c85e299643bebf70522dd3ff2fb888.tar.bz2
mgmtbak_20100413041931_547f94b096436838a9828b0ab49afc89.tar.bz2
mgmtbak_20100413043236_259c7d6876a438a03981d1be63816bf9.tar.bz2

Figure 6-31 Back up management node configuration information

Attention: Whereas administrator backup of management node configuration information is allowed and documented in the manuals, the procedure to restore the configuration information is not documented and needs to be performed under the guidance of IBM support personnel. The restoration of configuration data is done using the cnmgmtconfbak command, which is used by the GUI when building up a new management node. The cnmgmtconfbak command can also be used to list the available archives, and it requires you to specify --targethost <host> and --targetpath <path> for any backup, restore, or list operation. Figure 6-32 shows the command switches and how to get a list of available backups.
[root@sonas02]# cnmgmtconfbak
Usage: /opt/IBM/sofs/scripts/cnmgmtconfbak <command> <mandatory_parameters> [<options>]
commands:
   backup  - Backup configuration files to the bak server
   restore - Restore configuration files from the bak server
   list    - List all available backup data sets on the selected server
mandatory parameters:
   --targethost - Name or IP address of the backup server
   --targetpath - Backup storage path on the server
options: [-x] [-v] [-u N *] [-k N **]
   -x - Debug
   -v - Verbose
   --component - Select data sets for backup or restore (if archive contains data set. (Default:all - without yum!)
                 Legal component names are: auth, callhome, cim, cron, ctdb, derby, role, sonas, ssh, user, yum, misc
                 (Pls. list them separated with commas without any whitespace)
   only for backup
   -k|--keep - Keep N old bak data set (default: keep all)
   only for restore
   -p|--fail_on_partial - Fail if archive does not contain all required components
   -u|--use - Use Nth bak data set (default: 1=latest)
[root@sonas02]# cnmgmtconfbak list --targethost strg001st002.virtual.com --targetpath /var/sonas/managementnodebackup
1 # mgmtbak_20100413043236_259c7d6876a438a03981d1be63816bf9.tar.bz2
2 # mgmtbak_20100413041931_547f94b096436838a9828b0ab49afc89.tar.bz2
3 # mgmtbak_20100413041847_33c85e299643bebf70522dd3ff2fb888.tar.bz2
4 # mgmtbak_20100413041835_e2d9a09ea1365d02ac8e2b27402bcc31.tar.bz2

Figure 6-32 Configuration backup restore command (..cont..)


Remote server: You can back up the configuration data to a remote server external to the SONAS cluster by specifying the --targethost switch. The final copy of the archive file is performed by the scp command, so the target remote server can be any server to which passwordless access has been established. Establishing passwordless access to a remote server does require root access to the SONAS cluster.

6.5.2 Restore data from a traditional backup


The data contained in the SONAS cluster can be backed up to a backup server such as Tivoli Storage Manager or another supported backup product. Using that backup, it is possible to recover all the data that was contained in the SONAS cluster. Backup and restore procedures are discussed in more detail in 6.2, Backup and restore of file data.

6.5.3 Restore data from a remote replica


SONAS data can also be recovered from SONAS data replicas stored on a remote SONAS cluster, or on a file server that is the target for SONAS asynchronous replication. To recover data stored on a remote system, you can use utilities such as xcopy and rsync to copy the data back to the original location. The copy can be performed from one of two places:
1. From a SONAS interface node on the remote system, using asynchronous replication to realign the data
2. From an external SONAS client that mounts the shares both for the remote system that contains a copy of the data to be restored and for the local system that needs to be repopulated with data
The first method requires that the remote system be a SONAS cluster, whereas the second method works regardless of the type of remote system. For additional information about how to recover from an asynchronous replica, see Recovery from disaster on page 215. A sketch of the second method follows.
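The following is a minimal sketch of the second method, run from an external client. The export names, mount points, and the choice of NFS are hypothetical; CIFS mounts or additional rsync options can be substituted as appropriate for your environment.

# On an external client with access to both systems (share names and paths are placeholders)
mount -t nfs remote-sonas.example.com:/ibm/gpfsdr/asyncjt /mnt/replica
mount -t nfs local-sonas.example.com:/ibm/gpfsjt          /mnt/restore

# Copy the data back, preserving permissions, times, and hard links
rsync -avH /mnt/replica/ /mnt/restore/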



Chapter 7. Configuration and sizing


In this chapter we provide information about the various SONAS configurations and the sizing information that you need to consider before ordering your SONAS appliance. We discuss the following topics:
- What you need to know and do to order the appliance
- SONAS capacity planning tools
- Guidelines


7.1 Tradeoffs between configurations


As explained in Chapter 2, Hardware architecture on page 41, the SONAS solution has been designed to offer optimum flexibility, with independent scalability of both user client bandwidth needs and storage capacity and performance. SONAS can be flexibly configured to meet a wide variety of needs, from the rack level down to the device level inside interface nodes. Table 7-1 provides a summary of the SONAS product names and the corresponding IBM machine type/model numbers (MTMs) assigned to each product. All SONAS hardware products are under a single IBM machine type of 2851.
Table 7-1 SONAS configurations and model numbers

IBM product name                   Model number
SONAS Interface Node               2851-SI1
SONAS Management Node              2851-SM1
SONAS Storage Node                 2851-SS1
SONAS RAID Storage Controller      2851-DR1
SONAS Storage Expansion Unit       2851-DE1
SONAS 36-port InfiniBand Switch    2851-I36
SONAS 96-port InfiniBand Switch    2851-96
SONAS Base Rack                    2851-RXA
SONAS Storage Expansion Rack       2851-RXB
SONAS Interface Expansion Rack     2851-RXC

7.1.1 Rack configurations


In this section we describe all available configurations, from a macro level (rack) to a micro level (hardware device). SONAS racks:
- SONAS Base Rack: Three versions of the SONAS Base Rack are available. Only one can lead to the smallest SONAS configuration; see Figure 7-2 on page 226. The remaining two versions have to be used with Storage Expansion Racks (see Figure 7-3 on page 227 and Figure 7-4 on page 228) because they do not include a Storage Pod.
- SONAS Storage Expansion Rack: An expansion rack, by definition, which cannot be used alone and needs to be connected to a Base Rack; see Figure 7-5 on page 229.
- SONAS Interface Expansion Rack: An expansion rack, by definition, which cannot be used alone and needs to be connected to a Base Rack; see Figure 7-6 on page 230.


7.1.2 InfiniBand switch configurations


In Switches on page 47 we provide information about the internal and external switches in the SONAS appliance. In this section we discuss the InfiniBand switch configurations for you to consider prior to ordering your SONAS appliance. All major components of a SONAS system, such as interface nodes, storage nodes, and the management node, are interconnected by a high-performance, low-latency InfiniBand 4X Double Data Rate (DDR) fabric.

The 36 port InfiniBand switch configuration (with two InfiniBand switches for redundancy) allows you to interconnect up to 36 nodes inside your SONAS cluster. Nodes can be one Management Node, multiple Storage Nodes, and multiple Interface Nodes. As a reminder, recall that you have two Storage Nodes per Storage Pod.

The 96 port InfiniBand switch configuration also has two switches for redundancy, and the larger port count provides you with the largest SONAS configuration, with up to 60 Storage Nodes (or 30 Storage Pods) and 30 Interface Nodes, plus the Management Node. Note that the 96 port InfiniBand switch is actually made up of up to four Board Lines; each Board Line is composed of 24 InfiniBand ports. This means that in the 96 port InfiniBand switch configuration, you can start out with only 24 ports and dynamically and non-disruptively add InfiniBand ports (in groups of 24) if needed.

Important: It is not possible to do a field upgrade from a 36 port InfiniBand switch configuration to a 96 port InfiniBand switch configuration, as they are part of different Base Rack models. It is therefore important to plan and specify your SONAS InfiniBand switch configuration properly at initial installation.

InfiniBand is the layer which interconnects all Storage Nodes, Interface Nodes, and Management Nodes. Four InfiniBand ports of each InfiniBand switch are reserved for the following components:
- Three reserved for future use
- One for the required management node
The remaining InfiniBand ports are available for interface nodes and storage nodes. You can find in Table 7-3 on page 231 and Table 7-4 on page 232 the maximum capacity available inside your SONAS storage solution based on the InfiniBand switch configuration. A small port-budget sketch follows.
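As a back-of-the-envelope check of how many nodes fit on a given fabric, the sketch below subtracts the four reserved ports and the ports consumed by a proposed configuration. The pod and interface node counts are example inputs, not recommendations.

# Port budget per InfiniBand switch: 4 ports reserved (3 future use + 1 management node),
# 2 storage nodes per storage pod, 1 port per interface node.
awk -v switch_ports=36 -v storage_pods=2 -v interface_nodes=6 'BEGIN {
    available = switch_ports - 4
    used      = storage_pods * 2 + interface_nodes
    printf "available ports: %d, used: %d, remaining: %d\n", available, used, available - used
}'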

7.1.3 Storage Pod configuration


In Storage pods on page 50 we provide details of the Storage Pods in the SONAS appliance, so be sure to review it prior to reading this section, which provides considerations related to Storage Pod configuration:
- Controller configuration: Inside a Storage Pod you must have at least one Storage Controller, which is mandatory, and can have up to two Storage Controllers, each one with an optional Storage Expansion. Intermediate configurations are allowed, and performance scales up with the number of disks inside the Storage Pod: up to 100% over the single Storage Controller configuration when adding another Storage Controller, and up to a further 85% when adding a Storage Expansion. You can choose to fill one Storage Controller or one Storage Expansion with SAS or Nearline SAS drives. Drives are added in groups of 60.
- SAS drive configuration: SAS drives have a faster spindle rotation speed than Nearline SAS drives and have more fault-tolerant design criteria, but they have smaller capacity and also consume more power. Currently the maximum number of SAS drives inside a single Storage Expansion Rack is 360 drives.1


- Nearline SAS or SATA drive configuration: SATA or Nearline SAS drives have a larger capacity than SAS drives, up to 2 TB in the SONAS configuration. These drives require less power per drive than SAS drives, which is why the current maximum configuration inside a single Storage Expansion Rack with Nearline SAS drives is 480 drives. Be sure to have the same type and size of physical drives inside a logical storage pool; it is not desirable to mix drive types or sizes within a SONAS logical storage pool. A raw-capacity sketch follows.
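The following is a rough raw-capacity sketch based on the 60-drive groups and the 2 TB Nearline SAS drive size mentioned above. It computes raw capacity only; usable capacity after RAID protection and spares is lower, and the group count and drive size are inputs you should adjust.

# Raw capacity = number of 60-drive groups x drives per group x drive size (TB)
awk -v groups=1 -v drives_per_group=60 -v drive_tb=2 'BEGIN {
    printf "raw capacity: %d TB\n", groups * drives_per_group * drive_tb
}'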

7.1.4 Interface node configuration


Review Interface nodes on page 43 for an understanding of the function of the Interface Node before reading this section, which provides considerations related to Interface Node configuration.

Memory capacity: By default, one interface node comes with 32 GB of memory. This memory is used for caching; from a performance perspective, Interface Nodes cache frequently used files in memory. As SONAS is designed to keep the connection between one client and one Interface Node until the client unmounts the share, client performance increases due to this caching mechanism. For enhanced performance, you can increase this amount of memory, and thereby the chance that files are still in cache, by purchasing an additional 32 GB of memory (FC 1000) or 128 GB of memory (FC 1001) for your Interface Node. Feature code 1000 provides an additional 32 GB of memory in the form of eight 4 GB 1333 MHz double-data-rate three (DDR3) memory dual-inline-memory modules (DIMMs). You can order only one FC 1000 per interface node. FC 1001 installs a total of 128 GB of memory in the interface node. Installation of FC 1000 or FC 1001 into an already installed interface node is a disruptive operation that requires you to shut down the interface node. However, a system with a functioning interface node continues to operate in the absence of the interface node being upgraded.

Client network connectivity: We introduced SONAS concepts in Chapter 1, Introduction to IBM Scale Out Network Attached Storage on page 1, where we focused on how clients access data through Interface Nodes. This access is physically provided by the connectivity between Interface Nodes and clients, namely the client network. Each interface node has five 1 gigabit Ethernet (GbE) connections (ports) on the system board. Two of the onboard Ethernet ports connect to the internal private management network within the SONAS system for health monitoring and configuration, one is used for connectivity to the Integrated Management Module (IMM) that enables the user to remotely manage the interface node, and the two remaining onboard Ethernet ports are used for the client network. By default the two 1 GigE connections are configured in active/failover mode. You can change this default configuration to an aggregate mode and thereby increase the theoretical bandwidth from 1 Gb/s to 2 Gb/s. If this default bandwidth does not fulfill your needs, you can add an extra Quad-port 1 GbE NIC (FC 1100) adapter. This feature provides a quad-port 10/100/1000 Ethernet PCIe x8 adapter card. This NIC provides four RJ45 network connections for additional host IP network connectivity. The adapter supports a maximum distance of 100 m using Category 5 or better unshielded twisted pair (UTP) four-pair media. You are responsible for providing the network cables to attach the network connections on this adapter to your IP network. One Feature Code 1100 can be ordered per interface node. The manufacturer of this card is Intel, OEM part number EXPI9404PTG2L20.

1. When a future 60 amp power option is available for the SONAS storage expansion rack, this restriction will be lifted.


The other option is to add an extra Dual-port 10 Gb Converged Network Adapter (FC 1101). This feature provides a PCIe 2.0 Gen 2 x8 low-profile dual-port 10 Gb Converged Network Adapter (CNA) with two SFP+ optical modules. The CNA supports short reach (SR) 850 nm multimode fiber (MMF). You are responsible for providing the network cables to attach the network connections on this adapter to your IP network. One feature code 1101 can be ordered per interface node. The manufacturer of this card is QLogic, OEM part number FE0210302-13. The last option is to purchase both adapters. Table 7-2 summarizes the available connectivity configurations within a single interface node, and a bandwidth sketch follows the table.
Table 7-2 Number of ports available in various configurations of a single Interface Node

Onboard 1 GbE connectors   FC 1100 Quad-port 1 GbE NIC   FC 1101 Dual-port 10 GbE CNA   Total data path connectors
2                          0                             0                              2
2                          0                             1 (with 2 ports)               4
2                          1 (with 4 ports)              0                              6
2                          1 (with 4 ports)              1 (with 2 ports)               8

Cabling considerations: For each interface node in the base rack, no InfiniBand cables need to be ordered; copper InfiniBand cables are automatically provided for all interface nodes in the base rack. The length of the copper InfiniBand cables provided is based on the position of the interface node in the rack. You must, however, order InfiniBand cable features for inter-rack cabling after determining the layout of your multi-rack system, for example, if an Interface Expansion Rack is required. Multiple InfiniBand cable features are available; the main difference is whether you are using a 36-port or 96-port InfiniBand switch configuration. The connectors are not the same on the two models: the 36-port model requires QSFP connectors, while the 96-port model requires X4 connectors, as shown in Figure 7-1.

Figure 7-1 InfiniBand connectors.

For the additional Quad-port adapters, Cat 5e cables or better are required to support 1 Gbps network speeds; Cat 6 cables provide better support at that speed. The 10 GbE data-path connections support short reach (SR) 850 nanometer (nm) multimode fiber (MMF) optic cables, which can typically connect equipment reliably up to a maximum of 300 meters (m) using 2000 MHz*km bandwidth OM3 fiber.


7.1.5 Rack configurations


This section provides configuration considerations related to the available SONAS racks. See the diagrams shown in Figure 7-2 through Figure 7-6 on page 230.

Feature Code 9005 Base Rack


The Feature Code 9005 Base Rack provides a Base Rack configuration which contains the smallest SONAS configuration. In this rack the following elements are required:
- Embedded GigE switches
- Management Node
- InfiniBand switches
- Two Interface Nodes
- Two Storage Nodes
- One Storage Controller
If 2 TB Nearline SAS drives are used in the minimum configuration of one Storage Controller with one group of 60 drives, the capacity is 120 TB. If more capacity is required, Disk Storage Expansion units and an additional Storage Controller can be added to reach a full Storage Pod configuration (shown). If future growth requires an additional Storage Pod, or more interface nodes are required, Storage Expansion racks and Interface Expansion racks can be attached to this Base Rack. The only limitation here is the number of InfiniBand connections remaining on the 36-port InfiniBand switches.

Figure 7-2 Base Rack Feature Code 9005

For additional information about the SONAS Rack Base Feature Code 9005, see Rack types: How to choose the correct rack for your solution on page 61.


Feature Code 9003 Base Rack


The Feature Code 9003 Base Rack configuration contains interface nodes but no storage, so it requires a Storage Expansion rack. In this rack the following elements are required:
- Embedded GigE switches
- Management Node
- InfiniBand switches
- Two Interface Nodes
All these mandatory components have to be used with an additional Storage Expansion rack; there are no Storage Pods in this Base Rack. According to the storage capacity needed, you add as many Storage Expansion racks as required. If more Interface Nodes are needed for external client network connectivity, you can add one Interface Node Expansion rack. The only limitation here is the number of InfiniBand connections remaining on the 36-port InfiniBand switches.

Figure 7-3 Base Rack Feature Code 9003

For additional information about the Feature Code 9003 SONAS Rack Base, see Rack types: How to choose the correct rack for your solution on page 61.


Feature Code 9004 SONAS Base Rack


The Feature Code 9004 Base Rack configuration can be used in a SONAS configuration where the large 96-port InfiniBand switch is needed. In this rack the following elements are required:
- Embedded GigE switches
- Management Node
- InfiniBand switches
- Two Interface Nodes
These mandatory components have to be used with an additional Storage Expansion rack; note that there are no Storage Pods in this Base Rack. Based on your storage capacity requirements, you add as many Storage Expansion racks as required. If additional Interface Nodes are needed for external client network connectivity, you can add one Interface Node Expansion rack. With the two 96-port InfiniBand switches configuration, you are able to achieve the largest SONAS configuration. The current largest SONAS capacity is a raw 14.4 PB, when using all 2 TB Nearline SAS drives.

Figure 7-4 Base Rack Feature Code 9004

For additional information about the Feature Code 9004 SONAS Base Rack, see Rack types: How to choose the correct rack for your solution on page 61.


Storage Expansion Rack


The Storage Expansion Rack can be used in combination with a Base Rack. In this rack the mandatory elements are:
- Embedded GigE switches
- Two Storage Nodes
- One Storage Controller
The Storage Expansion Rack must be used in combination with one of the Base Racks. If you need more capacity in your SONAS configuration, you can add Disk Storage Expansion units and additional Storage Controllers to the existing Storage Pod located in the Base Rack. After a Storage Pod is full, you can add an additional Storage Pod through a Storage Expansion Rack. There is a minimum of two Storage Nodes and a maximum of four Storage Nodes per Storage Expansion Rack, and each Storage Node requires an InfiniBand connection. The maximum number of Storage Pods in a SONAS environment is 30, housed in 15 Storage Expansion Racks. This assumes that you are not limited in terms of InfiniBand switch connections.

Figure 7-5 Storage Expansion Rack.

For additional information about the SONAS Storage Expansion Rack, see SONAS storage expansion unit on page 53.


Interface Expansion Rack


The Interface Expansion Rack can be used in combination with a Base Rack. In this rack the mandatory components are:
- Embedded GigE switches
- One Interface Node
The Interface Expansion Rack must be used in combination with a Base Rack (whichever version you have). If you need more bandwidth for your SONAS client network, you can add Interface Nodes; the Interface Expansion Rack provides space for additional Interface Nodes when the interface node space in your SONAS Base Rack is full. There is a minimum of one Interface Node per Interface Expansion Rack. Interface nodes are interconnected by connecting to the InfiniBand switches in the base rack. The maximum number of Interface Nodes in a SONAS cluster is 30 (assuming that you are not constrained by the number of InfiniBand switch ports).

Figure 7-6 Interface Expansion Rack.

For additional information about the SONAS Interface Expansion Rack, see Rack types: How to choose the correct rack for your solution on page 61.


7.2 Considerations for sizing your configuration


We previously described the flexibility you have when designing the most appropriate SONAS solution for your environment. Using your requirements (see Inputs for SONAS sizing on page 233), you will be able to size your SONAS environment in Sizing the SONAS appliance on page 241. This means determining:
- The appropriate number of interface nodes
- The appropriate capacity of the system and Storage Pods
- The appropriate InfiniBand switch configuration
And also determining:
- The appropriate client network connectivity
- The appropriate amount of memory inside the interface nodes
- The appropriate disk technology
Important: In SONAS, it is not possible to field upgrade from a 36-port InfiniBand switch configuration to a 96-port InfiniBand switch configuration. Therefore, size your initial SONAS InfiniBand switch configuration accordingly. If you know for sure that your needs will grow and you will then require expansion racks, you can still start with the 96-port configuration but not fill the entire switch at initial install time. A full 96-port InfiniBand switch is actually composed of four line boards of 24 ports each.
Carefully consider and plan your SONAS hardware configuration. Note that from a software standpoint, all software function is included with the SONAS Software (5639-SN1) licenses, so you do not have to worry about key features that you might need later not being included. For more details on SONAS Software, see Chapter 3, Software architecture on page 73. Table 7-3 on page 231 shows the maximum storage capacity using the 36-port InfiniBand switch.
Table 7-3 Maximum storage capacity with the 36-port InfiniBand switch configuration

Interface  Max storage  Storage  Max storage  Max disk storage  Max hard     Max capacity with
nodes      pods         nodes    controllers  expansion units   disk drives  2 TB disks (TB)
3          14           28       28           28                3360         6720
4          14           28       28           28                3360         6720
5          13           26       26           26                3120         6240
6          13           26       26           26                3120         6240
7          12           24       24           24                2880         5760
8          12           24       24           24                2880         5760
9          11           22       22           22                2640         5280
10         11           22       22           22                2640         5280
11         10           20       20           20                2400         4800
12         10           20       20           20                2400         4800
13         9            18       18           18                2160         4320
14         9            18       18           18                2160         4320
15         8            16       16           16                1920         3840
16         8            16       16           16                1920         3840
17         7            14       14           14                1680         3360
18         7            14       14           14                1680         3360
19         6            12       12           12                1440         2880
20         6            12       12           12                1440         2880
21         5            10       10           10                1200         2400
22         5            10       10           10                1200         2400
23         4            8        8            8                 960          1920
24         4            8        8            8                 960          1920
25         3            6        6            6                 720          1440
26         3            6        6            6                 720          1440
27         2            4        4            4                 480          960
28         2            4        4            4                 480          960
29         1            2        2            2                 240          480
30         1            2        2            2                 240          480

Table 7-4 shows the maximum storage capacity using the 96 port InfiniBand switch.
Table 7-4 Maximum storage capacity with the 96-port InfiniBand switch configuration

Storage  Storage  Storage      Disk storage     Max hard     Max capacity with
pods     nodes    controllers  expansion units  disk drives  2 TB disks (TB)
1        2        2            2                240          480
2        4        4            4                480          960
3        6        6            6                720          1440
4        8        8            8                960          1920
5        10       10           10               1200         2400
6        12       12           12               1440         2880
7        14       14           14               1680         3360
8        16       16           16               1920         3840
9        18       18           18               2160         4320
10       20       20           20               2400         4800
11       22       22           22               2640         5280
12       24       24           24               2880         5760
13       26       26           26               3120         6240
14       28       28           28               3360         6720
15       30       30           30               3600         7200
16       32       32           32               3840         7680
17       34       34           34               4080         8160
18       36       36           36               4320         8640
19       38       38           38               4560         9120
20       40       40           40               4800         9600
21       42       42           42               5040         10080
22       44       44           44               5280         10560
23       46       46           46               5520         11040
24       48       48           48               5760         11520
25       50       50           50               6000         12000
26       52       52           52               6240         12480
27       54       54           54               6480         12960
28       56       56           56               6720         13440
29       58       58           58               6960         13920
30       60       60           60               7200         14400

7.3 Inputs for SONAS sizing


We have previously described in detail the SONAS architecture, from both a hardware and a software point of view. We then pointed out the various SONAS appliance configurations you can have, which provide great flexibility. The question you might have now is: what is the most appropriate SONAS configuration? Will your SONAS Storage Solution fit all your needs, being neither too large nor too small? As explained previously, SONAS is a file system solution that provides access to SONAS clients through network shares, using NFS, CIFS, or FTP. On top of these protocols, your daily business applications are running. These critical applications might even rely on ISV software such as databases, or work in combination with other solutions such as virtualization or backup software.


This means that, depending on your application and your entire software stack, you might require more performance or capacity in one place or another in order to meet your requirements. As with all storage solutions, the better you know how your application works, the easier it is to size the storage that will host your data. In this section we describe in detail various business application characteristics from a storage point of view and how they can impact your sizing choices. First of all, keep in mind that network file based solutions are not always the most appropriate option for your workload. SONAS is only one product from the wide IBM Storage product portfolio. For instance, if your daily business application uses Direct Attached Storage (DAS), by design this locally attached storage solution has a lower latency than a network attached solution like SONAS. If your business application is latency bound, SONAS might not be the best option. Still dealing with network design, if your application performs very small accesses in a random way, a network attached solution will not provide tremendous performance. For more details regarding good candidates for a SONAS solution, see the chapter titled SONAS usage cases in IBM Scale Out Network Attached Storage, SG24-7874.

7.3.1 Application characteristics


Typically, the key application characteristics for SONAS are as follows:
- Capacity and bandwidth
- Access pattern
- Cache hit ratio
The first characteristic in this list is the easiest one to determine. Keep in mind that the better you know your application, the easier it will be to size the storage solution; it can be very challenging to determine these characteristics precisely. The capacity and bandwidth required, or currently in use in your existing environment, is the easiest to find, whereas the cache hit ratio is much more complex to determine.

7.3.2 Workload characteristics definition


When planning your SONAS appliance, the characteristics of your workloads must be taken into consideration.

Capacity and bandwidth


The capacity you might require for your SONAS depends on your utilization. If you plan to use SONAS only for your business application, you must be able to determine the capacity you need. But you might use your application in a virtualized or database environment, which means you will need to take this middleware into account to determine the total capacity needed. Moreover, if you have also planned to host your users' data on your SONAS, with or without quotas, you have to extrapolate the amount of storage they will need. Last but not least, backup and restore or disaster recovery policies can significantly increase the total amount of space you will need. For more details regarding these critical aspects, refer to Chapter 6, Backup and recovery, availability, and resiliency functions on page 181. The bandwidth requirement is more related to the business application. Depending on your application, digital media or high performance computing for instance, you might require the largest bandwidth possible. Keep in mind that you will have to process the data next; there is no need for a large bandwidth if your application cannot handle it. However, you can plan to have multiple user sessions running in parallel, which will require a larger bandwidth.

You will probably not use only one application. You can have many applications and many users using the shares for various purposes through NFS or CIFS, and you can also have ISV software running and accessing data. For all these workloads you will have to accumulate bandwidth and capacity.

Access pattern
The access pattern is more difficult to identify. What we mean by access pattern is the workload access type (random or sequential), the file size, and the read/write ratio. When your application performs I/O on the storage pool, the I/O access can be considered sequential if you are able to get contiguous data when performing successive requests to consecutive physical addresses. On the contrary, it is considered random if, in order to retrieve contiguous data, you have to access non-consecutive locations on the drive. In both cases, random or sequential, these accesses write or read files. The file size is basically the size required on the storage solution to store these files; we do not take into account snapshot, backup, or replication concepts, which can increase the size required to store a single file. Finally, your business application does not perform reads or writes exclusively. The read/write ratio is the ratio between the average number of reads and the average number of writes during execution. Again, you will probably not use only one application. Because SONAS allows all users and applications to use a single global name space, multiple applications lead to a mix of access types. If one application performs sequential access while a second one accesses data in a random way, the overall access pattern on the SONAS file system might not be 100% sequential or 100% random. The same applies to file size and read/write ratio.

Cache hit ratio


The cache hit ratio is definitely the most complex piece of information to retrieve. From your application point of view, I/O operations are embedded in computations or process operations. When performing an I/O read request, after the first request the data is stored in memory, which means the data is cached. The next time you need this exact same data (or an extract of it), you can retrieve it directly from the cache and avoid the access time from the storage pool. This is obviously much more efficient, because accesses from memory are far faster than accesses from disk. If you are able to retrieve the data from the cache, this is a cache hit. If you are not able to retrieve it, and need to access it from the disk again, this is a cache miss. The cache hit ratio is the ratio between the number of cache hits and the number of access requests. In the case of multiple applications, or many software layers or middleware accessing data, it can be even more difficult to determine this cache hit ratio. We have defined the key characteristics of your application. We now explain how they can impact performance from a storage point of view.

7.3.3 Workload characteristics impact


In this section we describe the impact the various workload characteristics can have on your SONAS performance.


Capacity and bandwidth


In standard utilization, capacity has no real impact from a performance point of view. Obviously, if you have no space left to perform snapshots, backups, or even I/O operations for your application, you will simply not be able to use the system anymore. If you did not use all the available NSDs when you created your first SONAS file system, you can include them in order to add space, or do some cleanup; see Chapter 10, SONAS administration on page 313 for more details. Also keep in mind that your storage needs will grow with months and years of utilization anyway. See Table 7-5 on page 242 for an overview of raw usable capacity. Regarding bandwidth, the overall storage bandwidth is determined first by the number of Storage Pods inside your SONAS Storage Solution, then by the number of Storage Controllers and Storage Expansion units inside each Storage Pod, and last but not least by the type of disks inside (refer to Chapter 2, Hardware architecture on page 41). Because SONAS is based on GPFS, which is a scalable file system solution, your overall bandwidth increases with the number of storage elements inside. This storage consideration is only the first step: because your SONAS users access data on shares through Interface Nodes, you need to ensure that the interface nodes are able to deliver all the storage bandwidth. There are two ways to increase the Interface Node bandwidth: increase the number of interface nodes, or increase their network connectivity bandwidth. The latter can be done with additional features such as the 10 GigE adapter or the Quad-port GigE adapter (refer to Tradeoffs between configurations on page 222). If your SONAS Storage Solution bandwidth does not fit your environment needs, your applications will need more time to complete. This can be particularly harmful for real-time applications such as video surveillance. Note that these considerations are provided to help you better understand how to size your SONAS so that it fits your requirements the first time. SONAS is a scale out storage solution, which simply means that if you need more capacity, more bandwidth, or a combination of both, you can add extra storage, and the GPFS layer allows you to add this new capacity and bandwidth to your existing environment, up to 14.4 PB. But as described in Tradeoffs between configurations on page 222, even if SONAS provides flexibility, there are configuration considerations, such as the InfiniBand switch capacity, that cannot be changed inexpensively. This is clearly the aim of this section: to size and foresee further needs.

Access pattern
As discussed, the access type can be random or sequential. On top of the access type you have the file size and the read/write ratio. From benchmark considerations, small random accesses are often associated with IO/s performance, while large sequential accesses are often associated with MB/s performance. These two values are basic disk and Storage Controller characteristics, but they can also be appropriate metrics for your business application. Study and consider the workload access types, patterns, and file sizes of your installation to obtain a better idea of your application or environment needs in terms of IO/s or MB/s. At the disk level, the IO/s are determined by the disk drive technology. The IO/s metric can be determined from disk characteristics such as:
- Average seek time
- Rotational latency


The average seek time is the time required by the disk drive to position the drive head over the correct track, while the rotational latency is the time required for the target sector to rotate under the disk head before it can be read or written. Average latency is estimated as the time required for one half of a full rotation. You can find the average seek time and rotational latency in the manufacturer specifications. In today's disk drive technology there are two major types. High performance 15K RPM Serial Attached SCSI (SAS) disk drives have a lower average seek time and rotational latency (due to their higher rotational speed) than high capacity 7.2K RPM Nearline SAS or Serial Advanced Technology Attachment (SATA) disk drives. Currently, 15K RPM SAS disk drives generally have seek times in the 3-4 millisecond range and are capable of sustaining between 160 and 200 IOPS per disk drive. 7.2K RPM Nearline SAS and SATA disk drives generally have longer seek times, in the 8-9 millisecond range, and are capable of sustaining 70-80 IOPS per disk drive. These are general rules of thumb for planning your SONAS system. There are 60 disks (SAS, Nearline SAS, or SATA) within a single Storage Controller drawer. Note that this does *not* mean that the Storage Controller performance is 60 times the performance of a single disk. Likewise, if you add a Storage Expansion unit to increase the number of disks to 120 per Storage Controller, the overall performance will not be 120 times the performance of a single disk. First, because of the software RAID technology used in the Storage Controller, read and write performance is not the same: even though both a read and a write are by definition a single I/O, a read operation and a write operation do not perform identically, and even two write operations do not necessarily perform identically. Because of the RAID 5 and RAID 6 definitions, as described in Figure 7-7, you have to deal with the parity bit. Depending on the RAID algorithm, the parity bit is not always on the same disk.

Figure 7-7 Raid 5 and Raid 6 definition
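As a rough cross-check of the per-drive IOPS figures quoted above, the estimate can be reproduced from the seek time and rotational latency alone. The following sketch is illustrative only: it ignores command queuing, controller caching, and transfer time, and the seek-time values are the example figures from the text, not measured SONAS numbers.

# Rough single-drive IOPS estimate from seek time and rotational latency.
# Illustrative rule of thumb only; real drives vary with queuing and caching.

def estimated_iops(rpm, avg_seek_ms):
    rotational_latency_ms = 0.5 * 60000.0 / rpm   # half a rotation, in ms
    service_time_ms = avg_seek_ms + rotational_latency_ms
    return 1000.0 / service_time_ms               # IOs per second

print("15K RPM SAS, 3.5 ms seek:     ~%d IOPS" % estimated_iops(15000, 3.5))
print("7.2K RPM NL-SAS, 8.5 ms seek: ~%d IOPS" % estimated_iops(7200, 8.5))

Running this gives roughly 180 IOPS for the 15K RPM drive and roughly 78 IOPS for the 7.2K RPM drive, consistent with the ranges given above.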


The biggest penalty you might have in terms of performance is when you update a single piece of data inside your RAID array, as shown in Figure 7-8: four I/O operations are required for a single data update.

Figure 7-8 Raid 5 write penalty

For a full write across all disks, this penalty no longer applies; see Figure 7-9.

Figure 7-9 Raid 5 entire write
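To make the write penalty concrete, the following sketch estimates the back-end disk operations generated by a small random host workload behind RAID 5 (four disk operations per small write) and RAID 6 (six operations). This is the generic RAID rule of thumb, not a measured characteristic of the SONAS storage controller, and full-stripe writes avoid the penalty as shown in Figure 7-9.

# Back-end IOPS generated by a small random workload behind RAID.
# Generic rule of thumb: RAID 5 small write = 4 disk IOs, RAID 6 = 6 disk IOs.

def backend_iops(host_iops, read_ratio, write_penalty):
    reads = host_iops * read_ratio
    writes = host_iops * (1.0 - read_ratio)
    return reads + writes * write_penalty

host_iops = 1000          # hypothetical host workload
read_ratio = 0.7          # 70% reads, 30% writes

print("RAID 5:", backend_iops(host_iops, read_ratio, 4))   # 700 + 300*4 = 1900 disk IOPS
print("RAID 6:", backend_iops(host_iops, read_ratio, 6))   # 700 + 300*6 = 2500 disk IOPS

This also illustrates why a high read/write ratio gives better results for the same hardware: only the write portion of the workload is amplified by the parity updates.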


There is also a bottleneck due to the Storage Controller itself: like every storage controller, it cannot scale perfectly with the number of disks behind it. Regarding the bandwidth or MB/s characteristic, you also start from the bandwidth of a single disk, whether SAS, Nearline SAS, or SATA, which differs between read and write access; and the overall bandwidth of the Storage Controller will not be the sum of the bandwidth of each disk, both because of the RAID overhead and because of the Storage Controller bottleneck. Basically, MB/s performance, which deals with sequential access, depends on the storage controller technology and algorithms. Refer to Chapter 2, Hardware architecture on page 41 to review the performance differences between a configuration with two storage controllers and a configuration with one storage controller and one storage expansion unit: in both configurations you have the same number of disks, but with two controllers performance is better. Because read and write performance is not identical from a Storage Controller point of view, the storage controller is characterized by both read and write performance figures, and the read/write ratio can help you better size your SONAS environment. Even if the IO/s and MB/s characteristics of a single disk are supposed to be the same for both read and write requests, the RAID layer implies additional overhead for write access, even though the algorithms used in storage controllers are designed to perform as well as possible for both reads and writes. This means that for the exact same capacity and storage configuration, you will have better performance with a high read/write ratio, simply because your application performs many more read than write requests.

Cache hit ratio


As described, the cache hit ratio describes the reuse potential of your business application. With a high cache hit ratio, you are able to reuse more frequently the data stored in cache (in SONAS, this is data that is cached in the interface node). Because memory accesses are far more efficient than disk accesses, the more your application reuses interface node cached data, the faster the access will be, and the faster the I/O will complete. To show the advantage of the caching effect, let us use rough figures (rules of thumb). The caching effect means the data is in the Interface Node memory. The access time to retrieve data from server memory is roughly a few microseconds. Compared to the hundreds of microseconds spent crossing the network (1 GigE or 10 GigE), this is a small amount of time: a cache hit therefore costs roughly hundreds of microseconds in total. For a cache miss, you have to add the InfiniBand latency to reach the storage nodes, which is a few microseconds. If you are lucky, the data is in the storage controller cache, which costs a few milliseconds, that is, a few thousand microseconds in total (the GigE network, InfiniBand, and the interface node cache miss are negligible compared to the millisecond-range access to the Storage Controller cache). If you are unlucky, the data is on disk, and you will need additional milliseconds to access it. Roughly a factor of 10 000 separates a cache hit in Interface Node memory from data on disk. You will find more information describing the latency impact in Figure 4-6 on page 155. Basically, this is exactly the same concern as inside a server CPU. When executing algorithms, data can be stored and accessed from memory, or from various levels of cache. Depending on the CPU architecture you can have up to three levels of cache. The closer the data is to the CPU (CPU <=> cache level 1 <=> cache level 2 (<=> cache level 3) <=> memory), the faster the result is. Moreover, a cache miss means you first have to look for the data before getting it from the next level (extra time lost), and therefore extra performance penalties. In SONAS, interface nodes have been designed to cache data for user reuse. Using technology based on the IBM GPFS file system, interface nodes access data from storage pods, and more precisely from storage nodes, through the InfiniBand network, and then store the data in cache. SONAS keeps a specific user allocated to the same interface node for the duration of the session (regardless of whether it is NFS, CIFS, or another protocol), specifically in order to provide users these caching capabilities. The amount of cache can be increased with the appropriate feature codes: SONAS interface nodes can be configured with a total of 32 GB, 64 GB, or 128 GB of cache per interface node. If your application does much reuse, that is, has a high cache hit ratio, this caching capability can increase performance, especially with the additional cache memory. But if the cache hit ratio is low, the caching effect might not help you much. Keep in mind that even if your application has a high cache hit ratio, if you have many applications running, many users accessing data, or a significant software stack, you must have enough cache memory to take advantage of this caching. If you do not know your workload characteristics precisely, we next describe methods to retrieve them. This will be useful for your SONAS sizing, and also helpful to understand more precisely the I/O behavior of your daily business application.
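The effect of the cache hit ratio can be expressed as a weighted average of the access times. The sketch below uses the rough orders of magnitude quoted above (hundreds of microseconds for a hit in interface node memory as seen by the client, roughly ten milliseconds for a miss that has to go to disk); the exact values are illustrative assumptions, not SONAS measurements.

# Average access time as a function of the cache hit ratio.
# Latency figures are illustrative orders of magnitude, not measured values.

HIT_US = 200.0      # cache hit in interface node memory, seen over GbE (hundreds of us)
MISS_US = 10000.0   # cache miss serviced from disk in the storage pod (~10 ms)

def effective_access_us(hit_ratio):
    return hit_ratio * HIT_US + (1.0 - hit_ratio) * MISS_US

for ratio in (0.0, 0.5, 0.9, 0.99):
    print("hit ratio %.2f -> %.0f microseconds average" % (ratio, effective_access_us(ratio)))

The higher the hit ratio, the closer the average access time gets to the memory-plus-network cost, which is why additional interface node memory mainly pays off for workloads with significant reuse.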

7.3.4 Workload characteristics measurement


In this section we describe how to measure the various workload characteristics.

Capacity and bandwidth


Your storage administrators can easily determine the amount of storage your current environment is using. If you are currently using an environment with separate islands of data that have to be managed independently, you will have to gather information using your storage management software. If Tivoli Storage Productivity Center is installed and monitoring your storage environment, you might be able to retrieve this information directly from it. Regarding bandwidth, one option is to measure it from your storage subsystems; in that case you will need to measure it from each individual storage subsystem. The second option is to measure it from the server side running your application and any other software layers that might require bandwidth.

Access pattern
Tivoli Storage Productivity Center is again an appropriate option to retrieve any kind of I/O utilization and access information. If Tivoli Storage Productivity Center is not set up in your current environment, you can retrieve read and write information, and possibly file sizes, from your storage subsystem if it includes monitoring in its graphical interface or CLI. From your servers, you can find this information with tools such as iostat, dstat, netstat, vmstat, and nmon. You can also ask your application developers for information regarding I/O access.

Cache hit ratio


The cache hit ratio can be more difficult to discover. Options for determining the cache hit ratio are Tivoli Storage Productivity Center, or the existing monitoring tools on your storage subsystems. However, keep in mind that, as is often the case, you might only find partial information. If you are looking from a storage subsystem point of view, you will find information regarding a particular storage subsystem, and then only for the applications running on it. If you are looking at the application level with your application developers, you might not be aware of the I/O access generated by other users sharing the same storage subsystem.


Tools that are helpful include the nmon and nmon_analyser tools on UNIX systems (NFS), or the perfmon tool with appropriate counters on Windows (CIFS) for graphical reports. For NFS access, you can also use the iostat, dstat, netstat, and vmstat tools.

7.4 Powers of two and powers of ten: The missing space


To avoid miscalculations and surprises it is important that you understand the measurement units in which computing capacities are expressed. We have bits and bytes, megabytes and gigabytes. How large is a gigabyte? Well, it depends how you calculate it. When disk vendors discuss storage capacity they usually present it in powers of 10, so that 1 GB is 10^9 (ten to the power of 9) or 1,000,000,000 bytes. When you format or report on the capacity of a storage device, the numbers are generally represented in a binary scale based on 2^10 or 1,024 bytes, also termed a kilobyte. Using this notation, 1 GB of space in a file system is calculated as 1024^3, which is equivalent to 2^30, or 1,073,741,824 bytes. So if you format your new 1 GB decimal drive you will see only 0.93 GB binary. You are missing 7% of your space, and the effect gets more pronounced as the capacity grows. The table in Figure 7-10 shows how space is calculated using the decimal and binary notation, and the percentage by which they differ, calculated as the binary representation divided by the decimal representation minus one.

Unit    dec      bin     bin_dec%
kilo    10^3     2^10    2%
mega    10^6     2^20    5%
giga    10^9     2^30    7%
tera    10^12    2^40    10%
peta    10^15    2^50    13%
exa     10^18    2^60    15%
zetta   10^21    2^70    18%

Figure 7-10 Space difference with binary and decimal notations

Note that at the terabyte scale we are off by around 10%, which grows to 13% at the petabyte scale. That is also the reason why you only get around 55 GB of space on your laptop's 60 GB drive. From a SONAS perspective, the disk storage space is presented and discussed in decimal notation, so 60 TB of disk is 60 x 10^12 bytes of storage. On the other hand, when you format the disk drives the space is presented using the binary notation, so 1 TB is 2^40 bytes. Note that network capacities and bandwidth for the Gbit and 10 Gbit Ethernet adapters are expressed in decimal notation, so a 1 Gbit Ethernet link corresponds to 10^9 bits per second, or 125,000,000 bytes per second.
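The gap between the two notations is easy to reproduce. The short sketch below recomputes the percentages shown in Figure 7-10 and the binary size of a decimal 60 TB capacity.

# Decimal (vendor) versus binary (file system) units, and the percentage gap.

prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]

for n, name in enumerate(prefixes, start=1):
    decimal = 10 ** (3 * n)
    binary = 2 ** (10 * n)
    gap = (binary / decimal - 1) * 100
    print("%-5s  10^%-2d vs 2^%-2d  difference %.0f%%" % (name, 3 * n, 10 * n, gap))

# Example: 60 TB decimal expressed in binary terabytes (TiB)
print("60 TB decimal = %.1f TiB" % (60e12 / 2 ** 40))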

7.5 Sizing the SONAS appliance


We have looked at SONAS trade-off configurations and workload characteristics. We now look at how to size the appropriate SONAS Storage Solution. We need to determine:
- Capacity requirements
- Storage subsystem disk type
- Interface node connectivity and memory configuration
- Base Rack model


7.5.1 SONAS disk drives and capacities


We previously looked at the capacity and bandwidth workload characteristics. Based on these values, you can size the storage capacity of your SONAS appliance. The minimum SONAS configuration, as described in Tradeoffs between configurations on page 222, is 60 disks within a single storage pod. Depending on the disk technology, this minimum raw usable capacity ranges from 20 TB with 450 GB SAS disks up to 93 TB with 2 TB Nearline SAS or SATA disks. See Table 7-5 for a description of the disk drives that have been supported in SONAS over time.
Table 7-5 Disk type and capacity for a drawer of 60 disk drives in SONAS

Feature Code  Disk Technology  Disk Capacity  Total Disks  Data Disks  Raw Usable Capacity (bytes)
6 x 1300      SATA             1 TB           60           48          46 540 265 619 456
6 x 1301      SATA             2 TB           60           48          93 080 531 238 912
6 x 1302      Nearline SAS     2 TB           60           48          93 080 531 238 912
6 x 1310      SAS              450 GB         60           48          20 564 303 413 248
6 x 1311      SAS              600 GB         60           48          27 419 071 217 664

This table provides the amount of raw usable storage per drawer of 60 disk drives. A SONAS storage pod can contain four drawers of 60 drives, so the minimum configuration can grow to four times the capacity shown. A full storage pod has a total of 240 disk drives. If all 2 TB drives are used, this equates to a raw capacity of 480 TB, or, after the RAID overhead is taken away, a usable capacity of 372 TB. Capacity in SONAS is added by adding additional Storage Expansion Racks.
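As a quick arithmetic check of these figures, the sketch below scales the per-drawer raw usable capacity from Table 7-5 up to a full storage pod of four drawers; it uses only the table values above.

# Raw and usable capacity of a full storage pod (4 drawers of 60 x 2 TB drives).

DRAWER_USABLE_BYTES = 93080531238912   # 2 TB NL-SAS drawer, from Table 7-5
DRAWERS_PER_POD = 4

pod_raw_tb = DRAWERS_PER_POD * 60 * 2                          # 480 TB raw (decimal TB)
pod_usable_tb = DRAWERS_PER_POD * DRAWER_USABLE_BYTES / 1e12   # after RAID overhead

print("raw: %d TB, usable: %.0f TB" % (pod_raw_tb, pod_usable_tb))   # ~372 TB usable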

7.5.2 SONAS disk drive availabilities over time


Disk technologies advance rapidly in today's world, and IBM continually upgrades disk technology to keep pace with the industry. Accordingly, SONAS disk drive availability has evolved over time as shown next. IBM Scale Out Network Attached Storage became available in March 2010 with the following hard disk drive options and associated SONAS feature codes:
- Feature code 1300: 1 TB SATA
- Feature code 1301: 2 TB SATA
- Feature code 1310: 450 GB SAS
In July 2010 the following hard disk drive option became available:
- Feature code 1311: 600 GB SAS
In November 2010, following the industry evolution in disk drive technology towards Nearline SAS drives as replacements for SATA drives, IBM SONAS introduced the following hard disk drive option:
- Feature code 1302: 2 TB Nearline SAS HDDs


Simultaneously, in November 2010, IBM withdrew the following disk drive options from SONAS marketing:
- Feature Number 1300: 1 TB SATA hard disk drives
- Feature Number 1301: 2 TB SATA hard disk drives
In January 2011 IBM withdrew the following disk drive option from SONAS marketing:
- Feature Number 1310: 10-pack of 450 GB 15K RPM SAS hard disk drives

7.5.3 Storage subsystem disk type


The storage subsystem disk type is more related to the access pattern considerations described in the previous section. Currently, you can choose between SAS and Nearline SAS disk drive technology at a granularity of 60 drives per drawer, and you can intermix them at the granularity of the drawer inside the storage pod. Refer to Chapter 6, Backup and recovery, availability, and resiliency functions on page 181 for more information regarding best practices and storage pools. If your workload is focused on sequential performance combined with lower cost (that is, you are more interested in MB/s than IO/s), then Nearline SAS drives might be the right option. Nearline SAS drives offer good results on large sequential access, both read and write, and in terms of MB/s they are nearly comparable to SAS drives. SAS drives do offer the highest sequential performance per drive; however, at the amount of capacity typically required, Nearline SAS can be acceptable in many if not most circumstances. For applications that are heavily IO/s oriented, that is, small and random access, SAS drives are clearly the indicated choice. If you do not need a large capacity, SAS drives are a good option for both random and sequential access, small or large, read or write. If you have more than one application running on your SONAS environment, you can implement a mix of SAS and Nearline SAS drives, create separate logical storage pools, and allocate your applications accordingly. After you have determined your capacity requirements and designed an initial version of your SONAS storage, come back for an additional iteration to ensure that the workload characteristics are properly matched with the storage disk drive types. The goal is an appropriate mix of SAS and Nearline SAS drives upon which to base your desired logical storage pool design. Finally, do an iteration that considers backup and restore, replication and snapshot, and disaster recovery considerations for storage capacity. In doing so, you will arrive at a precise view of the number of disk drives, disk types, and storage pods required for your SONAS.
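The guidance above can be condensed into a simple rule of thumb. The helper below is only an illustration of that reasoning, not an official sizing rule; the function name and inputs are our own.

# Very rough drive-type rule of thumb distilled from the guidance above.

def suggest_drive_type(random_io_heavy, capacity_priority):
    if random_io_heavy and not capacity_priority:
        return "SAS (15K RPM) - best for small, random, IO/s-bound workloads"
    if capacity_priority:
        return "Nearline SAS - good sequential MB/s at lower cost per TB"
    return "SAS, or a mix of both in separate logical storage pools"

print(suggest_drive_type(random_io_heavy=True, capacity_priority=False))
print(suggest_drive_type(random_io_heavy=False, capacity_priority=True))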

7.5.4 Interface node connectivity and memory configuration


You might notice that sometimes we refer to the SONAS bandwidth and sometimes to the MB/s of the Storage Controller or disks. The bandwidth is also seen from the SONAS users' point of view, which means from the Interface Node point of view; Interface Nodes have nothing to do with classic storage considerations such as MB/s. The bandwidth required by your applications, users, or middleware determines the Interface Node configuration, and you need to determine the appropriate Interface Node configuration in order to size the entire SONAS Storage Solution. Depending on the bandwidth you need, you have the option to increase the number of Interface Nodes, increase their network connectivity capability, or both.


The first thing to do is to determine whether your requirement is an overall bandwidth or a peak bandwidth. If you plan to have many users accessing the SONAS through multiple shares, accessing data independently but in parallel, then you are more interested in overall bandwidth. If you plan to access the data hosted by your SONAS Storage Solution through a few servers running a daily business application that requires a huge bandwidth, then you are more focused on peak bandwidth. As previously described, the interface node default configuration is two GigE connections to the public network in a failover/backup configuration. This means a single 1 Gb/s connection for each Interface Node. Moreover, you will access data through the NFS or CIFS protocol, which can add extra overhead; the maximum packet size for NFS is 32 KB, for example. The first option is to double this bandwidth by changing the configuration to aggregate mode, giving a 2 Gb/s bandwidth (still with NFS or CIFS on top). To increase the overall bandwidth, the simplest way is to add extra Interface Nodes to your SONAS configuration; you can even add an Interface Expansion Rack in order to increase the number of interface nodes to the maximum allowed in a SONAS configuration. If you are more focused on peak bandwidth, your first option is to add the extra Quad-port GigE connectivity feature. This means a total of six GigE connections, which can be configured as a single failover/backup configuration (this is the default, but it does not increase your bandwidth at all), as three failover/backup configurations, which results in a 3 Gb/s bandwidth, or as an aggregate configuration, which leads to a 6 Gb/s bandwidth, still with the NFS and CIFS protocols on top. Another option is to add the 10 GigE dual-port adapter, which can also be configured in a failover/backup or an aggregate configuration; these lead respectively to a 10 Gb/s and a 20 Gb/s bandwidth, still with the NFS and CIFS protocols on top. The last option is to use both additional adapters, which means six GigE connections and two 10 GigE connections. Obviously, if you add these extra features to each Interface Node, you also increase the overall bandwidth. The bandwidth considerations we discussed lead to a draft Interface Node configuration. As with Storage Pods, the cache hit ratio parameter will make you do a second iteration in your Interface Node configuration process. The cache hit ratio means that your application can reuse data and take advantage of the SONAS caching ability. In order to increase this caching potential you have two options: increase the number of Interface Nodes, or increase the amount of memory inside each Interface Node.
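The connectivity options above translate into theoretical per-node figures. The sketch below simply multiplies port counts by link speeds for a hypothetical four-node cluster; it ignores NFS/CIFS protocol overhead, so the results are upper bounds.

# Theoretical data-path bandwidth per interface node for a few port configurations.
# Raw link-speed arithmetic only; NFS/CIFS overhead will reduce the usable figure.

configs = {
    "2 x 1 GbE, active/failover":        1 * 1,
    "2 x 1 GbE, aggregated":             2 * 1,
    "6 x 1 GbE, aggregated (FC 1100)":   6 * 1,
    "2 x 10 GbE, aggregated (FC 1101)":  2 * 10,
}

interface_nodes = 4   # hypothetical cluster size
for name, gbps in configs.items():
    print("%-34s %3d Gb/s per node, %3d Gb/s for %d nodes"
          % (name, gbps, gbps * interface_nodes, interface_nodes))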

7.5.5 Base rack sizing


Based on the capacity and storage subsystem disk type, you have identified a number of Storage Pods for the SONAS configuration. Based on the information we have previously discussed regarding interface node connectivity and memory configuration, you have identified a number of Interface Nodes for your SONAS configuration. Both of these numbers will now help you determine the appropriate Base Rack model. As described in Rack types: How to choose the correct rack for your solution on page 61, there are three Base Rack models:
- The first one contains Interface Nodes, no Storage Pod, and a 36-port InfiniBand switch
- The second one contains Interface Nodes, no Storage Pods, and a 96-port InfiniBand switch
- The third one contains both Interface Nodes and a Storage Pod, and a 36-port InfiniBand switch


Depending on the total number of Storage Pods and Interface Nodes from the previous sections, you are able to determine the total number of InfiniBand ports required for your SONAS configuration. Keep in mind that a single Storage Pod requires two InfiniBand ports because it is built from two Storage Nodes. More than 36 nodes in total imply the second base rack model, with the 96-port InfiniBand switch. Here again, the aim of SONAS is to be a scale out solution, which means extra storage added if needed and extra Interface Nodes added if needed, so you do not have to be extremely precise and exhaustive when configuring your SONAS. The only requirement is to choose the base rack model, and therefore the InfiniBand switch, carefully, because there is no inexpensive way to swap from one base rack model configuration to another. You might still order the 96-port InfiniBand switch and partially fill it with a single 24-port InfiniBand line board, and scale out later if needed.
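Following this reasoning, a first-pass check of the InfiniBand switch requirement can be reduced to counting ports: one per interface node and two per storage pod, as stated above. The sketch below encodes that count; treat it as a planning aid only, because the management node, additional storage nodes in expansion racks, and inter-rack cabling also consume ports.

# First-pass InfiniBand port count, as described in this section.
# Planning aid only: other components also use ports.

def ib_ports_needed(interface_nodes, storage_pods, storage_nodes_per_pod=2):
    return interface_nodes + storage_pods * storage_nodes_per_pod

ports = ib_ports_needed(interface_nodes=10, storage_pods=14)
switch = "36-port" if ports <= 36 else "96-port"
print("%d ports needed -> %s InfiniBand switch configuration" % (ports, switch))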

7.6 Tools
There are tools available that can be used to help you analyze your workload and, using workload characteristics, size your SONAS system.

7.6.1 Workload analyzer tools


In this section we describe tools that you can use to help you understand your workload characteristics.

nmon tool
nmon is a free tool to analyze AIX and Linux performance that gives you a huge amount of information, all on one screen. Instead of using five or six separate tools, nmon can gather information such as CPU utilization, memory use, disk I/O rates, transfers and read/write ratios, free space on file systems, disk adapters, network I/O rates, transfers and read/write ratios, Network File System (NFS) statistics, and much more, on one screen, dynamically updated. The nmon tool can also capture the same data into a text file for later analysis and graphing for reports; the output is in a spreadsheet format (.csv). As just described, you can use nmon to monitor your environment dynamically, but you can also capture data into a .csv file and use other tools, such as nmon_analyser or nmon_consolidator, to analyze the data and generate graphs or tables. The aim of nmon_analyser is to take the nmon .csv output files generated during a run as input and generate an Excel spreadsheet in which each tab gathers information regarding CPU consumption, memory utilization, or disk usage, and describes the results with charts and tables. In a big infrastructure you might need to monitor every node, server, and client. If you need the big picture instead of one screen capture per node, you might want to gather all the nmon information for a typical application, a typical run, or a typical subset of nodes. In that case, instead of nmon_analyser, you need nmon_consolidator, which is basically the same tool but consolidates many .csv files into a single Excel spreadsheet document. This can also be useful in a virtualized environment, where you might need to monitor resources from a host point of view (Red Hat 5.4 host, VMware ESX, or AIX IBM PowerVM) instead of a virtual machine's point of view. In Figure 7-11 and Figure 7-12, you can see a CPU utilization summary both for a single LPAR (Power virtualization with AIX) and for the entire system (Power AIX).


Figure 7-11 Single Partition output

Figure 7-12 Entire System output
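If you capture nmon data to a .csv file, you can also post-process it with a small script instead of, or in addition to, nmon_analyser. The sketch below assumes the usual nmon capture layout, in which each line begins with a section tag such as DISKREAD or DISKWRITE, followed by a Tnnnn snapshot reference and per-disk values in KB/s; the file name and exact column layout are assumptions that can vary between nmon versions.

# Summarize disk read/write throughput from an nmon capture file (.csv/.nmon).
# Assumes DISKREAD/DISKWRITE data lines of the form: TAG,Tnnnn,val1,val2,...
import csv
from collections import defaultdict

def summarize(path):
    totals = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if len(row) > 2 and row[0] in ("DISKREAD", "DISKWRITE") and row[1].startswith("T"):
                values = [float(v) for v in row[2:] if v]
                totals[row[0]].append(sum(values))     # total KB/s across all disks
    for tag, samples in totals.items():
        print("%s: avg %.1f KB/s, peak %.1f KB/s"
              % (tag, sum(samples) / len(samples), max(samples)))

summarize("capture.nmon")   # hypothetical capture file name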

Links
For more detailed information regarding these tools, see the corresponding websites:
- nmon tool: http://www.ibm.com/developerworks/aix/library/au-analyze_aix/
- nmon_analyser: http://www.ibm.com/developerworks/wikis/display/Wikiptype/nmonanalyser
- nmon_consolidator: http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmonconsolidator

perfmon tool
Like the nmon tool suite for Linux and AIX systems, the Windows perfmon tool can be used to gather and analyze your application workload. Windows operating systems provide the perfmon (perfmon.exe) utility to collect data. Perfmon allows for real-time performance counter visualization or historical reporting. There are various performance indicators, or counters, that are gathered into objects, such as these:
- processor
- memory
- physical disks
- network interfaces

Each object then provides individual counters, such as these:
- %processor time for processor
- Pages read/s for memory
- %disk write time for physical disks
- current bandwidth for network interfaces
After you have selected the appropriate counters, you can visualize the results dynamically, or record them for later analysis and reporting in an Excel spreadsheet. Unlike nmon, you do not need additional tools for the analysis. First launch the perfmon tool, then generate a data collection, for example during the application execution or during the whole day. After the data collection is generated, you can open the generated log file, visualize it, and even generate a .csv file. Finally, open the generated .csv file with Excel and create charts and tables as described in Figure 7-13 and Figure 7-14.

Figure 7-13 Processor counters

Figure 7-14 Physical Disks counters



Chapter 8.

Installation planning
In this chapter we provide information about the basic installation planning for the SONAS appliance. We do not include considerations for Tivoli Storage Manager, replication, or ILM.


8.1 Physical planning considerations


In Chapter 7, Configuration and sizing on page 221 we provided information to size the appropriate SONAS configuration according to your needs. In this chapter you complete an entire questionnaire in order to prepare and plan your future SONAS installation. Before going further, however, there are critical physical considerations you need to think about, so in this chapter we also provide technical information regarding the physical requirements. As for any IT solution, you have to handle physical constraints related to floor load or power requirements. All of the following values have been certified by IBM; they are not simply measurements, and they are exactly what is required in your data center.

8.1.1 Space and floor requirements


We described in detail in Chapter 7, Configuration and sizing on page 221 all the SONAS frame model types, the Base and Expansion Rack models. You also worked with these rack models in Sizing the SONAS appliance on page 241 in order to create the SONAS Storage Solution configuration that fits all your performance and scale out requirements. Your SONAS Storage Solution might be the smallest one, which means a single Base Rack model, or the largest one, which means 17 racks in your data center, including the Base Rack, the Interface Expansion Rack, and 15 Storage Expansion Racks. It will probably be something in between, but whatever the number of SONAS racks you might need, there are floor load requirements you need to consider inside your data center. Although your SONAS configuration might contain half empty racks because of mandatory components (see Chapter 2, Hardware architecture on page 41), all weight considerations assume that all racks are full. Even before considering the floor requirements, you have to ensure that all logistics and setup aspects meet the IBM requirements. You can find all the logistics considerations, such as loading dock, elevator, or shipping containers, and setup considerations, such as the use of a raised floor or rack cabling, in the manual IBM SONAS Introduction and Planning Guide, GA32-0716. The next step is to identify each SONAS rack model inside your configuration and the floor load rating of the location where you plan to install the SONAS system. Floor load requirements are critical, so you need to carefully follow the IBM specifications and ensure that your location meets the minimum used by IBM, which is 342 kg per m² (70 lb per ft²). You then need to determine the appropriate weight distribution area by following the IBM rules described in Figure 8-1. Remember that the weight distribution areas absolutely cannot overlap, they are calculated for the maximum weight of the racks (racks full), and they apply to an individual frame model only.


Figure 8-1 Weight distribution area

Assume that the sizing you have done led to a configuration with one Base Rack and two Storage Expansion Racks, which will be set up in the same row in your data center. You must first ensure that all SONAS racks have at least 762 mm (30 in.) of free space in front of and behind them. According to the weight distribution areas, they must also have 155+313=468 mm or 313+313=626 mm between them, as described in Figure 8-3. In Figure 8-2, you see the detailed SONAS rack dimensions.

Figure 8-2 Rack dimensions


Figure 8-3 Floor loading example

As just described, the minimum space between a base rack (RXA) and a Storage Expansion rack (RXB) is 468 mm (18.4 in.), whereas it is 626 mm (24.6 in.) between two Storage Expansion racks.

8.1.2 Power consumption


Each SONAS rack has either four intelligent PDUs (iPDUs) or four base PDUs. The iPDUs can collect energy use information from energy-management components in IBM devices and report the data to the Active Energy Manager feature of IBM Systems Director for power consumption monitoring. Each SONAS rack also requires four 30A line cords, two as primary and two as secondary. As described in Chapter 2, Hardware architecture on page 41, you can configure your SONAS system with Nearline SAS drives, SAS drives, or a combination of both. Because SAS and Nearline SAS drives do not have the same power consumption requirements, Figure 8-4 shows the maximum number of Storage Controllers and Disk Storage Expansion units you can have inside a single Storage Expansion Rack according to the power consumption requirements, as of the writing of this book.
Note: Newer Nearline SAS drive technology replaced older SATA drive technology in SONAS in November 2010.


Figure 8-4 Maximum number of Storage according to SAS drives power consumption

Tip: In the future, when the 60 amp power option becomes available for the SONAS Storage Expansion Rack, this restriction due to the higher power consumption of SAS drives will be lifted.
In Figure 8-5, you can find additional information regarding the power consumption measurements done for a heavy usage scenario with fully populated SONAS racks and SAS drives exclusively.

Figure 8-5 Power consumption measurements

8.1.3 Noise
Based on acoustics tests performed on a SONAS system, the following values apply:
- 90 dB registered for a fully populated Base Rack (2851-RXA) system
- Up to 93 dB in the worst case, with a fully populated Storage Expansion Rack (2851-RXB)
The system operating acoustic noise specification is as follows: the declared sound power level, LwAd, is less than 94 dBA at 1 m at 23°C. However, you can reduce the audible sound level of the components installed in each rack by up to 6 dB with the acoustic doors feature (feature code 6249 of each SONAS rack).

8.1.4 Heat and cooling


In order to optimize the cooling system of your SONAS Storage Solution, you can use a raised floor to increase air circulation in combination with perforated tiles. Additional information regarding such a setup can be found in the manual SONAS Introduction and Planning Guide, GA32-0716.


See Figure 8-6 for information regarding the temperature and humidity while the system is in use or shut down.

Figure 8-6 Cooling measurements

8.2 Installation checklist questions


In this section we review the questionnaire you have to fill out in order to complete your SONAS solution. This information is critical, and we refer to some of these questions in further sections. In Table 8-1 are questions related to the management node configuration, such as the cluster name or domain name. Then in Table 8-2 on page 255 you will find questions regarding the remote access for the Call Home feature. In this section you also provide information about the quorum configuration in Table 8-3 on page 256, CLI credentials in Table 8-4 on page 256, and the node locations in Table 8-5 on page 256. The last questions refer to the DNS, NAT, or authentication method configurations; you will be prompted to fill in fields related to these topics in Table 8-6 on page 257, Table 8-7 on page 257, and Table 8-8 on page 258.
Table 8-1 Management Node configuration

Question 1 - Cluster Name: This is the name of your IBM SONAS cluster. Example: sonascluster
Question 2 - Domain Name: This is your network domain name. Example: mydomain.com. Note: The Cluster Name and Domain Name are typically used in combination, for example: sonascluster.mydomain.com
Question 3 - Internal IP Address Range: Specify 1, 2, or 3. You need to use a predetermined range for your private IP network. This range must not conflict with your existing network configuration, which will be used as the public IP network to access the management and interface nodes. The available IP address ranges are: 1. 172.31.*.*  2. 192.168.*.*  3. 10.254.*.*  Note: If you are already using the first range, choose the second one; if you are using both, choose the third one.
Question 4 - Management console IP address: This IP address will be associated with the management node. It has to be on the public network and accessible by the storage administrator.
Question 5 - Management console gateway: This is the numeric gateway of your management console.
Question 6 - Management console subnet mask: This is the numeric subnet mask of your management console.
Question 7 - Host name (value: mgmt001st001): This is your preassigned management node host name.
Question 8 - Root Password: Specify here the password you want to be set on the management node for root access. By default it is Passw0rd (where P is capitalized and 0 is zero).
Question 9 - NTP Server IP Address: SONAS needs to synchronize all nodes inside the cluster and with your authentication method, so you must provide at least one NTP server. A second NTP server is best for redundancy. Note: The Network Time Protocol (NTP) servers can be either local or on the internet. Note: Only the management node requires a connection to your NTP server; it then becomes the NTP server for the whole cluster.
Question 10 - Time Zone: Referring to the time zone list, specify the number corresponding to your location.
Question 11 - Number of frames being installed: Specify the total quantity of rack frames in this cluster.

In Table 8-2, you provide some information regarding the remote configuration of your SONAS in order to enable the Call Home feature.
Table 8-2 Remote configuration

Question 12 - Company Name
Question 13 - Address: This is the address where your SONAS is located. Example: Bldg. 123, Room 456, 789 N DataCenter Rd, City, State
Question 14 - Customer Contact Phone Number: In case of a severe issue, this is the primary contact that IBM service will call.
Question 15 - Off Shift Customer Contact Phone Number: This is the alternate phone number.
Question 16 - IP Address of Proxy Server (for Call Home): Optional. Provide the IP address of the proxy server if it is needed to access the internet for the Call Home feature.
Question 17 - Port of Proxy Server (for Call Home): Optional. Provide the port of the proxy server if it is needed to access the internet for the Call Home feature.
Question 18 - Userid for Proxy Server (for Call Home): Optional. Provide the user ID of the proxy server if it is needed to access the internet for the Call Home feature.
Question 19 - Password for Proxy Server (for Call Home): Optional. Provide the password of the proxy server if it is needed to access the internet for the Call Home feature.

In Table 8-3 you provide the quorum topology of your SONAS system.
Table 8-3 Quorum topology

Question 20 - Quorum storage nodes
Question 21 - Quorum interface nodes
Notes:
1. Your first action will be to select an odd number of quorum nodes; you can use both interface and storage nodes.
2. Valid choices are 3, 5, or 7.
3. If your cluster is composed of more than a single frame, you must spread your quorum nodes across several frames.
4. After you have built the appropriate topology, write the interface and storage node numbers in the table.

In Table 8-4 you provide CLI credentials. Your SONAS administrator will use these credentials to connect to the CLI or GUI in order to manage your entire SONAS storage solution.
Table 8-4 CLI credentials

Question 22 - CLI User ID: Your SONAS administrator will use this ID for GUI or CLI connections, for instance: myuserid
Question 23 - CLI Password: This is the password corresponding to the user ID. Example: mypassword

In Table 8-5, enter the locations of the SONAS nodes in your data center.
Table 8-5 Nodes location

Question 24 - Management Node
Question 25 - Interface Node #1 ...
Question 26 - Storage Node ...
For each node, record the rack number/position, the node serial number, and the InfiniBand port number.

The rack number is the number of the rack containing the node, whereas the position indicates the position (U) where the node is installed in the rack. The node serial number is the serial number of the node. The InfiniBand port number is the InfiniBand switch port number where the node is connected. You do not have to give this information for preinstalled nodes.


In Table 8-6 and Table 8-7 you provide information regarding your existing DNS and NAT configuration.
Table 8-6 DNS configuration

Question 27 - IP Address of Domain Name Services (DNS) Server(s): Provide the numeric IP address of one or more Domain Name Services (DNS) servers you are using inside your network. In order to avoid a bottleneck because of a single DNS server, and therefore improve performance, you can set up multiple DNS servers in a round-robin configuration.
Question 28 - Domain: This is the domain name of your cluster (such as mycompany.com). Note: This field is not required and can be left blank. If it is left blank, no domain name will be set for the cluster.
Question 29 - Search String(s): This is a list of one or more domain names to be used when trying to resolve a short name (example: mycompany.com, storage.mycompany.com, servers.mycompany.com). Note: This field is not required and can be left blank. If it is left blank, no search string will be set for the cluster.

Table 8-7 NAT configuration

Question 30 - IP Address: The numeric IP address requested here is the IP address needed to access the management and interface nodes through the internal private network connections using NAT overloading, meaning that a combination of this IP address and a unique port number will correspond to each node (management node and interface nodes only). This IP address must not be the same as the management node IP address or the interface node IP addresses.
Question 31 - Subnet Mask: This is the subnet mask associated with the IP address.
Question 32 - CIDR Equivalent of the Subnet Mask: This is the CIDR (/XX) equivalent of the subnet mask specified.
Question 33 - Gateway: This is the default gateway associated with the IP address.

The next step is to provide details of your authentication method in Table 8-8. You will have to integrate your SONAS system into your existing authentication environment, which can be Active Directory (AD) or Lightweight Directory Access Protocol (LDAP).


Table 8-8 Authentication methods

Question 34 - Authentication Method ([ ] Microsoft Active Directory or [ ] LDAP): What is the authentication method you are using in your environment?
Question 35 - AD Server IP address: In case of an Active Directory configuration, provide the numeric IP address of the Active Directory server.
Question 36 - AD Directory UserID: This user ID and the password that follows will be used to authenticate to the Active Directory server.
Question 37 - AD Password: This is the password associated with the user ID.
Question 38-0 - LDAP IP Address: In case of an LDAP configuration, provide the numeric IP address of the remote LDAP server.
Question 38 - LDAP SSL Method ([ ] Off, [ ] SSL (Secure Sockets Layer), [ ] TLS (Transport Layer Security)): In case of an LDAP configuration you can choose to use an open (unencrypted) or a secure (encrypted) communication between your SONAS cluster and the LDAP server. For secured communication, two methods can be used: SSL or TLS. When SSL or TLS is used, a security certificate file must be copied from your LDAP server to the IBM SONAS management node.
Question 39 - LDAP Cluster Name: This is the Cluster Name specified in Table 8-1 (example: sonascluster).
Question 40 - LDAP Domain Name: This is the Domain Name specified in Table 8-1 (example: mydomain.com).
Question 41 - LDAP Suffix, Question 42 - LDAP rootdn, Question 43 - LDAP rootpw: These are the suffix, rootdn, and rootpw from the /etc/openldap/slapd.conf file on your LDAP server.
Question 44 - LDAP Certificate Path: If you choose the SSL or TLS method, provide the path on the IBM SONAS management node where you will copy the certificate file.

After your SONAS appliance has been integrated into your existing environment, and the authentication method set up accordingly, you will be able to create exports in order to grant access to SONAS users. But before you create these exports, the protocol information in Table 8-9 is required.


Table 8-9 Protocols access

Question 45 - Protocols ([ ] CIFS (Common Internet File System), [ ] FTP (File Transfer Protocol), [ ] NFS (Network File System)): These are the supported protocols that can be used to access data. Check one or more according to your needs.
Question 46 - Owner: This is the owner of the shared disk space. It can be a user name, or a combination of Domain\username. Example: admin1 or Domain1\admin1
Question 47 - CIFS Options: If you need a CIFS share, you need to detail some options. The options are a comma-separated key-value pair list. Valid CIFS options are: browseable=yes and comment="Place comment here". Example: -cifs browseable=yes,comment="IBM SONAS"
Question 48 - NFS Options (IP Address, Subnet Mask, CIDR Equivalent, Access Options: [ ] ro or [ ] rw, [ ] root_squash or [ ] no_root_squash, [ ] async or [ ] sync): If you need an NFS share, you need to provide some NFS options. If NFS options are not specified, the NFS shared disk will not be accessible by SONAS clients. NFS options include a list of client machines allowed to access the NFS shared drive, and the type of access to be granted to each client machine. Example: -nfs 9.11.0.0/16(rw,no_root_squash,async)

In Table 8-10 you will need to provide details of Interface subnet and network information.
Table 8-10 Interface subnet

Question 49 - Subnet: This is the public network, which will be used for communication between the SONAS interface nodes and your application servers. For example, if you have three interface nodes on a single network, with IP addresses from 9.11.136.101 through 9.11.136.103, then your subnet is 9.11.136.0 and the subnet mask is 255.255.255.0 (/24 in CIDR format).
Question 50 - Subnet Mask: This is the subnet mask associated with the subnet listed.
Question 51 - CIDR Equivalent of the Subnet Mask: This is the subnet mask listed, converted to CIDR format.
Question 52 - VLAN ID: Optional. This is a list of one or more Virtual LAN identifiers. A VLAN ID must be in the range from 1 to 4095. If you do not use VLANs, leave this field blank.
Question 53 - Group Name: Optional. This is a name assigned to a network group. It allows you to reference a set of interface nodes using a meaningful name instead of a list of IP addresses or host names. If you do not use network groups, leave this field blank.
Question 54 - Interface Node Number / hostname: For each interface node, record the IP address, subnet/subnet mask, and gateway.

You must complete these tables prior to any SONAS setup. Some of the information in the tables is critical, for example, the authentication method; your SONAS storage solution will not work if it is not properly configured. In this section and the previous one, our main concern was the pre-installation process. In the following sections we assume that your SONAS storage solution has been properly preconfigured and set up with all required information. However, your SONAS is not yet ready to use: you still have to complete some additional planning steps regarding the storage and network configuration, and last but not least the authentication method and IP address load balancing configuration.

8.3 Storage considerations


In this section we discuss storage considerations for your SONAS configuration.

8.3.1 Storage
SONAS storage consists of disks grouped in sets of 60 SAS, Nearline SAS, or SATA hard drives. Enclosures with Nearline SAS or SATA drives are always configured as RAID 6 arrays; enclosures with SAS drives are always configured as RAID 5 arrays. Because of power consumption, it is possible to use a maximum of 360 SAS drives or 480 Nearline SAS or SATA drives per rack. This means that the maximum capacity is 14.4 PB for SATA drives and 2.43 PB for SAS drives.2

SONAS supports up to 2,147,483,647 files with a 1 MB block size, and approximately 60 million files is the maximum supported with async replication. The maximum number of files in a SONAS file system is constrained by the formula:

maximum number of files = (total file system space / 2) / (inode size + subblock size)

For file systems that will be doing parallel file creates, if the total number of free inodes is not greater than 5% of the total number of inodes, there is the potential for a slowdown in file system access. Take this into consideration when changing your file system.
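As an illustration of this formula, consider a hypothetical 100 TB file system with the default 512-byte inode size and an 8 KB sub-block (256 KB block size); the file system size here is an assumed value chosen only for the example:

maximum number of files = (100 TB / 2) / (512 bytes + 8 KB)
                        = 50 TB / 8.5 KB
                        (approximately 6.3 billion files)

Because this result exceeds the architectural limit mentioned above, the effective maximum for such a file system remains 2,147,483,647 files.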
2. In the future, when a 60 amp power option is available for the SONAS Storage Expansion rack, this restriction will be lifted.


8.3.2 Async replication considerations


If you are going to use async replication with SONAS, then you must take into consideration that the async replication tool will, by default, create a local snapshot of the file tree being replicated and use the snapshot as the source of the replication to the destination system. This is the preferred method because it creates a well defined point-in-time of the data being protected against a disaster.

Snapshot handling
After the successful completion of async replication, the snapshot created in the source file system is removed. However, you have to ensure that there is sufficient storage at the replication source and destination to hold the replica of the source file tree and the associated snapshots.

A snapshot is a space efficient copy of a file system at the point when the snapshot is initiated. The space occupied by the snapshot at the time of creation, before any files are written to the file system, is a few KB for control structures. No additional space is required for data in a snapshot prior to the first write to the file system after the creation of the snapshot. As files are updated, the space consumed increases to hold both the main branch copy and a copy for the snapshot. The cost of this is the actual size of the write, rounded up to the size of a file system block for larger files or to the size of a sub-block for small files. In addition, there is a cost for additional inode space and indirect block space to keep data pointers to both the main branch and snapshot copies of the data. This cost grows as more files in the snapshot differ from the main branch, but the growth is not linear because the unit of allocation for inodes is chunks in the inode file, which are the size of the file system sub-block. After the completion of the async replication, a snapshot of the file system containing the replica target is performed.

The impact of snapshots on SONAS capacity depends on how the snapshots are used. If the snapshots are used temporarily for the purpose of creating an external backup and are removed afterwards, the impact is most likely not significant for configuration planning. In cases where snapshots are taken frequently, for replication or as backups that enable users to do an easy restore, the impact cannot be disregarded. The concrete impact depends on how frequently a snapshot is taken, how long each snapshot exists, how many files in the file system are changed by users, and the size of the writes/changes.
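As a rough worked example of the space cost described above (the file and write sizes are assumed values for illustration only): in a file system with the default 256 KB block size, suppose a snapshot exists and an application then overwrites 1 MB within a large file. The old data must be preserved for the snapshot, so approximately four 256 KB blocks (1 MB) of additional space is consumed, plus a small amount of inode and indirect block space. If instead only 2 KB of a small file changes, the additional cost is a single 8 KB sub-block.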

Backup for disaster recovery purposes


The following list has key implications for using the HSM functionality with file systems that are backed up for disaster recovery purposes with the async replication engine:

Source/destination primary storage capacity: The primary storage on the source and destination SONAS systems must be reasonably balanced in terms of capacity. Because HSM allows for the retention of more data than the primary storage capacity, and async replication is a file based replication, planning must be done to ensure that the destination SONAS system has enough storage to hold the entire contents of the source data (both primary and secondary storage).

Chapter 8. Installation planning

261

HSM at destination: If the destination system uses HSM of the SONAS storage, consider having enough primary storage at the destination to ensure that the change delta can be replicated into its primary storage as part of the disaster recovery process. If the movement of data from the destination location's primary to secondary storage is not fast enough, the replication process can outpace this movement, causing a performance bottleneck in completing the disaster recovery cycle. Therefore, the capacity of the destination system to move data to secondary storage must be sufficiently configured to ensure that enough data has been pre-migrated to secondary storage to account for the next async replication cycle, so that the amount of data to be replicated can be accommodated without waiting for movement to secondary storage. For example, enough Tivoli Storage Manager managed tape drives need to be allocated and operational, along with enough media.

8.3.3 Block size


The size of data blocks in a SONAS file system can be specified at file system creation. This value cannot be changed without recreating the file system. SONAS offers the following block sizes for file systems: 16 KB, 64 KB, 128 KB, 256 KB (the default value), 512 KB, 1 MB, 2 MB, and 4 MB. The block size defines the maximum size of an I/O request that SONAS sends to the underlying disk drivers, whereas a sub-block, which is 1/32 of the block size, defines the minimum amount of space that file data can occupy.

File system blocks are divided into sub-blocks for use by files smaller than a full block, or for the end of a file where the last block is not fully used. Sub-blocks, at 1/32 of a file system block, are the smallest unit of allocation for file data. As an example, the sub-block size in a file system with the default 256 KB block size is 8 KB. The smallest file will be allocated 8 KB of disk space, and files greater than 8 KB but smaller than 256 KB will be rounded up to a multiple of 8 KB sub-blocks. Larger files will use whole data blocks. The file system attempts to pack multiple small files into a single file system block, but it does not implement a guaranteed best fit algorithm, for performance reasons.

A storage controller in SONAS is configured to support 32 KB chunk sizes; this value is preconfigured and cannot be changed. This means that, for example, the default SONAS block size (256 KB) is divided across the 8 data disks in each RAID array and written to 8 different drives in 32 KB chunks. Each RAID array in SONAS consists of 10 disks, of which 8 are available for data, because Nearline SAS or SATA RAID 6 arrays consist of 8+P+Q and SAS RAID 5 arrays consist of 8+P+spare drives. Figure 8-7 shows how SONAS writes a block of data with the default block size.


Figure 8-7 How SONAS writes data to disks. The figure shows the default 256 KB file system block (which can be changed) split into eight 32 KB chunks (which cannot be changed), one chunk for each of the eight data drives in the RAID array, plus the parity and parity/spare drives; the 8 KB sub-block is 1/32 of the block size.

In file systems with a wide variance in the size of files, using a small block size has a large impact on performance when accessing large files. In this kind of system it is suggested that you use a block size of 256 KB (8 KB sub-block). Even if only 1% of the files are large, the amount of space taken by the large files usually dominates the amount of space used on disk, and the space wasted in the sub-blocks used for small files is usually insignificant. Larger block sizes, up to 1 MB, are often a good choice when the performance of large files accessed sequentially is the dominant workload for the file system.

The effect of block size on file system performance largely depends on the application I/O pattern. A larger block size is often beneficial for large sequential read and write workloads. A smaller block size is likely to offer better performance for small file, small random read and write, and metadata-intensive workloads.

The efficiency of many algorithms that rely on caching file data in a page pool depends more on the number of blocks cached than on the absolute amount of data. For a page pool of a given size, a larger file system block size means fewer blocks cached. Therefore, when you create file systems with a block size larger than the default of 256 KB, it is best to increase the page pool size in proportion to the block size. Data is cached in interface node memory, so it is important to plan the RAM size of the interface nodes correctly.
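To make the space trade-off concrete, here is a small comparison using assumed file sizes (recall that the sub-block is 1/32 of the block size):

With a 256 KB block size (8 KB sub-block), a 2 KB file occupies 8 KB, and a 100 KB file occupies 13 sub-blocks, or 104 KB.
With a 1 MB block size (32 KB sub-block), the same 2 KB file occupies 32 KB, and the 100 KB file occupies 4 sub-blocks, or 128 KB.

The larger block size wastes more space on small files but allows larger, more efficient I/O requests for big sequential files.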

8.3.4 File system overhead and characteristics


There are two classes of file system overhead in the SONAS file system. One is the basic overhead of a file system and the overhead required to manage an amount of storage. This includes disk headers, basic file system structures and allocation maps for disk blocks. The second overhead is the space required in support of user usage of the file system. This includes user directories plus the inodes and indirect blocks for files and potential files.

Both classes of metadata are replicated in a SONAS system for fault tolerance. The system overhead depends on the number and size of the LUNs assigned to a file system, but it is typically on the order of a few hundred MB or less per file system. The metadata in support of usage can be far higher, but it is largely a function of usage.

The cost of directories is entirely a function of usage and file naming structures. A directory costs at least the minimum file size for each directory, and more if the number of entries is large. For a 256 KB block size file system, the minimum directory is 8 KB. The number of directory entries per directory block varies with customer usage. For example, if the average directory contained 10 entries, the directory cost per file is about 800 bytes. This number can be doubled for metadata replication.

The cost of inodes is a function of how the file system is configured. By default, SONAS is configured with 50 M inodes preallocated and a maximum allowed inodes value of 100 M. By default, an inode requires 512 bytes of storage. The defaults therefore require about 50 GB of storage for inodes (512 * 50 M * 2 for replication). If the user actually had 50 M files with an average directory holding 10 files, the cost for directories is about 80 GB. Higher density directories require less space for the same number of files. There might also be a requirement for space for indirect blocks for larger files. These two categories dominate the overhead for a file system, with other minor usages such as recovery logs or message logs.
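The arithmetic behind these default numbers is, roughly, the following (a sketch using the stated defaults, not a sizing guarantee):

inode space     = 512 bytes x 50 M preallocated inodes x 2 (replication)   = about 50 GB
directory space = (8 KB minimum directory / 10 entries) x 50 M files x 2   = about 80 GB

If your directories average more entries, or you preallocate fewer inodes, these figures shrink accordingly.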

8.3.5 SONAS master file system


One file system must be designated as the master file system; it is used to avoid a split-brain scenario from the viewpoint of the clustered trivial database (CTDB). A split-brain scenario occurs when two nodes within SONAS lose communication with each other (for example, the network breaks); in this case, without a master file system it is not possible to decide which node must be the recovery master. Without the master file system mechanism, in case of failure an internal part of the SONAS data might be corrupted.

CTDB requires a shared file system so that all nodes in the cluster access the same lock file. This mechanism ensures that only one part of the cluster in a split-brain scenario stays up and running and can access the data. The recovery master checks the consistency of the cluster and, in case of failure, performs the recovery process. Only one node at a time can act as the recovery master. Which node is designated the recovery master is decided by an election process in the recovery daemons running on each node.

The SONAS master file system can be shared with your data. It is not possible to unmount or delete a file system that is the master file system, but you can remove the master flag from a file system and then unmount it (for example, for maintenance reasons). It is not desirable to be without a master file system for any longer than is absolutely necessary. During this time the system is exposed to a possible split-brain scenario and data corruption, so the master file system must be reactivated as soon as possible. Setting and unsetting the master file system can be done only with the CLI.

8.3.6 Failure groups


SONAS allows you to organize your hardware into failure groups. A failure group is a set of disks that share a common point of failure that can cause them all to become simultaneously unavailable. SONAS Software can provide RAID 1 mirroring at the software level. In this case, failure groups are defined that are duplicates of each other and reside on different disk subsystems. In the event that a disk subsystem fails and cannot be accessed, SONAS Software automatically switches to the other half of the failure group. Expansion racks with storage pods can be moved away from each other up to the length of the InfiniBand cables; currently the longest available cable is 50 m. This means that, for example, you can stretch the cluster, move two storage expansion racks to a distance of 50 m from each other, and create a mirror at the failure group level between these two racks.


With the failure of a single disk, if you have not specified multiple failure groups and replication of metadata, SONAS cannot continue because it cannot write logs or other critical metadata. If you have specified multiple failure groups and replication of metadata, the failure of multiple disks in the same failure group puts you in the same position. In either of these situations, GPFS forcibly unmounts the file system. It is best to replicate at least the metadata between two storage pods, which means creating two failure groups, one for each storage pod.

8.3.7 Setting up storage pools


A storage pool is a collection of disks with similar properties that provide a specific quality of service for a specific use, such as storing all files for a particular application or a specific business division. Using storage pools, you can create tiers of storage by grouping storage devices based on performance or reliability characteristics. For example, one pool can be an enterprise class storage system that hosts high-performance SAS disks, and another pool can consist of a set of economical Nearline SAS disks. A storage pool is managed as a group, and storage pools provide a means to partition the management of the file system's storage. There are two types of storage pools:

System storage pool (exists by default): A storage pool that contains the system metadata (system and file attributes, directories, indirect blocks, symbolic links, policy file, configuration information, and metadata server state) that is accessible to all metadata servers in the cluster. Metadata cannot be moved out of the system storage pool. The system storage pool is allowed to store user data, and by default user data goes into the system storage pool unless a placement policy is activated. The system storage pool cannot be removed without deleting the entire file system. Disks inside the system pool can be deleted as long as at least one disk remains assigned to the system pool, or enough disks with space to store the existing metadata remain. Because the system storage pool contains metadata, use the fastest and most reliable disks for it, for better performance of the whole SONAS file system and for failure protection. There can be only one system pool per file system, and the pool is required.

User storage pool: Up to 7 user storage pools can be created per file system. A user storage pool does not contain metadata, it only stores data, so disks that are assigned to a user storage pool can only have the usage type "data only".

A maximum of eight storage pools per file system can be created, including the required system storage pool. The storage pool is an attribute of each disk and is specified as a field in each disk descriptor when the file system is created or when a disk is added to an existing file system. SONAS offers internal storage pools and external storage pools. Internal storage pools are managed within SONAS. External storage pools are managed by an external application such as Tivoli Storage Manager; SONAS manages the movement of data to and from external storage pools. SONAS provides integrated automatic tiered storage (Information Lifecycle Management (ILM)) and an integrated global policy engine to enable centralized management of files and file sets in one or multiple logical storage pools. This flexible arrangement allows file based movement down to a per-file basis if needed (refer to 3.6.1, SONAS: Using the central policy engine and automatic tiered storage on page 107).
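For illustration only, a file placement policy that directs new files into tiered pools might look like the following sketch. The pool names and the file-matching condition here are assumptions made up for this example; the exact policy syntax and the way policies are activated are covered in the section referenced above:

RULE 'media_to_nearline' SET POOL 'nearline' WHERE LOWER(NAME) LIKE '%.mpg'
RULE 'default_placement' SET POOL 'system'

With rules such as these active, newly created .mpg files would be placed in the economical Nearline SAS pool, while all other new files would continue to land in the system pool.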

8.4 SONAS integration into your network


In this section we describe how to integrate your new SONAS system into your existing network environment. This network integration first requires a user authentication method to grant access to SONAS, then planning of the public and private networks, and also the configuration of an IP address load balancing mechanism.

8.4.1 Authentication using AD or LDAP


You can use your existing authentication environment to grant user access to SONAS. SONAS supports the following authentication method configurations:
Microsoft Active Directory
Lightweight Directory Access Protocol (LDAP)
LDAP with MIT Kerberos
Samba primary domain controller (PDC)
However, SONAS does not support multiple authentication methods running in parallel; only one type of authentication method can be used at any given time.

When a user attempts to access the IBM SONAS, the user enters a user ID and password. The user ID and password are sent across the customer's network to the remote authentication and authorization server, which compares the user ID and password to valid user ID and password combinations in its local database. If they match, the user is considered authenticated. The remote server sends a response back to the IBM SONAS, confirming that the user has been authenticated and providing authorization information. Authentication is the process of identifying a user, while authorization is the process of granting access to resources to the identified user. We provide more detail about the AD and LDAP configuration in Chapter 9, Installation and configuration on page 279; briefly, through the command line interface it can be performed with the cfgad/cfgldap and chkauth commands.

MS Active Directory
One method for user authentication is to communicate with a remote authentication and authorization server running Microsoft Active Directory software. The Active Directory software provides authentication and authorization services. For the cfgad command, you need to provide information such as the Active Directory server IP address and the cluster name. This information was collected in Table 8-8 on page 258; here we need the answers to questions #35 to #37. Through the command line interface, run the cfgad command as shown in Example 8-1.
Example 8-1 cfgad command example

cfgad -as <ActiveDirectoryServerIP> -c <clustername>.<domainname> -u <username> -p <password>

Where:
<ActiveDirectoryServerIP>: IP address of the remote Active Directory server as specified in Table 8-8 on page 258, question #35.
<clustername>: Cluster Name as specified in Table 8-1 on page 254, question #1.
<domainname>: Domain Name as specified in Table 8-1 on page 254, question #2.
<username>: Active Directory User ID as specified in Table 8-8 on page 258, question #36.
<password>: Active Directory Password as specified in Table 8-8 on page 258, question #37.

Example:
cli cfgad -as 9.11.136.116 -c sonascluster.mydomain.com -u aduser -p adpassword


To check whether this cluster is now part of the Active Directory domain, use the chkauth command as shown in Example 8-2.
Example 8-2 chkauth command example

cli chkauth -c <clustername>.<domainname> -t

Where:
<clustername>: Cluster Name as specified in Table 8-1 on page 254, question #1.
<domainname>: Domain Name as specified in Table 8-1 on page 254, question #2.

Example:
cli chkauth -c sonascluster.mydomain.com -t

If the cfgad command was successful, the output from the chkauth command shows CHECK SECRETS OF SERVER SUCCEED or a similar message.

LDAP
Another method for user authentication is to communicate with a remote Authentication and Authorization server running Lightweight Directory Access Protocol (LDAP) software. The LDAP software provides Authentication and Authorization services. For the cfgldap command, you will need to provide information such as the LDAP Server IP address and the cluster name. Basically, this information has been required in Table 8-8 on page 258. Here we need answers to questions #38 to #44. Through the Command Line Interface, run the cfgldap command as shown in Example 8-3.
Example 8-3 cfgldap command example

cfgldap -c <cluster name> -d <domain name> -lb <suffix> -ldn <rootdn> -lpw <rootpw> -ls <ldap server> -ssl <ssl method> -v

Where:
<cluster name>: Cluster Name as specified in Table 8-8 on page 258, question #39.
<domain name>: Domain Name as specified in Table 8-8 on page 258, question #40.
<suffix>: The suffix as specified in Table 8-8 on page 258, question #41.
<rootdn>: The rootdn as specified in Table 8-8 on page 258, question #42.
<rootpw>: The password for access to the remote LDAP server as specified in Table 8-8 on page 258, question #43.
<ldap server>: IP address of the remote LDAP server as specified in Table 8-8 on page 258, question #38-0.
<ssl method>: SSL method as specified in Table 8-8 on page 258, question #38.

Example:
cli cfgldap -c sonascluster -d mydomain.com -lb "dc=sonasldap,dc=com" -ldn "cn=Manager,dc=sonasldap,dc=com" -lpw secret -ls 9.10.11.12 -ssl tls -v

To check whether the cluster is now correctly configured for LDAP authentication, run the chkauth command described in Example 8-2 on page 267.

8.4.2 Planning IP addresses


In this section we briefly describe the public and private IP addresses, in order to avoid any kind of conflict during SONAS utilization. For more details regarding these networks, both private and public, see Understanding the IP addresses for internal networking on page 292.


In Table 8-1 on page 254, question #3, you were prompted for an available IP address range. As described in Chapter 2, Hardware architecture on page 41, SONAS is composed of three different networks. One of these is the public network, which is used by SONAS users and administrators to access the interface nodes and management nodes respectively. The other two are the private network, or management network, which is used by the management node to handle the whole cluster, and the data network, or InfiniBand network, on top of which the SONAS file system is built. These last two networks, private and data, are not used by SONAS users or administrators, but because they coexist on all nodes with the public network, ensure that you do not use the same ranges, in order to avoid IP conflicts.

There are only three choices for the private network range. The default setting is the range 172.31.*.*, but you might already use this particular range in your existing environment, in which case the 192.168.*.* range might be more appropriate. Similarly, if you are already using both the 172.31.*.* and 192.168.*.* ranges, then the range 10.254.*.* must be used as the private network instead. To determine the IP address ranges currently in use in your data center, ask your network administrators.

8.4.3 Data access and IP address balancing


We now highlight the information required to set up the SONAS IP address balancing. This IP balancing is handled by both the DNS and the CTDB layers. In this section we show how the CTDB layer works, in coordination with the DNS, to provide SONAS users with access to data. As you noticed in the Installation checklist questions section, some details regarding your DNS configuration are required; with this information you can set up the connection between your DNS and your SONAS.

For data access through the client network, SONAS users mount exports using the CIFS, NFS, or FTP protocols. Because the SONAS storage solution has also been designed to be a good candidate for cloud storage, accessing SONAS data must be as transparent as possible from a technical point of view: SONAS users do not have to know or even understand how the data is accessed, they just want to access it. As mentioned previously, this works thanks to an appropriate DNS configuration and the CTDB layer. First, the DNS is responsible for routing SONAS user requests to interface nodes in a round robin manner, which means that two consecutive requests access data through two distinct interface nodes. In the tables from our Installation checklist questions on page 254, we used the sonascluster.mydomain.com DNS host name example; for consistency, we keep the same one in the following figures.

The following figures describe step by step the DNS and CTDB mechanism in a basic environment. This environment is composed of three interface nodes, one DNS server, and two active clients, one running a Linux operating system and the other running a Windows operating system. Again for consistency, we also represent a management node and storage pods, even though they do not have any impact on the DNS and CTDB mechanism. The FTP client is included to illustrate the remaining protocol supported by SONAS.
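A minimal sketch of what this round-robin name resolution might look like on a BIND-style DNS server follows; the host name matches the example used in this chapter, and the IP addresses are illustrative assumptions only:

; zone file fragment for mydomain.com (illustrative)
sonascluster    IN    A    10.10.10.1
sonascluster    IN    A    10.10.10.2
sonascluster    IN    A    10.10.10.3

With several A records defined for the same name, the DNS server rotates the order of the addresses it returns, so consecutive client connections are spread across the interface nodes.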


The first SONAS user, running the Linux operating system, wants to mount an NFS share on their workstation and runs a mount command with the sonascluster.mydomain.com DNS host name, as described in the top left corner of Figure 8-8. This request is caught by the DNS server (step 1), which looks inside its list of IP addresses and selects the next interface node address in a round robin way (step 2), returning it to the Linux SONAS user (step 3). The connection between the first SONAS user and one interface node is then established, as you can see with the dashed arrow in Figure 8-8.

Figure 8-8 SONAS user accessing data with NFS protocol


Now assume a second SONAS user, who also needs to access data hosted on the SONAS storage solution, using the CIFS protocol from a Windows laptop. That user runs a net use command (or uses the map network drive tool) with the same sonascluster.mydomain.com DNS host name, as you can see in Figure 8-9. This second request is again caught by the DNS server which, in a round robin way, assigns the next IP address to this second user. Steps 1 to 3 are then repeated as described in Figure 8-9. The final connection between the second SONAS user and the interface node is then established; see the new dashed arrow on the right.

Figure 8-9 SONAS user accessing data with CIFS protocol

Connections between SONAS users and interface nodes remain active until the shares are unmounted by the SONAS users, or until an interface node fails. In case of interface node failure, the IP address balancing is handled by the CTDB layer. To handle interface node failures, the CTDB layer works with a table. Briefly, this table is re-created as soon as a new event happens; an event can be an interface node failure or recovery. The table entries are interface node identifiers and public IP addresses.


In Figure 8-10, the SONAS has been configured in such a way that the CTDB has a table with three interface node identifiers and three public IP addresses for SONAS users.

Figure 8-10 CTDB table with three interface node identifiers and three IPs

In our environment we have three interface nodes, #1, #2, and #3, and three IP addresses. The CTDB table has been created with these entries: #1, #2, #3 and 10.10.10.1, 10.10.10.2, 10.10.10.3. From the CTDB point of view:
#1 is responsible for 10.10.10.1.
#2 is responsible for 10.10.10.2.
#3 is responsible for 10.10.10.3.
With your two SONAS users connected as shown in Figure 8-10, only the first two interface nodes are used. The first interface node is using the 10.10.10.1 IP address and the second one is using 10.10.10.2, according to the CTDB table.


In case of failure of the first interface node, which was in charge of the 10.10.10.1 IP address, this IP address is then handled by the last interface node, as in Figure 8-11. From the CTDB point of view, in the case of failure you now have:
#2 is responsible for 10.10.10.2.
#3 is responsible for 10.10.10.3 and 10.10.10.1.

Figure 8-11 CTDB table with interface node identifiers and IP mappings after failure

As you can see in Figure 8-11, the first NFS SONAS user now has an active connection to the last interface node. This is basically how the CTDB is handling the IP address balancing. Your DNS is handling the round robin method while the CTDB is in charge of the IP failover.


However, in the previous example, there is a potential load balancing bottleneck in case of failure of one interface node. Assume a third user accessing the SONAS through the FTP protocol as described in Figure 8-12; the connection is established, shown by the last dashed arrow, on the third interface node. The first NFS user is still connected to the SONAS through the first interface node, the second CIFS user is connected through the second interface node, and the last FTP user is accessing the SONAS through the third interface node (the DNS here again gave out the next IP address).

Figure 8-12 CTDB IP address balancing


Note that from here on, all incoming users will be distributed across Interface Nodes #1, #2, and #3 in the same way because of the DNS round robin configuration. As an example, you might have four users connected to each interface node, as described in Figure 8-13.

Figure 8-13 Interface node relationships showing CTDB round robin assignment


The bottleneck that we mentioned earlier appears if one interface node fails. Indeed the IP address handled by this failing interface node will migrate, as will all users and their workload, to another interface node according to the CTDB table. You will then have one interface node handling a single IP address and four user workloads (second interface node) and the third interface node handling two IP addresses and eight user workloads as described in Figure 8-14.

Figure 8-14 Interface node assignment and workload distribution according to the CTDB table

The original overall SONAS user workload was equally balanced between the three interface nodes, with 33% of the workload each. After the interface node crash, and with the previous CTDB configuration, the workload is now 33% on the second interface node and 66% on the third interface node. To avoid this situation, a simple approach is to create more IP addresses than there are interface nodes. In our example, six IP addresses, two per interface node, might be more appropriate, as shown in Figure 8-15.


Figure 8-15 CTDB with more IP addresses than interface nodes assigned

In that case, the original CTDB table is:
#1 is responsible for 10.10.10.1 and 10.10.10.4
#2 is responsible for 10.10.10.2 and 10.10.10.5
#3 is responsible for 10.10.10.3 and 10.10.10.6
In case of failure, the failing interface node, previously in charge of two IP addresses, offloads its first IP address onto the second interface node and its second IP address onto the third interface node. Below is the new CTDB table:
#2 is responsible for 10.10.10.1 and 10.10.10.2 and 10.10.10.5
#3 is responsible for 10.10.10.3 and 10.10.10.4 and 10.10.10.6


The result is a 50-50% workload spread across the two remaining interface nodes after the crash, as described in Figure 8-16.

Figure 8-16 Even workload distribution after interface node failure

After the first interface node comes back, this is a new event, and the new CTDB table is as follows:
#1 is responsible for 10.10.10.1 and 10.10.10.4
#2 is responsible for 10.10.10.2 and 10.10.10.5
#3 is responsible for 10.10.10.3 and 10.10.10.6
This means the traffic is then load balanced across the three interface nodes again.


8.5 Attachment to customer applications


This section summarizes what you need to keep in mind before integrating your SONAS into your existing infrastructure and being able to use it.

8.5.1 Redundancy
SONAS, as explained in this book, has been designed to be a highly available storage solution. This high availability relies on hardware redundancy and on software high availability with GPFS and CTDB. But as you plan to integrate SONAS into your own existing infrastructure, you have to ensure that all external services and equipment are also highly available. Your SONAS needs an Active Directory (or LDAP) server for authentication, but is this authentication server redundant? The same question applies to the NTP and DNS servers. From a hardware point of view, do you have redundant power? Are the network switches for the public network redundant?

8.5.2 Share access


As described in the previous section on data access and IP address balancing, you have to attach your SONAS system to your existing DNS and use a DNS round robin configuration in order to load balance the SONAS user IP requests across all interface nodes (be aware that this is not workload load balancing). For specific reasons, however, you might want to use an IP address directly instead of the DNS host name. Regarding the CTDB layer, the previous section shows you how to configure your public IP network and CTDB in order to move the workload from a failed interface node to the remaining ones. Typical SONAS use is to map one SONAS user to a single interface node in order to take advantage of the caching inside the interface node, but you might need to use the same CIFS share twice from the same SONAS user (through two drive letters), and therefore use two interface nodes. However, do not do this with NFS shares; because of the NFS design, the NFS protocol needs to send metadata to different NFS services that might be located on two separate nodes in such a configuration.

8.5.3 Caveats
If you have planned to migrate your existing environment and business applications to a SONAS storage solution, be aware that NAS storage is not always the most appropriate option. If your business application currently writes and reads data on a locally attached solution (DAS), a network based storage solution will, by design, significantly increase latency. Similarly, if your application performs a huge number of writes, even small ones, on a locally attached solution, it can quickly overload your network switches. A workaround for these requirements is first to use caching on the client side to reduce the impact of the higher bandwidth demand on performance, and to combine I/O requests on the client side in order to reduce the number of requests. You can also modify your application to be more tolerant of packet loss or time-out expiration due to the IP protocol, and make it retry.

8.5.4 Backup considerations


There are also good practices for backing up your storage. First, stop your application cleanly in order to have consistent data, then take a snapshot, restart your application, and use the snapshot as the source for the backup process.


Chapter 9. Installation and configuration


In this chapter we provide information about the basic installation and configuration of your SONAS appliance.


9.1 Pre-Installation
At this point, you have completed your IBM SONAS purchase and it has been delivered. You are now ready to start the installation of your SONAS appliance:
1. Review the floor plan and pre-installation planning sheet to determine whether all information has been provided.
2. If the pre-installation planning sheet is not complete, contact the storage administrator. This information is required throughout the rest of the installation, and the installation cannot start until the pre-installation planning sheet is done.
3. The IBM authorized service provider performs all the necessary preliminary planning work, including verifying the information in the planning worksheets, to make sure you are aware of the specific requirements, such as the physical environment and networking environment, for the SONAS system.

9.2 Installation
Installation of a SONAS appliance requires both hardware installation and software installation.

9.2.1 Hardware installation


This section provides a high level overview of the tasks to complete the SONAS hardware installation. The IBM SONAS appliance must be unpacked and moved to the desired location. When shipped from the IBM manufacturing unit, the appliance already has all the connections to the nodes inside the rack in place. The internal connections are made using the InfiniBand connections through which the nodes communicate with each other. The IBM authorized service provider performs the following tasks:
1. Builds and assembles the hardware components into the final SONAS system.
2. Checks the InfiniBand switches to ensure that all storage locations are ready to use.
3. Connects the expansion rack, if required.
4. Loads the software stack on each node of the rack.
5. Loads the disk drive modules into the storage drawers.
6. Powers on the storage controllers, storage expansions, KVM switch, and display module.
7. Powers on the management node.


9.2.2 Software installation


After completing the hardware install, the IBM authorized service provider begins the software installation process. During this process the script first_time_install is run. In the initial steps, you provide to the IBM authorized service provider the configuration information needed to set up the internal network and connect the management node to your network. See the planning tables in Chapter 8, Installation planning on page 249 for the following information:
Cluster name
Management console IP address
Management node gateway address
Management node subnet mask
Root password
NTP server IP address
After you have input the required parameters, the script asks for all the nodes to be powered on. After all the nodes are powered on, the configuration script first detects the interface nodes. Review the list of interface nodes on the panel to determine whether the ID, Frame, Slot, and Quorum settings are correct. The storage nodes are then detected and configured in a similar way; review the list of storage nodes to determine whether the ID, Frame, Slot, and Quorum settings are correct. Check the health of the management node, interface nodes, and storage nodes to ensure that the management node can communicate with the interface nodes and storage nodes.

9.2.3 Checking health of the node hardware


The IBM authorized service provider uses a script that checks the health of the management nodes, interface nodes, and storage nodes. It ensures that the management node can communicate with the interface nodes and the storage nodes. The script verify_hardware_wellness checks the node health: it searches for the nodes, checks the Ethernet connections to each node, and displays the results. If the check is successful, it displays the number of nodes it detected. Compare this list with the list of nodes you have in the pre-installation planning sheet. If the number of nodes displayed is correct, type Y and press Enter. If the number of nodes displayed is not correct, type N and press Enter. If the check detects a problem, it displays one or more error messages. See the Problem Determination and Troubleshooting Guide, GA32-0717 for detailed information.

9.2.4 Additional hardware health checks


This procedure checks the health of the Ethernet switches, InfiniBand Switches and Storage Drawers. The IBM authorized service provider runs the command cnrsscheck. This command will run all the checks and display the results.


Review the result of the checks and verify that the checks have status of OK. For any problems reported by this command, refer to the Problem Determination and Troubleshooting Guide, GA32-0717.

9.3 Post installation


At the end of the software installation, the SONAS system has a fully configured clustered file system (GPFS) with all the disks configured for use by the file system, the management GUI interface running, and a CLI interface running. Create a CLI user ID using the mkuser CLI command. In order to run CLI commands on the SONAS appliance, add the cluster to the CLI interface using the addcluster command. The SONAS appliance is then ready for further configuration.
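The following is a hedged sketch of these two post-installation steps; the arguments shown are placeholders and assumptions for illustration only, so consult the SONAS CLI reference for the exact syntax:

# create a CLI administrator user (user ID and password from Table 8-4)
cli mkuser <CLI user ID> ...
# register the cluster with the CLI interface (cluster name from Table 8-1)
cli addcluster <cluster name> ...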

9.4 Software configuration


The software configuration can either be performed by IBM personnel as an additional service offering, or by you, as the system administrator of the SONAS appliance. It is carried out after the hardware and software installation. The pre-installation planning sheets in Chapter 8, Installation planning on page 249 require you to fill in the administrative details and environment. These details include information regarding the network and the Microsoft Active Directory or LDAP server. The software configuration procedure uses a series of CLI commands. At a high level, the procedure is as follows (a hedged sketch of the corresponding commands follows this list):
1. Verify that the nodes are ready by checking their status with the lscurrnode command, and ensure that they are in the Ready condition. This command must also display the role of each node correctly along with its Ready status.
2. Configure the Cluster Manager (CTDB).
3. Create the failure group and file system (GPFS) using the chdisk and mkfs commands.
4. Configure the DNS server IP address and domain using the setnwdns command.
5. Configure the NAT gateway using the mknwnatgateway command.
6. Integrate with an authentication server, such as Active Directory (AD) or LDAP, with either cfgad (and optionally cfgsfu), cfgldap, or cfgnt4.
7. Configure the data path IP addresses and group, and attach the IP addresses, using the mknw, mknwgroup, and attachnw commands respectively.
8. Connect a client workstation through a configured export to verify that the system is reachable remotely and that the interfaces work as expected.
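The sketch below maps these steps to the commands named above. Except for cfgad, whose syntax is shown in Example 8-1 on page 266, the arguments are placeholders and assumptions for illustration only; consult the SONAS CLI reference for the exact options of each command:

# step 1: list the nodes and confirm their roles and Ready status
cli lscurrnode
# step 4: configure DNS (server address and domain from Table 8-6)
cli setnwdns <DNS server IP> ...
# step 5: configure the NAT gateway (values from Table 8-7)
cli mknwnatgateway <NAT IP address> ...
# step 6: join the Active Directory domain (syntax as in Example 8-1)
cli cfgad -as <AD server IP> -c <cluster>.<domain> -u <user> -p <password>
# step 7: define the data path network, group the interface nodes, and attach the network
cli mknw <subnet and IP addresses> ...
cli mknwgroup <group name> <interface nodes> ...
cli attachnw <network> <group name> ...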

9.5 Sample environment


Let us now take an example and go through the steps for installation and configuration of a SONAS appliance.


Consider the following setup:
Hardware considerations: The rack contains 1 management node, 6 interface nodes, 2 storage nodes, switches, and InfiniBand connections.
Software considerations: AD/LDAP is already configured on an external server, and the file system and export information is already available.
The cluster name in this example is Furby.storage.tucson.ibm.com.

9.5.1 Initial hardware installation


As mentioned in 9.2.1, Hardware installation on page 280, the IBM authorized service provider prepares the SONAS system. The racks are assembled, the nodes are interconnected so that they can communicate with each other, and the software code is installed on the nodes. The cluster configuration data must already be recorded in the pre-installation planning sheet. The management node is then powered on while the other nodes are kept shut down, and the first_time_install script is run to configure the cluster. The following screen captures show a few steps of the installation procedure carried out by the IBM authorized service provider. Figure 9-1 shows the options that are displayed when the first_time_install script is run.

Figure 9-1 Sample of script first_time_install being run

As the script proceeds, it prompts for the configuration parameters needed to configure the cluster. These details include the management node IP, internal IP range, root password, subnet IP address, NTP server IP, and more. The interface nodes and the storage nodes are then powered on. The script detects these nodes and identifies them according to the configuration entered during the installation procedure.


Figure 9-2 shows the detection of interface nodes and storage nodes when powered on.

Figure 9-2 The script detecting the interface node and storage node


Figure 9-3 and Figure 9-4 show the assignment of IDs to the interface nodes and storage nodes in the cluster. This step also determines which nodes are designated as quorum nodes.

Figure 9-3 Identifying the sequence of the interface nodes and assigning quorum nodes


Figure 9-4 Identifying sequence of Storage nodes and assigning quorum nodes

The next panel, shown in Figure 9-5, shows the configuration of the cluster, where each interface node and storage node is added to the cluster and the cluster nodes are prepared to communicate with each other.


Figure 9-5 Cluster being configured with all the Nodes

The panel in Figure 9-6 shows the end of the script after which the cluster has been successfully configured.

Figure 9-6 Cluster now being created and first_time_install script completes


The health of the system is then checked. The IBM authorized service provider logs in to the management node and runs the health check commands. The verify_hardware_wellness script checks the connectivity between the management node, the interface nodes, and the storage nodes. The cnrsscheck command is then run to check the health of the Ethernet switches, InfiniBand switches, and storage drawers, to confirm that the nodes have the correct roles assigned, and to verify that they can communicate with each other. Example 9-1 shows the command output for our example cluster setup.
Example 9-1 Running the verify_hardware_wellness script and cnrsscheck command to check the overall health of the created cluster

# verify_hardware_wellness
[NFO] [2010-04-21 15:59:06] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
[NFO] [2010-04-21 15:59:06] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness() ... 3 minutes.
[NFO] [2010-04-21 16:00:54] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness() ... minutes.
[NFO] [2010-04-21 16:04:10] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness() ... 1 minutes.
[NFO] [2010-04-21 16:04:18] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
[NFO] [2010-04-21 16:04:28] 197409 /opt/IBM/sonas/bin/verify_hardware_wellness()
Discovery results:
There are 6 interface nodes.
There are 2 storage nodes.
There is 1 management node.

Is this configuration correct? (y/n): y
Hardware configuration verified as valid configuration.

[root@Humboldt.mgmt001st001 ~]# cnrssccheck --nodes=all --checks=all
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv mgmt001st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on mgmt001st001 It might take a few minutes.
EthSwCheck ... OK
IbSwCheck ... OK
NodeCheck ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:07:57+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: mgmt001st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0 Degrades: 0 Failures: 0 Offlines: 0
================================================================================
Ethernet Switch status:


Verify Ethernet Switch Configuration (Frame:1, Slot:41) OK
Verify Ethernet Switch Hardware (Frame:1, Slot:41) OK
Verify Ethernet Switch Firmware (Frame:1, Slot:41) OK
Verify Ethernet Switch Link (Frame:1, Slot:41) OK
Verify Ethernet Switch Configuration (Frame:1, Slot:42) OK
Verify Ethernet Switch Hardware (Frame:1, Slot:42) OK
Verify Ethernet Switch Firmware (Frame:1, Slot:42) OK
Verify Ethernet Switch Link (Frame:1, Slot:42) OK
================================================================================
InfiniBand Switch status:
Verify InfiniBand Switch Configuration (Frame:1, Slot:35) OK
Verify InfiniBand Switch Hardware (Frame:1, Slot:35) OK
Verify InfiniBand Switch Firmware (Frame:1, Slot:35) OK
Verify InfiniBand Switch Link (Frame:1, Slot:35) OK
Verify InfiniBand Switch Configuration (Frame:1, Slot:36) OK
Verify InfiniBand Switch Hardware (Frame:1, Slot:36) OK
Verify InfiniBand Switch Firmware (Frame:1, Slot:36) OK
Verify InfiniBand Switch Link (Frame:1, Slot:36) OK
================================================================================
Node status:
Verify Node General OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mgmt001st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv strg001st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on strg001st001 It might take a few minutes.
FcHbaCheck ... OK
DdnCheck ... OK
DdnLogCollector ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:10:46+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: strg001st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0 Degrades: 0 Failures: 0 Offlines: 0
================================================================================
DDN Disk Enclosure status:
Verify Disk Enclosure Configuration (Frame:1, Slot:1) OK
Verify Disk in Disk Enclosure (Frame:1, Slot:1) OK
Verify Disk Enclosure Hardware (Frame:1, Slot:1) OK
Verify Disk Enclosure Firmware (Frame:1, Slot:1) OK
Verify Array in Disk Enclosure (Frame:1, Slot:1) OK


================================================================================
FibreChannel HBA status:
Verify Fibre Channel HBA Configuration (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:17, Instance:0) OK
Verify Fibre Channel HBA Configuration (Frame:1, Slot:17, Instance:1) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:17, Instance:1) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:17, Instance:1) OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ strg001st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv strg002st001 vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
================================================================================
Run checks on strg002st001 It might take a few minutes.
FcHbaCheck ... OK
DdnCheck ... OK
DdnLogCollector ... OK
================================================================================
IBM SONAS Checkout Version 1.00 executed on: 2010-04-21 23:13:26+00:00
Command syntax and parameters: /opt/IBM/sonas/bin/cnrsscdisplay --all
================================================================================
Host Name: strg002st001
Check Status File: /opt/IBM/sonas/ras/config/rsSnScStatusComponent.xml
================================================================================
================================================================================
Summary of NON-OK Statuses:
Warnings: 0 Degrades: 0 Failures: 0 Offlines: 0
================================================================================
DDN Disk Enclosure status:
Verify Disk Enclosure Configuration (Frame:1, Slot:1) OK
Verify Disk in Disk Enclosure (Frame:1, Slot:1) OK
Verify Disk Enclosure Hardware (Frame:1, Slot:1) OK
Verify Disk Enclosure Firmware (Frame:1, Slot:1) OK
Verify Array in Disk Enclosure (Frame:1, Slot:1) OK
================================================================================
FibreChannel HBA status:
Verify Fibre Channel HBA Configuration (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:19, Instance:0) OK
Verify Fibre Channel HBA Configuration (Frame:1, Slot:19, Instance:1) OK
Verify Fibre Channel HBA Firmware (Frame:1, Slot:19, Instance:1) OK
Verify Fibre Channel HBA Link (Frame:1, Slot:19, Instance:1) OK
================================================================================
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ strg002st001 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


Commands: All the commands used to configure SONAS are run as root. You can either export the PATH variable to include the CLI path or run the commands from the CLI directory. In our example, we change to the CLI directory by running:
# cd /opt/IBM/sofs/cli
At the end of the hardware installation, the cluster is created. The IBM authorized service provider then creates a CLI user and adds the cluster to the GUI. See Example 9-2.
Example 9-2 Creating a new CLI user using CLI command mkuser

[root@furby.mgmt001st001 cli]# mkuser -p Passw0rd cliuser
EFSSG0019I The user cliuser has been successfully created.
[root@furby.mgmt001st001 cli]# addcluster -h int001st001 -p Passw0rd
EFSSG0024I The cluster Furby.storage.tucson.ibm.com has been successfully added

You need to enable the license as shown in Example 9-3, after which the cluster is ready for the rest of the software configuration.
Example 9-3 Enabling License.

[root@furby.mgmt001st001 cli]# enablelicense EFSSG0197I The license was enabled successfully!

9.5.2 Initial software configuration


The initial software configuration is a series of CLI commands run either by IBM personnel as an additional service offering or by you, as the system administrator of the SONAS appliance. The procedure is explained next. To start, log in to the management node with the root user ID and the root password, and make sure the cluster has been added to the management interface.
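As a minimal sketch, assuming the management node is reachable over SSH at the address used later in this book (a console login on the management node works just as well), the login and a quick check that the cluster is known to the management interface might look like this:

$ ssh root@9.11.137.220
# cd /opt/IBM/sofs/cli
# lscluster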

Verifying that the nodes are ready


Before you begin the configuration, make sure that the cluster has been added to the management interface and that the nodes are in the ready state. Verify this with the lsnode command as shown in Example 9-4, and confirm that the output lists all of the nodes correctly and that each node shows OK as its connection status.
Example 9-4 Verifying that the nodes are all ready by running CLI command lsnode

[root@furby.mgmt001st001 cli]# lsnode -v
Hostname      IP            Description  Role        Product Version  Connection status
int001st001   172.31.132.1               interface   1.1.0.2-7        OK
int002st001   172.31.132.2               interface   1.1.0.2-7        OK
int003st001   172.31.132.3               interface   1.1.0.2-7        OK
int004st001   172.31.132.4               interface   1.1.0.2-7        OK
int005st001   172.31.132.5               interface   1.1.0.2-7        OK
int006st001   172.31.132.6               interface   1.1.0.2-7        OK
mgmt001st001  172.31.136.2               management  1.1.0.2-7        OK
strg001st001  172.31.134.1               storage     1.1.0.2-7        OK
strg002st001  172.31.134.2               storage     1.1.0.2-7        OK


Attention: The actual command output displayed on the panel has many more fields than are shown in this example. This example has been simplified to ensure that the important information is clear.

Checking the state of the nodes


Now check the state of the nodes using the lscurrnode command, as shown in Example 9-5.
Example 9-5 Running command lscurrnode to check node state

[root@furby.mgmt001st001 cli]# lscurrnode
Node ID       Node type   Node state  Management IP address  InfiniBand IP address
int001st001   Interface   ready       172.31.4.1             172.31.132.1
int002st001   Interface   ready       172.31.4.2             172.31.132.2
int003st001   Interface   ready       172.31.4.3             172.31.132.3
int004st001   Interface   ready       172.31.4.4             172.31.132.4
int005st001   Interface   ready       172.31.4.5             172.31.132.5
int006st001   Interface   ready       172.31.4.6             172.31.132.6
mgmt001st001  Management  ready       172.31.8.1             172.31.136.1
strg001st001  Storage     ready       172.31.6.1             172.31.134.1
strg002st001  Storage     ready       172.31.6.2             172.31.134.2

Attention: The actual command output displayed on the panel has many more fields than are shown in this example. This example has been simplified to ensure the important information is clear. Column 3 in Example 9-5 displays the state of each node. Verify that the state of each node is ready.

9.5.3 Understanding the IP addresses for internal networking


Internal networking consists of the management network and the InfiniBand network. The management and InfiniBand addresses, as you can see in the previous example, fall in the 172.31.*.* range. This range is chosen from three available options; see Chapter 8, Installation planning on page 249. For our example, we chose the 172.31.*.* range. The first two octets of the IP address remain fixed at the range you have chosen, while the last two octets vary.

Management IP range
This is the network that the management node uses to send management data to the interface nodes and storage nodes. It is a private network that is not reachable by outside clients. No file data is transferred on this network; it carries only management-related communication, such as commands and management information passed from the management node to the interface nodes and storage nodes. You can read more in Chapter 4, Networking considerations on page 141. From Example 9-5 on page 292, you can see that the management IP addresses fall in the range 172.31.4.* for interface nodes, 172.31.8.* for the management node, and 172.31.6.* for the storage nodes. These addresses are assigned by the install script while creating the SONAS cluster. The first two octets of the IP address are constant, and the management IP address is then assigned according to the node type as follows:
Interface node: 172.31.4.*
Management node: 172.31.8.*
Storage node: 172.31.6.*
The last octet is incremented sequentially according to the number of interface nodes and storage nodes; for example, the third interface node, int003st001, receives the management IP address 172.31.4.3. At the time of writing, only a single management node is supported.

InfiniBand IP range
This is the network range used for data transfer between the interface nodes and the storage nodes. Like the management network, it is a private network that is not reachable by outside clients. Refer to Chapter 9, Installation and configuration on page 279. From Example 9-5 on page 292, you can see that the InfiniBand IP addresses fall in the range 172.31.132.* for interface nodes, 172.31.136.* for the management node, and 172.31.134.* for the storage nodes. These addresses are assigned by the install script while creating the SONAS cluster. Again, the first two octets are constant, and the InfiniBand IP address is assigned according to the node type as follows:
Interface node: 172.31.132.*
Management node: 172.31.136.*
Storage node: 172.31.134.*

9.5.4 Configuring the Cluster Manager


The Cluster Manager (CTDB) is an integral part of the SONAS appliance: it manages the SONAS cluster and holds important configuration data for the cluster. More information about CTDB can be found in the Appendix, CTDB on page 486. The SONAS Cluster Manager (CTDB) is configured using the cfgcluster CLI command on the management node. The command requires you to supply a public cluster name, which is the name used to advertise the cluster to the neighboring network, for example, to a Windows client machine. This name is limited to 15 ASCII characters with no spaces or special characters, as shown in Example 9-6.
Example 9-6 Configuring the Cluster Manager using CLI command cfgcluster

[root@furby.mgmt001st001 cli]# cfgcluster Furby
Are you sure to initialize the cluster configuration ?
Do you really want to perform the operation (yes/no - default no): yes
(1/6) - Prepare CIFS configuration
(2/6) - Write CIFS configuration on public nodes
(3/6) - Write cluster manager configuration on public nodes
(4/6) - Import CIFS configuration into registry
(5/6) - Write initial configuration for NFS,FTP,HTTP and SCP
(6/6) - Restart cluster manager to activate new configuration
EFSSG0114I Initialized cluster configuration successfully

The command prompts, Do you really want to perform the operation? Type yes and press Enter to continue.


Verify that the cluster has been configured by running the lscluster command. This command must display the CTDB cluster name that you used to configure the Cluster Manager. The output of the command is shown in Example 9-7; the public cluster name is Furby.storage.tucson.ibm.com.
Example 9-7 Verifying the cluster details using CLI command lscluster

[root@furby.mgmt001st001 cli]# lscluster
ClusterId             Name                          PrimaryServer  SecondaryServer
12402779238926611101  Furby.storage.tucson.ibm.com  strg001st001   strg002st001

9.5.5 Listing all available disks


The available disks can be listed using the CLI command lsdisk. The output of the command is shown in Example 9-8.
Example 9-8 Listing the disks available using CLI command lsdisk

[root@furby.mgmt001st001 cli]# lsdisk
Name                                 File system  Failure group  Type  Pool    Status
array0_sata_60001ff0732f8548c000000               1                    system  ready
array0_sata_60001ff0732f8568c020002               1                    system  ready
array0_sata_60001ff0732f8588c040004               1                    system  ready
array0_sata_60001ff0732f85a8c060006               1                    system  ready
array0_sata_60001ff0732f85c8c080008               1                    system  ready
array0_sata_60001ff0732f85e8c0a000a               1                    system  ready
array1_sata_60001ff0732f8558c010001               1                    system  ready
array1_sata_60001ff0732f8578c030003               1                    system  ready
array1_sata_60001ff0732f8598c050005               1                    system  ready
array1_sata_60001ff0732f85d8c090009               1                    system  ready
array1_sata_60001ff0732f85f8c0b000b               1                    system  ready
array1_sata_60001ff0732f8608c0f000c               1                    system  ready

9.5.6 Adding a second failure group


As you can see in Example 9-8, all of the disks are in the default failure group that was assigned at the time of cluster creation. To enable replication of data on a file system, more than one failure group must be available. In our example we create a file system with replication enabled, so we change the failure group of some of the disks that will be part of the file system. The chdisk command allows you to modify the failure group property of a disk. See Example 9-9.
Example 9-9 Changing Failure Group of disks using CLI command chdisk

[root@furby.mgmt001st001 cli]# chdisk array1_sata_60001ff0732f8558c010001,array1_sata_60001ff0732f8578c030003,array1_sata_60001ff0732f8598c050005,array1_sata_60001ff0732f85d8c090009,array1_sata_60001ff0732f85f8c0b000b,array1_sata_60001ff0732f8608c0f000c --failuregroup 2

You can verify the changed failure groups using the lsdisk command, as seen in the previous section 9.5.5, Listing all available disks on page 294.


Example 9-10 displays the output after changing the failure groups.
Example 9-10 Verifying the changed Failure Groups of disks using CLI command lsdisk

[root@furby.mgmt001st001 cli]# lsdisk
Name                                 File system  Failure group  Type             Pool    Status
array0_sata_60001ff0732f8548c000000               1                               system  ready
array0_sata_60001ff0732f8568c020002               1                               system  ready
array0_sata_60001ff0732f8588c040004               1                               system  ready
array0_sata_60001ff0732f85a8c060006               1                               system  ready
array0_sata_60001ff0732f85c8c080008               1                               system  ready
array0_sata_60001ff0732f85e8c0a000a               1                               system  ready
array1_sata_60001ff0732f8558c010001               2              dataAndMetadata  system  ready
array1_sata_60001ff0732f8578c030003               2              dataAndMetadata  system  ready
array1_sata_60001ff0732f8598c050005               2              dataAndMetadata  system  ready
array1_sata_60001ff0732f85d8c090009               2              dataAndMetadata  system  ready
array1_sata_60001ff0732f85f8c0b000b               2              dataAndMetadata  system  ready
array1_sata_60001ff0732f8608c0f000c               2              dataAndMetadata  system  ready

9.5.7 Creating the GPFS file system


The underlying clustered file system that SONAS uses is IBM GPFS. Use the CLI command mkfs to create the file system. Example 9-11 shows how to create the file system using this command. Note that we do not use all of the available disks: we use four disks from failure group 1 and four disks from failure group 2.
Example 9-11 Creating the root file system using CLI command mkfs

[root@furby.mgmt001st001 cli]# mkfs gpfs0 /ibm/gpfs0 -F array0_sata_60001ff0732f8548c000000,array0_sata_60001ff0732f8568c020002,array0_sata_60001ff0732f8588c040004,array0_sata_60001ff0732f85a8c060006,array1_sata_60001ff0732f8558c010001,array1_sata_60001ff0732f8578c030003,array1_sata_60001ff0732f8598c050005,array1_sata_60001ff0732f85d8c090009 --master -R meta --nodmapi
The following disks of gpfs0 will be formatted on node strg001st001:
array0_sata_60001ff0732f8548c000000: size 15292432384 KB
array0_sata_60001ff0732f8568c020002: size 15292432384 KB
array0_sata_60001ff0732f8588c040004: size 15292432384 KB
array0_sata_60001ff0732f85a8c060006: size 15292432384 KB
array1_sata_60001ff0732f8558c010001: size 15292432384 KB
array1_sata_60001ff0732f8578c030003: size 15292432384 KB
array1_sata_60001ff0732f8598c050005: size 15292432384 KB
array1_sata_60001ff0732f85d8c090009: size 15292432384 KB
Formatting file system ...
Disks up to size 141 TB can be added to storage pool 'system'.
Creating Inode File
  0 % complete on Wed Apr 21 16:36:30 2010
  1 % complete on Wed Apr 21 16:37:08 2010
  2 % complete on Wed Apr 21 16:37:19 2010
  3 % complete on Wed Apr 21 16:37:30 2010
  5 % complete on Wed Apr 21 16:37:35 2010
  9 % complete on Wed Apr 21 16:37:40 2010
 13 % complete on Wed Apr 21 16:37:45 2010
 18 % complete on Wed Apr 21 16:37:50 2010
 23 % complete on Wed Apr 21 16:37:55 2010
 27 % complete on Wed Apr 21 16:38:00 2010
 32 % complete on Wed Apr 21 16:38:05 2010
 37 % complete on Wed Apr 21 16:38:10 2010
 42 % complete on Wed Apr 21 16:38:15 2010
 46 % complete on Wed Apr 21 16:38:20 2010
 51 % complete on Wed Apr 21 16:38:25 2010
 56 % complete on Wed Apr 21 16:38:30 2010
 61 % complete on Wed Apr 21 16:38:35 2010
 66 % complete on Wed Apr 21 16:38:40 2010
 70 % complete on Wed Apr 21 16:38:45 2010
 75 % complete on Wed Apr 21 16:38:50 2010
 80 % complete on Wed Apr 21 16:38:55 2010
 84 % complete on Wed Apr 21 16:39:00 2010
 89 % complete on Wed Apr 21 16:39:05 2010
 94 % complete on Wed Apr 21 16:39:10 2010
 99 % complete on Wed Apr 21 16:39:15 2010
100 % complete on Wed Apr 21 16:39:16 2010
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Formatting Allocation Map for storage pool 'system'
 20 % complete on Wed Apr 21 16:39:33 2010
 38 % complete on Wed Apr 21 16:39:38 2010
 57 % complete on Wed Apr 21 16:39:43 2010
 74 % complete on Wed Apr 21 16:39:48 2010
 92 % complete on Wed Apr 21 16:39:53 2010
100 % complete on Wed Apr 21 16:39:55 2010
Completed creation of file system /dev/gpfs0.
EFSSG0019I The filesystem gpfs0 has been successfully created.
EFSSG0038I The filesystem gpfs0 has been successfully mounted.
EFSSG0140I Applied master role to file system gpfs0
EFSSG0015I Refreshing data ...

In Example 9-11, the file system gpfs0 is created with metadata replication enabled (-R meta) and therefore uses disks from two failure groups; the second failure group was created in Example 9-9 on page 294. The file system is also marked as the master file system. The master file system is a unique file system in the SONAS appliance that holds the shared information used by the Cluster Manager (CTDB). You can verify the creation of the file system using the lsfs command; Example 9-12 displays the output for the newly created file system.
Example 9-12 Verifying the creation of file system using CLI command lsfs

[root@furby.mgmt001st001 cli]# lsfs
Cluster                       Devicename  Mountpoint
Furby.storage.tucson.ibm.com  gpfs0       /ibm/gpfs0

Attention: The actual information displayed on the panel has many more fields than are shown in Example 9-12 and is too large to show here. This example has been simplified to ensure the important information is clear.


The command lsdisk shows the list of disks used for the gpfs0 filesystem (Example 9-13).
Example 9-13 Verifying the disks used for the file system created using CLI command lsdisk

lsdisk
Name                                 File system  Failure group  Type             Pool    Status  Availability
array0_sata_60001ff0732f8548c000000  gpfs0        1              dataAndMetadata  system  ready   up
array0_sata_60001ff0732f8568c020002  gpfs0        1              dataAndMetadata  system  ready   up
array0_sata_60001ff0732f8588c040004  gpfs0        1              dataAndMetadata  system  ready   up
array0_sata_60001ff0732f85a8c060006  gpfs0        1              dataAndMetadata  system  ready   up
array1_sata_60001ff0732f8558c010001  gpfs0        2              dataAndMetadata  system  ready   up
array1_sata_60001ff0732f8578c030003  gpfs0        2              dataAndMetadata  system  ready   up
array1_sata_60001ff0732f8598c050005  gpfs0        2              dataAndMetadata  system  ready   up
array1_sata_60001ff0732f85d8c090009  gpfs0        2              dataAndMetadata  system  ready   up
array0_sata_60001ff0732f85c8c080008               1                               system  ready
array0_sata_60001ff0732f85e8c0a000a               1                               system  ready
array1_sata_60001ff0732f85f8c0b000b               2              dataAndMetadata  system  ready
array1_sata_60001ff0732f8608c0f000c               2              dataAndMetadata  system  ready

As you can see, the disks specified when the file system was created are now part of the file system (gpfs0 in this example), and it includes disks from both failure groups.

9.5.8 Configuring the DNS Server IP addresses and domains


The SONAS appliance must be configured with the IP addresses of the Domain Name System (DNS) servers and the domains. These are public IP addresses that are accessible on your network; only the management node and the interface nodes are accessible on your network. DNS is configured using the CLI command setnwdns. We use three examples to explain the command. In the first example, the setnwdns command is run with a single DNS server with IP address 9.11.136.116 and no domain or search string (see Example 9-14).
Example 9-14 Configuring DNS with only DNS server IP

[SONAS]$ setnwdns 9.11.136.116

In the second example, the setnwdns command is run with a single DNS server with IP address 9.11.136.116, a domain name of storage.ibm.com, and a single search string of servers.storage.ibm.com (see Example 9-15).
Example 9-15 Configuring DNS with DNS server IP, domain name and Search string

[SONAS]$ setnwdns 9.11.136.116 --domain storage.ibm.com --search servers.storage.ibm.com

In the third example, the setnwdns command is run with multiple DNS servers with IP addresses 9.11.136.116 and 9.11.137.101, a domain name of storage.ibm.com, and multiple search strings, servers.storage.ibm.com and storage.storage.ibm.com (see Example 9-16).
Example 9-16 Configuring DNS with DNS server IP, domain name and multiple search strings

[SONAS]$ setnwdns 9.11.136.116,9.11.137.101 --domain storage.ibm.com --search servers.storage.ibm.com,storage.storage.ibm.com


For our example cluster setup, we use setnwdns with three search strings, storage3.tucson.ibm.com, storage.tucson.ibm.com, and sonasdm.storage.tucson.ibm.com, as shown in Example 9-17. Our DNS server IP addresses are 9.11.136.132 and 9.11.136.116.
Example 9-17 Configuring DNS with DNS server IPs and multiple search strings using CLI command setnwdns

[root@furby.mgmt001st001 cli]# setnwdns 9.11.136.132,9.11.136.116 --search storage3.tucson.ibm.com,storage.tucson.ibm.com,sonasdm.storage.tucson.ibm.com

To verify that the DNS server IP addresses and domain have been successfully configured, check the content of the /etc/resolv.conf file on each management and interface node. Keep in mind that the management node and interface nodes are the only nodes accessible from your network, and therefore only these nodes are configured for DNS. Steps to verify the DNS configuration are shown in Example 9-18.
Example 9-18 Verifying that the DNS has been successfully configured

[root@furby.mgmt001st001]$ onnode all cat /etc/resolv.conf

>> NODE: 172.31.132.1 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.2 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.3 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.4 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.5 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.132.6 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116

>> NODE: 172.31.136.2 <<
search storage3.tucson.ibm.com storage.tucson.ibm.com sonasdm.storage.tucson.ibm.com
nameserver 9.11.136.132
nameserver 9.11.136.116


In Example 9-18 on page 298, the SONAS setup has one management node and six interface nodes. The DNS server IP addresses used are 9.11.136.132 and 9.11.136.116, and three search strings, storage3.tucson.ibm.com, storage.tucson.ibm.com, and sonasdm.storage.tucson.ibm.com, are used. The management node IP is 172.31.136.2 and the interface node IPs are 172.31.132.*, as described in 9.5.3, Understanding the IP addresses for internal networking on page 292.

9.5.9 Configuring the NAT Gateway


Network Address Translation (NAT) is a technique used with network routers. The interface nodes of the SONAS appliance talk to each other using private IP addresses on a network that is not accessible from your network. The public IP addresses are the addresses through which you can access the management node and the interface nodes. Therefore, the management node and interface nodes have a private IP address for internal communication and a public IP address for external communication. NAT allows a single public IP address on your network to be used to reach the management node and interface nodes on their private network IP addresses: the router translates the IP address and port on your network to a corresponding IP address and port on the private network. This IP address is not a data path connection and is not used for reading or writing files; it provides a path from the management node and interface nodes to your network for the authorization and authentication process. The CLI command mknwnatgateway is used to configure NAT on the SONAS appliance. Example 9-19 shows how NAT is configured using this command.
Example 9-19 Setting up the NAT gateway using CLI command mknwnatgateway

[root@furby.mgmt001st001]$ mknwnatgateway 9.11.137.246/23 ethX0 9.11.136.1 172.31.128.0/17 mgmt001st001,int001st001,int002st001,int003st001,int004st001,int005st001,int006st001
EFSSG0086I NAT gateway successfully configured.

As you can see in Example 9-19, the public NAT gateway IP is 9.11.137.246/23, the interface is ethX0, the default gateway is 9.11.136.1, the private network is 172.31.128.0/17, and the nodes specified are the management node and the six interface nodes. This means that all the management and interface nodes talk to the outside world on their public IP through the NAT gateway. Confirm that NAT has been configured using the CLI command lsnwnatgateway, as shown in Example 9-20.
Example 9-20 Verifying that the NAT Gateway has been successfully configured using CLI command lsnwnatgateway

[root@furby.mgmt001st001]$ lsnwnatgateway
Public IP        Public interface  Default gateway  Private network  Nodes
9.11.137.246/23  ethX0             9.11.136.1       172.31.128.0/17  172.31.136.2,172.31.132.1,172.31.132.2,172.31.132.3,172.31.132.4,172.31.132.5,172.31.132.6

Another way to check that the NAT gateway has been successfully configured is to check whether the management node and interface nodes can ping the specified gateway (see Example 9-21).


Example 9-21 Verifying that all the Nodes of the cluster can ping the NAT Gateway

onnode all ping -c 2 9.11.137.246/23

>> NODE: 172.31.132.1 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.022 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.022/0.028/0.034/0.006 ms

>> NODE: 172.31.132.2 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.034 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.034/0.034/0.035/0.005 ms

>> NODE: 172.31.132.3 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.029 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.023 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.023/0.026/0.029/0.003 ms

>> NODE: 172.31.132.4 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.028 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.024 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.024/0.026/0.028/0.002 ms

>> NODE: 172.31.132.5 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.035 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.022 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.022/0.028/0.035/0.008 ms

>> NODE: 172.31.132.6 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.036 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.016 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.016/0.026/0.036/0.010 ms

>> NODE: 172.31.136.2 <<
PING 9.11.137.246 (9.11.137.246) 56(84) bytes of data.
64 bytes from 9.11.137.246: icmp_seq=1 ttl=64 time=0.027 ms
64 bytes from 9.11.137.246: icmp_seq=2 ttl=64 time=0.020 ms
--- 9.11.137.246 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.020/0.023/0.027/0.006 ms

A successful ping shows that the NAT gateway has been successfully configured.

9.5.10 Configuring authentication: AD and LDAP


SONAS requires that users accessing the appliance are authenticated and authorized. You can choose to use Active Directory (AD) or LDAP for authentication and authorization; SONAS supports both methods and has equivalent CLI commands for their configuration. When users access SONAS, they are required to enter their user ID and password. This user ID and password pair is sent across the network to the remote authentication/authorization server, which compares it to the valid user ID and password combinations in its database. If they match, the user is considered authenticated, and the remote server sends a response to SONAS confirming that the user has been successfully authenticated.
Terminology:
1. Authentication is the process of verifying the identity of a user, that is, confirming that users are who they claim to be. This is typically accomplished by verifying the user ID and password.
2. Authorization is the process of determining what an authenticated user is allowed to access. Users might have permission to access certain files but not others. This is typically controlled by ACLs.
The following sections describe the configuration of Active Directory (AD) and LDAP in detail. Choose one of the two authentication methods.

Configuring using Active Directory (AD)


The CLI command cfgad allows you to configure the AD server. After the configuration, check that it has been successful using the CLI command chkauth. See Example 9-22 for the command usage. In our example, the AD server has the IP address 9.11.136.132 and we join the domain as the Administrator user.
Example 9-22 Configuring using Windows AD using CLI command cfgad

[root@furby.mgmt001st001]$ cfgad -as 9.11.136.132 -c Furby.storage.tucson.ibm.com -u Administrator -p Ads0nasdm
(1/11) Parsing protocol
(2/11) Checking node accessibility and CTDB status
(3/11) Confirming cluster configuration
(4/11) Detection of AD server and fetching domain information from AD server
(5/11) Checking reachability of each node of the cluster to AD server
(6/11) Cleaning previous authentication configuration
(7/11) Configuration of CIFS for AD
(8/11) Joining with AD server
(9/11) Configuration of protocols
(10/11) Executing the script configADForSofs.sh
(11/11) Write auth info into database
EFSSG0142I AD server configured successfully

Now verify that the cluster is part of the Active Directory (AD) domain using the chkauth command, as shown in Example 9-23.
Example 9-23 Verifying that the Windows AD server has been successfully configured.

[root@furby.mgmt001st001]$ chkauth -c Furby.storage.tucson.ibm.com -t Command_Output_Data UID GID Home_Directory Template_Shell CHECK SECRETS OF SERVER SUCCEED

Configuring using Lightweight Directory Access Protocol (LDAP)


The CLI command cfgldap allows you to configure the LDAP server. After the configuration, check that it has been successful using the CLI command chkauth. In Example 9-24, the LDAP server (-ls) is sonaspb29, and the other parameters that LDAP requires are as follows: the suffix (-lb) is dc=sonasldap,dc=com, the rootdn (-ldn) is cn=Manager,dc=sonasldap,dc=com, the password (-lpw) is secret, and the SSL method is tls. You can get this information from your LDAP administrator; it is found in the slapd.conf file (typically /etc/openldap/slapd.conf) on the LDAP server. This information is also collected in the pre-installation planning sheet in Chapter 8, Installation planning on page 249.
Example 9-24 Configuring using LDAP using CLI command cfgldap

[root@furby.mgmt001st001]$ cfgldap -c Furby.storage.tucson.ibm.com -d storage.tucson.ibm.com -lb dc=sonasldap,dc=com -ldn cn=Manager,dc=sonasldap,dc=com -lpw secret -ls sonaspb29 -ssl tls -v

Now verify that the cluster is part of the LDAP server using the chkauth command, as shown in Example 9-25.
Example 9-25 Verifying that the LDAP server has been successfully configured

[root@furby.mgmt001st001]$chkauth -c Furby.storage.tucson.ibm.com -t Command_Output_Data UID GID Home_Directory Template_Shell CHECK SECRETS OF SERVER SUCCEED

9.5.11 Configuring Data Path IP Addresses


The CLI command mknw helps you configure the data path IP addresses. The command creates a network configuration that can be applied to the interface nodes of the cluster. This operation only defines the configuration; it takes effect when a user attaches the network configuration to an interface and, optionally, to a host group (see Example 9-26). In the example we use the public IP addresses 9.11.137.10, 9.11.137.11, 9.11.137.12, 9.11.137.13, 9.11.137.14, and 9.11.137.15. The subnet is 9.11.136.0/23, and the default route 0.0.0.0/0 points to the gateway 9.11.136.1 (specified as 0.0.0.0/0:9.11.136.1).


Example 9-26 Configuring the Data Path IP using the CLI command mknw

[root@furby.mgmt001st001]$ mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 add 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

Verify that the data path IP addresses have been successfully configured using the CLI command lsnw, as shown in Example 9-27.
Example 9-27 Verifying that the Network is successfully configured using CLI command lsnw

[root@furby.mgmt001st001]$ lsnw -r
Network        VLAN ID  Network Groups  IP-Addresses                                                              Routes
9.11.136.0/23                           9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

The previous command was run with no VLAN. You can also run it with the VLAN option, as shown in Example 9-28, where 101 is the identification number of the VLAN.
Example 9-28 Configuring the Data Path IP with VLAN using the CLI command mknw

[root@furby.mgmt001st001]$ mknw 9.11.136.0/23 0.0.0.0/0:9.11.136.1 --vlan 101 add 9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

Verify that the command is successful by running the CLI lsnw command. Example 9-29 shows sample output.
Example 9-29 Verifying that the network is successfully configured using CLI command lsnw

[root@furby.mgmt001st001]$ lsnw -r
Network        VLAN ID  Network Groups  IP-Addresses                                                              Routes
9.11.136.0/23  101                      9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

9.5.12 Configuring Data Path IP address group


The CLI command mknwgroup helps you configure the Data Path IP Address Group. The command creates a group of nodes with the name groupName. An existing network configuration can be attached to this group of nodes (see Example 9-30).
Example 9-30 Configure Data Path IP Group using CLI command mknwgroup

[root@furby.mgmt001st001]$ mknwgroup int int001st001,int002st001,int003st001,int004st001,int005st001,int006st001

Verify that the command is successful by running the CLI command lsnwgroup, as seen in Example 9-31.
Example 9-31 Verifying that the Data Path IP Group has been successfully configured using CLI command lsnwgroup

[root@furby.mgmt001st001]$ lsnwgroup -r
Network Group  Nodes                                                                     Interfaces
DEFAULT
int            int001st001,int002st001,int003st001,int004st001,int005st001,int006st001


9.5.13 Attaching the Data Path IP Address Group


The CLI command attachnw helps you attach the data path IP address group. The command attaches a network to a specified network group. All nodes in the network group are configured so that the cluster manager can start any of the IP addresses configured for the specified network on the specified interface (see Example 9-32).
Example 9-32 Attaching the Data Path IP group using CLI command attachnw

[root@furby.mgmt001st001]$ attachnw 9.11.136.0/23 ethX0 -g int

Verify that the command is successful by running the CLI command lsnw, as shown in Example 9-33.
Example 9-33 Verify that the Data Path IP has been successfully attached using CLI command lsnw

[root@furby.mgmt001st001]$ lsnw -r
Network        VLAN ID  Network Groups  IP-Addresses                                                              Routes
9.11.136.0/23           int             9.11.137.10,9.11.137.11,9.11.137.12,9.11.137.13,9.11.137.14,9.11.137.15

9.6 Creating Exports for data access


SONAS allows clients to access the data stored on the file system using protocols such as CIFS, NFS, and FTP. Data exports are created, and as long as the CIFS, NFS, and FTP protocols are active, these exports can be accessed using those protocols. An export is a shared disk space. Exports can be created using the CLI command mkexport and also using the GUI. The mkexport command requires you to enter a share name and a directory path for the export. The directory path is the location of the directory that is to be accessed by the clients; clients see this directory under its share name and use the share name to mount or access the export. Depending on the protocols you want to configure the export for, you need to pass the respective parameters. As an administrator, you must provide all of these details:
1. FTP takes no parameters.
2. NFS requires you to pass certain parameters:
Client/IP/Subnet/Mask: The clients that can access the NFS share; * means all clients can access it.
ro or rw: Whether the export has read-only or read-write access.
root_squash: Enabled by default; you can set it to no_root_squash.
async: Enabled by default; you can set it to sync if required.
3. CIFS requires you to pass parameters such as:
browseable: Yes by default; you can change it to No.
comment: Any comment you want to attach to the CIFS export.
In Example 9-34, we see how an export is created for all of the protocols, CIFS, NFS, and FTP. The share name is shared and the directory path is /ibm/gpfs0/shared. We also need to be sure that the file system gpfs0 already exists and is mounted on all the nodes. We set the default parameters for CIFS and specify minimal parameters for NFS.


Example 9-34 Creating a data export using CLI command mkexport

[root@furby.mgmt001st001]$ mkexport shared /ibm/gpfs0/shared --nfs "*(rw,no_root_squash,async)" --ftp --cifs browseable=yes,comment="IBM SONAS" --owner "STORAGE3\eebbenall"
EFSSG0019I The export shared has been successfully created.

Verify that the exports are created correctly using the lsexport command as shown in Example 9-35.
Example 9-35 Verifying that the export has been successfully created using CLI command lsexport

[root@furby.mgmt001st001]$ lsexport -v
Name    Path               Protocol  Active  Timestamp        Options
shared  /ibm/gpfs0/shared  FTP       true    4/14/10 6:13 PM
shared  /ibm/gpfs0/shared  NFS       true    4/14/10 6:13 PM  *=(rw,no_root_squash,async,fsid=693494140)
shared  /ibm/gpfs0/shared  CIFS      true    4/14/10 6:13 PM  browseable,comment=IBM SONAS

9.7 Modifying ACLs to the shared export


Access Control Lists (ACLs) are used to specify the authority a user or group has to access a file, directory, or file system. A user or group can be granted read-only access to files in one directory while being given full (create/write/read/execute) access to files in another directory. Only a user who has been granted authorization in the ACLs is able to access files on the IBM SONAS. Access rights and ACL management must only be performed from Windows clients; if a Windows workstation is not available, the ACLs can be changed on the CLI using the GPFS commands. For this, you need root access to the SONAS system and you must be familiar with the vi editor. Go through the following steps to set the ACLs:
1. First, change the group or user who can access the shared export using the chgrp command as in Example 9-36, where the group is domain users from the STORAGE3 domain and the export is /ibm/gpfs0/shared.
Example 9-36 Change group permissions using command chgrp

$ chgrp "STORAGE3\domain users" /ibm/gpfs0/shared 2. Use a Windows workstation on your network to modify the ACLs in order to provide the appropriate authorization. The following sub-steps can be used as a guide: a. Access the shared folder using Windows Explorer. Owner: This procedure must be used by the owner. b. Right-click the folder, and select Sharing and Security... c. Use the functions on the Sharing tab and/or the Security tab to set the appropriate authorization. 3. If a Windows workstation is not available for modifying the ACLs, use the following sub-steps to manually edit the ACLs: VI editor: This requires manual editing of the ACL file using the VI editor, so it must only be used by those who are familiar with the VI editor. You need to be root in order to execute this command.


4. Run the command shown in Example 9-37 using the VI editor to modify ACLs. Specify that you want to use VI as the editor in the following way: $ export EDITOR=/bin/vi
Example 9-37 Viewing the GPFS ACLs using the GPFS command mmeditacl

$ export EDITOR=/bin/vi
Type mmeditacl /ibm/gpfs0/shared and press Enter. The following screen is displayed:
#NFSv4 ACL
#owner: STORAGE3\eebenall
#group: STORAGE3\domain users
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:rwx-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

5. Now change the ACLs by adding the group:STORAGE3\domain admins entry shown at the top of the ACL in Example 9-38.
Example 9-38 Adding a new group to the export using the GPFS command mmeditacl

#NFSv4 ACL
#owner: STORAGE3\eebenall
#group: STORAGE3\domain users
group:STORAGE3\domain admins:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:rwx-:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED


6. Verify that the new user/group has been added by running the mmgetacl command for the directory whose ACLs were changed. The output, shown in Example 9-39, must include the user/group that was added in Example 9-38.
Example 9-39 Verifying that the new group was successfully added to the export using GPFS command mmgetacl

$ mmgetacl /ibm/gpfs0/shared
#NFSv4 ACL
#owner: STORAGE3\administrator
#group: STORAGE3\domain users
group:domain1\domain admins:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (X)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (X)WRITE_NAMED
special:owner@:rwxc:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:rwx-:allow
(X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r-x-:allow
(X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
(-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

9.8 Testing access to the SONAS


Now that we have completed the installation and configuration, let us see how to access the SONAS appliance:
1. You need to be connected to the management node. If you are not, log in with the root user ID and password.
2. From the management node, you can view the Health Center using the following steps:
a. Select Applications → SONAS → SONAS GUI, as shown in Figure 9-7.

Figure 9-7 Showing how to access SONAS GUI

b. If an Alert is displayed, warning you about an invalid security certificate, click OK.


c. A Secure Connection Failed message might be displayed as shown in Figure 9-8.

Figure 9-8 Security Connection Failed message when accessing GUI

Click Add Exception as shown in Figure 9-9.

Figure 9-9 Adding exception


A new window appears as shown in Figure 9-10. Click Get Certificate, and click Confirm Security Exception.

Figure 9-10 Get Certificate and Confirm Security Exception

d. At the Integrated Solutions Console login panel, log in with User ID root and the root password as shown in Figure 9-11.

Figure 9-11 Login into Management GUI Interface with root user ID and Password

e. If you are asked whether you want Firefox to remember this password, click Never for This Site.
f. The first time you log in to the GUI, you are asked to accept the software license agreement. Follow the instructions on the panel to accept it.

g. Click Health Summary.
h. Click Alert Log. The Alert Log is displayed.
i. Review the Alert Log entries. Figure 9-12 shows an example of the Alert Log.

Figure 9-12 Example Alert Log. Ignore the Critical Errors

Attention: It is normal for one or more informational entries (entries with a severity of info) to be in the log following the installation. These entries can be ignored.
j. If any problems are logged, click the Event ID for more information. The Information Center is displayed with information about the Event ID.
k. If a Firefox prevented this site from opening a pop-up window message is displayed, click Preferences, and then click Allow pop-ups for localhost.
l. Resolve any problems by referring to the Problem Determination guide in the Information Center. If you are unable to resolve a problem, contact your next level of support.
m. When any problems have been resolved, clear the System Log by clicking System Log and then clicking Clear System Log.
n. When you are finished using the SONAS GUI, click Logout. The Scale Out File Services login panel is displayed.
o. Close the browser by clicking X. The Linux desktop is displayed.
p. Log out by selecting System → Log Out root.

3. Connect the Ethernet network cables. Cables: Connecting the customer Ethernet cables is a customer responsibility.
4. Connect each cable to an available Ethernet port on the interface node.
5. If the rack contains another interface node, repeat the steps in this section until all interface nodes in the rack have been cabled.
6. If you are installing more than one rack, repeat the steps in this section until all interface nodes in all of the racks you are installing have been cabled.
The IBM SONAS system is now ready for use.


Chapter 10. SONAS administration
In this chapter we provide information about how you use the GUI and CLI to administer your SONAS. Daily administrator tasks are discussed and examples provided.


10.1 Using the management interface


The SONAS appliance can be administered using the Graphical User Interface (GUI) and the Command Line Interface (CLI) provided by SONAS. The GUI offers administrative panels for carrying out administrative tasks, and the CLI allows you to administer the system using commands. Both the CLI and the GUI provide details and help for each task and command. The CLI also includes man pages that you can use to get more information about a given command. The GUI tasks are designed to be self-explanatory and provide tool tips for every text box and control, and a ? (help) icon is available in the upper right corner of each panel. In this chapter, most of the important and commonly used tasks are explained for both the GUI and the CLI. A minimal CLI help session is sketched below.
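As a quick illustration, the restricted CLI shell has a built-in help listing plus man pages for every command. The following is a sketch only; the prompt shown is generic and the exact output depends on your installation:

$ help                # list all CLI commands available to the CLI user
$ man lsnode          # display the man page for the lsnode command
$ lsnode --help       # print usage information for the same command

These are interchangeable ways of checking a command's syntax before running it against the cluster; the full command list is shown later in Example 10-1.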

10.1.1 GUI tasks


You can start and stop the GUI as the root user on the Management Node using the startmgtsrv and stopmgtsrv commands. GUI tasks are those that you carry out using the graphical interface of the SONAS appliance. You log in to the Management Node using an Internet browser such as Internet Explorer or Firefox. In the URL bar, enter this link:
https://management_node_name_or_ip_address:1081/ibm/console
In our example, the management node has the IP address 9.11.137.220, so you can access the GUI using this link:
https://9.11.137.220:1081/ibm/console
Figure 10-1 shows the login page. Enter the login name and password and click Log in to log in to the Management Interface.

Figure 10-1 SONAS Management GUI asking for login details


When logged in, you will be able to see the panel as shown in Figure 10-2.

Figure 10-2 SONAS GUI when logged in as GUI user

The left frame of the GUI in Figure 10-2 shows the collapsed view of all the categories existing in the GUI. Figure 10-3 illustrates the various areas of the GUI navigation panel. To the left is the main navigation pane, which allows us to select the component we want to view or the task we want to perform. At the top we see the currently logged-in administrative user name, and just below that the navigation tabs that allow you to switch between multiple open tasks. Underneath is a panel that contains context-sensitive help, minimize, and maximize buttons at the top right. We then have action buttons and table selection, sorting, and filtering controls. Below that is a table listing the objects. At the bottom right is a refresh button that shows the time the data was last collected and refreshes the displayed data when pressed. Clicking an individual object brings up a detailed display of information for that object.


Figure 10-3 ISC GUI navigation panel areas (callouts: main navigation, logged-in user, tab navigation, help and min/maximize, panel action buttons, table select/sort/filter, click for details, click to refresh list)

The expanded view of all the tasks in the left frame is shown in Figure 10-4. As seen in the URL bar, you provide the Management GUI IP address or Management Node hostname along with the correct path to access the GUI, as mentioned previously. When logged in, near the top center of the main page you can see the CLI user name that is currently logged in to the GUI, and in the right corner there is a link to log out of the GUI. The left frame is a list of categories that provide links to perform any task on the cluster. Click the links on the left to open the corresponding panel on the right. Next, we describe the categories at a high level.


Figure 10-4 Expanded view of the left panel with all tasks available in SONAS GUI

The GUI categories are divided into several tasks seen as links in the left frame. Here is a brief description of these tasks:
1. Health Center: This panel shows the health of the cluster and its nodes. It gives a topological view of the cluster and its components, provides the system and alert logs, and offers additional features such as Call Home.
2. Clusters: This panel allows you to manage the cluster, the Interface Nodes, and the Storage Nodes.
3. Files: All file system related tasks can be performed in this panel.
4. Storage: The back-end storage can be managed using the tasks available here.
5. Performance and Reports: The SONAS GUI provides reports and graphs of various parameters that you can measure, such as file system utilization and disk utilization.
6. SONAS Console Settings: In this section, you can enable threshold limits for utilization monitoring and view the tasks scheduled on the SONAS system. If notification is required when a threshold value is crossed, you can configure email notifications in this panel.
7. Settings: In this panel, you can manage users and also enable tracing.

In the next section, we discuss each of the categories and underlying tasks.

Health Summary
This category allows you to check the health of the cluster, including the Interface Nodes, Storage Nodes, and the Management Nodes. It consists of three panels:
1. Topology: This panel displays a graphical representation of the SONAS Software system topology. It provides information about the Management Nodes, Interface Nodes, and Storage Nodes, and includes the state of the data networks and storage blocks. It also shows information related to the file systems, such as the number of file systems existing, the number mounted, the number of exports, and more (Figure 10-5). You can click each component to see further details.

Figure 10-5 Topology View of the SONAS cluster


2. Alert Log: The Alert Log panel displays the alert events that are generated by the SONAS Software. Each page displays around 50 log entries. The severity of an event can be Info, Warning, or Critical, displayed in blue, yellow, and red respectively. You can filter the logs in the table by severity, time period, and source, where the source is the host on which the event occurred. See Figure 10-6.

Figure 10-6 Alert Logs in the GUI for SONAS


3. System Log: This panel displays system log events that are generated by the SONAS Software, which include management console messages, system utilization incidents, status changes, and syslog events. Each page displays around 50 log entries. System logs have three levels: Information (INFO), Warning (WARNING), and Severe (SEVERE). You can filter the logs by log level, component, host, and more. Figure 10-7 shows the System Log panel in the GUI.

Figure 10-7 System Logs in the GUI for the SONAS cluster

Clusters
This panel allows you to administer the cluster, including the Interface Nodes and Storage Nodes, and to modify the cluster configuration parameters. Each panel and its tasks are discussed in the following section:
1. Clusters:
   a. Add/Delete cluster to Management Node: The GUI allows you to manage not just the cluster it is a part of, but also other clusters. You can also delete a cluster from the GUI in order to stop managing it. You add the cluster you want to manage using the Add cluster option in the Select Action drop-down box. This opens a new panel (in the same window) in which you enter the IP address of one of the nodes of the cluster and its password. The cluster is identified and added to the GUI. Figure 10-8 on page 321 shows how you can add the cluster.


Figure 10-8 Cluster Page on the GUI. Select Action to add cluster in the GUI

You can also delete a cluster previously added to the GUI by selecting it using the check box before the name of the cluster and clicking the Delete cluster option in the Select Action drop-down box. You are asked for confirmation before the cluster is deleted. Figure 10-9 shows how you can delete a cluster added to the GUI.

Figure 10-9 Cluster Page on the GUI. Select Action to delete cluster in the GUI


b. View Cluster Status: This panel displays the clusters that have been added to the GUI. See Figure 10-10.

Figure 10-10 View Cluster Status

c. Nodes: This panel is one of the tabs in the lower part of the Clusters panel. Here, you can view the connection status for all the nodes, such as the Management Node, Interface Nodes, and Storage Nodes. Figure 10-11 shows the Nodes panel.

Figure 10-11 Node information seen

Clicking the node links shown in blue in Figure 10-12 takes you to the respective Interface Node or Storage Node panel explained in points 2 and 3 of the section Clusters on page 320.
d. File Systems: This panel displays all the file systems on the SONAS appliance. It shows other information about each file system such as the mount point, the size of the file system, free space, used space, and more. See Figure 10-12. This is a read-only panel for viewing.

Figure 10-12 Filesystem Information seen


e. Storage Pools: The Storage Pools panel displays information about the various storage pools that exist. It displays which file system belongs to each storage pool and the capacity used. See Figure 10-13. This is a read-only panel for viewing; you cannot modify any storage pool parameters.

Figure 10-13 Storage Pool Information seen

f. Services: The Services panel shows the various services that are configured on the SONAS appliance and whether each is Active or Inactive. The supported services are FTP, CIFS, NFS, HTTP, and SCP. These services are required to configure the data exports on SONAS, and end users access data stored in SONAS through these exports. Hence, a service must be configured before data can be shared and accessed using that protocol. You cannot modify the status of the services from this panel. See Figure 10-14.

Figure 10-14 Service information seen


g. General Options: This option allows you to view and modify the cluster configuration. It allows you to modify some of the global options as well as node-specific parameters. You can also view cluster details such as the cluster name, cluster ID, primary and secondary servers, and many more. See Figure 10-15.

Figure 10-15 General Options for the cluster seen

h. Interface Details: This panel shows the cluster manager (CTDB) configuration details. The panel in Figure 10-16 is where you can see the NetBIOS name, the workgroup name, whether the cluster manager manages winbind, and more. This panel itself is read-only; changes are made through the Advanced Options button described next.

Figure 10-16 Interface details for the Cluster


The Advanced Options button, on the panel shown in Figure 10-17, allows you to view and modify the CTDB configuration parameters. You can modify the reclock path and other advanced options of the CTDB. CTDB manages many services and has a configurable parameter for each; you can set each one to have CTDB manage or not manage that service. A few of the parameters are as follows:
CTDB_MANAGES_VSFTPD
CTDB_MANAGES_NFS
CTDB_MANAGES_WINBIND
CTDB_MANAGES_HTTPD
CTDB_MANAGES_SCP

By default, these values are set to yes and CTDB manages the corresponding services. You can change a value to no if you do not want CTDB to manage that service. When CTDB is not managing a service, CTDB does not flag the node as unhealthy if that service goes down; it remains in the OK state. To have the service monitored, set the value to yes. A sketch of how these settings look follows Figure 10-17.

Figure 10-17 Advanced Options under Interface details seen as CTDB information
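For illustration only, the CTDB service-management switches are simple yes/no variables. The snippet below is a sketch of how such settings typically appear in a CTDB sysconfig-style configuration; on SONAS the actual file location is managed by the appliance, so these values should only be changed through this GUI panel:

# CTDB monitors and manages these services when set to yes
CTDB_MANAGES_VSFTPD=yes
CTDB_MANAGES_NFS=yes
CTDB_MANAGES_WINBIND=yes
CTDB_MANAGES_HTTPD=yes
CTDB_MANAGES_SCP=yes
# Setting a value to no stops CTDB from marking the node unhealthy
# when that particular service goes down.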


2. Interface Nodes: The Interface Nodes panel allows you to view the node status. It displays the public IP for each node, the active IP address it is servicing, the CTDB status, and more. You can also carry out operations on a node, such as Suspend, Resume, Restart, or Recover CTDB. To do so, select the node on which you want to perform the action and then click the corresponding button. You must also select the cluster whose Interface Nodes you want to check from the Active cluster drop-down menu. Figure 10-18 shows the Interface Nodes panel.

Figure 10-18 Interface Node details and operations that can be performed on them

3. Storage Nodes: This panel displays information about the Storage Nodes. It displays the IP address of each Storage Node, its connection status, GPFS status, and more. It also allows you to start and stop Storage Nodes: select the Storage Nodes that you want to start or stop and click the Start or Stop button respectively. You must also select the cluster whose Storage Nodes you want to check from the Active cluster drop-down menu. The first Storage Node is highlighted by default, and lower in the same pane you can see its details, such as hostname, operating system, host IP address, controller information, and more. Figure 10-19 shows the Storage Nodes GUI panel.


Figure 10-19 Storage Node details and operations that can be performed on them

Files
This panel allows you to carry out file system related tasks. You can create file systems, file sets, exports, snapshots, and more. You must select the cluster on which you want to perform the tasks from the Active cluster drop-down menu. Each of the tasks that can be performed is described in the following section:
1. File Systems: This panel has four sections as described here:
   a. File System: This section allows you to create file systems. The underlying file system that a SONAS appliance creates is a GPFS clustered file system. If a file system already exists, the section displays basic information about it such as name, mount point, size, usage, and more. You can also perform operations such as mounting, unmounting, and removing a file system using the buttons on the panel. See Figure 10-20. If the list of file systems extends to the next page, you can click the arrow buttons to move between pages. The table also has a refresh button in the lower right corner, which refreshes the list of file systems in the table. You can also select file systems in this table individually, select all, or invert the selection.


Figure 10-20 File system list and operations that can be performed on them

b. File System Configuration: This section displays the configuration details of the highlighted file system. It shows information about the device number, ACL type, number of inodes, replication details, quota details, mount information, and more. It also allows you to modify the ACL type, locking type, and number of inodes for the file system. Click the Apply button to apply the new configuration parameters. See Figure 10-21.

Figure 10-21 File system detail


c. File System Disks: This section displays the disks used for the filesystem. It also displays the disk usage type. You can add disks to the file system by clicking the Add a disk to the file system button, or remove disks from the File System by selecting the disk and clicking the Remove button. See Figure 10-22.

Figure 10-22 Disk information of the cluster

d. File System Usage: This section displays the File System Usage information such as the number of Free Inodes, Used Inodes, the Storage pool usage, and details. See Figure 10-23.

Figure 10-23 File system usage for the cluster


2. Exports: This panel displays the exports created on the SONAS appliance for clients to access the data. It also allows you to create new exports, delete exports, modify exports, and more, and to modify the configuration parameters for protocols such as CIFS and NFS. Next we describe each of the sections (a CLI sketch follows Figure 10-26):
   a. Exports: This section displays all the exports that have been created, along with details such as the share name, directory path, and protocols configured for the export. You can add a new export using the Add button. For existing exports, you can carry out operations such as modifying the export, removing protocols, activating or deactivating an export, and removing the export, by selecting the export you want to operate on and clicking the respective button. Figure 10-24 shows the panel with some existing exports as examples. As you can see, there are four pages of exports; you can click the arrow buttons to move between pages. The table also has a Refresh button in the lower right corner, which refreshes the list of exports in the table. You can also select individual exports, select all, or invert the selection. By default, the first export is highlighted and the protocol details of that export are displayed in the lower section, explained in detail next.

Figure 10-24 Exports existing in the cluster and operations you can perform on them

b. CIFS Export Configuration: This section displays the CIFS export configuration details of the highlighted export. As seen in Figure 10-24, the first export is highlighted by default. You can select other exports from the table. This panel displays the configured parameters such as Comment, Browsable options and Read-only option for the CIFS export and also allows you to modify them. You can also use the Add, Modify and Remove buttons to add, modify and remove the Advanced Options, if any. Click Apply to apply new configuration parameters. Figure 10-25 explains the panel.


Figure 10-25 Configuration details for CIFS protocol

c. NFS Export Configuration: This section displays the list of NFS clients configured to access the NFS Exports and their Options. You can modify existing client details using the edit link in the table, remove the client using the remove link and also add new client using the Add Client button. Click the Apply button to apply the changes. See Figure 10-26.

Figure 10-26 NFS configuration details
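Exports can also be listed and managed from the CLI using the export commands shown later in Example 10-1 (lsexport, mkexport, chexport, and rmexport). A minimal sketch follows; the arguments are omitted here because they vary, so check each command's man page before use:

$ lsexport            # list all exports
$ man mkexport        # review the options for creating a new export

Broadly, the mkexport and chexport options cover the same share name, directory path, and per-protocol settings that the GUI fields described above expose.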

3. Policies: This panel displays, and also allows you to set, policies for the existing file systems. A policy is a set of rules that you can apply to your file system. It is discussed in detail in Call Home test on page 434. The Policies panel has two sections:
   a. Policies List: This section allows you to view the policies set for the available file systems. By default the first file system is highlighted and its policy details are shown in the lower section of the panel. You can set a default policy for a file system by clicking the Set Default Policy button. Figure 10-27 shows the Policies panel.


Figure 10-27 Policies listed for the file systems in the cluster

In the previous example, there is currently no policy set for the file system.
b. Policy Details: This section shows the policy details for the file system. It includes a policy editor, a text box where you can write new policy rules for the file system. You can apply the policy using the Apply Policy button or set the policy by clicking the Set Policy button on the right of the editor. You can also load policies using the Load Policy button. See Figure 10-28, and the sample rules after it.

Figure 10-28 Editor to create policies for the file systems
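To give a feel for what you type into the policy editor, here is a minimal sketch of GPFS-style policy rules. The pool names ('system' and 'silver') and the threshold values are illustrative assumptions; your storage pool names and rules depend on your configuration.

/* Place newly created files in the default system pool */
RULE 'default_placement' SET POOL 'system'

/* Example migration rule: when the system pool reaches 80% full,
   migrate files until utilization drops to 60% */
RULE 'migrate_cold' MIGRATE FROM POOL 'system' THRESHOLD(80,60) TO POOL 'silver'

The Set Default Policy button described above applies a default placement rule of this kind for you.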


4. File Sets: This panel displays the file sets existing in a file system. Choose the file system whose file sets you want to view from the Active file system drop-down list, along with the active cluster from the Active Cluster drop-down menu. The table then displays all the file sets that exist for that file system. The root file set is created by the system; it is the default one and is created when you create the file system. By default the first file set is highlighted, and in the lower section of the panel you can view and modify its details, such as name, linked path, status, and more. To view the details of another file set, click it to highlight it. Figure 10-29 shows the list of all the file sets and information about the highlighted file set. In our example, only the root file set is listed.

Figure 10-29 Listing Filesets in the cluster and displaying information of the fileset


You can use the Create a File Set button to create a new file set. You can also delete or unlink existing file sets by selecting the file sets that you want to operate on and clicking the Delete or Unlink button respectively.
5. Quota: The clustered file system allows enabling quotas and assigning them to users and groups on file sets and file systems. There are soft limits and hard limits for disk space and for the number of i-nodes, and a grace time is available when setting quotas. These concepts are described here:
   Soft Limit Disk: The soft limit defines a level of disk space and files below which the user, group of users, or file set can safely operate. Specify soft limits for disk space in units of kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is assumed to be in bytes.
   Hard Limit Disk: The hard limit defines the maximum disk space and files the user, group of users, or file set can accumulate. Specify hard limits for disk space in units of kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is assumed to be in bytes.
   Soft Limit I-nodes: The i-node soft limit defines the number of i-nodes below which a user, group of users, or file set can safely operate. Specify soft limits for i-nodes in units of kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is assumed to be in bytes.
   Hard Limit I-nodes: The i-node hard limit defines the maximum number of i-nodes that a user, group of users, or file set can accumulate. Specify hard limits for i-nodes in units of kilobytes (k or K), megabytes (m or M), or gigabytes (g or G). If no suffix is provided, the value is assumed to be in bytes.
   Grace Time: Grace time allows the user, group of users, or file set to exceed the soft limit for a specified period of time (the default is one week). If usage is not reduced to a level below the soft limit during that time, the quota system interprets the soft limit as the hard limit and no further allocation is allowed. The user, group of users, or file set can reset this condition by reducing usage enough to fall below the soft limit.
   Figure 10-30 shows how quotas look in the GUI. On a SONAS appliance, the GUI currently allows read-only access to quotas, which means that you can only view quotas, not enable or set them (a CLI sketch follows Figure 10-30). In our example, this is the default quota displayed for the file systems for the user root.

Figure 10-30 Quotas page in the GUI
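Because the GUI currently gives read-only access to quotas, viewing and setting them is a CLI task (see 10.2.2), using the lsquota and setquota commands from Example 10-1. A minimal sketch; the setquota options are not reproduced here, so check its man page before use:

$ lsquota             # list the current quota settings
$ man setquota        # review the options for soft/hard limits and grace time

The values supplied to setquota follow the unit conventions described above (k, m, or g suffixes for disk space).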


6. Snapshots: This panel displays the snapshots existing in a file system. Choose the file system whose snapshots you want to view from the Active file system drop-down list, along with the active cluster from the Active Cluster drop-down menu. In the table that lists the snapshots, you can also see other details such as the name, status, creation time stamp, and more.
   You can remove an existing snapshot from the cluster by selecting the snapshot you want to remove and clicking the Remove button. You can also create snapshots using the Create a new Snapshot of the active cluster and filesystem button. By default, the first snapshot is selected and highlighted, and its details are shown in the lower section of the panel. You can choose another snapshot from the list to see its corresponding details (Figure 10-31). A CLI sketch follows the figure.

Figure 10-31 Snapshot lists that exist in cluster and its details
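Snapshots can also be handled from the CLI with the lssnapshot, mksnapshot, and rmsnapshot commands shown in Example 10-1. A minimal sketch (arguments omitted because they vary; check the man pages before use):

$ lssnapshot          # list all snapshots and their details
$ man mksnapshot      # review how to create a snapshot of a given file system

mksnapshot creates a snapshot of a file system and rmsnapshot removes one, matching the Create and Remove buttons in this GUI panel.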


Storage
This panel allows you to view the storage disks and pool details. You can perform certain operations on the disks, such as removing, suspending, or resuming a disk, and you can view the available storage pools and their usage details. You must select the cluster on which you want to perform the tasks from the Active cluster drop-down menu. We describe each of the tasks that can be performed in the following section:
1. Disks: This panel displays the disks that are available in the SONAS appliance and information about them, such as usage, the file system each is attached to, status, failure group, the storage pool each belongs to, and more. The table also has a refresh button in the lower right corner, which refreshes the list of disks in the table. You can also select disks individually, select all, or invert the selection, and you can filter the table using filter parameters. By default, the first disk is highlighted and, in the lower part of the pane, other details of that disk are displayed, including the volume ID, sector size, the list of disk servers it resides on, and more. See Figure 10-32.

Figure 10-32 List of Storage disks in the cluster and their details


2. Storage Pools: This panel displays the storage pool list for a file system. The main table displays the file systems existing in the cluster, along with the pool usage and i-node usage for each file system. By default, the first file system in the list is highlighted, and for this file system the lower section of the panel shows storage pool related details such as the number of free i-nodes, maximum i-nodes, and allocated i-nodes, as well as the size of the pool. You can also see the size, free blocks, and fragment details of the NSDs (disks) in the system. See Figure 10-33. A CLI sketch follows the figure.

Figure 10-33 Storage pools existing in the cluster and their details
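The same disk and pool information is available from the CLI through the lsdisk and lspool commands shown in Example 10-1. A minimal sketch (output omitted; the exact columns depend on your configuration):

$ lsdisk              # list all disks (NSDs)
$ lspool              # list all storage pools
$ man chdisk          # review the options for changing disk properties, a CLI-only task

chdisk and rpldisk (change and replace a disk) are among the tasks listed in 10.2.2 that can only be done from the CLI.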


Performance and reports


This panel allows you to monitor performance and generate reports. You can check the performance of the system in the System Utilization panel and of the file systems in the File System Utilization panel, and you can generate reports for daily, weekly, monthly, or other periods. The charts are a pictorial representation. Each of the panels is described next.
1. System Utilization: In this panel, you can view the performance of the system and generate reports or charts that illustrate the system utilization of the specified cluster nodes. You need to choose at least one node and the measurement settings, then click the Generate Charts button to display the chart you want. See Figure 10-34.

Figure 10-34 System Utilization details for the Nodes in the cluster.


2. File System Utilization: This panel generates charts that illustrate the utilization of the specified file system. The table shows the file systems that exist in the cluster, along with other details such as the cluster name, disk usage for the file system, and more. No chart is displayed until you select the file system and duration and click the Generate Charts button. See Figure 10-35.

Figure 10-35 File system Utilization for the file systems in the cluster

SONAS console settings


This panel allows you to carry out various kinds of tasks, such as viewing and adding thresholds to monitor the cluster, listing the scheduled tasks, and creating new tasks to be scheduled. You can also set up notification so that, if a threshold is reached or crossed, an event report is generated and an email is sent to the administrator or other recipients. Additionally, you can add or remove recipients and edit the contact information. You must select the cluster on which you want to perform the tasks from the Active cluster drop-down menu.


Each of the tasks that can be performed is described in the following section:
1. Utilization Thresholds: This panel lists all thresholds for the various utilization monitors, per cluster. A corresponding log message is generated for a monitor if the warning or error level value has been exceeded by the values measured over the last specified number of recurrences. The table displays the details of all the thresholds that have been added, such as their warning level, error level, and more. You can remove a previously added threshold by selecting it and clicking the Remove button, and add new thresholds using the Add Threshold button. See Figure 10-36. Generation of charts is also explained in detail for system utilization in 10.9.1, System utilization on page 411 and file system utilization in 10.9.2, File System utilization on page 413.

Figure 10-36 Threshold details for the cluster

2. Scheduled Tasks: This panel allows you to view and manage tasks. SONAS has a list of predefined tasks for the management node. A predefined task can be a GUI task or a cron task: GUI tasks can be scheduled only once and run only on the management node, whereas cron tasks can be scheduled multiple times and for the different clusters managed by the management node. Cron tasks are predefined to run either on all nodes of the selected cluster or on the recovery master node only. You can add new tasks and remove or execute existing ones. This panel has two sections: the upper part is a table that lists all the tasks that are already scheduled, and the lower part shows the details of the selected task (a CLI sketch follows Figure 10-38). The two sections are explained next:
   a. Tasks List: This is the upper section of the pane. It lists the tasks that are already scheduled, in the form of a table that includes the task name, schedule, execution node, status of the last run, and more. You can execute or remove any task in the list by selecting the task and clicking the Execute or Remove button respectively, and you can add a new task using the Add Task button. See Figure 10-37. You can select any other task to see its details, displayed as described in the next section. You can also select single or multiple tasks, filter the table using filter parameters, and use the arrow buttons to view tasks on the next page, if any.


Figure 10-37 Scheduled tasks list for the cluster

b. Task Details: By default, the first task is highlighted in the table. You can change the selection by clicking any other task in the table. Upon selecting a task, its details are shown in the lower section of the pane. The details include the task name, description, task parameters, schedule time, and more. See Figure 10-38.

Figure 10-38 Task details
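Scheduled tasks can also be listed and managed from the CLI with the lstask, mktask, and rmtask commands shown in Example 10-1. A minimal sketch (arguments omitted because they vary; check the man pages before scheduling anything):

$ lstask              # list all (background) tasks known to the management node
$ man mktask          # review how to schedule one of the predefined tasks

mktask schedules a predefined task and rmtask removes a scheduled task, matching the Add Task and Remove buttons in this panel.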


3. Notification Settings: This panel allows you to define notification settings for the selected cluster. Choose the Default option in the drop-down menu to apply the settings as default values for all clusters. The panel, as you can see in Figure 10-39, has many options that you can set notifications for, such as utilization monitoring, GUI events, syslog events, quota checking, and many more. You must also fill out the General E-mail Settings section of the panel with email addresses and details, so that when an event is generated because a threshold has been reached, the respective users receive a notification email. Add a header and/or footer to the email if required. To finish, complete the SNMP Settings section with the required details and make sure that you click the Apply button to save your settings.

Figure 10-39 Notification settings

4. Notification Recipients: This panel lists all the recipients who are configured to receive notification emails when a threshold that you are monitoring has been crossed. Select the cluster from the Active Cluster drop-down menu. The table lists the name, email ID, status, and more. You can also remove an existing recipient. See Figure 10-40.


Figure 10-40 Notification settings details for added recipients

5. Contact Information: The internal contact information is used as reference data only. You can enter the data for the internal contact who has been chosen to address any SONAS questions or issues. The details you must add are: Customer name, Main phone contact, Site phone contact, E-mail contact, Location, Comment. You can do so using the Edit button. See Figure 10-41.

Figure 10-41 Contact details of the customer


Settings
This panel allows you to manage the console settings and tracing. It also allows you to manage the users allowed to access SONAS using the Management Interface (GUI). The two sections are described briefly next:
1. Console Logging and Tracing: This panel allows you to view and modify the configuration properties of the console server diagnostic trace services. Changes to the configuration take effect after clicking OK. See Figure 10-42.

Figure 10-42 Console Logging and tracing

2. Console User Authority: Configuration for adding, updating, and removing Console users. See Figure 10-43.


Figure 10-43 Console User authority

10.1.2 Accessing the CLI


To access the CLI, ssh to the Management Node and log in using the CLI user ID and password. You are taken to a restricted shell that allows you to run only CLI commands and a small set of Linux commands. For example, if the Management Node hostname is Furbymgmt.storage.tucson.ibm.com, you log in to the CLI using:
#ssh Furbymgmt.storage.tucson.ibm.com
You are asked to enter the CLI user ID and password, and are then taken to a CLI prompt. See Figure 10-44.

Figure 10-44 CLI user logging in to the Management Node from a Linux client


Example 10-1 contains a list of commands that are available for a CLI user.
Example 10-1 Command list for a CLI user

[Furby.storage.tucson.ibm.com]$ help
Known commands:
addcluster           Adds an existing cluster to the management.
attachnw             Attach a given network to a given interface of a network group.
backupmanagementnode Backup the management node
cfgad                configures AD server into the already installed CTDB/SAMBA cluster. Previously configured authentication server settings will be erased
cfgbackupfs          Configure file system to TSM server association
cfgcluster           Creates the initial cluster configuration
cfghsm               Configure HSM on each client facing node
cfgldap              configure LDAP server against an existing preconfigured cluster.
cfgnt4               configure NT4 server against an existing preconfigured cluster.
cfgsfu               Configures user mapping service for already configured AD
cfgtsmnode           Configure tsm node.
chdisk               Change a disk.
chexport             Modifies the protocols and their settings of an existing export.
chfs                 Changes a new filesystem.
chfset               Change a fileset.
chkauth              Check authentication settings of a cluster.
chkpolicy            validates placement rules or get details of management rules of a policy on a specified cluster for specified device
chnw                 Change a Network Configuration for a sub-net and assign multiple IP addresses and routes
chnwgroup            Adds or removes nodes to/from a given network group.
chuser               Modifies settings of an existing user.
confrepl             Configure asynchronous replication.
dblservice           stop services for an existing preconfigured server.
detachnw             Detach a given network from a given interface of a network group.
eblservice           start services for an existing preconfigured server.
enablelicense        Enable the license agreement flag
initnode             Shutdown or reboot a node
linkfset             Links a fileset
lsauth               List authentication settings of a cluster.
lsbackup             List information about backup runs
lsbackupfs           List file system to tsm server and backup node associations
lscfg                Displays the current configuration data for a GPFS cluster.
lscluster            Lists the information of all managed clusters.
lsdisk               Lists all discs.
lsexport             Lists all exports.
lsfs                 Lists all filesystems on a given device in a cluster.
lsfset               Lists all filesets for a given device in a cluster.
lshist               Lists system utilization values
lshsm                Lists configured hsm file systems cluster
lslog                Lists all log entries for a cluster.
lsnode               Lists all Nodes.
lsnw                 List all public network configurations for the current cluster
lsnwdns              List all DNS configurations for the current cluster
lsnwgroup            List all network group configurations for the current cluster
lsnwinterface        List all network interfaces
lsnwnatgateway       List all NAT gateway configurations for the current cluster
lsnwntp              List all NTP configurations for the current cluster
lspolicy             Lists all policies
lspool               Lists all pools.
lsquota              Lists all quotas.
lsrepl               List result of the asynchronous replications.
lsservice            Lists services
lssnapshot           Lists all snapshots.
lstask               Lists all (background) tasks for the management node.
lstsmnode            Lists defined tsm nodes in the cluster
lsuser               Lists all users of this management node.
mkexport             Creates a new export using one or more protocols.
mkfs                 Creates a new filesystem.
mkfset               Creates a fileset
mknw                 Create a new Network Configuration for a sub-net and assign multiple IP addresses and routes
mknwbond             Makes a network bond from slave interfaces
mknwgroup            Create a group of nodes to which a network configuration can be attached. See also the commands mknw and attachnw.
mknwnatgateway       Makes a CTDB NAT gateway
mkpolicy             Makes a new policy into database
mkpolicyrule         Appends a rule to already existing policy
mksnapshot           creates a snapshot from a filesystem
mktask               Schedule a predefined task
mkuser               Creates a new user for this management node.
mountfs              Mount a filesystem.
querybackup          Query backup summary
restripefs           Rebalances or restores the replication of all files in a file system.
resumenode           Resumes an interface node.
rmbackupfs           Remove file system to TSM server association
rmcluster            Removes the cluster from the management (will not delete cluster).
rmexport             Removes the given export.
rmfs                 Removes the given filesystem.
rmfset               Removes a fileset
rmlog                Removes all log entries from database
rmnw                 Remove an existing public network configuration
rmnwbond             Deletes a regular bond interface.
rmnwgroup            Remove an existing group of nodes. maybe attached public network configuration must be detached in advance
rmnwnatgateway       Unconfigures a CTDB NAT gateway.
rmpolicy             Removes a policy and all the rules belonging to it
rmpolicyrule         Removes one or more rules from given policy
rmsnapshot           Removes a filesystem snapshot
rmtask               Removes the given scheduled task.
rmtsmnode            Remove TSM server stanza for node
rmuser               Removes the user from the management node.
rpldisk              Replaces current NSD of a filesystem with a free NSD
runpolicy            Migrates/deletes already existing files on the GPFS file system based on the rules in policy provided
setnwdns             Sets nameservers
setnwntp             Sets NTP servers
setpolicy            sets placement policy rules of a given policy on cluster passed by user.
setquota             Sets the quota settings.
showbackuperrors     Shows errors of a backup session
showbackuplog        Shows the log of the recent backup session.
showrestoreerrors    Shows errors of a restore session
showrestorelog       Shows the log of the recent restore session.
startbackup          Start backup process
startreconcile       Start reconcile process
startrepl            Start asynchronous replication.
startrestore         Start restore process
stopbackup           Stops a running TSM backup session
stoprepl             Stop asynchronous replication.
stoprestore          Stops a running TSM restore session
suspendnode          Suspends an interface node.
unlinkfset           Unlink a fileset.
unmountfs            Unmount a filesystem.

Plus the UNIX commands: grep, initnode, man, more, sed, startmgtsrv, stopmgtsrv, sort, cut, head, less, tail, uniq
For additional help on a specific command use 'man command'.
[Furby.storage.tucson.ibm.com]$

In SONAS there are some tasks that can be done exclusively by a CLI user, while other tasks can be performed using CLI commands as well as from the GUI; the commands shown are a combination of both. Each command has help regarding its usage, which can be viewed using:
# man <command_name>
or
# <command_name> --help
For example, let us look up help for the mkfs command using --help and the man page. See Example 10-2 for the complete help output from the command, and Figure 10-45, which shows a snapshot of the man page.
Example 10-2 Help or usage for CLI command mkfs taken as example

[Furby.storage.tucson.ibm.com]$ mkfs --help
usage: mkfs filesystem [mountpoint] [-b <blocksize>] [-c <cluster name or id>] [--dmapi | --nodmapi] [-F <disks>]
       [-i <maxinodes>] [-j <blockallocationtype>] [--master] [-N <numnodes>] [--noverify] [--pool <arg>] [-R <replica>]
 filesystem                 The device name of the file system to be created. File system names need not be fully-qualified.
 mountpoint                 Specifies the mount point directory of the GPFS file system.
 -b,--blocksize <blocksize>                       blocksize
 -c,--cluster <cluster name or id>                define cluster
 --dmapi                                          enable the DMAPI support for HSM
 -F,--disks <disks>                               disks
 -i,--numinodes <maxinodes>                       Set the maximal number of inodes in the file system.
 -j,--blockallocationtype <blockallocationtype>   blockallocationtype
 --master                                         master
 -N,--numnodes <numnodes>                         numnodes
 --nodmapi                                        disable the DMAPI support for HSM
 --noverify                                       noverify
 --pool <arg>                                     pool
 -R,--replica <replica>                           Sets the level of replication used in this file system. Either none, meta or all


Figure 10-45 Manpage for CLI command mkfs taken as example

Similarly, you can display the --help output for each of the commands available in the CLI, and view the man page for each.
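Because the restricted shell also allows a few UNIX text utilities (grep, sort, cut, head, less, tail, uniq), CLI output can be filtered directly at the prompt. A small sketch, assuming the same cluster prompt used in the examples above (the exact column layout of the output may differ on your system):

[Furby.storage.tucson.ibm.com]$ lsnode | grep interface     # show only the interface nodes
[Furby.storage.tucson.ibm.com]$ lsdisk | head               # show only the first few disks

Combining CLI commands with these standard filters is often the quickest way to check a single node or disk without paging through the full listing.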

10.2 SONAS administrator tasks list


In this section we list the administrator tasks that can be done on the SONAS appliance. Some of these tasks can be carried out with both the CLI commands and the SONAS GUI, some only with CLI commands, and some only through the SONAS GUI. Next we describe the tasks that can be performed on the SONAS system as a whole; some of the important and commonly used commands are discussed in detail in the following sections.

10.2.1 Tasks that can be performed only by the SONAS GUI


The following tasks are performed only by the SONAS GUI:
1. Configure GUI user roles.
2. Configure notification settings and recipients.
3. Configure threshold settings.
4. Change log or trace settings.
5. Show Alert log.
6. Show Health Center (Topology).
7. Show on which nodes the file system is mounted.
8. Start, resume, or suspend an NSD.
9. Query file system space.


10.2.2 Tasks that can be performed only by the SONAS CLI


The following tasks are performed only by the SONAS CLI:
1. Start or stop the management service.
2. Start, stop, or change asynchronous replication.
3. Create or delete console users.
4. Create or change quotas.
5. Create, list, change, or remove a CLI user.
6. Create, remove, list, and restore backups with Tivoli Storage Manager.
7. Configure and show the network configuration (DNS, NTP, and so on).
8. Configure and show authentication server integration.
9. Shut down or reboot a node.
10. Restripe the file system.
11. Replace a disk (LUN).
12. Change disk properties.
13. Set or unset a master file system.

10.2.3 Tasks that can be performed by the SONAS GUI and SONAS CLI
The following tasks can be performed by either the SONAS GUI or the SONAS CLI:
1. Configure protocols and their settings.
2. Add or remove a cluster to/from the management node.
3. Add or remove Network Shared Disks (NSDs).
4. Create or delete a file system.
5. Create or delete exports.
6. Create or delete tasks.
7. Create or delete snapshots.
8. Start or stop storage nodes.
9. Change file system parameters.
10. Change cluster parameters.
11. Change disk or NSD status.
12. Change policies.
13. Link or unlink file sets.
14. Mount or unmount a file system.
15. Select the GPFS cluster.
16. Show node status.
17. Show cluster status.
18. Show system utilization (CPU, RAM, and so on).
19. Show snapshots.
20. Show file system utilization.


21. Show NSD status.
22. Show file system status.
23. Show or filter quotas.
24. Show storage pools.
25. Show policies.
26. Show file sets.
27. Show the event log.
28. Show tasks.

10.3 Cluster management


Cluster related commands are used to view or modify the cluster configuration. This includes the configuration of the management nodes, interface nodes, storage nodes, or the cluster as a whole. Next we describe some of the common cluster tasks in detail.

10.3.1 Adding or deleting a cluster to the GUI


Using the GUI: You can add a cluster to the GUI using the Add/Delete Cluster option in the Clusters panel of the GUI. See the details in section 1.a of Clusters on page 320.
Using the CLI: You can add the cluster using the addcluster command. Example 10-3 shows the usage and command output.
Example 10-3 Usage and command output for CLI command addcluster

[Furby.storage.tucson.ibm.com]$ addcluster --help
usage: addcluster -h <host> -p <password>
 -h,--host <host>          host
 -p,--password <password>  password
[Furby.storage.tucson.ibm.com]$ addcluster -h int001st001 -p Passw0rd
EFSSG0024I The cluster Furby.storage.tucson.ibm.com has been successfully added

10.3.2 Viewing cluster status


Using the GUI: You can view the details of the cluster that has been added by clicking the Clusters link under Clusters in the left frame of the GUI. This opens a panel with the cluster details and all other information related to the cluster. For more information, refer to point 1.b of the section Clusters on page 320.
Using the CLI: You can view the cluster status by running the CLI lscluster command. See Example 10-4 for the command output.
Example 10-4 Command output for CLI command lscluster

[Furby.storage.tucson.ibm.com]$ lscluster
ClusterId            Name                         PrimaryServer SecondaryServer
12402779238924957906 Furby.storage.tucson.ibm.com strg001st001  strg002st001


10.3.3 Viewing interface node and storage node status


Using the GUI: You can view the node status from the Management GUI by clicking the Interface Nodes or Storage Nodes link in the Clusters category. Clicking a link opens the respective page, which displays the corresponding status. Refer to points 2 and 3 of the Clusters on page 320 section for viewing the status of the Interface Nodes and Storage Nodes.
Using the CLI: You can view the status of the nodes using the CLI lsnode command. Example 10-5 shows the usage and command output. You can get more information using the -v option, or see the output formatted with delimiters using the -Y option.
Example 10-5 Usage and command output for CLI command lsnode

[Furby.storage.tucson.ibm.com]$ lsnode --help
usage: lsnode [-c <cluster name or id>] [-r] [-v] [-Y]
 -c,--cluster <cluster name or id>  define cluster
 -r,--refresh                       refresh list
 -v,--verbose                       extended list
 -Y                                 format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsnode
Hostname     IP           Description Role       Product Version Connection status GPFS status CTDB status Last updated
int001st001  172.31.132.1             interface  1.1.0.2-7       OK                active      active      4/22/10 3:59 PM
int002st001  172.31.132.2             interface  1.1.0.2-7       OK                active      active      4/22/10 3:59 PM
mgmt001st001 172.31.136.2             management 1.1.0.2-7       OK                active      active      4/22/10 3:59 PM
strg001st001 172.31.134.1             storage    1.1.0.2-7       OK                active                  4/22/10 3:59 PM
strg002st001 172.31.134.2             storage    1.1.0.2-7       OK                active                  4/22/10 3:59 PM

10.3.4 Modifying the status of interface nodes and storage nodes


You can also modify the status of interface nodes and storage nodes in the following ways:

Interface Nodes
This section describes the Interface Node commands:
1. Suspend Node: This command suspends the Interface Node, bans the CTDB on it, and disables the node. A banned node does not participate in the cluster and does not host any records for the CTDB; its IP address is taken over by another node and no services are hosted.
   Using the GUI: Refer to point 2 of the Clusters on page 320 section to view the operations that you can perform on the Interface Nodes and Storage Nodes.
   Using the CLI: Use the CLI suspendnode command. Example 10-6 shows the usage and command output.


Example 10-6 Command usage and output for CLI command suspendnode

[Furby.storage.tucson.ibm.com]$ suspendnode --help
usage: suspendnode nodeName [-c <cluster name or id>]
 nodeName                           Specifies the name or ip of the node for identification.
 -c,--cluster <cluster name or id>  define cluster
[Furby.storage.tucson.ibm.com]$ suspendnode int002st001 -c Furby.storage.tucson.ibm.com
EFSSG0204I The node(s) are suspended successfully!

2. Resume Node: The resumenode command resumes a suspended interface node. It unbans the CTDB on that node and enables the node. The resumed node participates in the cluster and hosts records for the clustered trivial database (CTDB); it takes back its IP address and starts hosting services.
   Using the GUI: Refer to point 2 of the Clusters on page 320 section to view the operations that you can perform on the Interface Nodes and Storage Nodes.
   Using the CLI: Use the CLI resumenode command. Example 10-7 shows the syntax and command output.
Example 10-7 Command usage and output for CLI command resumenode

[Furby.storage.tucson.ibm.com]$ resumenode --help
usage: resumenode Node [-c <cluster name or id>]
 Node                               Specifies the name of the node for identification.
 -c,--cluster <cluster name or id>  define cluster
[Furby.storage.tucson.ibm.com]$ resumenode int002st001
EFSSG0203I The node(s) are resumed successfully!

GUI: Recover Node and Restart Node cannot be done using the CLI. They should be done only using the GUI.

Storage nodes
This section describes the Storage Node commands:
1. Stop Node: This command unmounts the file system on that node and shuts down the GPFS daemon.
   Using the GUI: Refer to point 3 of the Clusters on page 320 section to view the operations that you can perform on the Interface Nodes and Storage Nodes.
   Using the CLI: This task cannot be run using the CLI; no command exists to perform this operation.
2. Start Node: This command starts the GPFS daemon on the selected storage node and mounts the file system on that node.
   Using the GUI: Refer to point 3 of the Clusters on page 320 section to view the operations that you can perform on the Interface Nodes and Storage Nodes.
   Using the CLI: This task cannot be run using the CLI; no command exists to perform this operation.


10.4 File system management


File system management is one of the essential tasks on the SONAS appliance. The file system created is a GPFS file system. Under this category there are many tasks that you can perform, from creating, mounting, unmounting, and deleting file systems to changing file system details, adding disks, and more. We discuss some of the important and commonly used file system tasks in detail next, starting with a short overview of the CLI commands involved.
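As a quick orientation (a sketch only; the detailed usage of each command is covered in the sections and examples that follow), the file system life cycle maps onto these CLI commands from Example 10-1:

$ man mkfs        # create a new file system (see Example 10-8 for a worked example)
$ lsfs            # list the file systems on a given device in a cluster
$ man mountfs     # mount a file system; unmountfs unmounts it
$ man chfs        # change file system settings
$ man rmfs        # remove a file system

The same operations are available through the Files panels in the GUI described in this section.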

10.4.1 Creating a file system


Using the GUI: You can create a file system using the GUI by clicking the File System link under the Files category. Upon clicking this link, a page opens on the right side with a table that lists the file systems that already exist. In our example, the file system gpfs0 already exists. Below this table is the Create a File System button. See Figure 10-46.

Figure 10-46 File system details in the Management GUI

To create the file system, click the Create a File System button. A new panel opens that asks you to enter details such as these:
1. Select NSD: Here, you select the NSDs you want to add to the file system. At least one NSD must be defined. If replication is used, at least two NSDs must be selected, and they must belong to different failure groups so that if one NSD fails, the replica remains available for access. Select an NSD by clicking the check box on the left of the table (see Figure 10-47). Click the next tab in the panel.


Figure 10-47 Select NSDs from list available in SONAS GUI to create filesystem

2. Basic Information: This is the next tab. Enter the mount point for the file system and the device name (file system name) you want to create. Choose the block size from the Block Size drop-down menu. You can use the Force option if you do not want GPFS to check whether the chosen NSD has already been used by another file system; use this option only if you are sure that the NSD is free and not currently used by any file system (see Figure 10-48 on page 355). Click the next tab in the panel.

Figure 10-48 Enter basic information in SONAS GUI to create filesystem

3. Locking and access control lists (ACL): This tab is for the ACL and locking type. Currently only the NFSv4 locking type and NFSv4 ACL type are supported, and they are already chosen by default in the GUI; the drop-down menus for both Locking Type and ACL Type are therefore disabled. See Figure 10-49. Click the next tab.


Figure 10-49 Locking and ACL information to create filesystem

4. Replication: This tab allows you to choose whether you want replication enabled. If you enable replication, you need to select at least two NSDs, as mentioned before, and the two NSDs must belong to two different failure groups. Enable Replication Support enables replication for all files and metadata in the file system; this setting cannot be changed after the file system has been created, and the value for both the maximum data and metadata replicas is set to 2. To enable replication, select the Enable Replication check box. See Figure 10-50. Click the next tab.

Figure 10-50 Replication information for creating new filesystem

5. Automount: This tab allows you to set Automount to true, which means that after every node restart the file system is automatically mounted on the nodes. If it is set to false, or not selected, the file system must be manually mounted on the nodes. See Figure 10-51. Click the next tab.

Figure 10-51 Setting automount for the new filesystem


6. Limits: In this tab you enter the number of nodes you expect the file system to be mounted on and the maximum number of files that the file system can hold. Enter the Number of Nodes in the text box provided. This is the estimated number of nodes that will mount the file system, used as a best guess for the initial size of some file system data structures; the default is 32, and the value cannot be changed after the file system has been created. When you create a GPFS file system, consider overestimating the number of nodes that will mount the file system. GPFS uses this information to create data structures that are essential for achieving maximum parallelism in file system operations. Although a large estimate consumes additional memory, underestimating the data structure allocation can reduce the efficiency of a node when it processes some parallel requests, such as the allocation of disk space to a file. If you cannot predict the number of nodes that will mount the file system, use the default value, and if you are planning to add nodes to your system, specify a number larger than the default. However, do not make unrealistic estimates, because specifying an excessive number of nodes might have an adverse effect on buffer operations. Enter the Maximum Number of Files in the text box provided; this is the maximum number of files that can be created on this file system. See Figure 10-52. Click the next tab.

Figure 10-52 Setting inode limits and maximum number of files for the new filesystem
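The same limits can be set from the CLI using the -N and -i options of the mkfs command (see Example 10-8 on page 359 for the complete syntax). A minimal sketch, assuming a file system expected to be mounted on up to 64 nodes and holding at most two million files; the disk name is a placeholder:

mkfs gpfs3 /ibm/gpfs3 -F <disk_name> -N 64 -i 2000000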


7. Miscellaneous: Using this tab you can enable other options such as Quota, DMAPI, atime, and mtime. Select the check box for each option that you want to enable, and clear it for any option you do not want. See Figure 10-53.

Figure 10-53 Miscellaneous information for the file system

8. Final Step: Go through each tab again to verify that all the necessary parameters are selected. After you have confirmed all the parameters for the file system, click the OK button at the lower end of the Create File System panel. The task begins and a Task Progress window appears that displays the task being performed and its details. At the end of each task there should be a green check mark. If an error occurs, a red cross (x) and an error message appear; check the error, correct it, and retry. When the task is completed, click the Close button to close the window. See Figure 10-54.

Figure 10-54 Task Progress bar for completion


Using the CLI: You can create a file system using the mkfs CLI command. The NSD name is mandatory and you need to enter at least one NSD. Set -R (replication) to none if you do not want to enable replication. If you enable replication, you need to enter at least two NSDs, and these NSDs must belong to different failure groups. The block size and replication factors chosen affect file system performance. Example 10-8 shows the help and usage of the command. For this example, the block size was left at the default 256 KB and replication was not enabled.
Example 10-8 mkfs command example

[Furby.storage.tucson.ibm.com]$ mkfs --help usage: mkfs filesystem [mountpoint] [-b <blocksize>] [-c <cluster name or id>] [--dmapi | --nodmapi] [-F <disks>] [-i <maxinodes>] [-j <blockallocationtype>] [--master] [-N <numnodes>][--noverify] [--pool <arg>] [-R <replica>] filesystem The device name of the file system to be created. File system names need not be fully-qualified. mountpoint Specifies the mount point directory of the GPFS file system. -b,--blocksize <blocksize> blocksize -c,--cluster <cluster name or id> define cluster --dmapi enable the DMAPI support for HSM -F,--disks <disks> disks -i,--numinodes <maxinodes> Set the maximal number of inodes in the file system. -j,--blockallocationtype <blockallocationtype> blockallocationtype --master master -N,--numnodes <numnodes> numnodes --nodmapi disable the DMAPI support for HSM --noverify noverify --pool <arg> pool -R,--replica <replica> Sets the level of replication used in this file system. Either none, meta or all [Furby.storage.tucson.ibm.com]# mkfs gpfs1 --nodmapi -F array0_sata_60001ff0732f85c8c080008 -R none --noverify The following disks of gpfs1 will be formatted on node strg001st001: array0_sata_60001ff0732f85c8c080008: size 15292432384 KB Formatting file system ... Disks up to size 125 TB can be added to storage pool 'system'. Creating Inode File 3 % complete on Fri Apr 23 09:54:04 2010 5 % complete on Fri Apr 23 09:54:09 2010 7 % complete on Fri Apr 23 09:54:14 2010 9 % complete on Fri Apr 23 09:54:19 2010 11 % complete on Fri Apr 23 09:54:24 2010 13 % complete on Fri Apr 23 09:54:29 2010 15 % complete on Fri Apr 23 09:54:34 2010 17 % complete on Fri Apr 23 09:54:39 2010 19 % complete on Fri Apr 23 09:54:44 2010 21 % complete on Fri Apr 23 09:54:49 2010 23 % complete on Fri Apr 23 09:54:54 2010 25 % complete on Fri Apr 23 09:54:59 2010 27 % complete on Fri Apr 23 09:55:04 2010 29 % complete on Fri Apr 23 09:55:09 2010 31 % complete on Fri Apr 23 09:55:15 2010

33 % complete on Fri Apr 23 09:55:20 2010 35 % complete on Fri Apr 23 09:55:25 2010 37 % complete on Fri Apr 23 09:55:30 2010 39 % complete on Fri Apr 23 09:55:35 2010 41 % complete on Fri Apr 23 09:55:40 2010 43 % complete on Fri Apr 23 09:55:45 2010 45 % complete on Fri Apr 23 09:55:50 2010 47 % complete on Fri Apr 23 09:55:55 2010 48 % complete on Fri Apr 23 09:56:00 2010 50 % complete on Fri Apr 23 09:56:05 2010 52 % complete on Fri Apr 23 09:56:10 2010 54 % complete on Fri Apr 23 09:56:15 2010 56 % complete on Fri Apr 23 09:56:20 2010 58 % complete on Fri Apr 23 09:56:25 2010 60 % complete on Fri Apr 23 09:56:30 2010 62 % complete on Fri Apr 23 09:56:35 2010 64 % complete on Fri Apr 23 09:56:40 2010 66 % complete on Fri Apr 23 09:56:45 2010 67 % complete on Fri Apr 23 09:56:50 2010 69 % complete on Fri Apr 23 09:56:55 2010 71 % complete on Fri Apr 23 09:57:00 2010 73 % complete on Fri Apr 23 09:57:05 2010 75 % complete on Fri Apr 23 09:57:10 2010 77 % complete on Fri Apr 23 09:57:15 2010 79 % complete on Fri Apr 23 09:57:20 2010 81 % complete on Fri Apr 23 09:57:25 2010 82 % complete on Fri Apr 23 09:57:30 2010 84 % complete on Fri Apr 23 09:57:35 2010 86 % complete on Fri Apr 23 09:57:40 2010 88 % complete on Fri Apr 23 09:57:45 2010 90 % complete on Fri Apr 23 09:57:50 2010 92 % complete on Fri Apr 23 09:57:55 2010 94 % complete on Fri Apr 23 09:58:00 2010 96 % complete on Fri Apr 23 09:58:05 2010 97 % complete on Fri Apr 23 09:58:10 2010 99 % complete on Fri Apr 23 09:58:15 2010 100 % complete on Fri Apr 23 09:58:16 2010 Creating Allocation Maps Clearing Inode Allocation Map Clearing Block Allocation Map Formatting Allocation Map for storage pool 'system' 60 % complete on Fri Apr 23 09:58:31 2010 100 % complete on Fri Apr 23 09:58:34 2010 Completed creation of file system /dev/gpfs1. EFSSG0019I The filesystem gpfs1 has been successfully created. EFSSG0038I The filesystem gpfs1 has been successfully mounted. EFSSG0015I Refreshing data ... [Furby.storage.tucson.ibm.com]#


10.4.2 Listing the file system status


In this section we show how to list the file system status. Using the GUI: When you click the File System link or task in the Files category, you see a table that lists all the file systems in the cluster. By default the first file system is highlighted or selected, and its details are shown in the lower section of the panel, including file system, disk, and usage information. To look at the details of another file system, select it so that it is highlighted. More information can be found in point 1 of Files on page 327. Using the CLI: You can view the status of the file systems using the lsfs CLI command. The command displays the file system names, mount point, quota, block size, ACL types, replication details, and more. See the usage and command output in Example 10-9.
Example 10-9 Command usage and output for the CLI command lsfs

[Furby.storage.tucson.ibm.com]$ lsfs --help
usage: lsfs [-c <cluster name or id>] [-d <arg>] [-r] [-Y]
 -c,--cluster <cluster name or id>  define cluster
 -d,--device <arg>                  define device
 -r,--refresh                       refresh list
 -Y                                 format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsfs
Cluster Devicename Mountpoint Type Remote device Quota Def. quota Blocksize Locking type ACL type Inodes Data replicas Metadata replicas Replication policy Dmapi Block allocation type Version Last update Master
Humboldt.storage.tucson.ibm.com gpfs0 /ibm/gpfs0 local local user;group;fileset 256K nfs4 nfs4 100.000M 1 2 whenpossible F scatter 11.05 4/23/10 5:15 PM YES
Humboldt.storage.tucson.ibm.com gpfs2 /ibm/gpfs2 local local user;group;fileset 64K nfs4 nfs4 14.934M 1 1 whenpossible F scatter 11.05 4/23/10 5:15 PM NO

10.4.3 Mounting the file system


In this section we show how to mount the file system. Using the GUI: To mount a file system that you have created, click the file system that you want to mount and then click the Mount button. The task then asks you to choose the nodes on which you want to mount the file system. You can either choose Mount on all nodes or choose nodes from the drop-down menu, as seen in Figure 10-55.

Figure 10-55 Select to mount the file system on all or selective nodes


Choosing to mount on selected nodes then requires you to select the nodes on which you want to mount the file system. The window is similar to Figure 10-56.

Figure 10-56 Select the nodes if to mount on selective nodes

When done, click OK in the same window. The file system is then mounted on the specified nodes. The task progress window displays the progress and, when successful, shows green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-57.

Figure 10-57 Task Progress bar for completion

If successful, close the window by clicking the Close button. The window disappears and you are brought back to the first page of the File Systems page. The table on the main File System page should now show the file system as mounted on the selected nodes. Using the CLI: You can mount the file system using the mountfs CLI command. The command allows you to mount the file system on all the nodes or on specific interface nodes. The usage and command output are displayed in Example 10-10. In the example, the file system gpfs2 is mounted on all nodes, and hence the -n option is omitted.
Example 10-10 Command usage and output for the CLI command mountfs

[Furby.storage.tucson.ibm.com]$ mountfs --help
usage: mountfs filesystem [-c <cluster name or id>] [-n <nodes>]
 filesystem  Identifies the file system name of the file system. File system names need not be fully-qualified.
 -c,--cluster <cluster name or id>  define cluster
 -n,--nodes <nodes>                 nodes
[Furby.storage.tucson.ibm.com]$ mountfs gpfs2
EFSSG0038I The filesystem gpfs2 has been successfully mounted.


10.4.4 Unmounting the file system


Using the GUI: You can unmount a file system that you have created by clicking the file system that you want to unmount and then clicking the Unmount button. The task then asks you to choose the nodes from which you want to unmount the file system. You can either choose Unmount on all nodes or choose nodes from the drop-down menu, as seen in Figure 10-58.

Figure 10-58 Select if to be unmounted from all or selective nodes

Choosing to unmount from selected nodes then requires you to select the nodes from which you want to unmount the file system. The window is shown in Figure 10-59.

Figure 10-59 Select nodes to unmount from

When done, click OK in the same window. The file system is then unmounted on the specified nodes. The task progress window displays the progress and, when successful, shows green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-60.

Figure 10-60 Task Progress bar for completion


After the operations have completed, close the window by clicking the Close button. The window disappears and you are brought back to the first page of the File Systems page. The table on the main File System page should now show the file system as unmounted on the selected nodes. Using the CLI: You can unmount the file system using the unmountfs CLI command. The command allows you to unmount the file system on all the nodes or on specific interface nodes. The usage and command output are displayed in Example 10-11. The file system gpfs2 is unmounted on all nodes, and hence the -n option is omitted.
Example 10-11 Command usage and output for the CLI command unmountfs.

[Furby.storage.tucson.ibm.com]$ unmountfs --help
usage: unmountfs filesystem [-c <cluster name or id>] [-n <nodes>]
 filesystem  Specifies the name of the filesystem for identification.
 -c,--cluster <cluster name or id>  define cluster
 -n,--nodes <nodes>                 nodes
[Furby.storage.tucson.ibm.com]$ unmountfs gpfs2
EFSSG0039I The filesystem gpfs2 has been successfully unmounted.

10.4.5 Modifying the file system configuration


Using the GUI: The SONAS GUI allows you to modify the configuration of a file system that has already been created. Some of the parameters require that the file system is unmounted, while others can be changed while it is still mounted. The lower section of the file system panel, which also displays the status, disks, and usage information of the file system, has check boxes and text boxes that can be edited to modify various parameters. 1. Modifying the File System Configuration Parameters: As shown in Figure 10-61, the text box for Number of iNodes and the drop-down menus for Locking Type and ACL Type can be modified while the file system is still mounted.


Figure 10-61 Panel to view and modify the Filesystem details

The three check boxes, Enable Quota, Suppress atime, and Exact mtime, require the file system to be unmounted. In Figure 10-61 these check boxes are shown with a red asterisk (*). After modifying the parameters, click OK for the task to proceed. The task bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-62 on page 365. Click the Close button to close the window.

Figure 10-62 Task Progress bar for completion


2. Modifying the Disks for the File System: You can add or remove disks in the file system. The file system must have at least one disk. a. Adding New Disks: You can add more disks by clicking the Add Disk to the file system button. A new window appears listing the free disks, and you can choose which disk to add. Choose the disk type. You can also specify the failure group and storage pool of the disk when adding it. When done, click OK. See Figure 10-63.

Figure 10-63 Select disk to add to the file system

The task progress bar appears, showing the progress of the operation and, when successful, displaying green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-64. The new disk is then successfully added. Click the Close button to close the window.

Figure 10-64 Task Progress bar for completion


b. Remove Disks: You can also remove the disk by selecting the disk you want to delete and clicking the Remove button from the panel in the lower section of the File Systems page as shown in Figure 10-65.

Figure 10-65 Select the disk to be removed from the list of disks for the file system selected

On clicking the Remove button, a new window will appear which asks for confirmation to remove the disk as shown in Figure 10-66.

Figure 10-66 Confirmation for removal of disks

To confirm, click the OK button. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-67. The disk is then successfully removed. Click the Close button to close the window.

Figure 10-67 Task Progress bar for completion

Using the CLI: You can change the file system parameters using the chfs command. Example 10-12 describes the usage and shows the command output of chfs used to add a new disk to the file system. A short sketch for quota-related changes that require the file system to be unmounted follows the example.


Example 10-12 Command usage and output by adding disk to change properties

[Furby.storage.tucson.ibm.com]$ chfs --help usage: chfs filesystem [--add <disks> | --noverify | --pool <arg>] [--atime <{exact|suppress}>] [-c <cluster name or id>] [--force | --remove <disks>] [-i <maxinodes>] [--master | --nomaster] [--mtime <{exact|rough}>][-q <{enable|disable}>] [-R <replica>] filesystem The device name of the file system to be changed. File system names need not be fully-qualified. --add <disks> Adds disks to the file system. --atime <{exact|suppress}> If set to exact the file system will stamp access times on every access to a file or directory. Otherwise access times will not be recorded. -c,--cluster <cluster name or id> define cluster --force enforce disk removal without calling back the user -i,--numinodes <maxinodes> Set the maximal number of inodes in the file system. --master master --mtime <{exact|rough}> If set to exact the file or directory modification times will be updated immediately. Otherwise modification times will be updated after a several second delay. --nomaster nomaster --noverify noverify --pool <arg> pool -q,--quota <{enable|disable}> Enables or disables quotas for this file system. -R,--replica <replica> Sets the level of replication used in this file system. Either none, meta or all --remove <disks> Removes disks from the file system. [Furby.storage.tucson.ibm.com]$ chfs gpfs2 --add array0_sata_60001ff0732f85c8c080008 The following disks of gpfs2 will be formatted on node strg001st001: array0_sata_60001ff0732f85c8c080008: size 15292432384 KB Extending Allocation Map Checking Allocation Map for storage pool 'system' 9 % complete on Mon Apr 26 12:14:19 2010 10 % complete on Mon Apr 26 12:14:24 2010 18 % complete on Mon Apr 26 12:14:29 2010 26 % complete on Mon Apr 26 12:14:34 2010 27 % complete on Mon Apr 26 12:14:39 2010 35 % complete on Mon Apr 26 12:14:44 2010 43 % complete on Mon Apr 26 12:14:49 2010 44 % complete on Mon Apr 26 12:14:55 2010 52 % complete on Mon Apr 26 12:15:00 2010 53 % complete on Mon Apr 26 12:15:05 2010 61 % complete on Mon Apr 26 12:15:10 2010 62 % complete on Mon Apr 26 12:15:15 2010 70 % complete on Mon Apr 26 12:15:20 2010 71 % complete on Mon Apr 26 12:15:25 2010 77 % complete on Mon Apr 26 12:15:30 2010 83 % complete on Mon Apr 26 12:15:35 2010 90 % complete on Mon Apr 26 12:15:40 2010 95 % complete on Mon Apr 26 12:15:45 2010 100 % complete on Mon Apr 26 12:15:49 2010 Completed adding disks to file system gpfs2. mmadddisk: Propagating the cluster configuration data to all affected nodes. This is an asynchronous process. EFSSG0020I The filesystem gpfs2 has been successfully changed. EFSSG0015I Refreshing data ...
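For the settings that require the file system to be unmounted, such as enabling quotas, a hedged sequence using the commands shown in this chapter might look as follows (gpfs2 is used only as an example):

unmountfs gpfs2
chfs gpfs2 -q enable
mountfs gpfs2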


10.4.6 Deleting a file system


Using the GUI: To delete an existing file system from the cluster, click the File System link in the Files category on the left panel. Select the file system you want to delete and click the Remove button. A window asking for your confirmation appears; click the OK button if you are sure. See Figure 10-68.
Attention: Make sure that the file system is unmounted at this point.

Figure 10-68 Confirmation to delete the filesystem

After you have confirmed, the operation is carried out. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-69. The file system is then successfully removed. Click the Close button to close the window.

Figure 10-69 Task Progress bar for completion

Using the CLI: You can delete an existing file system from the cluster using the rmfs CLI command. The command usage and output are shown in Example 10-13.
Example 10-13 Command usage and output for removing the file system.

[Furby.storage.tucson.ibm.com]$ rmfs --help
usage: rmfs filesystem [-c <cluster name or id>] [--force]
 filesystem  The device name of the file system to contain the new fileset. File system names need not be fully-qualified.
 -c,--cluster <cluster name or id>  define cluster
 --force                            enforce operation without calling back the user
[Furby.storage.tucson.ibm.com]$ rmfs gpfs2
Do you really want to perform the operation (yes/no - default no): yes
All data on following disks of gpfs2 will be destroyed:
array1_sata_60001ff0732f85f8c0b000b
Completed deletion of file system /dev/gpfs2.


mmdelfs: Propagating the cluster configuration data to all

10.4.7 Master and non-master file systems


A master file system is a special type of file system. At least one file system should be designated as the master file system so that split-brain detection works correctly; split-brain detection and node failover for NFS do not work properly without a master file system. Only a single file system should be the master at any time; if another file system already holds the master role, the role is moved. More about the master file system is explained in the CTDB section of Appendix A, Additional component details on page 485. Using the GUI: Currently, you cannot create a master file system from the GUI. Using the CLI: A master file system can be created using the mkfs CLI command with the --master option. When the first file system in the cluster is created, it is automatically set as the master even if --master is not specified; you can make the first file system a non-master by creating it with the --nomaster flag. For more about the mkfs command, see Example 10-8 on page 359, which shows how to create a file system using the CLI.
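As an illustration only, a master file system might be created with a command such as the following; the file system name, mount point, and disk name are placeholders, and the complete mkfs syntax is shown in Example 10-8 on page 359:

mkfs gpfsmaster /ibm/gpfsmaster -F <disk_name> --master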

10.4.8 Quota management for file systems


The SONAS file system enables you to add quotas to the file systems and to the users and groups that exist on the system. You can set the quota for a user, a group, or a file set. Soft limits are subject to reporting only, while hard limits are enforced by the file system. Currently the GUI only allows you to view the quotas that are enabled; you need to set quotas using the setquota CLI command. Before setting a quota, the file system must have the quota option enabled, which you can do using the GUI or the chfs CLI command. The file system should be unmounted while making this change. Quota tasks are discussed next: 1. View or list quota: Using the GUI: You can list the quotas by clicking the Quota link under the Files category; see point 5 under Files on page 327, which describes viewing quota information from the GUI. Using the CLI: You can view the quotas from the CLI using the lsquota command. The command retrieves the quota data managed by the management node from the database and returns a list in either a human-readable format or in a format that can be parsed.


Example 10-14 shows the usage and the command output for the lsquota command.
Example 10-14 Command usage and output for CLI command lsquota.

[Furby.storage.tucson.ibm.com]$ lsquota --help
usage: lsquota [-c <cluster name or id>] [-r] [-Y]
 -c,--cluster <cluster name or id>  define cluster
 -r,--refresh                       refresh list
 -Y                                 format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsquota
Cluster Device SL(usage) HL(usage) Used(usage) SL(inode) HL(inode) Used(inode)
Furby.storage.tucson.ibm.com gpfs0 ----16 kB ----1
Furby.storage.tucson.ibm.com tms0 ----13.27 MB ----135
Furby.storage.tucson.ibm.com tms0 ----13.27 MB ----135
Furby.storage.tucson.ibm.com tms0 ----13.27 MB ----135
Furby.storage.tucson.ibm.com gpfs0 ----832 kB ----23
Furby.storage.tucson.ibm.com gpfs0 ----832 kB ----23
Furby.storage.tucson.ibm.com gpfs0 ----816 kB ----22

Tip: The actual command output displayed on the panel has many more fields than shown in this example, which has been simplified to keep the important information clear.

2. Set quota: Using the GUI: You cannot set quotas from the GUI; the GUI shows only a read-only representation of the quota management. Using the CLI: You can set the quota for the file system using the setquota CLI command. This command sets the quota for a user, a group, or a file set. Soft limits are subject to reporting only; hard limits are enforced by the file system. Disk sizes accept the suffixes k (kilobytes), m (megabytes), g (gigabytes), t (terabytes), or p (petabytes); these values are not case sensitive. The effective quotas are passed in kilobytes and matched to block sizes. Inode limits accept only the k and m suffixes, and the maximal value for inode limits is 2 GB.

Warning: Setting the quota does not update the database, because the refresh takes too much time. If you want to see the result immediately with the lsquota command, invoke it using the -r option (lsquota -r).

Example 10-15 shows the command usage and the output for the setquota CLI command. In the example, we set hard and soft limits for disk usage for the user eebenall from the domain STORAGE3 on the file system gpfs0.
Example 10-15 Command usage and output of CLI command setquota

[Furby.storage.tucson.ibm.com]$ setquota --help
usage: setquota device [-c <cluster name or id>] [-g <arg>] [-h <arg>] [-H <arg>] [-j <arg>] [-S <arg>] [-s <arg>] [-u <arg>]
 device  The mount point or device of the filesystem.
 -c,--cluster <cluster name or id>  define cluster
 -g,--group <arg>      name of the group
 -h,--hard <arg>       hardlimit of the disk usage in bytes, KB, MB, GB, TB or PB
 -H,--hardinode <arg>  hardlimit of the inodes in bytes, KB or MB
 -j,--fileset <arg>    name of the fileset
 -S,--softinode <arg>  softlimit of the inodes in bytes, KB or MB
 -s,--soft <arg>       softlimit of the disk usage in bytes, KB, MB, GB, TB or PB
 -u,--user <arg>       name of the user
accepted postfixes: 'k' : kiloByte, 'm' : MegaByte, 'g' : GigaByte, 't' : TeraByte, 'p' : PetaByte
[Furby.storage.tucson.ibm.com]$ setquota gpfs0 -u STORAGE3\\eebenall -h 400g -s 200g
EFSSG0040I The quota has been successfully set.
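The same command can also set quotas for a group or a fileset using the -g and -j options shown in the help. A hedged sketch, assuming a hypothetical group STORAGE3\\team1 and the fileset newfileset that is created later in this chapter:

setquota gpfs0 -g STORAGE3\\team1 -h 1t -s 800g
setquota gpfs0 -j newfileset -h 100g -s 80g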

10.4.9 File set management


The SONAS appliance allows you to create filesets. A file set is a group of files created inside an existing file system. File sets are similar to file systems in some ways, because you can perform file system operations on them: you can replicate them, set quotas on them, and create snapshots of them. File sets are not mounted; instead they are linked or unlinked. You can link a fileset to a directory, which creates a junction or link. The directory to which you link the fileset must not already exist; it is created when you link the fileset and deleted when you unlink it. More is explained in the following sections. The tasks you can perform on a file set are view, create, remove, link, and unlink. Let us consider each in detail. 1. View or list file sets: Using the GUI: You can view the file sets that have been created and their information by clicking the Filesets link under the Files category. This section was described in point 4 under Files on page 327. Using the CLI: You can view the file sets in the cluster by using the lsfset CLI command, which lists all the filesets along with their details. In the next example, you can also see an additional fileset, newfileset, along with the default root fileset. Example 10-16 shows the command usage and output of the lsfset CLI command.
Example 10-16 Usage and output for the CLI command lsfset

[Furby.storage.tucson.ibm.com]$ lsfset --help
usage: lsfset device [-c <cluster name or id>] [-r] [-Y]
 device  The device name of the file system to contain the fileset. File system names need not be fully-qualified.
 -c,--cluster <cluster name or id>  define cluster
 -r,--refresh                       refresh list
 -Y                                 format output as delimited text
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 4/21/10 4:39 PM root fileset 4/26/10 5:10 PM
1 newfileset Unlinked - 4/26/10 10:33 AM this is a test fileset 4/26/10 5:10 PM


2. Create Filesets: Using the GUI: You can create a fileset by clicking the Create a Fileset button on the main Filesets page. This opens a new window asking for the details of the fileset, such as the name and an optional comment. Click OK when done. The task creates the fileset, and the newly created fileset is displayed in the table of all the filesets; you can click it to see its details. At this point, the fileset is not linked to any directory and cannot be used to store data. You need to link the fileset, similar to mounting a file system, before using it to store data. Figure 10-70 shows the dialog box for creating a fileset.

Figure 10-70 Creating a new file set

The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-71. The new fileset is then successfully created. Click the Close button to close the window.

Figure 10-71 Task bar showing progress of creating fileset

Using the CLI: You can create a fileset using the mkfset CLI command. This command constructs a new file set with the specified name. The new file set is empty except for a root directory, and does not appear in the directory namespace until the linkfset command is issued to link the fileset. The command usage and output are shown in Example 10-17. In the example, we create a new fileset called newfileset in the gpfs0 file system. This fileset is not yet linked, and hence the Path column shows no value. We can check that the fileset was created successfully by running the lsfset command; the example also shows the lsfset command output. In this example, the new fileset used is newfileset, created on file system gpfs0.
Example 10-17 Command usage and output for CLI command mkfset and lsfset

[Furby.storage.tucson.ibm.com]$ mkfset --help
usage: mkfset device filesetName [-c <cluster name or id>] [-t <comment>]
 device       The device name of the file system to contain the new fileset. File system names need not be fully-qualified.
 filesetName  Specifies the name of the newly created fileset.
 -c,--cluster <cluster name or id>  define cluster
 -t <comment>                       comment
[Furby.storage.tucson.ibm.com]$ mkfset gpfs0 newfileset -t This is a new Fileset
EFSSG0070I Fileset newfileset created successfully!
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 2:06 AM
1 newfileset Unlinked - 5/5/10 2:06 AM this is a new fileset 5/5/10 2:06 AM

3. Link file sets: When a file set is linked, a junction is created. The junction is a special directory entry, much like a POSIX hard link, that connects a name in a directory of one file set, the parent, to the root directory of a child file set. From the user's viewpoint, a junction always appears as if it were a directory, but the user is not allowed to issue the unlink or rmdir commands on a junction. Instead, the unlinkfset command must be used to remove a junction. As a prerequisite, the file system must be mounted and the junction path must be under the mount point of the file system. Using the GUI: When you create a fileset it is not linked by default; you need to manually link it to a directory that does not yet exist. In the GUI, when you click the new fileset that you have created in the table, the section below in the filesets panel displays the information about the file set. In our example, we have created a new file set called newfileset that is not yet linked. The lower section displays details such as Name, Status, and more. In addition, if the fileset is not yet linked, the Link button is enabled and you can click it. A new window opens asking for the path. Click OK when done, and the fileset is then linked to this directory. See Figure 10-72.

Figure 10-72 Details of the fileset created is seen. The fileset is currently not linked

In case the fileset is already linked, the Unlink button will be enabled and the text box for the path and the Link button will be disabled.


In our example, we now link the file set to a path /ibm/gpfs0/redbook. The file set newfileset is now linked to this path. See Figure 10-73, which shows the dialog box that opens to enter the path to link the file set.

Figure 10-73 Linking the fileset to the path /ibm/gpfs0/redbook

The task bar for the progress of the task appears. Click Close when the task is completed successfully. The details for the file set are shown in Figure 10-74.

Figure 10-74 Fileset details after linking file set

Using the CLI: You can link the file set using the CLI linkfset command. The command will link the file set to the directory specified. This directory is the junctionPath in the command. In the example, we also run lsfset to confirm the file set is linked. See Example 10-18. The fileset used is newfileset created on filesystem gpfs0.
Example 10-18 Linking fileset using CLI command linkfset. lsfset verifies the link

[Furby.storage.tucson.ibm.com]$ linkfset --help usage: linkfset device filesetName [junctionPath] [-c <cluster name or id>] device The device name of the file system to contain the new fileset. File system names need not be fully-qualified. filesetName Specifies the name of the fileset for identification. junctionPath Specifies the name of the junction. The name must not refer to an existing file system object. -c,--cluster <cluster name or id> define cluster [Furby.storage.tucson.ibm.com]$ linkfset gpfs0 newfileset /ibm/gpfs0/redbook EFSSG0078I Fileset newfileset successfully linked! [Furby.storage.tucson.ibm.com]$ [root@st002.mgmt001st002 ~]# lsfset gpfs0


ID Name Status Path CreationTime Comment Timestamp 0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 3:10 AM 1 newfileset Linked /ibm/gpfs0/redbook 5/5/10 2:06 AM this is a new fileset 5/5/10 3:10 AM 4. Unlink file sets: Using the GUI: You can unlink the file set by clicking the Unlink button. From the table that lists all the file set, click the file set you want to unlink. Upon clicking the file set, you will see the file set details below the table. In the details of the file set, you have the Unlink button. See Figure 10-74, which displays the details of file set and the Unlink button. When you click this button, a new window opens asking for confirmation. See Figure 10-75.

Figure 10-75 Confirm to unlink fileset

Click OK to confirm. The task bar for the progress of the task appears; click Close when the task has completed successfully. The fileset is then successfully unlinked. Using the CLI: You can unlink a file set using the unlinkfset CLI command. The command unlinks a linked file set; the specified file set must exist in the specified file system. See the command usage and output in Example 10-19. The example also shows the output of the lsfset command confirming that the fileset was unlinked. In the example, the fileset used is newfileset, created on file system gpfs0.
Example 10-19 Command usage and output for unlinking file set using CLI command unlinkfset and lsfset to verify

[Furby.storage.tucson.ibm.com]$ unlinkfset --help usage: unlinkfset device filesetName [-c <cluster name or id>] [-f] device The device name of the file system to contain the new fileset. File system names need not be fully-qualified. filesetName Specifies the name of the fileset for identification. -c,--cluster <cluster name or id> define cluster -f force [Furby.storage.tucson.ibm.com]$ unlinkfset gpfs0 newfileset EFSSG0075I Fileset newfileset unlinked successfully! [Furby.storage.tucson.ibm.com]$ lsfset gpfs0 ID Name Status Path CreationTime Comment Timestamp 0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 3:26 AM 1 newfileset Unlinked -5/5/10 2:06 AM this is a new fileset 5/5/10 3:26 AM


5. Remove Filesets: Using the GUI: You can remove a fileset by selecting the fileset you want to delete and clicking the Delete button. The task opens a new window asking for confirmation before deleting (see Figure 10-76).

Figure 10-76 Delete file set confirmation

Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-77. The fileset is then successfully deleted. Click the Close button to close the window.

Figure 10-77 Task bar showing progress of deleting fileset

Using the CLI: You can delete a fileset using the rmfset CLI command. The command asks for confirmation and, on confirmation, deletes the specified file set. The rmfset command fails if the file set is currently linked into the namespace. By default, the rmfset command also fails if the file set contains any contents except for an empty root directory. The root file set cannot be deleted. Example 10-20 shows the command usage and output for deleting a fileset. In this example the fileset used is newfileset, created on file system gpfs0.
Example 10-20 rmfset command example

[Furby.storage.tucson.ibm.com]$ rmfset --help
rmfset usage: rmfset device filesetName [-c <cluster name or id>] [-f] [--force]
 device       The device name of the file system to contain the new fileset. File system names need not be fully-qualified.
 filesetName  Specifies the name of the fileset for identification.
 -c,--cluster <cluster name or id>  define cluster
 -f       Forces the deletion of the file set. All file set contents are deleted. Any child file sets are first unlinked.
 --force  enforce operation without calling back the user
[Furby.storage.tucson.ibm.com]$ rmfset gpfs0 newfileset
Do you really want to perform the operation (yes/no - default no): yes
EFSSG0073I Fileset newfileset removed successfully!
[Furby.storage.tucson.ibm.com]$ lsfset gpfs0
ID Name Status Path CreationTime Comment Timestamp
0 root Linked /ibm/gpfs0 3/18/10 5:54 PM root fileset 5/5/10 4:25 A

10.5 Creating and managing exports


Data stored in directories, file sets, and file systems can be accessed using data access protocols such as CIFS, NFS, FTP, and HTTPS. For this, you need to configure the services and also create shares or exports on the GPFS file system, with which you can then access the data using any of the aforementioned protocols. SONAS currently does not support SCP. Services are configured during the installation and configuration of the SONAS appliance. After the services are configured, you can share your data using any of the protocols by creating exports with the command line or the GUI. You can add more protocols to an export if they are not already configured, and you can remove protocols from the export if you no longer want to export the data using that service or protocol. You can also activate and deactivate the export, and finally you can delete an existing export. All of this is explained later in this section. Click the Exports link under the Files category on the SONAS GUI to view and manage all the exports in SONAS; a table that lists all existing exports is shown. Read point 2 of Files on page 327 for more on the Exports configuration page.
Exports: Right now, IBM SONAS supports FTP, CIFS, and NFS exports. Even though the GUI and CLI commands both show the options of adding an HTTP and SCP export, this is not officially supported yet.

10.5.1 Creating exports


Using the GUI: Click the Add button to create a new export. This opens a new page that asks for more details on the export, such as the name of the export that will be seen by the end users (the share name). You also have to enter the directory path that you want to export; this is the actual data directory that you want to export to the end users. If the directory does not exist, it is created, but the path up to the directory to be exported must already exist. You can also assign an owner, so that the owner gets the required ACLs to access this directory. The last step is to identify the protocols by which you want to share this directory; you can choose any or all of CIFS, FTP, and NFS. Click Next when done. See Figure 10-78.


Figure 10-78 Panel to create a new export. Provide the sharename, pathname, owner and services

A new page opens and asks you for protocol-related information. Each protocol is described here: 1. FTP: FTP does not take any parameters during its configuration. Click Next as shown in Figure 10-79.

Figure 10-79 Panel that shows that directory in path given is created but has default ACLs


Attention: The warning message here appears because the folder in the specified directory path does not exist. The directory is created by this operation; the warning informs you that the directory has been created with the default ACLs, which need to be modified if required.
2. NFS Export: NFS exports are accessed per client or host, not per user. Hence, you need to specify which hosts or clients can access the NFS export. On the new page that opens, add the client details in the Client settings section as follows (see also the sketch after the figure):
Client Name: The name of the host that can access the export. You can specify individual host names or * for all clients/hosts.
Read Only: Check this box if you want the clients to have read-only access. Unchecking it gives the clients both read and write access.
Sync: Check this box if you want replies to requests only after the changes are committed to stable storage.
Root Squash: This option maps requests from uid/gid 0 to the anonymous uid/gid.
Click the Add Client button to add the client. When added, it appears in the table that displays all clients for the NFS export. Now click the Next button. See Figure 10-80.

Figure 10-80 NFS configuration panel. Add client and other properties
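For reference, these GUI settings correspond to the NFS client definition string used by the mkexport and chexport CLI commands (see Example 10-21 on page 382). A hedged illustration only: the client name is a placeholder, and it is assumed that the keywords ro, root_squash, and sync are accepted in the same way as the rw, no_root_squash, and async values shown in that example:

client1.example.com(ro,root_squash,sync)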


3. CIFS Export: The CIFS configuration parameters follow:
Comment: This can be any user-defined comment.
Browsable: This check box, if checked, allows the export to be visible in the net view command and in the browse list.
ACL / Access Rights: If checked, the export has only read-only access.
See Figure 10-81.

Figure 10-81 Panel for CIFS configuration

Click the Next button to proceed. The next page is the Final page, which asks for confirmation before configuring the exports. See Figure 10-82.

Figure 10-82 Final configuration page

Click the Finish button to confirm, click Back to go back and make changes, or click Cancel to cancel the creation of the export and return to the main Exports page. After you have confirmed and clicked Finish, the task is carried out. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-83. The new export is then successfully created. Click the Close button to close the window.


Figure 10-83 Task Progress bar for completion

The newly created export is added to the table on the main Exports page. Using the CLI: You can create an export using the mkexport CLI command. This command takes the share name and the directory path of the share you want to create. You can create FTP, CIFS, and NFS shares with this command; an FTP share does not need any parameters, while CIFS and NFS take some parameters. Using the command you can also create an inactive share. An inactive share is one whose creation is complete but that cannot yet be used by the end users; by default, the share is active. You can also specify an owner, which gives that user the required ACLs to access the share. The command usage and output are shown in Example 10-21. In this example, FTP, CIFS, and NFS shares are created.
Example 10-21 Command usage and output for creating export using CLI command mkexport

[Furby.storage.tucson.ibm.com]$ mkexport --help usage: mkexport sharename path [-c <cluster name or id>] --cifs <CIFS options> | --ftp | --http | --nfs <NFS client definition> | --scp [--inactive][--owner <owner>] sharename Specifies the name of the newly created export. path Specifies the name of the path which will be share. -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options] --ftp enable FTP protocol --http enable HTTP protocol --inactive share is inactive --nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)] --owner <owner> directory owner --scp
[Furby.storage.tucson.ibm.com]$ mkexport shared /ibm/gpfs0/shared --ftp --nfs "*(rw,no_root_squash,async)" --cifs browseable=yes,comment="IBM SONAS" --owner "SONASDM\eebanell"


You can also create an inactive share using the --inactive option in the mkexport command. You cannot do this from the GUI.

10.5.2 Listing and viewing status of exports created


Using the GUI: You can view the exports that have been created by clicking the Exports link under the Files category in the left panel. More about the listing of exports is covered in point 2 of the section Files on page 327. Using the CLI: You can list the exports or shares using the lsexport CLI command. This command lists each export once for every protocol for which it is configured. Example 10-22 shows the command usage and output.
Example 10-22 Command usage and output for listing exports using CLI command lsexport

[Furby.storage.tucson.ibm.com]$ lsexport --help
usage: lsexport [-c <cluster name or id>] [-r] [-v] [-Y]
 -c,--cluster <cluster name or id>  define cluster
 -r,--refresh   refresh list
 -v,--verbose   extended list
 -Y             format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsexport
Name Path Protocol Active Timestamp
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 FTP true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 HTTP true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 NFS true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 CIFS true 4/28/10 3:35 AM
1.1.0.2-5 /ibm/gpfs0/1.1.0.2-5 SCP true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 FTP true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 HTTP true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 NFS true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 CIFS true 4/28/10 11:03 AM
1.1.0.2-6 /ibm/gpfs0/1.1.0.2-6 SCP true 4/28/10 11:03 AM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 FTP true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 HTTP true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 NFS true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 CIFS true 4/28/10 18.38 PM
1.1.0.2-7 /ibm/gpfs0/1.1.0.2-7 SCP true 4/28/10 18.38 PM

10.5.3 Modifying exports


Using the GUI: You can modify an export by adding more services or protocols to the export or by changing protocol parameters for the existing export. You cannot delete protocols using this button. 1. Add Protocols: If you have already created an export for FTP access and you want to provide it with NFS and CIFS too, click the existing export in the table that lists the exports and click the Modify button. This opens a new window asking for the protocols you want to add to the existing export. See Figure 10-84.


Figure 10-84 Panel to add new protocols to the export already existing

As you can see, the protocols that are already added are disabled; the share name and path are also disabled so that you cannot change them. Select the protocols that you want to add and click Next. From this step on, the same procedure as creating the export is followed: provide details for each protocol that you add, as in the example for the NFS protocol; FTP takes no parameters. Click the Next button to continue until you finish. For detailed steps, see Creating exports on page 378. 2. Change Protocol Parameters: You can change parameters for both NFS and CIFS. On the main Exports page under the Files category you can see the table that displays all the existing exports. If you click any export, the section lower on that same page shows protocol information for that export; details are shown only for the CIFS and NFS protocols. a. NFS details: You can change NFS details by adding more clients or removing existing ones. You can also edit an existing client and add more options, as seen in Figure 10-85.

Figure 10-85 Modifying NFS configuration by editing clients or adding new clients

You can click the edit link to change the options of a client that has been added, and you can remove an added client using the remove link. You can also add a new client using the Add Client button. When you edit or add a client, a new window opens asking for the details of the client, as shown in Figure 10-86.


Figure 10-86 Modify settings for clients

For a new client, you need to add the client name and check or uncheck read-only, root-squash, sync, and other options as required. For an existing client, the name field is disabled because the client already exists. To remove the client, click the Remove link. b. CIFS details: You can modify CIFS export parameters by editing the details such as the comment, the Browsable option, and the ACLs. You can also add, modify, or remove advanced options for a CIFS share using the Advanced Option Add, Modify, and Delete buttons. See Figure 10-87.

Figure 10-87 Modify configuration for CIFS shares

Using the CLI: You can modify an existing share or export using the chexport CLI command. Unlike the GUI, the CLI lets you both add and remove protocols with this same command, each using different options. In this section, adding new protocols is discussed. You can add new protocols by specifying the --cifs, --ftp, and --nfs options with the protocol definitions. The command usage and output are shown in Example 10-23. For this example, the existing export was a CIFS export; FTP and NFS are added using chexport.
Example 10-23 Command usage and output for adding new protocols to existing share

[Furby.storage.tucson.ibm.com]$ chexport --help
usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp <arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>] [--nfsremove <NFS clients>] [--scp <arg>]
 sharename  Specifies the name of the export for identification.
 --active   share is active
 -c,--cluster <cluster name or id>  define cluster
 --cifs <CIFS options>          enable CIFS protocol [using CIFS options]
 --ftp <arg>                    FTP
 --http <arg>                   HTTP
 --inactive                     share is inactive
 --nfs <NFS client definition>  enable NFS protocol [using clients(NFSoption)]
 --nfsadd <NFS clients>         add NFS clients
 --nfsremove <NFS clients>      remove NFS clients
 --scp <arg>                    SCP
[Furby.storage.tucson.ibm.com]$ chexport shared --ftp --nfs "*(rw,no_root_squash,async)"
EFSSG0022I Protocol FTP is configured for share shared.
EFSSG0034I NFS Export shared is configured, added client(s): *, removed client(s): None.

You can add more clients to or remove clients from the NFS protocol, or modify CIFS options, using the --nfs <NFS client definition>, --nfsadd <NFS clients>, --nfsremove <NFS clients>, and --cifs <CIFS options> options. In Example 10-24, a new client is added to the NFS export.
Example 10-24 Command output to add new NFS clients to existing NFS share

[Furby.storage.tucson.ibm.com]$ chexport shared --nfsadd "9.1.2.3(rw,no_root_squash,async)" EFSSG0034I NFS Export shared is configured, added client(s): 9.1.2.3, removed client(s): None.

10.5.4 Removing service/protocols


Using the GUI: You can remove protocols from an existing export using the Remove Service button on the main Exports page in the Files category. Click the existing export or share that you want to modify and then click the Remove Service button. A new window opens asking which protocols you want to remove, with a check box against each protocol. Select the ones you want to remove, but only select protocols with which the export is already configured; you will get an error if you select protocols that are not configured. See Figure 10-88. Click OK when done.

Figure 10-88 Panel to remove protocols from share

When you click OK in that window, the task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-89. Click the Close button to close the window. The export is successfully modified.

Figure 10-89 Task Progress bar for completion

Using the CLI: You can remove a protocol from an existing share using the chexport command with the off value for that protocol. The command usage and output are shown in Example 10-25. In this example the existing share or export is configured for CIFS, FTP, and NFS; the command removes FTP and NFS.
Example 10-25 Command usage and output to remove protocols from existing share

[Furby.storage.tucson.ibm.com]$ chexport --help usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp <arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>] [--nfsremove <NFS clients>] [--scp <arg>] sharename Specifies the name of the export for identification. --active share is active -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options] --ftp <arg> FTP --http <arg> HTTP --inactive share is inactive --nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)] --nfsadd <NFS clients> add NFS clients --nfsremove <NFS clients> remove NFS clients --scp <arg> SCP [Furby.storage.tucson.ibm.com]$ chexport shared --ftp off --nfs off EFSSG0023I Protocol FTP is removed from share shared. EFSSG0023I Protocol NFS is removed from share shared.

10.5.5 Activating exports


Exports are active when created unless specified otherwise. When an export is active, you can access the data in it. When an export is inactive, the configuration data related to the export is removed from all the nodes; even though the export still exists, it therefore cannot be accessed by an end user. Using the GUI: You can activate an existing export that has been deactivated using the Activate button in the GUI. This task creates all the configuration data needed for the export on all the nodes so that the share or export is available for access by the end users. Using the CLI: You can activate a share using the chexport CLI command with the --active option. The command usage and output are shown in Example 10-26.


Example 10-26 Command usage and help to activate an existing share

[Furby.storage.tucson.ibm.com]$ chexport --help usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp <arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>] [--nfsremove <NFS clients>] [--scp <arg>] sharename Specifies the name of the export for identification. --active share is active -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options] --ftp <arg> FTP --http <arg> HTTP --inactive share is inactive --nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)] --nfsadd <NFS clients> add NFS clients --nfsremove <NFS clients> remove NFS clients --scp <arg> SCP

[Furby.storage.tucson.ibm.com]$ chexport shared --active EFSSG0037I The share shared is activated.

10.5.6 Deactivating exports


Using the GUI: You can deactivate an existing export that is active using the Deactivate button in the GUI. This task removes the configuration data for the export from all the nodes so that the share or export is no longer available for access by the end users. Using the CLI: You can deactivate a share using the chexport CLI command with the --inactive option. The command usage and output are shown in Example 10-27.
Exports: You can also create an export that is already inactive by using the --inactive option of the mkexport command. For more information, see Creating exports on page 378.
Example 10-27 Command usage and output to deactivate an existing share

[Furby.storage.tucson.ibm.com]$ chexport --help usage: chexport sharename [--active] [-c <cluster name or id>] [--cifs <CIFS options>] [--ftp <arg>] [--http <arg>] [--inactive] [--nfs <NFS client definition>] [--nfsadd <NFS clients>] [--nfsremove <NFS clients>] [--scp <arg>] sharename Specifies the name of the export for identification. --active share is active -c,--cluster <cluster name or id> define cluster --cifs <CIFS options> enable CIFS protocol [using CIFS options] --ftp <arg> FTP --http <arg> HTTP --inactive share is inactive --nfs <NFS client definition> enable NFS protocol [using clients(NFSoption)] --nfsadd <NFS clients> add NFS clients --nfsremove <NFS clients> remove NFS clients --scp <arg> SCP


[Furby.storage.tucson.ibm.com]$ chexport shared --inactive EFSSG0037I The share shared is inactivated.

10.5.7 Removing exports


Using the GUI: You can remove existing exports using the Remove export button on the main page of the Exports under the Files category. Click the export you want to remove from the table that lists all exports. Click the Remove Export button. You will be asked to confirm the removal of export. See Figure 10-90.

Figure 10-90 Confirmation to remove exports

Click the OK button to remove the export or share. After it has been removed, it is no longer shown in the table of existing exports. Using the CLI: You can remove an existing export using the rmexport CLI command. The command asks for your confirmation; if confirmed, it removes all the configuration details of the export from all nodes. See the command usage and output in Example 10-28.
Example 10-28 Command usage and output to remove an existing export. [Furby.storage.tucson.ibm.com]$ rmexport --help usage: rmexport sharename [-c <cluster name or id>] [--force] sharename Specifies the name of the export for identification. -c,--cluster <cluster name or id> define cluster --force enforce operation without calling back the user

[Furby.storage.tucson.ibm.com]$ rmexport shared Do you really want to perform the operation (yes/no - default no): yes EFSSG0021I The export shared has been successfully removed.


10.5.8 Testing accessing the exports


In this section we explain how to access the shares. We look at how CIFS, NFS, and FTP exports can be accessed by mounting or connecting to them.

CIFS
A CIFS export must be mounted before it can be accessed. A CIFS share can be accessed from both Windows and UNIX machines. 1. Accessing CIFS using Windows: To mount a CIFS share from Windows, right-click My Computer and click Map Network Drive, as shown in Figure 10-91.

Figure 10-91 Mapping a drive on windows to access CIFS share

A new window opens that asks you to enter the Drive and path details. Choose a drive letter from the drop-down list. Enter the path for the share you want to access in the following format: \\cluster_name\sharename where cluster_name is the name of the cluster you want to access and sharename is the name of the share that you want to mount.


In our example, as seen in Figure 10-92, we specify the cluster_name by its IP address, 9.11.137.219, and the sharename is shared. We mount the share on the X drive.

Figure 10-92 Choose Drive letter and path to mount

Click the different user name link on the previous window and enter the Windows user name and password. This user must have access or ACLs set to access this share. In our example, the user is: STORAGE3\\eebenall belonging to the domain STORAGE3. See Figure 10-93.

Figure 10-93 Adding user name and Password to access the share

Click Finish. The share should be mounted successfully. You can then access the share by accessing My Computer and the X drive which you just mounted.
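As an alternative to the mapping wizard, the same mapping can be done from a Windows command prompt with the standard net use command. This is a sketch using the values from our example (the drive letter, cluster address, share, and user are ours; substitute your own, and Windows then prompts for the password of the specified user):

C:\> net use X: \\9.11.137.219\shared /user:STORAGE3\eebenall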


Double-click the Drive and you will be able to see the contents of the share as shown in Figure 10-94.

Figure 10-94 Data seen from mounted share

2. Accessing CIFS using UNIX: Mount the CIFS share using the mount.cifs command, as shown in Figure 10-95. In our example, we use the Linux client sonaspb44. We create a directory, cifs_export, in the /mnt directory, where we mount the share. The cluster is Furby.storage.tucson.ibm.com and the share is shared. The user used for access is STORAGE3\\eebenall, belonging to the domain STORAGE3. See Figure 10-95.

Figure 10-95 Command to mount and access the data from UNIX
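The mount shown in Figure 10-95 follows the standard mount.cifs syntax. A minimal sketch, assuming the mount point /mnt/cifs_export and the user and domain from our example (adjust these values for your environment):

# mkdir -p /mnt/cifs_export
# mount.cifs //Furby.storage.tucson.ibm.com/shared /mnt/cifs_export -o user=eebenall,domain=STORAGE3
# ls /mnt/cifs_export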


NFS
NFS shares must also be mounted before their data can be accessed. Here we show how to mount them on UNIX clients. In our example, we use the Linux client sonaspb44 and have created a directory, nfs_export, in the /mnt directory, where we mount the NFS export. The cluster is Furby.storage.tucson.ibm.com and the share is shared. See Figure 10-96.

Figure 10-96 NFS share mount
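Figure 10-96 shows a standard NFS mount. The following sketch assumes the share is exported under the file system path /ibm/gpfs0/shared; the actual export path for your share can be confirmed with lsexport or with showmount -e against the cluster:

# mkdir -p /mnt/nfs_export
# showmount -e Furby.storage.tucson.ibm.com
# mount -t nfs Furby.storage.tucson.ibm.com:/ibm/gpfs0/shared /mnt/nfs_export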

FTP
FTP shares can be accessed from both Windows and UNIX. Use the ftp command to access the export; you can also use external FTP client applications on Windows to access the share. Next we explain access from both Windows and UNIX. 1. Accessing FTP from Windows: You can use any FTP client to access data from the FTP export; here we use the Windows command prompt. In our example, the cluster is Furby.storage.tucson.ibm.com and the share is shared. See Figure 10-97. On running ftp, you are prompted to enter the user and password; in this example, the user is STORAGE3\\eebenall, belonging to the domain STORAGE3. You then need to run cd at the FTP prompt to change to the sharename you want to access. As shown next, we run ftp> cd shared to access the FTP export, shared.

Figure 10-97 Accessing FTP share from Windows
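The session in Figure 10-97 follows this general pattern (a sketch; the listing you see depends on the contents of your export):

C:\> ftp Furby.storage.tucson.ibm.com
User: STORAGE3\eebenall
Password: ********
ftp> cd shared
ftp> ls
ftp> quit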


2. Accessing FTP from UNIX: You can access the FTP data by running the ftp command from the UNIX client. In our example, the cluster is Furby.storage.tucson.ibm.com and the share is shared; we use the Linux client sonaspb44. On running ftp, you are prompted to enter the user and password; in this example, the user is STORAGE3\\eebenall, belonging to the domain STORAGE3. See Figure 10-98. You then need to run cd at the FTP prompt to change to the sharename you want to access. As shown next, we run ftp> cd shared to access the FTP export, shared.

Figure 10-98 Accessing the FTP share from the Linux Client
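From the Linux client the session is essentially the same as the Windows one (sketch):

$ ftp Furby.storage.tucson.ibm.com
Name: STORAGE3\eebenall
Password: ********
ftp> cd shared
ftp> ls
ftp> quit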

10.6 Disk management


Each of the disks in the SONAS appliance can be managed. You can view the status of the disks and perform actions such as suspending, resuming, and starting them.

10.6.1 List Disks and View Status


Using the GUI: Click the Disks link under the Storage category. This will display a table with all the disks and the information about each. You can see the name, filesystem it is attached to, usage details, failure group, storage pool, and more. See Disks in Storage on page 336.


Using the CLI: You can list the disks in the cluster using the lsdisk CLI command. This command lists the existing disks along with information such as the file system each disk is attached to, the failure group, the storage pool, the type of disk, and more. The command usage and output are shown in Example 10-29.
Example 10-29 Command usage and help to list the disks in the cluster

[Furby.storage.tucson.ibm.com]$ lsdisk --help
usage: lsdisk [-c <cluster name or id>] [-d <arg>] [-r] [-v] [-Y]
  -c,--cluster <cluster name or id>   define cluster
  -d,--device <arg>                   define device
  -r,--refresh                        refresh list
  -v,--verbose                        extra columns
  -Y                                  format output as delimited text

[Furby.storage.tucson.ibm.com]$ lsdisk
Name     File system Failure group Type            Pool   Status Availability Timestamp
gpfs1nsd gpfs0       4004          dataAndMetadata system ready  up           4/28/10 3:03 AM
gpfs2nsd gpfs0       4004          dataAndMetadata system ready  up           4/28/10 3:03 AM
gpfs3nsd gpfs0       4004          dataAndMetadata system ready  up           4/28/10 3:03 AM
gpfs4nsd gpfs1       4004          dataAndMetadata system ready  up           4/28/10 3:03 AM
gpfs5nsd             4004                          system ready               4/28/10 4:42 AM

Suspending disks
Using the GUI: You can suspend disks using the Suspend button. Select the disk you want to suspend and click the Suspend button. The operation opens a new window asking for your confirmation before suspending the disk. See Figure 10-99.

Figure 10-99 Confirmation before suspending disk

Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-100. The disk is then suspended. Click the Close button to close the window.

Figure 10-100 Task Progress bar for completion


When suspended, the disk appears in the table with status Suspended as shown in Figure 10-101.

Figure 10-101 Panel shows disk is suspended

Using the CLI: There is no CLI command as of now to suspend a disk.

Resuming disks
Using the GUI: You can Resume disks using the Resume button. Select the suspended disk you want to resume and click the Resume button. The operation opens a new window which asks for confirmation. See Figure 10-102.

Figure 10-102 Confirmation before resuming a suspended disk

Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-103. The disk is then resumed. Click the Close button to close the window.

Figure 10-103 Task Progress bar for completion

The disk that was suspended before will now have the status ready, as shown in Figure 10-104 on page 397.


Figure 10-104 Panel shows that the disk has been successfully resumed

Using the CLI: There is no CLI command to resume a disk as of now.

10.6.2 Changing properties of disks


Using the GUI: As of now there is no way to change the properties of a disk using the GUI. Using the CLI: You can change the properties of a disk using the chdisk CLI command. The properties that you can modify for a disk are the failure group, the storage pool, and the usage type. Usage of the command is shown in Example 10-30.
Example 10-30 Command usage for changing properties of a disk

[Furby.storage.tucson.ibm.com]$ chdisk --help
usage: chdisk disks [-c <cluster name or id>] [--failuregroup <failuregroup>] [--pool <pool>] [--usagetype <usagetype>]
  disks                               The name of the device
  -c,--cluster <cluster name or id>   define cluster
  --failuregroup <failuregroup>       failure group
  --pool <pool>                       pool name
  --usagetype <usagetype>             usage type

Each of the parameters that can be changed is explained in detail:
1. Failure group: You can change the failure group of a disk by using the --failuregroup option with the chdisk command.
2. Storage pool: You can change the storage pool of a disk by using the --pool option with the chdisk command.
3. Usage type: You can change the usage type of a disk by using the --usagetype option with the chdisk command.
In Example 10-31 we change each of these parameters for the disk array1_sata_60001ff0732f85f8c0b000b. The example also shows the state of the disk before the change; the changed entry is shown in bold.
Example 10-31 Command output for CLI command lsdisk and using chdisk to change failure group of disk

[Furby.storage.tucson.ibm.com]$ lsdisk
Name                                File system Failure group Type            Pool    Status Availability Timestamp
array0_sata_60001ff0732f8548c000000 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8568c020002 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8588c040004 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85a8c060006 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8558c010001 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8578c030003 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8598c050005 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f85d8c090009 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85e8c0a000a tms0        1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8608c0f000c tms0        2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85c8c080008             1             dataAndMetadata system  ready               4/23/10 10:00 AM
array1_sata_60001ff0732f85f8c0b000b             2             dataAndMetadata newpool ready               4/24/10 3:05 AM

[Furby.storage.tucson.ibm.com]$ chdisk array1_sata_60001ff0732f85f8c0b000b --failuregroup 200 --pool newpool --usagetype descOnly
EFSSG0122I The disk(s) are changed successfully!

[Furby.storage.tucson.ibm.com]$ lsdisk
Name                                File system Failure group Type            Pool    Status Availability Timestamp
array0_sata_60001ff0732f8548c000000 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8568c020002 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f8588c040004 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85a8c060006 gpfs0       1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8558c010001 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8578c030003 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8598c050005 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f85d8c090009 gpfs0       2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85e8c0a000a tms0        1             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array1_sata_60001ff0732f8608c0f000c tms0        2             dataAndMetadata system  ready  up           4/26/10 3:03 AM
array0_sata_60001ff0732f85c8c080008             1             dataAndMetadata system  ready               4/23/10 10:00 AM
array1_sata_60001ff0732f85f8c0b000b             200           descOnly        newpool ready               4/28/10 10:14 PM


10.6.3 Starting disks


Using the GUI: Select the disk you want to start. Click the Start button. Using the CLI: There is no CLI command as of now to start the disk.

10.6.4 Removing disks


Using the GUI: Select the disk you want to remove. Click the Remove button. Using the CLI: There is no CLI command as of now to remove the disk.

10.7 User management


Users who access the SONAS appliance are of two types: system administrators, who manage the SONAS system, and end users, who access, read, and write the data in the file systems. In this section, we look at each type in detail.

10.7.1 SONAS administrator


The SONAS administrator is the user who has all administrative rights to carry out operations on the SONAS appliance. The administrator manages the cluster nodes, storage, file systems, and exports, sets up the required monitoring and thresholds, and can view the status and health of the whole cluster. The SONAS administrator can be either a CLI user or a GUI user. User roles are currently defined only for GUI users; as of now, all CLI users have the rights to perform all commands. Now let us look at the details: 1. SONAS CLI user: A SONAS CLI user is created using the SONAS CLI command mkuser. This user is a special user with a restricted bash shell. The user can run only selected UNIX commands: grep, initnode, man, more, sed, sort, cut, head, less, tail, and uniq. All the other commands that the administrator can run are the SONAS CLI commands. For the list of commands, run help at the command prompt after you are logged in as a CLI user. The output is shown in Example 10-32.
Example 10-32 Commands that a SONAS user can execute.

[Furby.storage.tucson.ibm.com]$ cli help Known commands: addcluster Adds an existing cluster to the management. addnode Adds a new cluster node. attachnw Attach a given network to a given interface of a network group. backupmanagementnodeBackup the managament node cfgad configures AD server into the already installed CTDB/SMABA cluster.Previously configured authentication server settings will be erased cfgbackupfs Configure file system to TSM server association cfgcluster Creates the initial cluster configuration cfghsm Configure HSM on each client facing node cfgldap configure LDAP server against an existing preconfigured cluster. cfgnt4 configure NT4 server against an existing preconfigured cluster. cfgsfu Configures user mapping service for already configured AD cfgtsmnode Configure tsm node.

chavailnode Change an available node. chcurrnode Changes current node chdisk Change a disk. chexport Modifies the protocols and their settings of an existing export. chfs Changes a new filesystem. chfset Change a fileset. chkauth Check authentication settings of a cluster. chkpolicy validates placement rules or get details of management rules of a policy on a specified cluster for specified device chnw Change a Network Configuration for a sub-net and assign multiple IP addresses and routes chnwgroup Adds or removes nodes to/from a given network group. chservice Change the configuration of a protocol service chuser Modifies settings of an existing user. confrepl Configure asynchronous replication. dblservice stop services for an existing preconfigured server. detachnw Detach a given network from a given interface of a network group. eblservice start services for an existing preconfigured server. enablelicense Enable the license agreement flag initnode Shutdown or reboot a node linkfset Links a fileset lsauth List authentication settings of a cluster. lsavailnode List available nodes. lsbackup List information about backup runs lsbackupfs List file system to tsm server and backup node associations lscfg Displays the current configuration data for a GPFS cluster. lscluster Lists the information of all managed clusters. lscurrnode List current nodes. lsdisk Lists all discs. lsexport Lists all exports. lsfs Lists all filesystems on a given device in a cluster. lsfset Lists all filesets for a given device in a cluster. lshist Lists system utilization values lshsm Lists configured hsm file systems cluster lslog Lists all log entries for a cluster. lsnode Lists all Nodes. lsnw List all public network configurations for the current cluster lsnwdns List all DNS configurations for the current cluster lsnwgroup List all network group configurations for the current cluster lsnwinterface List all network interfaces lsnwnatgateway List all NAT gateway configurations for the current cluster lsnwntp List all NTP configurations for the current cluster lspolicy Lists all policies lspool Lists all pools. lsquota Lists all quotas. lsrepl List result of the asynchronous replications. lsservice Lists services lssnapshot Lists all snapshots. lstask Lists all (background) tasks for the management node. lstsmnode Lists defined tsm nodes in the cluster lsuser Lists all users of this mangement node. mkavailnode Add an available node to the database. mkcurrnode Makes current node mkexport Creates a new export using one or more protocols. mkfs Creates a new filesystem.


mkfset Creates a fileset mknw Create a new Network Configuration for a sub-net and assign multiple IP addresses and routes mknwbond Makes a network bond from slave interfaces mknwgroup Create a group of nodes to which a network configuration can be attached. See also the commands mknw and attachnw. mknwnatgateway Makes a CTDB NAT gateway mkpolicy Makes a new policy into database mkpolicyrule Appends a rule to already existing policy mkservice Configure services mksnapshot creates a snapshot from a filesystem mktask Schedule a prefedined task for mkuser Creates a new user for this management node. mountfs Mount a filesystem. querybackup Query backup summary restripefs Rebalances or restores the replication of all files in a file system. resumenode Resumes an interface node. rmbackupfs Remove file system to TSM server association rmcluster Removes the cluster from the management (will not delete cluster). rmexport Removes the given export. rmfs Removes the given filesystem. rmfset Removes a fileset rmlog Removes all log entries from database rmnode Removes a node from the cluster. rmnw Remove an existing public network configuration rmnwbond Deletes a regular bond interface. rmnwgroup Remove an existing group of nodes. A maybe attached public network configuration must be detached in advance rmnwnatgateway Unconfigures a CTDB NAT gateway. rmpolicy Removes a policy and all the rules belonging to it rmpolicyrule Removes one or more rules from given policy rmsnapshot Removes a filesystem snapshot rmtask Removes the given scheduled task. rmtsmnode Remove TSM server stanza for node rmuser Removes the user from the management node. rpldisk Replaces current NSD of a filesystem with a free NSD runpolicy Migrates/deletes already existing files on the GPFS file system based on the rules in policy provided setnwdns Sets nameservers setnwntp Sets NTP servers setpolicy sets placement policy rules of a given policy on cluster passed by user. setquota Sets the quota settings. showbackuperrors Shows errors of a backup session showbackuplog Shows the log of the recent backup session. showrestoreerrors Shows errors of a restore session showrestorelog Shows the log of the recent restore session. startbackup Start backup process startreconcile Start reconcile process startrepl Start asynchronous replication. startrestore Start restore process stopbackup Stops a running TSM backup session stoprepl Stop asynchronous replication. stoprestore Stops a running TSM restore session suspendnode Suspends an interface node. unlinkfset Unlink a fileset.


unmountfs

Unmount a filesystem.

Plus the UNIX commands: grep, initnode, man, more, sed, startmgtsrv, stopmgtsrv, sort, cut, head, less, tail, uniq. For additional help on a specific command, the administrator can check the manpage by running man <command_name> or <command_name> --help (for example, for the mkuser command). As mentioned previously, the CLI user as of now has no roles defined; a CLI user can run all the administrative commands to manage the cluster, storage, file systems, and exports. The administrator can also look at the logs and utilization charts for information about the health of the cluster and its components.
2. SONAS GUI user: The SONAS GUI user must be added into the GUI by the root user. After an installation, the root user is automatically added to the GUI. Log in with the root user name and password, then click the Console User Authority link under the Settings category of the GUI. A page opens with a table that lists all the GUI users who can access the GUI and their roles (see point 2 under Settings on page 344). From that panel you can add a new GUI user or remove one, using the Add and Remove buttons respectively, as explained next.
Add user: Add a user by clicking the Add button. A new page asking for the user details opens. Type in the user name; this user must be an existing CLI user already created with the mkuser command. You also need to specify the role for the user. The following roles are available:
Administrator: This user has all administrative rights and can perform all operations, just as a CLI user can.
Storage Administrator: This user has rights to manage the storage; all storage-related tasks can be done by this user.
Operator: This user has only read access and can view the logs, health status, and overall topology of the cluster.
System Administrator: This user can administer the system as a whole.
Click OK when done. Figure 10-105 shows the panel to add a new user.

Figure 10-105 Panel to add user and user roles to the Users

After a user is added, the table will display the newly added user as shown in Figure 10-106.


Figure 10-106 Panel displaying the newly added user

Remove user: Select the user to delete and click the Remove button. The user is then deleted from the GUI. CLI: Deleting a user from the GUI does not delete the user from the CLI; the CLI user still exists. Logout: A selected user can be logged out using the Logout button. Depending on the role given to the user, the GUI user has different access permissions and can perform different operations.

10.7.2 SONAS end users


The SONAS end users are the users who access the data stored in the file system; they can both write and read data. Data in the cluster can be accessed by end users only through the data exports. The protocols that SONAS currently supports are CIFS, FTP, and NFS. To access data using these protocols, the users need to authenticate. SONAS supports the Windows AD authentication server and the LDAP server. To learn more about integrating the authentication server into the SONAS appliance, refer to SONAS authentication and authorization on page 91. NFS is an exception: it does not need users to authenticate because it checks the authenticity of the client or host. The other protocols, FTP and CIFS, currently require the users to authenticate. CIFS authenticates with the Windows AD server, while FTP works with both Windows AD and LDAP users.

Authentication is the process of verifying the identity of the user: users confirm that they are indeed who they claim to be. This is typically accomplished by verifying the user ID and password against the authentication server. Authorization is the process of determining what the users are allowed to access: users might have permission to access certain files but not others. This is typically done through ACLs.
The file system ACLs supported in the current SONAS release are GPFS ACLs, which are NFSv4 ACLs. Directories and exports must be given the right ACLs before users can access them. As of now, you can give the owner rights to an export by specifying the owner option when creating it, from either the GUI or the CLI. If you want to give other users access, you need to modify the ACL in GPFS for the directory or export using the GPFS mmeditacl command. You can view ACLs by using the GPFS mmgetacl command.


ACLs: Right now, you need to use the GPFS command to view or edit ACLs. This command requires root access. Example 10-33 shows how you can provide ACLs to a directory or export.
Example 10-33 Viewing current ACLs for an export using GPFS command mmgetacl

export EDITOR=/bin/vi
$ mmgetacl /ibm/gpfs0/Sales
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Example 10-34 adds an ACL entry for another user. In the example, we give read-write access to the Windows AD user David for the already existing export named Sales in the /ibm/gpfs0 file system.
Example 10-34 Adding ACL for giving user DAVID access to the export

$ mmeditacl /ibm/gpfs0/Sales
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

user:STORAGE3\david:rwxc:allow
 (X)READ/LIST (X)WRITE/CREATE (X)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (X)DELETE_CHILD (X)CHOWN (X)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED

special:group@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

special:everyone@:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

Save the file and, when you quit, answer yes when asked to confirm the ACLs. The new ACL entries are then written for the user and the export.


Depending on which users you want to give access to, you can add them to the ACL. You can also give a group access in a similar way and then add users to that group, as sketched next.
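For example, to also grant a Windows AD group access to the same export, a group ACE can be added next to the user entry. This is a sketch only: STORAGE3\sales is a hypothetical group name, the other entries remain unchanged, and the exact ACE syntax should be verified against the GPFS documentation for your release.

$ mmeditacl /ibm/gpfs0/Sales
...
group:STORAGE3\sales:r-x-:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (X)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
...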

10.8 Services Management


In this section we discuss the Management Service function and administration.

10.8.1 Management Service administration


The Management Service supports both the GUI interface and the CLI interface of SONAS; to work with either interface, the Management Service must be running. 1. Stop the Management Service: Using the GUI: As of now, the Management Service cannot be stopped from the GUI. Using the CLI: You can stop the Management Service by running the stopmgtsrv CLI command. Command usage and output are shown in Example 10-35.
Example 10-35 Command usage and output for CLI command stopmgtsrv to stop Management or CLI service

[Furby.storage.tucson.ibm.com]$ stopmgtsrv --help usage: stopmgtsrv stop the management service [Furby.storage.tucson.ibm.com]$ stopmgtsrv EFSSG0008I Stop of management service initiated by root 2. Start Management Service: Using the GUI: You cannot start the Management Service using the GUI. Using the CLI: You can start the Management Service using the startmgtsrv CLI command. Command usage and output is shown in Example 10-36.
Example 10-36 Command usage and output for starting Management or CLI service

[Furby.storage.tucson.ibm.com]$ startmgtsrv --help usage: startmgtsrv [-f | --force] start the management service -f, --force restart gui if already running [Furby.storage.tucson.ibm.com]$ startmgtsrv EFSSG0007I Start of management service initiated by root After the service has started, you can verify by running the CLI help command on the CLI or access the GUI. CLI help should display all the commands that are available for the CLI user. The GUI should prompt for user ID and password.
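For example, after a restart you can confirm that the CLI is responding by listing the available commands again, which should produce the command list shown in Example 10-32:

[Furby.storage.tucson.ibm.com]$ cli help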


If you are unable to access the GUI or CLI commands, restart the Management Service using the startmgtsrv command with the --force option. Command output is shown in Example 10-37.
Example 10-37 CLI command startmgtsrv to forcefully restart the CLI and Management Service

[Furby.storage.tucson.ibm.com]$ startmgtsrv --force EFSSG0008I Stop of management service initiated by root EFSSG0007I Start of management service initiated by root

10.8.2 Managing services on the cluster


The services that run on the SONAS appliance are CIFS, FTP, HTTP, NFS, and SCP. These services are needed for clients to access the SONAS data exports. All of these services are already configured during the configuration of the SONAS appliance using the cfgad and cfgldap commands. You can view the status of the configured services and also enable and disable them; they need to be configured before you can carry out any operations on them. Next we discuss each task that can be carried out on the services. 1. List the service status: Using the GUI: You can view the services that are active from the GUI by checking the Services tab under the Clusters section of the Cluster category. You cannot disable or enable any service there. See point 1.f under Clusters on page 320 for more information. Using the CLI: You can list the services using the lsservice CLI command. This command lists all the services, their state, and whether they are configured. The command usage and output are shown in Example 10-38.
Example 10-38 Example for usage and command output for CLI command lsservice

[Furby.storage.tucson.ibm.com]$ lsservice --help usage: lsservice [-c <cluster name or id>] [-r] [-Y] -c,--cluster <cluster name or id> define cluster -r,--refresh refresh list -Y format output as delimited text [Furby.storage.tucson.ibm.com]$ lsservice Name Description Is active Is configured FTP FTP protocol yes yes HTTP HTTP protocol yes yes NFS NFS protocol yes yes CIFS CIFS protocol yes yes SCP SCP protocol yes yes In the example, you can see that all the services are configured. This means, all the configuration files for the services are up to date on each node of the cluster. Under the column Is Active, you can see if the service is active or inactive. Active denotes that the service is up and running. Exports can be accessed using that service. Users or clients can access the data exported using that protocol or service. Inactive means that the service is not running and hence all data connections will break.


2. Enable a service: Using the GUI: You cannot enable a service using the GUI. Using the CLI: You can enable a service using the eblservice CLI command. The command usage and output are shown in Example 10-39. To enable services, you pass the cluster name or cluster ID as a mandatory parameter and the names of the services you want to enable as a comma-separated list. To enable all services, pass all. The command asks for confirmation; you can use the --force option to skip the confirmation. In our example, the FTP and NFS services are disabled, and we enable them using the eblservice command.
Example 10-39 Example showing usage and command output for CLI command eblservice

[Furby.storage.tucson.ibm.com]$ eblservice --help usage: eblservice -c <cluster name or id> [--force] [-s <services>] -c,--cluster <cluster name or id> define cluster --force enforce operation without prompting for confirmation -s,--services <services> services

[Furby.storage.tucson.ibm.com]$ lsservice Name Description Is active Is configured FTP FTP protocol no yes HTTP HTTP protocol yes yes NFS NFS protocol no yes CIFS CIFS protocol yes yes SCP SCP protocol yes yes [Furby.storage.tucson.ibm.com]$ eblservice -c st002.vsofs1.com -s ftp,nfs --force [Furby.storage.tucson.ibm.com]$ lsservice Name Description Is active Is configured FTP FTP protocol yes yes HTTP HTTP protocol yes yes NFS NFS protocol yes yes CIFS CIFS protocol yes yes SCP SCP protocol yes yes

3. Disable a service: Using the GUI: You cannot disable a service using the GUI. Using the CLI: You can disable a service using the dblservice CLI command. The command usage and output are shown in Example 10-40. To disable services, you pass the names of the services you want to disable as a comma-separated list; to disable all services, pass all. The command asks for your confirmation, which you can skip by using the --force option. You can verify the result using the lsservice command, as shown in Example 10-40. CIFS and SCP must always be running: CIFS is required for CTDB to be healthy, and SCP is the SSH service and cannot be stopped because all internal communication between the nodes is done using SSH. If you pass CIFS or SCP, they are not stopped and a warning message is issued; the other services are stopped.


Example 10-40 Usage for CLI command dblservice and output when disabling FTP only

[Furby.storage.tucson.ibm.com]$ dblservice --help usage: dblservice [-c <cluster name or id>] [--force] -s <services> -c,--cluster <cluster name or id> define cluster --force enforce operation without prompting for confirmation -s,--services <services> services

[Furby.storage.tucson.ibm.com]$ dblservice -s ftp Warning: Proceeding with this operation results in a temporary interruption of file services Do you really want to perform the operation (yes/no - default no): yes EFSSG0192I The FTP service is stopping! EFSSG0194I The FTP service is stopped! [Furby.storage.tucson.ibm.com]$ lsservice Name Description Is active Is configured FTP FTP protocol no yes HTTP HTTP protocol yes yes NFS NFS protocol yes yes CIFS CIFS protocol yes yes SCP SCP protocol yes yes The second Example 10-41 shows disabling all the services with the --force option. You can also see the warning message for CIFS and SCP in this case.
Example 10-41 Example where all services are disabled - CIFS and SCP show warning message

[Furby.storage.tucson.ibm.com]$ dblservice -s all --force EFSSG0192I The NFS service is stopping! EFSSG0192I The HTTP service is stopping! EFSSG0193C Disable SCP services failed. Cause: Never stop scp/sshd service. We didn't stop scp/sshd service but other passed services were stopped. EFSSG0192I The FTP service is stopping! EFSSG0193C Disable CIFS services failed. Cause: Never stop cifs service. We didn't stop cifs service but other passed services were stopped. EFSSG0109C Disable services failed on cluster st002.vsofs1.com. Cause: SCP : Never stop scp/sshd service. We didn't stop scp/sshd service but other passed services were stopped.CIFS : Never stop cifs service. We didn't stop cifs service but other passed services were stopped.

[Furby.storage.tucson.ibm.com]$ lsservice Name Description Is active Is configured FTP FTP protocol no yes HTTP HTTP protocol no yes NFS NFS protocol no yes CIFS CIFS protocol yes yes SCP SCP protocol yes yes

4. Change a service configuration: Using the GUI: You can change the configuration of each configured service using the GUI. As seen in point 1.f under Clusters on page 320, you can see the table containing the list of services. Each service name is a link you can click, upon which a new window opens that allows you to change the configuration parameters for that service.


FTP: A new window as in Figure 10-107 shows the various parameters that you can change for the FTP configuration. Click Apply when done. The new configuration data is written into the CTDB registry and also the FTP configuration files on each node.

Figure 10-107 FTP configuration parameters

HTTP: HTTP requires you to install an HTTP certificate. When you click the HTTP link, you see a window as shown in Figure 10-108. You can install an existing certificate or generate a new one.

Figure 10-108 HTTP Configuration panel

Upload an Existing Certificate: You can upload an existing certificate which is a .crt or .key file. Click the Upload Certificate button to upload a new certificate. A new window as shown in Figure 10-109 opens up. It asks you for the path for the certificate. Click Browse and search for the certificate file. Click the Upload button to upload the file. This window then closes. Click the Install Certificate button as shown in Figure 10-108.

Figure 10-109 Uploading the certificate file

Generate a New Certificate: To generate a new certificate, fill out all the text boxes as shown in Figure 10-108, then click the Generate and Install certificate button. This generates a new certificate and installs it.

NFS: NFS as of now does not have any configuration parameters to modify. CIFS: As shown in Figure 10-110, you can see the different parameters that you can change for CIFS. As you can see in the figure, you can change some common parameters and also some Advanced Options. You can Add, Modify, or Remove advanced parameters using the respective buttons in the panel. Click the Apply button when done. The configuration will be successfully written on all nodes.

Figure 10-110 CIFS configuration parameters

SCP: When you select the SCP protocol by clicking its link, a new window opens. Figure 10-111 shows the different parameters you can modify for SCP service or protocol. SCP protocol also provides SFTP method for data access. You can allow or disallow SFTP by using the check box for it. Click Apply to apply the changes.


Figure 10-111 SCP Configuration Details

Using the CLI: At this time, you cannot change the configuration parameters from the SONAS CLI.

10.9 Real-time and historical reporting


The SONAS GUI allows for real-time and historical reporting. You can generate reports and charts to display cluster utilization, either system utilization or file system utilization. As seen in Performance and reports on page 338, the GUI has panels for both system and file system utilization. You can also have e-mails sent to the administrator to report an event or threshold; you set this up using the GUI, because this feature is not available in the CLI. Reporting and generating charts are explained in the following sections.
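Although the charts and e-mail notifications are GUI features, the lshist command listed earlier among the CLI commands lists the raw system utilization values from the command line. As a sketch (check the help output first, because the available options and output format depend on the release):

[Furby.storage.tucson.ibm.com]$ lshist --help
[Furby.storage.tucson.ibm.com]$ lshist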

10.9.1 System utilization


You can use this section for generating reports. To access this feature, click the Performance and Reports category to expand the options. Click the System Utilization link. The System utilization table displays information about the nodes in the cluster. Select the nodes you want to generate charts for. Choose the Measurement variable from the drop-down menu. This is a list of system variables that you can measure utilization for. Some of them include, CPU Usage, Memory Usage, Network usage and errors, Disk I/O and usage. See Figure 10-112.

Figure 10-112 Measurement Variable for System Utilization charts

Select the Measurement duration from the drop-down menu. This list allows you to select a duration of time for which you want to measure the utilization of the system. You can choose durations such as, Daily, Weekly, Monthly, 6 monthly, 18 monthly as shown in Figure 10-113.


Figure 10-113 Measurement Duration for System Utilization charts

After you have selected the nodes, the measurement variable, and the duration, click the Generate Charts button and the chart is generated. The figures show two examples: daily memory usage for the Management Node (Figure 10-114) and weekly disk I/O for Interface Node 2 (Figure 10-115).

Figure 10-114 Daily memory Utilization charts for Management Node


Figure 10-115 Weekly Disk I/O Utilization charts for Interface Node 2

The previous examples show only some of the available options. You can also generate charts for all nodes or for selected nodes.

10.9.2 File System utilization


You can use this section for generating reports. To access this feature, click the Performance and Reports category to expand the options, then click the File System Utilization link. The File System Utilization table displays information about the file systems in the cluster; Figure 10-35 on page 339 shows the panel. Select the file system whose usage charts you want to generate, then select the duration from the drop-down menu. This list allows you to select the period of time for which you want to measure utilization; you can choose durations such as Daily, Weekly, Monthly, 6 monthly, and 18 monthly, as shown in Figure 10-116.


Figure 10-116 Duration period for File System Utilization charts

After you have selected the file system and duration, click the Generate Charts button and the chart is generated. In our figures we have just a single file system, gpfs0; we generate weekly file system utilization charts (Figure 10-117) and daily file system utilization charts (Figure 10-118).

Figure 10-117 Weekly Filesystem Utilization charts for gpfs0


Figure 10-118 Daily File System Utilization Charts for gpfs0

10.9.3 Utilization Thresholds and Notification


Click the Utilization Thresholds link under the SONAS Console Settings category in the Management GUI. This pane displays all thresholds for utilization monitoring (see Figure 10-36 on page 340); the table displays all the threshold details that have been added. You can add a threshold by clicking the Add Threshold button. A new window opens, as shown in Figure 10-119. Enter the details of the threshold you want to add; from the drop-down menu you can choose the variable you want to monitor, such as CPU, file system, GPFS, memory, and network usage.

Figure 10-119 Add new utilization thresholds details panel


Choose the warning level, error level, and the number of recurrences you want to track, as shown in Figure 10-120.

Figure 10-120 Utilization thresholds panel parameters

When done, click OK. A new threshold will be added to the list. You need to configure the recipients in order to receive email notifications. Click the link Notification Settings under the SONAS Console settings in the left pane in the Management GUI.

10.10 Scheduling tasks in SONAS


SONAS allows you to schedule some tasks to be performed without any manual intervention. Both GUI tasks and cron tasks can be scheduled; there is a fixed list of tasks that you can schedule as of now on the SONAS appliance. You create a task that schedules a predefined task for the management node. A predefined task can be a GUI task or a cron task. GUI tasks can be scheduled only once and run only on the management node, whereas cron tasks can be scheduled multiple times and for the different clusters managed by the management node. Cron tasks are predefined to run either on all nodes of the selected cluster or on the recovery master node only. An error is returned to the caller if either of the following conditions is met: 1. An already scheduled GUI task is scheduled for another time. 2. A task with the denoted name does not exist. There are many operations you can perform on the tasks using the GUI or the CLI; we discuss them in this section.

10.10.1 Listing tasks


In this section we discuss listing tasks using both the GUI and the CLI. Using the GUI: You can list the tasks that are already defined. Click the Scheduled Tasks link under the SONAS Console Settings category. This displays all the tasks that have been added to the cluster; these are the predefined tasks. The GUI panel for listing is described earlier under SONAS console settings on page 339. The tasks that are executed by the GUI run only on the Management Node; the cron tasks can run on one or all nodes of the cluster.


Using the CLI: Tasks can be scheduled using the mktask CLI command. The command takes input values such as the cluster name and the second, minute, hour, and other time values for the task to run. There is also an option called parameter; it is optional and only valid for a cron task. The GUI tasks currently do not have any parameters, and an error is returned to the caller if this option is specified for a GUI task. The parameter variable is a space-separated list of values. The command usage and output for adding both a GUI task and a cron task are shown in Example 10-42. The following cron tasks are available in SONAS:
MkSnapshotCron: The cron job expects two parameters in the following order: clusterName (the name of the cluster the file system belongs to) and filesystem (the file system description, for example /gpfs/office).
StartReplCron: The cron job expects two parameters in the following order: source_path (the directory that shall be replicated) and target_path (the directory to which the data shall be copied).
StartBackupTSM: The cron job expects one parameter: clusterName (the cluster of the file systems that must be backed up).
StartReconcileHSM: The cron job expects three parameters in the following order: clusterName (the cluster of the file systems that must be backed up), filesystem (the file system to be reconciled), and node (the node on which the file system is to be reconciled).
BackupTDB: The cron job expects one parameter: target_path (the directory to which the backup shall be copied).
For more information about how to add the parameters for these cron tasks, refer to the manpage for the mktask command. Example 10-42 shows the adding of the MkSnapshotCron cron task, which takes two parameters, the cluster name and the file system name; in our example the cluster name is Furby.storage.tucson.ibm.com and the file system is gpfs0. The second command in Example 10-42 adds a GUI task.
Example 10-42 Command usage and output in adding CRON and GUI tasks using CLI command mktask

[Furby.storage.tucson.ibm.com]$ mktask --help
usage: mktask name [-c <cluster name or id>] [--dayOfMonth <dayOfMonthdef>] [--dayOfWeek <dayOfWeekdef>]
       [--hour <hourdef>] [--minute <minutedef>] [--month <monthdef>] [-p <parameter>] [--second <seconddef>]
  name                                Specifies the name of the newly created task.
  -c,--cluster <cluster name or id>   define cluster
  --dayOfMonth <dayOfMonthdef>        define the scheduler option for the dayOfMonth
  --dayOfWeek <dayOfWeekdef>          define the scheduler option for the dayOfWeek
  --hour <hourdef>                    define the scheduler option for the hour
  --minute <minutedef>                define the scheduler option for the minute
  --month <monthdef>                  define the scheduler option for the month
  -p,--parameter <parameter>          denote the parameter passed to the scheduled cron task
  --second <seconddef>                define the scheduler option for the second

[Furby.storage.tucson.ibm.com]$ mktask MkSnapshotCron --parameter "Furby.storage.tucson.ibm.com gpfs0" --minute 10 --hour 2 --dayOfMonth */3 EFSSG0019I The task MkSnapshotCron has been successfully created.

[Furby.storage.tucson.ibm.com]$ mktask FTP_REFRESH --minute 2 --hour 5 --second 40 EFSSG0019I The task FTP_REFRESH has been successfully created.
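As a further sketch, a nightly TSM backup could be scheduled with the StartBackupTSM cron task, which takes the cluster name as its single parameter; the resulting schedule can then be checked with the lstask command (the time values here are examples only):

[Furby.storage.tucson.ibm.com]$ mktask StartBackupTSM --parameter "Furby.storage.tucson.ibm.com" --hour 1 --minute 0
[Furby.storage.tucson.ibm.com]$ lstask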

10.10.2 Removing tasks


Using the GUI: You can remove a task by selecting it from the table of tasks and clicking the Remove button. The operation opens a new window asking for confirmation, as seen in Figure 10-121.

Figure 10-121 Confirmation to remove the tasks

Click OK to confirm. The task progress bar shows the progress of the operation and, when successful, displays green check marks. If an error occurs, the error message is shown and the window displays a red cross sign (x); check the logs, correct the problem, and retry. See Figure 10-122. The task is then successfully removed. Click the Close button to close the window.

Figure 10-122 Task Progress bar for completion

Using the CLI: You can remove a task using the rmtask CLI command. This command deletes the task from the list of tasks to be scheduled by the system. An error is returned to the caller if a task that does not exist is specified. The command usage and output are shown in Example 10-43. In the first example we delete the previously added cron task MkSnapshotCron, and in the second example we delete the GUI task FTP_REFRESH.


Example 10-43 rmtask CLI command example

[Furby.storage.tucson.ibm.com]$ rmtask --help usage: rmtask name [-c <cluster name or id>] name Specifies the name of the task for identification. -c,--cluster <cluster name or id> define cluster [Furby.storage.tucson.ibm.com]$ rmtask MkSnapshotCron EFSSG0021I The task MkSnapshotCron has been successfully removed. [Furby.storage.tucson.ibm.com]$ rmtask FTP_REFRESH EFSSG0021I The task FTP_REFRESH has been successfully removed.

10.10.3 Modifying the schedule tasks


Using the GUI: You can modify a task from the GUI. Select the task that you need to modify from the table of tasks and click it so that its details are displayed below the table. The details contain parameters that can be modified: the schedule of a task is modifiable, and for cron tasks the task parameters are also modifiable. The following figures display the panel that allows you to modify the task details. Figure 10-123 shows the panel for a cron task; in this example, the cron task MkSnapshotCron is shown. Here you can modify both the schedule for the task and the task parameters. Click Apply for the changes to be applied.

Figure 10-123 Panel to modify a CRON task- the Schedule and Task parameter can be modified


Figure 10-124 shows a GUI task. As you can see, the schedule for the task can be modified. Click Apply when done to apply the changes. In this example, we modify the GUI task FTP_REFRESH.

Figure 10-124 Panel to modify the GUI task FTP_REFRESH

10.11 Health Center


From the GUI you can access the Health Summary located in the left panel. The Health Summary provides detailed information about the components of your SONAS storage solution through the topology, alert, and system log features described in GUI tasks on page 314. These three features, together with the Call Home and SNMP trap features, make up the SONAS Health Center.

10.11.1 Topology
From the GUI, in the Health Summary section in the left panel, you can reach the topology feature. It displays a graphical representation of the various SONAS architectural components: the status of the Management and Interface Nodes, as well as the public and data networks, storage pods, file systems, and exports.


Overview
The first view, shown in Figure 10-125, gives you a big picture of your system; for more details on a specific area, expand that area.

Figure 10-125 SONAS Topology overview

In the Topology view, all components of your SONAS system are represented. When you move your cursor over a component, you see a tooltip, as shown in Figure 10-125. For more information about one of these components, click the appropriate link.

Topology layered displays and drilldown


The Topology view gives a quick cluster status overview; the displayed data is retrieved from the SONAS Health Center back-end database. The topology web page makes heavy use of the Dojo toolkit to display data, which is retrieved by AJAX calls at different intervals depending on the selected view. The level of detail of the displayed data is split into three layers, each going into greater detail.


Layer 1, shown in Figure 10-126, gives a short status overview of the main system components; the view is updated every 5 seconds. You can click a Layer 1 icon to drill down to the Layer 2 display.

In the Layer 1 view, each component is shown with its icon, title, details, and a status summary; clicking the component opens the Level 2 view, and clicking the status summary opens the Level 3 status view.

Figure 10-126 Topology view at layer 1

Layer 2, shown in Figure 10-127, displays details about the Interface Nodes and storage building blocks and is updated every 5 seconds. Clicking a Layer 2 icon brings up the Layer 3 view.
In this view, Interface Nodes are displayed by their logical internal names (for example, 001 represents int001st001); clicking a node opens the Level 3 view.

Figure 10-127 Topology view at layer 2


Layer 3, an example of which is shown in Figure 10-128, provides the deepest level of detail. Modal dialog windows are opened to display the details, which are updated every 30 seconds or can be refreshed manually by clicking the refresh icon.

Figure 10-128 Topology view at layer 3 example

All Interface, Management, and Storage Node detail views have the same tabs: Hardware, Operating System, Network, NAS Services, and Status.

Interface Node
For instance, if you need more information regarding the interface nodes because of the warning message in the previous figure, click the Interface Nodes (6) link in that figure and you will see information as shown in Figure 10-129.

Figure 10-129 Interface Nodes overview

The new window shows an overview of the Interface Node configuration of your SONAS Storage Solution. We can see here that the warning propagated to the global overview is not a warning for one particular node, but for all of them. To get more details for a given interface node, click the chosen node; you will see all related information, as described in Figure 10-130.


10.11.2 Default Grid view


In the following sections we describe the graphical version, which uses icons; you can switch to the list view by clicking the List tab beneath the Interface Node icons. The icon display is part of the default Grid view. This window provides information about several areas, such as the hardware, the operating system, the network, the NAS services, and the status, each in its corresponding tab. Figure 10-130 shows the Hardware section; within this section you can drill down to even finer granularity with the Motherboard, CPU, FAN, HDD, Memory Modules, Power, and Network Card tabs.

Figure 10-130 Interface Node Hardware information

The Operating System section provides the computer system details, the operating system details, and the local file system information, as shown in Figure 10-131.

Figure 10-131 Interface Node Operating System information


If you need information about the network status of that particular Interface Node, choose the Network section; there you will find information about all network bonding interfaces configured on the selected Interface Node, as shown in Figure 10-132.

Figure 10-132 Interface Node Network information

Similarly, if you need information about the NAS services or the Interface Node status, choose the appropriate tabs, as shown in Figure 10-133 and Figure 10-134.

Figure 10-133 Interface Node NAS Services information


Figure 10-134 Interface Node Status Message Information

The NAS Services section shows the status of all export services, such as CIFS, NFS, HTTP, FTP, and SCP, as well as the status of services such as CTDB and GPFS. The Status section gathers all of the previous information with more detail. Whereas the first three sections are static and contain only configuration information, the last two are dynamic; the warning icon seen at a higher level (the interface nodes level or the topology level) refers only to the Status section (NAS services issues are also included in the Status section), and more precisely to its first line, which shows the degraded level. After the issue is fixed, the warning icon disappears.

Management Node
Back in the Topology overview, if you are interested in Management Node information, click the Management Node link; you will see the same windows and hierarchy as described earlier for the Interface Nodes. The sections and tabs are identical, except that the NAS Services section is replaced by the Management section, as you can see in Figure 10-135.

Figure 10-135 Management Node Management information


Interface Network
From the Topology overview, you can also get information about the Interface Network by clicking the Interface Network link. There you will find information about the public IP addresses in use and the authentication method, as shown in Figure 10-136 and Figure 10-137.

Figure 10-136 Interface Network Public IPs information

Figure 10-137 Interface Network Authentication information

Data Network
Again from the Topology overview, if you need information about the data network (the InfiniBand network), click the Data Network link; you will see something similar to Figure 10-138. In the first tab, Network, you will find the state, IP address, and throughput for each InfiniBand connection, filtered by Interface, Management, and Storage Nodes in the left tabs. The second tab, Status, gathers information similar to the Status tab of each individual interface node in the Interface Node topology.


Figure 10-138 Data Network information

Storage Building Block


The last hardware component in the Topology overview is the Storage Building Block. It is shown as a list of the Storage Pods used for your SONAS file systems, on top of which you have built your SONAS shares. After you select this component, you will see a window similar to Figure 10-139, which represents the first (and, in this example, only) Storage Pod in use.

Figure 10-139 Storage Pod overview


If you click the Storage Pod Icon, you will see another familiar window that enumerates all components of this Storage Pod, as described in Figure 10-140.

Figure 10-140 Storage Controller view in Storage Building Block

The first tab describes the storage components of the Storage Pod. In our case we have a single Storage Controller, but a pod can have up to two Storage Controllers and two Storage Expansion units. This tab shows the storage details, in our example the controller details. The Status tab shows the same kind of details that you see in the Status tab of an Interface Node (Figure 10-134 on page 426). So far we have shown information related to the storage part of the Storage Pod. If you are looking for information related to the Storage Nodes, there is a dedicated tab for each of the two Storage Nodes inside the Storage Pod. If you click the Storage Node name tabs, you will find more detailed information, as shown in Figure 10-141 and Figure 10-142.

Figure 10-141 Storage Node view in the Storage Building Block


Figure 10-142 Second Storage Node in the Storage Building Block

For these two Storage Nodes, you find information similar to what we presented previously for the Interface Nodes. The only difference is the Storage tab, where you find information about the SONAS file system, as shown in Figure 10-143.

Figure 10-143 SONAS File System information for each Storage Node

The Storage Building Block view, where you can find any information about the Storage Pods used by your SONAS file systems, is the last hardware component of the SONAS Storage Solution. From the overview window you can also find information related to the file systems and the exported shares.


File System
From the overview window, you can request file system information by clicking the File System component. You will then see a window as shown in Figure 10-144.

Figure 10-144 SONAS File System information

This window shows typical file system information such as the device name, the mount point, the size, and the available space left. Each SONAS file system that is created results in one entry in this table.

Shares
As for the SONAS file systems, you can request information about the shares that you created from these file systems. To get this information from the Topology overview, click the appropriate component; you will see details as shown in Figure 10-145.

Figure 10-145 Shares information

In this window you can see the status, the name, and the directory associated with your share, and, more important, the protocols through which SONAS users can access the share. In our example the share is accessible by FTP, NFS, and CIFS. These last two components complete the Topology view of the Health Center. The following sections describe the system logs, Call Home, and SNMP features.


10.11.3 Event logs


Event logs are composed of two kinds of logs: alert logs and system logs. The Red Hat Linux operating system reports all internal information, events, issues, and failures in a syslog file located at /var/log/messages. Each Interface and Storage Node has its own syslog file. In SONAS, all nodes send their syslog entries to the management node, which consolidates them and displays them in the System Log, available from the GUI. It is a raw display of these files with some filtering tools, as shown in Figure 10-146. Each page displays around 50 log entries. System log entries have three levels: Information (INFO), Warning (WARNING), and Severe (SEVERE). You can filter the logs by log level, component, host, and more.

Figure 10-146 System Logs window
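Because the consolidated entries ultimately come from the nodes' syslog files, a quick command line spot check can complement the GUI view. The following lines are only a sketch, assuming root shell access to the management node; the path is the standard Red Hat syslog location mentioned above, and the filter pattern is arbitrary:

# Show the most recent consolidated syslog entries and keep only warnings and errors
tail -n 200 /var/log/messages | grep -iE 'warn|error|fail'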

The Alert Log panel extracts specific informational, warning, and critical events from the syslog and displays them in a summarized view. As SONAS administrator, this is the first log you should look at when investigating problems. Each page displays around 50 entries, one per event; an event can be an Info, Warning, or Critical message, displayed in blue, yellow, and red respectively. You can filter the entries in the table by severity, time period, and source, where the source is the host on which the event occurred, as shown in Figure 10-147.


Figure 10-147 Alert Logs window

The System Log panel displays system log events that are generated by the SONAS Software, which includes management console messages, system utilization incidents, status changes and syslog events. Figure 10-146 on page 432 shows how the System log panel in the GUI looks.

10.12 Call home


The SONAS Storage Solution has been designed to provide you with full support. We have described how to use the SONAS GUI to find information in the Topology overview or directly in the event logs. In addition, each SONAS hardware component is covered by at least one error detection method. One method is the Denali code, a Director API module that checks and monitors the Interface, Storage, and Management Nodes. Another is the System Checkout code, based on tape products, which monitors components such as the InfiniBand switches, Ethernet switches, Fibre Channel connections, and Storage Controllers. The last is the SNMP mechanism, used only inside SONAS, which monitors every component: servers, switches, and Storage Controllers.


The Denali method uses CIM providers, which are also used by the System Checkout method, and SNMP traps are converted into CIM providers as well. All of these methods feed the GUI Health Center event log described in the previous section. Depending on the severity of an issue, it can raise an Electronic Customer Care (ECC) Call Home. The Call Home feature is designed to report hardware events based on unique error codes, and it is configured as part of the first-time installation. It is used to send hardware events to IBM support. Call Homes are based only on Denali and System Checkout errors; SNMP traps do not initiate a Call Home. The machine models that call home are:
2851-SI1 Interface Node
2851-SM1 Management Node
2851-SS1 Storage Node
2851-DR1 Storage Controller
2851-I36 36-port InfiniBand switch
2851-I96 96-port InfiniBand switch
Note that there are no Call Homes against a 2851-DE1 Storage Expansion unit, because any errors from it call home against its parent 2851-DR1 Storage Controller. Similarly, any errors against the Ethernet switches call home against the 2851-SM1 Management Node. Figure 10-148 shows an example of a Call Home, which initiates an Error ID-based Call Home using an 8-character hexadecimal value as defined in the RAS Error Code Mapping File.

Figure 10-148 Sample Error ID Call Home

Figure 10-149 shows a Call Home test with the -t option:

Figure 10-149 Call Home test


Chapter 11. Migration overview
In this chapter we discuss how to migrate your existing file server or NAS filer to the SONAS system. Migration of data on file systems is more complex than migration of data on block devices, and there is no universal tool or method for file migration. We cover the following topics:
Migration of user authentication and ACLs
Migration of files and directories
Migration of CIFS shares and NFS exports


11.1 SONAS file system authentication


In this section we illustrate the authentication services offered by the SONAS file system.

11.1.1 SONAS file system ACLs


The SONAS file system is provided by GPFS technology, so most of the implementation details and considerations are similar to those in a GPFS environment; we refer interchangeably to the SONAS file system and the SONAS GPFS file system. The SONAS file system supports the NFSv4 ACL model and therefore offers much better support for Windows ACLs. For more information about NFSv4 ACLs, see section 5.11 of RFC 3530 at:
http://www.nfsv4.org/
NFSv4 ACLs are very different from traditional ACLs and provide much finer-grained control of file and directory access. The NFSv4 ACL attribute is an array of access control entries (ACEs). Although the client can read and write the ACL attribute, in the NFSv4 model the server does all access control based on the server's interpretation of the ACL. If at any point the client wants to check access without issuing an operation that modifies or reads data or metadata, it can use the OPEN and ACCESS operations to do so. With NFSv4 ACLs there is no concept of a default ACL; instead, there is a single ACL, and the individual ACL entries can be flagged as being inherited by files, directories, both, or neither. SONAS file ACLs can be listed by issuing the mmgetacl command, as shown in Example 11-1.
Example 11-1 mmgetacl output for a file
[root@plasma.mgmt001st001 b031]# mmgetacl pump0426_000019a4
#NFSv4 ACL
#owner:root
#group:root
special:owner@:rw-c:allow
 (X)READ/LIST (X)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (X)CHOWN (-)EXEC/SEARCH (X)WRITE_ACL (X)WRITE_ATTR (-)WRITE_NAMED
special:group@:r---:allow:DirInherit
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
user:redbook:r---:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
group:library:r---:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED
special:everyone@:r---:allow
 (X)READ/LIST (-)WRITE/CREATE (-)MKDIR (X)SYNCHRONIZE (X)READ_ACL (X)READ_ATTR (-)READ_NAMED
 (-)DELETE (-)DELETE_CHILD (-)CHOWN (-)EXEC/SEARCH (-)WRITE_ACL (-)WRITE_ATTR (-)WRITE_NAMED

An NFSv4 ACL consists of a list of ACL entries. The GPFS representation of an NFSv4 ACL uses three lines per entry, because of the increased number of available permissions beyond the traditional rwxc.


The first line has several parts separated by colons (:). The first two parts identify the user or group and the name of the user or group. The third part displays a rwxc translation of the permissions that appear on the subsequent two lines. The fourth part is the ACL type. NFSv4 provides both an allow and a deny type:
allow    Means to allow (or permit) those permissions that have been selected with an X.
deny     Means to not allow (or deny) those permissions that have been selected with an X.

The fifth, optional, and final part is a list of flags indicating inheritance. Valid flag values are:
FileInherit    Indicates that the ACL entry should be included in the initial ACL for files created in this directory.
DirInherit     Indicates that the ACL entry should be included in the initial ACL for subdirectories created in this directory (as well as the current directory).
InheritOnly    Indicates that the current ACL entry should NOT apply to the directory, but SHOULD be included in the initial ACL for objects created in this directory.

As in traditional ACLs, users and groups are identified by specifying the type and name. For example, group:staff or user:bin. NFS V4 provides for a set of special names that are not associated with a specific local UID or GID. These special names are identified with the keyword special followed by the NFS V4 name. These names are recognized by the fact that they end with the character @. For example, special:owner@ refers to the owner of the file, special:group@ the owning group, and special:everyone@ applies to all users.
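To give a feel for how such an entry is applied in practice, the following sketch dumps an ACL to a file, which can then be edited to add an inheritable entry and written back. It assumes that the GPFS mmputacl command is available alongside mmgetacl (as on the SONAS nodes shown above), and the path and group name are hypothetical:

# Dump the current ACL of a directory to a text file
mmgetacl -o /tmp/projects.acl /ibm/gpfs0/projects
# Edit /tmp/projects.acl, for example adding a group entry flagged with
# DirInherit in the same three-line format shown in Example 11-1,
# then apply the modified ACL back to the directory
mmputacl -i /tmp/projects.acl /ibm/gpfs0/projects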

11.1.2 File sharing protocols in SONAS


SONAS supports multiple file sharing protocols such as CIFS, NFS, FTP and others to access files over the network. We introduce some of these protocols and their implications with SONAS.

NFS protocol
The Network File System (NFS) protocol specifies how computers can access files over the network in a manner similar to how files are accessed locally. NFS is now an open standard and is implemented in most major operating systems. There are multiple versions of NFS: NFSv4 is the most current and emerging version, and NFSv3 is the most widespread in use. SONAS supports NFSv3 as a file sharing protocol for data access, and the SONAS file system implements NFSv4 ACLs. NFS is a client/server protocol in which the NFS client accesses data from an NFS server. The NFS server, and SONAS acts as an NFS server, exports directories. NFS allows parameters such as read only, read write, and root squash to be specified for a specific export. The NFS client mounts exported directories using the mount command. Security in NFS is managed as follows:
Authentication, the process of verifying whether the NFS client machine is allowed to access the NFS server, is performed on the IP address of the NFS client. NFS client IP addresses are defined on the NFS server when configuring the export.


Authorization, or verifying whether the user can access a specific file, is done based on the user and group of the originating NFS client, matched against the file ACLs. Because the user identity on the NFS client is passed as-is to the NFS server, a root user on an NFS client would have root access on the NFS server; to prevent an NFS client from gaining root access to the NFS server, you can specify the root_squash option.
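As a brief sketch of the client side (the cluster name and export path here are hypothetical), an NFSv3 client would list and mount a SONAS export as follows:

# List the directories exported by the SONAS interface nodes
showmount -e sonas.example.com
# Mount an export using NFSv3
mount -t nfs -o vers=3 sonas.example.com:/ibm/gpfs0/export1 /mnt/export1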

FTP protocol
File Transfer Protocol (FTP) is a protocol for copying files from one computer to another over a TCP/IP connection. FTP is a client/server architecture in which the FTP client accesses files from the FTP server. Most current operating systems support the FTP protocol natively, and so do most web browsers. FTP supports user authentication and anonymous users. SONAS supports FTP authentication through the SONAS AD/LDAP servers. File access authorization is done with ACL support: SONAS enforces ACLs and allows retrieval of POSIX attributes, but ACLs cannot be modified using FTP.
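As a minimal client-side sketch (host, user, and file names are hypothetical), a file on a SONAS export could be retrieved over FTP with a command line client such as curl, authenticating with AD/LDAP credentials:

# Retrieve a single file over FTP; curl prompts for the password
curl -u aduser ftp://sonas.example.com/export1/report.txt -o report.txt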

CIFS protocol
The protocol used in Windows environments to share files is the Server Message Block (SMB) protocol, sometimes called the Common Internet File System (CIFS) protocol. The SMB protocol originated at IBM, was later enhanced by Microsoft, and was renamed CIFS. Among the services that Windows file and print servers provide are browse lists, authentication, file serving, and print serving. Print serving is out of the scope of our discussion. Browse lists offer a service to clients that need to find a share using the Windows net use command or Windows Network Neighborhood. The file serving function in CIFS comprises the following functions:
Basic server function
Basic client function
Distributed File System (DFS)
Offline files/client-side caching
Encrypted File System (EFS)
Backup and restore
Anti-virus software
Quotas
The protocol also includes authentication and authorization and related functions such as:
NT Domains
NT Domain trusts
Active Directory
Permissions and Access Control Lists
Group policies
User profiles and logon scripts
Folder redirection
Logon hours
Software distribution, RIS, and Intellimirror
Desktop configuration control
Simple file serving in SONAS is relatively straightforward; however, duplicating some of the more advanced functions available on Windows servers can be more difficult to set up. SONAS uses the CIFS component to serve files. Authentication is provided through LDAP or AD, with or without the Microsoft SFU component. Authorization is supported using ACLs, which are enforced on files and directories for users with up to 1020 group memberships. Windows tools can be used to modify ACLs. ACL inheritance is similar, but not identical, to Microsoft Windows, and SACLs are not supported.
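As a quick sketch of client access (share name, domain, and host are hypothetical), a Linux client with the Samba tools installed can verify a SONAS CIFS export; a Windows client would simply map the share with net use or Windows Explorer:

# List the contents of a CIFS export; smbclient prompts for the password
smbclient //sonas.example.com/export1 -U 'EXAMPLE\aduser' -c 'ls'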


11.1.3 Windows CIFS and SONAS considerations


Windows clients can use the CIFS protocol to access files on the SONAS file server. The SONAS CIFS implementation is nearly transparent to users: SONAS behaves like a Windows file and print server, but cannot behave completely like a Windows 2000 or XP server. Server-side encryption, file transfer compression, and server-side compression are not supported. SONAS does not support signed SMB requests. SONAS can participate in a DFS infrastructure but cannot act as the DFS root. Transparent failover between nodes is supported, provided that the application supports network retry. To ensure consistency, files are synced to disk when CIFS closes the file. SONAS CIFS supports byte-range strict locking. SONAS supports lease management for client-side caching: it supports level 1 opportunistic locks (oplocks) but not level 2 oplocks. Files migrated by SONAS HSM are shown as offline files and marked with the hourglass symbol in Windows Explorer. SONAS supports the standard CIFS timestamps for the client:
Created timestamp: When the file was created in the current directory; when the file is copied to a new directory, a new timestamp is created.
Modified timestamp: When the file was last modified. When the file is copied to another directory, the modified timestamp remains the same.
Accessed timestamp: When the file was last accessed. This value is set by the application program; it is application dependent, and not all applications update this timestamp.
Metadata change timestamp: The last change to the file metadata.
SONAS snapshots are exported to Windows CIFS clients via the VSS API, which means that the snapshot data can be accessed through the Previous Versions dialog in Windows Explorer. SONAS supports case-insensitive file lookup for CIFS clients. SONAS also supports the DOS attributes on files; the read-only bit is propagated to the POSIX bits to make it available to NFS and other clients. SONAS supports the automatic generation of MS-DOS 8.3 character file names. Symbolic links are supported for clients such as Linux and Mac OS X that use the SMB UNIX extensions. Symbolic links are followed on the server side for Microsoft SMB clients, but they are not displayed as symbolic links; instead, the files or directories referenced by the links are shown. SONAS does not support access-based enumeration (also called hide unreadable), which hides directories and files from users that have no read access. SONAS does not support sparse files. SONAS does not support interoperation with WINS to appear in the Network Neighborhood of Windows clients. SONAS does not currently support the SMB2 and SMB2.1 enhancements of the SMB protocol introduced by Windows 2008 and Windows 7. Multiple protocols can be selected and configured when creating an export. If there is no specific need to configure multiple protocols, a single protocol should be used: this increases performance by not propagating leases and share modes into the Linux kernel, which is otherwise required to allow proper interaction with direct file system access by multiple protocols.


11.2 Migrating files and directories


When you deploy a new SONAS infrastructure in an existing environment, you might need to migrate files and directories from your current file servers to SONAS. The migration process needs to be planned carefully, because the migration of files and directories might require a considerable amount of time, and the migration of file metadata requires careful evaluation.

11.2.1 Data migration considerations


The migration of files and directories consists of copying the data from a source file server to the destination SONAS appliance using specific tools such as robocopy or rsync. These tools work by reading data from the source file server and writing it to the destination SONAS appliance, as shown in Figure 11-1. An intermediate system, the data mover, mounts both the old file server shares and the new SONAS appliance shares, and the migration tool that runs on it copies all the data over.

Figure 11-1 Migration scenario with data mover

In the diagram above, the data flows through the network twice: it is read from the old file server and written to the new SONAS. Depending on the type of file server, it might be possible to run the migration software directly on the old file server system and eliminate one network hop. The amount of time required to copy the data is affected by multiple factors, such as these:
The amount of data to migrate: the more data, the longer the migration takes.
The network bandwidth available for the migration: the greater the bandwidth, the shorter the time. One approach is to dedicate network links to the migration process. The data mover system has a greater bandwidth requirement, because it must read the data from the source system and write it out again to the destination system, so it needs twice the network bandwidth of the SONAS appliance. One way to reduce contention is to use two different adapters: one to the source filer and a separate one to the SONAS system.
The utilization of the file server: contention for file server resources might slow down the migration. The file server might still be in production use, so evaluate file server disk and server utilization before the migration.
Average file size: smaller files have more metadata overhead to manage for a given amount of data and so take longer to migrate.
Disk fragmentation on the source file server, which might slow down the reading of large sequential files.

Applications and users typically need access to a whole export or share, or to a whole subdirectory tree within an export or share. In general, an application cannot work with only a subset of its directories or shares. Consequently, the migration of files requires downtime: during the migration some files are already migrated while others are not, and there is no mechanism to synchronize the migration with user or application access to the files, so applications and users cannot access the data while it is being migrated. The data migration process can be executed in different ways:

Migration of a file server in a single step:
- Needs a long downtime for larger file servers
- Requires downtime for all applications and users
- The IP address of the old file server can be replaced by the IP address of SONAS in the DNS server
- The file access path does not change from an application or user point of view

Migration of a file server one share or export after the other:
- Shorter downtime than the single-step approach
- Requires downtime only for some applications and users
- A DNS update does not work, because the old file server and SONAS run in parallel
- Applications and users must use the new access path once their files are migrated, which requires client-side changes

Migration of a file server one subdirectory after the other:
- Requires a shorter downtime than the case above
- Same considerations as for migration by share or export

The use of tools that allow incremental resynchronization of changes from source to target opens up additional possibilities; there are two options:
Stop the client applications, copy the files from the source system to the destination, and then redirect the clients to the SONAS target. This approach requires a potentially long downtime while all the data is copied.
Copy the data to the SONAS target while clients continue to access it. After most of the data has been copied, the client applications are stopped and only the modified data is copied again. This approach reduces the downtime to a resynchronization of the data updated since the last copy. It requires that the file copy tool you use supports incremental file resynchronization.
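The following sketch illustrates the second, incremental option with rsync, run on a Linux data mover that has both the old filer and the SONAS export mounted (the mount points are hypothetical). The -a option preserves ownership, permissions, and timestamps; -A and -X additionally carry POSIX ACLs and extended attributes where the mounted file systems support them:

# First pass: bulk copy while users are still working on the old filer
rsync -aAX --delete /mnt/oldfiler/export1/ /mnt/sonas/export1/
# ...stop client access to the share, then resynchronize only the changes
rsync -aAX --delete /mnt/oldfiler/export1/ /mnt/sonas/export1/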

11.2.2 Metadata migration considerations


The migration of file metadata needs careful planning. SONAS 1.1 stores access control lists as GPFS NFSv4 ACLs; this is possible because GPFS supports the NFSv4 ACL model. UNIX POSIX permission bits are mapped to GPFS ACLs by SONAS internally. Windows CIFS ACLs are mapped to GPFS ACLs, and Windows DOS attributes such as hidden, system, archive, and read-only are stored in the GPFS inode. Data migration tools such as xcopy and robocopy need to copy the metadata correctly. NFSv3 installations can use standard UNIX tools such as cp and rsync to copy the data from the source to the destination system. Files owned by root might require special attention, because different UNIX platforms might have different default groups for root; also, the root squash export option might remap the root user UID/GID to a different UID/GID, the one assigned to the nobody account on the SONAS server or on the data mover machine. NFSv4 client access is currently not supported by SONAS.


Installations using CIFS client access can use standard Windows tools for file migration, such as xcopy or robocopy. SONAS ACLs are not fully interoperable with Windows ACLs. If you have complex ACL structures, for example structures that contain large numbers of users and groups or nested groups, an expert assessment of the ACL structure is strongly preferable, and a proof of concept might be needed to verify differences and develop a migration strategy. If you have a mixture of NFSv3 and CIFS access to your file systems, you must decide whether to use Windows or UNIX copy tools, because only one tool can be used for the migration. As Windows metadata tends to be more complex than UNIX metadata, we suggest that you use the Windows migration tools and then verify that the UNIX metadata is copied correctly. Additional challenges might be present when you must migrate entities such as sparse files, hard and soft links, and shortcuts. For example, using a program that does not support sparse files to read a sparse file that occupies 10 MB of disk space but represents 1 GB of data causes 1 GB of data to be transferred over the network. You need to evaluate these cases individually to decide how to proceed.
ACLs: The migration of ACLs makes sense only when the destination system will operate within the same security context as the source system, meaning that they will use the same AD or LDAP server.
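As an illustration only (the UNC paths are hypothetical), a robocopy invocation on a Windows data mover that carries the NTFS security information along with the data could look like the following; /MIR mirrors the directory tree, /COPYALL copies data, attributes, timestamps, ACLs, owner, and auditing information, and /R and /W limit the retries on locked files:

robocopy \\oldfiler\projects \\sonas\projects /MIR /COPYALL /R:2 /W:5 /LOG:C:\mig\projects.log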

11.2.3 Migration tools


There are multiple tools that can be used to migrate files from one file server to another file server or to a SONAS appliance. We illustrate some of the available tools:

xcopy - The xcopy utility is an extended version of the Windows copy command and comes with the Windows NT and 200x operating systems. It was developed to copy groups of files between a source and a destination. As of Windows 2000, xcopy can copy file and directory ACLs, which were not copied in Windows NT. xcopy is deprecated and superseded by the robocopy command. xcopy should be used to migrate CIFS shares.

robocopy - Robocopy is a Windows tool developed as a follow-on to xcopy. It was introduced with the Windows NT 4.0 resource kit and has become a standard feature of Windows Vista, Windows 7, and Windows Server 2008. Robocopy offers mirroring of directories and can copy NTFS file data together with attributes, timestamps, and NTFS ACLs. It supports restart after network failure and also bandwidth throttling. It also supports a mirror mode to align the contents of directories and remove destination files that have been removed from the source directory. Robocopy offers a GUI interface that can be used to execute file migrations or to generate a command line script for deferred execution. The robocopy utility should be used to migrate CIFS shares.

richcopy - Richcopy is a freely available Microsoft tool, similar to robocopy, that has a GUI interface. One of the advantages of richcopy is that it allows multithreading of copy operations, which can improve migration performance.

secure copy - The secure copy (scp) command is an implementation of the secure copy protocol. It allows you to securely transmit files and directories, including timestamps and permissions. It should be used to transport data between NFS shares.


rsync - The rsync tool is a UNIX application to synchronize files and directories between different servers. It minimizes data transfer because it can find and transmit only the file differences, which is useful when performing data migration in incremental steps. The rsync tool supports compression and encrypted transmission of data, and it offers bandwidth throttling to limit both bandwidth usage and load on the source system. It supports the copying of links, devices, owners, groups, permissions, and ACLs; it can exclude files from the copy and can copy links and sparse files. The rsync tool should be used to transport data between NFS shares.

net rpc share - Samba offers a utility called net that, when used as net rpc share migrate files, can copy files and directories with full preservation of ACLs and DOS file attributes. To use this utility to migrate files from an existing Windows file server to a SONAS system, you need a separate data mover system running Linux with Samba. This migration approach can be used to transport data between CIFS shares. For more information about the commands, see:
http://www.samba.org/samba/docs/man/Samba-HOWTO-Collection/NetCommand.html

There are other tools for file and directory copy, but they are outside the scope of our discussion. Whatever tool is chosen, you should test the migration process and the resulting migrated files and directories before performing the real migration and switchover. Special care should be taken in verifying the migration of permissions and ACLs. Tools such as Brocade VMF/StorageX have been discontinued. For information about the F5 file virtualization solution, see:
http://www.f5.com/solutions/technology-alliances/infrastructure/ibm.html
There are various products on the market that can perform transparent file migration from a source file server to a destination file server such as SONAS. These products act as a virtualization layer that sits between the client application and the file servers and migrates data in the background while redirecting user access to the data. The F5 intelligent file virtualization solutions enable you to perform seamless migrations between file servers and NAS devices such as SONAS: no client reconfiguration is required, and the migration process runs in the background without impacting user access to data. For more information, see:
http://www.f5.com/solutions/storage/data-migration/
The AutoVirt file virtualization software offers a policy-based file migration function that can help you schedule and automate file migration tasks and then perform the file migration activities transparently in the background while applications continue to access the data. For more information, see:
http://www.autovirt.com/
The Samba suite offers a set of tools that can assist in migration to a Linux Samba implementation. The net rpc vampire utility can be used to migrate one or more NT4 or later domain controllers to a Samba domain controller running on Linux: the vampire utility acts as a backup domain controller and replicates all definitions from the primary domain controller. Samba also offers the net rpc share migrate utility, which can be used in multiple ways:
net rpc share migrate all - migrates shares from a remote server to a destination server.
net rpc share migrate files - migrates files and directories from a remote server to a destination server.
net rpc share migrate security - migrates share ACLs from a remote server to a destination server.
net rpc share migrate shares - migrates share definitions from a remote server to a destination server.

11.3 Migration of CIFS shares and NFS exports


After having migrated the files and their related permissions, users need to access the shares over the network using a protocol such as CIFS or NFS. The access path for a user or an application comprises the following components:
The DNS name or the IP address of the file server
The name of the CIFS network drive or the NFS export directory
The file path inside the network drive or export directory, also called the subdirectory tree and file name
After migration to the new server, you can either change the IP address of the file server on the clients to point to the new server, or change the IP address behind the DNS name to point to the new server; the latter approach requires less effort, because only a DNS change is required. It can only be used with an offline migration because, in coexistence mode, when both the old and the new file servers are in use, two distinct IP addresses are required. The name of the CIFS network drive or of the NFS export should be kept unchanged on the new server to simplify the migration. The file path should also be kept unchanged by the file and directory migration process to minimize application disruption. Samba offers the net rpc share migrate shares utility, which can be used to copy share definitions from a source file server to a destination file server, and the net rpc share list command can be used to get a list of all shares on a source server.
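For example (the server name and credentials are hypothetical), the shares defined on the source server can be listed from a Linux data mover with the Samba net utility before recreating them on SONAS:

# List the CIFS shares defined on the source file server
net rpc share list -S oldfiler -U administrator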

11.4 Migration considerations


File data migration in NAS environments is quite different from the block data migration that is traditionally performed on storage subsystems. In a block environment you migrate LUNs in relatively straightforward ways. When you migrate files in a NAS environment, you have to take into account additional aspects, such as the multiple access protocols that can be used, the multiple security and access rights mechanisms that the customer uses, and how these fit in with SONAS. The challenges in migrating file data are:
Keeping downtime to a minimum, or even achieving no downtime
Ensuring there is no data loss or corruption
Consolidation, where multiple source file servers are migrated into one target
For completeness of our discussion, one way to avoid migration challenges is to avoid data migration altogether and repopulate the new environment from scratch. This is easier to do for specific environments such as digital media or data mining, but it can be problematic for user files, because you cannot expect end users to start from scratch with their files. When planning the migration, you should consider the downtime required to perform it, during which the file systems will not be accessible to the users. The downtime is the time taken to copy all the files from the source file system to the target file system and to reconfigure the users to point to the target file system, and it is proportional to the amount of data to be copied. To reduce the downtime, you might plan to do an initial copy of all the data while users keep accessing it, then terminate user access and copy only the files that have changed since the full copy was completed, thus reducing the downtime required to perform the migration.


11.4.1 Migration data collection


Before initiating file data migration to SONAS, gather information about the customer environment and the amount of data to be migrated. Table 11-1 illustrates the main information that is required to plan for file data migration.
Table 11-1 Migration data collection (Value - Description)
number of filesystems - How many individual filesystems need to be migrated.
total amount of data (GB) - The quantity of data to migrate in all filesystems, in GB. This is the used space in the filesystems, not the allocated space.
average file size (MB) - The average file size across all filesystems.
size of largest filesystem (GB) - Whether there are any single, very large filesystems.
list of filesystems with size and number of files - A list of all filesystems with the size and number of files for each one is useful for detailed planning, if available.
data change rate (GB) - The amount of data that is changed in the filesystems on a daily or weekly basis. This can be obtained from backup statistics, because the change rate corresponds to the number of files backed up each day with incremental backups.
number of users - The number of users accessing the filesystems.
authentication protocols - What kind of protocols are currently being used for authentication: AD, LDAP, Samba PDC?
Windows to UNIX sharing - Are files shared between Windows and UNIX environments? What kind of method is used for mapping Windows SIDs to UNIX UIDs/GIDs?
network bandwidth - How are the filers connected to the network? What is the maximum bandwidth available for the data migration? Will the data migration impact the same network connections used by the end users?
routers and firewalls - Are there routers and firewalls between the source and destination file servers?

With this information you can start to estimate aspects such as the amount of time it will physically take to migrate the data, in what timeframe this can be done and what impact it will have on users.

11.4.2 Types of migration approaches


There are multiple migration methods, including these:
Block-level migration, which is not applicable in our case but is included to show the differences
File system migration based on a network copy from source to target
File-level backup and restore
A block-level migration can be performed when moving data to a new file server using the same operating system platform. The advantages of block-level migration are that you do not have to concern yourself with file metadata, file ACLs, or permissions, the migration can often be performed transparently in the background, and it is fast. The possible disadvantages are that it might require additional hardware with associated installation costs


and might require multiple service interruptions to introduce and remove the migration appliance. This approach is not applicable to SONAS, because SONAS is an appliance and the data to migrate comes from external systems. A file system migration uses tools to copy files and file permissions from a source file server to a target file server. The advantages are that there are multiple free software tools, such as xcopy and rsync, that are relatively easy to set up and require little or no new hardware. The disadvantages are that the migration of ACLs needs administrative account rights for the duration of the migration, that it is generally slower than block-level migration because the throughput is gated by the network and the migration server, and that you must plan for mapping CIFS to NFS shares. File-level backup and restore is also a viable migration option. It has to be a file-level backup, so NDMP backups are not an option, because they are full backups written in an appliance-specific format. Also, the file-level backups have to come from the same operating system type as the target system, so in the case of SONAS the source system should be a UNIX or Linux system. The advantages of this approach are that it is fast, the backup environment is most likely already in place, and there are minimal issues with ACLs and file attributes. The possible disadvantages are that restores from these backups need to be tested before the migration date, that tape drives might get clogged up by the migration so scheduled backups might be at risk, and that network congestion can occur if there is no dedicated backup network. The diagram in Figure 11-2 shows the components and flows for a file system copy migration.

Figure 11-2 File system copy migration flow (components: UNIX NFS client, Windows CIFS client, authentication server with AD or LDAP, source file server, Windows server for CIFS with robocopy, Linux server for NFS with rsync, target SONAS; flows: authentication, data access, data migration)

First, note that all components share one common authentication service (3) that runs one of the protocols supported in a SONAS environment. The UNIX and Windows clients (6) are connected to the source file server (1). We have one server per file sharing protocol to be migrated: for UNIX file systems we use the Linux server with rsync (5), and for Windows file systems we use the Windows server with robocopy (4). The Linux server (5) and the Windows server (4) connect to both the source file server (1) and the target SONAS (2) over the customer LAN. The rsync or robocopy utilities running on these servers read the file data and metadata from the source file server (1) and copy them, file by file, to the target SONAS (2). The migration steps in this scenario are as follows:
1. Copy one group of shares or exports at a time from the source filer to SONAS.
2. Shut down all clients using those shares or exports.
3. Copy any files that have changed since the last copy.
4. Remap the clients to access SONAS and restart the clients.


11.4.3 Sample throughput estimates


The performance you can achieve when migrating data depends on many factors. We show a sample scenario based on the following environment:
Network: a 10 Gb Brocade TurboIron X24 Ethernet switch and a 1 Gb Cisco Ethernet switch
Servers: an IBM System x3650-M2 Windows server and an IBM System x3650-M2 Linux server, each with a QLogic 10 Gb Ethernet card and a 1 Gb Broadcom Extreme II adapter
Storage: an N series N6070 with 28 disks for the NFS share, an N series N6070 with 28 disks for the CIFS share, and a SONAS 2851 base configuration (3 interface nodes, 2 storage nodes, 120 disk drives: 60 SAS and 60 SATA)
Here are the migration throughput test results that were obtained in this environment:
For 10 Gb Ethernet:
- rsync, NFS mount on the Linux server with 10 GbE: 140 MB/s
- robocopy, CIFS share on the Windows 2008/2003 server with 10 GbE: 70 MB/s
- richcopy, CIFS share on the Windows 2008/2003 server with 10 GbE: 90 MB/s
For 10 Gb Ethernet with jumbo frames enabled:
- rsync, NFS mount on the Linux server with 10 GbE: 140 MB/s
- robocopy, CIFS share on the Windows 2008/2003 server with 10 GbE: 95 MB/s
- richcopy, CIFS share on the Windows 2008/2003 server with 10 GbE: 110 MB/s
For 1 Gb Ethernet:
- rsync, NFS mount on the Linux server with 1 GbE: 35-40 MB/s
- robocopy, CIFS share on the Windows 2008/2003 server with 1 GbE: 25-35 MB/s
- richcopy, CIFS share on the Windows 2008/2003 server with 1 GbE: 28-35 MB/s
To get a definitive performance estimate for your environment, run migration test runs in which you copy data to the new server without affecting or modifying end user access to the source file server.

11.4.4 Migration throughput example


Let us discuss one migration example. Our company has 2000 users. Each user has a mailbox file of about 500 MB and an archive file of 1.5 GB. Assuming the mailbox file changes about 25% a day per user, we calculate the daily change rate:
500 MB * 2000 * 0.25 = 250000 MB, or about 244 GB/day
Assuming we can start the migration after office hours, we have a window of about 8-10 hours. To copy 244 GB in 10 hours (about 25 GB/h) requires a migration speed in the order of 6.9 MB/s, or 55 Mb/s.


Assume that after test runs in our environment we measured the following throughputs:
63 MB/s (504 Mb/s) on a 10 Gb link
48 MB/s (about 380 Mb/s) on a 1 Gb link
In this case the migration of the new data would last about 1.1 hours on the 10 Gb link and 1.44 hours on the 1 Gb link. In our migration test this translates to about 1 to 1.5 hours of migration time for 244 GB of data, and a maximum amount of data change per day of about 1.3 TB, assuming a 6-hour migration window and a 63 MB/s data rate on the 10 Gb link. Continuing with the example above, in addition to the 244 GB/day of changes to the mailboxes, users also archive their changes. Assuming that the complete archive file is also migrated, this results in the following durations:
10 Gb link: (500 MB + 1500 MB) * 2000 / 63 MB/s = about 17 hours
1 Gb link: (500 MB + 1500 MB) * 2000 / 48 MB/s = about 23 hours
In this case the migration would run longer than the allocated window. You now have two options: split the migration load across two separate migration servers, or run the migration tool more frequently, because most tools only migrate the delta between the source and the target files. As mentioned before, the right approach will probably only be determined by test runs.
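The arithmetic above can be reproduced with a few one-line calculations, for example with bc on any Linux system; the figures are the ones used in this example:

echo "2000 * 500 * 0.25 / 1024" | bc -l          # ~244 GB changed per day
echo "2000 * 500 * 0.25 / (10 * 3600)" | bc -l   # ~6.9 MB/s needed for a 10-hour window
echo "2000 * (500 + 1500) / 63 / 3600" | bc -l   # ~17.6 hours for a full copy at 63 MB/s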


Chapter 12. Getting started with SONAS


In this chapter we take you through common tasks from the basic implementation through monitoring your SONAS system, using a scenario that we developed based on our experience writing this book.


12.1 Quick start


In this scenario, we go through tasks in a day in the life of a SONAS administrator. These tasks include the following:
Log in using the CLI and GUI
Create a new administrative user
Create a file system
Create exports
Access the export from a client
Create and access a snapshot
Back up and restore files with Tivoli Storage Manager
Create and apply a policy
Monitoring
Add a new public IP address

12.2 Connecting to the SONAS appliance


We can connect to the SONAS appliance using either the command line interface (CLI) or the graphical user interface (GUI). We show how to connect to the SONAS appliance at address 9.11.102.6, which is the public IP address of the management node in our environment; your address will be different.

12.2.1 Connecting to the SONAS appliance using the GUI


To connect to the appliance using the GUI, open a web browser. In our environment we use the following URL to connect:
https://9.11.102.6:1081/ibm/console
You will see a login screen, as shown in Figure 12-1.

Figure 12-1 SONAS login screen


Enter the administrator user and password and click the Log in button; you will be connected to the SONAS administration GUI, as shown in Figure 12-2. Note that the default user ID and password for a newly installed SONAS appliance are root and Passw0rd.

Figure 12-2 SONAS GUI welcome screen

Help: You can access SONAS help information from the SONAS appliance at the following URL: https://9.11.102.6:1081/help

12.2.2 Connecting to the SONAS appliance using the CLI


To connect to the appliance using the CLI, start an ssh client session to the SONAS management node public address, as shown in Example 12-1.
Example 12-1 Connect to SONAS using CLI

# ssh root@9.11.102.6
root@9.11.102.6's password:
Last login: Mon Aug 3 13:37:00 2009 from 9.145.111.26


12.3 Creating SONAS administrators


SONAS administrators manage the SONAS cluster. You can create an administrator using either the GUI or the command line.

12.3.1 Creating a SONAS administrator using the CLI


To create a SONAS administrator using the CLI, use the mkuser command, as shown in Example 12-2.
Example 12-2 Create admin user via CLI

[root@sonasisv.mgmt001st001 ~]# mkuser my_admin -p segreta
EFSSG0019I The user my_admin has been successfully created.

We can list the users with the lsuser command, as shown in Example 12-3.
Example 12-3 List users via CLI

[root@sonasisv.mgmt001st001 ~]# lsuser
Name     ID  GECOS Directory      Shell
cluster  901       /home/cluster  /usr/local/bin/rbash
cliuser  902       /home/cliuser  /usr/local/bin/rbash
my_admin 903       /home/my_admin /usr/local/bin/rbash

12.3.2 Creating a SONAS administrator using the GUI


From the SONAS GUI, select Settings -> Console User Authority; you will see a list of defined users, as shown in Figure 12-3.

Figure 12-3 Display the user list via GUI

To add a user created via the CLI to the GUI, for example, my_admin, you use the Add button in the Console User Authority window as shown in Figure 12-3.


You will see a panel as shown in Figure 12-4. Enter the administrator name and select the administrator role to grant this administrator maximum privileges and click OK.

Figure 12-4 Create a new admin user via the GUI

SONAS offers multiple GUI administrative roles to limit the administrator's working scope within the GUI. The following roles are available:
administrator - Has access to all features and functions provided by the GUI. It is the only role that can manage GUI users and roles, and it is the default when adding a user with the CLI.
operator - Can check cluster health, view the cluster configuration, verify system and file system utilization, and manage thresholds and notification settings.
export administrator - Allowed to create and manage shares, plus perform the tasks the operator can execute.
storage administrator - Allowed to manage disks and storage pools, plus perform the tasks the operator can execute.
system administrator - Allowed to manage nodes and tasks, plus perform the tasks the operator can execute.

Roles: These user roles only limit the working scope of the user within the GUI. This limitation does not apply to the CLI, which means the user has full access to all CLI commands.

12.4 Monitoring your SONAS environment


You can monitor your SONAS system using the GUI. The GUI offers multiple tools and interfaces to view the health of the system. Selected resources can also be monitored using the command line.


12.4.1 Topology view


Select Health Summary -> Topology view, as shown in Figure 12-5.

Figure 12-5 Topology view from SONAS GUI

The Topology view offers a high-level overview of the SONAS appliance; it highlights errors and problems and allows you to quickly drill down to get more detail on individual components. The Topology view covers the following components:
Networks: interface and data networks
Nodes: interface, management, and storage
File systems and exports


In Figure 12-6 we see that an interface node is in critical status, because it is flagged with a red circle with an x inside. To expand the interface nodes, click the blue Interface Nodes link or the plus (+) sign at the bottom right of the interface nodes display; you will see the interface node list and the current status, as shown in Figure 12-6.

Figure 12-6 Interface node status list from Topology view

To see the reason for the critical error status for a specific node, click the node entry in the list and you get a status display of all events as shown in Figure 12-7.

Figure 12-7 Node status messages


The first line shows that the problem originated from a critical SNMP error. After you have corrected the error situation, you can mark it as resolved by right-clicking the error line and clicking the Mark Selected Errors as Resolved box as shown in Figure 12-8.

Figure 12-8 Marking an error as resolved

From the Topology view you can display and easily drill down to SONAS appliance information. For example, to view filesystem information, click the Filesystems link in the Topology view as shown in Figure 12-9.

Figure 12-9 Open filesystem details information

You will see a display similar to the one shown in Figure 12-10.

Figure 12-10 Filesystem details display

If you click the new window sign as shown in Figure 12-11 you will see the SONAS filesystem configuration window.

Figure 12-11 Open filesystem page


Figure 12-12 shows the SONAS filesystem configuration window.

Figure 12-12 Filesystems page display

12.4.2 SONAS logs


SONAS offers multiple logs for monitoring its status. The alert log contains SONAS messages and events. It is accessed from Health Summary → Alert Log, and a sample is shown in Figure 12-13.

Figure 12-13 Alert log display

The system log contains operating system messages and events. It is accessed from Health Summary → System Log, and a sample is shown in Figure 12-14.

Figure 12-14 System log display


12.4.3 Performance and reports


The performance and reports option allows you to generate and display SONAS hardware component and filesystem utilization reports.

Hardware component reports


Select Performance and Reports → System Utilization and you will see a list of SONAS nodes as illustrated in Figure 12-15.

Figure 12-15 System utilization display

You can report on CPU, memory, network, and disk variables, and generate reports covering from one day up to 3 years. To generate a disk I/O report for strg001st001, select the storage node, select Disk I/O as the Measurement Variable, select Monthly Chart for the Measurement Duration, and click the Generate Charts button. You will get a chart as illustrated in Figure 12-16.

Figure 12-16 Local disk IO utilization trend report


Filesystem utilization
Select Performance and Reports → Filesystem Utilization and you will see a list of SONAS filesystems as illustrated in Figure 12-17.

Figure 12-17 Filesystem utilization selection screen

You can generate space usage charts by selecting a filesystem and a duration such as Monthly chart. Click Generate Charts and you will get a chart as shown in Figure 12-18.

Figure 12-18 Disk space utilization trend report

12.4.4 Threshold monitoring and notification


SONAS can be configured to monitor specific events and thresholds and to send emails and SNMP traps. To set up notification, connect to the SONAS GUI and select SONAS Console Settings → Notification Settings. This brings up the settings screen. Enter the values on the panel illustrated in Figure 12-19 and click the Apply button.


Figure 12-19 Notification settings panel

The next step is to configure notification recipients. Select SONAS Console Settings → Notification Recipients → Add Recipient and you are presented with the panel shown in Figure 12-20.

Figure 12-20 Add recipients panel


The notification recipients screen is now updated with the email recipient as shown in Figure 12-21.

Figure 12-21 Notification recipients panel

You can monitor specific utilization thresholds by going to SONAS Console Settings → Utilization Thresholds. You will see a panel as illustrated in Figure 12-22; click the Add Thresholds button.

Figure 12-22 Utilization threshold display panel


You are prompted for a threshold to monitor from the following list:
File system usage
GPFS usage
CPU usage
Memory usage
Network errors
Specify warning and error levels and also recurrences of the event as shown in Figure 12-23 and click OK.

Figure 12-23 Add new utilization thresholds panel

12.5 Creating a filesystem


You can create a new filesystem using either the GUI or the CLI.

12.5.1 Creating a filesystem using the GUI


To create a new filesystem using the GUI, select Files → Filesystems and you are presented with the panel shown in Figure 12-24.

Figure 12-24 Filesystems display panel


Click the Create a File System button and you will be presented with the panel shown in Figure 12-25.

Figure 12-25 Create filesystem panel - select NSDs

On this panel you will see multiple tabs:
Select NSDs - to choose which Network Shared Disks (NSDs) to use
Basic - to select mount point, block size, and device name
Locking and ACLs
Replication settings
Automount settings
Limits - maximum nodes
Miscellaneous - for quota management settings
Choose one or more NSDs on the Select NSDs tab, then select the Basic tab and specify the mount point and device name as shown in Figure 12-26. Accept the defaults for all remaining options.

Figure 12-26 Create filesystem panel - basic information

Now click the OK button at the bottom of the screen (not shown in our example). A progress indicator is displayed as shown in Figure 12-27. Click Close to close the progress indicator.


Figure 12-27 Filesystem creation task progress

After completion you will see the filesystems list screen with the new redbook filesystem as shown in Figure 12-28.

Figure 12-28 Filesystem display panel

Tip: To display additional information about a given filesystem, click the filesystem name in the list. The name will be highlighted and the detailed filesystem information for the selected filesystem will be shown.


Note that the redbook filesystem is not mounted, as 0 nodes appears in the Mounted on Host column shown in Figure 12-28. Select the redbook filesystem entry and click the Mount button and you will be presented with a box asking where to mount the filesystem. Select Mount on all nodes and click OK as shown in Figure 12-29.

Figure 12-29 Filesystem mount panel

The file system will now be mounted on all interface nodes and on the management node.

12.5.2 Creating a filesystem using the CLI


To create a filesystem called redbook2 using the command line proceed as follows. List available disks using the lsdisk command as shown in Example 12-4.
Example 12-4 Check available disks with lsdisk

[sonas02.virtual.com]$ lsdisk
Name     File system Failure group Type            Pool     Status Availability Timestamp
gpfs1nsd gpfs0       1             dataAndMetadata system   ready  up           4/21/10 11:22 PM
gpfs2nsd gpfs0       1             dataAndMetadata system   ready  up           4/21/10 11:22 PM
gpfs3nsd redbook     1             dataAndMetadata system   ready  up           4/21/10 11:22 PM
gpfs4nsd             1             dataOnly        userpool ready               4/21/10 10:50 PM
gpfs5nsd             1             dataAndMetadata system   ready               4/21/10 10:50 PM
gpfs6nsd             1                             system   ready               4/21/10 11:58 PM

To create a new filesystem using the gpfs5nsd disk, use the mkfs command as shown in Example 12-5.
Example 12-5 Create the file redbook2 filesystem

mkfs redbook2 /ibm/redbook2 -F "gpfs5nsd" --noverify -R none

To list the new filesystems you can use the lsfs command as shown in Example 12-6.
Example 12-6 List all filesystems
[sonas02.virtual.com]$ lsfs
Cluster             Devicename Mountpoint    Type  Remote device Quota              Def. quota Blocksize Locking type ACL type Inodes  Data replicas Metadata replicas Replication policy Dmapi Block allocation type Version Last update
sonas02.virtual.com gpfs0      /ibm/gpfs0    local local         user;group;fileset            64K       nfs4         nfs4     33.536K 1             1                 whenpossible       F     cluster               11.05   4/22/10 1:34 AM
sonas02.virtual.com redbook    /ibm/redbook  local local         user;group;fileset            256K      nfs4         nfs4     33.792K 1             1                 whenpossible       T     scatter               11.05   4/22/10 1:34 AM
sonas02.virtual.com redbook2   /ibm/redbook2 local local         user;group;fileset            256K      nfs4         nfs4     33.792K 1             1                 whenpossible       T     scatter               11.05   4/22/10 1:34 AM

lsfs command: The lsfs command returns a subset of the information available in the SONAS GUI. Information not available from the lsfs command includes whether and where the file system is mounted, and space utilization. To get this information from the command line, you need to run GPFS commands as root.


To make the filesystem available you mount it on all interface nodes using the mountfs command as shown in Example 12-7.
Example 12-7 Mount filesystem redbook2

[sonas02.virtual.com]$ mountfs redbook2
EFSSG0038I The filesystem redbook2 has been successfully mounted.

The file system can also be unmounted as shown in Example 12-8.
Example 12-8 Unmount filesystem redbook2

[sonas02.virtual.com]$ unmountfs redbook2
EFSSG0039I The filesystem redbook2 has been successfully unmounted.

12.6 Creating an export


You can configure exports using either the GUI or the CLI.

12.6.1 Configuring exports using the GUI


Connect to the SONAS GUI and select Files → Exports and a screen similar to Figure 12-30 will be displayed.

Figure 12-30 Exports configuration


To create a new export for the redbook filesystem, click the Add button and you will see the first screen of the export configuration wizard as shown in Figure 12-31. Select an export name and directory path, select the protocols you want to configure, and click the Next> button.

Figure 12-31 Export configuration wizard

You are presented with an NFS configuration screen, shown in Figure 12-32. Add a client called * that represents all hostnames or IP addresses used by the clients. Unselect the read only and root squash attributes and click the Add Client button. When all clients have been added, click the Next button.


Figure 12-32 Export configuration wizard NFS settings

You are now presented with the CIFS configuration screen shown in Figure 12-33. Accept the defaults and click the Next button.

Figure 12-33 Export configuration wizard CIFS settings


On the last screen click the Finish button to finalize the configuration. Close the task progress window that will appear and you will see the exports list screen shown in Figure 12-34.

Figure 12-34 Export list screen

Tip: To display additional information about a given export, click the export name in the list. The name will be highlighted and the detailed export information for the selected export will be shown below.

12.6.2 Configuring exports using the CLI


Use the mkexport command to create an export using the CLI.
Example 12-9 Create an export using CLI
[sonas02.virtual.com]$ mkexport my_redbook2 /ibm/redbook2/export1 --cifs "browseable=yes" --owner "VIRTUAL\administrator"
EFSSG0019I The export my_redbook2 has been successfully created.

To list the newly created export, use the lsexport command as shown in Example 12-10.
Example 12-10 List all defined exports

[sonas02.virtual.com]$ lsexport -v
Name        Path                  Protocol Active Timestamp       Options
my_redbook  /ibm/redbook/export1  NFS      true   4/22/10 3:05 AM *=(rw,no_root_squash,fsid=1490980542)
my_redbook  /ibm/redbook/export1  CIFS     true   4/22/10 3:05 AM browseable
my_redbook2 /ibm/redbook2/export1 CIFS     true   4/22/10 3:05 AM browseable

Tip: The SONAS CLI does not show all export attributes, for example, the owner value is not shown. To determine the owner, use the GUI or the root account.


12.7 Accessing an export


In this section, we show how to access an export from Windows and Linux.

12.7.1 Accessing a CIFS share from Windows


To access a CIFS share from a Windows system, we log on to a Windows system that is part of the same domain as SONAS. Active Directory must be configured prior to performing this step. We then open the Start menu and select My Computer, then select Tools → Map Network Drive. You will see a window as shown in Figure 12-35. Enter the SONAS cluster name and export name, \\sonas2\my_redbook, in the folder field and select a drive letter. Click the Finish button.

Figure 12-35 Map network drive

Open My Computer and verify that you can see the mapped network drive called my_redbook as shown in Figure 12-36.

Figure 12-36 Verify mapped drive


12.7.2 Accessing a CIFS share from a Windows command prompt


Connect to a Windows system. Select Start Menu and select Command Prompt and enter the Windows net use command shown in Example 12-11.
Example 12-11 Windows net use to access shares

C:\Documents and Settings\administrator.ADS>net use z: \\sonas02.virtual.com\my_redbook *
Type the password for \\sonas02.virtual.com\my_redbook:
The command completed successfully.

Verify that the share is mapped by running the net use command again as shown in Example 12-12.
Example 12-12 Listing mapped network drives with net use

C:\Documents and Settings\administrator.ADS>net use
New connections will not be remembered.

Status       Local     Remote                              Network
-------------------------------------------------------------------------------
OK           Z:        \\sonas02.virtual.com\my_redbook    Microsoft Windows Network
The command completed successfully.
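When the drive mapping is no longer needed, it can be removed with the same command; this is standard Windows syntax and not specific to SONAS:

C:\Documents and Settings\administrator.ADS>net use z: /delete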

12.7.3 Accessing a NFS share from Linux


To access an NFS share from a Linux host, connect to the Linux host as a user that is defined in the same authentication server used by the SONAS appliance. Create a mount point for your SONAS export, for example:
mkdir /sonas02/my_redbook
Now enter the mount command to mount the filesystem exported from SONAS, and then repeat the mount command without arguments to display all mounted filesystems, as shown in Example 12-13.
Example 12-13 Mounting a SONAS filesystem on Linux

[root@tsm001st010 ~]# mount -t nfs sonas02.virtual.com:/ibm/redbook/export1 /sonas02/my_redbook
[root@tsm001st010 ~]# mount
/dev/sda1 on / type ext3 (rw)
proc on /proc type proc (rw)
sysfs on /sys type sysfs (rw)
devpts on /dev/pts type devpts (rw,gid=5,mode=620)
tmpfs on /dev/shm type tmpfs (rw)
none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
sonas02.virtual.com:/ibm/redbook/export1 on /sonas02/my_redbook type nfs (rw,addr=sonas02.virtual.com)
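If you want the mount to be re-established automatically after the Linux client reboots, you can add an entry to /etc/fstab. This is standard Linux NFS client configuration and not a SONAS-specific step:

sonas02.virtual.com:/ibm/redbook/export1  /sonas02/my_redbook  nfs  defaults  0 0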


To verify which shares are exported and can be accessed by your client, you can use the smbclient -L command as shown in Example 12-14.
Example 12-14 Listing available exports

[root@tsm001st010 ~]# smbclient -L sonas02.virtual.com -U "virtual\administrator"
Enter virtual\administrator's password:
Domain=[VIRTUAL] OS=[Unix] Server=[CIFS 3.4.2-ctdb-20]

        Sharename       Type      Comment
        ---------       ----      -------
        IPC$            IPC       IPC Service ("IBM SONAS Cluster")
        my_redbook      Disk
        my_redbook2     Disk
Domain=[VIRTUAL] OS=[Unix] Server=[CIFS 3.4.2-ctdb-20]

        Server               Comment
        ---------            -------

        Workgroup            Master
        ---------            -------

12.8 Creating and using snapshots


You can create snapshots using either the CLI or the GUI.

12.8.1 Creating snapshots with the GUI


To create a snapshot, connect to the SONAS GUI and navigate to Files → Snapshots. You will see a panel as shown in Figure 12-37.

Figure 12-37 Snapshots list window

To create a snapshot, select the name of an active, that is mounted, file system from the list. We select the filesystem called redbook and then click the Create a new... button. Accept the default snapshot name in the panel shown in Figure 12-38 and click the OK button. By accepting the default snapshot name, the snapshots will be visible in the Windows previous versions tab on Windows client systems.


Figure 12-38 Snapshot name window

Figure 12-39 on page 473 shows the current status and list of snapshots for a specific filesystem, redbook in our case.

Figure 12-39 List of snapshots for filesystem

12.8.2 Creating snapshots with the CLI


To create a snapshot from the CLI using the default snapshot naming convention you use the mksnapshot command as shown in Example 12-15.
Example 12-15 Create a snapshot with the CLI
[sonas02.virtual.com]$ mksnapshot redbook
EFSSG0019I The snapshot @GMT-2010.04.22-03.14.07 has been successfully created.

To list all available snapshots for the redbook filesystem you use the lssnapshot command as shown in Example 12-16.
Example 12-16 List all snapshots with the CLI
[sonas02.virtual.com]$ lssnapshot -d redbook
Cluster ID         Device name Path                     Status Creation                Used (metadata) Used (data) ID Timestamp
720576040429430977 redbook     @GMT-2010.04.22-03.14.07 Valid  22.04.2010 05:14:09.000 256             0           3  20100422051411
720576040429430977 redbook     @GMT-2010.04.22-03.06.14 Valid  22.04.2010 05:10:41.000 256             0           2  20100422051411
720576040429430977 redbook     @GMT-2010.04.22-02.55.37 Valid  22.04.2010 05:05:56.000 256             0           1  20100422051411

12.8.3 Accessing and using snapshots


Windows offers a previous versions function to view previous versions of a directory, and you can view the previous versions for a mounted share. Open My Computer, right-click a SONAS network drive or any subdirectory in the network drive, and select Properties from the pull-down menu. You will see a screen like the one shown in Figure 12-40. Then select a version from the list and choose an action such as View, Copy, or Restore.


Figure 12-40 Viewing the Windows previous versions tab

Tip: To access and view snapshots in an NFS share, you must export the root directory for the filesystem, because snapshots are stored in a hidden directory called .snapshots in the root directory. To view snapshots from a Linux client, connect to the Linux client, mount the file system from a root export, and list the directories as shown in Example 12-17.
Example 12-17 Mount the filesystem and list the snapshots
[root@tsm001st010 sonas02]# mount -t nfs 10.0.1.121:/ibm/redbook /sonas02/my_redbook
[root@tsm001st010 sonas02]# df
Filesystem               1K-blocks     Used Available Use% Mounted on
/dev/sda1                 14877060  3479016  10630140  25% /
tmpfs                       540324        0    540324   0% /dev/shm
10.0.1.121:/ibm/redbook    1048576   155904    892672  15% /sonas02/my_redbook
[root@tsm001st010 export1]# ls -la /sonas02/my_redbook/.snapshots/
total 129
dr-xr-xr-x 5 root root  8192 Apr 22 05:14 .
drwxr-xr-x 4 root root 32768 Apr 22 02:32 ..
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-02.55.37
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-03.06.14
drwxr-xr-x 4 root root 32768 Apr 22 02:32 @GMT-2010.04.22-03.14.07
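Because a snapshot is simply a read-only copy of the directory tree, an individual file can be restored from an NFS client with ordinary Linux commands; the file name in this sketch is hypothetical:

[root@tsm001st010 ~]# cp -p /sonas02/my_redbook/.snapshots/@GMT-2010.04.22-03.14.07/export1/file1.txt /sonas02/my_redbook/export1/file1.txt

The -p option preserves the original timestamps and permissions of the copied file.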


12.9 Backing up and restoring data with Tivoli Storage Manager


We illustrate how to configure Tivoli Storage Manager to perform backup and restore. All actions are performed from the SONAS command line, as the GUI does not offer Tivoli Storage Manager configuration panels. We back up a filesystem called tsm0 that has been configured on the SONAS appliance called humboldt.storage.tucson.ibm.com, and we use the Tivoli Storage Manager server called slttsm2.storage.tucson.ibm.com. Our SONAS cluster has six interface nodes and we configure Tivoli Storage Manager backup to run on the first three nodes.

We start by configuring a new set of Tivoli Storage Manager clients on the Tivoli Storage Manager server for the tsm0 filesystem. Connect as an administrator to the Tivoli Storage Manager server, define a target node for the tsm0 file system called redhum, and define three agent nodes, one for each interface node that will run backups, each with its own Tivoli Storage Manager client node name. We then list the defined clients as shown in Example 12-18.

Mount points: Each Tivoli Storage Manager client running on an interface node can start up to eight parallel sessions to the Tivoli Storage Manager server, and we start the Tivoli Storage Manager client in parallel on three of the six interface nodes in the cluster, giving us a total of 24 parallel sessions to the Tivoli Storage Manager server. Because each session to a sequential Tivoli Storage Manager storage pool, for example, file or tape, requires one mount point, you must configure the proxy target client with a number of mount points equal to or greater than 24.
Example 12-18 Create Tivoli Storage Manager client nodes

tsm: SLTTSM2>reg n redhum redhum domain=standard maxnummp=24
ANR2060I Node REDHUM registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM defined for OWNER access to node REDHUM.
tsm: SLTTSM2>reg n redhum1 redhum1
ANR2060I Node REDHUM1 registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM1 defined for OWNER access to node REDHUM1.
tsm: SLTTSM2>reg n redhum2 redhum2
ANR2060I Node REDHUM2 registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM2 defined for OWNER access to node REDHUM2.
tsm: SLTTSM2>reg n redhum3 redhum3
ANR2060I Node REDHUM3 registered in policy domain STANDARD.
ANR2099I Administrative userid REDHUM3 defined for OWNER access to node REDHUM3.
tsm: SLTTSM2>q node
Node Name     Platform   Policy Domain   Days Since     Days Since      Locked?
                         Name            Last Access    Password Set
------------- --------   -------------   ------------   -------------   -------
REDHUM        (?)        STANDARD        <1             <1              No
REDHUM1       (?)        STANDARD        <1             <1              No
REDHUM2       (?)        STANDARD        <1             <1              No
REDHUM3       (?)        STANDARD        <1             <1              No


We now associate the three Tivoli Storage Manager agent nodes to the redhum target node as shown in Example 12-19.
Example 12-19 Grant Tivoli Storage Manager proxy node

tsm: SLTTSM2>grant proxy target=redhum agent=redhum1,redhum2,redhum3
ANR0140I GRANT PROXYNODE: success. Node REDHUM1 is granted proxy authority to node REDHUM.
ANR0140I GRANT PROXYNODE: success. Node REDHUM2 is granted proxy authority to node REDHUM.
ANR0140I GRANT PROXYNODE: success. Node REDHUM3 is granted proxy authority to node REDHUM.

Now connect to the SONAS CLI and define the Tivoli Storage Manager server configuration information to the Tivoli Storage Manager client by using the cfgtsmnode command as shown in Example 12-20.
Example 12-20 Configure Tivoli Storage Manager server to SONAS

[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum1 redhum int001st001 redhum1
EFSSG0150I The tsm node was configured successfully.
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum2 redhum int002st001 redhum2
EFSSG0150I The tsm node was configured successfully.
[Humboldt.storage.tucson.ibm.com]$ cfgtsmnode slttsm2 9.11.136.30 1500 redhum3 redhum int003st001 redhum3
EFSSG0150I The tsm node was configured successfully.

You can list the Tivoli Storage Manager server configuration with the lstsmnode command as shown in Example 12-21.
Example 12-21 List Tivoli Storage Manager client configuration

[Humboldt.storage.tucson.ibm.com]$ lstsmnode
Node name   Virtual node name TSM server alias TSM server name         TSM node name
int001st001 redhum            slttsm2          9.11.136.30             redhum1
int002st001 redhum            slttsm2          9.11.136.30             redhum2
int003st001 redhum            slttsm2          9.11.136.30             redhum3
int004st001                   server_a         node.domain.company.COM
int005st001                   server_a         node.domain.company.COM
int006st001                   server_a         node.domain.company.COM

We are now ready to configure the filesystem for Tivoli Storage Manager backup and restore operations using the cfgbackupfs command as shown in Example 12-22. After configuring the filesystem backup, we list the configured filesystem backups with the lsbackupfs command.
Example 12-22 Configure and list filesystem backup to Tivoli Storage Manager
[Humboldt.storage.tucson.ibm.com]$ cfgbackupfs tms0 slttsm2 int002st001,int003st001
EFSSG0143I TSM server-file system association successfully added
EFSSG0019I The task StartBackupTSM has been successfully created.
[Humboldt.storage.tucson.ibm.com]$ lsbackupfs -validate
File system TSM server List of nodes           Status      Start time End time Message Validation              Last update
tms0        slttsm2    int002st001,int003st001 NOT_STARTED N/A        N/A              Node is OK.,Node is OK. 4/23/10 4:51 PM


To start a backup you use the startbackup command and specify a filesystem as shown in Example 12-23. You can then list the backup status with the lsbackupfs command and verify the status.
Example 12-23 Start a TSM backup

[Humboldt.storage.tucson.ibm.com]$ startbackup tms0
EFSSG0300I The filesystem tms0 backup started.
[Humboldt.storage.tucson.ibm.com]$ lsbackupfs
File system TSM server List of nodes           Status         Start time      End time Message                                                                             Last update
tms0        slttsm2    int002st001,int003st001 BACKUP_RUNNING 4/23/10 4:55 PM N/A      log:/var/log/cnlog/cnbackup/cnbackup_tms0_20100423165524.log, on host: int002st001 4/23/10 4:55 PM
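Because each interface node can open multiple parallel sessions to the Tivoli Storage Manager server, it can also be useful to watch the backup from the server side. The following are standard Tivoli Storage Manager administrative commands, not SONAS commands, and are shown here only as a sketch of what you might run on the slttsm2 server:

tsm: SLTTSM2>q session
tsm: SLTTSM2>q process

The query session command lists the client sessions opened by the interface nodes, and query process shows any server processes that might compete for mount points.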


Chapter 13. Hints, tips, and how to information


This chapter contains information we found useful in the development of this book. It includes our hands-on experiences as well as useful information from white papers, developers, and implementers of SONAS.


13.1 What to do when you receive an EFSSG0026I error message


This section contains information about the EFSSG0026I error message.

13.1.1 EFSSG0026I error: Management service stopped


When you issue a CLI command such as lsfs, you might receive a message such as the one shown in Figure 13-1:

[SONAS]$ lsfs
EFSSG0026I Cannot execute commands because Management Service is stopped. Use startmgtsrv to restart the service
Figure 13-1 Management service stopped message

You can proceed to restart the management service with the startmgtsrv command as shown in Figure 13-2:

[SONAS]$ startmgtsrv
EFSSG0007I Start of management service initiated by cliuser1
Figure 13-2 Starting the management service

13.1.2 Commands to use when management service not running


When the management service is stopped, the CLI and the GUI do not work, with a few exceptions. Only the following CLI commands can be used when the management service is not running:
initnode
startmgtsrv

13.2 Debugging SONAS with logs


This section contains useful information you can derive from logs.

13.2.1 CTDB health check


CTDB gives a good indication of the health of the cluster. The cluster is healthy if all the nodes report a CTDB state of OK. To check the status of CTDB, run the following command at a command prompt:
# ctdb status
This is a root command and not a CLI command. However, as a CLI user you can also check the status from the Management GUI by checking the Interface Node details: click the Interface Node link under the Clusters category, and the CTDB column in the Interface Node table shows the status of CTDB.
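A healthy cluster reports an OK state for every node. The following output is only an illustration of the general format of ctdb status; the node count, addresses, and generation number shown here are invented and will differ on a real SONAS system:

# ctdb status
Number of nodes:3
pnn:0 10.254.8.1       OK (THIS NODE)
pnn:1 10.254.8.2       OK
pnn:2 10.254.8.3       OK
Generation:1362079228
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:2
Recovery mode:NORMAL (0)
Recovery master:0

If any node shows a state other than OK, start with the checks described in 13.3, When CTDB goes unhealthy.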


CTDB can be unhealthy for many reasons. Because it monitors the services and GPFS, it goes to the unhealthy state if any of these services is down or has problems. If CTDB is unhealthy, check the logs. The Management GUI system logs give an initial idea of what is wrong. You can also collect the latest logs from all the nodes by running the command:
#cndump
This command needs root access. It collects all the logs from the nodes and creates a compressed zip file. It takes some time to complete and, when done, shows the path where the file is stored. When you uncompress the file, you will see a directory for each node: the Management Node, each Interface Node, and each Storage Node. Inside each you will find directories with log information and more from that node.

13.2.2 GPFS logs


For GPFS logs you can look into:
~/var/mmfs/gen/mmfslog
where ~ is the path to the data collected for each node by the cndump command. Alternatively, log in to each node and check the path:
/var/mmfs/gen/mmfslog

13.2.3 CTDB logs


You can check the CTDB logs on each node at the path:
/var/log/messages
The log on the management node contains the consolidated logs of all nodes. You can also check an individual Interface or Storage Node by looking at the /var/log/messages file on that node, or in the directory collected by cndump for each node.

13.2.4 Samba and Winbind logs


You can also check the Samba or Winbind logs on each node at the path:
/var/log/messages
The log on the management node contains the consolidated logs of all nodes. You can also check an individual Interface or Storage Node by looking at the /var/log/messages file on that node, or in the directory collected by cndump for each node.
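Because /var/log/messages on the management node aggregates messages from all nodes, it can be helpful to filter it. The following is just a sketch using standard Linux tools as root; the search strings are examples and not an exhaustive list of relevant message tags:

# grep -i ctdb /var/log/messages | tail -50
# grep -iE "smbd|winbindd" /var/log/messages | tail -50

This prints the 50 most recent CTDB-related and Samba/Winbind-related entries, which is often enough to see why CTDB flagged a node as unhealthy.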


13.3 When CTDB goes unhealthy


CTDB manages the health of the cluster. It monitors the services and file systems. There are many reasons why CTDB can go unhealthy. Some can be rectified by you as an administrator of the system. Described next are some of the quick checks that you can make.

13.3.1 How CTDB manages services


Here are some of the configurable parameters for services in CTDB. You can change these values through the GUI.
CTDB_MANAGES_VSFTPD
CTDB_MANAGES_NFS
CTDB_MANAGES_WINBIND
CTDB_MANAGES_HTTPD
CTDB_MANAGES_SCP
CTDB_MANAGES_SAMBA
By default, these variables are set to yes and CTDB manages these services. Whenever one of the services is down, CTDB goes unhealthy. If CTDB has gone unhealthy, check on all the Interface Nodes whether these services are up and running. If a service is not running, start it. If you do not want CTDB to monitor a service and are willing to have it not running, you can turn off the corresponding variable. In that case you will not be notified in the future, so turn off these variables at your own risk. Use the GUI to change the parameter. Refer to Clusters on page 320 for details.

13.3.2 Master file system unmounted


CTDB can go unhealthy if the master filesystem is down. This means that it can no longer access the reclock file on which it holds a lock. Check the master filesystem and make sure that it is mounted.

13.3.3 CTDB manages GPFS


If all services are up and running, and the master filesystem is mounted, you can check the other GPFS file systems that have exports. By default, the CTDB variable CTDB_MANAGES_GPFS is set to yes. This means that if there is any file system on which exports have been created for users to access, and that file system is not mounted, CTDB goes unhealthy. Check the mounts by running the command:
#mmlsmount all -L
This is a root command and displays mount information for all the filesystems on each node.
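The output of this command looks similar to the following sketch; the node names, IP addresses, and node count are illustrative only and will differ on your system:

# mmlsmount all -L
File system gpfs0 is mounted on 3 nodes:
  10.254.8.1     mgmt001st001
  10.254.8.2     int001st001
  10.254.8.3     int002st001

Any file system that has exports but does not appear as mounted on the interface nodes is a likely cause of the unhealthy CTDB state.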


If you see that a filesystem is unmounted and there are exports created on it, mount the filesystem so that CTDB becomes healthy. You can check the export information by running the command:
#lsexport
For additional information about these CLI commands, see Chapter 10, SONAS administration on page 313. If you have some exports created for testing and do not want to keep the filesystem mounted, you might not want CTDB to monitor the GPFS filesystems. However, remember that in that case, whenever a filesystem is unmounted, whether intended or not, CTDB will not notify you by changing its status. Change this value at your own risk.

13.3.4 GPFS unable to mount


CTDB can go unhealthy if a GPFS filesystem is not mounted and there are data exports created on that file system. As mentioned in 13.3.3, CTDB manages GPFS, you can try to mount the filesystem. But if the GPFS filesystem refuses to mount, CTDB remains unhealthy. To check why GPFS is not mounting, check the file system attribute Is DMAPI enabled?, that is, the -z option. You can run the command in Example 13-1 to check the value.
Example 13-1 Verify DMAPI value for GPFS

#mmlsfs gpfs1 -z
flag value            description
---- ---------------- -----------------------------------------------------
 -z  yes              Is DMAPI enabled?

In the above example, consider that the filesystem gpfs1 is not mounting. Here, the value of the -z option is set to yes. In this case, the GPFS filesystem is waiting for a DMAPI application and will only mount when one becomes available. If you do not have any DMAPI applications running and do not want GPFS to wait for a DMAPI application, you need to set the -z option to no. This value is set to yes by default and DMAPI is enabled. Remember to create the filesystem with the --nodmapi option when you create it using the CLI command mkfs if you do not want DMAPI enabled. If the option is already set to yes, you can use the mmchfs command to change the value of the -z option.
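As a minimal sketch of that change, assuming gpfs1 is the affected file system and that it is unmounted on all nodes (changing this attribute requires the file system to be unmounted), the root-level GPFS commands are:

# mmchfs gpfs1 -z no
# mmlsfs gpfs1 -z

The second command simply verifies that the Is DMAPI enabled? flag now shows no.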


Appendix A. Additional component details


In this appendix we provide more details on the common components, terms, concepts, and products that we refer to throughout this book.


CTDB
In this section we discuss the various features of the Clustered Trivial Database (CTDB).

Introduction to Samba
Samba is software that can be run on a platform other than Microsoft Windows, for example, UNIX, Linux, IBM System 390, OpenVMS, and other operating systems. Samba uses the TCP/IP protocol that is installed on the host server. When correctly configured, it allows that host to interact with a Microsoft Windows client or server as if it were a Windows file and print server. Thus, on a single Linux NAS server, Samba provides the mechanism to access data from a Windows client. Samba stores its information in small databases called TDBs (trivial databases). Each TDB holds metadata for mapping POSIX to CIFS semantics and vice versa. The local TDB files also contain the messaging and locking details for files, and information about open files that are accessed by many clients. All of this works for a single NAS server. For a clustered file system like SONAS, which provides a clustered NAS environment that allows many clients to access data through multiple nodes in the cluster, this becomes trickier. The Samba process running on one node does not know about the locking information held by the Samba processes running locally on the other nodes. As an example, say a file, file1, has been accessed through two different nodes by two clients. These two nodes do not know about each other's locks and hence do not know that each of them has accessed the same file. In this case, if both nodes write to the file, the file content might be corrupted, or the last save might simply become the stored version. There was no way to coordinate the Samba processes (smbd) running on different nodes. To have consistency in data access and writes, there must be a way in which the Samba processes (smbd) running on each node can communicate with each other and share information to avoid shared data corruption.

Cluster implementation requirements


A clustered file server ideally has the following properties:
All clients can connect to any server, which appears as a single large system.
Distributed filesystem, and all servers can serve out the same set of files.
Provide data integrity.
A server can fail and clients are transparently reconnected to another server.
All file changes are immediately seen on all servers.
Minimize the latency of any checks that might require cross-cluster communication.
Ability to scale by adding more servers/disk back-end.

Clustered Trivial Database


To overcome the shortcomings of traditional Samba and to provide a clustered file server efficiently, the clustered TDB was implemented. The Clustered TDB is a shared TDB approach to distributing locking state. In this approach, all cluster nodes access the same TDB files. CTDB provides the same types of functions as TDB but in a clustered fashion, providing a TDB-style database that spans multiple physical hosts in a cluster. The cluster filesystem takes care of ensuring the TDB contents are consistent across the cluster. The prototypes include extensive modifications to Samba internal data representations to make the information stored in the various TDBs node-independent. CTDB also provides a failover mechanism to ensure data access is not lost if any node goes down while serving data. It does this with the use of Virtual IP addresses or Public IP addresses; more on this is explained in detail later. Figure A-1 shows the CTDB implementation.

Figure A-1 CTDB implementation.

As you can see, there is a virtual server that encloses all the nodes, as though all the Samba processes on each node talk to each other and keep each other updated about the locking and other information each one holds.

CTDB architecture
The design is particularly aimed at the temporary databases in Samba, which are the databases that get wiped and re-created each time Samba is started. The most important of those databases are the 'brlock.tdb' for byte range locking database and the 'locking.tdb' for open file database. There are a number of other databases that fall into this class, such as 'connections.tdb' and 'sessionid.tdb', but they are of less concern as they are accessed much less frequently. Samba also uses a number of persistent databases, such as the password database, which must be handled in a different manner from the temporary databases.
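If you want to look at the raw contents of one of these temporary databases on a node, the standard Samba tdbdump utility can print the records of a local TDB copy. Whether tdbdump is installed on a SONAS node is an assumption here; the command below is shown only as an illustration and must be run as root:

# tdbdump /var/ctdb/locking.tdb.0

The path corresponds to the per-node copy of locking.tdb listed later by the ctdb getdbmap command.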


Here is a list of databases that CTDB uses:
account_policy.tdb: NT account policy settings such as password expiration
brlock.tdb: Byte range locks
connections.tdb: Share connections; used to enforce maximum connections, and so on
gencache.tdb: Generic caching database
group_mapping.tdb: Stores group mapping information; not used when using an LDAP back-end
locking.tdb: Stores share mode and oplock information
registry.tdb: Windows registry skeleton (connect via regedit.exe)
sessionid.tdb: Session information to support utmp = yes capabilities
We mentioned above that the Clustered TDB is a shared TDB and all nodes access the same TDB files. This means that these databases are shared by all nodes so that each of them can access and update the records, and therefore the databases must be stored in the shared file system, in this case GPFS. Hence, each time a record is to be updated, the smbd daemon on a node would have to update the shared database and write to the shared disks. Because the shared disks can be accessed over the network, this could be very slow and become a major bottleneck. To make it faster, each node of the cluster runs the CTDB daemon ctdbd and has a local, old-style TDB stored in a fast local filesystem. The daemons negotiate only the metadata for the TDBs over the network; the actual data reads and writes always happen on the local copy. Ideally this filesystem is in memory, such as on a small ramdisk, but a fast local disk also suffices if that is more administratively convenient. This makes the read and write path very fast. The contents of this database on each node is a subset of the records in the CTDB (clustered TDB). For a persistent database, however, when a node wants to write to a persistent CTDB, it locks the whole database on the network with a transaction, performs its read and write, commits, and finally distributes the changes to all the nodes and writes locally too. This way the persistent database stays consistent. CTDB records typically look as shown in Example A-1.
Example A-1 CTDB records

typedef struct {
        char  *dptr;
        size_t dsize;
} TDB_DATA;

TDB_DATA key, data;

All CTDB operations are finally converted into operations on these TDB records. Each of these records is augmented with an additional header. The header contains the information shown in Example A-2 and Figure A-2.
Example A-2 TDB header records

uint64 RSN        (record sequence number)
uint32 DMASTER    (VNN of data master)
uint32 LACCESSOR  (VNN of last accessor)
uint32 LACOUNT    (last accessor count)


Figure A-2 TDB Header Records

RSN: Record Sequence Number


The RSN is used to identify which of the nodes in the cluster has the most recent copy of a particular record in the database during a recovery after one or more nodes have died. It is incremented whenever a record is updated in a local TDB by the 'DMASTER' node.

DMASTER: Data Master


The DMASTER is the virtual node number of the node that owns the data for a particular record. It is only authoritative on the node that has the highest RSN for that record; on other nodes it can be considered a hint only. The node that has the highest RSN for a particular record will also have its VNN (Virtual Node Number) equal to the local DMASTER field of the record, and no other node will have its VNN equal to the DMASTER field. This allows a node to verify that it is the owner of a particular record by comparing its local copy of the DMASTER field with its VNN. If and only if they are equal does it know that it is the current owner of that record.


LACCESSOR
The LACCESSOR field holds the VNN of the last node to request a copy of the record. It is mainly used to determine whether the current data master should hand over ownership of this record to another node.

LACCOUNT
The LACOUNT field holds a count of the number of consecutive requests by that node.

LMASTER: Location Master


In addition to the above, each record is also associated with LMASTER (location master). This is the VNN of the node for each record that will be referred to when a node wants to contact the current DMASTER for a record. The LMASTER for a particular record is determined solely by the number of virtual nodes in the cluster and the key for the record.

RECOVERY MASTER
When a node fails, CTDB performs a process called recovery to re-establish a proper state. The recovery is carried through by the node that holds the role of the RECOVERY MASTER. It collects the most recent copy of all records from the other nodes. Only one node can become the RECOVERY MASTER, and this is determined by an election process. This process involves a lock file, called the recovery lock or reclock, that is placed in the MASTER file system of the clustered file system. At the end of the election, the newly nominated recovery master holds a lock on the recovery lock file. The RECOVERY MASTER node is also responsible for monitoring the consistency of the cluster and for performing the actual recovery process when required. You can check the reclock path by using the command shown in Example 13-2. In this example, /ibm/gpfs0 is the MASTER filesystem.
Example 13-2 Checking for the reclock path

# ctdb getreclock
Example output:
Reclock file:/ibm/gpfs0/.ctdb/shared

How CTDB works to synchronize access to data


The key insight is that one node does not need to know all records of a database. Most of the time, it is sufficient for a node to have an up-to-date copy of the records that affect its own client connections. Even more importantly, when a node goes down, it is acceptable to lose the data that relates only to the client connections on that node. Therefore, for a normal TDB, a node only has those records in its local TDB that it has already accessed. Data is not automatically propagated to other nodes and is just transferred upon request. When a node in the cluster wants to update a record, the ctdb daemon tries to find out which node is the DMASTER for that record, that is, the node that owns the record. To get the VNN of the DMASTER, it contacts the LMASTER, which replies with the VNN of the DMASTER. The requesting node then contacts that DMASTER, but must be prepared to receive a further redirect, because the value for the DMASTER held by the LMASTER could have changed by the time the node sends its message. The step of returning a DMASTER reply from the LMASTER is skipped when the LMASTER also happens to be the DMASTER for a record. In that case the LMASTER can send a reply to the requester's query directly, skipping the redirect stage.

The dispatcher daemon listens for CTDB protocol requests from other nodes, and from the local smbd via a UNIX domain datagram socket. The dispatcher daemon follows an event-driven approach, executing operations asynchronously. Figure A-3 shows the fetch sequence described above, where the requesting node contacts the DMASTER as directed by the LMASTER.

Figure A-3 Fetching sequence for CTDB and contacting DMASTER as directed by LMASTER

Figure A-4 shows what happens when the DMASTER has changed and another request must be made to get the VNN of the new DMASTER.

Figure A-4 Fetching when DMASTER changes


Figure A-5 also shows the working of the dispatcher daemon. When a node wants to read or write data, it gets the VNN of the current DMASTER for the record. It then contacts the dispatcher on the node corresponding to that VNN, which is listening for CTDB requests from other nodes, gets the updated copy of the record onto its own node, and updates it locally.

Figure A-5 Clustered Samba: dispatcher daemon

At the time of a node failure, the LMASTER gives the VNN of the node that last updated the record. If the node that has the latest information is the node that fails, it is OK to lose this information, as it is only connection information for files. For persistent databases, the information is always available on all nodes and is an up-to-date copy.

Providing high availability for node failure


It is essential for a filesystem to provide a high availability feature so that data access by end users is not disrupted by a node failure. Many applications, such as banking, security, and satellite applications, need continuous, real-time access to data and cannot afford any interruptions. Hence, high availability is absolutely necessary. Broadly, there are two kinds of systems: Active-Passive systems and Active-Active systems.

Active-Passive failover systems: In these systems, one of the servers is always down and acts as the backup server. All data access happens through the active server. When that server goes down, the backup server is brought up and starts to service requests. In this case, the data connections break, because there is a time lag between the active server going down and the backup server coming up. This is how a traditional NAS system works. Figure A-6 shows an Active-Passive failover system where Node2 is always down while Node1 is active and servicing requests. When Node1 fails, Node2 becomes active and services the requests.


Figure A-6 Active-Passive failover systems

Active-Active systems: In these systems, all the nodes in the cluster are active. When a node fails, the other nodes take over: the service requests of the failed node are transferred to the others, which immediately start servicing them. The application might see a slight pause in data transfer, but as long as the application can handle a failed TCP connection and reconnect, the data transfer does not fail and continues uninterrupted. Figure A-7 shows an Active-Active failover system where it can be seen that when Node1 fails, all the requests are passed on to Node2, which is always active. This is done transparently to the users, and hence data transfer does not stop as long as the applications can fail over a TCP connection.

Figure A-7 Active-Active Failover system.

With the help of CTDB, SONAS provides this protection against node failure.

CTDB features
CTDB uniquely identifies each of the nodes in the cluster by a Virtual Node Number (VNN) and maps the physical addresses to the VNNs. CTDB works with two IP networks. The first is the internal network on the InfiniBand fabric used for CTDB communication between the nodes; this is the same as the cluster's internal network for communication between the nodes. The second is the set of public addresses through which the clients access the nodes for data. You can check the public IPs set for the nodes by running the command in Example A-3 on each node.


Example A-3 Public IP list

# ctdb ip
Example output:
Number of addresses:4
12.1.1.1 0
12.1.1.2 1
12.1.1.3 2
12.1.1.4 3

The configuration of CTDB is stored in /etc/sysconfig/ctdb on all nodes. The node details, that is, the list of all the IP addresses of the nodes in the CTDB cluster, are stored in the /etc/ctdb/nodes file. These are the private IP addresses of the nodes. The public addresses of the clustered system are stored in the /etc/ctdb/public_addresses file. These addresses are not physically attached to a specific node; they are managed by CTDB and are attached to and detached from physical nodes at runtime. Each node needs to specify the public addresses that it can service in its /etc/ctdb/public_addresses file. For example, if a cluster has six nodes and six IP addresses, each node should specify all six IP addresses in order to be able to service any one of them at any point in time in case of a failure. If a certain IP address is not listed, that IP will not be serviced by this node. Hence, it is good practice to specify all the public IP addresses on each node, so that each node can take over any IP if required. Even though a node lists all the public IPs, CTDB assigns a unique set of IP addresses for it to service. This means, for example, that if we have a six node cluster and six public IP addresses, each node can hold any of the six IP addresses, but CTDB assigns the addresses so that at any point in time a single IP address is serviced by only one node. As another example, consider a six node cluster with twelve IP addresses. In this case, each node can take any of the twelve IP addresses, but CTDB will assign each node two IP addresses to service, and each address is serviced by only one node at a time. CTDB uses round robin to assign IP addresses to the nodes: it makes a table of all the VNNs and maps each VNN to IP addresses in a round robin way. When a node fails, CTDB rebuilds this table of IP address mappings. It considers all the nodes, whether or not they are down, and assigns each node IP addresses again in a round robin way. Once this is done, CTDB then takes the IP addresses assigned to the node that has failed, counts the number of IP addresses each node is servicing, and redistributes the failed node's IP addresses to the nodes that have the fewest. If all are equal, it uses the round robin mechanism.
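As an illustration of the format of these configuration files, a typical clustered Samba setup uses one entry per line; the addresses and the eth0 interface name below are examples only and do not come from a real SONAS installation:

# cat /etc/ctdb/nodes
10.254.8.1
10.254.8.2
10.254.8.3

# cat /etc/ctdb/public_addresses
12.1.1.1/24 eth0
12.1.1.2/24 eth0
12.1.1.3/24 eth0
12.1.1.4/24 eth0

Each line of public_addresses contains an address with its network mask and the interface on which CTDB should configure the address when the node takes it over.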

CTDB Node recovery mechanism


Broadly, CTDB performs the following steps during a node recovery:
Freeze the cluster.
Verify database state.
Pull and merge records based on RSN.
Push updated records.
Clean up databases.
Assign IP takeover nodes.
Build and distribute new LMASTER mapping.
Create and distribute new GENERATION NUMBER.
Set the recovery node as the DMASTER for all the records.


IP failover mechanism
When a node leaves the cluster, CTDB moves its public IP addresses to other nodes that have those addresses listed in their public addresses pool. The clients that were connected to that node then have to reconnect to the cluster. In order to reduce the delays that come with these IP switches to a minimum, CTDB makes use of a trick called tickle-ACK. It works as follows: the client does not know that the IP address it is connected to has moved, while the new CTDB node only knows that the TCP connection has become invalid but does not know the TCP sequence number. So the new CTDB node sends an invalid TCP packet with the sequence and ACK numbers set to zero. This tickles the client into sending a valid ACK packet back to the new node. Now CTDB can validly close the connection by sending a RST packet, which forces the client to re-establish the connection.

How CTDB manages the cluster


CTDB manages the cluster by monitoring the services and the health of the cluster. CTDB has configurable parameters such as CTDB_MANAGES_GPFS, CTDB_MANAGES_FTP, CTDB_MANAGES_NFS, and more, which, when set to yes, cause CTDB to manage those services. By default these variables are set to yes. CTDB then manages these services, and if a managed service on any node is down, CTDB goes to the unhealthy state. If you set a variable to no, CTDB no longer manages that service and remains healthy even if the service is down. If you do not want to monitor a particular service, you can set its variable to no. In the SONAS appliance, the Management GUI provides the mechanism to modify these configurable parameters. You can find all the configurable parameters in the /etc/sysconfig/ctdb file.

The ctdb status command displays the status of each node. The node status reflects the current state of the node. There are six possible states:
OK - This node is fully functional.
DISCONNECTED - This node could not be connected through the network and is currently not participating in the cluster. If there is a public IP address associated with this node, it should have been taken over by a different node. No services are running on this node.
DISABLED - This node has been administratively disabled. This node is still functional and participates in the CTDB cluster, but its IP addresses have been taken over by a different node and no services are currently being hosted.
UNHEALTHY - A service provided by this node is malfunctioning and should be investigated. The CTDB daemon itself is operational and participates in the cluster. Its public IP address has been taken over by a different node and no services are currently being hosted. All unhealthy nodes should be investigated and require an administrative action to rectify.
BANNED - This node failed too many recovery attempts and has been banned from participating in the cluster for a period of RecoveryBanPeriod seconds. Any public IP address has been taken over by other nodes. This node does not provide any services. All banned nodes should be investigated and require an administrative action to rectify. This node does not participate in the CTDB cluster but can still be communicated with, that is, ctdb commands can be sent to it.
STOPPED - A node that is stopped does not host any public IP addresses, nor is it part of the VNNMAP. A stopped node cannot become LVSMASTER, RECMASTER, or NATGW. This node does not participate in the CTDB cluster but can still be communicated with, that is, ctdb commands can be sent to it.
You can check the status using the command:
# ctdb status
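For example, a node can be put into the DISABLED state on purpose before maintenance and brought back afterwards with the standard ctdb commands sketched below; run them as root on the node in question, and keep in mind that taking a node out of service moves its public IP addresses to the remaining nodes:

# ctdb disable
# ctdb status
# ctdb enable

Run without options, ctdb disable and ctdb enable act on the node where they are executed.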

CTDB tunables
CTDB has a lot of tunables that can be modified. However, this is rarely necessary. You can check the variables by running the command shown in Example A-4.
Example A-4 Checking CTDB tunable variables

# ctdb listvars
Example output:
MaxRedirectCount = 3
SeqnumInterval = 1000
ControlTimeout = 60
TraverseTimeout = 20
KeepaliveInterval = 5
KeepaliveLimit = 5
MaxLACount = 7
RecoverTimeout = 20
RecoverInterval = 1
ElectionTimeout = 3
TakeoverTimeout = 5
MonitorInterval = 15
TickleUpdateInterval = 20
EventScriptTimeout = 30
EventScriptBanCount = 10
EventScriptUnhealthyOnTimeout = 0
RecoveryGracePeriod = 120
RecoveryBanPeriod = 300
DatabaseHashSize = 10000
DatabaseMaxDead = 5
RerecoveryTimeout = 10
EnableBans = 1
DeterministicIPs = 1
DisableWhenUnhealthy = 0
ReclockPingPeriod = 60
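Individual tunables can also be read and changed with ctdb getvar and ctdb setvar. As noted above, this is rarely necessary; the following lines are only a sketch using the MonitorInterval variable from the list above, and the new value is arbitrary:

# ctdb getvar MonitorInterval
MonitorInterval = 15
# ctdb setvar MonitorInterval 20

A value changed with setvar generally does not persist across a restart of the CTDB daemon unless it is also set in the CTDB configuration.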

CTDB databases
You can list all clustered TDB databases that the CTDB daemon has attached to. Some databases are flagged as PERSISTENT; this means that the database stores data persistently and the data remains across reboots. One example of such a database is secrets.tdb, where information about how the cluster was joined to the domain is stored.


You can check the databases available by running the command in Example A-5.
Example A-5 Checking CTDB databases

# ctdb getdbmap
Example output:
Number of databases:10
dbid:0x435d3410 name:notify.tdb path:/var/ctdb/notify.tdb.0
dbid:0x42fe72c5 name:locking.tdb path:/var/ctdb/locking.tdb.0
dbid:0x1421fb78 name:brlock.tdb path:/var/ctdb/brlock.tdb.0
dbid:0x17055d90 name:connections.tdb path:/var/ctdb/connections.tdb.0
dbid:0xc0bdde6a name:sessionid.tdb path:/var/ctdb/sessionid.tdb.0
dbid:0x122224da name:test.tdb path:/var/ctdb/test.tdb.0
dbid:0x2672a57f name:idmap2.tdb path:/var/ctdb/persistent/idmap2.tdb.0 PERSISTENT
dbid:0xb775fff6 name:secrets.tdb path:/var/ctdb/persistent/secrets.tdb.0 PERSISTENT
dbid:0xe98e08b6 name:group_mapping.tdb path:/var/ctdb/persistent/group_mapping.tdb.0 PERSISTENT
dbid:0x7bbbd26c name:passdb.tdb path:/var/ctdb/persistent/passdb.tdb.0 PERSISTENT

You can also check the details of a database by running the command in Example A-6.
Example A-6 CTDB database status

# ctdb getdbstatus <dbname>
Example: ctdb getdbstatus test.tdb.0
Example output:
dbid: 0x122224da
name: test.tdb
path: /var/ctdb/test.tdb.0
PERSISTENT: no
HEALTH: OK
You can get more information about CTDB by reading the CTDB manpage:
# man ctdb

File system concepts and access permissions


File systems are a way of organizing and storing files, where a file is a named sequence of bytes. A file system arranges the named files into a structure, such as a UNIX tree hierarchy, to facilitate location, access, and retrieval of individual files by the operating system. File systems generally store data on an underlying storage device, such as disk or tape, in blocks or clusters of a defined size. Files are named, and the name is used to locate and access the files. Files can be organized in directory structures with subdirectories to facilitate the organization of data. In addition to the actual file data, a file has associated metadata: attributes such as the last update time, the type (file or directory), and access control attributes such as the owning user and group and the access permissions, or access rights, that control what use can be made of the file, for example execute or read only. File systems offer functions to create, access, move, and delete files and directories. File systems might also offer hierarchies between storage devices and quota mechanisms to control the amount of space used by users and groups.


Permissions and access control lists


The implementation of permissions and access rights differs between file systems. UNIX and POSIX file systems support traditional UNIX permissions and also generally support POSIX.1e or NFSv4 access control lists.

Traditional UNIX permissions


Permissions or access rights control the access of users to files in a file system. In UNIX file systems, permissions are grouped into three classes: user, group, and other. Each file in a file system is owned by a user; this user, or owner, defines the file's owner class. Each file is also assigned a group, which defines the group class and can have permissions that differ from those of the user. The owner does not have to be a member of the file's group and can belong to a different group. There are three types of permissions for each class:
read - Permits reading a file. When set on a directory, it allows listing the contents of the directory, but not reading the contents or attributes of the individual files.
write - Permits writing to and modifying a file or directory. On a directory, write also allows file creation, deletion, and rename.
execute - Permits the user to run an executable file. Execute on a directory allows accessing (traversing) files in that directory but does not enable listing or viewing them.

If a permission is not set, the access it would allow is denied. Permissions are not inherited from the upper level directory.
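As a brief illustration of these permission classes, the following commands show how permissions are typically displayed and changed on a UNIX or Linux system; the file, user, and group names are examples only.
# ls -l report.txt
-rwxr-x--- 1 alice finance 4096 Oct 12 09:15 report.txt
Here the owner alice has read, write, and execute; the group finance has read and execute; others have no access.
# chmod u=rwx,g=rx,o= report.txt (equivalent to chmod 750 report.txt)
# chmod g+w report.txt (add write permission for the group class)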

Access control lists


An access control list (ACL) is a list of permissions associated with a given file. The ACL controls which users and groups in the system are allowed access to a given file. Each entry in an access control list is called an access control entry (ACE); an ACE contains two parts: a user or group that is the subject of the authorization, and an operation that can be performed on the file, such as execute or delete. Windows uses an ACL model that differs considerably from the POSIX model, and mapping techniques between the two are not completely satisfactory, so mapping between Windows and POSIX ACLs should be avoided if possible. NFSv4 introduces an ACL model that is similar to the Windows ACL model and so simplifies mapping between the two models. IBM GPFS and SONAS implement NFSv4 ACLs natively.
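As an example, GPFS provides the mmgetacl and mmputacl commands to display and set ACLs on files in a GPFS file system. The sketch below assumes a file under /gpfs/fs1; the exact ACL output format depends on the GPFS release.
# mmgetacl -k nfs4 /gpfs/fs1/shared/report.doc (display the NFSv4 ACL of the file)
# mmgetacl -o /tmp/acl.txt /gpfs/fs1/shared/report.doc (save the ACL to a file)
Edit /tmp/acl.txt to add or change ACL entries, then apply the modified ACL:
# mmputacl -i /tmp/acl.txt /gpfs/fs1/shared/report.doc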

Permissions and ACLs in Windows operating systems


There have been two main progressions of file and folder permissions on Windows operating systems: DOS attributes and NTFS security.

DOS attributes
The DOS attributes that can be assigned to files and folders are:
Read Only - File cannot be written to
Archive - File has been touched since the last backup
System - File is used by the operating system
Hidden - File is relatively invisible to the user
Lost - File is gone

These attributes apply to FAT, FAT32 and NTFS file systems.

NTFS security
There are 13 basic permissions, which are rolled up into six permission groups. These apply only to the NTFS file system, not to FAT or FAT32. The six permission groups are:
Full Control - Allows all 13 basic permissions
Modify - Allows all permissions except Delete subfolders and files, Change permissions, and Take ownership
Read - Allows List folder/Read data, Read attributes, Read extended attributes, and Read permissions
Write - Allows Create files/Append data, Write attributes, Write extended attributes, Delete subfolders and files, and Read permissions
Read and execute - Allows all that the Read permission group allows, plus Traverse folder/Execute file
List folder contents - For folders only, not files; the same as Read and execute for files

The 13 basic permissions are the following; some of them differ depending on whether they apply to folders or files:
Traverse folder (for folders only) / Execute file (for files only)
List folder / Read data
Read attributes
Read extended attributes
Create files / Append data
Write attributes
Write extended attributes
Delete subfolders and files
Delete
Read permissions
Change permissions
Take ownership
To view the permission groups, right-click any file or folder in Windows Explorer, choose the Properties menu item, and then choose the Security tab. More information about Windows file and folder permissions is available on the Microsoft TechNet site at:
http://technet.microsoft.com/en-us/library/bb727008.aspx
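In addition to the Explorer Security tab, current Windows versions also provide the icacls command line tool for viewing and editing these permissions; the path and account name below are examples only.
C:\> icacls C:\Data\report.docx (display the ACL of a file)
C:\> icacls C:\Data\report.docx /grant DOMAIN\alice:(R,X) (grant read and execute to a user)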

GPFS overview
Smart storage management with IBM General Parallel File System: Enterprise file data is often touched by multiple processes, applications, and users throughout the lifecycle of the data. Managing this data workflow is often the highest cost part of storage processing and management, in terms of both processing and people time. In the past, companies have addressed this challenge using different approaches, including clustered servers and network attached storage. Clustered servers are typically limited in scalability and often require redundant copies of data. Traditional network attached storage solutions are restricted in performance, security, and scalability.


To effectively address these issues, you need to look at a new, more effective data management approach. Figure A-8 shows a typical infrastructure with unstructured data: it is a data storage approach, but not a data management approach.

Figure A-8 Unstructured data


In Figure A-9, GPFS provides a real data management solution with the following capabilities:
File management
Performance
Enhanced availability
Better automation
Scale-out growth
Because GPFS allows you to bring together islands of information, redundant data, and under-utilized segments of storage, it provides a strong file management solution. GPFS is also a solution that is able to scale and integrate emerging technologies, providing both performance and protection for your storage investment. By design, GPFS is an enhanced availability solution, ensuring data consistency through various mechanisms. These mechanisms can also be easily automated thanks to the powerful ILM tools integrated in GPFS.

Figure A-9 Structured data

To fulfill these capabilities, GPFS provides a single global namespace with centralized management. This allows better storage utilization and performance for varied workloads, as described in Figure A-10. Database, archive, and application workloads can all use the single global namespace provided by GPFS. GPFS automatically handles all of your storage subsystems, ensuring homogeneous storage utilization.


Figure A-10 GPFS features

GPFS architecture
Figure A-11 describes the GPFS architecture. In a typical GPFS deployment, you run your daily business applications on NSD clients (or GPFS clients). These clients access the same global namespace through a LAN. Data accessed by the clients is transferred to and from the NSD servers (or GPFS servers) over the LAN. NSD clients and NSD servers are gathered into a GPFS cluster. The latest GPFS version (3.3) supports AIX, Linux, and Windows as NSD clients or NSD servers, and these operating systems can run on a wide range of IBM and non-IBM hardware. For the LAN, GPFS can use Gigabit Ethernet as well as 10 Gigabit Ethernet or InfiniBand networks. The servers then commit I/O operations to the storage subsystems where the LUNs are physically located; from a GPFS point of view, a LUN is an NSD. GPFS supports a variety of IBM and non-IBM storage subsystems. Because the IBM SAN Volume Controller solution is also supported by GPFS, several storage subsystem solutions are de facto compatible with GPFS. For more details about the supported software and hardware versions, refer to the following link:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfs_faqs/gpfsclustersfaq.html


Before describing GPFS features and mechanisms in more detail, it is important to note that, like any file system, GPFS handles data and metadata. However, even though there are two kinds of nodes (clients and servers) inside a single GPFS cluster, there are no nodes dedicated to metadata management: any node in the cluster can perform metadata management, and some GPFS mechanisms, such as file system scanning, run in parallel from all nodes inside the cluster (see Figure A-11).

Figure A-11 GPFS architecture

GPFS file management


Here we describe some of the GPFS mechanisms that provide strong file management. As described previously, inside a single GPFS cluster you have NSD clients and NSD servers. The main difference between these roles is that NSD servers are physically connected to the storage subsystems where the LUNs are located. From any node inside the GPFS cluster, you can create one or more GPFS file systems on top of these NSDs. During NSD creation, the operation that allocates LUNs to the GPFS layer, you can choose to host both data and metadata on the same NSD, or split them and use some LUNs to host metadata only or data only. Because GPFS is a scalable solution, you will probably attach more than a single storage subsystem to your SAN and therefore to your NSD servers. With different subsystem or drive technologies, you can choose to host metadata on SAS or SSD drives and data on SATA drives, for example, in order to increase metadata performance. Within the same GPFS file system, you can also decide to create different failure groups. As for the metadata/data split, you can specify the failure group during NSD creation or change it later. Then, during file system creation, you can tell GPFS that you want replication of your metadata, your data, or both; this can also be changed after creation. GPFS then automatically replicates your metadata and/or data across the failure groups. Because this is replication, you need twice the required capacity to replicate all data.
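The following sketch outlines how NSDs and a replicated file system might be created with the standard GPFS commands mmcrnsd and mmcrfs. The disk descriptor format, device names, server names, and pool names shown here are illustrative assumptions, and the exact descriptor syntax varies between GPFS releases.
# cat /tmp/disk.desc
/dev/sdb:nsdserver1:nsdserver2:metadataOnly:1:nsd_meta1:system
/dev/sdc:nsdserver3:nsdserver4:dataOnly:2:nsd_data1:satapool
# mmcrnsd -F /tmp/disk.desc (allocate the LUNs to GPFS as NSDs)
# mmcrfs /gpfs/fs1 gpfs1 -F /tmp/disk.desc -m 2 -r 2 (create a file system with two metadata replicas and two data replicas)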


Another option is to create storage pools. Assuming that you have several storage subsystems in your SAN accessed by your NSD servers, you can decide to create multiple storage pools: a SATA storage pool, a SAS storage pool, or a tape storage pool; in the same way, you can create an IBM DS8300 storage pool and an IBM DS5300 storage pool, or an IBM storage pool and a non-IBM storage pool. Again, you can decide this during NSD creation or change it later. You can then use these storage pools for different workloads, for example, use the SATA pool for multimedia files, the SAS pool for financial workloads, and the tape pool for archives. Or you can use the SAS pool for daily business, move files to the SATA pool at the end of the week, and move them later to the tape pool. Whereas failure groups are handled automatically by GPFS, the storage pool mechanism needs rules in order to be automated. With the Information Lifecycle Management (ILM) tools provided by GPFS, you can create rules that become part of a policy. Basic rules are placement rules (place multimedia files in the SATA pool and financial workload files in the SAS pool) and migration rules (move data from the SAS pool to the SATA pool at the end of the week, and move data from SATA to the tape pool at the end of the month). These rules are gathered into GPFS policies, which can then be automated and scheduled. You can also use more complex rules and policies to run a command at a given time on the entire GPFS file system or on a subset of files, for example, to delete all files older than two years, or to move all files from the projectA directory to tape. Compared with classical UNIX commands, a migration rule is similar to an mv command, whereas the last example is more like a find command combined with an exec. The ILM tools can be used for each GPFS file system, but you can also create GPFS filesets inside a single GPFS file system for finer granularity, and then apply policy or quota rules to a fileset, which is basically a directory or GPFS subtree.
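As a simple sketch of such rules, the following policy places new multimedia files in a SATA pool and everything else in a SAS pool, and migrates files that have not been accessed for a week once the SAS pool fills up. The pool names, file patterns, and thresholds are assumptions; a real policy would be tailored to your environment.
RULE 'media' SET POOL 'satapool' WHERE UPPER(NAME) LIKE '%.MP4' OR UPPER(NAME) LIKE '%.AVI'
RULE 'default' SET POOL 'saspool'
RULE 'weekly' MIGRATE FROM POOL 'saspool' THRESHOLD(80,60) TO POOL 'satapool' WHERE DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) > 7
The placement rules are installed with mmchpolicy, and migration rules are evaluated and run (or scheduled) with mmapplypolicy:
# mmchpolicy gpfs1 /tmp/policy.txt
# mmapplypolicy gpfs1 -P /tmp/policy.txt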

GPFS performance
GPFS is not only a centralized management solution providing a global namespace; it has also been designed to scale according to your capacity needs and to provide aggregate bandwidth when set up appropriately. As explained previously, a typical use of GPFS is to run daily business applications on NSD clients that access data through the network. Note that, depending on your requirements, you might have only NSD servers and no NSD clients; with such a configuration, you run your applications directly on the NSD servers, which also have access to the global namespace. Assuming you have NSD clients running your applications on a GPFS file system, that file system was created with a key parameter: the GPFS block size. The equivalent of the GPFS block size on the storage side is the chunk size or segment size of a RAID controller. The block size can be set from 16 KB to 4 MB. Consider a GPFS cluster with some NSD clients, four NSD servers, and one storage subsystem with four RAID arrays, where the GPFS file system was created with a 1 MB block size. From the storage subsystem point of view, all four arrays are configured as RAID 5 with a 256 KB segment size. Your application runs on the NSD clients and generates a 4 MB I/O. The 4 MB is sent over the network in 1 MB pieces to the NSD servers. The NSD servers then forward the 1 MB packets to the storage subsystem controller, which splits them into 256 KB pieces (the segment size). This leads to a single 4 MB I/O being written in a single I/O operation at the disk level, as described in Figure A-12 on page 505, Figure A-13 on page 505, Figure A-14 on page 506, and Figure A-15 on page 506. In these figures, each NSD is a RAID 5 array built with four data disks and an extra parity disk. Performing any I/O operation on an NSD is equivalent to performing I/O operations on the physical disks inside the RAID array.


Figure A-12 Step 1, application is generating IO

Figure A-13 Step 2 data sent to NSD servers


Figure A-14 Step 3 data sent to Storage Subsystem

Figure A-15 Step 4 GPFS block size chop into segment size piece by the controller


The foregoing figures describe GPFS operation with a few NSDs and a single storage subsystem, but the behavior is exactly the same in a larger configuration. All NSD clients run their applications against the GPFS file system in parallel. Because GPFS has been designed to scale with your storage infrastructure, adding more storage subsystems and NSD servers increases your overall bandwidth.
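The block size is chosen when the file system is created and can be checked afterwards. A minimal sketch, assuming a file system named gpfs1 and a descriptor file as in the earlier example:
# mmcrfs /gpfs/fs1 gpfs1 -F /tmp/disk.desc -B 1M (create the file system with a 1 MB block size)
# mmlsfs gpfs1 -B (display the block size of an existing file system)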

GPFS High Availability solution


GPFS provides great performance and efficient centralized file management tools, but it is also a strong high availability solution. On the NSD client side, because GPFS provides a single namespace, you can access data from any node inside the GPFS cluster, from all NSD clients and also from NSD servers. Moreover, if your network layer is fully redundant, there is no single point of failure on the NSD client side. On the NSD server side, you also have a high availability solution. As shown previously, NSD clients send packets to NSD servers in parallel; each block is sent to each NSD in a round-robin way. Each NSD has one NSD server as its primary server and a list of NSD servers as backups, which means that, from the NSD client point of view, sending data to the NSDs in a round-robin way is equivalent to sending data to the primary NSD servers in a round-robin way. Because each NSD has a list of backups, if a network or Fibre Channel connection (in the case of a SAN) is broken, or even if an NSD server fails, the NSD client sends its data to the backup NSD servers. You can afford to lose several NSD servers and still access data on the GPFS file system; depending on your GPFS configuration and architecture, you can still access data as long as one NSD server is up and running. GPFS mechanisms ensure your data integrity once all NSD servers are back and ready to be used again. On the storage side, only the RAID configuration can guarantee the high availability of your data, but you can also use GPFS features to replicate your data for better protection, as described in GPFS file management on page 503. You can even synchronously replicate your entire GPFS file system between two distant sites. Like other file system solutions, GPFS provides a snapshot function, with a maximum of 256 snapshots per file system. These snapshots are read only and are by definition instantaneous. The GPFS snapshot feature uses the copy-on-write method at the block level, which means that the original version of a block (of GPFS block size) is copied elsewhere before the block is updated; the snapshot then points to the new location of the original blocks in the allocation map table. Even though there is no metadata server concept in GPFS, there are some key roles among the GPFS nodes inside the cluster that are required to ensure data integrity. The GPFS special roles are:
The GPFS cluster manager
The file system manager
The metanode
The GPFS cluster manager is responsible for granting disk leases, detecting disk failures, and selecting the file system manager, for instance. The file system manager is in charge of several roles, such as file system configuration (changes in configuration, for example), management of disk space allocation (for efficient parallel allocation of space), token management (for the file locking mechanism), and quota management (if configured). The two previous roles are unique within a single file system, whereas there are as many metanodes as there are open files: the metanode is the node inside the GPFS cluster responsible for the metadata of a given open file.
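As an example of the snapshot function, the following sketch uses the base GPFS snapshot commands; the file system and snapshot names are assumptions, and in a SONAS appliance snapshots are normally managed through the SONAS GUI or CLI rather than directly with these commands.
# mmcrsnapshot gpfs1 snap_monday (create a read-only snapshot of file system gpfs1)
# mmlssnapshot gpfs1 (list existing snapshots)
# mmdelsnapshot gpfs1 snap_monday (delete the snapshot when it is no longer needed)
The contents of a snapshot are visible under the .snapshots directory in the file system.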


GPFS failure group


A failure group is a group of disks that have a common point of failure. By default, GPFS assigns a failure group at the node level for each disk in the GPFS file system, because the node is seen as a single point of failure. You can assign a number to each failure group. Another type of failure group, seen from more than a single node's point of view, is a virtual shared disk that is twin-tailed and available from two nodes over an IBM Recoverable Shared Disk; although there are two nodes, this represents a single failure group. The default number that GPFS assigns to a failure group is the node number plus 4000. If you decide that a disk should ignore failure group considerations, you can assign it a value of -1, which indicates that the disk has no point of common failure with any other disk. You can assign a value from -1 to 4000 for your failure groups.

Other GPFS features


We have described the basics of GPFS utilization, but there are several other features you can use with GPFS. These features range from something as simple as a GUI for management purposes to something as complex as multiple cross-cluster configurations.

Features
Among these interesting features, you have the following possibilities:
The GUI, which is included in the GPFS packages, if you are more familiar with a GUI than with the CLI.
The Clustered NFS (CNFS) feature, which allows you to use some nodes inside the GPFS cluster as NFS servers, so that nodes that are not in the GPFS cluster can access the GPFS file system using the NFS protocol. With an appropriate DNS configuration, you can even load balance access across many NFS servers. Similarly, GPFS also supports NFSv4 ACLs and Samba, which allows Windows and UNIX users to share data.
HSM compatibility; you can use GPFS in combination with HSM for better tiering inside your storage infrastructure.
The cross-cluster feature, which, in the case of a multi-site data center, allows you to grant NSD clients in a remote GPFS cluster access to the local GPFS file system (and in the opposite direction).

Documentation
For more detailed documentation on GPFS, refer to the IBM website:
http://www-03.ibm.com/systems/software/gpfs/index.html
the online GPFS documentation:
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.gpfs.doc/gpfsbooks.html
or the GPFS wiki:
http://www.ibm.com/developerworks/wikis/display/hpccentral/General+Parallel+File+System+%28GPFS%29


Tivoli Storage Manager overview


In this section we illustrate the Tivoli Storage Manager software product and how it relates to a SONAS environment. Tivoli Storage Manager is the cornerstone product on which IBM bases data protection and storage management. We explain how Tivoli Storage Manager provides an abstraction or virtualization layer. The following topics are covered:
Tivoli Storage Manager concepts and architecture
Tivoli Storage Manager backup archive client
Tivoli Storage Manager HSM client
Tivoli Storage Manager generic NDMP support
For additional information, see the Redbooks publications IBM Tivoli Storage Management Concepts, SG24-4877, and Tivoli Storage Manager V6.1 Technical Guide, SG24-7718.

Tivoli Storage Manager concepts


IBM Tivoli Storage Manager provides a comprehensive solution focused on the key data protection and management activities of backup, archive, recovery, space management, and disaster recovery. Tivoli Storage Manager allows you to separate the backup, archiving, and retention of data from storage-related aspects of the data, in addition to many other services. Tivoli Storage Manager offers many data protection and storage management functions relevant to SONAS:
Data backup with Tivoli Storage Manager progressive backup: Progressive incremental backup for file systems eliminates the need for repeated, redundant full backups. This allows you to back up less data each night, saving network, server, and storage resources compared with other, traditional open systems products. Backups finish more quickly because less file data needs to be moved, and restores are faster because the full + incremental restore paradigm is not required: Tivoli Storage Manager always has a full backup available in its storage repository.
Data archiving defines how to insert data into the data retention system. Tivoli Storage Manager offers a command line interface to archive and back up files and a C language application programming interface (API) for use by content management applications.
Storage defines on which storage device to put the object. Tivoli Storage Manager supports hundreds of disk and tape storage devices and integrated HSM of stored data. You can choose the most effective storage device for your requirements and subsequently let the data automatically migrate to different storage tiers.
WORM functionality is offered by System Storage Archive Manager. The Tivoli Storage Manager administrator cannot accidentally or intentionally delete objects stored in Tivoli Storage Manager.
Storage management services are provided by Tivoli Storage Manager. These additional storage management services facilitate hardware replacement and disaster recovery. Tivoli Storage Manager allows for easy migration to new storage devices when the old storage devices need replacing, which is likely to happen when data is retained for long periods of time. Tivoli Storage Manager also offers functions to make multiple copies of archived data.
Tivoli Storage Manager offers a strong and comprehensive set of functions that you can exploit to effectively manage archived data. You can consider Tivoli Storage Manager an abstraction or virtualization layer between applications requiring data retention or storage management services and the underlying storage infrastructure.


Tivoli Storage Manager architectural overview


Tivoli Storage Manager is a client server software application that provides services such as network backup and archive of data to a central server. There are two main functional components in a Tivoli Storage Manager environment:
You install the Tivoli Storage Manager client component on servers, computers, or machines that require Tivoli Storage Manager services. The Tivoli Storage Manager client accesses the data to be backed up or archived and is responsible for sending the data to the server.
The Tivoli Storage Manager server is the central repository for storing and managing the data received from the Tivoli Storage Manager clients. The server receives the data from the client over the LAN network, inventories the data in its own database, and stores it on storage media according to predefined policies.
Figure A-16 illustrates the components of a Tivoli Storage Manager environment. You can see that the core component is the Tivoli Storage Manager server.

Figure A-16 Tivoli Storage Manager components: architectural overview

We review and discuss the main components and functions of a Tivoli Storage Manager environment, emphasizing the components that are most relevant to an ILM-optimized environment. These components are:
Tivoli Storage Manager server
Administrative interfaces
The server database
Storage media management
Data management policies
Security concepts
Backup Archive client interface
Client application programming interface (API)
Automation
The client to server data path
Tip: For a detailed overview of Tivoli Storage Manager and its complementary products, see the IBM Tivoli software information center at the following location:
http://publib.boulder.ibm.com/infocenter/tivihelp

Tivoli Storage Manager server


The Tivoli Storage Manager server consists of a run-time environment and a DB2 relational database. You can install the server on several operating systems and on diverse hardware platforms, covering all popular environments. The DB2 database with its recovery log stores all the information about the current environment and the managed data. The Tivoli Storage Manager server listens for and communicates with the client systems over the LAN network.

Administrative interfaces
For the central administration of one or more Tivoli Storage Manager server instances, as well as the whole data management environment, Tivoli Storage Manager provides command line or Java-based graphical administrative interfaces, otherwise known as administration clients. The administrative interface enables administrators to control and monitor server activities, define management policies for clients, and set up schedules to provide services to clients at regular intervals.

Server database
The Tivoli Storage Manager server database is based on a standard DB2 database that is integrated into and installed with the Tivoli Storage Manager server itself. The Tivoli Storage Manager server DB2 database stores all information relative to the Tivoli Storage Manager environment, such as the client nodes that access the server, storage devices, and policies. The Tivoli Storage Manager database contains one entry for each object stored in the Tivoli Storage Manager server, and the entry contains information such as:
The name of the object
The Tivoli Storage Manager client that sent the object
The policy information or Tivoli Storage Manager management class associated with the object
The location where the object is stored in the storage hierarchy
The Tivoli Storage Manager database retains information called metadata, which means data that describes data. The flexibility of the Tivoli Storage Manager database enables you to define storage management policies around business needs for individual clients or groups of clients. You can assign client data attributes, such as the storage destination, number of versions, and retention period at the individual file level and store them in the database. The Tivoli Storage Manager database also ensures reliable storage management processes. To maintain data integrity, the database uses a recovery log to roll back any changes made if a storage transaction is interrupted before it completes. This is known as a two-phase commit.

Storage media management


Tivoli Storage Manager performs multiple diverse hierarchy and storage media management functions by moving or copying data between different pools or tiers of storage, as shown in Figure A-17.


Figure A-17 Tivoli Storage Manager management of the storage hierarchy

A Tivoli Storage Manager server can write data to more than 400 types of devices, including hard disk drives, disk arrays and subsystems, standalone tape drives, tape libraries, and other forms of random and sequential-access storage. The server uses media grouped into storage pools. You can connect the storage devices directly to the server through SCSI, through directly attached Fibre Channel, or over a Storage Area Network (SAN). Tivoli Storage Manager provides sophisticated media management capabilities that enable IT managers to perform the following tasks:
Track multiple versions of files (including the most recent version)
Respond to online file queries and recovery requests
Move files automatically to the most cost-effective storage media
Expire backup files that are no longer necessary
Recycle partially filled volumes
Tivoli Storage Manager provides these capabilities for all backup volumes, including on-site volumes inside tape libraries, volumes that have been checked out of tape libraries, and on-site and off-site copies of the backups. Tivoli Storage Manager provides a powerful media management facility to create multiple copies of all client data stored on the Tivoli Storage Manager server. Enterprises can use this facility to back up primary client data to two copy pools: one stored in an off-site location, and the other kept on-site for possible recovery from media failures. If a file in a primary pool is damaged or resides on a damaged volume, Tivoli Storage Manager automatically accesses the file from an on-site copy if it is available or indicates which volume needs to be returned from an off-site copy. Tivoli Storage Manager also provides a unique capability for reclaiming expired space on off-site volumes without requiring the off-site volumes to be brought back on-site. Tivoli Storage Manager tracks the utilization of off-site volumes just as it does for on-site volumes.


When the free space of off-site volumes reaches a determined reclamation threshold, Tivoli Storage Manager uses the on-site volumes to consolidate the valid files onto new volumes, then directs the new volumes to be taken off-site. When the new tapes arrive off-site, Tivoli Storage Manager requests the return of the original off-site volumes, which can be reused as scratch volumes.

Data management policies


A data storage management environment consists of three basic types of resources: client systems, rules, and data. The client systems contain the data to manage, and the rules specify how the management must occur; for example, in the case of backup, how many versions you keep, where you store them, and so on. Tivoli Storage Manager policies define the relationships between these three resources. Depending on your actual needs for managing your enterprise data, these policies can be simple or complex. Tivoli Storage Manager has certain logical entities that group and organize the storage resources and define relationships between them. You group client systems, or nodes in Tivoli Storage Manager terminology, together with other nodes with common storage management requirements, into a policy domain.

Security concepts
Because the storage repository of Tivoli Storage Manager is the place where an enterprise stores and manages all of its data, security is a vital aspect of Tivoli Storage Manager. To ensure that only the owning client or an authorized party can access the data, Tivoli Storage Manager implements, for authentication purposes, a mutual suspicion algorithm, which is similar to the methods used by Kerberos authentication. Whenever a client (backup/archive or administrative) wants to communicate with the server, an authentication has to take place. This authentication is a two-way verification: the client has to authenticate itself to the server, and the server has to authenticate itself to the client. To do this, all clients have a password, which is stored at the server side as well as at the client side. In the authentication dialog, these passwords are used to encrypt the communication. The passwords are not sent over the network, to prevent hackers from intercepting them. A communication session is established only if both sides are able to decrypt the dialog. If the communication has ended, or if a time-out period has ended with no activity, the session automatically terminates and a new authentication is necessary. Tivoli Storage Manager also offers encryption of the data sent by the client to the server, with both 128-bit AES and 56-bit DES encryption.

Backup Archive client interface


Tivoli Storage Manager is a client-server program. You must install the client product on the machine you want to back up. The client portion is responsible for sending and receiving data to and from the Tivoli Storage Manager server. The Backup Archive client has two distinct features: The backup feature allows users to back up a number of versions of their data onto the Tivoli Storage Manager server and to restore from these, if the original files are lost or damaged. Examples of loss or damage are hardware failure, theft of computer system, or virus attack.


The archive feature allows users to keep a copy of their data for long term storage and to retrieve the data if necessary. Examples of this are to meet legal requirements, to return to a previous working copy if the software development of a program is unsuccessful, or to archive files that are not currently necessary on a workstation. Backup and archive are the central procedures around which Tivoli Storage Manager is built; they are the functions that make it possible to retrieve lost data later on. You can interact with the Tivoli Storage Manager server to run a backup/restore or archive/retrieve operation through three different interfaces:
Graphical User Interface (GUI)
Command Line Interface (CLI)
Web Client Interface (Web Client)
The command line interface has a richer set of functions than the GUI. The CLI has the benefit of being a character mode interface and, therefore, is well suited for users who need to type the commands. You might also consider using it when you cannot access the GUI interface or when you want to automate a backup or archive by using a batch processing file.
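For illustration, the following backup-archive client CLI (dsmc) commands show typical backup, archive, restore, and retrieve operations; the paths and description text are examples only.
dsmc incremental /home (progressive incremental backup of a file system)
dsmc archive "/project/*" -subdir=yes -description="End of project 2010"
dsmc restore /home/user1/report.txt (restore the most recent backup version of a file)
dsmc retrieve "/project/plan.doc" -description="End of project 2010" (retrieve an archived copy)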

Client application programming interface (API)


Tivoli Storage Manager provides a data management application program interface (API) that you can use to implement application clients to integrate popular business applications, such as databases or groupware applications. The API also adheres to an open standard and is published to enable customers and vendors to implement specialized or custom clients for particular data management needs or nonstandard computing environments. The Tivoli Storage Manager API enables an application client to use the Tivoli Storage Manager storage management functions. The API includes function calls that you can use in an application to perform the following operations:
Start or end a session
Assign management classes to objects before they are stored on a server
Archive objects to a server
Signal retention events, such as activate, hold, or release
Alternatively, some vendor applications exploit the Tivoli Storage Manager data management API by integrating it into their software products to implement new data management functions or to provide archival functionality on additional system platforms. Some examples are IBM DB2 Content Manager, IBM DB2 Content Manager OnDemand, IBM CommonStore for SAP R/3, IBM Lotus Domino, and Microsoft Exchange data archival. The API, including full documentation available on the Internet, is published to enable customers and vendors to implement their own solutions to meet their requirements.

Automation
Tivoli Storage Manager includes a central scheduler that runs on the Tivoli Storage Manager server and provides services for use by the server and clients. You can schedule administrative commands to tune server operations and to start functions that require significant server or system resources during times of low usage. You can also schedule client action, but that would be unusual for a data retention-enabled client. Each scheduled command (administrative or client) action is called an event. The server tracks and records each scheduled event and its completion status in the Tivoli Storage Manager server database.
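A minimal sketch of such scheduling, using standard administrative commands; the domain, schedule, and node names are assumptions.
define schedule standard nightly_incr action=incremental starttime=21:00 duration=2 durunits=hours period=1 perunits=days
define association standard nightly_incr sonas_node1
query event standard nightly_incr (check the completion status of scheduled events)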


Client to server data path


Tivoli Storage Manager data can travel from client to server either over the LAN network or the SAN network when using Tivoli Storage Manager for SAN to enable LAN-free data transfers. The diagram in Figure A-18 schematically illustrates the components and data paths in a Tivoli Storage Manager environment.

Figure A-18 Backup environment pipeline and data flows

Figure A-18 shows the data flow or pipeline and potential bottlenecks in a Tivoli Storage Manager environment. It illustrates the route the data takes through the many components of the client-server storage environment. For each step in this route, we list causes of potential performance bottlenecks. Data is read by the backup/archive client from client disk or transferred in memory to the API client from a content manager application. The Tivoli Storage Manager client might compress the data before sending it to the Tivoli Storage Manager server in order to reduce network utilization. The client can choose whether or not to use the LAN or the SAN, also called LAN-free, for data transport. The SAN is optimized for bulk transfers of data and allows writing directly to the storage media, bypassing the Tivoli Storage Manager server and the network. LAN-free support requires an additional Tivoli Storage Manager license called Tivoli Storage Manager for SAN. Archiving data is normally a low volume operation, handling relatively small amounts of data to be retained for long periods of time. In this case, the LAN is more than adequate for data transport. The Tivoli Storage Manager server receives metadata, and data when using LAN transport, over the LAN network. Tivoli Storage Manager then updates its database. Many small files potentially can cause a high level of database activity. When the data is received over the LAN, it generally is stored in a disk storage pool for later migration to tape as an overflow location.


The maximum performance of data storage or retrieval operations depends on the slowest link in the chain; another way of putting it is that performance is constrained by the smallest pipe in the pipeline, as shown in Figure A-18 on page 515. In the figure, the LAN is the constraint on performance.

Tivoli Storage Manager storage management


Tivoli Storage Manager manages client data objects based on information provided in administrator-defined policies. Data objects can be subfile components, files, directories, or raw logical volumes that are archived from client systems. They can be objects such as tables, logs, or records from database applications, or simply a block of data that an application system archives to the server. The Tivoli Storage Manager server stores these objects on disk volumes and tape media that it groups into storage pools.

Tivoli Storage Manager storage pools and storage hierarchy


Tivoli Storage Manager manages data as objects as they exist in Tivoli Storage Manager storage pools (Figure A-19).

Figure A-19 IBM Tivoli Storage Manager storage hierarchy

Each object is bound to an associated management policy. The policy defines how long to keep that object and where the object enters the storage hierarchy. The physical location of an object within the storage pool hierarchy has no effect on its retention policies. You can migrate or move an object to another storage pool within a Tivoli Storage Manager storage hierarchy. This can be useful when freeing up storage space on higher performance devices, such as disk, or when migrating to new technology. You can and should also copy objects to copy storage pools. To store these data objects on storage devices and to implement storage management functions, Tivoli Storage Manager uses logical definitions to classify the


available physical storage resources. Most important is the logical entity called a storage pool, which describes a storage resource for a single type of media, such as disk volumes, which are files on a file system, or tape volumes, which are cartridges in a library.

Native data deduplication


Tivoli Storage Manager provides a built-in data deduplication feature. Deduplication is a technique that allows more data to be stored on a given amount of media than would otherwise be possible. It works by removing duplicates in the stored version of your data. In order to do that, the deduplication system has to process the data into a slightly different form; when you need the data back, it is reprocessed into the same form in which it was originally submitted. Tivoli Storage Manager is capable of deduplicating data at the server. It performs deduplication out of band, in Tivoli Storage Manager server storage pools, and only on data in FILE (sequential disk) device type storage pools. Tivoli Storage Manager calculates an MD5 checksum for each object and then slices the object into chunks; each chunk has a SHA-1 hash associated with it, which is used for the deduplication. The MD5 checksums verify that objects submitted to the deduplication system are reassembled correctly: the MD5 is recalculated and compared with the saved one to ensure that the returned data is correct. Deduplication and compression are closely related, and the two often work in similar ways, but the size of the working set of data for each is different. Deduplication works against large data sets compared to compression (for example, real-world LZW compression often has a working set under 1 MB, whereas deduplication is often implemented to work in the range of 1 TB to 1 PB). With deduplication, the larger the quantity being deduplicated, the more opportunity exists to find similar patterns in the data, and the better the deduplication ratio can theoretically be; a single store of 40 TB is therefore better than five separate data stores of 8 TB each. Deduplication is effective with many, but not all, workloads. It requires that there are similarities in the data being deduplicated: for example, if a single file exists more than once in the same store, it can be reduced to one copy plus a pointer for each deduplicated version (this is often referred to as a single instance store). Other workloads, such as uncompressible and non-repeated media (JPEGs, MPEGs, MP3s, or specialist data such as geo-survey data sets), do not produce significant savings in space consumed, because the data is not compressible, has no repeating segments, and has no similar segments. To sum up, deduplication typically allows more unique data to be stored on a given amount of media, at the cost of additional processing on the way into the media (during writes) and on the way out (during reads).
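A minimal sketch of enabling server-side deduplication on a FILE device type storage pool, using standard administrative commands; the device class name, directory, and pool name are assumptions.
define devclass filedev devtype=file directory=/tsmfile maxcapacity=50G
define stgpool dedup_pool filedev maxscratch=200 deduplicate=yes
identify duplicates dedup_pool (start the processes that identify duplicate chunks in the pool)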

Device classes
A storage pool is built up from one or more Tivoli Storage Manager storage pool volumes. For example, a disk storage pool can consist of several AIX raw logical volumes or multiple AIX files on a file system. Each AIX raw logical volume or AIX file corresponds to one Tivoli Storage Manager storage pool volume. A logical entity called a device class is used to describe how Tivoli Storage Manager can access those physical volumes to place the data objects on them. Each storage pool is bound to a single device class. The storage devices used with Tivoli Storage Manager can vary in their technology and total cost. To reflect this fact, you can imagine the storage as a pyramid (or triangle), with


high-performance storage at the top (typically disk), normal performance storage in the middle (typically optical disk or cheaper disk), and low-performance, but high-capacity, storage at the bottom (typically tape). This is the tiered storage environment that Tivoli Storage Manager uses:
Disk storage devices are random access media, making them better candidates for storing frequently accessed data. Disk storage media with Tivoli Storage Manager can accept multiple parallel data write streams.
Tape is an economical, high-capacity, sequential access medium that you can easily transport off-site for disaster recovery purposes. Access time is much slower for tape because of the time necessary to load a tape into a tape drive and locate the data; however, for many applications that access time is still acceptable.
Tape: Today many people in the industry say that tape is dead and customers should use disk instead. However, the performance of high-end tape devices is often unmatched by disk storage subsystems. Current tape has a native performance in the range of, or over, 100 MB/sec, which with compression can easily pass 200 MB/sec. Also consider the cost: the overall power consumption of tape is usually less than that of disk.
Disk storage is referred to as online storage, while tape storage has often been referred to as off-line, and also as near-line with regard to HSM. With Tivoli Storage Manager HSM, tape volumes located in a tape library are accessed transparently by the application that is retrieving data from them (near-line). Tapes no longer in the library are off-line, requiring manual intervention. The introduction of lower cost mass storage devices, such as Serial Advanced Technology Attachment (SATA) disk systems, offers an alternative to tape for near-line storage. Figure A-20 illustrates the use of a SATA disk as near-line storage.

Figure A-20 Online, near-line, and off-line storage

Device types
Each device defined to Tivoli Storage Manager is associated with one device class. Each device class specifies a device type. A device type identifies a device as a member of a group of devices that share similar media characteristics; for example, the LTO device type applies to LTO tape drives. The device type also specifies management information, such as how the server gains access to the physical volumes, the recording format, the estimated capacity, and labeling prefixes. Device types include DISK, FILE, and a variety of removable media types for tape and optical devices. Note that a device class for a tape or optical drive must also specify a library. The library defines how Tivoli Storage Manager can mount a storage volume onto a storage device such as a tape drive.
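As a brief sketch, the following administrative commands define a SCSI tape library with an LTO device class, and a FILE device class backed by a file system directory; the names and path are assumptions.
define library lib1 libtype=scsi
define devclass lto_class devtype=lto library=lib1 format=drive
define devclass disk_class devtype=file directory=/tsmfile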


Device access strategy


The access strategy of a device is either random or sequential. Primary storage pools can use random access devices (such as disk) or sequential access devices (such as tape). Copy storage pools use sequential access devices. Certain Tivoli Storage Manager processes use only sequential access device types:
Copy storage pools
Tivoli Storage Manager database backups
Export
Import

Tape devices
Tivoli Storage Manager supports a wide variety of enterprise class tape drives and libraries. Use tape devices for backing up your primary storage pools to copy storage pools and for backing up the database. Tape devices are well suited for this, because the media can be transported off-site for disaster recovery purposes.

Policy management
A data storage management environment consists of three basic types of resources: client systems, policies, and data. The client systems contain the data to manage, for example, file systems with multiple files. The policies are the rules that specify how to manage the objects: for example, for archives they define how long to retain an object in Tivoli Storage Manager storage and in which storage pool to place it; for backup, they define how many versions to keep, where to store them, and what Tivoli Storage Manager does with the stored object once the data is no longer on the client file system. Client systems, or nodes in Tivoli Storage Manager terminology, are grouped together with other nodes with common storage management requirements into a policy domain. The policy domain links the nodes to a policy set, a collection of storage management rules for different storage management activities.
Client node: The term client node refers to the application sending data to the Tivoli Storage Manager server.
A policy set consists of one or more management classes. A management class contains the rule descriptions, called copy groups, and links these to the data objects to manage. A copy group is where you define all the storage management parameters, such as the number of stored copies, the retention period, and the storage media. When data is linked to particular rules, it is said to be bound to the management class that contains those rules. Another way to look at the components that make up a policy is to consider them in the hierarchical fashion in which they are defined: the policy domain contains the policy set, the policy set contains the management classes, and the management classes contain the copy groups and the storage management parameters, as illustrated in Figure A-21.


Figure A-21 Policy relationships and resources

We explain the relationship between the items in Figure A-21 in the following topics.

Copy group rules


Copy group rules can define either a backup copy group or an archive copy group. One set of rules applies to backups and a separate set to archives:

Backup copy group


This copy group controls the backup processing of files associated with the specific management class. It is uncommon to use backup copy groups for archival or data retention applications because they are better suited to backup versioning of files. A backup copy group determines:
Where to store the object
What to do if the file on the client is in use
Whether to back up the file only if it has been modified or changed
The minimum frequency of backup to enforce, to avoid backing up at every run
If the file exists on the client node: how many copies to keep and how long to keep them
If the file has been deleted on the client: how many copies to keep and how long to keep the last copy of the file

Archive copy group


This copy group controls the archive processing of files associated with the management class. An archive copy group determines:
How the server handles files that are in use during archive
Where the server stores archived copies of files
How long the server keeps archived copies of files


Management class
The management class associates client files with copy groups. A management class is a Tivoli Storage Manager policy. Each individual object stored in Tivoli Storage Manager is associated with one and only one management class. A management class is a container for copy groups; it can contain one backup copy group, one archive copy group, both a backup and an archive copy group, or no copy groups at all. Users can bind (that is, associate) their files to a management class through the include-exclude list, a set of statements or rules that associate files to a management class based on file filtering rules. Alternatively, a user can explicitly request an archive management class.

Policy set
The policy set specifies the management classes that are available to groups of users. Policy sets contain one or more management classes. You must identify one management class as the default management class. Only one policy set, the ACTIVE policy set, controls policies in a policy domain.

Policy domain
The concept of policy domains enables an administrator to group client nodes by the policies that govern their files and by the administrators who manage their policies. A policy domain contains one or more policy sets, but only one policy set (named ACTIVE) can be active at a time. The server uses only the ACTIVE policy set to manage files for client nodes assigned to a policy domain. You can use policy domains to:
Group client nodes with similar file management requirements
Provide different default policies for different groups of clients
Direct files from different groups of clients to different storage hierarchies based on need
Restrict the number of management classes to which clients have access
Figure A-22 summarizes the relationships among the physical device environment, Tivoli Storage Manager storage and policy objects, and clients. The numbers in the following list correspond to the numbers in the figure.

Figure A-22 Basic policy structure for backup


Figure A-22 shows an outline of the policy structure. These are the steps to create a valid policy:
1. When clients are registered, they are associated with a policy domain. Within the policy domain are the policy set, management class, and copy groups.
2. When a client (application) backs up an object, the object is bound to a management class. A management class and the backup copy group within it specify where files are stored first (the destination) and how they are managed.
3. Storage pools are the destinations for all stored data. A copy group specifies a destination storage pool for the stored files. Storage pools are mapped to device classes, which represent devices. The storage pool contains volumes of the type indicated by the associated device class. Data stored in disk storage pools can be migrated to tape or optical disk storage pools and can be backed up to copy storage pools.
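A sketch of the corresponding administrative commands, from defining a policy domain down to activating the policy set and registering a node; the names, destination pool, and retention values are assumptions.
define domain sonas_dom
define policyset sonas_dom ps1
define mgmtclass sonas_dom ps1 mc_standard
define copygroup sonas_dom ps1 mc_standard type=backup destination=diskpool verexists=3 retextra=30
assign defmgmtclass sonas_dom ps1 mc_standard
activate policyset sonas_dom ps1
register node client1 secretpw domain=sonas_dom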

Hierarchical Storage Management


Hierarchical Storage Management (HSM) refers to a function of Tivoli Storage Manager that automatically distributes and manages data on disk, tape, or both by regarding devices of these types and potentially others as levels in a storage hierarchy. The devices in this storage hierarchy range from fast, expensive devices to slower, cheaper, and possibly removable devices. The objectives are to minimize access time to data and maximize available media capacity. HSM is implemented in many IBM products, such as Tivoli Storage Manager, in System i, and in z/OS in the combination of the storage management subsystem (SMS), DFSMShsm, DFSMSdss, and DFSMSrmm. Tivoli Storage Manager HSM solutions are applied to data on storage media, such as disk; the data is automatically migrated from one level of storage media to the next level based on some predefined policy. Tivoli Storage Manager offers different kinds of HSM functionality.

HSM in the Tivoli Storage Manager server


One level of HSM relates to how the Tivoli Storage Manager server stores data. The Tivoli Storage Manager server stores data in storage pools, which are collections of storage volumes of the same media type, as discussed in Tivoli Storage Manager storage management on page 516. You can map different Tivoli Storage Manager storage pools to different device types, and the pools can be chained together into a hierarchy by using the Tivoli Storage Manager nextstgpool parameter.


Figure A-23 illustrates a Tivoli Storage Manager server hierarchy with three storage pools. Storage pools are managed by threshold: each pool has a high threshold and a low threshold. When the amount of data in a storage pool exceeds the high threshold, Tivoli Storage Manager initiates a migration process to move the data. The data is moved to a destination called the next storage pool, which is defined as a parameter of the originating storage pool. In the example, poolfast has a next storage pool called poolslow. The migration process moves data from poolfast to poolslow; the process starts when the amount of data stored in poolfast exceeds the high migration threshold and stops when it drops to the low threshold.

Figure A-23 Tivoli Storage Manager server migration processing

Tivoli Storage Manager offers additional parameters to control migration of data from one storage pool to the next. One of these is migdelay, which specifies the minimum number of days that a file must remain in a storage pool before the file becomes eligible for migration to the next storage pool.
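The storage pool chaining described above can be expressed with commands similar to the following sketch; the device class name, volume path, and parameter values are hypothetical, while poolfast and poolslow match the names used in the example:

   define stgpool poolslow lto_class maxscratch=100
   define stgpool poolfast disk nextstgpool=poolslow highmig=80 lowmig=20 migdelay=7
   define volume poolfast /tsm/poolfast_vol01.dsm formatsize=10240

In this sketch, migration from poolfast to poolslow starts when poolfast is 80% full, stops when it drops to 20%, and considers only files that have been in poolfast for at least seven days.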

HSM for file systems


Tivoli Storage Manager offers two separate HSM clients for file systems: one for UNIX and one for Windows environments. In both cases, the HSM client resides on the file server where you want to perform space management. It moves files from the local file system to lower cost storage managed by the Tivoli Storage Manager server; this movement is called migration. Tivoli Storage Manager performs this movement based on criteria such as file size and age. Migrating a file to the Tivoli Storage Manager server removes the file data from the client file system, but the client file system continues to present the file as if it were still on local disk. When a request to access the file occurs, the HSM client intercepts the file system request and, depending on the operating system platform, either recalls the file to primary storage or, in some cases, redirects the request to secondary storage. These operations are transparent to the requesting application, although the request can be slightly delayed by tape mount processing.


Figure A-24 illustrates a sample HSM storage hierarchy built to minimize storage costs. In the figure, Pool A is high-end disk and migrates files to Pool B after 14 days of non-use; Pool B is low-cost SATA disk and migrates files to Pool C when its capacity utilization exceeds 80%; Pool C is a tape library. Recall operations move data back up the hierarchy when files are accessed.

Figure A-24 Sample cost-based HSM storage hierarchy

HSM for UNIX clients


The IBM Tivoli Storage Manager for Space Management for UNIX client (HSM) migrates files from your local file system to storage and recalls them either automatically or selectively. Migrating files to a distributed storage device frees space for new data on your local file system. Your Tivoli Storage Manager administrator defines the management classes for your files. You, as root user:
- Select space management options and settings.
- Assign management classes to your files.
- Exclude files from space management.
- Schedule space management services.

These options and settings determine which files are eligible for automatic migration, the order in which files are migrated, where the migrated files are stored, and how much free space is maintained on your local file system. You prioritize files for migration by file size or by the number of days since the files were last accessed. Stub files that contain the information necessary to recall your migrated files remain on your local file system, so the files appear to reside locally. When you access migrated files, they are recalled automatically to your local file system. This is different from archiving, which completely removes files from your local file system.

The HSM client provides space management services for locally mounted file systems, and it migrates regular files only. It does not migrate character special files, block special files, named pipe files, or directories.

File migration, unlike file backup, does not protect against accidental file deletion, file corruption, or disk failure. Continue to back up your files whether they reside on your local file system or in Tivoli Storage Manager storage. You can use the IBM Tivoli Storage Manager backup-archive client to back up and restore migrated files in the same manner as you would back up and restore files that reside on your local file system. If you accidentally delete stub files from your local file system, or if you lose your local file system, you can restore the stub files from Tivoli Storage Manager.
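On a UNIX system, the space management client provides a small set of commands for these tasks. The following sketch shows typical usage; the file system and file names are hypothetical, and the exact options vary by platform and release:

   dsmmigfs add /gpfs/fs1              # add space management to a file system
   dsmmigrate /gpfs/fs1/bigfile.dat    # selectively migrate a file to the server
   dsmls /gpfs/fs1/bigfile.dat         # show whether a file is resident, premigrated, or migrated
   dsmrecall /gpfs/fs1/bigfile.dat     # recall a migrated file back to local disk
   dsmdf /gpfs/fs1                     # report space management statistics for the file system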


For planned processes, such as storing a large group of files in storage and returning them to your local file system for processing, use the archive and retrieve processes (a brief example follows). You can use the backup-archive client to archive and retrieve copies of migrated files in the same manner as you would archive and retrieve copies of files that reside on your local file system.

HSM supports various file systems. Currently, the following integrations exist:
- File system proprietary integration: Data can be directly accessed and read from any tier in the storage hierarchy. This is supported on JFS on AIX.
- DMAPI standard-based integration: The Data Management Application Programming Interface (DMAPI) standard has been adopted by several storage management software vendors. File system vendors focus on the application data management part of the protocol; storage management vendors focus on the HSM part of the protocol. The platforms currently supported by the Tivoli Storage Manager HSM client are GPFS on AIX, VxFS on Solaris, GPFS on xLinux, and VxFS on HP.
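The following backup-archive client sketch illustrates archive and retrieve of a group of files, together with ordinary incremental backup and restore of the same directory (the paths, description, and management class name are hypothetical):

   # archive a group of files and bind them to an archive management class
   dsmc archive "/home/project/results/*" -subdir=yes -description="Q4 results" -archmc=mc_longterm
   # retrieve the archived copies back to the local file system
   dsmc retrieve "/home/project/results/*" -subdir=yes
   # migrated stub files can still be backed up and restored with the normal commands
   dsmc incremental /home/project
   dsmc restore "/home/project/results/summary.dat"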

HSM for Windows clients


HSM for Windows offers automated management features, such as these:
- Policy-based file selection to apply HSM rules to predefined sets of files
- On-demand scheduling to define when to perform HSM automatic archiving
- Transparent recall, so that an application that references a migrated file causes it to be recalled automatically

The policies or rules that HSM for Windows supports allow you to filter files based on attributes such as:
- Directory name
- File type, based on the file extension
- Creation, modification, or last access date of the file

Automatic archiving performs archiving operations based on inclusion or exclusion of directories and subdirectories and inclusion or exclusion of file extensions. In addition, you can configure filter criteria based on creation, modification, and last access date.


Related publications
The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book.

IBM Redbooks publications


For information about ordering these publications, see How to get Redbooks publications on page 528. Note that some of the documents referenced here might be available in softcopy only.
- IBM Scale Out Network Attached Storage Concepts, SG24-7874
- IBM eServer xSeries and BladeCenter Server Management, SG24-6495
- Configuration and Tuning GPFS for Digital Media Environments, SG24-6700
- IBM Tivoli Storage Manager Implementation Guide, SG24-5614

Other publications
These publications are also relevant as further information sources:
- IBM Scale Out Network Attached Storage - Software Configuration Guide, GA32-0718
- IBM Scale Out Network Attached Storage Installation Guide, GA32-0715
- IBM Scale Out Network Attached Storage Introduction and Planning Guide, GA32-0716
- IBM Scale Out Network Attached Storage Troubleshooting Guide, GA32-0717
- IBM Scale Out Network Attached Storage User's Guide, GA32-0714
- GPFS Advanced Administration Guide - Version 3 Release 3, SC23-5182

Online resources
These websites are also relevant as further information sources:
- SONAS Support Site: http://www.ibm.com/storage/support/ and select Product family: Network Attached Storage (NAS), Product: Scale Out Network Attached Storage, then click Go.
- Support for IBM System Storage, TotalStorage, and Tivoli Storage products: http://www.ibm.com/storage/support/
- Additional GPFS documentation sources: http://www.ibm.com/systems/gpfs and http://www-03.ibm.com/systems/software/gpfs/resources.html
- NFS V4 ACL information: http://www.nfsv4.org/

How to get Redbooks publications


You can search for, view, or download Redbooks publications, Redpapers publications, Technotes, draft publications, and Additional materials, as well as order hardcopy Redbooks publications, at this website: ibm.com/redbooks

Help from IBM


- IBM Support and downloads: ibm.com/support
- IBM Global Services: ibm.com/services



Back cover

IBM Scale Out Network Attached Storage


Architecture, Planning, and Implementation Basics
Shows how to set up and customize the IBM Scale Out NAS Details the hardware and software architecture Includes daily administration scenarios
IBM Scale Out Network Attached Storage (IBM SONAS) is a Scale Out NAS offering designed to manage vast repositories of information in enterprise environments requiring very large capacities, high levels of performance, and high availability. IBM SONAS provides a range of reliable, scalable storage solutions for a variety of storage requirements. These capabilities are achieved by using network access protocols such as NFS, CIFS, HTTP, and FTP. Utilizing built-in RAID technologies, all data is well protected with options to add additional protection through mirroring, replication, snapshots, and backup. These storage systems are also characterized by simple management interfaces that make installation, administration, and troubleshooting uncomplicated and straightforward. In this IBM Redbooks publication, we give you details of the hardware and software architecture that make up the SONAS appliance, along with configuration, sizing, and performance considerations. We provide information about the integration of the SONAS appliance into an existing network. We demonstrate the administration of the SONAS appliance through the GUI and CLI, as well as showing backup and availability scenarios. Using a quick start scenario, we take you through common SONAS administration tasks to familiarize you with the SONAS system.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE


IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks


SG24-7875-00 ISBN 0738435058
