
BEST PRACTICES GUIDE FOR MAINTAINING ENOUGH FREE SPACE ON ISILON CLUSTERS AND POOLS

November 20, 2014

Abstract
Consuming all available storage space on an Isilon cluster or
pool can have a major impact on your workflow and the cluster's
data protection capabilities. However, it is one of the easiest
problems to avoid. This document describes the issues that can
arise from out-of-space situations, and provides guidelines to
avoid them.

Table of Contents
Implications of having a full cluster or pool
Error messages indicating a cluster or pool is full or nearly full
What to do if you have a full or nearly full cluster or pool
Best practices to maintain enough free space on your cluster or pool
  Maintain sufficient free space
  Maintain appropriate protection levels
  Monitor cluster capacity
  Provision additional capacity
  Manage your data
  Manage snapshots
  Make sure all nodes in a pool are compatible
  Output of isi_for_array -X -s 'isi_hw_status | grep Prod'
  Enable Virtual Hot Spare
  Enable Spillover


Implications of having a full cluster or pool

When a cluster or pool is more than 90% full, the system can experience slower performance and possible
workflow interruptions in high-transaction or write-speed-critical operations.
Further, when a cluster or pool approaches full capacity (over 98% full), the following issues can occur:

- Substantially slower performance
- Workflow disruptions (failed file operations)
- Inability to write data
- Inability to make configuration changes
- Inability to run commands that would ordinarily be used to free up space
- Inability to delete data
- Data unavailability
- Potential failure of client operations or authentication (failure to connect, mount, or navigate)
- Potential for data loss

Allowing a cluster or pool to fill can put the cluster into a non-operational state that can take significant time
(hours or even days) to correct. Therefore, it is important to keep your cluster or pool from becoming full.

Error messages indicating a cluster or pool is full or nearly full

You might get any of the following error messages when attempting to write to a full, or nearly full, cluster or
pool:
Error: The operation can't be completed because the disk <share name> is full.
Where it appears: OneFS web administration interface, or the command-line interface on a Mac with an NFS mount.

Error: No space left on device.
Where it appears: OneFS web administration interface, or the command-line interface on a Mac client.

Error: No available space.
Where it appears: OneFS web administration interface, or the command-line interface of a Windows client.

Error: ENOSPC (error code)
Where it appears: On the cluster in /var/log/messages. This error code will be embedded in another message.

Error: Failed to satisfy layout preference.
Where it appears: On the cluster in /var/log/messages.

Error: Disk Quota Exceeded.
Where it appears: Cluster command-line interface, or an NFS client when you encounter a Snapshot Reserve limitation.


What to do if you have a full or nearly full cluster or pool


Follow the best practices listed below to keep your cluster or pools from getting too full. If your cluster or pool
is full, or nearly full, and you are unable to maintain enough free space on it, contact Isilon Technical Support.

Best practices to maintain enough free space on your cluster or pool
Follow these guidelines to ensure your cluster or pool always has enough free space to keep your workflows
active and protect your data.
For more information about adding additional features or capacity to your system, contact your sales
representative.
For instructions on how to use the features described below, see the OneFS Administration Guide.
Alternatively, consult the OneFS Online Help.

MAINTAIN SUFFICIENT FREE SPACE

For each pool (or cluster, in clusters with only one pool), EMC recommends that you maintain at least 10% free
space, or an amount equal to one node's worth of space, whichever is larger. This ensures that you can smartfail
a node if one fails. For small clusters or pools (containing six nodes or fewer), the recommendation is to
maintain 10% free space.
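As a quick worked example (a minimal sketch; the node count and capacities are illustrative assumptions, not values read from a real cluster), the reserve for a 20-node pool of 36 TB nodes works out as follows:

  # Free-space floor: the larger of 10% of the pool or one node's worth of space
  NODE_RAW_TB=36                              # raw capacity of one node (assumed)
  NODE_COUNT=20
  POOL_RAW_TB=$((NODE_RAW_TB * NODE_COUNT))   # 720 TB total
  TEN_PERCENT_TB=$((POOL_RAW_TB / 10))        # 72 TB
  FLOOR_TB=$(( TEN_PERCENT_TB > NODE_RAW_TB ? TEN_PERCENT_TB : NODE_RAW_TB ))
  echo "Keep at least ${FLOOR_TB} TB free"    # prints 72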

MAINTAIN APPROPRIATE PROTECTION LEVELS

Make sure your cluster and pools are protected at the appropriate level. Every time you add nodes, re-evaluate
protection levels. For more information about protection levels, see the OneFS Administration Guide and How
to determine if an Isilon cluster is in a window of risk for data loss, 16701.
OneFS 7.2 includes a function that calculates a suggested protection level based on cluster configuration, and
alerts you if the cluster falls below the suggested level. For more information about this feature, see the Isilon
OneFS 7.2 Release Notes.
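To spot-check the protection level that is actually applied to a given directory, you can run isi get from the cluster command line (a minimal example; the path shown is an assumption for illustration):

  # Show the requested protection and layout settings for a directory
  isi get -d /ifs/data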

MONITOR CLUSTER CAPACITY

Configure alerts. Set up event notification rules so that you will be notified when the cluster
begins to reach capacity thresholds. Make sure to enter a current email address in order to
receive the notifications.

Pay attention to alerts. The cluster sends notifications when it has reached 95 percent and 99
percent capacity. On some clusters, 5 percent (or even 1 percent) capacity remaining might
mean that a lot of space is still available, so you might be inclined to ignore these
notifications. However, it is best to pay attention to the alerts, closely monitor the cluster,
and have a plan in place to take action when necessary.

Pay attention to ingest rate. It is very important to understand the rate at which data is coming into
the cluster or pool. Some ways to do this are:
- SNMP
- SmartQuotas
- FSAnalyze
A simple sampling approach is also sketched below.
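For example (a minimal sketch, assuming shell access to a node; the log path and sampling interval are illustrative assumptions, not Isilon recommendations), you can record usage of /ifs periodically and difference the samples to estimate how fast data is arriving:

  # Append a timestamped sample of used kilobytes on /ifs; run periodically (e.g., from cron)
  echo "$(date +%s) $(df -k /ifs | tail -1 | awk '{print $3}')" >> /ifs/data/usage_samples.log
  # (change in used KB between samples) / (seconds between samples) approximates the ingest rate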


Use SmartQuotas to monitor and enforce administrator-defined storage limits. SmartQuotas manages
storage use, monitors disk storage, and issues alerts when disk storage limits are exceeded. Although
it does not provide the same level of file-system detail that FSAnalyze does, SmartQuotas maintains a
real-time view of space utilization so that you can quickly obtain the information you need. A
SmartQuotas license is required to use this feature. Consult with your account team and Isilon
Technical Support to determine a strategy for your workflow and operational needs. If SmartQuotas is
not licensed on the cluster, you can use FSAnalyze.
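For example, an advisory quota on a directory lets you track usage against a threshold without blocking writes. The following is a sketch only: the subcommand form differs between OneFS releases (earlier releases use isi quota create and isi quota list), and the path and threshold value are assumptions.

  # OneFS 7.1 and later syntax; verify against your version's CLI reference
  isi quota quotas create /ifs/data/projects directory --advisory-threshold 10T
  isi quota quotas list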
Run FSAnalyze jobs. FSAnalyze is a job-engine job that the system runs to create data for InsightIQ's
file system analytics tools. FSAnalyze provides details about data properties and space usage within
/ifs. Unlike SmartQuotas, FSAnalyze only updates its views when the FSAnalyze job runs. Because
FSAnalyze is a fairly low-priority job, it can sometimes be preempted by higher-priority jobs and
therefore take a long time to gather all of the data. An InsightIQ license is required to run an
FSAnalyze job.
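To start the job manually and watch its progress (a sketch; the subcommand form varies by release, with earlier versions using isi job start FSAnalyze):

  # OneFS 7.1 and later syntax
  isi job jobs start FSAnalyze
  isi job jobs list    # check running jobs and their progress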

PROVISION ADDITIONAL CAPACITY

To ensure your cluster or pools do not run out of space, you can create more clusters, replace smaller-capacity
nodes with larger-capacity nodes, or add new nodes to existing clusters or pools. If you decide to add new
nodes to an existing cluster or pool, contact your sales representative to order the nodes long before the
cluster or pool runs out of space. EMC recommends that you begin the ordering process when the cluster or
pool reaches 80% used capacity. This will allow enough time to receive and install the new equipment and
still maintain enough free space.
Suggested timeline:

Used space   Action
80%          Begin planning to purchase additional capacity, upgrade, or replace old nodes
85%          Receive delivery of new nodes
90%          Install the new nodes
95%          Possibility of data unavailability if capacity is not increased

MANAGE YOUR DATA

- Regularly archive data that is rarely accessed.
- Regularly delete unused data.

MANAGE SNAPSHOTS

Sometimes a cluster has many old snapshots that take up a lot of space. Reasons for this include:

- The cluster is configured to take a lot of snapshots, but is not configured to have the
snapshots expire as quickly as they are created.
- There is a device that is down or smartfailed on the cluster (in other words, the cluster is in a
degraded protection state). You cannot run a SnapshotDelete job (or any job other than
FlexProtect or FlexProtectLin) while the cluster is in this state. This could cause the number of
snapshots to increase without being noticed.
- There is no SnapshotIQ license on the cluster, so you cannot delete old snapshots that still
exist.


Best practices for managing snapshots:

- Configure the cluster to take fewer snapshots, and for the snapshots to expire more quickly,
so that less space is consumed by old snapshots. Take only as many snapshots as you
need, and keep them active for only as long as you need them.

- If you need to delete snapshots and there are down or smartfailed devices on the cluster, or if
the cluster is in an otherwise degraded protection state, contact Isilon Technical Support
for assistance.

- Avoid taking snapshots of /ifs, /ifs/data, and /ifs/home in favor of more specific targets when
possible. In particular, avoid nested, redundant, or overly broad snapshots. For example, if
you schedule snapshots of /ifs/data, /ifs/data/foo, and /ifs/data/foo/bar, consider taking
snapshots of only the most specific paths you need (/ifs/data/foo or /ifs/data/foo/bar).

- If you have a lot of old, unneeded snapshots taking up space, and you do not have a
SnapshotIQ license, contact Isilon Technical Support to discuss your options and for
assistance with deleting the old snapshots.

- Do not delete SyncIQ snapshots (snapshots with names that start with SIQ), unless the only
remaining snapshots on the cluster are SyncIQ snapshots, and the only way to free up space
is to delete those SyncIQ snapshots. Deleting SyncIQ snapshots invalidates the SyncIQ policy
state, which requires resetting the policy and potentially a full sync or initial diff sync. A full
sync or initial diff sync can take many times longer than a regular snapshot-based
incremental sync.

- Delete snapshots in order, beginning with the oldest; do not delete snapshots from the
middle of a time range (see the sketch after this list). Newer snapshots are mostly pointers
to older snapshots, so they look larger than they really are, and deleting them will not free up
much space. Deleting the oldest snapshot ensures you will actually free up the space. You
can determine snapshot order (if not by name or date) by using the isi snapshot list -l
command. The snapshot IDs (first column) are serial values that are never reused, so lower
IDs indicate older snapshots.
  NOTE: If the oldest snapshot is a SyncIQ snapshot and you do not want to delete it,
  you can delete the next-oldest snapshot. Even if the next-oldest snapshot is in the
  same path, you will still free up some amount of data.
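A hedged sketch of this workflow (the snapshot name below is hypothetical, and on OneFS 7.1 and later the equivalent subcommands are isi snapshot snapshots list and isi snapshot snapshots delete, so verify against your version's CLI reference):

  # List snapshots with details; the first column is the serial snapshot ID
  isi snapshot list -l
  # Delete the oldest non-SyncIQ snapshot first (the name shown is hypothetical)
  isi snapshot delete daily_2014-10-01_00-00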

MAKE SURE ALL NODES IN A POOL ARE COMPATIBLE

If the nodes in a pool (or cluster, if the cluster contains only one pool) are of different capacities, each node
will only fill up to the capacity of the smallest-capacity node in the pool. This can result in "no space
available" errors even when a cluster or pool is only partially full.
For example, if a pool contains two 6 TB nodes and one 12 TB node, each node will only be able to take 6 TB
of data. Once each node contains 6 TB of data, the cluster will generate "no space available" errors, even
though one node is half empty.
Also, when the lower-capacity nodes are filled, there are significant performance implications, so mixed-
density pools should be avoided.
Nodes in each pool must be of compatible types, as defined below.
NOTE: Each version of OneFS only supports certain nodes. Refer to the OneFS and node
compatibility section of the Isilon Supportability and Compatibility Guide for a list of which
nodes are compatible with each version of OneFS. Some customers might have upgraded to a
version of OneFS that no longer supports their existing nodes.


- OneFS 7.0 and 7.1: Nodes in each node pool must be of the same model (X, S, or NL) and
contain the same capacity, equivalent SSD capacity, and equivalent RAM, as defined in the
Node provisioning equivalences (OneFS 7.x) section of the Isilon Supportability and
Compatibility Guide.

- OneFS 6.5: Nodes in each disk pool:
  - must be of compatible node types, as defined in the Mixed-node cluster
  compatibility (pre-OneFS 7.0) section of the Isilon Supportability and Compatibility
  Guide, and
  - should be the same capacity. Some nodes of different capacities are listed as being
  compatible in the Isilon Supportability and Compatibility Guide; however, they
  should be in different node pools for optimal performance.

In OneFS 7.0 and 7.1, autoprovisioning automatically divides nodes into node pools according to equivalence
class. In OneFS 6.5, autoprovisioning divides nodes into disk pools according to node model and capacity.
Occasionally, autoprovisioning might put a node into an incorrect pool, or an administrator might manually
put a node into a pool with nodes of a different type or capacity.
The following instructions describe how to verify that all the nodes in each pool are compatible. First, check
which nodes are in which pool by looking at the OneFS web administration interface. Then, run a command
from the command-line interface to see the model number of each node. Once you know the model number,
you can check compatibility by referring to the correct section of the Isilon Supportability and
Compatibility Guide.
OneFS 7.0 and 7.1:
1. On the OneFS web administration interface, go to File System Management > SmartPools >
Summary to see which nodes are in which node pool.
2. Open an SSH connection on any node in the cluster and log on using the root account.
3. Run this command:
isi_for_array -X -s 'isi_hw_status | grep Prod'
4. Use the output to determine whether all of the nodes are compatible. See the Output of
isi_for_array -X -s 'isi_hw_status | grep Prod' section below for more details.
5. Autoprovisioning is supposed to ensure that nodes only go into the correct pools. If you see a
node in an incorrect pool, contact Isilon Technical Support.
OneFS 6.5:
1. On the OneFS web administration interface, go to File System > SmartPools > Disk Pools > Edit
to see which nodes are in which disk pool.
2. Open an SSH connection on any node in the cluster and log on using the root account.
3. Run this command:
isi_for_array -X -s 'isi_hw_status | grep Prod'
4. Use the output to determine whether all of the nodes are compatible. See the Output of
isi_for_array -X -s 'isi_hw_status | grep Prod' section below for more details.
5. If nodes are in the incorrect pool, go back to the web administration interface and manually
move them into the correct pool. If you are unable to correct the pool arrangements, or are
unsure how to proceed, contact Isilon Technical Support.


OUTPUT OF isi_for_array -X -s 'isi_hw_status | grep Prod'

The output of this command shows you the logical node number and model number of each node. With this
information, you can make sure each node is in the correct pool, with other compatible nodes.

- In OneFS 6.5, if nodes are in an incorrect pool, you can manually move the nodes to the
correct pool.

- In OneFS 7.0 and 7.1, autoprovisioning is supposed to ensure that nodes only go into the
correct pools. If you see a node in an incorrect pool, contact Isilon Technical Support.

The model number, or product, displayed in the output looks different depending on the node type.
Older nodes of a given type all have the same capacity. From the model number listed, you can determine the
capacity, as shipped from the factory, in base-10 gigabytes. In the following example output, node 1 is an
IQ 3000i node, and nodes 2 and 4 are IQ 1920i nodes. IQ 3000i nodes have a capacity of 3000 GB, and IQ 1920i
nodes have a capacity of 1920 GB.
Although these two node types are listed as being compatible in the Isilon Supportability and Compatibility
Guide, they are not the same capacity and typically should not be in the same pool.
cluster-1: Product: IQ 3000i
cluster-2: Product: IQ 1920i
cluster-4: Product: IQ 1920i
The model number for newer nodes includes the entire string after Product: in the output. The model
number includes the model type and node configuration details, including the configured RAM, network
interfaces, and hard drive and SSD capacities. The node type is listed first, and the capacity is the largest
space value listed. In the following example output, nodes 1, 2, and 3 are of node type NL400, and their
capacity is 36 TB.
cluster-1: Product: NL400-4U-Dual-12GB-2x1GE-2x10GE SFP+-36TB
cluster-2: Product: NL400-4U-Dual-12GB-2x1GE-2x10GE SFP+-36TB
cluster-3: Product: NL400-4U-Dual-12GB-2x1GE-2x10GE SFP+-36TB

ENABLE VIRTUAL HOT SPARE

The purpose of Virtual Hot Spare (VHS) is to keep space in reserve in case you need to smartfail devices when
the cluster gets close to capacity. Enabling VHS will not give you more free space, but it will help protect your
data in the event that space becomes scarce.
VHS is enabled by default. Isilon strongly recommends that you do not disable VHS unless directed by a
Support engineer. If you disable VHS in order to free some space, the space you just freed will probably fill up
again very quickly with new writes. At that point, if a device were to fail, you might not have enough space to
smartfail the device and re-protect its data, potentially leading to data loss.
When upgrading from OneFS 5.5 or earlier, VHS is not enabled by default. In addition, if VHS was
disabled on any version and you upgrade, VHS remains disabled. If VHS is not enabled on your cluster, first
check to make sure the cluster has enough free space to safely enable VHS, and then enable it. For
instructions, see OneFS: How to enable and configure Virtual Hot Spare (VHS), 88964.
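On OneFS 7.x, the VHS reservation can be inspected and adjusted from the CLI. The following is a sketch only: flag names vary by release and the values shown are illustrative assumptions, so verify against your version's CLI reference and KB article 88964 before making changes.

  # View SmartPools settings, including the current VHS reservation
  isi storagepool settings view
  # Reserve one drive's worth of space and 10% of capacity for VHS (illustrative values)
  isi storagepool settings modify --virtual-hot-spare-limit-drives 1 --virtual-hot-spare-limit-percent 10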

ENABLE SPILLOVER

Spillover allows data that is being sent to a full pool to be diverted to an alternate pool. Spillover is enabled by
default on clusters that have more than one pool. If you have a SmartPools license on the cluster, you have the
ability to disable Spillover. However, it is recommended that you keep Spillover enabled. If a pool is full and
Spillover is disabled, you might get a "no space available" error but still have a large amount of space left on
the cluster.
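A similarly hedged sketch for checking and re-enabling spillover on OneFS 7.x (the flag name is an assumption; confirm it against your version's CLI reference):

  # View SmartPools settings, including whether spillover is enabled
  isi storagepool settings view
  # Re-enable spillover if it has been turned off
  isi storagepool settings modify --spillover-enabled true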

Copyright © 2014 EMC Corporation. All rights reserved. Published in the USA.

EMC believes the information in this publication is accurate as of its
publication date. The information is subject to change without notice.

The information in this publication is provided "as is." EMC Corporation
makes no representations or warranties of any kind with respect to the
information in this publication, and specifically disclaims implied
warranties of merchantability or fitness for a particular purpose. Use,
copying, and distribution of any EMC software described in this
publication requires an applicable software license.

EMC², EMC, and the EMC logo are registered trademarks or trademarks of
EMC Corporation in the United States and other countries. All other
trademarks used herein are the property of their respective owners.

For the most up-to-date regulatory document for your product line, go to
EMC Online Support (https://support.emc.com). For documentation on
EMC Data Domain products, go to the EMC Data Domain Support Portal
(https://my.datadomain.com).

