You need:
• An operating system with ZFS support:
○ Solaris 10 6/06 or later
○ OpenSolaris
○ Mac OS X 10.5 Leopard (requires ZFS download)
○ FreeBSD 7 (untested)
○ Linux using FUSE (untested)
• Root privileges (or a role with the appropriate ZFS rights profile)
• Some storage, either:
○ 512 MB of disk space on an existing partition
○ Four spare disks of the same size
Using Files
To use files on an existing filesystem, create four 128 MB files, e.g.:
# mkfile 128m /home/ocean/disk1
# mkfile 128m /home/ocean/disk2
# mkfile 128m /home/ocean/disk3
# mkfile 128m /home/ocean/disk4
# ls -lh /home/ocean
total 1049152
-rw------T 1 root root 128M Mar 7 19:48 disk1
-rw------T 1 root root 128M Mar 7 19:48 disk2
-rw------T 1 root root 128M Mar 7 19:48 disk3
-rw------T 1 root root 128M Mar 7 19:48 disk4
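mkfile is a Solaris command; if your platform doesn't have it, you can create equivalent files
with dd, e.g.:
# dd if=/dev/zero of=/home/ocean/disk1 bs=1048576 count=128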
Using Disks
To use real disks in the tutorial, make a note of their names (e.g. c2t1d0 or c1d0 under Solaris).
You will be destroying all the partition information and data on these disks, so be sure they're not
needed.
In the examples I will be using files named disk1, disk2, disk3, and disk4; substitute your disks
or files for them as appropriate.
ZFS Overview
The architecture of ZFS has three levels. One or more ZFS filesystems exist in a ZFS pool, which
consists of one or more devices* (usually disks). Filesystems within a pool share its resources
and are not restricted to a fixed size. Devices may be added to a pool while it's still running, e.g.
to increase the size of the pool. New filesystems can be created within a pool without taking
other filesystems offline. ZFS supports filesystem snapshots and cloning of existing filesystems.
ZFS manages all aspects of the storage: volume management software (such as SVM or Veritas)
is not needed.
*Technically a virtual device (vdev), see the zpool(1M) man page for more.
ZFS is managed with just two commands:
• zpool - Manages ZFS pools and the devices within them.
• zfs - Manages ZFS filesystems.
If you run either command with no options it gives you a handy options summary.
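For example, running zpool on its own prints a usage summary along these lines (abridged; the
exact text varies between releases):
# zpool
missing command
usage: zpool command args ...
where 'command' is one of the following:
... (subcommand list follows) ...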
Pools
All ZFS filesystems live in a pool, so the first step is to create a pool. ZFS pools are administered
using the zpool command.
Before creating new pools you should check for existing pools to avoid confusing them with
your tutorial pools. You can check what pools exist with zpool list:
# zpool list
no pools available
NB. OpenSolaris now uses ZFS, so you will likely have an existing ZFS pool called syspool on
this OS.
Single Disk Pool
The simplest pool consists of a single device. Pools are created using zpool create. We can create
a single disk pool as follows (you must use the absolute path to the disk file):
# zpool create herring /home/ocean/disk1
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
herring 123M 51.5K 123M 0% ONLINE -
No volume management, configuration, newfs, or mounting is required. You now have a working
pool complete with a mounted ZFS filesystem under /herring (/Volumes/herring on Mac OS X;
you can also see it mounted on your Mac desktop). We will learn about adjusting mount points in
part 2 of the tutorial.
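You can confirm the new filesystem is mounted with df (the exact figures will depend on your
setup):
# df -h /herring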
Create a file in the new filesystem:
# mkfile 32m /herring/foo
# ls -lh /herring/foo
-rw------T 1 root root 32M Mar 7 19:56 /herring/foo
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
herring 123M 32.1M 90.9M 26% ONLINE -
The new file is using about a quarter of the pool capacity (indicated by the CAP value). NB. If
you run the list command before ZFS has finished writing to the disk you will see lower USED
and CAP values than shown above; wait a few moments and try again.
Now destroy your pool with zpool destroy:
# zpool destroy herring
# zpool list
no pools available
You will only receive a warning about destroying your pool if it's in use.
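If a pool is in use and you really do want rid of it, the destroy can be forced; use this with care:
# zpool destroy -f herring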
Mirrored Pool
A pool composed of a single disk doesn't offer any redundancy. One method of providing
redundancy is to use a mirrored pair of disks as a pool:
# zpool create trout mirror /home/ocean/disk1 /home/ocean/disk2
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 123M 51.5K 123M 0% ONLINE -
To see more detail about the pool use zpool status:
# zpool status trout
pool: trout
state: ONLINE
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
trout ONLINE 0 0 0
mirror ONLINE 0 0 0
/home/ocean/disk1 ONLINE 0 0 0
/home/ocean/disk2 ONLINE 0 0 0
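As with herring, write a 32 MB file into the new pool (the file name is arbitrary):
# mkfile 32m /trout/foo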
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 123M 32.1M 90.9M 26% ONLINE -
As before, about a quarter of the pool has been used, but the data is now stored redundantly over
two disks. Let's test it by overwriting the first disk's label with random data (if you are using real
disks you could physically disable or remove a disk instead):
# dd if=/dev/random of=/home/ocean/disk1 bs=512 count=1
ZFS automatically checks for errors when it reads or writes files, but we can force a check with
the zpool scrub command:
# zpool scrub trout
# zpool status
pool: trout
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-4J
scrub: scrub completed with 0 errors on Wed Mar 7 20:42:07 2007
config:
NAME STATE READ WRITE CKSUM
trout DEGRADED 0 0 0
mirror DEGRADED 0 0 0
/home/ocean/disk1 UNAVAIL 0 0 0 corrupted data
/home/ocean/disk2 ONLINE 0 0 0
The basic steps for replacing a physical disk are:
• Offline the disk, if necessary, with the zpool offline command.
• Remove the disk to be replaced.
• Insert the replacement disk.
• Run the zpool replace command (see the example after this list).
• Put the disk back online with the zpool online command.
(If you are replacing a disk in the ZFS root pool, see How to Replace a Disk in the ZFS Root
Pool.)
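For a physical disk the replace step might look like the following (the pool and device names
here are only illustrative):
# zpool replace tank c1t3d0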
In our case the failed "disk" is just a file, so we simulate fitting a replacement by recreating the
disk file:
# rm /home/ocean/disk1
# mkfile 128m /home/ocean/disk1
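As the corrupted device is still listed in the pool, it can first be removed from the mirror with
zpool detach (an assumed intermediate step; skip it if your pool no longer lists the old device):
# zpool detach trout /home/ocean/disk1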
To attach the replacement we use zpool attach, specifying an existing device in the mirror to
attach it to:
# zpool attach trout /home/ocean/disk2 /home/ocean/disk1
If you're quick enough, after you attach the new disk you will see a resilver (remirroring) in
progress with zpool status.
# zpool status trout
pool: trout
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress, 69.10% done, 0h0m to go
config:
NAME STATE READ WRITE CKSUM
trout ONLINE 0 0 0
mirror ONLINE 0 0 0
/home/ocean/disk2 ONLINE 0 0 0
/home/ocean/disk1 ONLINE 0 0 0
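While the pool is online we can also grow it by adding a second mirrored pair with zpool add:
# zpool add trout mirror /home/ocean/disk3 /home/ocean/disk4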
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
trout 246M 64.5M 181M 26% ONLINE -
Adding the new devices happens almost instantly, and the filesystem within the pool remains
available throughout. Looking at the status now shows the pool consists of two mirrors:
# zpool status trout
pool: trout
state: ONLINE
scrub: resilver completed with 0 errors on Wed Mar 7 20:58:17 2007
config:
NAME STATE READ WRITE CKSUM
trout ONLINE 0 0 0
mirror ONLINE 0 0 0
/home/ocean/disk2 ONLINE 0 0 0
/home/ocean/disk1 ONLINE 0 0 0
mirror ONLINE 0 0 0
/home/ocean/disk3 ONLINE 0 0 0
/home/ocean/disk4 ONLINE 0 0 0
We can see where the data is currently written in our pool using zpool iostat -v:
# zpool iostat -v trout
capacity operations bandwidth
pool used avail read write read write
---------------------------- ----- ----- ----- ----- ----- -----
trout 64.5M 181M 0 0 13.7K 278
mirror 64.5M 58.5M 0 0 19.4K 394
/home/ocean/disk2 - - 0 0 20.6K 15.4K
/home/ocean/disk1 - - 0 0 0 20.4K
mirror 0 123M 0 0 0 0
/home/ocean/disk3 - - 0 0 0 768
/home/ocean/disk4 - - 0 0 0 768
---------------------------- ----- ----- ----- ----- ----- -----
All the data is currently written on the first mirror pair, and none on the second. This makes
sense as the second pair of disks was added after the data was written. If we write some new data
to the pool the new mirror will be used:
# mkfile 64m /trout/quuxx
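Run the iostat command again and the second mirror should now show some used capacity (the
exact figures will vary):
# zpool iostat -v trout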
ZFS Filesystems
ZFS filesystems within a pool are managed with the zfs command. Before you can manipulate
filesystems you need to create a pool (you can learn about ZFS pools in part 1). When you create
a pool, a ZFS filesystem is created and mounted for you.
ZFS Filesystem Basics
Create a simple mirrored pool and list filesystem information with zfs list:
# zpool create salmon mirror c3t2d0 c3t3d0
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
salmon 136G 84.5K 136G 0% ONLINE -
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
salmon 75.5K 134G 24.5K /salmon
We can see our filesystem is mounted on /salmon and has 134 GB available.
We can create an arbitrary number (2^64) of new filesystems within our pool. Let's add
filesystems for three users with zfs create:
# zfs create salmon/kent
# zfs create salmon/dennisr
# zfs create salmon/billj
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
salmon 168K 134G 28.5K /salmon
salmon/billj 24.5K 134G 24.5K /salmon/billj
salmon/dennisr 24.5K 134G 24.5K /salmon/dennisr
salmon/kent 24.5K 134G 24.5K /salmon/kent
Note how all four filesystems share the same pool space and all report 134 GB available. We'll
see how to set quotas and reserve space for filesystems later in this tutorial.
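As a quick preview, both are set with zfs set; the values here are purely illustrative:
# zfs set quota=10G salmon/kent
# zfs set reservation=1G salmon/dennisr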
We can create arbitrary levels of filesystems, so you could create a whole tree of filesystems
inside /salmon/kent.
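For example (hypothetical filesystem names):
# zfs create salmon/kent/docs
# zfs create salmon/kent/docs/zfs-tutorial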
We can also see our filesystems using df (output trimmed for brevity):
# df -h
Filesystem size used avail capacity Mounted on
salmon 134G 28K 134G 1% /salmon
salmon/kent 134G 24K 134G 1% /salmon/kent
salmon/dennisr 134G 24K 134G 1% /salmon/dennisr
salmon/billj 134G 24K 134G 1% /salmon/billj
You can remove filesystems with zfs destroy. User billj has stopped working on salmon, so let's
remove him:
# zfs destroy salmon/billj
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
salmon 138K 134G 28.5K /salmon
salmon/dennisr 24.5K 134G 24.5K /salmon/dennisr
salmon/kent 24.5K 134G 24.5K /salmon/kent
Mount Points
It's useful that ZFS automatically mounts your filesystem under the pool name, but this is often
not what you want. Thankfully it's very easy to change the properties of a ZFS filesystem, even
when it's mounted.
You can set the mount point of a ZFS filesystem using zfs set mountpoint. For example, if we
want to move salmon under the /projects directory:
# zfs set mountpoint=/projects/salmon salmon
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
salmon 142K 134G 27.5K /projects/salmon
salmon/dennisr 24.5K 134G 24.5K /projects/salmon/dennisr
salmon/kent 24.5K 134G 24.5K /projects/salmon/kent
On Mac OS X you need to force an unmount of the filesystem (using umount -f
/Volumes/salmon) before changing the mount point, as it will be in use by fseventsd. To mount it
again after setting a new mount point, use 'zfs mount salmon'.
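On Mac OS X the whole sequence would therefore look something like this (a sketch using the
pool from above):
# umount -f /Volumes/salmon
# zfs set mountpoint=/projects/salmon salmon
# zfs mount salmon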
Mount points of filesystems are not restricted to sit beneath the pool's own mount point, for example:
# zfs set mountpoint=/fishing salmon/kent
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
salmon 148K 134G 27.5K /projects/salmon
salmon/dennisr 24.5K 134G 24.5K /projects/salmon/dennisr
salmon/kent 24.5K 134G 24.5K /fishing
To mount and unmount ZFS filesystems you use zfs mount and zfs unmount*. ZFS filesystems
are entirely managed by ZFS by default, and don't appear in /etc/vfstab. In a future tutorial we
will look at using 'legacy' mount points to manage filesystems the traditional way.
*Old school Unix users will be pleased to know 'zfs umount' also works.
For example (mount output trimmed for brevity):
# zfs unmount salmon/kent