Case Study
Figure 2 shows a tree whose rightmost leaf node is C and where the
leaves are linked from left to right. If C is updated, the entire tree
needs to be shadowed. Without leaf pointers, only C, B, and A require
shadowing. In such B-trees, with shadowing, each change has to be
propagated up to the root. Hence the major challenge is to achieve the
benefits of the shadowing mechanism while retaining the ubiquitous
B-trees for organization and maintenance of large ordered indexes.

B. Bitrot
Bitrot [9] is the silent corruption of data on disk or tape. One at a
time, year by year, a random bit here or there gets flipped. The worst
thing is that backups won't save the user, since backups are completely
oblivious to bitrot. Conventional RAID doesn't help either. A RAID5
array can rebuild the data from parity, but that only works if the
drive fails completely.
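The limitation of parity described above can be illustrated with a small sketch. It assumes a toy single-parity stripe of three data blocks and one XOR parity block; the block contents and the helper `xor_blocks` are hypothetical and chosen only for illustration, not taken from any RAID implementation.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, as single-parity RAID does."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks plus one parity block (a toy 4-drive stripe).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: drive 1 fails completely, so the array knows which block is
# missing and can rebuild it from the surviving blocks and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])

# Case 2: drive 1 silently returns a block with one flipped bit.
corrupted = list(data)
corrupted[1] = bytes([data[1][0] ^ 0x01]) + data[1][1:]
stripe_is_consistent = xor_blocks(corrupted) == parity

# The mismatch shows that something in the stripe is wrong, but
# "rebuild block i from the others" yields a different candidate for
# every i, and parity alone cannot say which candidate is the right one.
candidates = [
    xor_blocks([b for j, b in enumerate(corrupted) if j != i] + [parity])
    for i in range(len(corrupted))
]
```

In this sketch only `candidates[1]` equals the original block, but nothing in the stripe itself identifies index 1 as the damaged one; a per-block checksum is the extra information that would.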
Fig. 3. Original image
Fig. 4. After flipping one bit
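A single flipped bit of the kind shown in Fig. 4 is what per-block checksums are meant to catch. The following is a minimal sketch, not BtrFS code: it uses `zlib.crc32` as a stand-in checksum (an assumption made for convenience; it is not the CRC-32C variant that Btrfs uses by default), and the helpers `flip_bit` and `read_with_repair` are hypothetical names.

```python
import zlib

BLOCK_SIZE = 4096  # 4KB blocks, as in the text

def checksum(block: bytes) -> int:
    # Stand-in checksum: zlib's CRC-32, not the CRC-32C used by Btrfs.
    return zlib.crc32(block)

def flip_bit(block: bytes, bit_index: int) -> bytes:
    """Return a copy of block with exactly one bit inverted."""
    damaged = bytearray(block)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

def read_with_repair(copies: list, expected: int) -> bytes:
    """Return the first copy whose checksum matches; rewrite bad copies."""
    good = next((c for c in copies if checksum(c) == expected), None)
    if good is None:
        raise IOError("all copies fail checksum verification")
    for i, c in enumerate(copies):
        if checksum(c) != expected:
            copies[i] = good  # hypothetical self-heal step
    return good

original = bytes(range(256)) * (BLOCK_SIZE // 256)
stored_checksum = checksum(original)

# Two mirrored copies of the block; the first suffers one flipped bit.
mirror = [flip_bit(original, 12345), original]
detected = checksum(mirror[0]) != stored_checksum
data = read_with_repair(mirror, stored_checksum)
```

The checksum is stored separately from the block it protects, so the reader can tell which of the two copies is the damaged one and overwrite it with the good copy.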
BTRFS info (device sdb): csum failed ino 257 off 0 csum 2566472073 expected csum 3681334314
BTRFS info (device sdb): csum failed ino 257 off 0 csum 2566472073 expected csum 3681334314
BTRFS: read error corrected: ino 257 off 0 (dev /dev/sdc sector 449512)

C. Multi-device support
The device mapper [11] subsystem in Linux manages storage devices; LVM
and mdadm are examples. These are software modules whose primary
function is to take raw disks, merge them into a virtually contiguous
block address space, and export that abstraction to higher-level kernel
layers. They support mirroring, striping, and RAID5/6. However,
checksums are not supported. This causes a problem for BtrFS, which
maintains a checksum for each block. Consider a case where data is
stored in RAID-1 form on disk, and each 4KB block has an additional
copy. If the file system detects a checksum error on one copy, it needs
to recover from the other copy. Device mappers hide that information
behind the virtual address space abstraction and return one of the
copies.

L1    C11    C21    C31    C41
L2    C12    C22    C32    C42
L3    C13    C23    C33    C43

D. Subvolumes
Subvolumes provide an alternative, restricted view of the file system.
Each subvolume can be treated as its own file system, mounted
separately, and exposed as needed. Exposing only part of a file system
limits the damage that can be done to the file system as a whole. A
subvolume [12] in BtrFS has its own hierarchy and relations with other
subvolumes. A subvolume in BtrFS can be accessed in two ways:
(i) From the parent subvolume. When accessed from the parent subvolume,
the subvolume can be used just like a directory. It can have child
subvolumes and its own files and directories.
(ii) As a separately mounted file system.
From outside, subvolumes look like an ordinary directory structure; one
can copy things into that directory (which thus puts them into that
subvolume), create other directories under that subvolume directory,
and even create other subvolumes under it. In reality, however, they
are not ordinary directories. An attempt to create hardlinks across
subvolumes won't work. Subvolumes are extremely easy to manage when
taking a snapshot. A snapshot of a subvolume can be read-only as well.
Since BtrFS is a copy-on-write file system, the snapshot initially
consumes no additional disk space and will only start to use space if
its files are modified or new files are created.

E. Snapshot
One of the requirements for a mission-critical system is to be able to
recover from failures. Snapshots are one such mechanism. A snapshot is
the state of a system (in this case, data) at a particular point in
time. Using snapshots, one can go back to a particular time in history
and recover data. Snapshots are built into BtrFS and cost little
performance, especially compared to LVM (Logical Volume Manager). In
BtrFS, a snapshot is a cheap atomic copy of a subvolume, stored on the
same file system as the original. A snapshot volume looks similar to a
full backup taken at a particular point. For example, consider a file
of size 10GB; it takes up 10GB of space. At this point (say at time
't') a snapshot is taken; the file and the snapshot between them still
take up 10GB of space. Later, 1GB of the file is modified, and now the
file and the snapshot take up 11GB of space; 9GB is unmodified and is
still shared. Only the remaining 1GB has two different versions. This
approach yields tremendous space savings. Read-only snapshots can be
sent to another file system or machine using send/receive to eliminate
a single point of failure. A snapshot in BtrFS is a special type of
subvolume: one which contains a copy of the current state of some other
subvolume. Snapshots clearly have a useful backup function. If, for
example, one has a Linux system using BtrFS, one can create a snapshot
prior to installing a set of distribution updates. If the updates go
well, the snapshot can simply be deleted. Should the update go badly,
the snapshot can instead be made the default subvolume and, after a
reboot, everything is as it was before.

seen in the field. At the end of the day, what matters to a user is the
robustness and performance for their particular application. A test
with 1000 unexpected power failures [7] showed that Ext4 metadata was
corrupted, while BtrFS worked without any problem. The power failure
test was done on a Freescale TWR-VF65GS10 board with 1GB DDR3 memory
and a 16GB Micro SD card. The Linux kernel version used was 3.15-rc7.
The board was periodically turned on and off while a file-writing
application was continuously creating 4KB files.

TABLE IV. POWER FAILURE TEST RESULTS

         Number of         Results
         Power Failures
BtrFS    1000+             No abnormal situation occurred
Ext4     1000+             Corrupted inodes increased up to 32,000 and
                           the system finally fell into an abnormal
                           disk-full state

Table IV shows the robustness test results. It is perhaps the
copy-on-write feature of BtrFS that allowed it to survive such abrupt
power failures.

Performance test results [6, 7] show that despite supporting new
features such as snapshots, data checksums and multiple-device support,
BtrFS provides reasonable performance under most workloads.
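The space accounting in the 10GB snapshot example above (10GB before and after the snapshot, 11GB after 1GB is modified, 9GB still shared) can be reproduced with a toy model. This is only a sketch under simplifying assumptions: one extent per gigabyte, and a hypothetical `ToyCowStore` class that tracks extent references and bears no relation to the BtrFS on-disk format.

```python
class ToyCowStore:
    """Hypothetical model: files and snapshots are lists of extent ids."""

    def __init__(self):
        self.next_extent = 0
        self.volumes = {}

    def _new_extent(self):
        self.next_extent += 1
        return self.next_extent

    def create(self, name, size_gb):
        # One fresh extent per gigabyte of file data.
        self.volumes[name] = [self._new_extent() for _ in range(size_gb)]

    def snapshot(self, source, name):
        # A snapshot copies only the references, never the data.
        self.volumes[name] = list(self.volumes[source])

    def overwrite(self, name, first_gb, count_gb):
        # Copy-on-write: modified ranges get fresh extents, while the
        # snapshot keeps pointing at the old ones.
        for i in range(first_gb, first_gb + count_gb):
            self.volumes[name][i] = self._new_extent()

    def used_gb(self):
        # Each distinct extent is stored once, however many volumes
        # reference it.
        return len({e for extents in self.volumes.values() for e in extents})

    def shared_gb(self, a, b):
        return len(set(self.volumes[a]) & set(self.volumes[b]))


store = ToyCowStore()
store.create("file", 10)
used_before_snapshot = store.used_gb()
store.snapshot("file", "snap")
used_after_snapshot = store.used_gb()
store.overwrite("file", 0, 1)
used_after_write = store.used_gb()
still_shared = store.shared_gb("file", "snap")
```

Deleting the snapshot in this model would simply drop its reference list, after which the extents that only it referenced would no longer count toward the space used.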