ZFS Internals
Yupu Zhang
yupu@cs.wisc.edu
4/30/2014

Outline
ZFS On-disk Structure
  Storage Pool
  Physical Layout
  On-disk Walk
ZFS Architecture
Summary

ZFS Storage Pool
Manages physical devices like virtual memory
  Provides a flat space
  Shared by all file system instances
Consists of a tree of virtual devices (vdevs)
  Physical virtual device (leaf vdev)
    Writable media block device, e.g., a disk
  Logical virtual device (interior vdev)
    Conceptual grouping of physical vdevs, e.g., a mirror
A simple configuration
  root (mirror A/B): logical vdev
    A (disk): physical vdev
    B (disk): physical vdev
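
The vdev tree described above can be sketched as a small recursive data structure. This is only an illustration of the simple mirror configuration; the `Vdev` class and its field names are hypothetical and do not correspond to the ZFS source.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vdev:
    """Toy vdev: a leaf is a physical device, an interior node is a logical grouping."""
    name: str
    kind: str  # e.g. "disk" for a leaf vdev, "mirror" for an interior vdev
    children: List["Vdev"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # Physical vdevs have no children.
        return not self.children

    def leaves(self) -> List["Vdev"]:
        # Collect every physical vdev below this node.
        if self.is_leaf():
            return [self]
        found: List["Vdev"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


# The simple configuration: a root mirror over disks A and B.
root = Vdev("root", "mirror", [Vdev("A", "disk"), Vdev("B", "disk")])
print([v.name for v in root.leaves()])
```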

Vdev Label
A 256 KB structure contained in a physical vdev
  Name/value pairs
    Store information about the vdevs
    e.g., vdev id, amount of space
  Array of uberblocks
    An uberblock is like a superblock in ext2/3/4
    Provide access to a pool's contents
    Contain information to verify a pool's integrity
Redundancy
  Four copies on each physical vdev
  Two at the beginning, and two at the end
    Prevent accidental overwrites occurring in contiguous chunks
[Figure: device layout: Label 0, Label 1, storage space for data, Label 2, Label 3]
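
As a rough illustration of the label placement, the sketch below computes where four 256 KB labels would sit on a device, two at the front and two at the back. It assumes the device size is already a multiple of the label size and ignores any alignment or reserved areas a real implementation applies; `label_offsets` is a hypothetical helper name.

```python
LABEL_SIZE = 256 * 1024  # each vdev label is 256 KB


def label_offsets(device_size: int) -> list:
    """Return byte offsets of labels 0..3 for a device of the given size.

    Assumption: device_size is a multiple of LABEL_SIZE; real on-disk
    alignment rules are not modeled here.
    """
    if device_size < 4 * LABEL_SIZE:
        raise ValueError("device too small to hold four labels")
    return [
        0,                             # Label 0
        LABEL_SIZE,                    # Label 1
        device_size - 2 * LABEL_SIZE,  # Label 2
        device_size - LABEL_SIZE,      # Label 3
    ]


one_gib = 1 << 30
print(label_offsets(one_gib))
```

Because the two pairs sit at opposite ends of the device, a single contiguous overwrite is unlikely to destroy all four copies.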

Block Addressing
Physical block
  Contiguous sectors on disk
  512 bytes to 128 KB
  Data Virtual Address (DVA)
    vdev id + offset (in the vdev)
Logical block
  e.g., a data block, a metadata block
  Block Pointer (blkptr)
    Up to three DVAs for replication
    A single checksum for integrity
[Figure: a blkptr holding DVA 1, DVA 2, DVA 3, and a checksum; each DVA points to one copy of the block]
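
The relationship between a block pointer, its DVAs, and its checksum can be sketched as follows. This is a toy model, not the on-disk format: the "disk" is a dictionary keyed by (vdev id, offset), SHA-256 stands in for whichever checksum the pool uses, and `make_blkptr` and `read_block` are hypothetical helpers.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DVA:
    vdev_id: int  # which vdev holds this copy
    offset: int   # offset within that vdev


@dataclass
class BlockPointer:
    dvas: List[DVA]   # up to three replicas of the same logical block
    checksum: bytes   # one checksum covering the block contents


Disk = Dict[Tuple[int, int], bytes]  # toy storage: (vdev id, offset) -> bytes


def make_blkptr(disk: Disk, data: bytes, dvas: List[DVA]) -> BlockPointer:
    if not 1 <= len(dvas) <= 3:
        raise ValueError("a block pointer holds one to three DVAs")
    for dva in dvas:
        disk[(dva.vdev_id, dva.offset)] = data  # write every replica
    return BlockPointer(list(dvas), hashlib.sha256(data).digest())


def read_block(disk: Disk, bp: BlockPointer) -> bytes:
    # Try each replica in turn; accept the first whose checksum matches.
    for dva in bp.dvas:
        data = disk.get((dva.vdev_id, dva.offset))
        if data is not None and hashlib.sha256(data).digest() == bp.checksum:
            return data
    raise IOError("no replica passed checksum verification")


disk: Disk = {}
bp = make_blkptr(disk, b"payload", [DVA(0, 4096), DVA(1, 8192)])
disk[(0, 4096)] = b"garbage"  # simulate silent corruption of the first copy
print(read_block(disk, bp))
```

Keeping the checksum in the pointer rather than next to the data means a corrupted block cannot vouch for itself.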

Object
Object
  A group of blocks organized by a dnode
    A block tree connected by blkptrs
  Everything in ZFS is an object
    e.g., a file, a dir, a file system
Dnode Structure
  Common fields
    Up to 3 blkptrs
    Block size, # of levels, ...
  Bonus buffer
    Object-specific info
[Figure: a dnode with its bonus buffer and a block tree leading to data blocks]

Examples of Object
File object
  dnode bonus buffer: znode
    znode_phys_t: attributes of the file
  Block tree: data blocks
Directory object
  dnode bonus buffer: znode
    znode_phys_t: attributes of the dir
  Block tree: ZAP blocks (ZFS Attribute Processor)
    name-value pairs
    dir contents: file name -> object id
Dataset object
  dnode bonus buffer: dsl_dataset_phys_t
    Records info about snapshots and clones
    Points to the object set block
  Block tree: None

Object Set
Object Set (Objset)
  A collection of related objects
    A group of dnode blocks managed by the metadnode
  Four types
    File system, snapshot, clone, volume
Objset Structure
  A special dnode, called metadnode
  ZIL (ZFS Intent Log) header
    Points to a chain of log blocks
[Figure: an object set block holding the ZIL header and the metadnode, which leads to the dnodes of the set]

Dataset
Dataset (it's an object!)
  Encapsulates an object set (i.e., FS)
  Tracks its snapshots and clones
[Figure: a dataset dnode in the zpool layer whose bonus buffer (dsl_dataset_phys_t) points to an object set block (ZIL header, metadnode) in the zfs layer]
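
To make the object model concrete, here is a minimal sketch of a dnode with a bonus buffer and a flattened stand-in for its block tree, plus a directory lookup over ZAP-style name-to-object-id entries. The `Dnode` class, the `dir_lookup` helper, the object ids 7 and 8, and the file name are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Dnode:
    obj_type: str                                            # e.g. "file" or "directory"
    bonus: Dict[str, object] = field(default_factory=dict)   # object-specific info
    blocks: List[object] = field(default_factory=list)       # flattened block tree


def dir_lookup(objset: Dict[int, Dnode], dir_id: int, name: str) -> int:
    """Resolve a file name to an object id through a directory's ZAP blocks."""
    dnode = objset[dir_id]
    if dnode.obj_type != "directory":
        raise NotADirectoryError(str(dir_id))
    for zap_block in dnode.blocks:  # each ZAP block is modeled as a dict
        if name in zap_block:
            return zap_block[name]
    raise FileNotFoundError(name)


# A toy object set: object id -> dnode.
objset = {
    7: Dnode("directory", {"mode": 0o755}, [{"a.txt": 8}]),
    8: Dnode("file", {"size": 5}, [b"hello"]),
}

file_id = dir_lookup(objset, 7, "a.txt")
print(file_id, objset[file_id].blocks[0])
```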

Physical Layout
[Figure: vdev label -> uberblock -> Meta Object Set (zpool layer) -> file system dataset -> object set (zfs layer) -> file object. Block types shown: object set block, dnode block, indirect block, data block.]

On-Disk Walkthrough (/tank/z.txt)
[Figure: the walk starts at the vdev label and uberblock.
  Meta Object Set (zpool layer): metadnode, Object Directory (root = 2), root Dataset Directory, Child map (tank = 27), root Dataset, tank Dataset Directory, tank Dataset.
  tank Object Set (zfs layer): metadnode, Master Node (root = 3), root Directory (z.txt = 4), z.txt File, data.
  Legend: vdev label, uberblock, object set block, dnode block, indirect block, data/ZAP block, data block; arrows are block pointers or object references.]

Read a Block
[Figure: dnodes in the zpool and zfs layers]
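
The walk for /tank/z.txt can be traced in a toy model built from nested dictionaries. The object ids 2, 27, 3, and 4 come from the walkthrough figure; everything else here is an assumption made for illustration, including the ids of the Object Directory, Master Node, and tank Dataset, the dictionary layout, and the file contents.

```python
OBJECT_DIRECTORY = 1  # hypothetical id of the Object Directory in the Meta Object Set
MASTER_NODE = 1       # hypothetical id of the Master Node in a file system object set

# Toy Meta Object Set (zpool layer): object id -> object.
MOS = {
    OBJECT_DIRECTORY: {"zap": {"root": 2}},  # root = 2
    2: {"child_map": {"tank": 27}},          # root Dataset Directory; tank = 27
    27: {"dataset": 30},                     # tank Dataset Directory (30 is hypothetical)
    30: {"objset": "tank"},                  # tank Dataset points at its object set
}

# Toy file system object sets (zfs layer).
OBJSETS = {
    "tank": {
        MASTER_NODE: {"zap": {"root": 3}},    # root = 3
        3: {"zap": {"z.txt": 4}},             # root Directory; z.txt = 4
        4: {"data": b"contents of z.txt"},    # z.txt File (contents are made up)
    }
}


def read_file(path: str) -> bytes:
    fs_name, *components = path.strip("/").split("/")
    # zpool layer: Object Directory -> root Dataset Directory -> child map.
    root_dataset_dir = MOS[MOS[OBJECT_DIRECTORY]["zap"]["root"]]
    fs_dataset_dir = MOS[root_dataset_dir["child_map"][fs_name]]
    dataset = MOS[fs_dataset_dir["dataset"]]
    # zfs layer: object set -> Master Node -> root Directory -> file.
    objset = OBJSETS[dataset["objset"]]
    obj = objset[objset[MASTER_NODE]["zap"]["root"]]
    for name in components:
        obj = objset[obj["zap"][name]]
    return obj["data"]


print(read_file("/tank/z.txt"))
```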

Write a Block
Never overwrite
For every dirty block
  New block is allocated
  Checksum is generated
  Block pointer must be updated
    Its parent block is thus dirtied
Updates to low-level blocks are propagated up to the uberblock
[Figure: dnodes in the zpool and zfs layers]
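
The copy-on-write propagation can be sketched with an immutable tree: writing a leaf allocates a new leaf, which forces a new parent, and so on up to the root, while untouched siblings are shared. This is a simplified model under stated assumptions; the `Block` class and `cow_write` function are hypothetical, and SHA-256 stands in for the pool's checksum.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Block:
    data: bytes = b""
    children: Tuple["Block", ...] = ()

    def checksum(self) -> str:
        # A parent's checksum covers its children's checksums, so any
        # change below is visible at the root.
        h = hashlib.sha256(self.data)
        for child in self.children:
            h.update(child.checksum().encode())
        return h.hexdigest()


def cow_write(node: Block, path: Tuple[int, ...], data: bytes) -> Block:
    """Return a new tree with the block at `path` replaced; never mutate `node`."""
    if not path:
        return Block(data, node.children)  # newly allocated block
    i = path[0]
    new_child = cow_write(node.children[i], path[1:], data)
    # The pointer to the child changed, so this parent is dirtied and rewritten too.
    children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return Block(node.data, children)


old_root = Block(b"uber", (Block(b"indirect", (Block(b"a"), Block(b"b"))),))
new_root = cow_write(old_root, (0, 1), b"B")

print(old_root.children[0].children[1].data)  # old version is still intact
print(new_root.children[0].children[1].data)
```

Because the old root still describes a complete, consistent tree, switching to the new root is the single step that makes the update visible.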

Overview
[Figure: the write path through the ZFS layers, top to bottom]
write(file, offset, length)
VFS (Virtual File System)
ZPL (ZFS POSIX Layer)
  write to blk Z of obj X in dataset Y
  TX start, dmu_write, TX end
ZIL (ZFS Intent Log)
DMU (Data Management Unit)
  disk write to blk N
ZIO (ZFS I/O Pipeline)

ZIL (ZFS Intent Log)
Why does ZFS need a log?
  NOT for consistency
    COW transaction model guarantees consistency
  For performance of synchronous writes
    Waiting seconds for TXG commit is not acceptable
    Just flush changes to the log and return
    Replay the log upon a crash or power failure
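
The role of the intent log can be illustrated with a toy pool that keeps committed state, in-memory pending changes, and a log of synchronous writes. This is a sketch of the idea only; the `ToyPool` class and its methods are hypothetical and do not mirror the real ZIL record format or TXG machinery.

```python
class ToyPool:
    def __init__(self):
        self.committed = {}   # state as of the last transaction group (TXG) commit
        self.pending = {}     # in-memory changes belonging to the open TXG
        self.intent_log = []  # records already flushed to stable storage

    def write(self, key, value, sync=False):
        self.pending[key] = value
        if sync:
            # A synchronous write only waits for the log record, not the TXG.
            self.intent_log.append((key, value))

    def txg_commit(self):
        # The periodic commit makes pending changes durable.
        self.committed.update(self.pending)
        self.pending.clear()
        self.intent_log.clear()  # logged records are now redundant

    def crash_and_recover(self):
        self.pending.clear()                 # in-memory state is lost
        for key, value in self.intent_log:   # replay what was promised durable
            self.pending[key] = value
        self.txg_commit()


pool = ToyPool()
pool.write("a", 1, sync=True)  # acknowledged after the log flush
pool.write("b", 2)             # asynchronous, only in memory
pool.crash_and_recover()
print(pool.committed)
```

The committed state is consistent both before and after the replay; the log only recovers synchronous writes that had been acknowledged but not yet committed.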

Summary
ZFS is more than a file system
  Storage manageability: zpool
  Data integrity: checksum, replication
  Data consistency: COW, transactional model
More about ZFS
  Wiki: http://en.wikipedia.org/wiki/ZFS
  ZFS on Linux: http://zfsonlinux.org
  ZFS on FreeBSD: https://wiki.freebsd.org/ZFS