
The IceFileSystem 2.x Disk Layout

By Leif Salomonsson (c) 2011
Last updated: June 2 2011.

Contents:
Introduction.
Conventions.
Meta Headers.
Meta Objects.
Extents.
Admin Space.
Constants.
Functions.

Introduction
IceFileSystem layout is extent based, 64bit, with checksummed meta data and a meta level journal. The main space allocator is based on the TLSF algorithm, adapted for on-disk storage. All metadata except extent headers is located in special extents called pools, which use a local bitmap to keep track of free meta space. Metadata is very compact.

Design goals and priorities
A 64bit disk filesystem supporting a high number of useful features, with a design prioritised as follows:
1 - Reliability
2 - Scalability
3 - Efficiency
4 - Speed

Features
- 64bit file/partition/extent sizes (actually very close to 2^63 bytes).
- No self imposed fragmentation of large files unless there is not enough contiguous space available.
- All metadata on disk is checksummed. This means any errors on disk will be detected a lot quicker.
- Meta level journalling.
- Hardlinks (directory and file), softlinks, file comments.
- Supports blocksizes from 512 bytes to 32 KiB.
- Filesystem does not get slower for larger partitions (scales very well), or when heavily fragmented.
- No limit on the number of files/dirs in a partition or directory.
- Recycle directory.
- Automatically truncated log files and file change log.
- Wastes only 128k (and preallocates another 128k for meta space) for filesystem administration data, regardless of partition size.

Conventions
All meta data stores information in big endian format.
All on-disk pointers express a byte offset relative to the start of the volume.
All fields in all structures described are of the unsigned integer kind, with sizes ranging from 1 to 8 bytes, or arrays of the unsigned integer kind.

Field types:
p64 - 64bit pointer
p32 - 32bit pointer
i64 - 64bit integer
i32 - 32bit integer
i16 - 16bit integer
i8  - 8bit integer
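The conventions above map directly onto Python's `struct` module. The following is a minimal sketch, not part of the original specification: `FIELD_FORMATS` and `unpack_fields` are hypothetical names chosen for illustration, and the sample bytes are made up.

```python
import struct

# Assumed mapping from the field types listed above to struct format codes.
# All fields are unsigned; the ">" prefix selects big-endian with no padding.
FIELD_FORMATS = {"p64": "Q", "p32": "I", "i64": "Q", "i32": "I", "i16": "H", "i8": "B"}

def unpack_fields(types, data, offset=0):
    """Unpack a sequence of field types from big-endian bytes."""
    fmt = ">" + "".join(FIELD_FORMATS[t] for t in types)
    return struct.unpack_from(fmt, data, offset)

# Made-up sample: a p64, an i32, an i16 and an i8 stored back to back.
raw = bytes.fromhex("0000000000020000" "00000200" "0080" "05")
print(unpack_fields(["p64", "i32", "i16", "i8"], raw))
```

Because pointers are plain byte offsets from the start of the volume, a decoded p64 or p32 can be used directly as a seek position into a volume image.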

Meta Headers
All meta objects start with a meta header structure, the "meta" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
---------------
SIZEOF = 16

selfadr: pointer to ourselves.
checksum: total sum of all 32bit words (with the "checksum" field itself cleared) that make up the whole object (size of object in the "metasize" field).
metasize: total size of meta object.
metarsrvd: reserved, zero.
metatag: integer describing the type of meta object.

Additionally, all meta that is stored in meta pools has one extra "bmapadr" field. The "pmeta" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
bmapadr    p64
---------------
SIZEOF = 24

bmapadr: pointer to the "bitmap" object.
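As an illustration of the header layout and checksum rule described above, here is a minimal Python sketch. It assumes the sum wraps at 32 bits and that "metasize" is a multiple of 4; the function names and the sample object are hypothetical and not taken from the filesystem handler.

```python
import struct

META_FORMAT = ">QIHBB"  # selfadr p64, checksum i32, metasize i16, metarsrvd i8, metatag i8
META_SIZE = struct.calcsize(META_FORMAT)  # 16 bytes
CHECKSUM_OFFSET = 8     # checksum follows the 8-byte selfadr

def parse_meta(data, offset=0):
    """Decode the 16-byte meta header into a dict."""
    names = ("selfadr", "checksum", "metasize", "metarsrvd", "metatag")
    return dict(zip(names, struct.unpack_from(META_FORMAT, data, offset)))

def compute_checksum(obj):
    """Sum all big-endian 32-bit words of the object, checksum field cleared."""
    metasize = parse_meta(obj)["metasize"]
    body = bytearray(obj[:metasize])
    body[CHECKSUM_OFFSET:CHECKSUM_OFFSET + 4] = b"\x00\x00\x00\x00"
    words = struct.unpack_from(">%dI" % (metasize // 4), body)
    return sum(words) & 0xFFFFFFFF  # assumption: 32-bit wrap-around

def set_checksum(obj):
    """Return a copy of the object with its checksum field filled in."""
    out = bytearray(obj)
    struct.pack_into(">I", out, CHECKSUM_OFFSET, compute_checksum(obj))
    return bytes(out)

def verify_checksum(obj):
    return parse_meta(obj)["checksum"] == compute_checksum(obj)

# Made-up 128-byte object: selfadr 0x20000, metasize 128, metatag 16, zero payload.
sample = struct.pack(META_FORMAT, 0x20000, 0, 128, 0, 16) + bytes(112)
print(hex(compute_checksum(sample)), verify_checksum(set_checksum(sample)))
```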

Meta Objects
The "filesys" object:
---------------------
selfadr         p64
checksum        i32
metasize        i16
metarsrvd       i8
metatag         i8
creation        i64
totalsize       i64
rootentry       p32
rootnode        p32
roothashtab     p32
recyclednode    p32
logfilenode     p32
fsversion       i32
fsrevision      i32
blocksize       i32
reserved2       [2]i32
hashtype        i32
extinfo         p32
journal         p32
startofextents  p32
endofextents    p64
powertabs       [64]p32
orig_selfadr    p64
reserved        [19]i64
---------------------
metasize: 512
metatag: TAG_FILESYS
creation: time of creation expressed in microseconds since start of time [1].
totalsize: total size of volume expressed in bytes.
rootentry: pointer to root "entry" object.
rootnode: pointer to root "node" object.
roothashtab: pointer to root "hashtab" object.
recyclednode: pointer to recycled "node" object. May be zero for pre v2.3.
logfilenode: pointer to logfile "node" object. May be zero for pre v2.3.
fsversion: version of filesystem handler used to format the volume. Always 2, for icefs 2.x.
fsrevision: revision of filesystem handler used to format the volume.
blocksize: always a power of 2. Minimum 512 bytes, current maximum 32k.
reserved2: reserved, zero filled.
hashtype: HASHTYPE_XXX.
extinfo: pointer to "extinfo" object.
journal: pointer to "journal" object.
startofextents: pointer to the first extent in volume. This extent is always a preallocated meta pool created when formatting the volume.
endofextents: pointer to end of last extent in volume.
powertabs: array of 64 pointers to "power" objects.
orig_selfadr: pointer to ourselves (the "filesys" object). Useful in filesys backup to remember the original address.
reserved: array of reserved and zeroed 64bit words.

[1] start of time is Jan-1-1978 00:00

The "journal" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
nummods    i32
rsrvd      i32
<data>
---------------
metasize: 128..32768 (preferably blocksize aligned)
metatag: TAG_JOURNAL
nummods: number of meta objects in journal.
rsrvd: zero.
<data>: meta objects of various sizes.

The "extinfo" object:
---------------------
selfadr        p64
checksum       i32
metasize       i16
metarsrvd      i8
metatag        i8
reuse_first    p64
reuse_last     p64
recycledfiles  i32
recycledbytes  i64
reserved       [21]i32
---------------------
metasize: 128
metatag: TAG_EXTINFO
reuse_first: first metapool extent in list of non full meta pools.
reuse_last: last metapool extent in list of non full meta pools.
recycledfiles: number of files currently in recycled directory.
recycledbytes: number of bytes currently in recycled directory.
reserved: zero.

The "power" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
freespace  i64
pad12      [12]i64
power      i32
pad        i32
table      [64]p64
---------------
metasize: 640
metatag: TAG_POWER
freespace: total space in free extents linked by this power table.
pad12: zero.
power: power number of this power. First power is 1, last is 64. filesys.powertabs[0] points to power 1. Only power 9..63 are actually used, but all powers should still be initialised.
pad: zero.
table: free extents are chained together in a doubly linked list and the table contains pointers to the first extent in each list, or zero if empty.

The "entry" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
bmapadr    p64
link       p64
parent     p64
nameext    p64
nexthlink  p64
type       i8
flags      i8
namelen    i16
dirnext    p64
dirprev    p64
hashnext   p64
hashprev   p64
name       [36]i8
---------------
metasize: 128
metatag: TAG_ENTRY
link: pointer to either "softname" if type = ETYPE_SOFTENTRY, or "node" if type = ETYPE_HARDENTRY.
parent: pointer to parent entry. Zero if root.
nameext: pointer to "nameext" object if filename exceeds 36 characters, else zero.
nexthlink: next hard link. Pointer used to link hard links for an entry together. Original entry is not part of this linkage.
type: ETYPE_XXX. Either ETYPE_HARDENTRY or ETYPE_SOFTENTRY.
flags: EFLAG_XXX. Only one flag for now: EFLAG_HIDDEN.
namelen: total length of filename in characters.
dirnext: pointer to next entry in directory, or zero if last.
dirprev: pointer to previous entry in directory, or zero if first.
hashnext: pointer to next entry in hash table linkage.
hashprev: pointer to previous entry in hash table linkage.
name: array containing the (first 36 characters of) filename.
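One way to sanity-check the layouts above is to add up their field sizes and compare the total with the stated "metasize". The sketch below does that for the objects described so far; the spec-string notation and the `layout_size` helper are hypothetical conveniences, not part of the on-disk format.

```python
# Byte sizes of the field types used in this document.
SIZES = {"p64": 8, "p32": 4, "i64": 8, "i32": 4, "i16": 2, "i8": 1}

META = "selfadr p64, checksum i32, metasize i16, metarsrvd i8, metatag i8"
PMETA = META + ", bmapadr p64"

# "name type*count" per field; the second tuple item is the documented metasize.
LAYOUTS = {
    "filesys": (META + ", creation i64, totalsize i64, rootentry p32, rootnode p32,"
                " roothashtab p32, recyclednode p32, logfilenode p32, fsversion i32,"
                " fsrevision i32, blocksize i32, reserved2 i32*2, hashtype i32,"
                " extinfo p32, journal p32, startofextents p32, endofextents p64,"
                " powertabs p32*64, orig_selfadr p64, reserved i64*19", 512),
    "extinfo": (META + ", reuse_first p64, reuse_last p64, recycledfiles i32,"
                " recycledbytes i64, reserved i32*21", 128),
    "power": (META + ", freespace i64, pad12 i64*12, power i32, pad i32,"
              " table p64*64", 640),
    "entry": (PMETA + ", link p64, parent p64, nameext p64, nexthlink p64, type i8,"
              " flags i8, namelen i16, dirnext p64, dirprev p64, hashnext p64,"
              " hashprev p64, name i8*36", 128),
}

def layout_size(spec):
    """Total size in bytes of a comma-separated field spec."""
    total = 0
    for field in spec.split(","):
        _name, ftype = field.split()
        ftype, _, count = ftype.partition("*")
        total += SIZES[ftype] * int(count or 1)
    return total

for name, (spec, metasize) in LAYOUTS.items():
    print(name, layout_size(spec), metasize)
```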

The "nameext" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
bmapadr    p64
str        [1000]i8
---------------
metasize: 128..1024 (METABLOCKSIZE aligned)
metatag: TAG_NAMEEXT
str: array containing up to 1000 characters extending filename above the 36 byte limit.

The "softname" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
bmapadr    p64
softtime   p64
str        [992]i8
---------------
metasize: 128..1024 (METABLOCKSIZE aligned)
metatag: TAG_SOFTNAME
softtime: microsecond timestamp describing time of creation.
str: 0-terminated target string for softlink.

The "node" object:
---------------
selfadr     p64
checksum    i32
metasize    i16
metarsrvd   i8
metatag     i8
bmapadr     p64
type        i8
rsrvd       i8
flags       i16
protbits    i32
reserved    i32
ownerinfo   i32
origentry   p64
addentries  p64
modified    i64
data_first  p64
data_last   p64
dir_first   p64   | data_first
dir_last    p64   | data_last
extleft     i64
numentries  i64   | extleft
filesize    i64
hash        p64   | filesize
comment     p64
rsrvd3      [3]i64
---------------
metasize: 128
metatag: TAG_NODE

A field marked "| x" shares its storage with field x (a union). With these four overlaps the fields add up to the 128-byte metasize.

type: either NTYPE_FILE or NTYPE_DIR.
rsrvd: zero.
flags: defined flags are NFLAG_LOGFILE and NFLAG_RECYCLEDIR.
protbits: protection bits. Like Amiga protection bits but _without_ the inversion of the lower nibble.
reserved: zero.
ownerinfo: like Amiga ownerinfo.
origentry: pointer to the "original" entry.
addentries: pointer to the first hard link entry, or zero.
modified: modification timestamp expressed in microseconds since start of time.
data_first: pointer to first filedata extent, or zero.
data_last: pointer to last filedata extent, or zero.
dir_first: pointer to first entry in directory, or zero.
dir_last: pointer to last entry in directory, or zero.
extleft: number of bytes left in last extent.
numentries: number of entries in directory. Not really used by anything, but should be updated anyway.
filesize: the current filesize, if NTYPE_FILE.
hash: pointer to "hashtab" object, if NTYPE_DIR.
comment: pointer to "comment" object, or zero.
rsrvd3: zero.
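Timestamps such as "modified" count microseconds from the document's start of time, Jan-1-1978 00:00. A minimal conversion sketch follows; the function names are hypothetical, and because the document does not state a time zone the values are treated as naive datetimes.

```python
from datetime import datetime, timedelta

# Start of time as given in the document; time zone is not specified there.
ICEFS_EPOCH = datetime(1978, 1, 1)

def icefs_time_to_datetime(microseconds):
    """Convert an on-disk microsecond timestamp to a naive datetime."""
    return ICEFS_EPOCH + timedelta(microseconds=microseconds)

def datetime_to_icefs_time(moment):
    """Convert a naive datetime to an on-disk microsecond timestamp."""
    return (moment - ICEFS_EPOCH) // timedelta(microseconds=1)

one_day = 86400 * 10**6
print(icefs_time_to_datetime(one_day))
```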

The "hashtab" object:
---------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
bmapadr    p64
parent     p64
table      [124]p64
---------------
metasize: 1024
parent: back pointer to the "node" object.
table: array of pointers to "entry" objects.

The "bitmap" object:
----------------
selfadr    p64
checksum   i32
metasize   i16
metarsrvd  i8
metatag    i8
bmapadr    p64
reserved   [12]i64
pooladr    p64
longs      [32]i32
----------------
metasize: 256
metatag: TAG_BITMAP
reserved: zero.
pooladr: always selfadr - SIZEOF extent.
longs: 128 bytes of bitmap. Each bit represents a 128 byte block. A set bit means the block is allocated. The first bit in the bitmap maps to the selfadr of the bitmap itself (that is, directly after the extent header). This means the first two bits are always set, to allocate the bitmap itself. The last bit of the bitmap is always set, as it would otherwise represent free space outside of the metapool.

Because the bitmap begins 64 bytes into the metapool, the metadata starting address in metapools is only 64-byte aligned instead of 128-byte aligned.
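The relationship between bitmap bits and addresses inside a meta pool can be sketched as below. The constants come from the Constants section; the helper names are hypothetical, and the sketch deliberately says nothing about the order of bits inside each 32-bit word, which the document does not specify.

```python
EXTENT_HEADER_SIZE = 64       # SIZEOF extent
METABLOCKSIZE = 128
METAPOOLSIZE = 1024 * 128
BITMAPLONGS = 32
TOTAL_BITS = BITMAPLONGS * 32  # 1024 bits, one per 128-byte block

def block_address(pooladr, bit_index):
    """Address of the 128-byte block that a bitmap bit describes."""
    return pooladr + EXTENT_HEADER_SIZE + bit_index * METABLOCKSIZE

def bit_index_for(pooladr, address):
    """Inverse of block_address for a block-aligned meta address."""
    return (address - pooladr - EXTENT_HEADER_SIZE) // METABLOCKSIZE

def blocks_needed(metasize):
    """Number of 128-byte blocks (bits) an object of this size occupies."""
    return (metasize + METABLOCKSIZE - 1) // METABLOCKSIZE

pool = 0x20000  # made-up pool address
print(hex(block_address(pool, 0)), blocks_needed(256))
# The last bit's block would end 64 bytes past the pool, hence it is always set.
print(block_address(pool, TOTAL_BITS - 1) + METABLOCKSIZE - (pool + METAPOOLSIZE))
```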

Extents
All extents start with the "extent" object header, which in turn starts with the "meta" object header.

The "extent" object:
---------------
selfadr      p64
checksum     i32
metasize     i16
metarsrvd    i8
metatag      i8
extsize      i64
ext_prev     p64
data_next    p64
data_prev    p64
dataofs      i32
data_extnum  i32
data_node    p64
/* unions */
range_next   p64   | data_next
range_prev   p64   | data_prev
reuse_next   p64   | data_next
reuse_prev   p64   | data_prev
bitmap       p64   | dataofs
---------------
metasize: 64
selfadr: always blocksize aligned.
metasize: always 64.
metatag: type of extent (ETAG_XXX). There are three types of extents:
    ETAG_FREE: free space.
    ETAG_METAPOOL: 128KiB meta pool.
    ETAG_FILEDATA: file data extent.
extsize: size of this extent in bytes. Always blocksize aligned.
ext_prev: pointer to previous extent by placement on disk.

#if metatag = ETAG_FILEDATA
data_next: pointer to next filedata extent.
data_prev: pointer to previous filedata extent.
dataofs: start of filedata relative to start of extent, in bytes. Either SIZEOF extent (64) for files that fit in (blocksize - SIZEOF extent), or blocksize for larger files.
data_extnum: filedata extent number, starting at 0.
data_node: pointer back to the "node" object.
#endif

#if metatag = ETAG_FREE
range_next: pointer to next free extent in this range.
range_prev: pointer to previous free extent in this range.
#endif

#if metatag = ETAG_METAPOOL
reuse_next: pointer to next non full metapool extent.
reuse_prev: pointer to previous non full metapool extent.
bitmap: pointer to "bitmap" object. ALWAYS selfadr + 64.
#endif
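The "dataofs" rule can be written out as a small helper. This is a sketch under one stated assumption: "fits in" is read as less than or equal to (blocksize - SIZEOF extent). The function names are hypothetical.

```python
EXTENT_HEADER_SIZE = 64  # SIZEOF extent

def data_offset(filesize, blocksize):
    """Offset of file data from the start of a filedata extent."""
    # Assumption: a file of exactly (blocksize - 64) bytes still counts as fitting.
    if filesize <= blocksize - EXTENT_HEADER_SIZE:
        return EXTENT_HEADER_SIZE  # data shares the first block with the header
    return blocksize               # data starts at the second block

def data_address(selfadr, dataofs):
    """Volume byte offset of the first data byte in an extent."""
    return selfadr + dataofs

print(data_offset(100, 512), data_offset(4096, 512))
```

Placing small files directly behind the header means such a file needs only a single block, while larger files keep their data block aligned.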

Admin Space
The size of adminspace is 128KiB. It is located at the beginning of the partition and starts with the "icestart" structure:
---------------
ice0     i32
fsys     i32
filesys  p32
---------------
ice0: the magic word ID_ICEFS_DISK.
fsys: the magic word "FSYS".
filesys: pointer to the "filesys" object.

Meta objects in admin space:
 1  Filesys
 1  Extinfo
64  Power
 1  Journal
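Reading the start of a volume could look like the sketch below. It assumes the "FSYS" magic word is the four ASCII bytes stored as a big-endian i32, which the document implies but does not spell out; `parse_icestart` and the sample bytes are hypothetical.

```python
import struct

ID_ICEFS_DISK = 0x49434502                    # from the Constants section
FSYS_MAGIC = int.from_bytes(b"FSYS", "big")   # assumption: ASCII bytes, big-endian

def parse_icestart(data):
    """Validate both magic words and return the pointer to the filesys object."""
    ice0, fsys, filesys = struct.unpack_from(">III", data, 0)
    if ice0 != ID_ICEFS_DISK:
        raise ValueError("not an IceFS volume: bad ice0 magic")
    if fsys != FSYS_MAGIC:
        raise ValueError("not an IceFS volume: bad fsys magic")
    return filesys

# Made-up sample with the filesys object placed at byte offset 12.
sample_start = struct.pack(">III", ID_ICEFS_DISK, FSYS_MAGIC, 12)
print(parse_icestart(sample_start))
```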

Backed up objects.
A volume ends with a copy of the root hashtable object. As of revision 3, the filesys object is also backed up, directly in front of the root hashtable backup. Both the hashtable and the filesys backup have their "selfadr" field changed to the backup location. The "filesys" object keeps a backup of the original "selfadr" in its "orig_selfadr" field.

Constants
ID_ICEFS_DISK = 0x49434502

TAG_NONE = 0
ETAG_FREE = 1
ETAG_METAPOOL = 2
ETAG_FILEDATA = 3
ETAG_RESERVED = 4
TAG_FILESYS = 5
TAG_ENTRY = 6
TAG_NODE = 7
TAG_NAMEEXT = 8
TAG_COMMENT = 9
TAG_SOFTNAME = 10
TAG_HASHTAB = 11
TAG_NOTUSED = 12
TAG_POWER = 13
TAG_JOURNAL = 14
TAG_BITMAP = 15
TAG_EXTINFO = 16
TAG_RESERVED = 17

METAPOOLSIZE = 1024 * 128
ADMINSPACE = 1024 * 128
METABLOCKSIZE = 128
METABLOCKMASK = 127
METABLOCKSHIFT = 7

MAXBITMAPINDEX = 31
BITMAPLONGS = 32
BITMAPRESERVED = 12

ETYPE_HARDENTRY = 0
ETYPE_SOFTENTRY = 1
EFLAG_HIDDEN = 1
MAXENTRYNAMEBYTES = 36

NTYPE_FILE = 0
NTYPE_DIR = 1
NFLAG_LOGFILE = 1
NFLAG_RECYCLEDIR = 2

HASHTABENTRIES = 124
MAXNAMEEXTSTRBYTES = 1000
MAXSOFTNAMESTRBYTES = 992
MAXCOMMENTSTRBYTES = 1000

MINPOWER = 6
MAXPOWER = 63
RANGESPERPOWER = 64
RANGESPERPOWERSHIFT = 6

FILESYSRESERVED = 19
HASHTYPE_NEW = 2
HASHTYPE_NEW_CASE = 3
EXTINFORESERVED = 21
MAXJOURNALSIZE = 32768
MAXMETASIZE = 1024

Functions
Functions are written in E language, but should not be too hard to translate into C or something else.

The directory entry name hashing algorithm:

Case sensitive:

PROC hashDiskNameNewCase124(name:PTR TO CHAR)
   DEF i, v=0:LONG, c
   -> get a small seed from number of characters
   WHILE c := name[v]
      EXIT c = "/"
      v++
   ENDWHILE
   -> compute
   WHILE c := name[]++
      EXIT c = "/"
      v := v * 13 + c
   ENDWHILE
   -> and adjust into range
   i := v AND 127
   IF i >= 124 THEN i := i - 124 -> wrap
ENDPROC i

Case insensitive:

PROC hashDiskNameNew124(name:PTR TO CHAR)
   DEF i, v=0:LONG, c
   -> get a small seed from number of characters
   WHILE c := name[v]
      EXIT c = "/"
      v++
   ENDWHILE
   -> compute
   WHILE c := name[]++
      EXIT c = "/"
      IF c >= "a"
         IF c <= "z"
            c -= 32
         ENDIF
      ENDIF
      IF c >= 224
         IF c <= 254
            IF c <> 247
               c -= 32
            ENDIF
         ENDIF
      ENDIF
      v := v * 13 + c
   ENDWHILE
   -> and adjust into range
   i := v AND 127
   IF i >= 124 THEN i := i - 124 -> wrap
ENDPROC i

The TLSF power/index lookup function:

PROC get_extsize_powernum(extsize:WIDE) (LONG, LONG)
   DEF powerval:WIDE, p=64, index
   REPEAT
      p--
      powerval := | 1 SHL p
   UNTIL | powerval AND extsize
   index := | extsize - powerval SHL RANGESPERPOWERSHIFT SHR p
ENDPROC p, index

The meta set/check sum functions:

PROC setMetaSum(meta:PTR TO meta)
   DEF v=NIL:LONG, ptr:PTR TO LONG, len
   ptr := meta
   meta.checksum := NIL
   len := meta.metasize SHR 2 + 1
   WHILE len-- DO v := v + ptr[]++
   meta.checksum := v
ENDPROC

PROC checkMetaSum(meta:PTR TO meta)
   DEF v=NIL:LONG, ptr:PTR TO LONG, len
   ptr := meta
   len := meta.metasize SHR 2 + 1
   WHILE len-- DO v := v + ptr[]++
ENDPROC (v - meta.checksum) = meta.checksum
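For readers without an E compiler, the name hashing and the power/index lookup can be transcribed to Python roughly as follows. This is a sketch, not the handler's code: names are taken as Latin-1 `bytes`, the bit scan of the original is replaced by `int.bit_length()`, and E's left-to-right evaluation is made explicit with parentheses. Function and parameter names are hypothetical.

```python
RANGESPERPOWERSHIFT = 6
HASHTABENTRIES = 124

def hash_disk_name(name, case_sensitive=False):
    """Hash a directory entry name (bytes) into 0..123."""
    # Only the part before a NUL terminator or "/" takes part in the hash.
    for stop in (b"\x00", b"/"):
        cut = name.find(stop)
        if cut >= 0:
            name = name[:cut]
    v = len(name)  # small seed from the number of characters
    for c in name:
        if not case_sensitive:
            if 0x61 <= c <= 0x7A:                    # "a".."z"
                c -= 32
            elif 224 <= c <= 254 and c != 247:       # Latin-1 lower case
                c -= 32
        v = (v * 13 + c) & 0xFFFFFFFF                # keep to 32 bits like a LONG
    i = v & 127
    if i >= HASHTABENTRIES:
        i -= HASHTABENTRIES                          # wrap into table range
    return i

def get_extsize_powernum(extsize):
    """Return (p, index): highest set bit and the range index within that power."""
    if extsize <= 0:
        raise ValueError("extsize must be positive")
    p = extsize.bit_length() - 1
    powerval = 1 << p
    index = ((extsize - powerval) << RANGESPERPOWERSHIFT) >> p
    return p, index

print(hash_disk_name(b"a", case_sensitive=True), hash_disk_name(b"a"))
print(get_extsize_powernum(768))
```

The index splits each power-of-two size class into 64 equal ranges, so an extent of 768 bytes falls halfway through the class that starts at 512.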
