Vous êtes sur la page 1sur 27

Types of files

Stable registry hives consist of primary files, transaction log files, and backup copies of primary
files. Primary files and their backup copies share the same format to hold actual data making up a
Windows registry, transaction log files are used to perform fault-tolerant writes to primary files.
Before writing modified (dirty) data to a primary file, a hive writer will store this data in a
transaction log file. If an error (like a system crash) occurs when writing to a transaction log file,
a primary file will remain consistent; if an error occurs when writing to a primary file, a
transaction log file will contain enough data to recover a primary file and bring it back to the
consistent state.

Examples of hives and supporting files

Windows 8.1: System hive

 Primary file:
o \Windows\System32\config\SYSTEM
 Transaction log files:
o \Windows\System32\config\SYSTEM.LOG
o \Windows\System32\config\SYSTEM.LOG1
o \Windows\System32\config\SYSTEM.LOG2
 Backup copy of a primary file:
o \Windows\System32\config\RegBack\SYSTEM

Windows 8.1: BCD hive

 Primary file:
o \Boot\BCD
 Transaction log files:
o \Boot\BCD.LOG
o \Boot\BCD.LOG1
o \Boot\BCD.LOG2

Windows Vista: System hive

 Primary file:
o \Windows\System32\config\SYSTEM
 Transaction log files:
o \Windows\System32\config\SYSTEM.LOG
o \Windows\System32\config\SYSTEM.LOG1
o \Windows\System32\config\SYSTEM.LOG2
 Backup copies of a primary file:
o \Windows\System32\config\SYSTEM.SAV
o \Windows\System32\config\RegBack\SYSTEM
o \Windows\System32\config\RegBack\SYSTEM.OLD
Windows Vista: SAM hive

 Primary file:
o \Windows\System32\config\SAM
 Transaction log files:
o \Windows\System32\config\SAM.LOG
o \Windows\System32\config\SAM.LOG1
o \Windows\System32\config\SAM.LOG2
 Backup copies of a primary file:
o \Windows\System32\config\RegBack\SAM
o \Windows\System32\config\RegBack\SAM.OLD

Windows XP: System hive

 Primary file:
o \WINDOWS\system32\config\system
 Transaction log file:
o \WINDOWS\system32\config\system.LOG
 Backup copy of a primary file:
o \WINDOWS\system32\config\system.sav

Transactional registry (TxR)


Transactional registry, introduced in Windows Vista, is a feature that allows a program to
accumulate multiple modifications of a registry hive within a transaction, which can be
committed or rolled back as a single unit (like a database transaction). Such a transaction is
written to storage files of the Common Log File System (CLFS).

Transactional registry and storage files of the CLFS are out of the scope of this document.

Format of primary files


A primary file consists of a base block, also known as a file header, and hive bins data. Hive bins
data consists of hive bins. Each hive bin includes a header and cells, cells are the actual storage
of high-level registry entities (like keys, values, etc.). Primary files may contain a padding or
remnant data of an arbitrary size after the end of the last hive bin.
Base block

The base block is 4096 bytes in length, it contains the following structure in Windows XP
(hereinafter, all numbers and bit masks are in the little-endian form, all data units are bytes,
unless otherwise mentioned):

Offset Length Field Value(s) Description


0 4 Signature regf ASCII string
Primary
This number is incremented by 1 in the beginning of a
4 4 sequence
write operation on the primary file
number
This number is incremented by 1 at the end of a write
Secondary
operation on the primary file, a primary sequence
8 4 sequence
number and a secondary sequence number should be
number
equal after a successful write operation
Last written
12 8 FILETIME (UTC)
timestamp
20 4 Major version 1 Major version of a hive writer
3, 4, 5,
24 4 Minor version Minor version of a hive writer
or 6
28 4 File type 0 0 means primary file
32 4 File format 1 1 means direct memory load
Root cell Offset of a root cell in bytes, relative from the start of
36 4
offset the hive bins data
Offset Length Field Value(s) Description
Hive bins data
40 4 Size of the hive bins data in bytes
size
Clustering Logical sector size of the underlying disk in bytes
44 4
factor divided by 512
UTF-16LE string (contains a partial file path to the
48 64 File name primary file, or a file name of the primary file), used
for debugging purposes
112 396 Reserved
508 4 Checksum XOR-32 checksum of the previous 508 bytes
512 3576 Reserved
4088 4 Boot type This field has no meaning on a disk
4092 4 Boot recover This field has no meaning on a disk

In Windows 10, the following fields are found to be allocated in the previously reserved areas
(also, see notes below):

Offset Length Field Value Description


112 16 RmId GUID, see below
128 16 LogId GUID, see below
144 4 Flags Bit mask, see below
148 16 TmId GUID, see below
164 4 GUID signature rmtm ASCII string
168 8 Last reorganized timestamp FILETIME (UTC), see below
... ...
4040 16 ThawTmId GUID, this field has no meaning on a disk
4056 16 ThawRmId GUID, this field has no meaning on a disk
4072 16 ThawLogId GUID, this field has no meaning on a disk

The RmId field contains a GUID of the Resource Manager (RM) and the TmId field contains a
GUID used to generate a file name of a log file stream for the Transaction Manager (TM), see
the ZwCreateResourceManager() and ZwCreateTransactionManager() routines; the LogId field
contains a GUID used to generate a file name of a physical log file, see the ClfsCreateLogFile()
routine. The GUID signature field is always set to the "rmtm" ASCII string (unless a hive is
frozen).

The Flags field is used to record the state of the Kernel Transaction Manager (KTM) and the
hive, possible flags are:

Mask Description
0x00000001 KTM locked the hive (there are pending or anticipated transactions)
Mask Description
The hive has been defragmented (all its pages are dirty therefore) and it is being
written to a disk (Windows 8 and Windows Server 2012 only, this flag is used to
0x00000002 speed up hive recovery by reading a transaction log file instead of a primary file);
this hive supports the layered keys feature (starting from Insider Preview builds of
Windows 10 "Redstone 1")

The Offline Registry Library (offreg.dll) is writing the following additional fields to the base
block when the hive is serialized (saved):

Offset Length Field Value Description


176 4 Signature OfRg ASCII string
180 4 Flags 1 This is the only value used
512 8 Serialization timestamp FILETIME (UTC)

These fields are used in current versions of the library (versions: 6.2, 6.3, 10.0). The fields
defined below are used in legacy versions of the library (version: 6.1):

Offset Length Field Value Description


168 4 Signature OfRg ASCII string
172 4 Flags 1 This is the only value used
512 8 Serialization timestamp FILETIME (UTC)

Notes

1. File offset of a root cell = 4096 + Root cell offset. This formula also applies to any other
offset relative from the start of the hive bins data (however, if such a relative offset is
equal to 0xFFFFFFFF, it doesn't point anywhere).
2. The XOR-32 checksum is calculated using the following algorithm:
o let C be a 32-bit value, initial value is zero;
o let D be data containing 508 bytes;
o split D into non-overlapping groups of 32 bits, and for each group, G[i], do the
following: C = C xor G[i];
o if C is equal to -1, set C to -2;
o if C is equal to 0, set C to 1;
o C is the checksum.
3. The Boot type and Boot recover fields are used for in-memory hive recovery management
by a boot loader and a kernel, they are not written to a disk in most cases (when
Clustering factor is 8, these fields may be written to a disk, but they have no meaning
there).
4. New fields, except the Last reorganized timestamp, were introduced in Windows Vista as
a part of the CLFS. The Last reorganized timestamp field was introduced in Windows 8
and Windows Server 2012.
5. The Last reorganized timestamp field contains a timestamp of the latest hive
reorganization (it happens once a week when a hive isn't locked). The reorganization
process may either defragment the hive on a disk by storing allocated data compactly in a
primary file or clear the access history of key nodes in the hive (see the Access bits field
of a key node). The defragmentation process implies clearing the access history of key
nodes in the hive. The first 2 bits of the Last reorganized timestamp field, counting from
the least significant bit, have a special meaning: when the first bit is set to 1, the hive was
defragmented during the latest reorganization; when the second bit is set to 1, the access
history of key nodes in the hive was cleared during the latest reorganization. When the
Last reorganized timestamp field is equal to 1, the next reorganization will defragment
the hive; when the Last reorganized timestamp field is equal to 2, the next reorganization
will clear the access history of key nodes in the hive (these are special values to enforce
the specific operation).
6. The LogId field usually contains the same value as the RmId field.
7. When the RmId field is null, the LogId and TmId fields may contain garbage data.
8. The ThawTmId, ThawRmId, and ThawLogId fields are used to restore the state of the
TmId, RmId, and LogId fields respectively when thawing a hive (after it was frozen in
order to create a shadow copy).
9. The Last written timestamp field isn't updated as of Windows 8.1 and Windows Server
2012 R2 (but it can be set when a hive is created).

Hive bin

The Hive bin is variable in size and consists of a header and cells. A hive bin's header is 32 bytes
in length, it contains the following structure:

Offset Length Field Value Description


0 4 Signature hbin ASCII string
Offset of a current hive bin in bytes, relative from the
4 4 Offset
start of the hive bins data
8 4 Size Size of a current hive bin in bytes
12 8 Reserved
FILETIME (UTC), defined for the first hive bin only
20 8 Timestamp
(see below)
Spare (or
28 4 This field has no meaning on a disk (see below)
MemAlloc)

Notes

1. A hive bin size is multiple of 4096 bytes.


2. The Spare field is used when shifting hive bins and cells in memory. In Windows 2000,
the same field is called MemAlloc, it is used to track memory allocations for hive bins.
3. A Timestamp in the header of the first hive bin acts as a backup copy of a Last written
timestamp in the base block.
Cell

Cells fill the remaining space of a hive bin (without gaps between them), each cell is variable in
size and has the following structure:

Offset Length Field Description


Size of a current cell in bytes, including this field (aligned to 8 bytes): the
0 4 Size size is positive if a cell is unallocated or negative if a cell is allocated
(use absolute values for calculations)
Cell
4 ...
data

Cell data may contain one of the following records:

Record Description
Index leaf (li) Subkeys list
Fast leaf (lf) Subkeys list with name hints
Hash leaf (lh) Subkeys list with name hashes
Index root (ri) List of subkeys lists (used to subdivide subkeys lists)
Key node (nk) Registry key node
Key value (vk) Registry key value
Key security (sk) Security descriptor
Big data (db) List of data segments

Also, Cell data may contain raw data (e.g. Key value data) and any list defined below. A padding
may be present at the end of a cell.

When a record (entity) contains an offset field pointing to another record (entity), this offset
points to a cell containing the latter record (entity). As already mentioned above, an offset
relative from the start of the hive bins data doesn't point anywhere when it is equal to
0xFFFFFFFF.

Index leaf

The Index leaf has the following structure:

Offset Length Field Value Description


0 2 Signature li ASCII string
2 2 Number of elements
4 ... List elements

A list element has the following structure (a list element is repeated Number of elements times):
Offset Length Field Description
0 4 Key node offset In bytes, relative from the start of the hive bins data

All list elements are required to be sorted ascending by an uppercase version of a key name
string (comparison should be based on character codes).

Fast leaf

The Fast leaf has the following structure:

Offset Length Field Value Description


0 2 Signature lf ASCII string
2 2 Number of elements
4 ... List elements

A list element has the following structure:

Offset Length Field Description


Key node
0 4 In bytes, relative from the start of the hive bins data
offset
The first 4 ASCII characters of a key name string (used to speed
4 4 Name hint
up lookups)

If a key name string is less than 4 characters in length, it is stored in the beginning of the Name
hint field (hereinafter, the beginning of the field means the byte at the lowest address or the first
few bytes at lower addresses in the field), unused bytes of this field are set to null. UTF-16LE
characters are converted to ASCII (extended), if possible (if it isn't, the first byte of the Name
hint field is null).

Hereinafter, an extended ASCII string means a string made from UTF-16LE characters with
codes less than 256 by removing the null byte (at the highest address) from each character.

All list elements are required to be sorted (as described above).

Hash leaf

The Hash leaf has the following structure:

Offset Length Field Value Description


0 2 Signature lh ASCII string
2 2 Number of elements
4 ... List elements
A list element has the following structure:

Offset Length Field Description


Key node
0 4 In bytes, relative from the start of the hive bins data
offset
Hash of a key name string, see below (used to speed up
4 4 Name hash
lookups)

All list elements are required to be sorted (as described above). The Hash leaf is used when the
Minor version field of the base block is greater than 4.

Note

The hash is calculated using the following algorithm:

 let H be a 32-bit value, initial value is zero;


 let N be an uppercase name of a key;
 split N into characters, and for each character (exactly two bytes in the case of UTF-
16LE, also known as a wide character), treated as a number (character code), C[i], do the
following: H = 37 * H + C[i];
 H is the hash value.

Index root

The Index root has the following structure:

Offset Length Field Value Description


0 2 Signature ri ASCII string
2 2 Number of elements
4 ... List elements

A list element has the following structure:

Offset Length Field Description


0 4 Subkeys list offset In bytes, relative from the start of the hive bins data

Notes

1. An Index root can't point to another Index root.


2. A Subkeys list can't point to an Index root.
3. List elements within subkeys lists referenced by a single Index root must be sorted as a
whole (i.e. the first list element of the second subkeys list must be greater than the last list
element of the first subkeys list).
Key node

The Key node has the following structure:

Offset Length Field Value Description


0 2 Signature nk ASCII string
2 2 Flags Bit mask, see below
Last written
4 8 FILETIME (UTC)
timestamp
Bit mask, see below (this field is used as of Windows 8
12 4 Access bits and Windows Server 2012; in previous versions of
Windows, this field is reserved and called Spare)
Offset of a parent key node in bytes, relative from the
16 4 Parent start of the hive bins data (this field has no meaning on
a disk for a root key node)
Number of
20 4
subkeys
Number of
24 4
volatile subkeys
Subkeys list In bytes, relative from the start of the hive bins data
28 4
offset (also, this field may point to an Index root)
Volatile subkeys This field has no meaning on a disk (volatile keys are
32 4
list offset not written to a file)
Number of key
36 4
values
Key values list
40 4 In bytes, relative from the start of the hive bins data
offset
Key security
44 4 In bytes, relative from the start of the hive bins data
offset
Class name
48 4 In bytes, relative from the start of the hive bins data
offset
Largest subkey In bytes, a subkey name is treated as a UTF-16LE
52 4
name length string (see below)
Largest subkey
56 4 class name In bytes
length
Largest value
60 4 In bytes, a value name is treated as a UTF-16LE string
name length
Largest value
64 4 In bytes
data size
68 4 WorkVar Cached index (see below)
72 2 Key name length In bytes
Offset Length Field Value Description
Class name
74 2 In bytes
length
76 ... Key name string ASCII (extended) string or UTF-16LE string

Starting from Windows Vista, Windows Server 2003 SP2, and Windows XP SP3, the Largest
subkey name length field has been split into 4 bit fields (the offsets below are relative from the
beginning of the old Largest subkey name length field, i.e. the first bit field starts within the byte
at the lowest address):

Offset (bits) Length (bits) Field Description


0 16 Largest subkey name length
16 4 Virtualization control flags Bit mask, see below
20 4 User flags (Wow64 flags) Bit mask, see below
24 8 Debug See below

Warning

When implementing the structure defined above in a program, keep in mind that a compiler may
pack the Virtualization control flags and User flags bit fields in a different way. In C, two or
more bit fields inside an integer may be packed right-to-left, so the first bit field defined in an
integer may reside in the less significant (right) bits. In debug symbols for Windows, the
UserFlags field is defined before the VirtControlFlags field exactly for this reason (however,
these fields are written to a file in the order indicated in the table above).

End of warning

Starting from Insider Preview builds of Windows 10 "Redstone 1", the Access bits field has been
split into the following fields:

Offset Length Field Description


0 1 Access bits
1 1 Layered key bit fields See below
2 2 Spare Not used

The Layered key bit fields have the following structure (the offsets below are relative from the
most significant bit):

Offset (bits) Length (bits) Field Value(s) Description


0 1 Inherit class 0 or 1 See below
1 5 Spare Not used
6 2 Layer semantics 0, 1, 2, or 3 See below
Access bits

Access bits are used to track when key nodes are being accessed (e.g. via the RegCreateKeyEx()
and RegOpenKeyEx() calls), the field is set according to the following bit masks:

Mask Description
This key was accessed before a Windows registry was initialized with the
0x1
NtInitializeRegistry() routine during the boot
This key was accessed after a Windows registry was initialized with the
0x2
NtInitializeRegistry() routine during the boot

When the Access bits field is equal to 0, the access history of a key is clear.

Class name

Typically, a class name is a UTF-16LE string assigned to a key node; it doesn't have any
particular meaning. For example, Microsoft was using class names to store keys for the Syskey
encryption. Originally, class names were intended to be used to define key node types (similar to
data types of the Key value).

WorkVar

In Windows 2000, the WorkVar field may contain a zero-based index of a subkey in a subkeys
list found at the last successful lookup (used to speed up lookups when the same subkey is
accessed many times consecutively: a list element with the cached index is always tried first);
such a cached index may point to a list element either in a subkeys list or in a volatile subkeys
list, the latter is examined last.

This field isn't used as of Windows XP.

Flags

In Windows XP and Windows Server 2003, the first 4 bits, counting from the most significant
bit, are reserved for user flags (set via the NtSetInformationKey() call, read via the NtQueryKey()
call). Other bits have the following meaning:

Mask Name Description


0x000 Is volatile (not used, a key node on a disk isn't expected to
KEY_VOLATILE
1 have this flag set)
0x000 Is the mount point of another hive (a key node on a disk isn't
KEY_HIVE_EXIT
2 expected to have this flag set)
0x000
KEY_HIVE_ENTRY Is the root key for this hive
4
0x000
KEY_NO_DELETE This key can't be deleted
8
Mask Name Description
This key is a symlink (a target key is specified as a UTF-
16LE string (REG_LINK) in a value named
0x001
KEY_SYM_LINK "SymbolicLinkValue", example:
0
\REGISTRY\MACHINE\SOFTWARE\Classes\Wow6432Nod
e)
0x002 Key name is an ASCII string, possibly an extended ASCII
KEY_COMP_NAME
0 string (otherwise it is a UTF-16LE string)
0x004 KEY_PREDEF_HANDL Is a predefined handle (a handle is stored in the Number of
0 E key values field)

The following bits are also used as of Windows Vista:

Mask Name Description


0x0080 VirtualSource This key was virtualized at least once
0x0100 VirtualTarget Is virtual
0x0200 VirtualStore Is a part of a virtual store path

It is plausible that both a registry key virtualization (when registry writes to sensitive locations
are redirected to per-user locations in order to protect a Windows registry against corruption) and
a registry key reflection (when registry changes are synchronized between keys in 32-bit and 64-
bit views; this feature was removed in Windows 7 and Windows Server 2008 R2) required more
space than this field can provide, that is why the Largest subkey name length field was split and
the new fields were introduced.

Starting from Windows Vista, user flags were moved away from the first 4 bits of the Flags field
to the new User flags bit field (see above). These user flags in the new location are also called
Wow64 flags. In Windows XP and Windows Server 2003, user flags are stored in the old
location anyway.

It should be noted that, in Windows Vista and Windows 7, the 4th bit (counting from the most
significant bit) of the Flags field is set to 1 in many key nodes belonging to different hives; this
bit, however, can't be read through the NtQueryKey() call. Such key nodes are present in initial
primary files within an installation image (install.wim), and their number doesn't increase after
the installation. A possible explanation for this oddity is that initial primary files were generated
on Windows XP or Windows Server 2003 using the Wow64 subsystem (see below).

User flags

The User flags field (in the appropriate location for a version of Windows being used) is set
according to the following bit masks:
Mask Description
Is a 32-bit key: this key was created through the Wow64 subsystem or this key shall not
0x1
be used by a 64-bit program (e.g. by a 64-bit driver during the boot)
0x2 This key was created by the reflection process (when reflecting a key from another view)
0x4 Disable registry reflection for this key
In the old location of the User flags field: execute the int 3 instruction on an access to this
key (both retail and checked Windows kernels), this bit was superseded by the Debug
0x8 field (see below); in the new location of the User flags field: disable registry reflection
for this key if a corresponding key exists in another view and it wasn't created by a caller
(see below)

In Windows 7, Windows Server 2008 R2, and more recent versions of Windows, the bit mask
0x1 isn't used to mark 32-bit keys created by userspace programs.

The bit mask 0x8 was supported in the new location of the User flags field only in pre-release
versions of Windows Vista, e.g. beta 2 [1].

Virtualization control flags

The Virtualization control flags field is set according to the following bit masks:

Mask Name Description


0x2 REG_KEY_DONT_VIRTUALIZE Disable registry write virtualization for this key
0x4 REG_KEY_DONT_SILENT_FAIL Disable registry open virtualization for this key
Propagate virtualization flags to new child keys
0x8 REG_KEY_RECURSE_FLAG
(subkeys)

Debug

When the CmpRegDebugBreakEnabled kernel variable is set to 1, a checked Windows kernel


will execute the int 3 instruction on various events according to the bit mask in the Debug field.
A retail Windows kernel has this feature disabled.

The following bit masks are supported:

Mask Name Event description


0x01 BREAK_ON_OPEN This key is opened
0x02 BREAK_ON_DELETE This key is deleted
0x04 BREAK_ON_SECURITY_CHANGE A key security item is changed for this key
0x08 BREAK_ON_CREATE_SUBKEY A subkey of this key is created
0x10 BREAK_ON_DELETE_SUBKEY A subkey of this key is deleted
0x20 BREAK_ON_SET_VALUE A value is set to this key
0x40 BREAK_ON_DELETE_VALUE A value is deleted from this key
Mask Name Event description
0x80 BREAK_ON_KEY_VIRTUALIZE This key is virtualized

Layered keys

Layered keys were introduced in Insider Preview builds of Windows 10 "Redstone 1". When a
hive supports the layered keys feature, a kernel may treat some key nodes in a special way. This
feature is used to implement the containerized registry.

When a kernel is accessing a key node treated as a part of a layered key, it builds a key node
stack, including the key node being accessed, its "parent" key node and no more than 2 (or 14 as
of Windows 10 Insider Preview Build 14986, or 126 as of Windows 10 Insider Preview Build
15025) "parent" key nodes towards the top layer. Then this stack is used to produce cumulative
information about the layered key. For example, if you query the last written timestamp for a
layered key, the most recent timestamp will be returned from the key node stack; if you
enumerate key values for a layered key, key values from key nodes in the stack will be returned
(except tombstone values; if there are two or more values with the same name in the key node
stack, a value from a lower ("child") key node is used and returned).

When the Inherit class field is set to 0, the layered key will have the same class name as the key
node originally accessed by a kernel. Otherwise, the layered key will receive the same class
name (possibly an empty class name) as an upper ("parent") key node (from the stack) having the
Inherit class field set to 0.

The Layer semantics field is set using the following values:

Value Name Description


This key node and its "parent" key nodes can be included in the current
0
layered key (based on the stack built)
Is a tombstone key node: this key node and its "parent" key nodes can't
1 IsTombstone be included in the current layered key (also, such a key node has no
class name, no subkeys, and no values)
This key node can be included in the current layered key, but its
2 IsSupersedeLocal
"parent" key nodes can't
This key node can be included in the current layered key, but its
3 IsSupersedeTree "parent" key nodes can't; child key nodes (subkeys), except tombstone
key nodes, are required to have the same value set in the field

Here, a "parent" key node doesn't mean a key node which offset is stored in the Parent field of a
given key node. And a "child" key node doesn't mean a subkey of a given key node (unless
otherwise mentioned). Instead, these relationships are created between key nodes in different
registry hives (a host registry hive, which represents the top layer, and a differencing registry
hive, which represents the lower layer; both registry hives are used to provide the merged view
of registry data, the top layer provides base data and the lower layer provides modifications to be
applied on top of base data).
Key values list

The Key values list has the following structure:

Offset Length Field


0 ... List elements

A list element has the following structure:

Offset Length Field Description


0 4 Key value offset In bytes, relative from the start of the hive bins data

List elements are not required to be sorted.

Key value

The Key value has the following structure:

Offset Length Field Value Description


0 2 Signature vk ASCII string
2 2 Name length In bytes, can be 0 (name isn't set)
In bytes, can be 0 (value isn't set), the most significant bit
4 4 Data size
has a special meaning (see below)
In bytes, relative from the start of the hive bins data (or
8 4 Data offset
data itself, see below)
12 4 Data type See below
16 2 Flags Bit mask, see below
18 2 Spare Not used
Value name
20 ... ASCII (extended) string or UTF-16LE string
string

Data size

When the most significant bit is 1, data (4 bytes or less) is stored in the Data offset field directly
(when data contains less than 4 bytes, it is being stored as is in the beginning of the Data offset
field). The most significant bit (when set to 1) should be ignored when calculating the data size.

When the most significant bit is 0, data is stored in the Cell data field of another cell (pointed by
the Data offset field) or in the Cell data fields of multiple cells (referenced in the Big data
structure stored in a cell pointed by the Data offset field).

Data types
Value Name(s)
0x00000000 REG_NONE
0x00000001 REG_SZ
0x00000002 REG_EXPAND_SZ
0x00000003 REG_BINARY
0x00000004 REG_DWORD, REG_DWORD_LITTLE_ENDIAN
0x00000005 REG_DWORD_BIG_ENDIAN
0x00000006 REG_LINK
0x00000007 REG_MULTI_SZ
0x00000008 REG_RESOURCE_LIST
0x00000009 REG_FULL_RESOURCE_DESCRIPTOR
0x0000000a REG_RESOURCE_REQUIREMENTS_LIST
0x0000000b REG_QWORD, REG_QWORD_LITTLE_ENDIAN

Other values are allowed as well (but they are not predefined).

Flags

Mask Name Description


Value name is an ASCII string, possibly an extended ASCII
0x0001 VALUE_COMP_NAME
string (otherwise it is a UTF-16LE string)
Is a tombstone value (the flag is used starting from Insider
Preview builds of Windows 10 "Redstone 1"), a tombstone
0x0002 IsTombstone value also has the Data type field set to REG_NONE, the
Data size field set to 0, and the Data offset field set to
0xFFFFFFFF

Key security

The Key security item has the following structure:

Offset Length Field Value Description


0 2 Signature sk ASCII string
2 2 Reserved
4 4 Flink See below
8 4 Blink See below
12 4 Reference count Number of key nodes pointing to this item
16 4 Security descriptor size In bytes
20 ... Security descriptor

Notes
1. Flink and blink are offsets in bytes, relative from the start of the hive bins data.
2. Key security items form a doubly linked list. A key security item may act as a list header
or a list entry (the only difference here is the meaning of flink and blink fields).
3. When a key security item acts as a list header, flink and blink point to the first and the
last entries of this list respectively. If a list is empty, flink and blink point to a list header
(i.e. to a current cell).
4. When a key security item acts as a list entry, flink and blink point to the next and the
previous entries of this list respectively. If there is no next entry in a list, flink points to a
list header. If there is no previous entry in a list, blink points to a list header.

Big data

The Big data is used to reference data larger than 16344 bytes (when the Minor version field of
the base block is greater than 3), it has the following structure:

Offset Length Field Value Description


0 2 Signature db ASCII string
2 2 Number of segments
Offset of a list of In bytes, relative from the start of the hive bins
4 4
segments data

The list of segments has the following structure:

Offset Length Field


0 ... List elements

A list element has the following structure:

Offset Length Field Description


0 4 Data segment offset In bytes, relative from the start of the hive bins data

Data segment

A data segment is stored in the Cell data field of a cell pointed by the Data segment offset field.
A data segment has the maximum size of 16344 bytes.

Data segments of a Big data record, except the last one, always have the maximum size.

Unallocated cell

If a cell to be marked as unallocated has an adjacent unallocated cell, these cells are coalesced,
this is why a single unallocated cell may contain multiple remnant records (entities).
In Windows 2000, the following record is written to the beginning of the Cell data field of an
unallocated cell:

Offset Length Field Description


Offset of a next unallocated cell in a free list (in bytes, relative from the
0 4 Next start of the hive bins data) or 0xFFFFFFFF, if there is no such a cell in that
list (there are many free lists for a single hive in memory)

The record described above isn't used as of Windows XP.

Summary

1. A Base block points to a root cell, which contains a Key node.


2. A Key node points to a parent Key node, to a Subkeys list (a subkey is a Key node too), to
a Key values list, to a Key security item.
3. A Subkeys list can be subdivided with the help of the Index root structure.
4. A Key value points to data. Data may be stored in the Data offset field of a Key value
structure, in a separate cell, or in a bunch of cells. In the last case, a Key value points to
the Big data structure in a cell.

Format of transaction log files


Old format

A transaction log file (old format) consists of a base block, a dirty vector, and dirty pages.

Base block

A partial backup copy of a base block is stored in the first sector of a transaction log file, only
the first Clustering factor * 512 bytes of a base block are written there.

A backup copy of a base block isn't an exact copy anyway, the following modifications are
performed on it by a hive writer:
1. As of Windows XP, the File type field is set to 1 (1 means transaction log). In Windows
2000, the File type field is set to 2 (while 1 is reserved for in-memory hive recovery
management using an alternate file for the System hive –
\WINNT\system32\config\SYSTEM.ALT, an alternate file is a mirror of a primary file used
as a backup copy).
2. In Windows 2000, Windows XP, and Windows Server 2003, a primary sequence number
is incremented by 1 before writing a log of dirty data; a secondary sequence number is
incremented by 1 after a log of dirty data was written. As of Windows Vista, a primary
sequence number and the same secondary sequence number (incremented by 1) are
written at once before writing a log of dirty data.
3. The Checksum field is recalculated for the modified data.

Notes

1. A partial backup copy of a base block is made using data from memory, not from a
primary file.
2. A transaction log file is considered to be valid when it has an expected base block
(including the modifications mentioned above), and its primary sequence number is equal
to its secondary sequence number. An invalid transaction log file can be repaired by the
self-healing process (this will reset the fields of a base block in memory to reasonable
values, i.e. the Signature field will be reset to the proper value, the Hive bins data size
field will be reset to a value calculated from the file size of a primary file, the Clustering
factor field will be reset to a value for an underlying disk, the sequence numbers will be
reset to 1, the Checksum field will be recalculated).
3. A valid transaction log file can be applied to a dirty hive when a Last written timestamp
in its base block is equal to a Last written timestamp in a base block of a primary file
(when a base block of a primary file is invalid, i.e. it has a wrong Checksum, a Timestamp
from the first hive bin is used instead). Also, a transaction log file can be applied to a
dirty hive after the self-healing process.
4. If a base block of a primary file has a wrong Checksum, it is being recovered using a base
block from a transaction log file (and the File type field is set back to 0).
5. In Windows 2000, Windows XP, and Windows Server 2003, the same memory region is
used to write a base block both to a transaction log file and to a primary file, that is why,
after a successful write operation, a base block of a primary file will contain two
sequence numbers equal to N, and a base block of a transaction log file will contain two
sequence numbers equal to N - 1 (sequence numbers are incremented by 1 for a
transaction log file first, and then they are incremented by 1 again for a primary file). As
of Windows Vista, a copy of a base block is used when writing to a transaction log file,
that is why a transaction log file and a primary file will contain the same sequence
numbers in their base blocks after a successful write operation.

Dirty vector

The Dirty vector is stored starting from the beginning of the second sector of a transaction log
file, it has the following structure:
Offset Length Field Value Description
0 4 Signature DIRT ASCII string
4 ... Bitmap Bitmap of dirty pages

Each bit of a bitmap corresponds to the state of a specific 512-byte page within hive bins data to
be written to a primary file from memory, regardless of a logical sector size of an underlying
disk (these pages don't overlap, there are no gaps between them):

 the first bit, bit #1, corresponds to the state of the first 512-byte page within hive bins
data;
 bit #2 corresponds to the state of the second 512-byte page within hive bins data, etc.

Bits of a bitmap are checked using the bt instruction or its equivalent based on bit shifting. This
means that bits are packed into bytes, the first byte of a bitmap contains bits #1-#8, the second
byte contains bits #9-#16, and so on. Within a byte, bit numbering starts at the least significant
bit.

The meaning of each bit in a bitmap is the following:

Bit Description
0 A corresponding page is clean
1 A corresponding page is dirty

Bitmap length (in bits) is calculated using the following formula: Bitmap length = Hive bins data
size / 512.

Notes

1. The state of a base block isn't recorded in a dirty vector.


2. A dirty vector is stored in memory as a RTL_BITMAP structure. However, only the
Buffer field of this structure is written to a transaction log file.
3. Windows tracks changes to hive bins data in 4096-byte blocks (as of Windows XP). This
means that a dirty vector's bitmap is expected to contain only the following bytes: 0x00
and 0xFF.
4. A padding is likely to be present after the end of a bitmap (up to a sector boundary).

Dirty pages

Dirty pages are stored starting from the beginning of the sector following the last sector of a
dirty vector. Each dirty page is stored at an offset divisible by 512 bytes and has a length of 512
bytes, there are no gaps between dirty pages.

The first dirty page corresponds to the first bit set to 1 in the bitmap of a dirty vector, the second
dirty page corresponds to the second bit set to 1 in the bitmap of a dirty vector, etc. During
recovery, contiguous dirty pages belonging to the same hive bin in a primary file are processed
together, and a dirty hive bin is verified for correctness (its Signature must be correct, its Size
must not be less than 4096 bytes, its Offset must match the Offset of a corresponding hive bin in
a primary file); recovery stops if a dirty hive bin is invalid, an invalid dirty hive bin is ignored.

Notes

1. The number of dirty pages is equal to the number of bits set to 1 in the bitmap of a dirty
vector. Remnants of previous dirty pages may be present after the end of the last dirty
page.
2. A dirty page will be written to a primary file at the following offset: File offset = 4096 +
512 * Bit position, where Bit position is the index (zero-based) of a corresponding bit in
the bitmap of a dirty vector.

New format

A transaction log file (new format) consists of a base block and log entries. This format was
introduced in Windows 8.1 and Windows Server 2012 R2.

Base block

A modified partial backup copy of a base block is stored in the first sector of a transaction log
file in the same way as in the old format and for the same purpose. However, the File type field
is set to 6.

Log entries

Log entries are stored starting from the beginning of the second sector. Each log entry is stored at
an offset divisible by 512 bytes and has a variable size (multiple of 512 bytes), there are no gaps
between log entries.

A log entry has the following structure:

Offset Length Field Value Description


0 4 Signature HvLE ASCII string
Offset Length Field Value Description
4 4 Size Size of a current log entry in bytes
Partial copy of the Flags field of the base block at the time
8 4 Flags
of creation of a current log entry (see below)
This number constitutes a possible value of the Primary
sequence number and Secondary sequence number fields of
Sequence
12 4 the base block in memory after a current log entry is applied
number
(these fields are not modified before the write operation on
the recovered hive)
Hive bins Copy of the Hive bins data size field of the base block at the
16 4
data size time of creation of a current log entry
Dirty pages
20 4 Number of dirty pages attached to a current log entry
count
24 8 Hash-1 See below
32 8 Hash-2 See below
Dirty pages
40 ...
references
... ... Dirty pages

A dirty page reference describes a single page to be written to a primary file, and it has the
following structure:

Offset Length Field Description


Offset of a page in a primary file (in bytes), relative from the start of the
0 4 Offset
hive bins data
4 4 Size Size of a page in bytes

Dirty pages are attached to a log entry in the same order as in the Dirty pages references without
an alignment or gaps.

Notes

1. Hash-1 is the Marvin32 hash of the data starting from the beginning of the first page
reference of a current log entry with the length of Size - 40 bytes.
2. Hash-2 is the Marvin32 hash of the first 32 bytes of a current log entry (including the
Hash-1 calculated before).
3. The following seed is used for calculating the Hash-1 and Hash-2 (hexadecimal bytes):
82 EF 4D 88 7A 4E 55 C5.
4. A transaction log file may contain multiple log entries written together (the base block in
a transaction log file is updated only when writing the first log entry), as well as old
(already applied) log entries.
5. If a primary file is dirty and has a valid Checksum (in the base block), only subsequent
log entries are applied. A subsequent log entry is a log entry with a sequence number
equal to or greater than a primary sequence number of the base block in a transaction log
file.
6. If a primary file is dirty and has a wrong Checksum (in the base block), its base block is
recovered from a transaction log file. Then subsequent log entries are applied.
7. If a log entry with a sequence number N is not followed by a log entry with a sequence
number N + 1, recovery stops after applying a log entry with a sequence number N. If the
first log entry doesn't contain an expected sequence number (equal to a primary sequence
number of the base block in a transaction log file, not less than a secondary sequence
number of the valid base block in a primary file), recovery stops.
8. If a log entry has a wrong value in the field Hash-1, Hash-2, or Hive bins data size (i.e. it
isn't multiple of 4096 bytes), recovery stops, only previous log entries (preceding a bogus
one) are applied.
9. A primary file is grown according to the Hive bins data size field of a log entry being
applied.
10. Dirty hive bins are verified for correctness during recovery (but recovery doesn't stop on
an invalid hive bin, an invalid hive bin is replaced with a dummy hive bin instead).
11. The Flags field of a log entry is set to 0x00000001 when a value of the Flags field of the
base block has the bit mask 0x00000001 set, otherwise the Flags field of a log entry is set
to 0x00000000. During recovery, the bit mask 0x00000001 is set or unset in the Flags
field of the base block according to a value taken from a log entry being applied. This
means that only the bit mask 0x00000001 is saved to or restored from a log entry.

Dirty state of a hive


A hive is considered to be dirty (i.e. requiring recovery) when a base block in a primary file
contains a wrong checksum, or its primary sequence number doesn't match its secondary
sequence number. If a hive isn't dirty, but a transaction log file (new format) contains subsequent
log entries, they are ignored.

Multiple transaction log files


A hive writer may use no transaction log files, a single transaction log file (*.LOG), or two
transaction log files (*.LOG1 and *.LOG2). In the last case, also known as a dual-logging
scheme (introduced in Windows Vista), a dummy third transaction log file (*.LOG) may be
present as an artifact from an installation image. When a hive writer is using a single transaction
log file, two empty transaction log files from the dual-logging scheme (*.LOG1 and *.LOG2)
may be present as well.

Old format

Under normal circumstances, only the first transaction log (*.LOG1) file is used. If an error
occurs when writing to a primary file, a switch to the second transaction log file (*.LOG2) is
performed (this file will contain a cumulative log of dirty data, i.e. dirty pages that weren't
written in full to a primary file due to an error and pages dirtied since the failed write, on the next
write attempt). If write errors persist, a hive writer will swap the transaction log file being used
on every write attempt (*.LOG2 to *.LOG1 and vice versa), keeping a cumulative log of dirty
data. This allows a hive writer to keep a consistent copy of previous dirty data in another
transaction log file on each write attempt (if a system crash occurs when writing to a current
transaction log file, thus leaving it in the inconsistent state, a primary file may be recovered later
using a previous transaction log file). After a successful write operation on a primary file, the
first transaction log file will be used again. If an error occurs when writing a base block to a
primary file in the beginning of a write operation (in order to update the Primary sequence
number and Last written timestamp fields), the whole operation fails without changing the log
file being used.

In the general case, the first transaction log file is used by a kernel to recover a dirty hive. Before
Windows 8, if a primary file contains an invalid base block (i.e. it has a wrong Checksum), and
the first transaction log file doesn't contain a valid backup copy of a base block (or, if this copy
of a base block is valid, it doesn't contain a matching Last written timestamp, i.e. this transaction
log file can't be applied to a dirty hive, see above), and the second transaction log file contains a
valid backup copy of a base block with a mismatching Last written timestamp and a more recent
log of dirty data than the first transaction log file (according to the Last written timestamp fields
of the backup copies of a base block in these files, this check is ignored if the first transaction log
file contains an invalid backup copy of a base block), then the second transaction log file is used
by a kernel to recover a dirty hive.

Such a recovery algorithm is extremely ineffective, because it doesn't use the second transaction
log file unless a base block of a primary file is invalid, and this base block is likely to be valid,
because an error when writing a base block to a primary file in the beginning of a write operation
will not trigger the switch to the second transaction log file, so the most probable event
triggering this switch is a write error when storing dirty data in a primary file, that is likely to
leave a valid base block in a primary file (the mid-update state).

In Windows 8, the second transaction log file is used by a kernel to recover a dirty hive when the
conditions mentioned above are met, with the following exceptions: the second transaction log
file is used even if a base block of a primary file is valid, the second transaction log file is used
even if its backup copy of a base block has a matching Last written timestamp. The new
algorithm is much more sound. In March 2016, Microsoft released an updated kernel for
Windows 7 (6.1.7601.19160, KB3140410), this kernel implements a recovery algorithm similar
to existing in Windows 8.

When a dirty hive is loaded by a boot loader (not by a kernel), the applicable transaction log file
(i.e. having a matching Last written timestamp in a valid backup copy of a base block) is used in
the recovery. The first transaction log file is tried first.

Another shortcoming in the implementation of the dual-logging scheme is that sequence numbers
in a backup copy of a base block in a transaction log file are not used to record its mid-update
state (see above). If a system crash occurs when writing to a transaction log file, there will be no
clear indicators of which transaction log file is inconsistent. It is possible for an operating system
to pick an inconsistent transaction log file for the recovery.
New format

A hive writer will regularly swap the transaction log file being used (*.LOG1 to *.LOG2 and
vice versa). This may divide log entries between two transaction log files; the first transaction
log file isn't guaranteed to contain earlier log entries.

If a primary file contains a valid base block, both transaction log files are used to recover the
dirty hive, i.e. log entries from both transaction log files are applied; the transaction log file with
earlier log entries is used first (if recovery stops when applying log entries from this transaction
log file, then recovery is resumed with the next transaction log file; the first log entry of the next
transaction log file is expected to have a sequence number equal to N + 1, where N is a sequence
number of the last log entry applied). If a primary file contains an invalid base block, only the
transaction log file with latest log entries is used in the recovery.

Flush strategies
Flushing a hive ensures that its dirty data was written to a disk. When the old format of
transaction log files is used, this means that dirty data was stored in a primary file. When the new
format of transaction log files is used, a flush operation on a hive will succeed after dirty data
was stored in a transaction log file (but not yet in a primary file); a hive writer may delay writing
to a primary file (up to an hour), in this situation dirty data becomes unreconciled.

Sector size and clustering factor


As of Windows 8, the Clustering factor field is always set to 1, the logical sector size is always
assumed to be 512 bytes when working with related offsets and sizes. For example, a backup
copy of a base block in a transaction log file is 512 bytes in length regardless of a logical sector
size of an underlying disk.

According to Microsoft, there is no support for a logical sector size different from 512 bytes and
4096 bytes in Windows; a logical sector size equal to 4096 bytes is supported as of Windows 8
and Windows Server 2012 [2]. This is why the Clustering factor field is expected to be equal to
1.

Very old versions of Windows


The format description above applies to registry hives with the following major and minor
version numbers in the Base block structure: 1.3, 1.4, 1.5, and 1.6. However, the following major
and minor version numbers can be found in registry hives of Windows NT 3.1, Windows NT 3.5,
and Windows NT 3.51: 1.1 and 1.2.

When the Minor version field of the base block is equal to 2, the Fast leaf records are not
supported.
When the Minor version field of the base block is equal to 1, the Fast leaf records are not
supported; the Key name string of the Key node and the Value name string of the Key value are
always UTF-16LE; the Flags and Spare fields of the Key value don't exist, there is a Title index
field at the offset 16 bytes with the length of 4 bytes; the Access bits (Spare) field of the Key
node doesn't exist, there is a Title index field at the offset 12 bytes with the length of 4 bytes; and
every cell has the following structure:

Offset Length Field Description


Size of a current cell in bytes, including this field (aligned to 16 bytes):
0 4 Size the size is positive if a cell is unallocated or negative if a cell is allocated
(use absolute values for calculations)
Offset of a previous cell in bytes, relative from the start of a current hive
4 4 Last
bin (or 0xFFFFFFFF for the first cell in a hive bin)
Cell
8 ...
data

The Title index field is used to store an index of a localized alias for a name string (there is no
alias for a name string if this field is equal to 0). Although the Title index field can be set and
read in Windows NT 3.1, localized aliases were never supported (and the Title index field
became deprecated in Windows NT 3.5).

In Windows NT 3.1, the following record is written to the beginning of the Cell data field of an
unallocated cell:

Offset Length Field Description


Offset of a next unallocated cell in a free list (in bytes, relative from the
0 4 Next start of the hive bins data) or 0xFFFFFFFF, if there is no such a cell in
that list
Offset of a previous unallocated cell in a free list (in bytes, relative
4 4 Previous from the start of the hive bins data) or 0xFFFFFFFF, if there is no such
a cell in that list

Registry hives with the Minor version field of the base block set to 0 can be found in pre-release
versions of Windows NT 3.1. This hive version is similar to the version 1.1, but it is out of the
scope of this document.

Also, the File type field of a base block in a transaction log file is set to 2 in all versions of
Windows NT up to and including Windows 2000.

Vous aimerez peut-être aussi