
Introduction
Buffer management is a key component in making database disk I/O efficient.

The buffer management component consists of two mechanisms: the buffer manager, to access and update database pages, and the buffer cache (also called the buffer pool), to reduce database file I/O.

How Buffer Management Works
A buffer is an 8-KB page in memory, the same size as a data or index page. Thus, the buffer cache is divided into 8-KB pages. The buffer manager manages the functions for reading data or index pages from the database disk files into the buffer cache and writing modified pages back to disk. A page remains in the buffer cache until the buffer manager needs the buffer area to read in more data. Data is written back to disk only if it is modified. Data in the buffer cache can be modified multiple times before being written back to disk.
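The behavior described above can be illustrated with a minimal sketch. It is a toy model, not SQL Server's implementation: the `BufferCache` class, its method names, the dictionary that stands in for the data file, and the least-recently-used replacement policy are all assumptions made for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 8 * 1024  # 8-KB buffers, the same size as a data or index page


class BufferCache:
    """Toy buffer cache (hypothetical): a fixed number of buffers over a dict 'disk'."""

    def __init__(self, disk, capacity):
        self.disk = disk              # page_id -> bytes, stands in for the data file
        self.capacity = capacity      # number of buffers available
        self.buffers = OrderedDict()  # page_id -> [bytearray, dirty flag]
        self.physical_writes = 0

    def get(self, page_id):
        """Return the buffered page, reading it from disk on a miss."""
        if page_id not in self.buffers:
            if len(self.buffers) >= self.capacity:
                self._evict()
            self.buffers[page_id] = [bytearray(self.disk[page_id]), False]
        self.buffers.move_to_end(page_id)  # mark as most recently used
        return self.buffers[page_id][0]

    def modify(self, page_id, offset, data):
        """Change bytes in the buffered copy and mark the page dirty."""
        page = self.get(page_id)
        page[offset:offset + len(data)] = data
        self.buffers[page_id][1] = True

    def _evict(self):
        """Reuse the least recently used buffer (assumed policy); write back only if dirty."""
        page_id, (page, dirty) = self.buffers.popitem(last=False)
        if dirty:
            self.disk[page_id] = bytes(page)
            self.physical_writes += 1


disk = {n: bytes(PAGE_SIZE) for n in range(4)}
cache = BufferCache(disk, capacity=2)
cache.modify(0, 0, b"a")  # first modification of page 0, in memory only
cache.modify(0, 1, b"b")  # second modification, still no disk write
cache.get(1)              # page 1 is read but never modified
cache.get(2)              # needs a buffer: page 0 is dirty, so it is written back
cache.get(3)              # needs a buffer: page 1 is clean, so nothing is written
print(cache.physical_writes)
```

Page 0 is modified twice but written once, and the unmodified page 1 is dropped without any write.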

Example of Buffer Management: SQL Server


When SQL Server starts, it computes the size of virtual address space for the buffer cache based on a number of parameters such as the amount of physical memory on the system, the configured number of maximum server threads, and various startup parameters. SQL Server reserves this computed amount of its process virtual address space (called the memory target) for the buffer cache, but it acquires (commits) only the required amount of physical memory for the current load. You can query the bpool_commit_target and bpool_committed columns in the sys.dm_os_sys_info dynamic management view to return the number of pages reserved as the memory target and the number of pages currently committed in the buffer cache, respectively.

The interval between SQL Server startup and when the buffer cache obtains its memory target is called ramp-up. During this time, read requests fill the buffers as needed. For example, a single-page read request fills a single buffer page. This means the ramp-up depends on the number and type of client requests. Ramp-up is expedited by transforming single-page read requests into aligned eight-page requests. This allows the ramp-up to finish much faster, especially on machines with a lot of memory.

Because the buffer manager uses most of the memory in the SQL Server process, it cooperates with the memory manager to allow other components to use its buffers. The buffer manager interacts primarily with the following components:
- Resource manager, to control overall memory usage and, on 32-bit platforms, to control address space usage.
- Database manager and the SQL Server Operating System (SQLOS), for low-level file I/O operations.
- Log manager, for write-ahead logging.
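The ramp-up optimization mentioned above, turning a single-page read into an aligned eight-page request, comes down to simple integer arithmetic. The sketch below assumes that "aligned" means the request starts at a page ID that is a multiple of eight; the function name is hypothetical.

```python
PAGES_PER_REQUEST = 8  # during ramp-up, single-page reads are widened to eight pages


def aligned_request(page_id, pages_per_request=PAGES_PER_REQUEST):
    """Return (first, last) page IDs of the aligned request containing page_id.

    Assumption: alignment means the first page ID is a multiple of the request size.
    """
    first = (page_id // pages_per_request) * pages_per_request
    return first, first + pages_per_request - 1


print(aligned_request(21))  # (16, 23)
```

A request for page 21 therefore also brings pages 16 through 23 into the cache, filling eight buffers with one I/O instead of one.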

Supported Features
The buffer manager supports the following features:
- The buffer manager is non-uniform memory access (NUMA) aware. Buffer cache pages are distributed across hardware NUMA nodes, which allows a thread to access a buffer page that is allocated on the local NUMA node rather than from foreign memory. For more information, see How SQL Server Supports NUMA. To understand how pages of memory from the buffer cache are assigned when using NUMA, see Growing and Shrinking the Buffer Pool Under NUMA.
- The buffer manager supports Hot Add Memory, which allows users to add physical memory without restarting the server. For more information, see Hot Add Memory.
- The buffer manager supports dynamic memory allocation on Microsoft Windows XP 32-bit and Windows 2003 32-bit platforms when AWE is enabled. Dynamic memory allocation allows the Database Engine to efficiently acquire and release memory in the buffer cache to support the current workload. For more information, see Dynamic Memory Management.
- The buffer manager supports large pages on 64-bit platforms. The page size is specific to the version of Windows. For more information, see the Windows documentation.
- The buffer manager provides additional diagnostics that are exposed through dynamic management views. You can use these views to monitor a variety of operating system resources that are specific to SQL Server. For example, you can use the sys.dm_os_buffer_descriptors view to monitor the pages in the buffer cache. For more information, see SQL Server Operating System Related Dynamic Management Views (Transact-SQL).

Disk I/O
The buffer manager only performs reads and writes to the database. Other file and database operations such as open, close, extend, and shrink are performed by the database manager and file manager components. Disk I/O operations by the buffer manager have the following characteristics:
- All I/Os are performed asynchronously, which allows the calling thread to continue processing while the I/O operation takes place in the background.
- All I/Os are issued in the calling threads unless the affinity I/O option is in use. The affinity I/O mask option binds SQL Server disk I/O to a specified subset of CPUs. In high-end SQL Server online transactional processing (OLTP) environments, this extension can enhance the performance of SQL Server threads issuing I/Os.
- Multiple page I/Os are accomplished with scatter-gather I/O, which allows data to be transferred into or out of noncontiguous areas of memory. This means that SQL Server can quickly fill or flush the buffer cache while avoiding multiple physical I/O requests.

Long I/O Requests



The buffer manager reports on any I/O request that has been outstanding for at least 15 seconds. This helps the system administrator distinguish between SQL Server problems and I/O subsystem problems. Error message 833 is reported and appears in the SQL Server error log as follows:

SQL Server has encountered %d occurrence(s) of I/O requests taking longer than %d seconds to complete on file [%ls] in database [%ls] (%d). The OS file handle is 0x%p. The offset of the latest long I/O is: %#016I64x.

A long I/O may be either a read or a write; it is not currently indicated in the message. Long-I/O messages are warnings, not errors. They do not indicate problems with SQL Server. The messages are reported to help the system administrator find the cause of poor SQL Server response times more quickly, and to distinguish problems that are outside the control of SQL Server. As such, they do not require any action, but the system administrator should investigate why the I/O request took so long, and whether the time is justifiable.
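As a rough illustration of the reporting rule above, the following sketch scans a list of outstanding requests and flags those pending for at least 15 seconds. The `PendingIO` record, the function names, and the shortened warning text are assumptions for the example; this is not how SQL Server tracks its I/O internally.

```python
from dataclasses import dataclass

LONG_IO_SECONDS = 15.0  # threshold stated in the text


@dataclass
class PendingIO:
    file_name: str
    offset: int
    issued_at: float  # seconds on some monotonic clock (hypothetical field)


def find_long_ios(pending, now, threshold=LONG_IO_SECONDS):
    """Return the requests that have been outstanding for at least `threshold` seconds."""
    return [io for io in pending if now - io.issued_at >= threshold]


def format_warning(long_ios, file_name, threshold=LONG_IO_SECONDS):
    """Build a warning line loosely modeled on the message text (not the real format)."""
    hits = [io for io in long_ios if io.file_name == file_name]
    if not hits:
        return None
    latest = max(hits, key=lambda io: io.issued_at)
    return (f"{len(hits)} occurrence(s) of I/O requests taking longer than "
            f"{threshold:.0f} seconds to complete on file [{file_name}]; "
            f"offset of the latest long I/O is: {latest.offset:#018x}")


pending = [
    PendingIO("data.mdf", 0x1000, issued_at=0.0),
    PendingIO("data.mdf", 0x2000, issued_at=5.0),
    PendingIO("data.mdf", 0x3000, issued_at=10.0),
]
long_ios = find_long_ios(pending, now=20.0)
print(format_warning(long_ios, "data.mdf"))
```

Note that the sketch, like the real message, cannot say whether a flagged request is lost or merely slow; it only knows the request has not completed yet.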

Causes of Long-I/O Requests


A long-I/O message may indicate that an I/O is permanently blocked and will never complete (known as lost I/O), or merely that it just has not completed yet. It is not possible to tell from the message which scenario is the case, although a lost I/O will often lead to a latch time-out.

Long I/Os often indicate a SQL Server workload that is too intense for the disk subsystem. An inadequate disk subsystem may be indicated when:
- Multiple long I/O messages appear in the error log during a heavy SQL Server workload.
- Perfmon counters show long disk latencies, long disk queues, or no disk idle time.

Long I/Os may also be caused by a component in the I/O path (for example, a driver, controller, or firmware) continually postponing servicing an old I/O request in favor of servicing newer requests that are closer to the current position of the disk head. The common technique of processing requests in priority based upon which ones are closest to the current position of the read/write head is known as "elevator seeking." This may be difficult to corroborate with the Windows System Monitor (PERFMON.EXE) tool because most I/Os are being serviced promptly. Long I/O requests can be aggravated by workloads that perform large amounts of sequential I/O, such as backup and restore, table scans, sorting, creating indexes, bulk loads, and zeroing out files.

Isolated long I/Os that do not appear related to any of the previous conditions may be caused by a hardware or driver problem. The system event log may contain a related event that helps to diagnose the problem.

Error Detection
Database pages can use one of two optional mechanisms that help ensure the integrity of the page from the time it is written to disk until it is read again: torn page protection and checksum protection. These mechanisms allow an independent method of verifying the correctness of not only the data storage, but also hardware components such as controllers, drivers, cables, and even the operating system. The protection is added to the page just before writing it to disk, and verified after it is read from disk.

Torn Page Protection


Torn page protection, introduced in SQL Server 2000, is primarily a way of detecting page corruptions due to power failures. For example, an unexpected power failure may leave only part of a page written to disk. When torn page protection is used, a 2-bit signature is placed at the end of each 512-byte sector in the page (after having copied the original two bits into the page header). The signature alternates between binary 01 and 10 with every write, so it is always possible to tell when only a portion of the sectors made it to disk: if a bit is in the wrong state when the page is later read, the page was written incorrectly and a torn page is detected. Torn page detection uses minimal resources; however, it does not detect all errors caused by disk hardware failures.
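The signature scheme described above can be sketched as follows. This is a simplified model under stated assumptions: the signature goes into the low two bits of the last byte of each sector, the displaced bits are packed into one integer standing in for the page-header field, and a torn page is detected by the sectors disagreeing with each other. The function names are hypothetical, and the real on-disk layout may differ.

```python
PAGE_SIZE = 8192
SECTOR_SIZE = 512
SECTORS = PAGE_SIZE // SECTOR_SIZE  # 16 sectors per 8-KB page


def protect(page, signature):
    """Stamp a 2-bit signature into the last byte of every sector.

    Returns (stamped_page, saved_bits). saved_bits packs the 16 displaced
    2-bit values into one integer (a stand-in for the page-header field).
    """
    stamped = bytearray(page)
    saved_bits = 0
    for s in range(SECTORS):
        last = (s + 1) * SECTOR_SIZE - 1
        saved_bits |= (stamped[last] & 0b11) << (2 * s)
        stamped[last] = (stamped[last] & 0xFC) | signature
    return stamped, saved_bits


def verify_and_restore(stamped, saved_bits):
    """Raise ValueError if the sector signatures disagree; else restore the original bits."""
    signatures = {stamped[(s + 1) * SECTOR_SIZE - 1] & 0b11 for s in range(SECTORS)}
    if len(signatures) != 1:
        raise ValueError("torn page detected")
    page = bytearray(stamped)
    for s in range(SECTORS):
        last = (s + 1) * SECTOR_SIZE - 1
        page[last] = (page[last] & 0xFC) | ((saved_bits >> (2 * s)) & 0b11)
    return page


original = bytes((i * 7) % 256 for i in range(PAGE_SIZE))
stamped_old, saved_old = protect(original, 0b01)  # previous write
stamped_new, saved_new = protect(original, 0b10)  # next write flips the signature

# Simulate a power failure: only the first half of the new write reached disk.
torn = stamped_new[:PAGE_SIZE // 2] + stamped_old[PAGE_SIZE // 2:]

try:
    verify_and_restore(torn, saved_new)
except ValueError as exc:
    print(exc)
```

Because the signature alternates between writes, a page made of sectors from two different writes carries both 01 and 10, which is what the check looks for. Damage that leaves all signatures intact goes unnoticed, consistent with the limitation noted above.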

Checksum Protection
Checksum protection, introduced in SQL Server 2005, provides stronger data integrity checking. A checksum is calculated for the data in each page that is written, and stored in the page header. Whenever a page with a stored checksum is read from disk, the database engine recalculates the checksum for the data in the page and raises error 824 if the new checksum is different from the stored checksum. Checksum protection can catch more errors than torn page protection because it is affected by every byte of the page; however, it is moderately resource intensive. When checksum is enabled, errors caused by power failures and flawed hardware or firmware can be detected any time the buffer manager reads a page from disk.

The kind of page protection used is an attribute of the database containing the page. Checksum protection is the default protection for databases created in SQL Server 2005 and later. The page protection mechanism is specified at database creation time, and may be altered by using ALTER DATABASE. You can determine the current page protection setting by querying the page_verify_option column in the sys.databases catalog view or the IsTornPageDetectionEnabled property of the DATABASEPROPERTYEX function. If the page protection setting is changed, the new setting does not immediately affect the entire database. Instead, pages adopt the current protection level of the database whenever they are written next. This means that the database may be composed of pages with different kinds of protection.
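The write-then-verify cycle for checksums can be sketched as below. The sketch swaps in CRC-32 from Python's `zlib` module as the checksum and uses a hypothetical layout in which the first four bytes of the page hold the stored value; neither is claimed to match SQL Server's actual algorithm or page header format.

```python
import struct
import zlib

PAGE_SIZE = 8192
HEADER_SIZE = 4  # hypothetical layout: the first 4 bytes hold the checksum


class PageChecksumError(Exception):
    """Raised when the stored and recomputed checksums differ."""


def write_with_checksum(body):
    """Return the on-disk page image: checksum header followed by the body."""
    if len(body) != PAGE_SIZE - HEADER_SIZE:
        raise ValueError("body must fill the page exactly")
    return struct.pack("<I", zlib.crc32(body)) + bytes(body)


def read_with_checksum(image):
    """Recompute the checksum on read and compare it with the stored value."""
    (stored,) = struct.unpack_from("<I", image, 0)
    body = image[HEADER_SIZE:]
    if zlib.crc32(body) != stored:
        raise PageChecksumError("stored and recomputed checksums differ")
    return body


body = bytes(PAGE_SIZE - HEADER_SIZE)
image = write_with_checksum(body)

damaged = bytearray(image)
damaged[5000] ^= 0x01  # flip a single bit somewhere in the middle of the page

try:
    read_with_checksum(bytes(damaged))
except PageChecksumError as exc:
    print(exc)
```

Unlike the torn-page signature, which only inspects two bits per sector, the checksum depends on every byte of the body, so even a one-bit change in the middle of a sector is caught. The price is a pass over the whole page on every read and write.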

READING INSTANCE
The I/O from an instance of the SQL Server Database Engine includes logical and physical reads. A logical read occurs every time the Database Engine requests a page from the buffer cache. If the page is not currently in the buffer cache, a physical read first copies the page from disk into the cache.

The read requests generated by an instance of the Database Engine are controlled by the relational engine and optimized by the storage engine. The relational engine determines the most effective access method (such as a table scan, an index scan, or a keyed read); the access methods and buffer manager components of the storage engine determine the general pattern of reads to perform, and optimize the reads required to implement the access method. The thread executing the batch schedules the reads.

Read-Ahead
The Database Engine supports a performance optimization mechanism called read-ahead. Read-ahead anticipates the data and index pages needed to fulfill a query execution plan and brings the pages into the buffer cache before they are actually used by the query. This allows computation and I/O to overlap, taking full advantage of both the CPU and the disk.

The read-ahead mechanism allows the Database Engine to read up to 64 contiguous pages (512 KB) from one file. The read is performed as a single scatter-gather read to the appropriate number of (probably non-contiguous) buffers in the buffer cache. If any of the pages in the range are already present in the buffer cache, the corresponding page from the read will be discarded when the read completes. The range of pages may also be "trimmed" from either end if the corresponding pages are already present in the cache.

There are two kinds of read-ahead: one for data pages and one for index pages.
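The trimming rule and the 64-page limit described above can be sketched in a few lines. The function name and the representation of the cache as a set of page IDs are assumptions for the example; splitting a longer range into several reads is also an assumption, since the text only states the per-read maximum.

```python
MAX_READ_AHEAD_PAGES = 64  # 64 pages x 8 KB = 512 KB per read


def plan_read_ahead(first, last, cached):
    """Plan reads for pages first..last (inclusive).

    `cached` is the set of page IDs already in the buffer cache.
    Returns a list of (start_page, page_count) tuples.
    """
    # Trim pages that are already cached from either end of the range.
    while first <= last and first in cached:
        first += 1
    while last >= first and last in cached:
        last -= 1

    # Split what remains into reads of at most 64 contiguous pages.
    reads = []
    while first <= last:
        count = min(MAX_READ_AHEAD_PAGES, last - first + 1)
        reads.append((first, count))
        first += count
    return reads


print(plan_read_ahead(100, 199, {100, 101, 150, 199}))  # [(102, 64), (166, 33)]
```

Pages 100, 101, and 199 are trimmed because they sit at the ends of the range. Page 150 is cached but lies in the middle, so it is still covered by a read, and its copy would simply be discarded when the read completes.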

Reading Data Pages


Table scans used to read data pages are very efficient in the Database Engine. The index allocation map (IAM) pages in a SQL Server database list the extents used by a table or index. The storage engine can read the IAM to build a sorted list of the disk addresses that must be read. This allows the storage engine to optimize its I/Os as large sequential reads that are performed in sequence, based on their location on the disk. For more information about IAM pages, see Managing Space Used by Objects.

Reading Index Pages


The storage engine reads index pages serially in key order. For example, this illustration shows a simplified representation of a set of leaf pages that contains a set of keys and the intermediate index node mapping the leaf pages. For more information about the structure of pages in an index, see Clustered Index Structures.

The storage engine uses the information in the intermediate index page above the leaf level to schedule serial read-aheads for the pages that contain the keys. If a request is made for all the keys from ABC to DEF, the storage engine first reads the index page above the leaf page. However, it does not just read each data page in sequence from page 504 to page 556 (the last page with keys in the specified range). Instead, the storage engine scans the intermediate index page and builds a list of the leaf pages that must be read. The storage engine then schedules all the reads in key order. The storage engine also recognizes that pages 504/505 and 527/528 are contiguous and performs a single scatter read to retrieve the adjacent pages in a single operation. When there are many pages to be retrieved in a serial operation, the storage engine schedules a block of reads at a time. When a subset of these reads is completed, the storage engine schedules an equal number of new reads until all the required reads have been scheduled.

The storage engine uses prefetching to speed base table lookups from nonclustered indexes. The leaf rows of a nonclustered index contain pointers to the data rows that contain each specific key value. As the storage engine reads through the leaf pages of the nonclustered index, it also starts scheduling asynchronous reads for the data rows whose pointers have already been retrieved. This allows the storage engine to retrieve data rows from the underlying table before it has completed the scan of the nonclustered index. Prefetching is used regardless of whether the table has a clustered index. SQL Server Enterprise uses more prefetching than other editions of SQL Server, allowing more pages to be read ahead. The level of prefetching is not configurable in any edition. For more information about nonclustered indexes, see Nonclustered Index Structures.
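The step in index read-ahead where contiguous leaf pages are merged into a single scatter read can be sketched as a grouping of consecutive page IDs. The function name and the page IDs in the example are illustrative only.

```python
def coalesce(page_ids):
    """Group page IDs, given in key order, into runs of consecutive IDs.

    Each run could then be fetched with one scatter read.
    """
    runs = []
    for pid in page_ids:
        if runs and pid == runs[-1][-1] + 1:
            runs[-1].append(pid)   # extends the current contiguous run
        else:
            runs.append([pid])     # starts a new run
    return runs


leaf_pages = [504, 505, 527, 528, 556]  # illustrative leaf page IDs in key order
print(coalesce(leaf_pages))  # [[504, 505], [527, 528], [556]]
```

Five leaf pages are retrieved with three reads instead of five. The input stays in key order rather than being sorted by page ID, since the text says the reads are scheduled in key order.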

WRITING INSTANCE
The I/O from an instance of the Database Engine includes logical and physical writes. A logical write occurs when data is modified in a page in the buffer cache. A physical write occurs when the page is written from the buffer cache to disk.

When a page is modified in the buffer cache, it is not immediately written back to disk; instead, the page is marked as dirty. This means that a page can have more than one logical write made before it is physically written to disk. For each logical write, a transaction log record is inserted in the log cache that records the modification. The log records must be written to disk before the associated dirty page is removed from the buffer cache and written to disk. SQL Server uses a technique known as write-ahead logging that prevents writing a dirty page before the associated log record is written to disk. This is essential to the correct working of the recovery manager. For more information, see Write-Ahead Transaction Log. The following illustration shows the process for writing a modified data page.

When the buffer manager writes a page, it searches for adjacent dirty pages that can be included in a single gather-write operation. Adjacent pages have consecutive page IDs and are from the same file; the pages do not have to be contiguous in memory. The search continues both forward and backward until one of the following events occurs:
- A clean page is found.
- 32 pages have been found.
- A dirty page is found whose log sequence number (LSN) has not yet been flushed in the log.
- A page is found that cannot be immediately latched.
In this way, the entire set of pages can be written to disk with a single gather-write operation.

Just before a page is written, the form of page protection specified in the database is added to the page. If torn page protection is added, the page must be latched EX(clusively) for the I/O. This is because the torn page protection modifies the page, making it unsuitable for any other thread to read. If checksum page protection is added, or the database uses no page protection, the page is latched with an UP(date) latch for the I/O. This latch prevents anyone else from modifying the page during the write, but still allows readers to use it. For more information about disk I/O page protection options, see Buffer Management.
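The search for adjacent dirty pages can be sketched with the four stop conditions listed above. The `Buffer` record, its fields, and the function name are hypothetical. The sketch also assumes that a page missing from the cache ends the search like a clean page, and that the forward direction is exhausted before the backward one; the text does not specify either point.

```python
from dataclasses import dataclass

MAX_GATHER_PAGES = 32


@dataclass
class Buffer:
    dirty: bool
    lsn: int                # LSN of the last log record that modified the page
    latchable: bool = True  # whether the page can be latched immediately


def gather_write_set(buffers, start, flushed_lsn):
    """Return the sorted page IDs that can be written together with page `start`.

    `buffers` maps page ID -> Buffer for cached pages of a single file.
    `flushed_lsn` is the highest LSN already written to the log on disk.
    """
    def eligible(page_id):
        buf = buffers.get(page_id)
        return (buf is not None and buf.dirty
                and buf.lsn <= flushed_lsn and buf.latchable)

    if not eligible(start):
        return []
    pages = [start]
    for step in (1, -1):  # search forward, then backward
        page_id = start + step
        while len(pages) < MAX_GATHER_PAGES and eligible(page_id):
            pages.append(page_id)
            page_id += step
    return sorted(pages)


buffers = {
    9: Buffer(dirty=False, lsn=3),   # clean: stops the backward search
    10: Buffer(dirty=True, lsn=5),
    11: Buffer(dirty=True, lsn=7),
    12: Buffer(dirty=True, lsn=9),   # log record not flushed yet: stops forward search
    13: Buffer(dirty=True, lsn=6),
}
print(gather_write_set(buffers, start=10, flushed_lsn=8))  # [10, 11]
```

Page 13 is dirty and safely logged, but it is not written here because page 12 breaks the run. The LSN check is also where write-ahead logging shows up: a page whose log record is not yet on disk is never included.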

A dirty page is written to disk in one of three ways.

Lazy writing
The lazy writer is a system process that keeps free buffers available by removing infrequently used pages from the buffer cache. If such a page is dirty, it is first written to disk.

Eager writing
The eager write process writes dirty data pages associated with minimally logged operations such as BULK INSERT and SELECT INTO. This process allows creating and writing new pages to take place in parallel. That is, the calling operation does not have to wait until the entire operation finishes before writing the pages to disk.

Checkpoint
The checkpoint process periodically scans the buffer cache for buffers with pages from a specified database and writes all dirty pages to disk. Checkpoints save time during a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk. The user may request a checkpoint operation by using the CHECKPOINT command, or the Database Engine may generate automatic checkpoints based on the amount of log space used and time elapsed since the last checkpoint. In addition, a checkpoint is generated when certain activities occur, for example, when a data or log file is added or removed from a database, or when the instance of SQL Server is stopped. For more information, see Checkpoints and the Active Portion of the Log.

The lazy writing, eager writing, and checkpoint processes do not wait for the I/O operation to complete. They always use asynchronous (or overlapped) I/O and continue with other work, checking for I/O success later. This allows SQL Server to maximize both CPU and I/O resources for the appropriate tasks.

