
Chapter 1

Introduction

Memory access is slow compared with the speed of the central processing unit, so memory poses a significant bottleneck in computer performance. A small but fast cache memory, holding the contents of the most frequently accessed locations, can be placed between the main memory and the CPU. Because a cache has far fewer locations than main memory, it can be accessed more quickly. When a program executes, the cache memory is searched first, and the referenced word is accessed in the cache if it is present. If the referenced word is not in the cache, a free location is created in the cache, the referenced word is brought in from main memory, and the word is then accessed in the cache. Although this miss handling takes longer than accessing main memory directly, overall performance improves if a high proportion of memory accesses are satisfied by the cache. A cache memory can be made faster than main memory for several reasons. Faster electronics can be used, at a greater cost in money, size, and power; since the cache is small, this increase in cost is relatively modest.


Chapter 2

Basics of Cache

Principle of Latency
This is also known as the latency period: the total time associated with processing a particular data unit. The cache is placed both physically and logically closer to the CPU than the main memory, and this placement avoids communication delays over a shared bus, improving latency. The main purpose of a cache is to reduce the latency of references to memory. Ideally, the total latency should be one processor clock cycle for an instruction read or a data read or write. If the data or instruction required by the processor is stored only in the slower memory, the latency will be high. For example, a cache-less computer may contain a CPU with a clock speed of 400 MHz that communicates over a 66 MHz bus to a main memory supporting a still lower clock speed of 10 MHz. A cache memory can instead be positioned closer to the CPU, as shown on the right side of Fig. 1.1, so that the CPU sees fast accesses over a 400 MHz direct path to the cache.


Fig. 1.1 Word transfer in different systems
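As a rough, hedged illustration of the clock-rate gap in the example above, the short sketch below computes how many 400 MHz CPU cycles elapse during one cycle of each slower component; the clock rates are the ones quoted in the text.

```python
# How many CPU cycles fit in one cycle of each slower component.
# Clock rates are taken from the example in the text above.
cpu_hz = 400e6   # CPU clock: 400 MHz
bus_hz = 66e6    # shared bus: 66 MHz
mem_hz = 10e6    # main memory: 10 MHz

for name, hz in [("bus", bus_hz), ("main memory", mem_hz)]:
    print(f"one {name} cycle = {cpu_hz / hz:.1f} CPU cycles")
# one bus cycle = 6.1 CPU cycles
# one main memory cycle = 40.0 CPU cycles -- far from the one-cycle ideal
```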

Locality of Reference
Analysis of a large number of programs has shown that memory references in a given interval of time tend to be confined to a few localized areas of memory. This phenomenon is known as locality of reference. When a program loop is executed, the CPU repeatedly refers to the set of instructions in memory that constitute the loop. Every time a given subroutine is called, its set of instructions is fetched from memory. Thus loops and subroutines tend to localize the references to memory for fetching instructions. If the active portions of the program and data are placed in a fast, small memory, the average memory access time can be reduced, reducing the total execution time of the program. Such a small, fast memory is known as a cache memory; the principle of locality is what allows it to speed up main memory access. There are three types of locality (a toy reference trace illustrating them follows the list):

1) Temporal locality (also known as look-backward locality): information recently referenced by a program is likely to be used again soon.


2) Spatial locality (also known as look-forward locality): addresses near the current location of reference are likely to be referenced in the near future.

3) Sequential locality: a special case of spatial locality, in which the address of the next reference is the immediate successor of the present reference.
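The toy trace below, with invented addresses, shows all three kinds of locality at once: the loop instructions recur on every pass (temporal), the data references fall in one small region (spatial), and consecutive array elements are touched one after another (sequential).

```python
# Toy reference trace for a loop that sums an array (addresses are invented).
LOOP_INSTRS = [0x100, 0x104, 0x108]      # the loop body's instruction words
ARRAY_BASE = 0x2000

trace = []
for i in range(4):                       # four loop iterations
    trace.extend(LOOP_INSTRS)            # temporal: same words referenced again
    trace.append(ARRAY_BASE + 4 * i)     # spatial/sequential: next array element

print([hex(a) for a in trace])
```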

Types of Cache Memory


Caches are classified into four types according to their function, or the nature of the information held in the cache:

1) Instruction cache: holds only instructions; the design is optimized for the instruction stream alone.
2) Data cache: holds only the data stream.
3) Unified cache: holds both instructions and data.
4) Split cache: a two-cache system, with one cache for instructions and one for data.

The performance of a cache is measured in terms of its hit ratio and its access time.

Hit Ratio: When the CPU refers to memory and finds the word in the cache, the event is called a hit. If the word is not found in the cache, it is a miss. The hit ratio is the number of hits divided by the total number of CPU references to memory (hits + misses). The hit ratio is measured by running representative programs on the computer and counting the hits and misses during a given interval of time. A high hit ratio verifies the validity of the locality of reference property.


The average memory access time of a computer system can be improved considerably by the use of a cache. If the hit ratio is high enough that the CPU accesses the cache rather than main memory most of the time, the average access time approaches the access time of the fast cache.
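A small numeric sketch of this averaging argument follows. The access times (10 ns cache, 100 ns main memory) and the hit and miss counts are assumed for illustration; none of these figures come from the report.

```python
# Average access time under the search-cache-first model described earlier:
# a hit costs the cache access alone; a miss costs the cache search plus
# the main memory access. All numbers below are assumed for illustration.
t_cache = 10      # cache access time, ns (assumed)
t_main = 100      # main memory access time, ns (assumed)
hits, misses = 900, 100

hit_ratio = hits / (hits + misses)
avg = hit_ratio * t_cache + (1 - hit_ratio) * (t_cache + t_main)
print(f"hit ratio = {hit_ratio}, average access time = {avg} ns")
# hit ratio = 0.9, average access time = 20.0 ns -- close to the cache's 10 ns
```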

Access Time: The time required for a character in memory to be transferred to or from the CPU. Thus little or no time should be wasted searching for words in the cache, and many mapping techniques have been developed to improve cache memory performance.


Chapter 3

Mapping Techniques

Mapping: The transformation of data from main memory to cache memory is referred to as the mapping process.

The basic required characteristic of cache memory is fast access time, so very little time must be wasted searching for data or instructions in the cache. A number of hardware schemes have been developed for translating main memory addresses to cache memory addresses. The choice of cache mapping scheme affects cost and performance, and there is no single best method appropriate for all situations. Three commonly used mapping methods are:

1) Associative-mapped cache
2) Direct-mapped cache
3) Set-associative-mapped cache

3.1 Associative Mapping:


Memories that can be searched by their contents, in parallel, are referred to as associative, or content-addressable, memories.


In the associative mapping technique, both the address and the content of the memory word are stored in the cache. In Fig. 3.1, a 15-bit CPU address is represented as a five-digit octal number, and its corresponding 12-bit data word is represented as a four-digit octal number. The 15-bit CPU address is placed in the argument register, and the associative memory is searched for a matching address. If the address is found, the corresponding 12-bit data word is read and sent to the CPU. If no match occurs, main memory is searched for the word, and the address-data pair is then transferred to the associative memory.

Fig. 3.1 Associative mapping

Advantages of Associative Mapping:


Any main memory block can be placed into any cache slot. This means that regardless of how irregular the data and program references are, if a slot is available for the block, it can be stored in the cache.
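The lookup just described can be sketched with a Python dict standing in for the content-addressable memory: the full CPU address is the search key and the word is the stored content. This is illustrative only; real associative memory compares all stored addresses in parallel in hardware, and the sketch ignores replacement when the cache is full. The octal address-data pair in the usage line is invented.

```python
# Associative (content-addressable) lookup, modeled with a dict.
cache = {}                       # address -> data: the stored address-data pairs

def read(addr, main_memory):
    if addr in cache:            # all stored addresses "compared" at once
        return cache[addr]       # hit: send the 12-bit word to the CPU
    data = main_memory[addr]     # miss: search main memory for the word
    cache[addr] = data           # transfer the address-data pair to the cache
    return data

memory = {0o01000: 0o3450}       # illustrative octal address/data pair
print(oct(read(0o01000, memory)))   # miss, then cached; prints 0o3450
```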

3.2 Direct Mapping:


Associative memories are expensive compared with random-access memories because of the added logic associated with each cell, so the possibility of using a random-access memory for the cache is worth investigating, as shown in Fig. 3.2. The CPU address of 15 bits is divided into two fields: the nine least significant bits constitute the index field, and the remaining six bits form the tag field. The figure shows that the main memory address includes both the tag and the index bits; the number of bits in the index field equals the number of address bits required to access the cache memory.

Fig. 3.2 Addressing relationships between main and cache memories

In the general case, there are 2^k words in cache memory and 2^n words in main memory. The n-bit main memory address is divided into two fields: k bits for the index field and n − k bits for the tag field. The direct-mapped cache organization uses the n-bit address to access the main memory and the k-bit index to access the cache. The internal organization of the words in the cache memory is as shown in Fig. 3.3. Each word in the cache consists of the data word and its associated tag. When a new word is first brought into the cache, the tag bits are stored alongside the data bits. When the CPU generates a memory request, the index field of the address is used to access the cache.

Fig. 3.3 Direct mapping cache organisation

The tag field of the CPU address is compared with the tag in the word read from the cache. If the two tags match, there is a hit, and the desired data word is in the cache. If there is no match, there is a miss, and the required word is read from main memory; it is then stored in the cache together with the new tag, replacing the previous value. Example: the word at address zero is presently stored in the cache (index = 000, tag = 00, data = 1220). Suppose that the CPU now wants to access the word at address 02000. The cache tag at index 000 is 00, but the address tag is 02, which does not produce a match. Therefore main memory is accessed and the data word 5670 is transferred to the CPU. The cache word at index address 000 is then replaced with a tag of 02 and data of 5670.
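A minimal sketch of this direct-mapped lookup, using the address split from the text (15-bit address = 6-bit tag + 9-bit index); the function and variable names are illustrative.

```python
# Direct-mapped lookup: low 9 bits index the cache line, high 6 bits are the tag.
INDEX_BITS = 9
cache = [None] * (1 << INDEX_BITS)        # 512 lines, each holding (tag, data)

def read(addr, main_memory):
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    line = cache[index]
    if line is not None and line[0] == tag:
        return line[1]                    # hit: tags match
    data = main_memory[addr]              # miss: read the word from main memory
    cache[index] = (tag, data)            # store it with the new tag, replacing
    return data                           # whatever was at this index before

# The example from the text: 02000 (octal) maps to index 000 with tag 02,
# so it evicts the word cached from address 00000.
memory = {0o00000: 0o1220, 0o02000: 0o5670}
print(oct(read(0o00000, memory)), oct(read(0o02000, memory)))
```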


Advantage: Direct mapping is faster than associative mapping, since it avoids searching through all the cache tags for a match.

Disadvantage: The hit ratio can drop considerably if two or more words whose addresses have the same index but different tags are accessed repeatedly.

3.3 Set Associative Mapping


It was mentioned previously that a disadvantage of direct mapping is that two words with the same index in their addresses but different tag values cannot reside in the cache at the same time. Set-associative mapping is an improvement over the direct mapping technique in that the cache can store two or more words of memory under the same index address. Each data word is stored together with its tag, and the number of tag-data items in one word of cache is said to form a set. Example:

Fig. 3.4 Two-way set-associative mapping cache

As shown in Fig. 3.4, each index address refers to two data words and their associated tags. Each tag requires 6 bits and each data word has 12 bits, so the word length is 2 × (6 + 12) = 36 bits. An index address of 9 bits can accommodate 512 words, so the size of the cache memory is 512 × 36. The octal numbers listed in Fig. 3.4 refer to the main memory contents illustrated in Fig. 3.2: the words stored at addresses 01000 and 02000 of main memory are stored in cache memory at index address 000, and the words at addresses 02777 and 00777 are stored in cache at index address 777. When the CPU generates a memory request, the index value of the address is used to access the cache, and the tag field of the CPU address is then compared with both tags in the cache to determine whether a match occurs.
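Following the geometry of Fig. 3.4 (a 9-bit index selecting a set of two tag-data pairs), here is a hedged sketch of the two-way lookup. The data values in the usage line are invented, and the replacement choice on a full set is deliberately simplistic, since replacement policies are the subject of Section 3.4.

```python
# Two-way set-associative lookup: each of the 512 sets holds up to two
# (tag, data) pairs; the CPU tag is compared with both stored tags.
INDEX_BITS = 9
WAYS = 2
sets = [[] for _ in range(1 << INDEX_BITS)]

def read(addr, main_memory):
    index = addr & ((1 << INDEX_BITS) - 1)
    tag = addr >> INDEX_BITS
    for stored_tag, data in sets[index]:   # compare against both ways
        if stored_tag == tag:
            return data                    # hit in one of the two ways
    data = main_memory[addr]               # miss: fetch from main memory
    if len(sets[index]) == WAYS:
        sets[index].pop(0)                 # set full: evict the oldest (for brevity)
    sets[index].append((tag, data))
    return data

# Words from addresses 01000 and 02000 (octal) now coexist at index 000.
memory = {0o01000: 0o3450, 0o02000: 0o5670}
print(oct(read(0o01000, memory)), oct(read(0o02000, memory)))
```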


Advantage: Each word of cache can store two or more words of memory under the same index address, so the hit ratio increases.

Disadvantage: More complex comparison logic. Also, when a miss occurs in a set-associative cache and the set is full, one of the tag-data items must be replaced with a new value; different replacement algorithms are used for this.

3.4 Page Replacement Algorithms

Page replacement algorithms decide which memory pages to page out (swap out, write to disk) when a page of memory needs to be allocated. Paging happens when a page fault occurs and a free page cannot be used to satisfy the allocation, either because there are none or because the number of free pages is lower than some threshold. A page fault occurs when the CPU searches for a page in the cache and the page is not there. The main algorithms in common use are the following:


1. FIFO: This policy simply removes pages in the order in which they arrived in main memory: a page is chosen for removal based on its time of arrival, so when the buffer is full, the oldest page is replaced. Example:
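A minimal sketch of FIFO replacement counting page faults over a reference string; the trace used here is a common textbook example, not one from the report.

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults under FIFO: evict the page that arrived earliest."""
    frames = deque()                 # arrival order: oldest page at the left
    faults = 0
    for page in refs:
        if page in frames:
            continue                 # hit: FIFO order is NOT updated on a hit
        faults += 1
        if len(frames) == num_frames:
            frames.popleft()         # buffer full: replace the oldest page
        frames.append(page)
    return faults

trace = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3]
print(fifo_faults(trace, 3))         # 9 faults with 3 frames
```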

2. Optimal page replacement: The page chosen for replacement is the one that will not be used again for the longest time in the future. This policy is impossible to implement exactly (it requires knowledge of the future), but it serves as a standard against which the other algorithms we shall study are compared; it incurs fewer page faults than FIFO.
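Although unusable online, the optimal policy is easy to simulate when the whole reference string is known in advance; the sketch below evicts the resident page whose next use lies farthest in the future. On the same trace as the FIFO sketch it takes 6 faults instead of 9.

```python
def optimal_faults(refs, num_frames):
    """Belady's optimal policy: evict the page not needed for the longest time."""
    frames = []
    faults = 0
    for i, page in enumerate(refs):
        if page in frames:
            continue                          # hit
        faults += 1
        if len(frames) < num_frames:
            frames.append(page)               # a free frame is still available
            continue
        future = refs[i + 1:]
        # Distance to each resident page's next use (maximal if never used again).
        def next_use(p):
            return future.index(p) if p in future else len(future)
        victim = max(frames, key=next_use)
        frames[frames.index(victim)] = page   # replace the farthest-used page
    return faults

print(optimal_faults([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 3))   # 6 faults
```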


3. LRU: Replaces the page that has not been referenced for the longest time. By the principle of locality, this should be the page least likely to be referenced in the near future, and LRU performs nearly as well as the optimal policy. Example:
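A minimal LRU sketch using an OrderedDict to track recency: a hit moves the page to the most-recently-used end, and on a miss with full frames the least-recently-used end is evicted. On the same trace as above it takes 8 faults, between FIFO's 9 and the optimal 6.

```python
from collections import OrderedDict

def lru_faults(refs, num_frames):
    """Count page faults under LRU: evict the longest-unreferenced page."""
    recency = OrderedDict()              # least recently used at the left end
    faults = 0
    for page in refs:
        if page in recency:
            recency.move_to_end(page)    # a hit refreshes the page's recency
            continue
        faults += 1
        if len(recency) == num_frames:
            recency.popitem(last=False)  # evict the least recently used page
        recency[page] = True
    return faults

print(lru_faults([7, 0, 1, 2, 0, 3, 0, 4, 2, 3], 3))   # 8 faults
```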



