
Reducing Cache Misses through Cache Line Overlapping


Akhilesh Sreedharan [2018H1230199H]
Valluru Udai Sai [2018H1230200H]
• The overlapped cache (OVLPC) optimization applies to the data cache.

• Profiling the SPEC2000 benchmark programs shows that over 50% of the values stored in the data cache are zero-valued or have an effective size less than or equal to half the word size.
Example – Image Data
Pixels of a grayscale image can be represented using 1 byte (8 bits): 0x00-0xFF.
Pixels on High-Color displays can be represented using 2 bytes (16 bits).

Example – Audio Data
CD-quality audio samples are represented using 2 bytes (16 bits): 0x0000-0xFFFF.

Stored in 32-bit words, such values leave the upper half of each word zero.


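• To avoid wasting zero-valued space in the data cache, OVLPC allows one cache line to hold up to two memory lines (entries): the recent line and the previous line.

• OVLPC achieves this overlapping by storing the two lines in opposite byte orders, one in Little Endian and the other in Big Endian, so the significant low-order bytes of their words land in opposite halves of each word slot.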
Example:
Address                       1000 1001 1002 1003
0x00001234 (Little Endian)    34   12   00   00
0x0000ABCD (Big Endian)       00   00   AB   CD
Cache line (overlapped)       34   12   AB   CD
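A minimal Python sketch of the overlap in this example, assuming 32-bit words: the Little Endian word keeps its two low-order bytes in byte positions 0-1, and the Big Endian word keeps its two low-order bytes in positions 2-3. The helper names `overlap` and `split` are hypothetical; they model only the byte layout, not the hardware.

```python
from __future__ import annotations


def overlap(le_word: int, be_word: int) -> bytes:
    """Pack two 32-bit words, each no wider than 16 bits, into one 4-byte slot."""
    le_bytes = le_word.to_bytes(4, "little")   # 0x00001234 -> 34 12 00 00
    be_bytes = be_word.to_bytes(4, "big")      # 0x0000ABCD -> 00 00 AB CD
    if le_bytes[2:] != b"\x00\x00" or be_bytes[:2] != b"\x00\x00":
        raise ValueError("both words must fit in half a word (16 bits)")
    # Bytes 0-1 come from the Little Endian word, bytes 2-3 from the Big Endian one.
    return le_bytes[:2] + be_bytes[2:]


def split(slot: bytes) -> tuple[int, int]:
    """Recover both words from an overlapped slot by zero-extending each half."""
    return int.from_bytes(slot[:2], "little"), int.from_bytes(slot[2:], "big")


slot = overlap(0x00001234, 0x0000ABCD)
print(slot.hex(" ").upper())            # 34 12 AB CD
print([hex(w) for w in split(slot)])    # ['0x1234', '0xabcd']
```

Because the zero-valued upper halves are never stored, reading a word back only needs its byte order to know which half of the slot to zero-extend.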
Fields in OVLPC

TAG represents the tag value of the recent line

VTAG represents the tag value of the previous line

FLAG L describes each word whose stored format is Little Endian [2 bits per word]

FLAG B describes each word whose stored format is Big Endian [2 bits per word]

FLAG L/B [2 bits]   STATUS
00                  Zero-valued word
01                  Effective word size less than or equal to half the word size (16 bits), e.g. 0x0000ABCD
10                  Invalid word
11                  Effective word size greater than half the word size, e.g. 0x0BCD1234
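The flag encoding can be sketched as a small classifier. This assumes words are treated as unsigned, so the effective size is the position of the highest set bit; the name `classify` is hypothetical, and how OVLPC handles sign-extended values is not covered here.

```python
ZERO, HALF, INVALID, FULL = 0b00, 0b01, 0b10, 0b11   # FLAG L/B encodings
WORD_BITS = 32


def classify(word: int) -> int:
    """Return the FLAG status of a value about to be stored in the cache.

    Assumes unsigned words. INVALID is never returned: it describes a slot
    that holds no usable word, not a property of the value itself.
    """
    if not 0 <= word < (1 << WORD_BITS):
        raise ValueError("not a 32-bit word")
    if word == 0:
        return ZERO
    if word.bit_length() <= WORD_BITS // 2:   # upper half is all zero
        return HALF
    return FULL


for w in (0x00000000, 0x0000ABCD, 0x0BCD1234):
    print(f"{w:#010x} -> {classify(w):02b}")
# 0x00000000 -> 00
# 0x0000abcd -> 01
# 0x0bcd1234 -> 11
```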

R indicates the stored format (Big or Little Endian) of the recent line [1 bit]

R Format
0 Little endian
1 Big endian

DATA holds the data words stored in the cache line.

Apart from these fields, each cache block also contains the usual status bits, such as the dirty bit and LRU bits.
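The roles of R and the flags can be sketched as follows. Two assumptions are drawn from the cache walkthrough rather than stated on this slide: the previous line is stored in the byte order opposite to R, and a zero-valued or invalid word occupies no bytes of its DATA slot. The helper names are hypothetical.

```python
LITTLE, BIG = 0, 1                                   # values of the R bit
ZERO, HALF, INVALID, FULL = 0b00, 0b01, 0b10, 0b11   # FLAG L/B encodings


def line_format(r: int, recent: bool) -> int:
    """Byte order of a line: R for the recent line, the opposite for the previous one."""
    return r if recent else 1 - r


def occupied_bytes(fmt: int, flag: int) -> range:
    """Byte positions (0-3) of a DATA word slot that a stored word uses."""
    if flag in (ZERO, INVALID):   # assumed: nothing stored, the value is implied or lost
        return range(0)
    if flag == FULL:              # wider than half a word: the whole slot
        return range(4)
    return range(0, 2) if fmt == LITTLE else range(2, 4)   # half-size word


# With R = 1 the recent line is Big Endian and the previous line Little Endian,
# so two half-size words share one slot without touching the same bytes.
recent = occupied_bytes(line_format(1, recent=True), HALF)
previous = occupied_bytes(line_format(1, recent=False), HALF)
print(list(previous), list(recent))          # [0, 1] [2, 3]
print(set(previous).isdisjoint(recent))      # True
```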
Direct Mapped Overlapped Cache [DM-OVLPC]
• Let us now apply the overlapped cache concepts to a direct-mapped cache (DM-OVLPC).

• For our example we will build a DM-OVLPC with 2 blocks.

• Each block will contain two words.

• We will interface our cache with a main memory having 8 word-sized locations, so a 3-bit address splits into TAG (1 bit), Block# (1 bit) and offset (1 bit).
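The address split used in the walkthrough comes down to a few bit operations. A sketch assuming word addresses, 2 words per block (1 offset bit) and 2 blocks (1 Block# bit); `split_address` is a hypothetical helper.

```python
OFFSET_BITS = 1   # 2 words per block
BLOCK_BITS = 1    # 2 blocks


def split_address(addr: int) -> tuple:
    """Split a 3-bit word address into (TAG, Block#, offset)."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    block = (addr >> OFFSET_BITS) & ((1 << BLOCK_BITS) - 1)
    tag = addr >> (OFFSET_BITS + BLOCK_BITS)
    return tag, block, offset


for addr in (0x01, 0x04, 0x02, 0x06, 0x00, 0x07):
    tag, block, offset = split_address(addr)
    print(f"{addr:#04x}: TAG {tag}  Block# {block}  offset {offset}")
```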
Cache Read …
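READ Addr: 0x01 MISS (TAG 0, Block# 0, offset 1)

The line holding addresses 0 and 1 is loaded in Little Endian (R = 0). Both words fit in half a word, so FLAG L is 01 01, while FLAG B stays 10 10 (no Big Endian words yet).

DM-OVLPC after the read (FLAG L and FLAG B list the flags of WORD-0 and WORD-1; DATA bytes are listed in byte order 0 1 2 3):
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      00   XX    01 01   10 10   0  CD AB 00 00    1D 00 00 00
1      XX   XX    10 10   10 10   1  XX XX XX XX    XX XX XX XX

Main memory:
ADD.  DATA
0     00 00 AB CD
1     00 00 00 1D
2     F0 E1 CD 12
3     00 00 0B 12
4     00 00 AF C9
5     00 00 65 FE
6     00 00 A5 68
7     00 00 75 5A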
READ Addr: 0x04 MISS (TAG 1, Block# 0, offset 0)

Good Case Read: when every word of both lines has an effective length of at most half a word, the two lines can sit in one cache location without invalidating anything. The new line (addresses 4 and 5) is stored Big Endian (R = 1) in bytes 2-3 of each word, while the old line (addresses 0 and 1) becomes the previous line (VTAG 00) in Little Endian.

DM-OVLPC after the read:
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      01   00    01 01   01 01   1  CD AB AF C9    1D 00 65 FE
1      XX   XX    10 10   10 10   1  XX XX XX XX    XX XX XX XX

Main memory is unchanged.
READ Addr: 0x02 MISS (TAG 0, Block# 1, offset 0)

The line holding addresses 2 and 3 is loaded in Little Endian (R = 0). 0xF0E1CD12 is wider than half a word, so it occupies all four bytes of WORD-0 (FLAG L 11); 0x00000B12 fits in half a word (FLAG L 01).

DM-OVLPC after the read:
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      01   00    01 01   01 01   1  CD AB AF C9    1D 00 65 FE
1      00   XX    11 01   10 10   0  12 CD E1 F0    12 0B 00 00

Main memory is unchanged.
READ Addr: 0x06 MISS (TAG 1, Block# 1, offset 0)

Worst Case Read: one or both words of the previous line get invalidated because the current or previous word has an effective length greater than half the word size. Here A5 68 (address 6, Big Endian) overwrites bytes 2-3 of WORD-0, which still held the upper half of 0xF0E1CD12, so that word's FLAG L becomes 10 (invalid).

DM-OVLPC after the read:
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      01   00    01 01   01 01   1  CD AB AF C9    1D 00 65 FE
1      01   00    10 01   01 01   1  12 CD A5 68    12 0B 75 5A

Main memory is unchanged.
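The four read misses above can be replayed with a small simulator. This is a sketch reconstructed from the cache states shown on the slides, not the paper's hardware description. It assumes that a miss turns the recent line into the previous line, toggles R so the new line uses the opposite byte order, and invalidates a previous-line word whenever the new word overwrites any of its bytes. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

LITTLE, BIG = 0, 1                                   # values of the R bit
ZERO, HALF, INVALID, FULL = 0b00, 0b01, 0b10, 0b11   # FLAG L/B encodings

# Main memory from the slides: word address -> 32-bit value.
MEMORY = {0: 0x0000ABCD, 1: 0x0000001D, 2: 0xF0E1CD12, 3: 0x00000B12,
          4: 0x0000AFC9, 5: 0x000065FE, 6: 0x0000A568, 7: 0x0000755A}


def classify(word: int) -> int:
    if word == 0:
        return ZERO
    return HALF if word <= 0xFFFF else FULL


def span(fmt: int, flag: int) -> range:
    """Byte positions of a 4-byte word slot used by a stored word."""
    if flag in (ZERO, INVALID):
        return range(0)
    if flag == FULL:
        return range(4)
    return range(0, 2) if fmt == LITTLE else range(2, 4)


@dataclass
class Block:
    tag: Optional[int] = None
    vtag: Optional[int] = None
    r: int = BIG   # the empty block shows R = 1, so the first fill is Little Endian
    # flags[fmt][word]: flags[LITTLE] is FLAG L, flags[BIG] is FLAG B
    flags: List[List[int]] = field(default_factory=lambda: [[INVALID] * 2, [INVALID] * 2])
    data: List[bytearray] = field(default_factory=lambda: [bytearray(4), bytearray(4)])


def fill(block: Block, tag: int, words: List[int]) -> None:
    """Read-miss fill (assumed policy): the recent line becomes the previous
    line and the new line is written over it in the opposite byte order."""
    new_fmt, old_fmt = 1 - block.r, block.r
    block.vtag, block.tag, block.r = block.tag, tag, new_fmt
    for i, word in enumerate(words):
        flag = classify(word)
        block.flags[new_fmt][i] = flag   # replaces the evicted old previous line
        raw = word.to_bytes(4, "little" if new_fmt == LITTLE else "big")
        new_span = span(new_fmt, flag)
        for b in new_span:
            block.data[i][b] = raw[b]
        # The surviving line loses this word if any of its bytes were overwritten.
        if set(new_span) & set(span(old_fmt, block.flags[old_fmt][i])):
            block.flags[old_fmt][i] = INVALID


def show(index: int, blk: Block) -> None:
    fl, fb = blk.flags
    print(f"Block {index}: TAG {blk.tag} VTAG {blk.vtag} "
          f"FLAG L {fl[0]:02b} {fl[1]:02b} FLAG B {fb[0]:02b} {fb[1]:02b} R {blk.r} "
          f"DATA {blk.data[0].hex(' ').upper()} | {blk.data[1].hex(' ').upper()}")


cache = [Block(), Block()]
for addr in (0x01, 0x04, 0x02, 0x06):            # the four read misses
    tag, index, base = addr >> 2, (addr >> 1) & 1, addr & ~1
    fill(cache[index], tag, [MEMORY[base], MEMORY[base + 1]])

for index, blk in enumerate(cache):
    show(index, blk)
# Block 0: TAG 1 VTAG 0 FLAG L 01 01 FLAG B 01 01 R 1 DATA CD AB AF C9 | 1D 00 65 FE
# Block 1: TAG 1 VTAG 0 FLAG L 10 01 FLAG B 01 01 R 1 DATA 12 CD A5 68 | 12 0B 75 5A
```

Under these assumptions the final state agrees with the slides for both blocks, including the invalidated WORD-0 of block 1 in the worst case.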
READ Addr: 0x01 HIT (TAG 0, Block# 0, offset 1)

Address 0x01 matches VTAG 00 of block 0, i.e. the previous line, which is stored Little Endian (the opposite of R = 1). FLAG L of WORD-1 is 01, so the bytes 1D 00 are read as Little Endian and zero-extended: the returned data is 00 00 00 1D.

DM-OVLPC (unchanged by the hit):
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      01   00    01 01   01 01   1  CD AB AF C9    1D 00 65 FE
1      01   00    10 01   01 01   1  12 CD A5 68    12 0B 75 5A
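Restoring a word on a read hit, as in the hit on address 0x01, can be sketched as: pick the byte order from R (recent line) or its opposite (previous line), check the matching flag, then zero-extend the stored half. This is a hypothetical sketch; the dict layout and function names are illustrative, and an invalid word is reported as `None` and treated as a miss.

```python
from typing import Optional

LITTLE, BIG = 0, 1                                   # values of the R bit
ZERO, HALF, INVALID, FULL = 0b00, 0b01, 0b10, 0b11   # FLAG L/B encodings


def read_word(slot: bytes, fmt: int, flag: int) -> Optional[int]:
    """Restore a 32-bit word from its 4-byte slot; None means invalid (a miss)."""
    if flag == ZERO:
        return 0                        # the value is implied by the flag alone
    if flag == INVALID:
        return None
    order = "little" if fmt == LITTLE else "big"
    if flag == FULL:
        return int.from_bytes(slot, order)
    half = slot[0:2] if fmt == LITTLE else slot[2:4]
    return int.from_bytes(half, order)  # zero-extends the stored half


def lookup(block: dict, tag: int, offset: int) -> Optional[int]:
    """A hit on TAG uses byte order R; a hit on VTAG uses the opposite order."""
    if tag == block["tag"]:
        fmt = block["r"]
    elif tag == block["vtag"]:
        fmt = 1 - block["r"]
    else:
        return None                     # tag not present: miss
    flag = block["flag_l" if fmt == LITTLE else "flag_b"][offset]
    return read_word(block["data"][offset], fmt, flag)


# Block 0 as shown for READ 0x01 (values copied from the slide).
block0 = {"tag": 1, "vtag": 0, "r": BIG,
          "flag_l": [HALF, HALF], "flag_b": [HALF, HALF],
          "data": [bytes.fromhex("CDABAFC9"), bytes.fromhex("1D0065FE")]}

print(f"{lookup(block0, tag=0, offset=1):08X}")   # 0000001D
```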
Cache Write…
WRITE MISS Addr: 0x00, Data: 0x12345678 (TAG 0, Block# 0, offset 0)

OVLPC only writes to the recent line. This simplifies the architecture at the cost of write opportunities on the previous line: a write to the previous line always results in a miss. Address 0x00 belongs to the previous line (VTAG 00) of block 0, so the write misses.

Afterwards, the line holding addresses 0 and 1 is the recent line again, stored Little Endian (R = 0). 0x12345678 is wider than half a word and fills all of WORD-0, so the Big Endian word from address 4 is invalidated (FLAG B 10).

DM-OVLPC after the write:
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      00   01    11 01   10 01   0  78 56 34 12    1D 00 65 FE
1      01   00    10 01   01 01   1  12 CD A5 68    12 0B 75 5A
WRITE HIT Addr: 0x07, Data: 0x0000BEFG (TAG 1, Block# 1, offset 1)

Address 0x07 belongs to the recent line of block 1 (TAG 01, Big Endian), so the write hits and the value is stored in bytes 2-3 of WORD-1.

DM-OVLPC after the write:
Block  TAG  VTAG  FLAG L  FLAG B  R  DATA – WORD-0  DATA – WORD-1
0      00   01    11 01   10 01   0  78 56 34 12    1D 00 65 FE
1      01   00    10 01   01 01   1  12 CD A5 68    12 0B BE FG

Main memory: location 0 holds 12 34 56 78; the other locations are unchanged (location 7 still holds 00 00 75 5A).
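The write path can be sketched under the same assumptions as the read walkthrough: a write hits only when the tag equals TAG, and a value wider than half a word invalidates the previous line's word in that slot. How the missed line is re-allocated is not detailed on the slides; the sketch simply assumes the line holding addresses 0 and 1 becomes the recent line in Little Endian, which is consistent with the state shown after the write miss. The slide's value 0x0000BEFG contains the non-hex digit G, so the sketch writes the hypothetical value 0x0000BEEF instead. All helper names are hypothetical.

```python
LITTLE, BIG = 0, 1                                   # values of the R bit
ZERO, HALF, INVALID, FULL = 0b00, 0b01, 0b10, 0b11   # FLAG L/B encodings


def classify(word: int) -> int:
    if word == 0:
        return ZERO
    return HALF if word <= 0xFFFF else FULL


def span(fmt: int, flag: int) -> range:
    if flag in (ZERO, INVALID):
        return range(0)
    if flag == FULL:
        return range(4)
    return range(0, 2) if fmt == LITTLE else range(2, 4)


def write(block: dict, tag: int, offset: int, value: int) -> bool:
    """Write `value` if `tag` is the block's recent line; return False (miss) otherwise."""
    if tag != block["tag"]:             # previous line (VTAG) or absent: always a miss
        return False
    fmt, other = block["r"], 1 - block["r"]
    flag = classify(value)
    raw = value.to_bytes(4, "little" if fmt == LITTLE else "big")
    new_span = span(fmt, flag)
    for b in new_span:
        block["data"][offset][b] = raw[b]
    block["flags"][fmt][offset] = flag
    # A wider value may now cover bytes that the previous line's word was using.
    if set(new_span) & set(span(other, block["flags"][other][offset])):
        block["flags"][other][offset] = INVALID
    return True


def make_block(tag, vtag, r, flags, word0, word1):
    # flags[LITTLE] is FLAG L, flags[BIG] is FLAG B
    return {"tag": tag, "vtag": vtag, "r": r, "flags": flags,
            "data": [bytearray.fromhex(word0), bytearray.fromhex(word1)]}


# Block 0 before WRITE 0x00: recent line TAG 1 (Big Endian), previous line TAG 0.
block0 = make_block(1, 0, BIG, [[HALF, HALF], [HALF, HALF]], "CDABAFC9", "1D0065FE")
print(write(block0, tag=0, offset=0, value=0x12345678))   # False: write miss

# Assumed re-allocation: TAG 0 becomes the recent line in Little Endian.
block0 = make_block(0, 1, LITTLE, [[HALF, HALF], [HALF, HALF]], "CDABAFC9", "1D0065FE")
write(block0, tag=0, offset=0, value=0x12345678)
print(block0["data"][0].hex(" ").upper(), block0["flags"])  # 78 56 34 12 [[3, 1], [2, 1]]

# Block 1: WRITE 0x07 hits the recent line (TAG 1, Big Endian).
block1 = make_block(1, 0, BIG, [[INVALID, HALF], [HALF, HALF]], "12CDA568", "120B755A")
print(write(block1, tag=1, offset=1, value=0x0000BEEF))   # True: write hit
print(block1["data"][1].hex(" ").upper())                 # 12 0B BE EF
```

Keeping writes on the recent line means the write logic only ever deals with one byte order per block (the one given by R), which matches the simplification argued on the write-miss slide.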
Overheads and Limitations

• Additional SRAM cells are needed to store VTAG, R and FLAG L/B for each cache line.

• DM-OVLPC has about 15-20% storage overhead over a direct-mapped cache.

• It suffers additional latency due to compaction and restoration of data words.

• DM-OVLPC has about 9-15% latency overhead over a direct-mapped cache.
Benchmarks on DM-OVLPC

• Size: 16 kB

• Cache line size: 4 words

• Word size: 32 bits

• OVLPC shows a 29% reduction in miss rate over a direct-mapped cache (DMC).

• OVLPC shows a 19% reduction in miss rate over VC.

• Its performance is almost the same as FVC.
Thank You

Ref: S. Koo, S. Kim, D. Azougagh, Y. Cho, and S. Maeng, Reducing cache misses through cache line overlapping, Electronics Letters 42 (2006), no. 10, 569.
