## ECE/CS 552: Cache Memory

Instructor: Mikko H Lipasti

Fall 2010 University of Wisconsin-Madison

Lecture notes based on set created by Mark Hill. Updated by Mikko Lipasti.

#### **Big Picture**

- Memory
  - Just an "ocean of bits"
  - Many technologies are available
- Key issues
  - Technology (how bits are stored)
  - Placement (where bits are stored)
  - Identification (finding the right bits)
  - Replacement (finding space for new bits)
  - Write policy (propagating changes to bits)
- Must answer these regardless of memory type

| Type          | Size       | Speed   | Cost/bit |
|---------------|------------|---------|----------|
| Register      | < 1KB      | < 1ns   | \$\$\$\$ |
| On-chip SRAM  | 8KB-6MB    | < 10ns  | \$\$\$   |
| Off-chip SRAM | 1Mb - 16Mb | < 20ns  | \$\$     |
| DRAM          | 64MB - 1TB | < 100ns | \$       |
| Disk          | 40GB – 1PB | < 20ms  | ~0       |







#### Technology – DRAM

- Logically similar to SRAM
- Commodity DRAM chips
  - E.g. 1Gb
  - Standardized address/data/control interfaces
- Very dense
  - One transistor (1T) per cell (bit)
  - Data stored in capacitor decays over time
  - Must rewrite after each read, and refresh periodically
- Density improving vastly over time
- Latency barely improving









#### Why Memory Hierarchy?

- Fast and small memories
  - Enable quick access (fast cycle time)
  - Enable lots of bandwidth (1+ L/S/I-fetch/cycle)
- Slower larger memories
  - Capture larger share of memory
  - Still relatively fast
- Slow huge memories
  - Hold rarely-needed state
  - Needed for correctness
- All together: provide appearance of large, fast memory with cost of cheap, slow memory

#### Why Does a Hierarchy Work?

- Locality of reference
  - Temporal locality
    - Reference same memory location repeatedly
  - Spatial locality
    - Reference near neighbors around the same time
- Empirically observed
  - Significant!
  - Even small local storage (8KB) often satisfies >90% of references to multi-MB data set
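
As a rough illustration of locality, the sketch below counts hits in a tiny direct-mapped cache model. The function name `hit_rate`, the 8KB capacity, and the 64-byte block size are assumptions chosen for illustration; the access patterns are synthetic, not measurements from real programs.

```python
def hit_rate(addresses, cache_bytes=8 * 1024, block_bytes=64):
    """Fraction of byte addresses that hit in a direct-mapped cache model."""
    num_frames = cache_bytes // block_bytes
    resident = [None] * num_frames      # block number held by each frame
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_bytes     # which memory block is referenced
        frame = block % num_frames      # direct-mapped placement (assumed)
        if resident[frame] == block:
            hits += 1
        else:
            resident[frame] = block     # miss: fetch the block
        total += 1
    return hits / total if total else 0.0


# Spatial locality: sequential 4-byte reads over 1 MB miss once per
# 64-byte block, then hit on the 15 neighboring words.
print(hit_rate(range(0, 1 << 20, 4)))

# Temporal locality: repeated reads of one address miss only the first time.
print(hit_rate([0x40] * 10))
```

Even though the 1 MB sweep never reuses a block, fetching whole blocks lets most of the references hit.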



#### Four Burning Questions

- These are:
  - Placement
    - Where can a block of memory go?
  - Identification
    - How do I find a block of memory?
  - Replacement
    - How do I make space for new blocks?
  - Write Policy
    - How do I propagate changes?
- Consider these for caches
  - Usually SRAM
- Will consider main memory, disks later

#### Placement

| Memory<br>Type  | Placement                 | Comments                                                |
|-----------------|---------------------------|---------------------------------------------------------|
| Registers       | Anywhere;<br>Int, FP, SPR | Compiler/programmer<br>manages                          |
| Cache<br>(SRAM) | Fixed in H/W              | Direct-mapped,<br>set-associative,<br>fully-associative |
| DRAM            | Anywhere                  | O/S manages                                             |
| Disk            | Anywhere                  | O/S manages                                             |







#### Placement and Identification

32-bit address: Tag | Index | Offset

| Portion | Length                 | Purpose                  |
|---------|------------------------|--------------------------|
| Offset  | o=log2(block size)     | Select word within block |
| Index   | i=log2(number of sets) | Select set of blocks     |
| Tag     | t=32 - o - i           | ID block within set      |

- Consider: \<BS=block size, S=sets, B=blocks\>
  - \<64,64,64\>: o=6, i=6, t=20
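
The field widths o, i, and t can be computed and applied to an address as in the following minimal sketch. The function names `field_widths` and `split_address` are hypothetical, and the code assumes the block size and number of sets are powers of two.

```python
def field_widths(block_size, num_sets, addr_bits=32):
    """Return (t, i, o): tag, index, and offset widths in bits."""
    o = block_size.bit_length() - 1     # log2(block size)
    i = num_sets.bit_length() - 1       # log2(number of sets)
    t = addr_bits - o - i
    return t, i, o


def split_address(addr, block_size=64, num_sets=64, addr_bits=32):
    """Split an address into its (tag, index, offset) fields."""
    _, i, o = field_widths(block_size, num_sets, addr_bits)
    offset = addr & (block_size - 1)        # low o bits
    index = (addr >> o) & (num_sets - 1)    # next i bits
    tag = addr >> (o + i)                   # remaining high bits
    return tag, index, offset


print(field_widths(64, 64))                 # widths for 64-byte blocks, 64 sets
tag, index, offset = split_address(0x12345678)
print(hex(tag), index, offset)
```

With 64-byte blocks and 64 sets this reproduces o=6, i=6, t=20. Setting `num_sets=1` gives i=0, so every block maps to the same single set.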

#### Replacement

- Cache has finite size
- What do we do when it is full?
- Analogy: desktop full?
  - Move books to bookshelf to make room
- Same idea:
  - Move blocks to next level of cache

#### Replacement

- How do we choose victim?
  - Verbs: victimize, evict, replace, cast out
- Several policies are possible
  - FIFO (first-in-first-out)
  - LRU (least recently used)
  - NMRU (not most recently used)
  - Pseudo-random (yes, really!)
- Pick victim within *set* where a = *associativity*
  - If a <= 2, LRU is cheap and easy (1 bit)
  - If a > 2, it gets harder
  - Pseudo-random works pretty well for caches
#### Write Policy

- Memory hierarchy
  - 2 or more copies of same block
    - Main memory and/or disk
    - Caches
- What to do on a write?
  - Eventually, all copies must be changed
  - Write must propagate to all levels

#### Write Policy

- Easiest policy: *write-through*
- Every write propagates directly through hierarchy
  - Write in L1, L2, memory, disk (?!?)
- Why is this a bad idea?
  - Very high bandwidth requirement
  - Remember, large memories are slow
- Popular in real systems only to the L2
  - Every write updates L1 and L2
  - Beyond L2, use write-back policy
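
The bandwidth cost of pure write-through can be made concrete with a small count. The function `write_through_traffic` and the level names below are assumptions for illustration; the sketch simply tallies one write per level per store.

```python
def write_through_traffic(stores, levels=("L1", "L2", "memory")):
    """Count writes seen by each level when every store propagates to all levels."""
    traffic = {level: 0 for level in levels}
    for _addr in stores:
        for level in levels:
            traffic[level] += 1     # each store updates every level
    return traffic


# 1000 stores to the same word still cost 1000 writes at every level,
# including the slowest one.
print(write_through_traffic([0x100] * 1000))
```

Repeated stores to one location gain nothing from the cache under this policy, since each one still reaches the slowest level.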

#### Write Policy

- Most widely used: write-back
- Maintain *state* of each line in a cache
  - Invalid: not present in the cache
  - Clean: present, but not written (unmodified)
  - Dirty: present and written (modified)
- Store state in tag array, next to address tag
  - Mark dirty bit on a write
- On eviction, check dirty bit
  - If set, write back dirty line to next level
  - Called a writeback or castout
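
A minimal sketch of the invalid/clean/dirty bookkeeping follows. It assumes a direct-mapped cache addressed by block number and write-allocate on a store miss; the names `State` and `WriteBackCache` are hypothetical, and data values are not modeled.

```python
from enum import Enum


class State(Enum):
    INVALID = 0     # not present in the cache
    CLEAN = 1       # present, unmodified
    DIRTY = 2       # present, modified


class WriteBackCache:
    def __init__(self, num_lines=4):
        self.num_lines = num_lines
        self.state = [State.INVALID] * num_lines
        self.tag = [None] * num_lines   # block number held by each line
        self.writebacks = []            # blocks cast out to the next level

    def _lookup_or_fill(self, block):
        line = block % self.num_lines
        if self.state[line] is State.INVALID or self.tag[line] != block:
            if self.state[line] is State.DIRTY:
                self.writebacks.append(self.tag[line])  # castout of dirty victim
            self.tag[line] = block
            self.state[line] = State.CLEAN
        return line

    def read(self, block):
        self._lookup_or_fill(block)

    def write(self, block):
        line = self._lookup_or_fill(block)
        self.state[line] = State.DIRTY  # mark dirty bit on a write


c = WriteBackCache()
c.write(0)
c.write(0)      # second store hits; no traffic to the next level
c.read(4)       # evicts dirty block 0, so it is written back
c.read(8)       # evicts clean block 4, so nothing is written back
print(c.writebacks)
```

The two stores to block 0 produce a single writeback, and only when the line is evicted.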

#### Write Policy

- Complications of write-back policy
  - Stale copies lower in the hierarchy
  - Must always check higher level for dirty copies before accessing copy in a lower level
- Not a big problem in uniprocessors
  - In multiprocessors: the cache coherence problem
- I/O devices that use DMA (direct memory access) can cause problems even in uniprocessors
  - Called coherent I/O
  - Must check caches for dirty copies before reading main memory
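
The rule of checking caches for dirty copies before reading main memory can be sketched as follows. The dictionary-based model and the function name `dma_read` are assumptions for illustration only; they stand in for whatever mechanism a real system uses to keep I/O coherent.

```python
def dma_read(block, memory, cache_lines):
    """Return the up-to-date contents of `block` as a DMA device should see it.

    memory:      dict mapping block number -> data in main memory
    cache_lines: dict mapping block number -> {"data": ..., "dirty": bool}
    """
    line = cache_lines.get(block)
    if line is not None and line["dirty"]:
        memory[block] = line["data"]    # write back the dirty copy first
        line["dirty"] = False           # cached copy is now clean
    return memory[block]


memory = {7: "stale"}
cache_lines = {7: {"data": "fresh", "dirty": True}}
print(dma_read(7, memory, cache_lines))
```

Without the check, the device would read the stale value from main memory while the only current copy sits dirty in the cache.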























© Hill, Lipasti