L1 Data cache = 16 KB. 32 B/line, 4-WAY. (Write-Allocate)
L1 Instruction cache = 16 KB. 32 B/line, 4-WAY.
L2 cache size = 256 KB. 32 B/line, 8-WAY
TLB size = 64 items (4-WAY). Miss penalty = 5 cycles
PDE cache size = 2 entries, covering 8 MB (or 4 MB in PAE mode).
| Size | Latency (cycles) | Description |
|---|---|---|
| 16 K | 3 | TLB + L1 |
| 256 K | 8 | +5 (L2) |
| 8 M | 13 + 150 ns | +5 (TLB miss) + RAM |
| ... | 16 + 150 ns | +3 (PDE cache miss) |
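
The size thresholds in the table above follow from the cache and TLB geometry. The sketch below recomputes them; it assumes 4 KB pages for the 64-entry TLB and the usual x86 page-directory coverage (4 MB per PDE, or 2 MB per PDE in PAE mode). The helper names `num_sets` and `reach` are illustrative only.

```python
KB = 1024
MB = 1024 * KB

def num_sets(size_bytes, line_bytes, ways):
    """Number of sets in a set-associative cache."""
    return size_bytes // (line_bytes * ways)

def reach(entries, bytes_per_entry):
    """Amount of memory covered by a translation structure."""
    return entries * bytes_per_entry

l1d_sets = num_sets(16 * KB, 32, 4)    # L1 data cache: 128 sets
l2_sets = num_sets(256 * KB, 32, 8)    # L2 cache: 1024 sets

# Assumption: 4 KB pages, so the 64-entry TLB covers 256 KB.
tlb_reach = reach(64, 4 * KB)

# Assumption: one PDE maps 4 MB (2 MB in PAE mode).
pde_reach = reach(2, 4 * MB)           # 8 MB
pde_reach_pae = reach(2, 2 * MB)       # 4 MB

print(l1d_sets, l2_sets, tlb_reach // KB, pde_reach // MB, pde_reach_pae // MB)
```

Under these assumptions the TLB reach equals the L2 size (256 KB), which would explain why the row beyond 256 K pays for a TLB miss and a RAM access together.
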
TLB size = 8 items. Miss penalty = 6 cycles
| Size | Latency (cycles) | Description |
|---|---|---|
| 16 K | 3 | TLB + L1 |
| 256 K | 8 | +5 (L2) |
| 32 M | 8 + 150 ns | + RAM |
| ... | 14 + 150 ns | +6 (TLB miss) |
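
The two latency tables can be read as step functions of the working-set size. The sketch below encodes them as lookup tables; the names `TABLE_64_ENTRY_TLB`, `TABLE_8_ENTRY_TLB`, `latency` and `latency_ns` are made up for illustration, and `clock_mhz` is a hypothetical parameter because the clock frequency is not stated here.

```python
KB = 1024
MB = 1024 * KB

# Each row: (largest working set in bytes, cycles, extra nanoseconds).
# None as the limit means "anything larger".
TABLE_64_ENTRY_TLB = [
    (16 * KB, 3, 0),      # TLB + L1
    (256 * KB, 8, 0),     # +5 (L2)
    (8 * MB, 13, 150),    # +5 (TLB miss) + RAM
    (None, 16, 150),      # +3 (PDE cache miss)
]
TABLE_8_ENTRY_TLB = [
    (16 * KB, 3, 0),      # TLB + L1
    (256 * KB, 8, 0),     # +5 (L2)
    (32 * MB, 8, 150),    # + RAM
    (None, 14, 150),      # +6 (TLB miss)
]

def latency(working_set, table):
    """Return (cycles, nanoseconds) for a working set of the given size."""
    for limit, cycles, ns in table:
        if limit is None or working_set <= limit:
            return cycles, ns

def latency_ns(working_set, table, clock_mhz):
    """Total latency in ns; clock_mhz is an assumed, user-supplied value."""
    cycles, ns = latency(working_set, table)
    return cycles * 1000.0 / clock_mhz + ns

print(latency(1 * MB, TABLE_64_ENTRY_TLB))
print(latency(1 * MB, TABLE_8_ENTRY_TLB))
```

For a 1 MB working set the first table charges the TLB miss on top of the RAM access, while the second does not, since its 8 entries still cover the whole range.
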
32-byte range cross penalty = 9 cycles.
4096-byte range cross penalty = 86 cycles.
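
A small sketch of how the two crossing penalties above might be applied to a single access. It assumes an access that crosses a 4096-byte boundary pays only the larger penalty rather than both; the source does not say whether the penalties add up. The function names are illustrative.

```python
def crosses(addr, size, boundary):
    """True if the bytes [addr, addr + size) span a multiple of boundary."""
    return addr // boundary != (addr + size - 1) // boundary

def cross_penalty(addr, size):
    """Extra cycles for a misaligned access, using the figures listed above."""
    if crosses(addr, size, 4096):
        return 86   # assumption: not added to the 32-byte penalty
    if crosses(addr, size, 32):
        return 9
    return 0

print(cross_penalty(28, 4))     # bytes 28..31 stay inside one 32-byte range
print(cross_penalty(30, 4))     # bytes 30..33 cross a 32-byte range
print(cross_penalty(4094, 4))   # bytes 4094..4097 cross a 4096-byte range
```
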
L2 B/W (Read with 32 Bytes stride) = 2.95 cycles per cache line
L2 B/W (Parallel Random Read) = 2.67 cycles per cache line
RAM B/W (Read with 32 Bytes stride) = 786 MB/s ?
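
The L2 figures above are given in cycles per 32-byte cache line. A minimal sketch for converting them to bytes per cycle, and to MB/s for a given clock, follows. `clock_mhz` is a hypothetical input, since the clock frequency is not stated here, and MB is taken as 10^6 bytes.

```python
LINE_BYTES = 32  # cache line size from the spec lines above

def bytes_per_cycle(cycles_per_line, line_bytes=LINE_BYTES):
    """Sustained bandwidth in bytes per clock cycle."""
    return line_bytes / cycles_per_line

def mb_per_second(cycles_per_line, clock_mhz, line_bytes=LINE_BYTES):
    """Bandwidth in MB/s (10^6 bytes); clock_mhz is an assumed value."""
    # bytes/cycle * cycles/microsecond = bytes/microsecond = MB/s
    return bytes_per_cycle(cycles_per_line, line_bytes) * clock_mhz

print(round(bytes_per_cycle(2.95), 2))   # stride read, roughly 10.85
print(round(bytes_per_cycle(2.67), 2))   # parallel random read, roughly 11.99
```
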
Branch misprediction penalty = 9 cycles.
| # | In-Order | Out-of-Order |
|---|---|---|
| 1 | ICache | |
| 2 | ILD | |
| 3 | Decode1/Rotate | |
| 4 | Decode2 | |
| 5 | Decode3 | |
| 6H | RAT | |
| 6L | | |
| 7H | ROB/RRF read | Ready: RS Pdst-CAM match |
| 7L | RS data write | |
| 8H | RS data Read | |
| 8L | ByPass | |
| 9H | Execute | |
| 9L | RS/ROB writeback | |
| 10H | Retire-ROB Read | |
| 10L | Ip -1 | |
| 11H | Ip -2 | |
| 11L | Retire-RRF Write | |