Intel P6 (Pentium III)

Configuration

Celeron (130 nm - Tualatin) 1200 MHz (100*12) + SDRAM PC-100 2-2-2-5-7

Cache

L1 Data cache = 16 KB. 32 B/line, 4-WAY. (Write-Allocate)

L1 Instruction cache = 16 KB. 32 B/line, 4-WAY.

L2 cache size = 256 KB. 32 B/line, 8-WAY
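
The cache geometry above fixes the number of sets in each cache. A minimal sketch of that arithmetic follows; the function name cache_sets is illustrative, not something from the source.

```python
def cache_sets(size_bytes: int, line_bytes: int, ways: int) -> int:
    """Number of sets = capacity / (line size * associativity)."""
    return size_bytes // (line_bytes * ways)

KB = 1024
l1_sets = cache_sets(16 * KB, 32, 4)    # L1 data or instruction cache
l2_sets = cache_sets(256 * KB, 32, 8)   # L2 cache

print(l1_sets, l2_sets)  # 128 1024
```

With 32-byte lines, address bits 0-4 select the byte within a line, so 128 sets need 7 index bits (L1) and 1024 sets need 10 index bits (L2).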

4 KB pages mode

TLB size = 64 entries (4-WAY). Miss penalty = 5 cycles

PDE cache size = 2 entries, covering 8 MB (or 4 MB in PAE mode).
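
The 256 K and 8 M size steps in the latency table for this mode line up with how much memory the TLB and the PDE cache can map. A small sketch of that calculation, assuming reach is simply entries times bytes mapped per entry (the helper name reach is hypothetical):

```python
KB, MB = 1024, 1024 * 1024

def reach(entries: int, bytes_per_entry: int) -> int:
    """Memory covered by a translation structure."""
    return entries * bytes_per_entry

tlb_reach = reach(64, 4 * KB)       # 64 TLB entries, one 4 KB page each
pde_reach = reach(2, 4 * MB)        # a non-PAE page directory entry maps 4 MB
pde_reach_pae = reach(2, 2 * MB)    # a PAE page directory entry maps 2 MB

print(tlb_reach // KB, pde_reach // MB, pde_reach_pae // MB)  # 256 8 4
```

The TLB reach (256 KB) happens to equal the L2 size, so both limits are hit at the same working-set size.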

Size    Latency        Description
16 K    3              TLB + L1
256 K   8              +5 (L2)
8 M     13 + 150 ns    +5 (TLB miss) + RAM
...     16 + 150 ns    +3 (PDE cache miss)
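
The latencies above mix core cycles with a fixed RAM time in nanoseconds. To compare rows directly, the cycle part can be converted at the 1200 MHz core clock. This is only a sketch; it assumes the two parts simply add, as the table notation suggests, and the name total_ns is illustrative.

```python
CPU_MHZ = 1200  # core clock from the Configuration section

def total_ns(cycles: float, extra_ns: float = 0.0, mhz: float = CPU_MHZ) -> float:
    """Convert core cycles to nanoseconds and add the fixed RAM time."""
    return cycles * 1000.0 / mhz + extra_ns

# (working-set size, cycles, extra ns) taken from the 4 KB pages table
rows = [("16 K", 3, 0), ("256 K", 8, 0), ("8 M", 13, 150), ("...", 16, 150)]
for size, cycles, ns in rows:
    print(f"{size:>6}  {total_ns(cycles, ns):7.2f} ns")
```

At this clock one cycle is about 0.83 ns, so the 150 ns RAM term dominates every row beyond 256 K.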

4 MB pages mode

TLB size = 8 entries. Miss penalty = 6 cycles

Size    Latency        Description
16 K    3              TLB + L1
256 K   8              +5 (L2)
32 M    8 + 150 ns     + RAM
...     14 + 150 ns    +6 (TLB miss)

MISC

32-byte boundary crossing penalty = 9 cycles.

4096-byte boundary crossing penalty = 86 cycles.
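
Whether a given access pays one of these penalties depends only on its address and size. The sketch below assumes the penalties apply to a single access that straddles a 32-byte cache line or a 4096-byte page; the function name crosses is illustrative.

```python
def crosses(addr: int, size: int, boundary: int) -> bool:
    """True if the bytes [addr, addr + size) span more than one
    boundary-aligned block (boundary must be a power of two or any
    positive block size)."""
    return addr // boundary != (addr + size - 1) // boundary

print(crosses(28, 4, 32))      # False: bytes 28..31 stay inside one line
print(crosses(30, 4, 32))      # True: bytes 30..33 straddle two lines
print(crosses(4094, 4, 4096))  # True: straddles two pages (and two lines)
```

Keeping data naturally aligned (a 4-byte value at an address that is a multiple of 4, and so on) avoids both cases.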

L2 B/W (Read with 32 Bytes stride) = 2.95 cycles per cache line

L2 B/W (Parallel Random Read) = 2.67 cycles per cache line

RAM B/W (Read with 32 Bytes stride) = 786 MB/s ?
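
For comparison, the L2 figures can be put in the same units as the RAM figure. This sketch assumes MB means 10^6 bytes and takes the PC-100 peak as a 64-bit bus at 100 MHz (800 MB/s); the function name l2_mb_per_s is illustrative.

```python
CPU_HZ = 1200e6     # core clock from the Configuration section
LINE_BYTES = 32     # cache line size

def l2_mb_per_s(cycles_per_line: float) -> float:
    """L2 read bandwidth in MB/s (1 MB = 10**6 bytes, an assumption)."""
    return LINE_BYTES * CPU_HZ / cycles_per_line / 1e6

pc100_peak = 100e6 * 8 / 1e6      # 64-bit SDRAM bus at 100 MHz = 800 MB/s
ram_efficiency = 786 / pc100_peak  # measured figure vs. theoretical peak

print(f"L2 stride read:   {l2_mb_per_s(2.95):8.0f} MB/s")
print(f"L2 parallel read: {l2_mb_per_s(2.67):8.0f} MB/s")
print(f"RAM vs PC-100 peak: {ram_efficiency:.1%}")
```

Under these assumptions the L2 delivers roughly 13 to 14 GB/s, and the measured RAM figure sits close to the PC-100 theoretical peak.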

Pipeline

Branch misprediction penalty = 9 cycles.

# In-Order Out-of-Order
1 ICache
2 ILD
3 Decode1/Rotate
4 Decode2
5 Decode3
6H RAT
6L
ROB Write? RS Psrc/Pdsts write
7H ROB/RRF read Ready: RS Pdst-CAM match
7L RS data write
RS Schedule
RS Pdst read
8H RS data Read
8L ByPass
9H Execute
9L RS/ROB writeback
10H Retire-ROB Read
10L Ip -1
11H Ip -2
11L Retire-RRF Write