AMD K10 (Phenom)

Configuration

AMD Phenom II X4 920 (45 nm) 2800 MHz + dual-channel DDR2-800 (PC2-6400) 5-5-5-18-23-2T
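As a rough cross-check of this configuration, the theoretical peak bandwidth of dual-channel DDR2-800 and the CAS latency in nanoseconds can be worked out as below. This is a minimal sketch that assumes the standard 64-bit (8-byte) channel width and that the first timing value (5) is the CAS latency in memory-clock cycles; the variable names are illustrative only.

```python
# Theoretical figures for dual-channel DDR2-800 (assumptions noted inline).
TRANSFERS_PER_SEC = 800e6   # DDR2-800: 800 million transfers per second
BYTES_PER_TRANSFER = 8      # assumed standard 64-bit channel width
CHANNELS = 2                # dual-channel configuration
MEM_CLOCK_HZ = 400e6        # DDR2-800 I/O clock (two transfers per clock)
CAS_CYCLES = 5              # assumed: first value of the 5-5-5-18 timings

# Peak bandwidth in decimal MB/s across both channels.
peak_mb_s = TRANSFERS_PER_SEC * BYTES_PER_TRANSFER * CHANNELS / 1e6

# CAS latency converted from memory-clock cycles to nanoseconds.
cas_ns = CAS_CYCLES / MEM_CLOCK_HZ * 1e9

print(f"peak bandwidth: {peak_mb_s:.0f} MB/s")
print(f"CAS latency: {cas_ns:.1f} ns")
```

The measured RAM figures further down can be compared against this peak value to judge how much of the theoretical bandwidth a given access pattern reaches.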

Cache

L1 Data cache = 64 KB. 64 B/line, 2-WAY (Write-Allocate), 8 Banks, Exclusive with L2 cache.

L1 Instruction cache = 64 KB. 64 B/line

L2 cache size = 512 KB. 64 B/line, 16-WAY

L2 cache latency = 12 cycles (14 cycles under heavy L2<->L1 traffic)

L3 cache size = 6 MB. 64 B/line, 16-WAY

L3 cache latency = 40 cycles

4 KB page mode (64-bit Windows, 64-bit software)

TLB L1 size = 48 entries. Miss penalty = 5 cycles. Only two misses can be processed at a time.

TLB L2 size = 512 entries. Miss penalty = 35 cycles. Only one miss can be processed at a time.
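The TLB sizes translate into an address range that each level can cover with 4 KB pages. The sketch below computes that reach; the function and constant names are illustrative. The results appear to line up with the 192 K and 2 M boundaries in the latency table.

```python
# TLB reach = number of entries * page size (4 KB page mode).
PAGE_SIZE = 4 * 1024
L1_TLB_ENTRIES = 48
L2_TLB_ENTRIES = 512

def tlb_reach(entries, page_size=PAGE_SIZE):
    """Return the number of bytes a TLB with `entries` entries can map."""
    return entries * page_size

l1_reach_kb = tlb_reach(L1_TLB_ENTRIES) // 1024
l2_reach_kb = tlb_reach(L2_TLB_ENTRIES) // 1024

print(f"L1 TLB reach: {l1_reach_kb} KB")
print(f"L2 TLB reach: {l2_reach_kb} KB")
```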

PDE cache size = ? entries cover ? MB. Miss penalty = ?

Size     Latency (cycles)    Description
64 K     3                   TLB + L1
192 K    12                  +9 (L2)
512 K    17                  +5 (L1 TLB miss)
2 M      57                  +40 (L3)
6 M      91                  +34 (L2 TLB miss)
48 M     91 + 60 ns          +RAM
...      121? + 60 ns        +? (PDE cache miss)
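The latency column is a running sum of the per-level increments. The sketch below rebuilds it from the increments in the table and converts cycles to nanoseconds at the 2800 MHz core clock from the configuration line. The unknown PDE row is left out, and the names are illustrative.

```python
CPU_HZ = 2800e6  # core clock from the configuration line

# (working-set upper bound, added cycles, what the step represents)
steps = [
    ("64 K", 3, "TLB + L1"),
    ("192 K", 9, "L2"),
    ("512 K", 5, "L1 TLB miss"),
    ("2 M", 40, "L3"),
    ("6 M", 34, "L2 TLB miss"),
]

def cumulative(steps):
    """Accumulate the per-level increments into total latencies."""
    total = 0
    rows = []
    for size, added, what in steps:
        total += added
        rows.append((size, total, what))
    return rows

rows = cumulative(steps)
for size, cycles, what in rows:
    ns = cycles / CPU_HZ * 1e9
    print(f"{size:>6}: {cycles:3d} cycles = {ns:5.1f} ns  ({what})")

# RAM row: the cycle part of the last row plus the 60 ns DRAM component.
RAM_NS = 60
ram_total_ns = rows[-1][1] / CPU_HZ * 1e9 + RAM_NS
print(f"  48 M: {ram_total_ns:.1f} ns total")
```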

MISC

16-byte boundary crossing penalty = 3 cycles.
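To see which accesses would pay this penalty, one can test whether an access straddles an aligned 16-byte boundary. The helper below is a hypothetical illustration, and it assumes the penalty applies to any load whose first and last bytes fall in different aligned 16-byte blocks.

```python
def crosses_16_byte_boundary(address, size):
    """Return True if [address, address + size) spans two 16-byte blocks."""
    first_block = address // 16
    last_block = (address + size - 1) // 16
    return first_block != last_block

# An 8-byte load at offset 12 covers bytes 12..19 and crosses the boundary at 16.
print(crosses_16_byte_boundary(12, 8))
# An 8-byte load at offset 8 covers bytes 8..15 and stays inside one block.
print(crosses_16_byte_boundary(8, 8))
```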

L1 B/W (Parallel Random Read) = 0.55-0.60 cycles per access (more than 0.50 due to bank conflicts)

L2<->L1 B/W (Parallel Random Read) = 8 cycles per cache line in each direction

L2<->L1 B/W (Sequential Read or Write with any stride) = 8 cycles per cache line in each direction

L2<->L1 B/W (Sequential Read / pointer chasing / 4, 8, 16-byte step) = 3.05 cycles per access

L2<->L1 B/W (Sequential Read / pointer chasing / 32-byte step) = 5.22 cycles per access

L2<->L1 B/W (Sequential Read / pointer chasing / 64-byte step) = 9.20 cycles per access

L2<->L1 B/W (Sequential Read / pointer chasing / >64-byte step) = 15 cycles per access
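The per-access pointer-chasing figures can be put on a per-cache-line basis by multiplying by the number of accesses that land in one 64-byte line. The sketch below does that arithmetic for the steps listed above. The ">64 bytes" case is omitted because every access already touches a new line, and the names are illustrative.

```python
LINE_BYTES = 64

# step in bytes -> measured cycles per access (from the lines above)
measured = {4: 3.05, 8: 3.05, 16: 3.05, 32: 5.22, 64: 9.20}

def cycles_per_line(step_bytes, cycles_per_access):
    """Cycles spent walking one 64-byte line with the given step."""
    accesses_per_line = LINE_BYTES // step_bytes
    return accesses_per_line * cycles_per_access

per_line = {step: cycles_per_line(step, c) for step, c in measured.items()}

for step, cycles in per_line.items():
    print(f"{step:2d}-byte step: {cycles:5.2f} cycles per cache line")
```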

L3<->L1 B/W (Parallel Random Read) = 18 cycles per cache line

L3<->L1 B/W (Sequential Read or Write with any stride) = 19 cycles per cache line
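The cycles-per-line figures for L2 and L3 can be restated as bytes per second. The sketch below does the conversion, assuming the 2800 MHz core clock from the configuration line and decimal gigabytes; the function name is illustrative.

```python
CPU_HZ = 2800e6   # core clock from the configuration line
LINE_BYTES = 64

def line_rate_gb_s(cycles_per_line):
    """Convert cycles per 64-byte line into GB/s (decimal) at CPU_HZ."""
    return LINE_BYTES / cycles_per_line * CPU_HZ / 1e9

l2_gb_s = line_rate_gb_s(8)          # L2<->L1, per direction
l3_random_gb_s = line_rate_gb_s(18)  # L3<->L1, parallel random read
l3_seq_gb_s = line_rate_gb_s(19)     # L3<->L1, sequential

print(f"L2<->L1:          {l2_gb_s:.2f} GB/s")
print(f"L3<->L1 random:   {l3_random_gb_s:.2f} GB/s")
print(f"L3<->L1 sequent.: {l3_seq_gb_s:.2f} GB/s")
```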

RAM Read B/W (Parallel Random Read) = 18 ns / cache line = 3550 MB/s
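The MB/s figure follows directly from the time per cache line. The sketch below reproduces the conversion using decimal megabytes, which gives a value close to the rounded 3550 MB/s quoted above. It also expresses the same interval in core cycles at 2800 MHz.

```python
LINE_BYTES = 64
NS_PER_LINE = 18      # measured time per cache line
CPU_HZ = 2800e6       # core clock from the configuration line

# Bandwidth in decimal MB/s: bytes per line divided by seconds per line.
mb_per_s = LINE_BYTES / (NS_PER_LINE * 1e-9) / 1e6

# The same interval in core clock cycles.
cycles_per_line = NS_PER_LINE * 1e-9 * CPU_HZ

print(f"{mb_per_s:.0f} MB/s, {cycles_per_line:.1f} cycles per cache line")
```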

RAM Read B/W (Read with 8-byte stride) = 5400 MB/s ?

RAM Read B/W (Read with 64-byte stride) = 6160 MB/s ?

RAM Read B/W (Read with 4-byte stride - pointer chasing) = 2900 MB/s ?

RAM Read B/W (Read with 64-byte stride - pointer chasing) = 4100 MB/s ?

RAM Write B/W (4-byte stride) = 4500 MB/s ?

RAM Write B/W (64-byte stride) = 6170 MB/s ?

Branch misprediction penalty = 12 cycles.
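A simple way to use this number is to estimate the average overhead per branch for a given misprediction rate. The sketch below does so. The 5% rate is a made-up example value, not a measurement, and the function name is illustrative.

```python
MISPREDICT_PENALTY = 12   # cycles, from the line above
CPU_HZ = 2800e6           # core clock from the configuration line

def avg_branch_overhead(mispredict_rate):
    """Expected extra cycles per branch for a given misprediction rate."""
    return mispredict_rate * MISPREDICT_PENALTY

# Penalty of a single misprediction in nanoseconds.
penalty_ns = MISPREDICT_PENALTY / CPU_HZ * 1e9

# Hypothetical example: 5% of branches mispredicted.
example_overhead = avg_branch_overhead(0.05)

print(f"one misprediction: {penalty_ns:.2f} ns")
print(f"average overhead at 5% miss rate: {example_overhead:.2f} cycles per branch")
```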