L1 Data cache = 24 KB, 6-way set associative, 64-byte line size. Write back.
L1 Instruction cache = 32 KB. 64 B/line, 8-WAY.
L2 cache size = 512 KB, 8-way set associative, 64-byte line size. It probably reads two lines (128 bytes) for each read operation.
L2 cache latency = 15 cycles (12 cycles if L2->L1 traffic is low)
L1 TLB size = 16 items. Miss penalty = 7 cycles
L2 TLB size = 64 items. Miss penalty = 16 cycles
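
The cache and TLB figures above imply a set count for each cache and a "reach" for each TLB. The following is a minimal sketch of that arithmetic, not a description of the hardware's actual indexing. The 4 KB page size is an assumption (the figures above do not state it), and the function names are hypothetical.

```python
# Sketch: derive set counts and TLB reach from the listed figures.
KB = 1024
PAGE_SIZE = 4 * KB  # ASSUMPTION: 4 KB pages; not stated in the source


def cache_sets(size_bytes, line_bytes, ways):
    """Number of sets = cache size / (line size * associativity)."""
    return size_bytes // (line_bytes * ways)


def tlb_reach(entries, page_bytes=PAGE_SIZE):
    """Bytes of address space covered by a fully populated TLB."""
    return entries * page_bytes


l1d_sets = cache_sets(24 * KB, 64, 6)    # L1 data: 24 KB, 6-way
l1i_sets = cache_sets(32 * KB, 64, 8)    # L1 instruction: 32 KB, 8-way
l2_sets = cache_sets(512 * KB, 64, 8)    # L2: 512 KB, 8-way

l1_tlb_reach_kb = tlb_reach(16) // KB    # L1 TLB: 16 items
l2_tlb_reach_kb = tlb_reach(64) // KB    # L2 TLB: 64 items

print(l1d_sets, l1i_sets, l2_sets, l1_tlb_reach_kb, l2_tlb_reach_kb)
```

Under the 4 KB page assumption, the two TLB reaches come out to 64 KB and 256 KB, which would line up with the 64 K and 256 K rows of the latency table.
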
| Size | Latency (cycles) | Description |
|---|---|---|
| 24 K | 3 | L1-TLB + L1 |
| 64 K | 15 | + 12 (L1-Cache miss) |
| 256 K | 22 | + 7 (L1-TLB miss) |
| 512 K | 38 | + 16 (L2-TLB miss) |
| ... | 38 cycles + 115 ns | + RAM |
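
The table reads as a step function: each row adds one more penalty on top of the previous row. A small sketch of that lookup follows. It assumes the Size column is the upper bound of the working set for that row, which is an interpretation of the table rather than something it states; `load_latency` is a hypothetical helper name.

```python
# Sketch: look up the load latency for a working-set size using the table rows.
KB = 1024

# (assumed upper bound of working set in bytes, latency in cycles)
LATENCY_STEPS = [
    (24 * KB, 3),     # L1-TLB + L1 hit
    (64 * KB, 15),    # + 12: L1 cache miss
    (256 * KB, 22),   # + 7: L1-TLB miss
    (512 * KB, 38),   # + 16: L2-TLB miss
]
RAM_EXTRA_NS = 115    # added on top of 38 cycles once RAM is reached


def load_latency(working_set_bytes):
    """Return (cycles, extra_ns) for the given working-set size."""
    for limit, cycles in LATENCY_STEPS:
        if working_set_bytes <= limit:
            return cycles, 0
    # Past the last row: last cycle count plus the RAM component in ns.
    return LATENCY_STEPS[-1][1], RAM_EXTRA_NS


# The rows are cumulative: 3 + 12 = 15, 15 + 7 = 22, 22 + 16 = 38.
print(load_latency(16 * KB), load_latency(100 * KB), load_latency(4096 * KB))
```

The RAM component is kept in nanoseconds because converting it to cycles would need the core clock, which is not given here.
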
64-byte range crossing penalty = 14 cycles.
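
To make the crossing condition concrete, here is a minimal sketch that checks whether a single access spans two 64-byte ranges. It assumes accesses of 1 to 64 bytes and treats the 14-cycle figure as a flat surcharge; both function names are hypothetical.

```python
# Sketch: detect an access that crosses a 64-byte range boundary.
RANGE_BYTES = 64
CROSS_PENALTY_CYCLES = 14  # figure quoted above


def crosses_64_byte_range(address, size):
    """True if `size` bytes starting at `address` span two 64-byte ranges.

    Assumes 1 <= size <= 64.
    """
    return (address % RANGE_BYTES) + size > RANGE_BYTES


def extra_cycles(address, size):
    """Penalty in cycles for this access under the flat-surcharge assumption."""
    return CROSS_PENALTY_CYCLES if crosses_64_byte_range(address, size) else 0


print(crosses_64_byte_range(56, 8), crosses_64_byte_range(60, 8))
```

An 8-byte access at offset 56 ends exactly on the boundary and does not cross it, while the same access at offset 60 does.
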
Bandwidth (B/W):
L2 Read B/W (64 Bytes stride) = 15 cycles per cache line
L2 Write B/W (64 Bytes stride) = 13 cycles per cache line
RAM Read B/W (4 Bytes stride) = 2900 MB/s
RAM Write B/W (4 Bytes stride) = 1070 MB/s
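
The L2 figures are given in cycles per cache line, while the RAM figures are in MB/s. A rough sketch of converting between the two follows. The core clock is not stated here, so `clock_hz` is a hypothetical parameter that the reader has to supply, and the function names are illustrative.

```python
# Sketch: convert "cycles per 64-byte line" into bytes/cycle and MB/s.
LINE_BYTES = 64


def bytes_per_cycle(cycles_per_line, line_bytes=LINE_BYTES):
    """Sustained bytes transferred per core cycle."""
    return line_bytes / cycles_per_line


def mb_per_second(cycles_per_line, clock_hz, line_bytes=LINE_BYTES):
    """MB/s (1 MB = 10**6 bytes) at an ASSUMED core clock `clock_hz`."""
    return bytes_per_cycle(cycles_per_line, line_bytes) * clock_hz / 1e6


l2_read = bytes_per_cycle(15)    # L2 read: 15 cycles per line
l2_write = bytes_per_cycle(13)   # L2 write: 13 cycles per line
print(round(l2_read, 2), round(l2_write, 2))
```

The bytes-per-cycle values do not depend on the clock, so they are the safer numbers to compare across parts; the MB/s conversion is only as good as the clock value passed in.
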
Branch misprediction penalty = 17 cycles.
Integer pipeline (from Intel's presentations):
| # | Name | Description |
|---|---|---|
| 1 | IF1 | Instruction Fetch |
| 2 | IF2 | |
| 3 | IF3 | |
| 4 | ID1 | Instruction Decode |
| 5 | ID2 | |
| 6 | ID3 | |
| 7 | SC | Instruction Dispatch |
| 8 | IS | |
| 9 | IRF | Source Operand Read |
| 10 | AG | Data Cache Access |
| 11 | DC1 | |
| 12 | DC2 | |
| 13 | EX1 | Execute |
| 14 | FT1 | Exceptions and MT handling |
| 15 | FT2 | |
| 16 | IWB/DC1 | Commit |