ARM Cortex-A8

Freescale i.MX515

Freescale i.MX515, 800 MHz.

4 KB pages mode

Size     Latency (cycles)    Description
32 K     3                   TLB + L1
128 K    17                  + 14 (L2)
256 K    64                  + 47 (TLB miss)
...      64 + 200 ns         + 200 ns (RAM)
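
The Description column reads as an increment over the previous level, so the Latency column is a running sum of those increments. A minimal sketch of that arithmetic for the i.MX515 numbers, assuming all three values are in CPU cycles (the variable names are illustrative and not from the source):

```python
from itertools import accumulate

# Per-level increments in cycles, taken from the Description column
# of the i.MX515 table: L1 hit, extra for L2, extra for a TLB miss.
increments = [("TLB + L1", 3), ("L2", 14), ("TLB miss", 47)]

# The running sum reproduces the Latency column (3, 17, 64).
totals = list(accumulate(cycles for _, cycles in increments))

for (name, _), total in zip(increments, totals):
    print(f"{name:10s} {total:3d} cycles")
```

The same check works for the other chips by swapping in their increments.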

TI DM3730

TI DM3730, 1000 MHz.

4 KB pages mode

Size     Latency (cycles)    Description
32 K     3                   TLB + L1
128 K    12                  + 9 (L2)
256 K    50                  + 38 (TLB miss)
...      50 + 160 ns         + 160 ns (RAM)

Samsung Hummingbird

Samsung Hummingbird, 1000 MHz.

4 KB pages mode

Size     Latency (cycles)    Description
32 K     3                   TLB + L1
128 K    13                  + 10 (L2)
512 K    49                  + 36 (TLB miss)
...      49 + 107 ns         + 107 ns (RAM)
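
The RAM rows mix two units: a cycle count plus a fixed time in ns. To compare the three chips, the cycle part can be converted to ns at each chip's clock. The sketch below does that, assuming the cycle part is counted in CPU clock cycles at the nominal frequency listed for each chip; the function and variable names are made up for illustration.

```python
# (clock in MHz, cycles before the RAM access, extra RAM latency in ns),
# copied from the last row of each table.
chips = {
    "Freescale i.MX515": (800, 64, 200),
    "TI DM3730": (1000, 50, 160),
    "Samsung Hummingbird": (1000, 49, 107),
}

def ram_latency_ns(mhz, cycles, extra_ns):
    """Convert the cycle part to ns at the given clock and add the ns part."""
    return cycles * 1000.0 / mhz + extra_ns

results = {name: ram_latency_ns(*row) for name, row in chips.items()}

for name, ns in results.items():
    print(f"{name:20s} {ns:6.1f} ns")
```

Under that assumption the totals come out to 280 ns, 210 ns and 156 ns respectively.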

Pipeline

Branch misprediction penalty = 13-14 cycles.
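
As a rough illustration of what that penalty means in time, the sketch below converts 13-14 cycles to ns at the two clock speeds of the chips listed earlier and estimates the average extra cycles per branch. The 5% misprediction rate is a hypothetical value chosen only for the example, not a measurement.

```python
PENALTY_CYCLES = (13, 14)   # range quoted above
CLOCKS_MHZ = (800, 1000)    # clocks of the chips listed earlier

def cycles_to_ns(cycles, mhz):
    """Duration of `cycles` clock cycles at `mhz` MHz, in nanoseconds."""
    return cycles * 1000.0 / mhz

def avg_cycles_per_branch(miss_rate, penalty_cycles):
    """Expected extra cycles per branch for a given misprediction rate."""
    return miss_rate * penalty_cycles

for mhz in CLOCKS_MHZ:
    low, high = (cycles_to_ns(c, mhz) for c in PENALTY_CYCLES)
    print(f"{mhz} MHz: {low:.2f}-{high:.2f} ns per misprediction")

# Hypothetical 5% misprediction rate, for illustration only.
low, high = (avg_cycles_per_branch(0.05, c) for c in PENALTY_CYCLES)
print(f"average cost: {low:.2f}-{high:.2f} cycles per branch")
```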

Integer pipeline (from ARM's presentations):

#   Name      Stage     Description
1   F0        Fetch     AGU
2   F1                  RAM
                        TLB
                        BTB 512-entry 2-way
                        GHB 4096 * 2 bit
                        Return Stack: 8 entries
3   F2                  12-entry Fetch Queue
4   D0        Decode    Early Dec
5   D1                  Dec / Sec
6   D2                  Dec Queue read / write
7   D3                  Score-Board + Issue Logic
8   D4                  RegFile + ID + remap
9   E0        Execute   Architectural RegFile
10  E1                  MUL 1 / Shift / Shift / AGU
11  E2                  MUL 2 / ALU+flags / ALU+flags / RAM + TLB
12  E3                  MUL 3 / Sat / Sat / Format forward
13  E4 / L1             ACC / BP update / BP update / L2 Update - Arb
14  E5 / L2             WB / WB / WB / L2 tag : RAM 1
15  L3        L2 Load   L2 tag : RAM 2
16  L4                  Tag miss
17  L5                  L2 data : RAM 1
18  L6                  L2 data : RAM 2
19  L7                  L2 data : RAM 3
20  L8                  Bank mux
21  L9                  Data format