| Working set | Latency (cycles) | Cost |
|---|---|---|
| 16 K | 2 | TLB + L1 |
| 512 K | 10 | + 8 (L1-cache miss, L2 hit) |
| 2 M | 60 | + 50 (TLB miss) |
| ... | 60 + 200 ns | + RAM (L2-cache miss) |

| Working set | Latency (cycles) | Cost |
|---|---|---|
| 16 K | 2 | TLB + L1 |
| 128 K | 10 | + 8 (L1-cache miss, L2 hit) |
| 512 K | 10 + 140 ns | + RAM (L2-cache miss) |
| ... | 60 + 140 ns | + 50 (TLB miss) |
|1||Fetch||Prior to execution, instructions are fetched from the Instruction Cache (I-cache) and placed in the Instruction Buffer, where they wait to be selected for execution. The I-cache is accessed during the F Stage. Up to four instructions are fetched per cycle, along with branch-prediction information, the predicted target address of a branch, and the predicted set of the target. The high bandwidth provided by the I-cache (4 instructions/cycle) allows UltraSPARC-IIi to prefetch instructions ahead of time, based on the current instruction flow and on branch prediction. Providing a fetch bandwidth greater than or equal to the maximum execution bandwidth ensures that, for well-behaved code, the processor does not starve for instructions. Exceptions occur when branches are hard to predict, when branches are very close to each other, or when the I-cache miss rate is high.|
|2||Decode||After being fetched, instructions are pre-decoded and then sent to the Instruction Buffer. The pre-decoded bits generated during this stage accompany the instructions during their stay in the Instruction Buffer; upon reaching the next stage (where the grouping logic lives), these bits speed up the parallel decoding of up to 4 instructions. While it is being filled, the Instruction Buffer also presents up to 4 instructions to the next stage. A pair of pointers manages the Instruction Buffer, ensuring that as many instructions as possible are presented, in order, to the next stage.|
|3||Grouping||The G-stage logic's main task is to group and dispatch a maximum of four valid instructions in one cycle. It receives up to four valid instructions from the Prefetch and Dispatch Unit (PDU), controls the Integer Core Register File (ICRF), and routes valid data to each integer functional unit. The G-stage sends up to two floating-point or graphics instructions out of the four candidates to the Floating-Point and Graphics Unit (FGU). The G-stage logic is also responsible for comparing register addresses for integer data bypassing and for handling pipeline stalls due to interlocks.|
|4||Execute||Data from the integer register file is processed by the two integer ALUs during this cycle (if the instruction group includes ALU operations). Results are computed and are available to other instructions (through bypasses) in the very next cycle. The virtual address of a memory operation is also calculated during the E Stage, in parallel with the ALU computation.
FLOATING-POINT AND GRAPHICS UNIT: This cycle is the Register (R) Stage of the FGU. The floating-point register file is accessed during this cycle. The instructions are also further decoded, and the FGU control unit selects the proper bypasses for the current instructions.|
|5||Cache Access||The virtual address of a memory operation, calculated in the E-stage, is sent to the tag RAM to determine whether the access (load or store) hits or misses in the D-cache. In parallel, the virtual address is sent to the data MMU to be translated into a physical address. On a load, when there are no other outstanding loads, the data array is accessed so that the data can be forwarded to dependent instructions in the pipeline as soon as possible.
ALU operations executed in the E-stage generate condition codes in the C Stage. The condition codes are sent to the PDU, which checks whether a conditional branch in the group was correctly predicted. If the branch was mispredicted, the younger instructions behind it in the pipe are flushed and the correct instructions are fetched. The results of ALU operations are not modified after the E Stage; the data merely propagates down the pipeline (through the annex register file), where it remains available for bypassing to subsequent operations.
FLOATING-POINT AND GRAPHICS UNIT: This cycle is the X1 Stage of the FGU. Floating-point and graphics instructions start their execution during this stage. Latency-one instructions also finish their execution during the X1 Stage.|
|6||N1||A data-cache (D-cache) hit or miss, and a TLB hit or miss, are determined during the N1 Stage. If a load misses the D-cache, it enters the Load Buffer; the access arbitrates for the E-cache if there are no older unissued loads. If a TLB miss is detected, a trap is taken and the address translation is obtained through a software routine. The physical address of a store is sent to the Store Buffer during this stage. To avoid pipeline stalls when store data is not immediately available, the address and data parts of a store are decoupled and sent to the Store Buffer separately.
FLOATING-POINT AND GRAPHICS UNIT: This cycle is the X2 Stage of the FGU. Execution continues for most operations.|
|7||N2||Most floating-point instructions finish their execution during this stage. After N2, data can be bypassed to other stages or forwarded to the data portion of the Store Buffer. All loads that entered the Load Buffer in N1 continue their progress through the buffer; they reappear in the pipeline only when their data comes back. Normal dependency checking is performed on all loads, including those in the Load Buffer.
FLOATING-POINT AND GRAPHICS UNIT: This cycle is the X3 Stage of the FGU.|
|8||N3||UltraSPARC-IIi resolves traps at this stage.|
|9||Write||All results are written to the register files (integer and floating-point) during this stage. All actions performed during this stage are irreversible. After this stage, instructions are considered terminated.|