Speculation: Allow the execution of one or more instructions before the processor "knows" they should be executed.
A simple example of program-controlled speculation is the annulling branch of the SPARC Instruction Set Architecture.
Independently of the outcome of the branch, the delay slot instruction is fetched and its execution begins. The annul bit in the branch instruction tells the hardware to nullify the effect of the delay slot instruction in the case that the branch is not taken. The annulling branch feature gives compilers more flexibility in scheduling the delay slot. If the compiler predicts the branch to be taken, it can put an instruction from the target in the delay slot. When the branch is not taken, the work done on the delay slot instruction is wasted but it doesn't cause incorrect results.
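The annul rule above can be sketched in a few lines of Python. This is only an illustrative model of a conditional delayed branch, not SPARC code: the names run_branch and add_one_to_r1 and the dictionary used as a register file are assumptions made for the example.

```python
def run_branch(taken, annul, delay_slot, state):
    """Model one conditional delayed branch plus its delay-slot instruction.

    `delay_slot` is a function that mutates a register dict.
    Returns True if the delay-slot result was kept, False if it was nullified.
    """
    # The delay-slot instruction always starts executing, so compute its
    # result on a scratch copy before deciding whether to keep it.
    scratch = dict(state)
    delay_slot(scratch)
    # The annul bit cancels the result only when the branch is NOT taken.
    if annul and not taken:
        return False  # work is wasted, architectural state is untouched
    state.update(scratch)
    return True


def add_one_to_r1(regs):
    # Hypothetical delay-slot instruction: r1 = r1 + 1
    regs["r1"] += 1


if __name__ == "__main__":
    regs = {"r1": 10}
    for taken, annul in [(True, True), (False, True), (False, False)]:
        kept = run_branch(taken, annul, add_one_to_r1, regs)
        print(f"taken={taken} annul={annul} kept={kept} r1={regs['r1']}")
```

Only the (not taken, annul set) case discards the delay-slot work; in the other two cases the instruction's effect becomes architectural state.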
The software must mark certain instructions as speculative.
The tough requirement for hardware that executes instructions out of program order or speculatively is to make interrupts precise: the last non-speculative instruction whose execution affected the architectural (programmer-visible) state must be well defined.
See page 306.
Suppose the job is to set R14 from the contents of memory word B if
memory word A contains zero, and otherwise to set R14 to the result of an ALU
operation on the data loaded from A. (The example below is from the
text. The class example was slightly different.)
The IA-64 architecture has a "load ahead" and an explicit "check load" instruction. If the check instruction does not take an exception, then the loaded data is ok to use and becomes no longer speculative. The registers do have extra bits to mark when their contents are speculative. These are the "poison" bits in PH's terminology.
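A rough Python model of poison bits, applied to the R14 task above. This is a sketch under stated assumptions, not IA-64 code: the MEMORY table, the addresses, the "+ 4" ALU operation, and the names Reg, load_speculative, check, and set_r14 are all hypothetical, and a bad address is simply one that is missing from the table.

```python
class Reg:
    """A register value plus its poison bit."""
    def __init__(self, value=0, poison=False):
        self.value = value
        self.poison = poison


# Hypothetical memory: address -> word. Any other address faults.
MEMORY = {0x100: 0, 0x104: 42}


def load_speculative(addr):
    """Speculative load: on a fault, set the poison bit instead of trapping."""
    if addr not in MEMORY:
        return Reg(0, poison=True)
    return Reg(MEMORY[addr])


def check(reg):
    """Check instruction: take the deferred exception if the data is poisoned.

    If no exception is taken, the value is safe to use as non-speculative data.
    """
    if reg.poison:
        raise MemoryError("deferred fault from speculative load")
    return reg.value


def set_r14(addr_a, addr_b):
    """Return the new R14, with the load of B hoisted above the test of A."""
    b = load_speculative(addr_b)          # speculative: may not be needed
    a = check(load_speculative(addr_a))   # A is always needed
    if a == 0:
        return check(b)                   # B is used, so check it now
    return a + 4                          # hypothetical ALU operation on A
```

With this table, set_r14(0x104, 0x999) returns normally even though the load of B faulted, because the poisoned value is never checked or used; set_r14(0x100, 0x999) raises the deferred exception at the check.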
To extend Tomasulo's algorithm to enable speculative execution, a reorder buffer is added between the Common Data Bus and both the interface that stores data in memory and the path that updates the architectural registers. See figure 4.34 on PH, page 311.
Speculative instruction execution steps:
(1) Issue: A functional unit or reservation station PLUS a reorder
(result) buffer slot is chosen for the current instruction (or "RISC operation
combination", called a "Quad" in the AMD K6). The instructions are
issued in program order. The architectural name of the result register
is replaced by the particular reorder buffer slot. A future
instruction with a source that is the old architectural name will be issued
with that source renamed to the reorder buffer slot.
(2) Execute: (NOT done in program order!!) Perform the operation AFTER all the input data it needs is available. This enables issued operations to proceed as soon as all RAW hazards caused by the operation reading data have been cleared.
In general, out-of-order execution and speculative execution enable
hardware to perform operations by following a DATA FLOW MODEL of the
computation. Each operation can be performed as soon as all the operations
that supply data to it have been completed.
Data flow diagram: Nodes are operations. Arrows are true
data dependencies.
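The data flow model can be illustrated with a small Python sketch that computes when each operation could start if only true data dependencies constrained it. The operation names, the latencies, and the assumption of unlimited functional units are made up for the example.

```python
def dataflow_schedule(ops):
    """Earliest start and finish cycle of each operation in a data flow graph.

    ops maps an operation name to (latency, [producer names]) and is listed
    in program order, so every producer appears before its consumers.
    Assumes unlimited functional units: only true dependencies delay an op.
    """
    times = {}
    for name, (latency, producers) in ops.items():
        # An operation may start once all of its producers have finished.
        start = max((times[p][1] for p in producers), default=0)
        times[name] = (start, start + latency)
    return times


if __name__ == "__main__":
    # Hypothetical graph: nodes are operations, lists are incoming arrows.
    ops = {
        "load_a": (2, []),
        "load_b": (2, []),
        "mul": (3, ["load_a", "load_b"]),
        "add": (1, ["load_a"]),
        "sub": (1, ["mul", "add"]),
    }
    for name, (start, finish) in dataflow_schedule(ops).items():
        print(f"{name}: start {start}, finish {finish}")
```

In this graph "add" comes after "mul" in program order but finishes before it, since it depends only on "load_a".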
(3) Write Result: (NOT done in program order!) Save the result of a completed operation execution in a temporary or reorder register (the one allocated to the operation by the issue stage) so that:
Steps (3) and (4) are separated so that exceptions can be precise.
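A toy Python model of why the separation matters, covering the renaming done at issue and the split between writing a result and updating architectural state. It assumes step (4) is an in-order commit from the head of the reorder buffer, as in PH's description; the class and method names and the dictionary-based slots are illustrative, not a description of any real processor.

```python
class ReorderBuffer:
    """Toy reorder buffer: in-order issue, any-order write result, in-order commit."""

    def __init__(self):
        self.slots = []   # one entry per issued instruction, in program order
        self.rename = {}  # architectural register -> newest slot that will write it
        self.regs = {}    # architectural (programmer-visible) registers
        self.head = 0     # oldest instruction not yet committed

    def issue(self, dest, srcs):
        # Step (1): rename the sources first, then claim a slot for the result.
        renamed = tuple(("ROB", self.rename[s]) if s in self.rename else s
                        for s in srcs)
        self.slots.append({"dest": dest, "srcs": renamed, "done": False,
                           "value": None, "exc": None})
        slot = len(self.slots) - 1
        self.rename[dest] = slot
        return slot

    def write_result(self, slot, value=None, exc=None):
        # Step (3): any order. Only the reorder buffer slot changes here.
        entry = self.slots[slot]
        entry["value"], entry["exc"], entry["done"] = value, exc, True

    def commit(self):
        # Step (4), assumed: retire only from the head, in program order.
        while self.head < len(self.slots) and self.slots[self.head]["done"]:
            entry = self.slots[self.head]
            if entry["exc"] is not None:
                # Precise: every older instruction has committed, no younger one has.
                raise RuntimeError(f"{entry['exc']} at instruction {self.head}")
            self.regs[entry["dest"]] = entry["value"]
            if self.rename.get(entry["dest"]) == self.head:
                del self.rename[entry["dest"]]  # register file is current again
            self.head += 1
```

If the third of three issued instructions writes its result first, commit() changes nothing until the first instruction completes. If the second instruction then reports an exception, it is raised with only the first instruction's result in the architectural registers, even though the third result is already sitting in its slot.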