Author details
No connection to Sun Microsystems/Oracle. Worked on low-level mainframe and embedded-class kernels and a distributed microkernel for international hardware/microprocessor companies, and has extensive experience developing a proprietary ANSI C compiler toolchain for Intel, MIPS and Alpha processors. Also has experience with parallel computers and DFT for signal processing using CUDA/GPU. Now looking for new contracts and projects (primarily software) in GB, Europe or the US in the kernel, compiler, signal processing and parallel areas (as of Jun 2011). Contact: dmctek@gmail.com
Evolution to SST
o During a data-dependent stall (e.g. an L1 cache miss), enter execute-ahead mode and execute speculatively
Evolution to SST (contd.)
Speculative execution depends on =>
o Checkpointing
o Transactional memory
Ahead thread: executes instructions speculatively
Behind thread: executes instructions with resolved data dependencies
Advantages =>
o [+] Single-threaded software code is executed simultaneously from 2 different locations using hardware threads
o [+] Achieves MLP and ILP
o [-] Program locality works toward ensuring cache misses are kept to a minimum, or the prefetcher may be able to produce the result with a very low cycle latency
Hazards
Common to OO and SST
o Data
    RAW, WAR, WAW
o Control
    Branching, exceptions
The scheme must not break the effect of Total Store Ordering (the von Neumann/Turing ordering of the code). In other words, the results of the machine's dynamic scheduling of the code must not differ from those of the static program schedule.
SST
o RAW => Defers the instruction and any resolved operands in a deferred queue (DQ)
o WAR, WAW => Speculatively retired
Data hazards
Executing instructions out of order is problematic because potentially N versions of an operand are held in a finite set of registers. When does a register hold the correct value for the right instruction?
o RAW: a=5; a=10; b=a+1; b should be 11 not 6
o WAR: a=5; b=a+1; a=6; b should be 6 not 7
o WAW: b=5; b=50; b should be 50 not 5
SST
o Avoids RAW by using the NT bit and deferring the instruction
o Avoids WAR by saving resolved operands alongside the relevant instruction in the DQ
o Avoids WAW: the NT bit determines whether a result can update the ARF (architectural register file); if not, the WAW bit is set to prevent this, and the SRF register update may only be used for data forwarding
Reg[dest] = Reg[operand_1] || Reg[operand_n]
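The three data hazards can be reproduced in a few lines by running the same statements once in program order and once in a deliberately wrong order. This is only an illustrative sketch: the `run` helper and the tuple encoding of statements are hypothetical and have nothing to do with how the hardware represents instructions.

```python
def run(instrs, order):
    """Execute (dest, fn) pairs in the given index order over a register dict."""
    regs = {"a": 0, "b": 0}
    for i in order:
        dest, fn = instrs[i]
        regs[dest] = fn(regs)
    return regs

# RAW: b = a + 1 must read the value written by a = 10.
raw = [("a", lambda r: 5), ("a", lambda r: 10), ("b", lambda r: r["a"] + 1)]
raw_ok = run(raw, [0, 1, 2])["b"]    # program order
raw_bad = run(raw, [0, 2, 1])["b"]   # read hoisted above the write

# WAR: b = a + 1 must read a before a = 6 overwrites it.
war = [("a", lambda r: 5), ("b", lambda r: r["a"] + 1), ("a", lambda r: 6)]
war_ok = run(war, [0, 1, 2])["b"]    # program order
war_bad = run(war, [0, 2, 1])["b"]   # write hoisted above the read

# WAW: the last write in program order must be the one that survives.
waw = [("b", lambda r: 5), ("b", lambda r: 50)]
waw_ok = run(waw, [0, 1])["b"]       # program order
waw_bad = run(waw, [1, 0])["b"]      # older write lands last

print(raw_ok, raw_bad, war_ok, war_bad, waw_ok, waw_bad)
```

Each `*_ok` value is the result the static program schedule requires; each `*_bad` value is what a reordering that ignores the hazard would produce.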
Behind thread
o Loads set the cache line's S (speculatively read) bit (transactional memory support)
o If the cache logic evicts or invalidates a line with the S bit set, then ahead-thread speculation has failed for this episode
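The S-bit rule above can be sketched as a toy model that tracks nothing but the per-line S bit. The class and method names (`SpecCache`, `speculative_load`, `remove_line`, `end_episode`) are hypothetical, and the sketch assumes eviction and invalidation can be treated identically, which is all the slide states.

```python
class SpecCache:
    """Toy cache that tracks only the per-line S (speculatively read) bit."""

    def __init__(self):
        self.s_bit = {}                  # line address -> S bit
        self.speculation_failed = False

    def speculative_load(self, line):
        # A load issued during a speculative episode marks its cache line.
        self.s_bit[line] = True

    def remove_line(self, line):
        # Models both eviction and invalidation of a line.
        if self.s_bit.pop(line, False):
            self.speculation_failed = True

    def end_episode(self):
        # Clear all S bits and report whether the episode must be rolled back.
        failed = self.speculation_failed
        self.s_bit.clear()
        self.speculation_failed = False
        return failed


cache = SpecCache()
cache.speculative_load(0x40)
cache.remove_line(0x80)                  # line was never speculatively read
ok_so_far = not cache.speculation_failed
cache.remove_line(0x40)                  # S bit was set: speculation fails
failed = cache.end_episode()
```

Losing a line that was never speculatively read is harmless; losing one with the S bit set marks the whole episode as failed.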
Checkpoints
For N=2, 2 checkpoints are created at the start of an SST episode
o Architectural Checkpoint
    Initially active
    Once active, the ahead-thread progresses with speculative execution
When the deferred queue is empty for a speculative episode, a merge operation is performed
o Merge is Ahead-thread results + Behind-thread results => Architectural Checkpoint
    NT = SNT && W
    SNT and W bit vectors are cleared
    Architectural Checkpoint is discarded
    Speculative Checkpoint is made active, i.e. it becomes the new Architectural Checkpoint
When the deferred queue is empty for all speculative episodes, a join operation is performed
o Join is similar to Merge, except that nothing remains in the Deferred Queue and the speculative episode is ended, returning the Ahead-thread to Normal mode
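The basic checkpoint life cycle (snapshot at the start of an episode, roll back on failure, discard on a successful join) might be sketched as follows. This is a simplified single-checkpoint model under stated assumptions: the class `CheckpointedRegs` and its methods are hypothetical, and the NT/SNT/W bit-vector handling of the merge step is deliberately not modelled.

```python
class CheckpointedRegs:
    """Toy register file with one saved architectural checkpoint."""

    def __init__(self, regs):
        self.regs = dict(regs)     # working (possibly speculative) state
        self.checkpoint = None     # saved architectural state while speculating

    def start_episode(self):
        # Snapshot the architectural state before speculating.
        self.checkpoint = dict(self.regs)

    def write(self, reg, value):
        self.regs[reg] = value

    def fail(self):
        # Speculation failed: restore the saved architectural state.
        self.regs = self.checkpoint
        self.checkpoint = None

    def join(self):
        # Speculation succeeded: the speculative state becomes architectural.
        self.checkpoint = None


core = CheckpointedRegs({"r1": 0, "r2": 0})

core.start_episode()
core.write("r2", 0x400)
core.fail()
after_fail = dict(core.regs)       # speculative write has been undone

core.start_episode()
core.write("r2", 0x400)
core.join()
after_join = dict(core.regs)       # speculative write has been kept
```

The point of the sketch is only that a failed episode leaves no trace in the register state, while a joined episode retires its speculative results.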
2 Defer Queues
Hold instructions & resolved operands used by behind thread
SST logic
[Flowchart: SST logic. Node labels: L1 resolved; Instr has data dependencies?; DQ full?; DQ empty for current & spec ckpt?; Execute instr and retire OO; WAIT (more data expected); Arch checkpoint (Architectural: active, Speculative: inactive); Tx fail / S-bit detect / memory order violation; Br mispredict; Restore checkpoint; Speculation successful, program execution resumes where speculation finished]
SST scheduling
Program Order
    LDX addr1, %r1
    ADD %r1, 0x04, %r2
    STX %r2, addr2
    SETHI 0x01, %r2
    STX %r2, addr3
    etc..
SST Order
    LDX addr1, %r1
    ADD %r1, 0x04, %r2
    STX %r2, addr2
    SETHI 0x01, %r2
    STX %r2, addr3
    etc..
; Ahead-Thread
    1 LDX addr1, %r1    ; Load miss on addr1, defer and set R1[NT], to Defer Q
                        ; Checkpoint, start Ahead-Thread, Behind-Thread waits for data read
; Deferred Queue
    LDX addr1, %r1[NT]
    ADD
; Load miss resolves, start Behind-Thread
    6 ADD %r1, 0x04, %r2[NT=0,SNT=1]    ; NT was reset at 4, set WAW bit
    7 STX %r2, addr3
o RAW: Deferring data-dependent instructions prevents RAW; here %r2 was read at 3 but written before at 2
o WAR: Saving operands in the DQ prevents WAR, as any valid data in a register at that time is captured and saved for the Behind-Thread to use later, regardless of future writes by the Ahead-Thread
o WAW: Registers with the WAW bit are not committed to architectural state; here %r2 was written at 4 & 6
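The deferral, operand capture and WAW handling illustrated by this example can be sketched as a toy two-pass simulator over the five-instruction program. Everything here is an assumption made for illustration: the tuple encoding, the helper names (`sources`, `dest`, `execute`, `sst_run`), the loaded value 7, and the simplification to one ahead pass followed by one behind pass. Checkpoints, speculation failure and store buffering are not modelled, and registers are assumed to be written before they are read. SETHI is modelled as `imm << 10`, as in the SPARC instruction set.

```python
PROGRAM = [
    ("LDX", "addr1", "r1"),
    ("ADD", "r1", 0x04, "r2"),
    ("STX", "r2", "addr2"),
    ("SETHI", 0x01, "r2"),
    ("STX", "r2", "addr3"),
]

def sources(ins):
    """Register operands that an instruction reads."""
    return [ins[1]] if ins[0] in ("ADD", "STX") else []

def dest(ins):
    """Register that an instruction writes, or None for a store."""
    return None if ins[0] == "STX" else ins[-1]

def execute(ins, read, mem):
    """Return (register_result, store) for one instruction."""
    op = ins[0]
    if op == "LDX":
        return mem[ins[1]], None
    if op == "ADD":
        return read(ins[1]) + ins[2], None
    if op == "SETHI":
        return ins[1] << 10, None
    return None, (ins[2], read(ins[1]))          # STX

def sst_run(program, memory, missing):
    regs, nt, waw = {}, set(), set()             # values, NT bits, WAW bits
    dq, mem = [], dict(memory)                   # deferred queue, memory
    # Ahead thread: execute what is ready, defer what is not.
    for ins in program:
        miss = ins[0] == "LDX" and ins[1] in missing
        if miss or any(s in nt for s in sources(ins)):
            # Capture already-resolved operands so later writes cannot
            # clobber them (WAR protection).
            captured = {s: regs[s] for s in sources(ins) if s not in nt}
            dq.append((ins, captured))
            if dest(ins):
                nt.add(dest(ins))
                waw.discard(dest(ins))
            continue
        value, store = execute(ins, regs.__getitem__, mem)
        if dest(ins):
            if dest(ins) in nt:                  # overwrites a deferred result
                nt.discard(dest(ins))
                waw.add(dest(ins))
            regs[dest(ins)] = value
        if store:
            mem[store[0]] = store[1]
    # Behind thread: replay the deferred queue in order once the miss resolves.
    fwd = {}                                     # results forwarded within the DQ
    for ins, captured in dq:
        read = lambda r: captured[r] if r in captured else fwd[r]
        value, store = execute(ins, read, mem)
        if dest(ins):
            fwd[dest(ins)] = value
            if dest(ins) not in waw:             # WAW bit blocks the commit
                regs[dest(ins)] = value
        if store:
            mem[store[0]] = store[1]
    return regs, mem, [ins[0] for ins, _ in dq]

regs, mem, deferred = sst_run(PROGRAM, {"addr1": 7}, {"addr1"})
```

With the load to `addr1` missing, the LDX, ADD and first STX are deferred, while SETHI and the second STX execute ahead. On replay the ADD result is forwarded to the deferred store but is not committed to `%r2`, so the final state matches program order.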