MACHINE SPECIFICATION
================================
CPU: Intel Xeon E5410 at 2.33GHz
RAM: 8GB
GPU: NVIDIA Tesla C1060
================================

=========================================================================================
BSCallDemo
=========================================================================================
----------------------------------------------------
GPU PROPERTIES
Using NVIDIA Device: Tesla C1060
Compute Version: 1.3
Clock Rate: 1296MHz
Num. Processors: 30
Max Threads/Block: 512
Warp Size: 32 threads
Total Memory: 4095MB
Constant Memory: 64KB
Shared Memory: 16KB
Max Registers/Block: 16384
Concurrent Cpy & Exec: true
Concurrent Kernels: false
----------------------------------------------------
CPU initialization time = 2634.16ms
GPU initialization time = 0ms

Start of Auto-tuning: workPerThd = 5 to 50
Found new min: runtime=16.11686ms, workPerThd=5, thdsPerBlk=32, Est.Blks/SM=62.5
Found new min: runtime=15.50422ms, workPerThd=5, thdsPerBlk=64, Est.Blks/SM=31.26667
Found new min: runtime=15.03578ms, workPerThd=5, thdsPerBlk=224, Est.Blks/SM=8.933333
Found new min: runtime=15.00211ms, workPerThd=6, thdsPerBlk=64, Est.Blks/SM=26.06667
Found new min: runtime=14.83338ms, workPerThd=7, thdsPerBlk=64, Est.Blks/SM=22.33333
Found new min: runtime=14.71462ms, workPerThd=7, thdsPerBlk=288, Est.Blks/SM=4.966667
Found new min: runtime=14.44154ms, workPerThd=7, thdsPerBlk=480, Est.Blks/SM=3
Found new min: runtime=14.43638ms, workPerThd=8, thdsPerBlk=128, Est.Blks/SM=9.766666
Found new min: runtime=14.39926ms, workPerThd=8, thdsPerBlk=256, Est.Blks/SM=4.9
Found new min: runtime=14.1719ms, workPerThd=8, thdsPerBlk=320, Est.Blks/SM=3.933333
Found new min: runtime=14.09526ms, workPerThd=9, thdsPerBlk=384, Est.Blks/SM=2.9
Found new min: runtime=14.07254ms, workPerThd=10, thdsPerBlk=256, Est.Blks/SM=3.933333
Found new min: runtime=13.95731ms, workPerThd=12, thdsPerBlk=288, Est.Blks/SM=2.9
Found new min: runtime=13.83638ms, workPerThd=15, thdsPerBlk=96, Est.Blks/SM=6.966667
Found new min: runtime=13.4912ms, workPerThd=16, thdsPerBlk=160, Est.Blks/SM=3.933333
Found new min: runtime=13.33376ms, workPerThd=20, thdsPerBlk=128, Est.Blks/SM=3.933333
Found new min: runtime=13.29699ms, workPerThd=21, thdsPerBlk=480, Est.Blks/SM=1
Found new min: runtime=13.14422ms, workPerThd=29, thdsPerBlk=352, Est.Blks/SM=1
Found new min: runtime=13.09696ms, workPerThd=32, thdsPerBlk=160, Est.Blks/SM=1.966667
Found new min: runtime=13.01709ms, workPerThd=35, thdsPerBlk=96, Est.Blks/SM=3
Found new min: runtime=12.99821ms, workPerThd=35, thdsPerBlk=288, Est.Blks/SM=1
Auto-tuning complete
Tuning Statistics: nRuns=736 ave=17.61388 min=12.99821 max=31.88858 stdDev=3.52604

DESCRIPTION: Simple Black-Scholes path dynamics with deterministic term structures of interest and volatility. Uses NAG GPU device-level normal random number generators.
ALGTHM: Milstein (with caching), SINGLE precision
OPTION:
  Type = European CALL
  Maturity = 0.77
  Strike = 91
  S(0) = 100
  Time step = 0.006875
SIMULATION PARAMS:
  NumTrials = 300000
  ThdsPerBlk = 288
  WorkPerThd = 35
  Est.Blks/SM = 1
RESULTS:
  CPU Price = 18.93683472
  Std Error of CPU estimate = 0.01594193839
  GPU Price = 18.93683359
  Std Error of GPU estimate = 0.01594193652
  CPU runtime = 2787.199463ms - NOTE: this is single thread, unoptimised
  GPU runtime = 12.97686386ms
  Speedup = 214.7822113x
Attempting to match CPU values ...
RESULTS:
  CPU Price = 18.936834722
  Std Error of CPU estimate = 0.015941938385
  GPU Price = 18.936833593
  Std Error of GPU estimate = 0.015941936523

=========================================================================================
SobolBSCallDemo
=========================================================================================
----------------------------------------------------
GPU PROPERTIES
Using NVIDIA Device: Tesla C1060
Compute Version: 1.3
Clock Rate: 1296MHz
Num. Processors: 30
Max Threads/Block: 512
Warp Size: 32 threads
Total Memory: 4095MB
Constant Memory: 64KB
Shared Memory: 16KB
Max Registers/Block: 16384
Concurrent Cpy & Exec: true
Concurrent Kernels: false
----------------------------------------------------
CPU initialization time = 3622.8452148ms
GPU initialization time = 113.64419556ms

Start of Auto-tuning: workPerThd = 3 to 50
Found new min: runtime=4.134528ms, workPerThd=3, thdsPerBlk=32, Est.Blks/SM=112
Found new min: runtime=3.54352ms, workPerThd=3, thdsPerBlk=64, Est.Blks/SM=56.53333
Found new min: runtime=3.516064ms, workPerThd=3, thdsPerBlk=96, Est.Blks/SM=37.33333
Found new min: runtime=3.4832ms, workPerThd=4, thdsPerBlk=128, Est.Blks/SM=21.33333
Found new min: runtime=3.405408ms, workPerThd=5, thdsPerBlk=64, Est.Blks/SM=34.13334
Found new min: runtime=3.391776ms, workPerThd=5, thdsPerBlk=96, Est.Blks/SM=22.4
Found new min: runtime=3.29968ms, workPerThd=6, thdsPerBlk=64, Est.Blks/SM=28.8
Found new min: runtime=3.285152ms, workPerThd=8, thdsPerBlk=64, Est.Blks/SM=21.33333
Found new min: runtime=3.285056ms, workPerThd=8, thdsPerBlk=128, Est.Blks/SM=10.66667
Found new min: runtime=3.2696ms, workPerThd=12, thdsPerBlk=64, Est.Blks/SM=14.93333
Found new min: runtime=3.23504ms, workPerThd=16, thdsPerBlk=224, Est.Blks/SM=3.2
Found new min: runtime=3.133248ms, workPerThd=27, thdsPerBlk=128, Est.Blks/SM=3.2
Auto-tuning complete
Tuning Statistics: nRuns=768 ave=4.675205 min=3.133248 max=7.63744 stdDev=0.9379215

DESCRIPTION: Simple Black-Scholes path dynamics with deterministic term structures of interest and volatility. Uses NAG GPU quasi-random (Sobol) generator with optional scrambling and constructs sample paths using a Brownian bridge.
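The Sobol demos construct sample paths with a Brownian bridge. As a rough sketch of that idea (a hypothetical illustration, not NAG's implementation), the function below builds a standard Brownian path on a uniform grid of n = 2^m steps: the first normal fixes the terminal value W(T), and the remaining normals fill in midpoints level by level, each drawn from its conditional distribution given the two already-known endpoints. The function name brownian_bridge and the power-of-two grid are assumptions made for illustration. In quasi-Monte Carlo use, the point of this ordering is that the first, best-distributed Sobol dimensions drive the largest-scale features of the path.

```python
import math
import random
from typing import List, Sequence


def brownian_bridge(z: Sequence[float], t_end: float) -> List[float]:
    """Build W(t_1), ..., W(t_n) on the uniform grid t_k = k * t_end / n.

    Hypothetical helper. z holds n = len(z) standard normals (n must be a
    power of two). z[0] fixes the terminal value W(t_end); the remaining
    entries fill midpoints level by level, coarse to fine."""
    n = len(z)
    if n == 0 or n & (n - 1):
        raise ValueError("len(z) must be a power of two")
    dt = t_end / n
    w = [0.0] * (n + 1)          # w[k] = W(k * dt), with W(0) = 0
    w[n] = math.sqrt(t_end) * z[0]
    idx = 1                      # index of the next unused normal
    step = n                     # grid distance between known endpoints
    while step > 1:
        half = step // 2
        # Conditional variance of the midpoint given both endpoints:
        # (half*dt) * (half*dt) / (step*dt) = half * dt / 2
        sd = math.sqrt(half * dt / 2.0)
        for left in range(0, n, step):
            mid, right = left + half, left + step
            # Conditional mean on a uniform grid is the endpoint average
            w[mid] = 0.5 * (w[left] + w[right]) + sd * z[idx]
            idx += 1
        step = half
    return w[1:]


if __name__ == "__main__":
    rng = random.Random(1)
    # Pseudo-random normals stand in for inverse-transformed Sobol points here.
    path = brownian_bridge([rng.gauss(0.0, 1.0) for _ in range(8)], 0.77)
    print([round(x, 4) for x in path])
```

Because every normal is used exactly once (1 for the endpoint plus 1 + 2 + ... + n/2 for the midpoints), a path of n steps consumes exactly n Sobol dimensions, the same count as incremental (forward) path construction.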
ALGTHM: Milstein (with caching) with OWEN scrambling, SINGLE precision
OPTION:
  Type = European CALL
  Maturity = 0.77
  Strike = 91
  S(0) = 100
SIMULATION PARAMS:
  NumTrials = 10001
  ThdsPerBlk = 128
  WorkPerThd = 27
  Est.Blks/SM = 3.2
RESULTS:
  CPU Price = 18.94181396
  Std Error of CPU estimate = 0.0009396008682
  GPU Price = 18.94181344
  Std Error of GPU estimate = 0.0009396178648
  CPU runtime = 3854.480713ms - NOTE: this is single thread, unoptimised
  GPU runtime = 116.8228836ms
  Speedup = 32.99422836x

=========================================================================================
MultiOptionSobol
=========================================================================================
----------------------------------------------------
GPU PROPERTIES
Using NVIDIA Device: Tesla C1060
Compute Version: 1.3
Clock Rate: 1296MHz
Num. Processors: 30
Max Threads/Block: 512
Warp Size: 32 threads
Total Memory: 4095MB
Constant Memory: 64KB
Shared Memory: 16KB
Max Registers/Block: 16384
Concurrent Cpy & Exec: true
Concurrent Kernels: false
----------------------------------------------------
CPU initialization time = 224.861496ms
GPU initialization time = 2.131583929ms

Start of Auto-tuning: workPerThd = 5 to 50
Found new min: runtime=2.20896ms, workPerThd=5, thdsPerBlk=32, Est.Blks/SM=25.2
Found new min: runtime=1.242176ms, workPerThd=5, thdsPerBlk=64, Est.Blks/SM=12.6
Found new min: runtime=1.239616ms, workPerThd=5, thdsPerBlk=160, Est.Blks/SM=5.2
Found new min: runtime=1.160064ms, workPerThd=7, thdsPerBlk=192, Est.Blks/SM=3
Auto-tuning complete
Tuning Statistics: nRuns=736 ave=2.559934 min=1.160064 max=4.821408 stdDev=0.9842071

DESCRIPTION: Simple Black-Scholes path dynamics with deterministic term structures of interest and volatility. Prices multiple European Call options with different parameters in parallel. Uses NAG GPU quasi-random (Sobol) generator, and constructs sample paths using a Brownian bridge.
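Every demo reports ALGTHM: Milstein. For geometric Brownian motion dS = r S dt + sigma S dW, one Milstein step is S <- S + r S dt + sigma S dW + 0.5 sigma^2 S (dW^2 - dt), i.e. the Euler step plus a correction term from the derivative of the diffusion coefficient. The sketch below is a plain single-threaded Monte Carlo pricer for a European call built on that step, with the Black-Scholes closed form as a cross-check. It assumes constant r and sigma, whereas the demos use deterministic term structures that the log does not list; the values r = 0.05 and sigma = 0.2 are hypothetical, so the sketch will not reproduce the prices logged here. The "caching" noted in the log is not modelled, and the function names are illustrative.

```python
import math
import random
from typing import Tuple


def norm_cdf(x: float) -> float:
    # Standard normal CDF via the complementary error function
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def bs_call(s0: float, k: float, r: float, sigma: float, t: float) -> float:
    # Closed-form Black-Scholes price of a European call (constant r, sigma)
    sqrt_t = math.sqrt(t)
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma * sigma) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return s0 * norm_cdf(d1) - k * math.exp(-r * t) * norm_cdf(d2)


def milstein_call(s0: float, k: float, r: float, sigma: float, t: float,
                  n_steps: int, n_paths: int, seed: int = 0) -> Tuple[float, float]:
    """Monte Carlo price and standard error of a European call, stepping
    dS = r S dt + sigma S dW with the Milstein scheme (single-threaded)."""
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        s = s0
        for _ in range(n_steps):
            dw = sqrt_dt * rng.gauss(0.0, 1.0)
            # Euler drift + diffusion, plus Milstein correction 0.5*sigma^2*S*(dW^2 - dt)
            s += r * s * dt + sigma * s * dw + 0.5 * sigma * sigma * s * (dw * dw - dt)
        payoff = max(s - k, 0.0)
        total += payoff
        total_sq += payoff * payoff
    disc = math.exp(-r * t)
    mean = total / n_paths
    var = (total_sq / n_paths - mean * mean) * n_paths / (n_paths - 1)
    return disc * mean, disc * math.sqrt(var / n_paths)


if __name__ == "__main__":
    # Hypothetical market data; the demos' term structures are not in the log.
    r, sigma = 0.05, 0.2
    price, se = milstein_call(100.0, 91.0, r, sigma, 0.77, n_steps=112, n_paths=10000, seed=42)
    ref = bs_call(100.0, 91.0, r, sigma, 0.77)
    print(f"MC price = {price:.4f} (std error {se:.4f}), closed form = {ref:.4f}")
```

With Maturity = 0.77 and Time step = 0.006875 as in BSCallDemo, the path has 0.77 / 0.006875 = 112 steps, which is why n_steps = 112 is used above.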
ALGTHM: Milstein (with caching), SINGLE precision
OPTION:
  OPTION 1 TYPE=CALL S0=100 K=100 T=1.7
  OPTION 2 TYPE=CALL S0=100 K=100 T=1.7
  OPTION 3 TYPE=CALL S0=100 K=70 T=1.5
  OPTION 4 TYPE=CALL S0=100 K=100 T=1
  OPTION 5 TYPE=CALL S0=100 K=100 T=1.7
  OPTION 6 TYPE=CALL S0=100 K=70 T=1.2
SIMULATION PARAMS:
  NumTrials = 20001
  ThdsPerBlk = 192
  WorkPerThd = 7
  Est.Blks/SM = 3
RESULTS:
             OPTION 1     OPTION 2     OPTION 3     OPTION 4     OPTION 5     OPTION 6
  CPUPrice:  19.30513348  26.8460519   37.66472181  10.084801    32.11863009  31.67306755
  GPUPrice:  19.30513196  26.84604864  37.66471832  10.0848006   32.11862668  31.67368585
  CPU runtime = 350.6978455ms - NOTE: this is single thread, unoptimised
  GPU runtime = 3.570208073ms
  Speedup = 98.22896576x
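Two of the derived figures in these logs can be recomputed from the raw numbers. Speedup is CPU runtime divided by GPU runtime (for BSCallDemo, 2787.199463 / 12.97686386 = 214.78). For BSCallDemo, every Est.Blks/SM value in the tuning log is also reproduced by ceil(NumTrials / (workPerThd * thdsPerBlk)) / 30, where 30 is the reported Num. Processors. That formula is inferred from the log rather than taken from NAG documentation, and it does not reproduce the Sobol demos' values, which presumably size their launch grids differently. The function names below are hypothetical.

```python
NUM_SMS = 30  # "Num. Processors" reported for the Tesla C1060


def est_blocks_per_sm(num_trials: int, work_per_thd: int, thds_per_blk: int,
                      num_sms: int = NUM_SMS) -> float:
    # Inferred (not documented) rule: enough blocks to cover all trials,
    # rounded up to a whole block, averaged over the multiprocessors.
    per_block = work_per_thd * thds_per_blk
    blocks = -(-num_trials // per_block)  # integer ceiling division
    return blocks / num_sms


def speedup(cpu_ms: float, gpu_ms: float) -> float:
    # Ratio of single-threaded CPU runtime to GPU runtime
    return cpu_ms / gpu_ms


if __name__ == "__main__":
    # (workPerThd, thdsPerBlk, logged Est.Blks/SM) triples from BSCallDemo
    for w, t, logged in [(5, 32, 62.5), (7, 64, 22.33333), (35, 288, 1.0)]:
        print(w, t, round(est_blocks_per_sm(300000, w, t), 5), logged)
    print(round(speedup(2787.199463, 12.97686386), 4))
```

The ceiling is taken once over workPerThd * thdsPerBlk; this gives the same block count as first rounding up the thread count and then rounding up the block count.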