Profiling Results
Benchmark | Cosimulation Cycles | LightningSim Cycles | OmniSim Cycles | Cosimulation Runtime (in s) | LightningSim Runtime (in s) | OmniSim Runtime (in s) | LightningSim Speedup (vs. Cosim) | OmniSim Speedup (vs. Cosim) | OmniSim Speedup (vs. LightningSim) |
---|---|---|---|---|---|---|---|---|---|
Fixed-point square root | 30 | 30 | 30 | 27.25 | 4.97 | 3.65 | 5.48× | 7.47× | 1.36× |
FIR filter | 172 | 172 | 172 | 20.12 | 2.43 | 1.94 | 8.23× | 10.37× | 1.25× |
Fixed-point window conv | 35 | 35 | 35 | 28.30 | 3.69 | 3.15 | 7.67× | 8.98× | 1.17× |
Floating-point conv | 35 | 35 | 35 | 49.78 | 2.42 | 2.46 | 20.57× | 20.24× | 0.98× |
Arbitrary Precision ALU | 36 | 36 | 36 | 24.17 | 2.12 | 2.03 | 11.40× | 11.91× | 1.04× |
Parallel loops | 32 | 32 | 32 | 26.81 | 2.34 | 2.16 | 11.48× | 12.41× | 1.08× |
Imperfect loops | 34 | 34 | 34 | 25.80 | 2.24 | 2.13 | 11.52× | 12.11× | 1.05× |
Loop with max bound | 31 | 31 | 31 | 24.76 | 2.25 | 2.14 | 11.01× | 11.57× | 1.05× |
Perfect nested loops | 406 | 406 | 406 | 24.76 | 2.27 | 2.12 | 10.91× | 11.68× | 1.07× |
Pipelined nested loops | 405 | 405 | 405 | 24.92 | 2.23 | 2.19 | 11.18× | 11.38× | 1.02× |
Sequential accumulators | 32 | 32 | 32 | 26.59 | 2.29 | 2.20 | 11.61× | 12.09× | 1.04× |
Accumulators + asserts | 33 | 33 | 33 | 27.13 | 2.30 | 2.30 | 11.80× | 11.80× | 1.00× |
Accumulators + dataflow | 31 | 31 | 32 | 27.26 | 2.29 | 2.19 | 11.90× | 12.45× | 1.05× |
Static memory example | 66 | 66 | 66 | 33.23 | 2.18 | 2.12 | 15.24× | 15.67× | 1.03× |
Pointer casting example | 408 | 408 | 408 | 32.55 | 2.15 | 2.13 | 15.14× | 15.28× | 1.01× |
Double pointer example | 25 | 25 | 25 | 31.70 | 2.14 | 1.91 | 14.81× | 16.60× | 1.12× |
AXI-4 master | 178 | 177 | 177 | 21.06 | 2.19 | 2.07 | 9.62× | 10.17× | 1.06× |
AXIS w/o side channel | 52 | 51 | 51 | 19.12 | 2.06 | 1.94 | 9.28× | 9.86× | 1.06× |
Multiple array access | 252 | 252 | 252 | 24.32 | 2.18 | 2.08 | 11.16× | 11.69× | 1.05× |
Resolved array access | 131 | 131 | 131 | 24.36 | 2.20 | 2.05 | 11.07× | 11.88× | 1.07× |
URAM with ECC | 115 | 115 | 115 | 22.07 | 2.21 | 2.05 | 9.99× | 10.77× | 1.08× |
Fixed-point Hamming | 259 | 259 | 259 | 33.28 | 2.37 | 2.46 | 14.04× | 13.53× | 0.96× |
Unoptimized FFT | 261781 | 261150 | 261150 | 153.53 | 2.78 | 2.91 | 55.23× | 52.76× | 0.96× |
Multi-stage FFT | 3770 | 3772 | 3722 | 61.43 | 2.67 | 2.93 | 23.01× | 20.97× | 0.92× |
Huffman encoding | 10283 | 10272 | 10272 | 46.89 | 2.63 | 2.32 | 17.83× | 20.21× | 1.13× |
Matrix Multiplication | 1036 | 1036 | 1036 | 26.33 | 2.61 | 2.59 | 10.09× | 10.17× | 1.01× |
Parallelized merge sort | 131 | 131 | 131 | 48.79 | 2.27 | 2.15 | 21.49× | 22.69× | 1.06× |
Vector add with stream | 4261 | 4261 | 4261 | 27.21 | 4.48 | 3.56 | 6.07× | 7.64× | 1.26× |
FlowGNN GIN | 260359 | 260337 | 260337 | 4219.85 | 28.90 | 11.97 | 146.02× | 352.53× | 2.41× |
FlowGNN GCN | 112836 | 112561 | 112561 | 534.33 | 30.90 | 17.18 | 17.29× | 31.10× | 1.80× |
FlowGNN GAT | 17282 | 17282 | 17282 | 838.24 | 41.60 | 24.60 | 20.15× | 34.07× | 1.69× |
FlowGNN PNA | 344206 | 344206 | 344206 | 3285.45 | 30.50 | 29.00 | 107.72× | 113.29× | 1.05× |
FlowGNN DGN | 110710 | 110710 | 110710 | 996.13 | 26.90 | 11.71 | 37.03× | 85.07× | 2.30× |
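
The three speedup columns appear to be plain ratios of the runtime columns. The sketch below reproduces them for the "Fixed-point square root" row. The helper name `speedup` is an illustrative choice, not part of LightningSim or OmniSim, and rounding to two decimals is an assumption about how the table was produced.

```python
def speedup(baseline_s: float, candidate_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    if candidate_s <= 0:
        raise ValueError("candidate runtime must be positive")
    return baseline_s / candidate_s


# Runtimes in seconds, copied from the "Fixed-point square root" row.
cosim_s, lightningsim_s, omnisim_s = 27.25, 4.97, 3.65

# Assumption: table entries are the ratios rounded to two decimals.
row_speedups = {
    "LightningSim vs. Cosim": round(speedup(cosim_s, lightningsim_s), 2),
    "OmniSim vs. Cosim": round(speedup(cosim_s, omnisim_s), 2),
    "OmniSim vs. LightningSim": round(speedup(lightningsim_s, omnisim_s), 2),
}

for label, value in row_speedups.items():
    print(f"{label}: {value:.2f}x")
```

A value below 1.00× in the last column means OmniSim took longer than LightningSim on that benchmark.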
Important Takeaways!
The results above are reported for version 2021.1 of the Vitis Development Suite.
The timing estimates provided by both LightningSim and OmniSim are 99.9% accurate with respect to the results from C/RTL Co-simulation.
LightningSim achieves a speedup of up to 146.02× over C/RTL Co-simulation.
OmniSim achieves a speedup of up to 352.53× over C/RTL Co-simulation.
Both LightningSim and OmniSim support incremental design space exploration, which yields a speedup of up to 577× over C/RTL Co-simulation.
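To compare a simulator's cycle estimate with co-simulation for a single benchmark, one option is the relative cycle-count error. This is a minimal sketch that assumes accuracy is measured this way. The source does not state how its accuracy figure is computed or aggregated, and the function name `relative_cycle_error` is hypothetical.

```python
def relative_cycle_error(cosim_cycles: int, estimated_cycles: int) -> float:
    """Return |estimate - reference| / reference, using cosim as reference."""
    if cosim_cycles <= 0:
        raise ValueError("reference cycle count must be positive")
    return abs(estimated_cycles - cosim_cycles) / cosim_cycles


# Cycle counts copied from the "FlowGNN GIN" row of the table.
cosim_cycles = 260359
omnisim_cycles = 260337

error = relative_cycle_error(cosim_cycles, omnisim_cycles)
print(f"relative error: {error:.4%}")
```

Applying the same function to every row gives a per-benchmark view of where the estimates match co-simulation exactly and where they differ by a few cycles.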