Performance
Proven Results at Scale
Benchmark data measured with Vitis v2021.1. OmniSim runs up to 352× faster than traditional C/RTL co-simulation (LightningSim up to 146×), with 99.9% cycle-count accuracy.
Peak speedup · OmniSim vs co-sim: 352×
Cycle-count accuracy: 99.9%
Benchmarks tested: 30+
Visualization
Runtime Performance Comparison
Actual runtimes across representative benchmarks showing dramatic acceleration vs traditional C/RTL co-simulation.
Benchmark Breakdown
Performance Across Workload Categories
Consistent speedups across DSP kernels, loop structures, and complex AI/ML workloads.
DSP & Mathematical Operations
Fixed-point Square Root
FIR Filter
Window Convolution
Floating-point Conv
Arbitrary Precision ALU
Loop & Control Flow Operations
Parallel Loops
Imperfect Loops
Pipelined Nested Loops
AI/ML & Complex Workloads
FlowGNN — GIN
Graph neural network · 260K cycles
FlowGNN — DGN
Directed graph neural network
Analysis
Key Takeaways
99.9% Accuracy Across All Benchmarks
Cycle-count estimates from LightningSim and OmniSim match C/RTL co-simulation results to within 0.1% on every tested workload.
5–55× on Standard Kernels
DSP filters, convolutions, and FFT operations see speedups from roughly 5× to 55× in the results table, turning minute-long simulations into seconds.
Up to 352× on Complex AI/ML Workloads
Graph neural network models (FlowGNN) show the largest gains: on FlowGNN GIN, OmniSim reduces a 70-minute co-simulation to 12 seconds (352×), and on FlowGNN DGN it reaches 85×.
Up to 577× with Design Space Exploration
Combined with incremental DSE, total workflow acceleration reaches 577× — enabling rapid FIFO sizing and parameter sweeps.
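As a rough illustration of what the 0.1% accuracy bound in the first takeaway means in practice, the Python sketch below checks an estimated cycle count against a co-simulation reference. The relative-error definition, the function names, and the sample estimates are assumptions made for illustration; the page does not state how accuracy is computed. The reference value is the approximate 260K-cycle figure quoted for FlowGNN GIN.

```python
def relative_error(estimated_cycles: int, reference_cycles: int) -> float:
    """Relative cycle-count error against the co-simulation reference.

    Assumed definition: |estimate - reference| / reference.
    """
    return abs(estimated_cycles - reference_cycles) / reference_cycles


def within_tolerance(estimated_cycles: int, reference_cycles: int,
                     tolerance: float = 0.001) -> bool:
    """True if the estimate is within `tolerance` (0.1% by default)."""
    return relative_error(estimated_cycles, reference_cycles) <= tolerance


# Approximate GIN cycle count quoted on this page ("260K cycles").
REFERENCE_CYCLES = 260_000

# Largest absolute deviation that still satisfies a 0.1% bound.
max_deviation = REFERENCE_CYCLES * 0.001

# Hypothetical estimates, not measured data.
close_estimate = 259_900   # off by 100 cycles
far_estimate = 259_000     # off by 1,000 cycles

print(f"allowed deviation: about {max_deviation:.0f} cycles")
print("close estimate ok:", within_tolerance(close_estimate, REFERENCE_CYCLES))
print("far estimate ok:  ", within_tolerance(far_estimate, REFERENCE_CYCLES))
```

Under this definition, a 0.1% bound on a design of roughly 260K cycles allows the estimate to differ from co-simulation by about 260 cycles.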
Raw Data
Complete Benchmark Results
Results from Vitis v2021.1: runtimes in seconds and speedups relative to C/RTL co-simulation.
| Benchmark | Co-sim (s) | LightningSim (s) | OmniSim (s) | LightningSim Speedup | OmniSim Speedup |
|---|---|---|---|---|---|
| Fixed-point Square Root | 27.25 | 4.97 | 3.65 | 5.48× | 7.47× |
| FIR Filter | 20.12 | 2.43 | 1.94 | 8.23× | 10.37× |
| Window Convolution | 28.30 | 3.69 | 3.15 | 7.67× | 8.98× |
| Floating-point Conv | 49.78 | 2.42 | 2.46 | 20.57× | 20.24× |
| Unoptimized FFT | 153.53 | 2.78 | 2.91 | 55.23× | 52.76× |
| FlowGNN — GIN | 4219.85 | 28.90 | 11.97 | 146.02× | 352.53× |
| FlowGNN — DGN | 996.13 | 26.90 | 11.71 | 37.03× | 85.07× |
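The speedup columns appear to be the co-simulation runtime divided by each simulator's runtime. The Python sketch below recomputes them from the runtimes in the table; the dictionary layout and function name are illustrative. Because the table lists runtimes rounded to two decimals, a recomputed value can differ from the published one in the last digits (the FIR Filter LightningSim entry is one example), presumably because the published figures were derived from unrounded measurements.

```python
# Runtimes in seconds, copied from the table above:
# benchmark -> (co-simulation, LightningSim, OmniSim)
RUNTIMES = {
    "Fixed-point Square Root": (27.25, 4.97, 3.65),
    "FIR Filter": (20.12, 2.43, 1.94),
    "Window Convolution": (28.30, 3.69, 3.15),
    "Floating-point Conv": (49.78, 2.42, 2.46),
    "Unoptimized FFT": (153.53, 2.78, 2.91),
    "FlowGNN GIN": (4219.85, 28.90, 11.97),
    "FlowGNN DGN": (996.13, 26.90, 11.71),
}


def speedup(cosim_seconds: float, simulator_seconds: float) -> float:
    """Speedup of a simulator relative to C/RTL co-simulation."""
    return cosim_seconds / simulator_seconds


for name, (cosim, lightningsim, omnisim) in RUNTIMES.items():
    print(
        f"{name:<24}"
        f" LightningSim {speedup(cosim, lightningsim):7.2f}x"
        f"  OmniSim {speedup(cosim, omnisim):7.2f}x"
    )

# The GIN co-simulation runtime expressed in minutes (about 70).
gin_minutes = RUNTIMES["FlowGNN GIN"][0] / 60
print(f"FlowGNN GIN co-simulation: {gin_minutes:.1f} minutes")
```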
30+ benchmarks tested across DSP operations, loop structures, memory access patterns, and AI/ML workloads.
Ready to Accelerate Your Workflow?
Replace hour-long co-simulations with runs that finish in seconds. Get started with LightningSim or OmniSim.