Performance

Proven Results at Scale

Real benchmark data from Vitis v2021.1. LightningSim and OmniSim deliver up to 352× acceleration over traditional C/RTL co-simulation — with 99.9% cycle-count accuracy.

352 ×

Peak speedup · OmniSim vs co-sim

99.9 %

Cycle-count accuracy

30+

Benchmarks tested

Visualization

Runtime Performance Comparison

Actual runtimes across representative benchmarks showing dramatic acceleration vs traditional C/RTL co-simulation.

Benchmark Breakdown

Performance Across Workload Categories

Consistent speedups across DSP kernels, loop structures, and complex AI/ML workloads.

DSP & Mathematical Operations

Fixed-point Square Root

7.47 × vs Cosim

FIR Filter

10.37 × vs Cosim

Window Convolution

8.98 × vs Cosim

Floating-point Conv

20.24 × vs Cosim

Arbitrary Precision ALU

11.91 × vs Cosim

Loop & Control Flow Operations

Parallel Loops

12.41 × OmniSim vs Cosim

Imperfect Loops

12.11 × OmniSim vs Cosim

Pipelined Nested Loops

11.38 × OmniSim vs Cosim

AI/ML & Complex Workloads

FlowGNN — GIN

Graph neural network · 260K cycles

352.53 × OmniSim vs Cosim

FlowGNN — DGN

Directed graph neural network

85.07 × OmniSim vs Cosim

Analysis

Key Takeaways

99.9% Accuracy Across All Benchmarks

Cycle-count estimates from LightningSim and OmniSim match C/RTL co-simulation results to within 0.1% on every tested workload.
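As a rough illustration of what "within 0.1%" means, the sketch below treats accuracy as one minus the relative cycle-count error against the co-simulation reference. This definition is an assumption; the page does not state the exact formula used. The function names and the example cycle counts are hypothetical and are not measured results.

```python
def relative_error(estimated_cycles: int, reference_cycles: int) -> float:
    """Relative cycle-count error of an estimate against the reference."""
    if reference_cycles <= 0:
        raise ValueError("reference_cycles must be positive")
    return abs(estimated_cycles - reference_cycles) / reference_cycles


def cycle_accuracy(estimated_cycles: int, reference_cycles: int) -> float:
    """Accuracy defined (by assumption) as 1 minus the relative error."""
    return 1.0 - relative_error(estimated_cycles, reference_cycles)


def within_tolerance(estimated_cycles: int, reference_cycles: int,
                     tolerance: float = 0.001) -> bool:
    """True when the estimate is within `tolerance` (0.1% by default)."""
    return relative_error(estimated_cycles, reference_cycles) <= tolerance


# Hypothetical cycle counts for illustration only; not measured data.
reference = 260_000  # stands in for a co-simulation cycle count
estimate = 259_870   # stands in for a fast-simulator estimate
print(f"accuracy: {cycle_accuracy(estimate, reference):.2%}")
print("within 0.1%:", within_tolerance(estimate, reference))
```

With these placeholder numbers the estimate is off by 130 cycles out of 260,000, a relative error of 0.05%, so it passes the 0.1% check.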

Roughly 5–55× on Standard Kernels

DSP filters, convolutions, and FFT operations run about 5× to 55× faster than co-simulation in the results table, cutting runs of 20 to 150 seconds down to roughly 2 to 5 seconds.

Up to 352× on Complex AI/ML Workloads

Graph neural network models (FlowGNN) show the largest gains: OmniSim cuts the 70-minute GIN co-simulation (4,219.85 s) to about 12 seconds, a 352× speedup, and runs DGN 85× faster.

Up to 577× with Design Space Exploration

Combined with incremental DSE, total workflow acceleration reaches 577× — enabling rapid FIFO sizing and parameter sweeps.
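One way to reason about workflow-level speedup in a parameter sweep is an amortization model: co-simulation reruns in full for every configuration, while an incremental flow pays for one full run and then a cheaper update per additional configuration. The sketch below is only such a model, not a description of how the 577× figure was obtained. The function name, the model itself, and all timings are hypothetical placeholders.

```python
def sweep_speedup(n_configs: int, cosim_s: float,
                  first_run_s: float, incremental_s: float) -> float:
    """Speedup of an incremental sweep over rerunning co-simulation each time.

    Assumed model: baseline cost is n_configs full co-simulations; the
    incremental flow costs one full run plus a cheaper re-run for each
    remaining configuration.
    """
    if n_configs < 1:
        raise ValueError("n_configs must be at least 1")
    baseline_s = n_configs * cosim_s
    incremental_total_s = first_run_s + (n_configs - 1) * incremental_s
    return baseline_s / incremental_total_s


# Hypothetical timings in seconds, chosen only to show the trend.
for n in (1, 10, 100):
    x = sweep_speedup(n, cosim_s=600.0, first_run_s=6.0, incremental_s=2.0)
    print(f"{n:>3} configurations: {x:.1f}x")
```

Under this model the speedup grows with the number of configurations and approaches `cosim_s / incremental_s`, which is why a sweep can show a larger overall gain than any single run.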

Raw Data

Complete Benchmark Results

Selected results from Vitis v2021.1: runtimes in seconds and speedups over C/RTL co-simulation.

Benchmark	Cosim (s)	LightningSim (s)	OmniSim (s)	LS Speedup	OS Speedup
Fixed-point Square Root	27.25	4.97	3.65	5.48×	7.47×
FIR Filter	20.12	2.43	1.94	8.23×	10.37×
Window Convolution	28.30	3.69	3.15	7.67×	8.98×
Floating-point Conv	49.78	2.42	2.46	20.57×	20.24×
Unoptimized FFT	153.53	2.78	2.91	55.23×	52.76×
FlowGNN — GIN	4219.85	28.90	11.97	146.02×	352.53×
FlowGNN — DGN	996.13	26.90	11.71	37.03×	85.07×
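The speedup columns can be rechecked from the runtime columns as co-simulation time divided by simulator time. The sketch below does that with the runtimes copied from the table above; the dictionary layout and function names are illustrative choices. Recomputed values can differ from the published columns in the last digits, presumably because the listed runtimes are rounded to two decimals.

```python
# Runtimes in seconds copied from the table: (Cosim, LightningSim, OmniSim).
RUNTIMES = {
    "Fixed-point Square Root": (27.25, 4.97, 3.65),
    "FIR Filter": (20.12, 2.43, 1.94),
    "Window Convolution": (28.30, 3.69, 3.15),
    "Floating-point Conv": (49.78, 2.42, 2.46),
    "Unoptimized FFT": (153.53, 2.78, 2.91),
    "FlowGNN GIN": (4219.85, 28.90, 11.97),
    "FlowGNN DGN": (996.13, 26.90, 11.71),
}


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Speedup as baseline runtime divided by accelerated runtime."""
    return baseline_s / accelerated_s


def recompute_speedups(runtimes: dict) -> dict:
    """Map each benchmark to (LightningSim speedup, OmniSim speedup)."""
    return {
        name: (speedup(cosim, ls), speedup(cosim, omni))
        for name, (cosim, ls, omni) in runtimes.items()
    }


results = recompute_speedups(RUNTIMES)
for name, (ls_x, omni_x) in results.items():
    print(f"{name:<26} LS {ls_x:7.2f}x   OS {omni_x:7.2f}x")
```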

30+ benchmarks tested across DSP operations, loop structures, memory access patterns, and AI/ML workloads.

Ready to Accelerate Your Workflow?

Cut hour-long co-simulations down to seconds. Get started with LightningSim or OmniSim.