Performance

Proven Results at Scale

Real benchmark data from Vitis v2021.1. LightningSim and OmniSim deliver up to 352× acceleration over traditional C/RTL co-simulation — with 99.9% cycle-count accuracy.

352× peak speedup (OmniSim vs co-sim)

99.9% cycle-count accuracy

30+ benchmarks tested

Visualization

Runtime Performance Comparison

Actual runtimes across representative benchmarks showing dramatic acceleration vs traditional C/RTL co-simulation.

Benchmark Breakdown

Performance Across Workload Categories

Consistent speedups across DSP kernels, loop structures, and complex AI/ML workloads.

DSP & Mathematical Operations

Fixed-point Square Root: 7.47× vs Cosim
FIR Filter: 10.37× vs Cosim
Window Convolution: 8.98× vs Cosim
Floating-point Conv: 20.24× vs Cosim
Arbitrary Precision ALU: 11.91× vs Cosim

Loop & Control Flow Operations

Parallel Loops: 12.41× OmniSim
Imperfect Loops: 12.11× OmniSim
Pipelined Nested Loops: 11.38× OmniSim

AI/ML & Complex Workloads

FlowGNN GIN (graph neural network, 260K cycles): 352.53× OmniSim vs Cosim
FlowGNN DGN (directed graph neural network): 85.07× OmniSim vs Cosim

Analysis

Key Takeaways

99.9% Accuracy Across All Benchmarks

Cycle-count estimates from LightningSim and OmniSim match C/RTL co-simulation results to within 0.1% on every tested workload.

Roughly 5–55× on Standard Kernels

DSP filters, convolutions, and FFT operations see speedups from about 5× (fixed-point square root on LightningSim) to about 55× (unoptimized FFT), turning simulations that take tens of seconds to minutes into a few seconds.

Up to 352× on Complex AI/ML Workloads

Graph neural network models (FlowGNN) show the largest gains: on FlowGNN GIN, OmniSim cuts a roughly 70-minute co-simulation (4219.85 s) to about 12 seconds (11.97 s).

Up to 577× with Design Space Exploration

Combined with incremental DSE, total workflow acceleration reaches 577× — enabling rapid FIFO sizing and parameter sweeps.
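
One way to read the 99.9% accuracy figure above is as a relative cycle-count error of at most 0.1% against co-simulation. The sketch below assumes that definition, which this page does not spell out; the function name and the example cycle counts are hypothetical and are not measured data.

```python
def cycle_accuracy(estimated_cycles: int, cosim_cycles: int) -> float:
    """Return accuracy in percent, ASSUMING accuracy = 100 * (1 - relative error).

    This definition is an assumption made for illustration; the page itself
    only states that estimates match co-simulation to within 0.1%.
    """
    if cosim_cycles <= 0:
        raise ValueError("cosim_cycles must be positive")
    relative_error = abs(estimated_cycles - cosim_cycles) / cosim_cycles
    return 100.0 * (1.0 - relative_error)


# Hypothetical cycle counts, chosen only to illustrate the calculation.
accuracy = cycle_accuracy(estimated_cycles=999_200, cosim_cycles=1_000_000)
print(f"accuracy: {accuracy:.2f}%")
```

Under this reading, a simulator meets the stated bound on a workload when the returned value is at least 99.9.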

Raw Data

Complete Benchmark Results

Runtimes in seconds and speedups over co-simulation, measured with Vitis v2021.1. LS = LightningSim, OS = OmniSim.

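| Benchmark               | Cosim (s) | LightningSim (s) | OmniSim (s) | LS Speedup | OS Speedup |
|-------------------------|-----------|------------------|-------------|------------|------------|
| Fixed-point Square Root | 27.25     | 4.97             | 3.65        | 5.48×      | 7.47×      |
| FIR Filter              | 20.12     | 2.43             | 1.94        | 8.23×      | 10.37×     |
| Window Convolution      | 28.30     | 3.69             | 3.15        | 7.67×      | 8.98×      |
| Floating-point Conv     | 49.78     | 2.42             | 2.46        | 20.57×     | 20.24×     |
| Unoptimized FFT         | 153.53    | 2.78             | 2.91        | 55.23×     | 52.76×     |
| FlowGNN GIN             | 4219.85   | 28.90            | 11.97       | 146.02×    | 352.53×    |
| FlowGNN DGN             | 996.13    | 26.90            | 11.71       | 37.03×     | 85.07×     |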
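
The speedup columns in the table above appear to be the co-simulation runtime divided by the simulator runtime. The short sketch below recomputes them from the listed runtimes under that assumption; the dictionary layout and function names are illustrative choices, not part of either tool.

```python
# Runtimes in seconds, copied from the table: (cosim, LightningSim, OmniSim).
RUNTIMES = {
    "Fixed-point Square Root": (27.25, 4.97, 3.65),
    "FIR Filter": (20.12, 2.43, 1.94),
    "Window Convolution": (28.30, 3.69, 3.15),
    "Floating-point Conv": (49.78, 2.42, 2.46),
    "Unoptimized FFT": (153.53, 2.78, 2.91),
    "FlowGNN GIN": (4219.85, 28.90, 11.97),
    "FlowGNN DGN": (996.13, 26.90, 11.71),
}


def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Speedup as baseline runtime divided by accelerated runtime (assumed definition)."""
    return baseline_s / accelerated_s


def speedup_table(runtimes):
    """Map each benchmark to (LightningSim speedup, OmniSim speedup) over co-simulation."""
    return {
        name: (speedup(cosim, ls), speedup(cosim, omni))
        for name, (cosim, ls, omni) in runtimes.items()
    }


for name, (ls_x, os_x) in speedup_table(RUNTIMES).items():
    print(f"{name:<24} LS {ls_x:7.2f}x   OS {os_x:7.2f}x")
```

Because the listed runtimes are rounded to two decimals, the recomputed ratios may not match the published speedup columns to the last digit.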

30+ benchmarks tested across DSP operations, loop structures, memory access patterns, and AI/ML workloads.

Ready to Accelerate Your Workflow?

Replace hour-long co-simulations with seconds. Get started with LightningSim or OmniSim.