End-to-End Performance

This benchmark compares the full basin stability estimation pipeline across MATLAB and Python implementations.

Methodology

All implementations use the same:

  • ODE system: Damped driven pendulum
  • Parameters: α=0.1, T=0.5, K=1.0
  • Integration: t_span=(0, 1000), rtol=1e-8, atol=1e-6
  • Sample sizes: 100 to 100,000 initial conditions
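For concreteness, a minimal sketch of one trajectory of the damped driven pendulum with the parameters above. The right-hand side θ̈ = −α·θ̇ + T − K·sin θ is an assumed formulation (the bSTAB-style convention); the fixed-step RK4 integrator here is illustrative only and is not the adaptive rtol/atol scheme used in the benchmark:

```python
import math

# Benchmark parameters from the methodology section; the RHS form itself is assumed.
ALPHA, T_DRIVE, K = 0.1, 0.5, 1.0

def pendulum_rhs(state):
    """Assumed damped driven pendulum: theta'' = -alpha*theta' + T - K*sin(theta)."""
    theta, omega = state
    return (omega, -ALPHA * omega + T_DRIVE - K * math.sin(theta))

def rk4_step(state, dt):
    """One classical fixed-step Runge-Kutta (RK4) step, for illustration only."""
    k1 = pendulum_rhs(state)
    k2 = pendulum_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = pendulum_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = pendulum_rhs(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(
        s + dt / 6.0 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )

state = (0.5, 0.0)  # one example initial condition (theta, omega)
for _ in range(1000):
    state = rk4_step(state, 0.01)
```

The benchmark repeats this kind of integration for every sampled initial condition, then classifies which attractor each trajectory settles into.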

Implementations Compared

| Implementation  | Platform | Parallelization     |
|-----------------|----------|---------------------|
| MATLAB bSTAB-M  | CPU      | MATLAB `parfor`     |
| pybasin + JAX   | CPU      | Vectorized (`vmap`) |
| pybasin + JAX   | CUDA GPU | Vectorized (`vmap`) |
| Attractors.jl   | CPU      | Threaded            |

Results

Performance Comparison

| N       | MATLAB  | Python CPU        | Python CUDA        | Attractors.jl     | pynamicalsys   |
|---------|---------|-------------------|--------------------|-------------------|----------------|
| 100     | 0.76s   | 1.30s (0.6×)      | 12.86s (0.1×)      | **0.12s (6.2×)**  | 0.16s (4.7×)   |
| 200     | 1.02s   | 1.50s (0.7×)      | 12.87s (0.1×)      | **0.16s (6.4×)**  | 0.28s (3.6×)   |
| 500     | 1.90s   | 1.62s (1.2×)      | 12.98s (0.1×)      | **0.41s (4.6×)**  | 0.62s (3.1×)   |
| 1,000   | 3.27s   | 2.00s (1.6×)      | 12.05s (0.3×)      | **0.85s (3.9×)**  | 1.15s (2.8×)   |
| 2,000   | 6.29s   | 2.72s (2.3×)      | 12.32s (0.5×)      | **1.68s (3.8×)**  | 2.51s (2.5×)   |
| 5,000   | 15.90s  | 5.73s (2.8×)      | 12.82s (1.2×)      | **4.22s (3.8×)**  | 5.55s (2.9×)   |
| 10,000  | 31.01s  | 10.52s (2.9×)     | 12.64s (2.5×)      | **9.12s (3.4×)**  | 11.11s (2.8×)  |
| 20,000  | 62.73s  | **20.94s (3.0×)** | **12.27s (5.1×)**  | 21.08s (3.0×)     | 23.03s (2.7×)  |
| 50,000  | 153.04s | **30.07s (5.1×)** | **12.40s (12.3×)** | 86.38s (1.8×)     | 56.02s (2.7×)  |
| 100,000 | 309.07s | **62.94s (4.9×)** | **12.57s (24.6×)** | —                 | 115.96s (2.7×) |

Bold marks the fastest per row. When the GPU wins, the best CPU-only option is also bolded — use it as the recommended alternative when no GPU is available.

Attractors.jl memory limit

Attractors.jl is not benchmarked at N=100,000. The GroupViaClustering (DBSCAN) step allocates a full N×N pairwise distance matrix, requiring ~80 GB of RAM at that scale. The practical ceiling on this machine is N=50,000 (~20 GB).
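The quoted figures follow directly from storing N² pairwise distances as float64 (8 bytes each). A quick back-of-the-envelope check (the helper name is illustrative):

```python
def pairwise_matrix_gb(n, bytes_per_element=8):
    """RAM needed for a dense n x n float64 pairwise distance matrix, in GB."""
    return n * n * bytes_per_element / 1e9

print(pairwise_matrix_gb(100_000))  # 80.0 GB
print(pairwise_matrix_gb(50_000))   # 20.0 GB
```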

Scaling Analysis

| Implementation | Scaling   | Exponent α   | R²    |
|----------------|-----------|--------------|-------|
| Attractors.jl  | O(N)      | 1.05 ± 0.08  | 0.989 |
| pynamicalsys   | O(N)      | 0.96 ± 0.02  | 0.999 |
| Python CPU     | O(N^0.59) | 0.59 ± 0.10  | 0.942 |
| MATLAB         | O(N)      | 0.90 ± 0.06  | 0.992 |
| Python CUDA    | O(1)      | -0.00 ± 0.01 | 0.168 |
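Exponents like these are typically obtained by a least-squares fit of log t against log N, where the slope is the scaling exponent α. A sketch of the procedure on synthetic timings (not the measured data above):

```python
import math

# Synthetic timings following t = c * N^alpha with alpha = 1.0
ns = [100, 1_000, 10_000, 100_000]
ts = [0.01 * n for n in ns]

# Slope of the least-squares line through (log N, log t) is the exponent alpha
xs = [math.log(n) for n in ns]
ys = [math.log(t) for t in ts]
mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
alpha = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
print(round(alpha, 3))  # 1.0
```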

Comparison Plot

*(Figure: benchmark comparison)*

Scaling Plot (Log-Log)

*(Figure: scaling analysis)*

Key Findings

  1. Python CPU is roughly 3-5× faster than MATLAB for N > 5,000
  2. Python CUDA runs in near-constant time (~12s) regardless of N, since the GPU processes all initial conditions in parallel
  3. At N=100,000, the GPU is ~25× faster than MATLAB, provided the data fits in GPU memory
  4. Attractors.jl scales linearly (O(N)) and matches Python CPU throughput up to N=20,000, but falls behind at N=50,000
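The near-constant GPU time in finding 2 comes from batching: each integration step is one array operation over all N initial conditions at once, so wall time barely grows with N until the device saturates. The idea can be sketched in NumPy (a portable stand-in for the `jax.vmap`-vectorized version; the function names here are illustrative, not pybasin's API):

```python
import numpy as np

ALPHA, T_DRIVE, K = 0.1, 0.5, 1.0  # benchmark parameters; RHS form is assumed

def rhs(states):
    """Vectorized pendulum RHS over a (N, 2) batch of (theta, omega) states."""
    theta, omega = states[:, 0], states[:, 1]
    return np.stack([omega, -ALPHA * omega + T_DRIVE - K * np.sin(theta)], axis=1)

def rk4_batch(states, dt, steps):
    """Fixed-step RK4 advancing all N trajectories simultaneously."""
    for _ in range(steps):
        k1 = rhs(states)
        k2 = rhs(states + 0.5 * dt * k1)
        k3 = rhs(states + 0.5 * dt * k2)
        k4 = rhs(states + dt * k3)
        states = states + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return states

rng = np.random.default_rng(0)
batch = rng.uniform(-np.pi, np.pi, size=(1000, 2))  # 1,000 initial conditions
final = rk4_batch(batch, dt=0.01, steps=500)
```

On a GPU, each `rhs` call maps to one kernel launch over the whole batch, which is why doubling N changes the runtime far less than it does for a loop over trajectories on the CPU.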

Hardware

All benchmarks were run on:

  • CPU: Intel Core Ultra 9 275HX
  • GPU: NVIDIA GeForce RTX 5070 Ti Laptop GPU (12 GB VRAM)