
Appendix F: Performance Benchmarks

This appendix presents benchmark data for transpilation speed, runtime performance comparisons between Python and Rust, and memory usage across the Sovereign AI Stack.

Transpilation Speed

Time to transpile source code to Rust, measured on a 24-core AMD EPYC system:

| Source | Files | Lines | Transpile Time | Lines/sec |
|---|---|---|---|---|
| Python (pure functions) | 50 | 5,000 | 1.2 s | 4,167 |
| Python (ML with numpy) | 120 | 25,000 | 8.4 s | 2,976 |
| C (systems code) | 30 | 12,000 | 3.1 s | 3,871 |
| Shell scripts | 15 | 2,000 | 0.6 s | 3,333 |
| Mixed (Python + C + Shell) | 200 | 40,000 | 12.8 s | 3,125 |

Transpilation is I/O-bound for small projects and CPU-bound for large ones. Files within a language group are transpiled in parallel.
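The per-language-group parallelism can be sketched with `std::thread` alone. This is a simplified illustration, not the actual pipeline: `transpile_file` here is a hypothetical stand-in for the real transpiler, which is not shown in this appendix.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical stand-in for the real transpiler.
/// Returns a fake "lines emitted" count so the sketch is self-contained.
fn transpile_file(path: &str) -> usize {
    path.len() * 10 // placeholder work
}

/// Group files by extension, then transpile each language group
/// on its own thread, as described above.
fn transpile_all(files: &[&'static str]) -> usize {
    let mut groups: HashMap<&str, Vec<&'static str>> = HashMap::new();
    for f in files {
        let ext = f.rsplit('.').next().unwrap_or("");
        groups.entry(ext).or_default().push(f);
    }
    let handles: Vec<_> = groups
        .into_values()
        .map(|group| {
            thread::spawn(move || group.iter().map(|f| transpile_file(f)).sum::<usize>())
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

A real implementation would more likely use a work-stealing pool (e.g. rayon) than one thread per group, so a single large group does not serialize the run.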

Runtime Performance: Python vs Rust

Benchmarks comparing original Python code against transpiled and optimized Rust code:

Compute-Intensive Workloads

| Workload | Python | Rust (scalar) | Rust (SIMD) | Rust (GPU) |
|---|---|---|---|---|
| Matrix multiply 1024×1024 | 2,400 ms | 85 ms (28x) | 12 ms (200x) | 2.1 ms (1,143x) |
| FFT 1M points | 180 ms | 14 ms (13x) | 3.2 ms (56x) | 0.8 ms (225x) |
| K-means (10K pts, 10 clusters) | 850 ms | 32 ms (27x) | 8.5 ms (100x) | 1.9 ms (447x) |
| Random Forest inference (1K) | 45 ms | 1.8 ms (25x) | 0.9 ms (50x) | N/A |

I/O-Intensive Workloads

| Workload | Python | Rust | Speedup | Notes |
|---|---|---|---|---|
| CSV parse 100 MB | 4.2 s | 0.38 s | 11x | Rust uses zero-copy parsing |
| JSON serialize 1M records | 3.8 s | 0.22 s | 17x | serde vs json module |
| File scan 10K files | 1.9 s | 0.15 s | 13x | Parallel with rayon |
| HTTP server (req/sec) | 2,800 | 95,000 | 34x | axum vs flask |
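The zero-copy parsing behind the CSV speedup can be illustrated with std alone: every field is a `&str` slice borrowing from the input buffer, so parsing allocates only the vectors of slices, never per-field strings. This is a minimal sketch; a production parser (such as the csv crate, an assumption about what the benchmark uses) also handles quoting and escapes.

```rust
/// Zero-copy CSV parsing sketch: fields borrow from `input` rather than
/// being copied into owned Strings. No quoting/escaping is handled here.
fn parse_csv(input: &str) -> Vec<Vec<&str>> {
    input
        .lines()
        .map(|line| line.split(',').collect())
        .collect()
}
```

The borrow is what Python's `csv` module cannot express: every field it yields is a freshly allocated `str` object, which dominates the 4.2 s figure above.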

ML Inference

| Model | Python (PyTorch) | Rust (realizar) | Speedup | Notes |
|---|---|---|---|---|
| BERT-base (batch=1) | 12 ms | 4.2 ms | 2.9x | CPU |
| Qwen 1.5B (tok/s, CPU) | 8.5 | 18 | 2.1x | AVX2 |
| Qwen 1.5B (tok/s, GPU) | — | 240 | — | RTX 4090 CUDA, APR Q4K (GH-88) |
| Whisper-tiny (1s audio) | 180 ms | 45 ms | 4.0x | CPU |

Memory Usage Comparisons

| Workload | Python Peak RSS | Rust Peak RSS | Reduction |
|---|---|---|---|
| Idle process | 28 MB | 1.2 MB | 23x |
| Load 100 MB dataset | 380 MB | 105 MB | 3.6x |
| BERT inference | 1.2 GB | 420 MB | 2.9x |
| Qwen 1.5B Q4K | 4.8 GB | 1.1 GB | 4.4x |
| 10K concurrent connections | 2.1 GB | 85 MB | 25x |
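The Qwen 1.5B Q4K row is consistent with back-of-envelope quantization arithmetic. Assuming roughly 4.5 effective bits per weight for Q4K-style quantization (4-bit weights plus per-block scales; the 4.5 figure is an approximation, not a measured value), weights alone come to well under 1 GiB, with the rest of the 1.1 GB peak being KV cache and runtime overhead:

```rust
// Rough weight-memory estimate for an N-parameter model at a given
// effective bit width. Peak RSS is higher: it also includes activations,
// KV cache, and runtime overhead, so this is a lower bound on memory.
fn weight_bytes(params: u64, bits_per_weight: f64) -> f64 {
    params as f64 * bits_per_weight / 8.0
}

// For 1.5e9 parameters:
//   weight_bytes(1_500_000_000, 32.0) / 2f64.powi(30)  ≈ 5.6 GiB (fp32)
//   weight_bytes(1_500_000_000, 4.5)  / 2f64.powi(30)  ≈ 0.8 GiB (Q4K-style)
```

The same arithmetic explains why the Python-side peak sits near 4.8 GB: an unquantized checkpoint is several gigabytes of weights before any inference state is allocated.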

Benchmark Methodology

All benchmarks follow these principles:

  • Warm-up: 5 iterations discarded before measurement
  • Iterations: Minimum 100 iterations or 10 seconds
  • Statistics: Median reported with 95% confidence interval
  • Environment: Isolated system, no other workloads
  • Reproduction: Benchmark code included in benches/ directory
```sh
# Run the full benchmark suite
cargo bench

# Run a specific benchmark
cargo bench -- matrix_multiply

# Compare against baseline
cargo bench -- --baseline python_baseline
```
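The warm-up/median methodology above can be sketched with std alone. This is a simplified illustration of the measurement loop, not the actual harness in benches/ (which presumably layers confidence intervals and the 10-second time floor on top):

```rust
use std::time::{Duration, Instant};

/// Minimal harness mirroring the methodology: run `warmup` discarded
/// iterations, collect `iters` timed samples, report the median.
/// (Simplified: no 10-second floor and no confidence interval.)
fn bench<F: FnMut()>(mut f: F, warmup: usize, iters: usize) -> Duration {
    for _ in 0..warmup {
        f(); // warm-up iterations, discarded
    }
    let mut samples: Vec<Duration> = (0..iters)
        .map(|_| {
            let t = Instant::now();
            f();
            t.elapsed()
        })
        .collect();
    samples.sort();
    samples[samples.len() / 2] // median is robust to outlier samples
}
```

Reporting the median rather than the mean is the key choice: a single slow iteration (page fault, scheduler preemption) shifts the mean but leaves the median untouched.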

Hardware Reference

All benchmarks use the following hardware unless otherwise noted:

| Component | Specification |
|---|---|
| CPU | AMD EPYC 7443P (24 cores, 48 threads) |
| RAM | 256 GB DDR4-3200 ECC |
| GPU | NVIDIA RTX 4090 (24 GB VRAM) |
| Storage | NVMe SSD (7 GB/s read) |
| OS | Linux 6.8.0, Ubuntu 24.04 |
