Thesis

2.1 The Claim

Can a single Rust binary (apr) match Python-ecosystem HumanEval/MBPP scores for Qwen2.5-Coder-7B, with zero Python dependencies?

This is the one falsifiable question that drives the entire project. If the answer is yes, the sovereign Rust AI stack works end-to-end. If no, apr compare-hf pinpoints exactly where it falls short.

2.2 The Problem with the Status Quo

The Python ML ecosystem requires:

200+ transitive dependencies (transformers, torch, accelerate, bitsandbytes, peft, trl, vllm)
Vendor-locked CUDA toolchains (nvcc, libcudart, cuDNN — NVIDIA only)
Multi-GB Docker images (pytorch/pytorch: ~6 GB; vllm: ~15 GB)
30-60 minute setup (CUDA toolkit install, conda env, pip conflicts)

These are not engineering choices — they are historical accidents. Nothing about LoRA fine-tuning, weight merging, or INT4 quantization requires Python or CUDA.

2.3 The Constraint

Every optimization step must be expressible as an apr subcommand:

apr import → apr distill → apr finetune → apr merge → apr prune → apr quantize → apr eval → apr publish

Hard rules:

No Python. No notebooks. No HuggingFace Transformers library.
No GPU vendor lock-in. Primary backend: wgpu (Vulkan/Metal/DX12). Optional: CUDA for hardware that lacks wgpu support (e.g., Blackwell sm_121).
Pure sovereign stack: aprender, entrenar, trueno.

2.4 Compute Reality

Resource	Dev Workstation	gx10 (Eval Server)
GPUs	2x AMD Radeon Pro W5700X (Navi10)	NVIDIA Blackwell GB10 (sm_121)
VRAM/Memory	16 GB per GPU, 32 GB total	119 GB unified
GPU backend	wgpu / Vulkan 1.3.255 (RADV)	CUDA 13.0
CPU	16 cores, 64 GB RAM	aarch64, 10 cores
Best HumanEval	—	87.20% (7B few-shot)

No GPU vendor lock-in. wgpu is the primary backend (any vendor); CUDA is optional for hardware where wgpu support lags. CPU/GPU parity verified: 7B produces identical 85.37% on both backends.

2.5 Inference Without GPU

Inference-only techniques (merging, quantization) and small-model inference (≤7B quantized) run on CPU via trueno SIMD (AVX2/NEON). GPU is recommended for training-phase techniques (distillation, fine-tuning) but not required for evaluation.

2.6 Falsification Criteria

The thesis is falsified if any of these hold after applying the full pipeline:

HumanEval pass@1 < 80% for Qwen2.5-Coder-7B (below "Strong" tier) — NOT FALSIFIED: 87.20% ✅
Inference parity gap > 5% vs HuggingFace reference implementation — NOT FALSIFIED: 0.60pp gap ✅
Any pipeline stage requires Python to complete — NOT FALSIFIED: zero Python ✅
wgpu training fails to produce decreasing loss on Qwen2.5-Coder-1.5B — NOT FALSIFIED: loss decreases ✅

See §15 for complete success criteria and §18 for acceptance criteria.

APR Leaderboard Specification