Example 1: Python ML Project

This walkthrough takes a Python ML pipeline built on scikit-learn and NumPy through a full transpilation into pure Rust powered by the Sovereign AI Stack.

Scenario

A data science team maintains a fraud detection service written in Python. The pipeline reads CSV data, normalizes features with StandardScaler, trains a RandomForestClassifier, and serves predictions over HTTP. Latency is 12 ms per request. The team wants sub-millisecond inference in a single static binary.

Source Project Layout

fraud_detector/
  requirements.txt      # numpy, scikit-learn, pandas, flask
  train.py              # Training script
  serve.py              # Flask prediction endpoint
  tests/test_model.py   # pytest suite

Step 1 – Analyze

batuta analyze --languages --tdg ./fraud_detector

Batuta scans every file, detects the source language, identifies the NumPy, scikit-learn, and Flask imports, and computes a Technical Debt Grade (TDG). The output includes a dependency graph and a framework detection summary:

Languages detected: Python (100%)
ML frameworks: numpy (32 ops), scikit-learn (8 algorithms)
Web framework: Flask (1 endpoint)
TDG Score: B (72/100)
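The import detection described above can be approximated in a few lines. The Rust sketch below is hypothetical and much simpler than Batuta's analyzer: the function name detect_frameworks is illustrative, it assumes imports appear as single-line "import x" or "from x import ..." statements, it reads only the first module on each line, and it matches against a hand-written table of packages taken from requirements.txt.

```rust
/// Hypothetical, simplified framework detector (not Batuta's implementation):
/// scans Python source for `import x` / `from x import ...` lines and reports
/// each known framework once, in order of first appearance.
fn detect_frameworks(source: &str) -> Vec<&'static str> {
    // (top-level module, framework label) pairs; illustrative only.
    const KNOWN: [(&str, &str); 4] = [
        ("numpy", "numpy"),
        ("sklearn", "scikit-learn"),
        ("flask", "Flask"),
        ("pandas", "pandas"),
    ];
    let mut found = Vec::new();
    for line in source.lines() {
        let line = line.trim();
        // Extract the module path from either import form; skip other lines.
        let module = if let Some(rest) = line.strip_prefix("import ") {
            rest
        } else if let Some(rest) = line.strip_prefix("from ") {
            rest
        } else {
            continue;
        };
        // Keep only the top-level package of the first module,
        // e.g. "sklearn.ensemble import X" -> "sklearn", "numpy as np" -> "numpy".
        let top = module
            .split(|c: char| c == '.' || c == ' ' || c == ',')
            .next()
            .unwrap_or("");
        for &(prefix, label) in KNOWN.iter() {
            if top == prefix && !found.contains(&label) {
                found.push(label);
            }
        }
    }
    found
}
```

For input containing import numpy as np, from sklearn.ensemble import RandomForestClassifier, and from flask import Flask, the sketch returns ["numpy", "scikit-learn", "Flask"].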

Step 2 – Detect Frameworks

batuta analyze --ml-frameworks ./fraud_detector

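The ML framework detector maps every NumPy call to a trueno operation and every scikit-learn algorithm to an aprender equivalent. Its report shows which conversions are fully automated and which require manual review. Because this step runs through the same batuta analyze command, it belongs to the Analyze phase rather than forming a separate pipeline phase.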

Step 3 – Transpile

batuta transpile ./fraud_detector --tool depyler --output ./fraud_detector_rs

Depyler converts Python to Rust. Batuta replaces NumPy calls with trueno operations and scikit-learn models with aprender equivalents. The Flask endpoint becomes an axum handler.
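To make the StandardScaler conversion concrete, here is the arithmetic the transpiled normalization step has to reproduce, written in plain Rust. This is a hypothetical stand-in, not aprender's API: the type name and its fit/transform methods are illustrative. It assumes a non-empty, rectangular set of rows and, like scikit-learn, uses the population standard deviation and leaves zero-variance columns unscaled.

```rust
/// Hypothetical plain-Rust stand-in for scikit-learn's StandardScaler
/// (not aprender's API): z = (x - mean) / std, computed per feature column.
struct StandardScaler {
    mean: Vec<f64>,
    scale: Vec<f64>,
}

impl StandardScaler {
    /// Learns per-column means and population standard deviations (ddof = 0,
    /// as scikit-learn does). Assumes `rows` is non-empty and rectangular.
    fn fit(rows: &[Vec<f64>]) -> Self {
        let n = rows.len() as f64;
        let n_features = rows[0].len();

        // Column means.
        let mut mean = vec![0.0_f64; n_features];
        for row in rows {
            for (m, x) in mean.iter_mut().zip(row) {
                *m += x;
            }
        }
        for m in mean.iter_mut() {
            *m /= n;
        }

        // Column standard deviations.
        let mut scale = vec![0.0_f64; n_features];
        for row in rows {
            for ((s, x), m) in scale.iter_mut().zip(row).zip(&mean) {
                *s += (x - m).powi(2);
            }
        }
        for s in scale.iter_mut() {
            let sd = (*s / n).sqrt();
            // Constant columns are only centered, never divided by zero.
            *s = if sd == 0.0 { 1.0 } else { sd };
        }

        StandardScaler { mean, scale }
    }

    /// Standardizes one row with the statistics learned in `fit`.
    fn transform(&self, row: &[f64]) -> Vec<f64> {
        row.iter()
            .zip(&self.mean)
            .zip(&self.scale)
            .map(|((x, m), s)| (x - m) / s)
            .collect()
    }
}
```

Fitting on the rows [1, 10] and [3, 10] gives means [2, 10] and scales [1, 1] (the second column is constant), so transforming [1, 10] yields [-1, 0].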

Step 4 – Optimize

batuta optimize ./fraud_detector_rs --backend auto

The Mixture-of-Experts (MoE) backend selector analyzes each operation and assigns it a backend. Small element-wise operations stay scalar. Feature normalization across thousands of rows uses SIMD via trueno. The random forest ensemble runs on the GPU only when its compute cost exceeds five times the cost of transferring its data over PCIe (the 5x PCIe transfer cost threshold).
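The selection rule can be sketched as a small decision function. Everything here except the 5x compute-to-transfer ratio is an assumption for illustration: the Backend enum, the select_backend name, the 1,000-element SIMD cutoff, and the idea that compute and transfer times arrive as precomputed microsecond estimates. Batuta's actual cost model is not shown in this chapter.

```rust
/// Hypothetical backend choices for a single operation.
#[derive(Debug, PartialEq)]
enum Backend {
    Scalar,
    Simd,
    Gpu,
}

/// Assumed element count below which an operation stays scalar (illustrative).
const SIMD_MIN_ELEMENTS: usize = 1_000;
/// GPU dispatch is chosen only when compute exceeds transfer by this factor.
const GPU_COMPUTE_TO_TRANSFER: f64 = 5.0;

/// Picks a backend from the element count and estimated costs in microseconds.
fn select_backend(elements: usize, compute_us: f64, transfer_us: f64) -> Backend {
    if compute_us > GPU_COMPUTE_TO_TRANSFER * transfer_us {
        // Compute dominates PCIe transfer by more than 5x: offload to the GPU.
        Backend::Gpu
    } else if elements >= SIMD_MIN_ELEMENTS {
        // Large, data-parallel work that does not justify the transfer: SIMD.
        Backend::Simd
    } else {
        // Small operations stay on the scalar path.
        Backend::Scalar
    }
}
```

Under these assumptions, a 100-element operation with negligible compute stays scalar, while an operation estimated at 5,000 microseconds of compute against 400 microseconds of PCIe transfer goes to the GPU, since 5,000 > 5 × 400 = 2,000.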

Step 5 – Validate

batuta validate ./fraud_detector_rs --reference ./fraud_detector

Batuta runs the original Python test suite and the generated Rust test suite side by side and compares their outputs within a configurable tolerance (default 1e-6 for floating-point values). Syscall tracing via renacer confirms that both programs exhibit the same I/O behavior.
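The floating-point comparison can be sketched as an element-wise absolute-difference check. The function name outputs_match, the Result-based error reporting, and the use of a purely absolute tolerance are assumptions for illustration; only the 1e-6 default comes from the text.

```rust
/// Hypothetical default mirroring the 1e-6 tolerance mentioned above.
const DEFAULT_TOL: f64 = 1e-6;

/// Compares reference (Python) and candidate (Rust) outputs element by
/// element. Returns Ok(()) when lengths match and every absolute difference
/// is within `tol`, otherwise an error describing the first mismatch.
fn outputs_match(reference: &[f64], candidate: &[f64], tol: f64) -> Result<(), String> {
    if reference.len() != candidate.len() {
        return Err(format!(
            "length mismatch: {} vs {}",
            reference.len(),
            candidate.len()
        ));
    }
    for (i, (r, c)) in reference.iter().zip(candidate).enumerate() {
        let diff = (r - c).abs();
        // Written as !(diff <= tol) so that a NaN difference counts as a mismatch.
        if !(diff <= tol) {
            return Err(format!("index {i}: |{r} - {c}| = {diff:e} exceeds tol {tol:e}"));
        }
    }
    Ok(())
}
```

A relative tolerance would be a reasonable alternative for outputs with large magnitudes; an absolute check is used here because prediction probabilities lie in [0, 1].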

Result

Metric	Python	Rust
Inference latency (per request)	12 ms	0.4 ms
Binary size	48 MB	3.2 MB
Dependencies	127	4 crates
Memory usage	180 MB	12 MB
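The improvement factors implied by the table are simple ratios:

```latex
\frac{12\ \text{ms}}{0.4\ \text{ms}} = 30\times, \qquad
\frac{48\ \text{MB}}{3.2\ \text{MB}} = 15\times, \qquad
\frac{180\ \text{MB}}{12\ \text{MB}} = 15\times
```

At 0.4 ms per request, inference meets the sub-millisecond target set in the scenario.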

Key Takeaways

The 5-phase pipeline (Analyze, Transpile, Optimize, Validate, Build) handles the entire conversion without manual Rust authoring for standard patterns.
Following the Jidoka principle, Batuta stops the pipeline at the first validation failure, so broken code never reaches later phases.
Framework-specific converters (NumPy, sklearn, PyTorch) are detailed in the following sub-chapters.
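The stop-at-first-failure behavior can be sketched as a loop with an early return. This is an illustrative model, not Batuta's implementation: the Phase enum, the Step type alias, and run_pipeline are hypothetical names, and each phase is modeled as a plain function that either succeeds or reports a defect.

```rust
/// Illustrative pipeline phases, named after the five phases listed above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Phase {
    Analyze,
    Transpile,
    Optimize,
    Validate,
    Build,
}

/// Hypothetical: each phase is a function that succeeds or reports a defect.
type Step = fn() -> Result<(), String>;

/// Runs the steps in order and halts at the first failure (Jidoka:
/// stop the line on a defect). Returns the completed phases on success,
/// or the failing phase and its message; steps after a failure never run.
fn run_pipeline(steps: &[(Phase, Step)]) -> Result<Vec<Phase>, (Phase, String)> {
    let mut completed = Vec::new();
    for (phase, step) in steps {
        if let Err(msg) = step() {
            return Err((*phase, msg));
        }
        completed.push(*phase);
    }
    Ok(completed)
}
```

With Analyze, Validate, and Build steps where Validate fails, run_pipeline returns the Validate error and never calls the Build step.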

The Batuta Book