batuta oracle
Query the Sovereign AI Stack knowledge graph for component recommendations, backend selection, and integration patterns.
Synopsis
batuta oracle [OPTIONS] [QUERY]
Description
Oracle Mode provides an intelligent query interface to the Sovereign AI Stack. It analyzes your requirements and recommends:
- Primary component for your task
- Supporting components that integrate well
- Compute backend (Scalar/SIMD/GPU/Distributed)
- Code examples ready to use
Options
| Option | Description |
|---|---|
| `--list` | List all stack components |
| `--show <component>` | Show details about a specific component |
| `--capabilities <cap>` | Find components by capability (e.g., simd, ml, transpilation) |
| `--integrate <from> <to>` | Show integration pattern between two components |
| `--interactive` | Start interactive query mode |
| `--format <format>` | Output format: text (default), json, markdown, or code |
| `--rag` | Use RAG-based retrieval from indexed stack documentation |
| `--rag-index` | Index or reindex stack documentation for RAG queries |
| `--rag-index-force` | Rebuild the index from scratch, ignoring fingerprints (old cache kept until the new index is saved) |
| `--rag-stats` | Show cache statistics (fast, reads manifest only) |
| `--rag-dashboard` | Launch TUI dashboard for RAG index statistics |
| `--local` | Show local workspace status (PAIML projects in ~/src) |
| `--dirty` | Show only dirty projects (uncommitted changes) |
| `--publish-order` | Show safe publish order respecting dependencies |
| `-h, --help` | Print help information |
Examples
List Stack Components
$ batuta oracle --list
Sovereign AI Stack Components:
Layer 0: Compute Primitives
- trueno v0.8.8: SIMD-accelerated tensor operations + simulation testing framework
- trueno-db v0.3.7: High-performance vector database
- trueno-graph v0.1.4: Graph analytics engine
- trueno-viz v0.1.5: Visualization toolkit
Layer 1: ML Algorithms
- aprender v0.19.0: First-principles ML library
Layer 2: Training & Inference
- entrenar v0.3.0: Training loop framework
- realizar v0.3.0: ML inference runtime
...
Query Component Details
$ batuta oracle --show aprender
📦 Component: aprender v0.19.0
Layer: ML Algorithms
Description: Next-generation machine learning library in pure Rust
Capabilities:
- random_forest (Machine Learning)
- gradient_boosting (Machine Learning)
- clustering (Machine Learning)
- neural_networks (Machine Learning)
Integrates with:
- trueno: Uses SIMD-accelerated tensor operations
- realizar: Exports models for inference
- alimentar: Loads training data
References:
[1] Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32
[2] Chen & Guestrin (2016). XGBoost: A Scalable Tree Boosting System
Find by Capability
$ batuta oracle --capabilities simd
Components with 'simd' capability:
- trueno: SIMD-accelerated tensor operations
Natural Language Query
$ batuta oracle "How do I train a random forest on 1M samples?"
Analysis:
Problem class: Supervised Learning
Algorithm: random_forest
Data size: Large (1M samples)
💡 Primary Recommendation: aprender
Path: aprender::tree::RandomForest
Confidence: 95%
🔧 Backend: SIMD
Rationale: SIMD vectorization optimal for 1M samples
💻 Code Example:
use aprender::tree::RandomForest;
let model = RandomForest::new()
.n_estimators(100)
.max_depth(Some(10))
.fit(&x, &y)?;
Integration Patterns
$ batuta oracle --integrate depyler aprender
Integration: depyler → aprender
Pattern: sklearn_migration
Description: Convert sklearn code to aprender
Before (Python/sklearn):
from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier(n_estimators=100)
After (Rust/aprender):
use aprender::tree::RandomForest;
let model = RandomForest::new().n_estimators(100);
Interactive Mode
$ batuta oracle --interactive
🔮 Oracle Mode - Ask anything about the Sovereign AI Stack
oracle> What's the fastest way to do matrix multiplication?
Analysis:
Problem class: Linear Algebra
💡 Primary Recommendation: trueno
Confidence: 85%
Rationale: SIMD-accelerated matrix operations
💻 Code Example:
use trueno::prelude::*;
let a = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0]).reshape([2, 2]);
let b = Tensor::from_vec(vec![5.0, 6.0, 7.0, 8.0]).reshape([2, 2]);
let c = a.matmul(&b);
oracle> exit
Goodbye!
JSON Output
$ batuta oracle --format json "random forest"
{
"problem_class": "Supervised Learning",
"algorithm": "random_forest",
"primary": {
"component": "aprender",
"path": "aprender::tree::RandomForest",
"confidence": 0.9,
"rationale": "Random forest for supervised learning"
},
"compute": {
"backend": "SIMD",
"rationale": "SIMD vectorization optimal"
},
"distribution": {
"needed": false,
"rationale": "Single-node sufficient"
}
}
Code Output
Extract raw code snippets for piping to other tools. There are no ANSI escapes and no metadata: just code. All code output includes TDD test companions (`#[cfg(test)]` modules) appended after the main code:
# Extract code from a recipe (includes test companion)
$ batuta oracle --recipe ml-random-forest --format code
use aprender::tree::RandomForest;
let model = RandomForest::new()
.n_estimators(100)
.max_depth(Some(10))
.fit(&x, &y)?;
#[cfg(test)]
mod tests {
#[test]
fn test_random_forest_construction() {
let n_estimators = 100;
assert!(n_estimators > 0);
}
// ... 2-3 more focused tests
}
# Natural language queries also include test companions
$ batuta oracle "train a model" --format code > example.rs
# Pipe to rustfmt and clipboard
$ batuta oracle --recipe training-lora --format code | rustfmt | pbcopy
# Dump all cookbook recipes as code (each includes test companion)
$ batuta oracle --cookbook --format code > all_recipes.rs
# Count test companions
$ batuta oracle --cookbook --format code 2>/dev/null | grep -c '#\[cfg('
34
# Commands without code exit with code 1
$ batuta oracle --list --format code
No code available for --list (try --format text)
$ echo $?
1
When the requested context has no code available (e.g., --list, --capabilities, --rag), the process exits with code 1 and a stderr diagnostic suggesting --format text.
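The exit-code contract above can be pictured as a small dispatch step. The sketch below is illustrative only and is not batuta's source. `OutputFormat`, `Context`, and `render` are hypothetical names, and only the `--list` and recipe cases are modeled.

```rust
/// Output formats accepted by `--format` (the four documented values).
#[derive(Debug, Clone, Copy, PartialEq)]
enum OutputFormat {
    Text,
    Json,
    Markdown,
    Code,
}

/// Two hypothetical query contexts: one without code, one carrying a snippet.
enum Context {
    List,                 // like `--list`: nothing to extract
    Recipe(&'static str), // like `--recipe <name>`: carries a code snippet
}

/// Render a context, returning (stdout, stderr, exit code).
fn render(ctx: &Context, format: OutputFormat) -> (String, String, i32) {
    match (ctx, format) {
        // Code format with code available: raw snippet only, exit 0.
        (Context::Recipe(code), OutputFormat::Code) => (code.to_string(), String::new(), 0),
        // Code format without code: empty stdout, diagnostic on stderr, exit 1.
        (Context::List, OutputFormat::Code) => (
            String::new(),
            "No code available for --list (try --format text)".to_string(),
            1,
        ),
        // Text, JSON and Markdown always have something to show in this sketch.
        _ => ("<formatted output>".to_string(), String::new(), 0),
    }
}
```

Sending the diagnostic to stderr rather than stdout keeps redirections such as `> example.rs` from capturing an error message as if it were code.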
RAG-Based Query
Query using Retrieval-Augmented Generation from indexed stack documentation:
$ batuta oracle --rag "How do I fine-tune a model with LoRA?"
RAG Oracle Query: "How do I fine-tune a model with LoRA?"
Retrieved Documents (RRF-fused):
1. entrenar/CLAUDE.md (score: 0.847)
"LoRA (Low-Rank Adaptation) enables parameter-efficient fine-tuning..."
2. aprender/CLAUDE.md (score: 0.623)
"For training workflows, entrenar provides autograd and optimization..."
💡 Recommendation:
Use `entrenar` for LoRA fine-tuning with quantization support (QLoRA).
💻 Code Example:
use entrenar::lora::{LoraConfig, LoraTrainer};
let config = LoraConfig::new()
.rank(16)
.alpha(32.0)
.target_modules(&["q_proj", "v_proj"]);
let trainer = LoraTrainer::new(model, config);
trainer.train(&dataset)?;
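The output above labels its hits as RRF-fused. Reciprocal Rank Fusion merges several ranked lists by summing 1 / (k + rank) for each document across the lists. The sketch below shows that standard formula with k = 60, the constant used in the original RRF paper. This page does not state which retrievers batuta fuses or which k it uses, so both are assumptions here. The displayed scores (such as 0.847) are also not necessarily raw RRF values.

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank),
/// where rank starts at 1 in each list.
fn rrf_fuse(rankings: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in rankings {
        for (i, doc) in list.iter().enumerate() {
            let rank = (i + 1) as f64; // 1-based rank within this list
            *scores.entry(doc.to_string()).or_insert(0.0) += 1.0 / (k + rank);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest fused score first; break ties on id for deterministic output.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused
}
```

Because RRF uses only ranks, it can combine retrievers whose raw scores live on incomparable scales (for example a lexical score and a vector similarity) without any score normalization.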
Index Stack Documentation
Build or update the RAG index from stack CLAUDE.md files and ground truth corpora:
$ batuta oracle --rag-index
RAG Indexer (Heijunka Mode)
──────────────────────────────────────────────────
Scanning Rust stack repositories...
✓ trueno/CLAUDE.md (12 chunks)
✓ trueno/README.md (8 chunks)
✓ aprender/CLAUDE.md (15 chunks)
✓ realizar/CLAUDE.md (8 chunks)
...
Scanning Python ground truth corpora...
✓ hf-ground-truth-corpus/CLAUDE.md (6 chunks)
✓ hf-ground-truth-corpus/README.md (12 chunks)
✓ src/hf_gtc/hub/search.py (4 chunks)
✓ src/hf_gtc/preprocessing/tokenization.py (6 chunks)
...
──────────────────────────────────────────────────
Complete: 28 documents, 186 chunks indexed
Vocabulary: 3847 unique terms
Avg doc length: 89.4 tokens
Reindexer: 28 documents tracked
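The Vocabulary and Avg doc length lines are the corpus statistics a BM25-style lexical scorer relies on: per-term document frequencies and the average length of an indexed unit. This page does not name the scoring function, so the sketch below makes several illustrative assumptions that are not batuta's confirmed behavior. It assumes BM25 with the common defaults k1 = 1.2 and b = 0.75, a naive lowercase tokenizer, and chunks as the unit of retrieval.

```rust
use std::collections::{HashMap, HashSet};

/// Corpus statistics for a BM25-style lexical index (illustrative sketch).
struct IndexStats {
    doc_freq: HashMap<String, usize>, // number of chunks containing each term
    avg_len: f64,                     // average chunk length in tokens
    n_docs: usize,                    // number of indexed chunks
}

/// Naive tokenizer (assumption): lowercase, split on non-alphanumerics.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

fn build_stats(chunks: &[&str]) -> IndexStats {
    let mut doc_freq: HashMap<String, usize> = HashMap::new();
    let mut total_len = 0usize;
    for chunk in chunks {
        let tokens = tokenize(chunk);
        total_len += tokens.len();
        // Document frequency counts each term at most once per chunk.
        let unique: HashSet<String> = tokens.into_iter().collect();
        for term in unique {
            *doc_freq.entry(term).or_insert(0) += 1;
        }
    }
    IndexStats {
        avg_len: if chunks.is_empty() { 0.0 } else { total_len as f64 / chunks.len() as f64 },
        n_docs: chunks.len(),
        doc_freq,
    }
}

/// BM25 score of one chunk for a query (k1 = 1.2, b = 0.75).
fn bm25(stats: &IndexStats, query: &str, chunk: &str) -> f64 {
    let (k1, b) = (1.2, 0.75);
    let tokens = tokenize(chunk);
    let len = tokens.len() as f64;
    let n = stats.n_docs as f64;
    let mut score = 0.0;
    for term in tokenize(query) {
        let tf = tokens.iter().filter(|t| **t == term).count() as f64;
        if tf == 0.0 {
            continue;
        }
        let df = *stats.doc_freq.get(&term).unwrap_or(&0) as f64;
        // Lucene-style IDF, which never goes negative.
        let idf = ((n - df + 0.5) / (df + 0.5) + 1.0).ln();
        score += idf * tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * len / stats.avg_len));
    }
    score
}
```

In this sketch the reported vocabulary size is simply `stats.doc_freq.len()`.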
Query Ground Truth Corpora
Query for Python ML patterns and get cross-language results:
$ batuta oracle --rag "How do I tokenize text for BERT?"
RAG Oracle Mode
──────────────────────────────────────────────────
Index: 28 documents, 186 chunks
Query: How do I tokenize text for BERT?
1. [hf-ground-truth-corpus] src/hf_gtc/preprocessing/tokenization.py#12  82%
   def preprocess_text(text: str) -> str:
       text = text.strip().lower()...
2. [trueno] trueno/CLAUDE.md#156  65%
   For text preprocessing, trueno provides...
3. [hf-ground-truth-corpus] hf-ground-truth-corpus/README.md#42  58%
   from hf_gtc.preprocessing.tokenization import preprocess_text...
$ batuta oracle --rag "sentiment analysis pipeline"
# Returns Python pipeline patterns + Rust inference equivalents
RAG Cache Statistics
Show index statistics without a full load (reads manifest only):
$ batuta oracle --rag-stats
RAG Index Statistics
──────────────────────────────────────────────────
Version: 1.0.0
Batuta version: 0.6.2
Indexed at: 2025-01-30 14:23:45 UTC
Cache path: /home/user/.cache/batuta/rag
Sources:
- trueno: 4 docs, 42 chunks (commit: abc123)
- aprender: 3 docs, 38 chunks (commit: def456)
- hf-ground-truth-corpus: 12 docs, 100 chunks
Force Rebuild Index
Rebuild from scratch, bypassing the fingerprint-based skip. The old cache is retained until the new index is saved (crash-safe two-phase write):
$ batuta oracle --rag-index-force
Force rebuild requested (old cache retained until save)...
RAG Indexer (Heijunka Mode)
──────────────────────────────────────────────────
Scanning Rust stack repositories...
✓ trueno/CLAUDE.md (12 chunks)
...
Complete: 28 documents, 186 chunks indexed
Index saved to /home/user/.cache/batuta/rag
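The crash-safe two-phase write follows a common pattern. The new index is written to a temporary file and flushed, then renamed over the old one. This is a minimal sketch of that pattern, not batuta's code, and the file names `index.bin` and `index.bin.tmp` are hypothetical. On POSIX filesystems a rename within one filesystem is atomic, so a crash leaves either the complete old index or the complete new one, never a partial file.

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

/// Save index bytes with a two-phase write (hypothetical file names).
fn save_index(cache_dir: &Path, bytes: &[u8]) -> std::io::Result<()> {
    fs::create_dir_all(cache_dir)?;
    let tmp_path = cache_dir.join("index.bin.tmp");
    let final_path = cache_dir.join("index.bin");
    {
        // Phase 1: write the new index beside the old one and flush it to disk.
        let mut file = fs::File::create(&tmp_path)?;
        file.write_all(bytes)?;
        file.sync_all()?;
    }
    // Phase 2: swap it in. Until this rename, readers still see the old index.
    fs::rename(&tmp_path, &final_path)?;
    Ok(())
}
```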
RAG Dashboard
Launch the TUI dashboard to monitor RAG index health:
$ batuta oracle --rag-dashboard
┌──────────────────────────────────────────────────────────┐
│                   RAG Oracle Dashboard                   │
├──────────────────────────────────────────────────────────┤
│ Index Status: HEALTHY          Last Updated: 2 hours ago │
├──────────────────────────────────────────────────────────┤
│ Documents by Priority:                                   │
│   P0 (Critical): 12  CLAUDE.md                           │
│   P1 (High):      8  README.md                           │
│   P2 (Medium):    4  docs/                               │
│   P3 (Low):       2  examples/                           │
├──────────────────────────────────────────────────────────┤
│ Retrieval Quality (last 24h):                            │
│   MRR:      0.847                                        │
│   Recall@5: 0.923                                        │
│   NDCG@10:  0.891                                        │
├──────────────────────────────────────────────────────────┤
│ Reindex Queue (Heijunka):                                │
│   - entrenar/CLAUDE.md (staleness: 0.72)                 │
│   - realizar/CLAUDE.md (staleness: 0.45)                 │
└──────────────────────────────────────────────────────────┘
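The dashboard's retrieval-quality panel reports standard ranking metrics. The sketch below shows how MRR, Recall@k and NDCG@k are conventionally computed from binary relevance judgments. This page does not describe how batuta gathers relevance labels over the last 24 hours, so the inputs here are hypothetical.

```rust
use std::collections::HashSet;

/// Reciprocal rank of the first relevant result (0.0 if none was retrieved).
fn reciprocal_rank(ranked: &[&str], relevant: &HashSet<&str>) -> f64 {
    ranked
        .iter()
        .position(|d| relevant.contains(d))
        .map_or(0.0, |i| 1.0 / (i + 1) as f64)
}

/// MRR: mean reciprocal rank over a set of (ranking, relevant set) queries.
fn mean_reciprocal_rank(runs: &[(Vec<&str>, HashSet<&str>)]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    runs.iter().map(|(r, rel)| reciprocal_rank(r, rel)).sum::<f64>() / runs.len() as f64
}

/// Recall@k: fraction of relevant documents that appear in the top k.
fn recall_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    if relevant.is_empty() {
        return 0.0;
    }
    let hits = ranked.iter().take(k).filter(|d| relevant.contains(*d)).count();
    hits as f64 / relevant.len() as f64
}

/// NDCG@k with binary relevance: each hit at 0-based position i adds 1 / log2(i + 2).
fn ndcg_at_k(ranked: &[&str], relevant: &HashSet<&str>, k: usize) -> f64 {
    let dcg: f64 = ranked
        .iter()
        .take(k)
        .enumerate()
        .filter(|(_, d)| relevant.contains(*d))
        .map(|(i, _)| 1.0 / ((i + 2) as f64).log2())
        .sum();
    // Ideal DCG: all relevant documents ranked first.
    let ideal: f64 = (0..relevant.len().min(k)).map(|i| 1.0 / ((i + 2) as f64).log2()).sum();
    if ideal == 0.0 { 0.0 } else { dcg / ideal }
}
```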
Local Workspace Discovery
Discover PAIML projects in ~/src with development state awareness:
$ batuta oracle --local
Local Workspace Status (PAIML projects in ~/src)
Summary:
Total projects: 42
✅ Clean: 28
🔧 Dirty: 10
📤 Unpushed: 4
┌────────────┬─────────┬───────────┬──────────────┬────────────────┐
│ Project    │ Local   │ Crates.io │ State        │ Git Status     │
├────────────┼─────────┼───────────┼──────────────┼────────────────┤
│ trueno     │ 0.11.0  │ 0.11.0    │ ✅ Clean     │                │
│ aprender   │ 0.24.0  │ 0.24.0    │ ✅ Clean     │                │
│ depyler    │ 3.21.0  │ 3.20.0    │ 🔧 Dirty     │ 15 mod, 3 new  │
│ entrenar   │ 0.5.0   │ 0.5.0     │ 📤 Unpushed  │ 2 ahead        │
│ batuta     │ 0.5.0   │ 0.5.0     │ ✅ Clean     │                │
└────────────┴─────────┴───────────┴──────────────┴────────────────┘
💡 Dirty projects use crates.io version for deps (stable)
Development State Legend
| State | Icon | Meaning |
|---|---|---|
| Clean | ✅ | No uncommitted changes, safe to use local version |
| Dirty | 🔧 | Active development, use crates.io version for deps |
| Unpushed | 📤 | Clean but has unpushed commits |
Key Insight: Dirty projects don't block the stack! The crates.io version is stable and should be used for dependencies while local development continues.
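The legend amounts to a classification rule plus a version-selection rule. The sketch below rests on two readings of the table above, and neither is confirmed batuta logic. It assumes that uncommitted changes take precedence over unpushed commits, and that only Clean projects build against their local version. `DevState`, `classify`, and `dependency_version` are hypothetical names.

```rust
/// Development state of a local project, mirroring the legend.
#[derive(Debug, PartialEq)]
enum DevState {
    Clean,
    Dirty,
    Unpushed,
}

/// Classify from git counts (assumption: uncommitted changes win over unpushed commits).
fn classify(modified: usize, untracked: usize, ahead: usize) -> DevState {
    if modified + untracked > 0 {
        DevState::Dirty
    } else if ahead > 0 {
        DevState::Unpushed
    } else {
        DevState::Clean
    }
}

/// Pick the version dependents should build against.
fn dependency_version<'a>(state: &DevState, local: &'a str, published: &'a str) -> &'a str {
    match state {
        // The legend calls only Clean projects safe to use locally.
        DevState::Clean => local,
        // Assumption: Dirty and Unpushed both fall back to the crates.io release.
        DevState::Dirty | DevState::Unpushed => published,
    }
}
```

With the depyler row above (15 modified, 3 new, local 3.21.0, crates.io 3.20.0), dependents would build against 3.20.0.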
Show Only Dirty Projects
Filter to show only projects with uncommitted changes:
$ batuta oracle --dirty
🔧 Dirty Projects (active development)
┌────────────┬─────────┬───────────┬──────────────────────────┐
│ Project    │ Local   │ Crates.io │ Changes                  │
├────────────┼─────────┼───────────┼──────────────────────────┤
│ depyler    │ 3.21.0  │ 3.20.0    │ 15 modified, 3 untracked │
│ renacer    │ 0.10.0  │ 0.9.0     │ 8 modified               │
│ pmat       │ 0.20.0  │ 0.19.0    │ 22 modified, 5 untracked │
└────────────┴─────────┴───────────┴──────────────────────────┘
💡 These projects are safe to skip - crates.io versions are stable.
Focus on --publish-order for clean projects ready to release.
Publish Order
Show the safe publish order respecting inter-project dependencies:
$ batuta oracle --publish-order
📦 Suggested Publish Order (topological sort)
Step 1: trueno-graph (0.1.9 → 0.1.10)
✅ Ready - no blockers
Dependencies: (none)
Step 2: aprender (0.23.0 → 0.24.0)
✅ Ready - no blockers
Dependencies: trueno
Step 3: entrenar (0.4.0 → 0.5.0)
✅ Ready - no blockers
Dependencies: aprender
Step 4: depyler (3.20.0 → 3.21.0)
⚠️ Blocked: 15 uncommitted changes
Dependencies: aprender, entrenar
Step 5: batuta (0.4.9 → 0.5.0)
⚠️ Blocked: waiting for depyler
Dependencies: all stack components
────────────────────────────────────────
Summary:
Ready to publish: 3 projects
Blocked: 2 projects
💡 Run 'cargo publish' in order shown above.
Skip blocked projects - they'll use crates.io stable versions.
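The publish order is a topological sort of the dependency graph. Below is a minimal sketch using Kahn's algorithm. It adds a second pass that marks a project blocked when it is dirty or depends on a blocked project, as happens with depyler and batuta above. The dependency map is hypothetical input, since this page does not show how batuta discovers dependencies. Unlike the output above, the sketch also requires every dependency, including already-published ones such as trueno, to appear as a key in the map.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Kahn's algorithm: repeatedly publish a project whose dependencies are all
/// published. Returns None if a dependency cycle prevents a full order.
fn publish_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Number of not-yet-published dependencies for each project.
    let mut pending: BTreeMap<&'a str, usize> =
        deps.iter().map(|(p, d)| (*p, d.len())).collect();
    let mut order = Vec::new();
    loop {
        // Take the alphabetically first project with no unpublished dependencies.
        let next = match pending.iter().find(|(_, n)| **n == 0) {
            Some((p, _)) => *p,
            None => break,
        };
        pending.remove(&next);
        order.push(next);
        // Every dependent of `next` now waits on one fewer dependency.
        for (p, d) in deps {
            if d.contains(&next) {
                if let Some(n) = pending.get_mut(p) {
                    *n -= 1;
                }
            }
        }
    }
    // Anything still pending sits on, or behind, a dependency cycle.
    if pending.is_empty() { Some(order) } else { None }
}

/// Walk projects in publish order: a project is blocked if it is dirty
/// or if any of its dependencies was already found to be blocked.
fn blocked<'a>(
    order: &[&'a str],
    deps: &BTreeMap<&'a str, Vec<&'a str>>,
    dirty: &BTreeSet<&'a str>,
) -> BTreeSet<&'a str> {
    let mut out: BTreeSet<&'a str> = BTreeSet::new();
    for p in order {
        if dirty.contains(p) || deps[p].iter().any(|d| out.contains(d)) {
            out.insert(*p);
        }
    }
    out
}
```

Using `BTreeMap` breaks ties alphabetically, so the same graph always yields the same order.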
Auto-Update System
The RAG index stays fresh automatically through three layers:
Layer 1: Shell Auto-Fresh (ora-fresh)
# Runs automatically on shell login (non-blocking background check)
# Manual invocation:
$ ora-fresh
✅ Index is fresh (3h old)
# When a stack repo has received commits since the last index:
$ ora-fresh
Stack changed since last index, refreshing...
Layer 2: Post-Commit Hooks
All 26 stack repos have a post-commit hook that touches a stale marker:
# Installed in .git/hooks/post-commit across all stack repos
touch "$HOME/.cache/batuta/rag/.stale" 2>/dev/null
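The stale marker keeps the freshness check cheap, because it tests for a single file instead of scanning every repository. Below is a minimal sketch of a check a tool like ora-fresh could perform. This page does not say how ora-fresh decides, so the sketch assumes it consults this marker and clears it after a successful reindex. `needs_refresh` and `clear_stale` are hypothetical names.

```rust
use std::fs;
use std::io::ErrorKind;
use std::path::Path;

/// True if a post-commit hook has touched `.stale` since the last reindex.
fn needs_refresh(cache_dir: &Path) -> bool {
    cache_dir.join(".stale").exists()
}

/// Remove the marker after a successful reindex (assumed behavior).
/// A missing marker is not an error, so the call is idempotent.
fn clear_stale(cache_dir: &Path) -> std::io::Result<()> {
    match fs::remove_file(cache_dir.join(".stale")) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}
```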
Layer 3: Fingerprint-Based Change Detection
On reindex, BLAKE3 content fingerprints skip work when nothing changed:
# Second run detects no changes via fingerprints
$ batuta oracle --rag-index
✅ Index is current (no files changed since last index)
# Force reindex ignores fingerprints (old cache retained until save)
$ batuta oracle --rag-index-force
Force rebuild requested (old cache retained until save)...
RAG Indexer (Heijunka Mode)
...
Complete: 5016 documents, 264369 chunks indexed
Each DocumentFingerprint tracks:
- Content hash (BLAKE3 of file contents)
- Chunker config hash (detect parameter changes)
- Model hash (detect embedding model changes)
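The three hashes cover everything that determines a document's chunks and embeddings. Changing the chunker parameters or the embedding model invalidates the stored chunks even when the file itself is untouched. Below is a minimal sketch of the skip decision, assuming the stored and current fingerprints are compared field by field. To stay dependency-free, the sketch substitutes std's `DefaultHasher` for BLAKE3. That hasher is not cryptographic and not stable across Rust releases, so a real index would use the `blake3` crate. The field names and string inputs are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in digest (NOT BLAKE3): std's DefaultHasher over the raw bytes.
fn digest(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// The three inputs that determine a document's indexed chunks.
#[derive(Debug, Clone, PartialEq)]
struct DocumentFingerprint {
    content_hash: u64,        // file contents
    chunker_config_hash: u64, // chunking parameters
    model_hash: u64,          // embedding model identity
}

impl DocumentFingerprint {
    fn new(content: &[u8], chunker_config: &str, model_id: &str) -> Self {
        DocumentFingerprint {
            content_hash: digest(content),
            chunker_config_hash: digest(chunker_config.as_bytes()),
            model_hash: digest(model_id.as_bytes()),
        }
    }
}

/// Reindex when there is no stored fingerprint or any component changed.
fn needs_reindex(stored: Option<&DocumentFingerprint>, current: &DocumentFingerprint) -> bool {
    stored != Some(current)
}
```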
Exit Codes
| Code | Description |
|---|---|
| 0 | Success |
| 1 | General error, or no code available (`--format code` on a context without code) |
| 2 | Invalid arguments |
See Also
- Oracle Mode: Intelligent Query Interface - Full documentation
- batuta analyze - Project analysis
- batuta transpile - Code transpilation