batuta agent
Sovereign agent runtime using the perceive-reason-act pattern.
Synopsis
batuta agent run --manifest <MANIFEST> --prompt <PROMPT> [--max-iterations <N>] [--daemon]
batuta agent chat --manifest <MANIFEST>
batuta agent validate --manifest <MANIFEST>
batuta agent status --manifest <MANIFEST>
batuta agent sign --manifest <MANIFEST> [--signer <ID>] [--output <PATH>]
batuta agent verify-sig --manifest <MANIFEST> --pubkey <PATH> [--signature <PATH>]
batuta agent contracts
Subcommands
run
Execute a single agent invocation with the given prompt.
batuta agent run --manifest agent.toml --prompt "Summarize the codebase"
Options:
| Flag | Description |
|---|---|
--manifest <PATH> | Path to agent manifest TOML file |
--prompt <TEXT> | Prompt to send to the agent |
--max-iterations <N> | Override max iterations from manifest |
--daemon | Run as a long-lived service (for forjar deployments) |
chat
Start an interactive chat session with the agent. Type quit or exit to end.
batuta agent chat --manifest agent.toml
The chat loop runs run_agent_loop() for each user message, maintaining
persistent memory across turns (recalled via BM25 when using TruenoMemory).
validate
Validate an agent manifest without running it.
batuta agent validate --manifest agent.toml
status
Display agent manifest summary, resource quotas, model config, and capabilities.
batuta agent status --manifest agent.toml
Reports validation errors (if any), manifest metadata, resource limits (max iterations, tool calls, cost budget), model configuration, and the list of granted capabilities.
sign
Cryptographically sign an agent manifest using Ed25519 via pacha+BLAKE3.
batuta agent sign --manifest agent.toml --signer "admin@paiml.com"
batuta agent sign --manifest agent.toml --output agent.toml.sig
The manifest is normalized to canonical TOML before hashing to ensure deterministic signatures regardless of whitespace or key ordering.
verify-sig
Verify an Ed25519 signature on an agent manifest.
batuta agent verify-sig --manifest agent.toml --pubkey key.pub
batuta agent verify-sig --manifest agent.toml --pubkey key.pub --signature agent.toml.sig
contracts
Display the design-by-contract invariants from contracts/agent-loop-v1.yaml.
batuta agent contracts
Shows all invariants (INV-001 through INV-007), their test bindings, and verification targets (coverage, mutation, complexity thresholds).
Agent Manifest
The agent manifest is a TOML file that configures the runtime:
name = "code-reviewer"
version = "0.1.0"
description = "Reviews code for quality issues"
[model]
model_path = "/models/llama3-8b.gguf"
max_tokens = 4096
temperature = 0.3
system_prompt = "You are a code review assistant."
[resources]
max_iterations = 20
max_tool_calls = 50
max_cost_usd = 0.0 # 0 = unlimited (sovereign)
capabilities = ["Rag", "Memory"]
privacy = "Sovereign"
Architecture
The agent uses a perceive-reason-act loop (Toyota Way: Jidoka):
┌─────────────────────────────────────┐
│ Perceive (Memory Recall) │
│ Recall relevant memories, augment │
│ system prompt with context │
├─────────────────────────────────────┤
│ Context Management [F-003] │
│ Pre-subtract system+tool tokens, │
│ truncate messages via SlidingWindow│
├─────────────────────────────────────┤
│ Reason (LLM Completion) │
│ Send truncated conversation to │
│ LlmDriver with retry+backoff │
├─────────────────────────────────────┤
│ Act (Tool Execution) │
│ Execute tools with capability │
│ checks (Poka-Yoke), store results │
├─────────────────────────────────────┤
│ Guard (Jidoka) │
│ Check iteration limits, ping-pong │
│ detection, cost budget │
└─────────────────────────────────────┘
Context Management
The agent integrates serve::context::ContextManager for token-aware
truncation before each LLM call. This prevents context overflow errors
and ensures long conversations degrade gracefully.
Budget calculation:
effective_window = driver.context_window()
- estimate_tokens(system_prompt)
- estimate_tokens(tool_definitions)
- output_reserve (max_tokens)
The system prompt and tool schemas are pre-subtracted from the window.
Only conversation messages are passed to the SlidingWindow truncation
strategy, which keeps the most recent messages when the budget is exceeded.
Error modes:
- If messages fit: no truncation, zero overhead
- If messages overflow: oldest messages dropped (SlidingWindow)
- If overflow after truncation:
AgentError::ContextOverflow
Retry with Exponential Backoff
Driver calls use automatic retry for transient errors:
| Error Type | Retryable | Backoff |
|---|---|---|
RateLimited | Yes | 1s, 2s, 4s |
Overloaded | Yes | 1s, 2s, 4s |
Network | Yes | 1s, 2s, 4s |
ModelNotFound | No | Immediate fail |
InferenceFailed | No | Immediate fail |
Maximum 3 retry attempts with exponential backoff (base 1s).
Safety Features
- LoopGuard: Prevents runaway loops (max iterations, tool call limits)
- Ping-pong detection: FxHash-based detection of oscillatory tool calls
- Capability filtering: Tools only accessible if manifest grants capability
- Cost circuit breaker: Stops execution when cost budget exceeded
- Context truncation: Automatic SlidingWindow truncation for long conversations
- Consecutive MaxTokens: Circuit-breaks after 5 consecutive truncated responses
- Privacy tier: Sovereign (local-only), Private, or Standard
Daemon Mode
The --daemon flag runs the agent as a long-lived service process,
suitable for forjar deployments:
batuta agent run \
--manifest /etc/batuta/agent.toml \
--prompt "Monitor system health" \
--daemon
Daemon mode:
- Runs the agent loop as a background service
- Responds to SIGTERM/SIGINT for graceful shutdown
- Designed for systemd integration via forjar provisioning
Examples
# Validate a manifest
batuta agent validate --manifest examples/agent.toml
# Run with a prompt
batuta agent run \
--manifest examples/agent.toml \
--prompt "What are the main modules in this project?"
# Override iteration limit
batuta agent run \
--manifest examples/agent.toml \
--prompt "Find all TODO comments" \
--max-iterations 5
# Run as daemon (forjar)
batuta agent run \
--manifest examples/agent.toml \
--prompt "Monitor logs" \
--daemon
Driver Backends
| Driver | Privacy Tier | Feature | Description |
|---|---|---|---|
RealizarDriver | Sovereign | inference | Local GGUF/APR inference via realizar |
MockDriver | Sovereign | agents | Deterministic responses for testing |
RemoteDriver | Standard | native | HTTP to Anthropic/OpenAI APIs |
RoutingDriver | Configurable | native | Local-first with remote fallback |
RoutingDriver
The RoutingDriver wraps a primary (typically local/sovereign) and fallback
(typically remote/cloud) driver. Three strategies:
| Strategy | Behavior |
|---|---|
PrimaryWithFallback | Try primary; on retryable error, spillover to fallback |
PrimaryOnly | Primary only, no fallback |
FallbackOnly | Fallback only, skip primary |
Privacy tier inherits the most permissive of the two drivers — if the
fallback is Standard, data may leave the machine on spillover.
RemoteDriver
Supports both Anthropic Messages API and OpenAI Chat Completions API:
| Provider | Endpoint | Tool Format |
|---|---|---|
| Anthropic | /v1/messages | tool_use content blocks |
| OpenAI | /v1/chat/completions | function tool_calls |
Error mapping: HTTP 429 → RateLimited, 529/503 → Overloaded, other → Network.
Builtin Tools
| Tool | Capability | Feature | Description |
|---|---|---|---|
MemoryTool | Memory | agents | Read/write agent persistent state |
RagTool | Rag | rag | Search indexed documentation via BM25+vector |
ShellTool | Shell | agents | Sandboxed subprocess execution with allowlisting |
ComputeTool | Compute | agents | Parallel task execution via JoinSet |
BrowserTool | Browser | agents-browser | Headless Chromium automation |
ShellTool
Executes shell commands with capability-based allowlisting (Poka-Yoke):
- Only allowlisted commands are executable
- Working directory is restricted
- Output truncated to 8192 bytes to prevent context overflow
- Configurable timeout (default: 30 seconds)
ComputeTool
Parallel task execution for compute-intensive workflows:
- Single task execution (
runaction) - Parallel execution (
parallelaction) via tokio JoinSet - Max concurrent tasks configurable (default: 4)
- Output truncated to 16KB per task
- Configurable timeout (default: 5 minutes)
BrowserTool Actions
| Action | Input | Description |
|---|---|---|
navigate | { "url": "..." } | Navigate to URL (Sovereign: localhost only) |
screenshot | {} | Take page screenshot (base64 PNG) |
evaluate | { "expression": "..." } | Evaluate JavaScript |
eval_wasm | { "expression": "..." } | Evaluate WASM expression |
click | { "selector": "..." } | Click CSS selector |
wait_wasm | {} | Wait for WASM runtime readiness |
console | {} | Get console messages |
Programmatic Usage
Basic Usage
#![allow(unused)]
fn main() {
use batuta::agent::manifest::AgentManifest;
use batuta::agent::driver::mock::MockDriver;
use batuta::agent::memory::InMemorySubstrate;
use batuta::agent::runtime::run_agent_loop;
use batuta::agent::tool::ToolRegistry;
let manifest = AgentManifest::default();
let driver = MockDriver::single_response("Hello!");
let registry = ToolRegistry::default();
let memory = InMemorySubstrate::new();
let result = run_agent_loop(
&manifest,
"Say hello",
&driver,
®istry,
&memory,
None, // Optional stream event channel
).await?;
println!("Response: {}", result.text);
}
Using AgentBuilder
#![allow(unused)]
fn main() {
use batuta::agent::AgentBuilder;
use batuta::agent::manifest::AgentManifest;
use batuta::agent::driver::mock::MockDriver;
let manifest = AgentManifest::default();
let driver = MockDriver::single_response("Built!");
let result = AgentBuilder::new(&manifest)
.driver(&driver)
.run("Hello builder")
.await?;
println!("{}", result.text); // "Built!"
}
With Stream Events
#![allow(unused)]
fn main() {
use tokio::sync::mpsc;
use batuta::agent::AgentBuilder;
use batuta::agent::driver::StreamEvent;
let (tx, mut rx) = mpsc::channel(64);
let result = AgentBuilder::new(&manifest)
.driver(&driver)
.stream(tx)
.run("Hello")
.await?;
while let Ok(event) = rx.try_recv() {
match event {
StreamEvent::PhaseChange { phase } => {
println!("Phase: {phase}");
}
StreamEvent::TextDelta { text } => {
print!("{text}");
}
_ => {}
}
}
}
Quality Gates
The agent module passes all PMAT quality gates:
- Zero SATD comments (QA-001)
- All source files ≤500 lines (QA-002)
- 95%+ line coverage (QA-003)
- Zero cognitive complexity violations (QA-005)
- 16/16 design-by-contract invariants verified
- 27/27 integration demo scenarios passing
Run quality verification:
# Contract invariants
cargo run --example agent_contracts --features agents
# Full integration demos
cargo run --example agent_demo --features agents