# batuta serve

Serve ML models via the Realizar inference server, with an optional OpenAI-compatible API.

## Synopsis

```
batuta serve [OPTIONS] [MODEL]
```

## Description

The `serve` command launches a local inference server for ML models. It supports multiple model sources (the Pacha registry, Hugging Face, and local files) and can expose an OpenAI-compatible REST API for drop-in integration with existing toolchains.

## Arguments

| Argument | Description |
|---|---|
| `[MODEL]` | Model reference: `pacha://name:version`, `hf://org/model`, or a local path |

## Options

| Option | Description |
|---|---|
| `-H, --host <HOST>` | Host to bind to (default: `127.0.0.1`) |
| `-p, --port <PORT>` | Port to bind to (default: `8080`) |
| `--openai-api` | Enable the OpenAI-compatible API at `/v1/*` |
| `--watch` | Enable hot-reload on model changes |
| `-v, --verbose` | Enable verbose output |
| `-h, --help` | Print help |

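These options compose; for instance, to expose the server on all interfaces at a non-default port with the OpenAI API enabled (a sketch assembled from the flags above, reusing a model reference from the examples below):

```
$ batuta serve pacha://llama3:8b --host 0.0.0.0 --port 9000 --openai-api
```

Note that binding to `0.0.0.0` makes the server reachable from other machines; the default `127.0.0.1` keeps it local-only.
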
## Examples

### Serve a Local Model

```
$ batuta serve ./model.gguf --port 8080
```

### Serve from Pacha Registry

```
$ batuta serve pacha://llama3:8b
```

### OpenAI-Compatible API

```
$ batuta serve pacha://llama3:8b --openai-api
# Then use standard OpenAI clients:
# curl http://localhost:8080/v1/chat/completions ...
```

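A fuller request, as a sketch: the payload below follows the standard OpenAI chat-completions shape, and the `model` value is assumed to match the reference the server was started with (`llama3:8b` is an assumption, not a confirmed naming scheme):

```
# Assumes the server was started with --openai-api; the model name
# is an assumption and may need to match your served reference.
$ curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3:8b", "messages": [{"role": "user", "content": "Say hello."}]}'
```

Existing OpenAI SDKs should work the same way when pointed at the base URL `http://localhost:8080/v1`.
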
### Hot-Reload During Development

```
$ batuta serve ./model.apr --watch
```

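With `--watch`, replacing the model file on disk should trigger a reload without restarting the server. A sketch of the workflow (the updated file name is hypothetical):

```
# In another terminal, overwrite the watched file to trigger a reload.
# ./model-v2.apr is a hypothetical updated model.
$ cp ./model-v2.apr ./model.apr
```
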
## See Also

- Model Serving Ecosystem
- `batuta deploy` - Production deployment