
batuta serve

Serve ML models via the Realizar inference server, with an optional OpenAI-compatible API.

Synopsis

batuta serve [OPTIONS] [MODEL]

Description

The serve command launches a local inference server for ML models. It supports multiple model sources (Pacha registry, HuggingFace, local files) and can expose an OpenAI-compatible REST API for drop-in integration with existing toolchains.

Arguments

Argument    Description
---------   -----------
[MODEL]     Model reference: pacha://name:version, hf://org/model, or a local path

Options

Option                Description
-------------------   -----------
-H, --host <HOST>     Host to bind to (default: 127.0.0.1)
-p, --port <PORT>     Port to bind to (default: 8080)
--openai-api          Enable OpenAI-compatible API at /v1/*
--watch               Enable hot-reload on model changes
-v, --verbose         Enable verbose output
-h, --help            Print help

Examples

Serve a Local Model

$ batuta serve ./model.gguf --port 8080

Serve from Pacha Registry

$ batuta serve pacha://llama3:8b

OpenAI-Compatible API

$ batuta serve pacha://llama3:8b --openai-api

# Then use standard OpenAI clients:
# curl http://localhost:8080/v1/chat/completions ...
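With --openai-api enabled, any client that speaks the OpenAI chat-completions convention can talk to the server. A minimal sketch using only the Python standard library is shown below; the model name "llama3:8b" and the request body shape are illustrative assumptions based on the OpenAI convention, not batuta-specific guarantees.

```python
import json
from urllib import request

# Build a chat-completions request against the local server started with
# `batuta serve pacha://llama3:8b --openai-api`. The payload follows the
# OpenAI chat-completions shape; the model name here is illustrative.
payload = {
    "model": "llama3:8b",
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# resp = request.urlopen(req)  # uncomment with the server running
```

Pointing an existing OpenAI SDK at base URL http://localhost:8080/v1 should work the same way, since the endpoints live under /v1/*.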

Hot-Reload During Development

$ batuta serve ./model.apr --watch

See Also

batuta mcp
batuta deploy