First Server

Serve a model as an HTTP API:

apr serve model.gguf --port 8080

Then query it:

curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is 2+2?", "max_tokens": 32}'
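The same request can be made from Python. A minimal sketch using only the standard library; the `build_payload` and `complete` helpers are illustrative, not part of apr:

```python
import json
import urllib.request

def build_payload(prompt, max_tokens=32):
    """Request body for the OpenAI-style /v1/completions endpoint."""
    return {"prompt": prompt, "max_tokens": max_tokens}

def complete(prompt, max_tokens=32, base_url="http://localhost:8080"):
    """POST a completion request and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(build_payload(prompt, max_tokens)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# complete("What is 2+2?")  # requires the server started with `apr serve` above
```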

OpenAI-Compatible API

The server exposes OpenAI-compatible endpoints, including chat completions:

# Chat completions
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
  }'
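A successful response follows the OpenAI chat schema, so the assistant's reply lives at `choices[0].message.content`. A short sketch of extracting it; the `sample` response below is illustrative, not actual server output:

```python
def reply_text(response):
    """Extract the assistant message from an OpenAI-style chat response."""
    return response["choices"][0]["message"]["content"]

# Example with a response shaped like the OpenAI chat schema:
sample = {
    "choices": [
        {"index": 0, "message": {"role": "assistant", "content": "Hello!"}}
    ]
}
print(reply_text(sample))  # → Hello!
```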