
Public API

Lattis serves one local HTTP API that speaks both the OpenAI and Anthropic dialects. It listens on 127.0.0.1:1234 by default (configurable; the router child uses the next port up).

Point any OpenAI- or Anthropic-compatible client at it. To use a local or connected cloud model, pass its id in the model field; Lattis routes the request and translates between formats as needed.

| Method & path | Purpose |
| --- | --- |
| GET /health | Liveness check. |
| GET /v1/models | Local + connected-remote models, merged. |
| POST /v1/chat/completions | OpenAI chat (stream or whole). |
| POST /v1/responses | OpenAI Responses. |
| GET /v1/responses | OpenAI Responses over WebSocket (Codex). |
| POST /v1/messages | Anthropic Messages. |
| POST /v1/messages/count_tokens | Anthropic token counting. |

OpenAI chat completions:

curl http://127.0.0.1:1234/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3-4b-instruct-2507","messages":[{"role":"user","content":"Hello!"}]}'

Anthropic messages:

curl http://127.0.0.1:1234/v1/messages \
-H 'Content-Type: application/json' \
-d '{"model":"qwen3-4b-instruct-2507","max_tokens":256,"messages":[{"role":"user","content":"Hello!"}]}'
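
The two curl calls above differ only in the path and in the Anthropic body carrying max_tokens. As a minimal sketch of the same requests from Python, using only the standard library: the names `BASE_URL`, `build_request`, and `send` are hypothetical helpers for illustration, not part of Lattis, and the address assumes the default bind.

```python
import json
import urllib.request

# Default address from this page; change it if Lattis is configured differently.
BASE_URL = "http://127.0.0.1:1234"


def build_request(dialect: str, model: str, prompt: str,
                  max_tokens: int = 256) -> urllib.request.Request:
    """Build (but do not send) a POST request in the chosen dialect."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if dialect == "openai":
        path = "/v1/chat/completions"
    elif dialect == "anthropic":
        path = "/v1/messages"
        # The Anthropic Messages format requires max_tokens.
        body["max_tokens"] = max_tokens
    else:
        raise ValueError(f"unknown dialect: {dialect!r}")
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (needs a running Lattis)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With Lattis running, `send(build_request("openai", "qwen3-4b-instruct-2507", "Hello!"))` should correspond to the first curl call, and swapping in `"anthropic"` to the second.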

List models (local + connected cloud):

curl http://127.0.0.1:1234/v1/models

Each entry includes the model's context window (meta.n_ctx) where Lattis knows it. For local models the value is read from GGUF/MLX metadata; for cloud models it comes from a built-in table.
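
Because meta.n_ctx is only present where Lattis knows the value, a client should treat it as optional. The sketch below assumes the response uses the OpenAI-style list envelope (a top-level `data` array whose entries have an `id`); that shape is an assumption, as are the helper names `context_windows` and `fetch_models` and the made-up ids in `SAMPLE`.

```python
import json
import urllib.request
from typing import Optional

# Illustrative payload only: the ids and numbers are invented for this example.
SAMPLE = {
    "object": "list",
    "data": [
        {"id": "example-local-model", "meta": {"n_ctx": 32768}},
        {"id": "example-cloud-model"},  # no meta: context window unknown
    ],
}


def context_windows(models_payload: dict) -> dict[str, Optional[int]]:
    """Map each model id to meta.n_ctx, or None when it is not reported."""
    result: dict[str, Optional[int]] = {}
    for entry in models_payload.get("data", []):
        meta = entry.get("meta") or {}
        result[entry["id"]] = meta.get("n_ctx")
    return result


def fetch_models(base_url: str = "http://127.0.0.1:1234") -> dict:
    """GET /v1/models from a running Lattis and decode the JSON reply."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=10) as response:
        return json.load(response)
```

Calling `context_windows(fetch_models())` against a running instance gives a quick view of which models report a context window and which do not.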

POST /v1/chat/completions and POST /v1/messages support streaming responses. The GET /v1/responses WebSocket transport is used by Codex clients on the OpenAI subscription path.
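
To consume a streamed chat completion, a client reads the response line by line. The sketch below assumes Lattis emits the usual OpenAI stream format (server-sent events with `data:` lines carrying JSON chunks, ended by `data: [DONE]`), which this page implies but does not spell out. The function name `iter_text_deltas` is hypothetical, and Anthropic-dialect streams use a different event layout that is not handled here.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from an OpenAI-style chat completion event stream."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI format
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Usage sketch (needs a running Lattis; the request body sets "stream": true):
#
#   import urllib.request
#   body = {"model": "qwen3-4b-instruct-2507", "stream": True,
#           "messages": [{"role": "user", "content": "Hello!"}]}
#   request = urllib.request.Request(
#       "http://127.0.0.1:1234/v1/chat/completions",
#       data=json.dumps(body).encode("utf-8"),
#       headers={"Content-Type": "application/json"}, method="POST")
#   with urllib.request.urlopen(request) as response:
#       for piece in iter_text_deltas(response):
#           print(piece, end="", flush=True)
```

Keeping the parser separate from the HTTP call means it can be exercised against recorded lines without a server.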

  • The API is unauthenticated and assumes a loopback bind. Don’t expose it on a non-loopback interface without putting your own auth in front of it.
  • For local administration (downloads, loading models, connecting providers), see the Control API.