Public API

Lattis serves one local HTTP API that speaks both the OpenAI and Anthropic dialects. It listens on 127.0.0.1:1234 by default (configurable; the router child uses the next port up).

Point any OpenAI- or Anthropic-compatible client at it. To use a local or connected cloud model, pass its id in the model field; Lattis routes the request and translates between formats as needed.

Endpoints

Method & path	Purpose
`GET /health`	Liveness check.
`GET /v1/models`	Local + connected-remote models, merged.
`POST /v1/chat/completions`	OpenAI Chat Completions (streaming or non-streaming).
`POST /v1/responses`	OpenAI Responses.
`GET /v1/responses`	OpenAI Responses over WebSocket (Codex).
`POST /v1/messages`	Anthropic Messages.
`POST /v1/messages/count_tokens`	Anthropic token counting.

Examples

OpenAI chat completions:

curl http://127.0.0.1:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-4b-instruct-2507","messages":[{"role":"user","content":"Hello!"}]}'

Anthropic messages:

curl http://127.0.0.1:1234/v1/messages \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3-4b-instruct-2507","max_tokens":256,"messages":[{"role":"user","content":"Hello!"}]}'
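The two curl calls above can also be made from a script. The following is a minimal Python sketch using only the standard library, not an official client. The helper names (build_request, openai_chat, anthropic_message, send, reply_text) are hypothetical, and reply_text assumes the replies follow the upstream OpenAI and Anthropic response shapes, which this page does not spell out.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:1234"  # default bind from this page; change if reconfigured


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request against the local API."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def openai_chat(model: str, prompt: str) -> urllib.request.Request:
    """Same body as the OpenAI chat completions curl example."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return build_request("/v1/chat/completions", body)


def anthropic_message(model: str, prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Same body as the Anthropic messages curl example."""
    body = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return build_request("/v1/messages", body)


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON reply (needs a running server)."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def reply_text(reply: dict) -> str:
    """Extract the text from either reply shape (assumed to match the upstream APIs)."""
    if "choices" in reply:  # OpenAI chat completion shape
        return reply["choices"][0]["message"]["content"]
    # Anthropic Messages shape: "content" is a list of typed blocks
    return "".join(b["text"] for b in reply["content"] if b.get("type") == "text")
```

With the server running, reply_text(send(openai_chat("qwen3-4b-instruct-2507", "Hello!"))) should return the model's answer; swapping in anthropic_message exercises the other dialect with the same model id.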

List models (local + connected cloud):

curl http://127.0.0.1:1234/v1/models

Each entry includes the model’s context window (meta.n_ctx) where Lattis knows it — read from GGUF/MLX metadata for local models, and from a built-in table for cloud models.
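As a rough illustration of reading the context window, the sketch below maps each model id to its meta.n_ctx and skips entries where it is absent. It assumes the reply uses the OpenAI list shape, with entries under a top-level "data" array and an "id" on each entry; that envelope is an assumption, as only meta.n_ctx is named on this page. The function names are hypothetical.

```python
import json
import urllib.request


def context_windows(models_reply: dict) -> dict:
    """Map model id -> n_ctx, skipping entries where meta.n_ctx is not present."""
    windows = {}
    # Assumption: entries live under "data", as in the OpenAI models list shape.
    for entry in models_reply.get("data", []):
        n_ctx = (entry.get("meta") or {}).get("n_ctx")
        if n_ctx is not None:
            windows[entry["id"]] = n_ctx
    return windows


def fetch_models(base_url: str = "http://127.0.0.1:1234") -> dict:
    """GET /v1/models from a running server and decode the JSON reply."""
    with urllib.request.urlopen(base_url + "/v1/models", timeout=10) as response:
        return json.load(response)
```

A client could call context_windows(fetch_models()) before sending a long prompt, and fall back to a conservative limit for any model missing from the result.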

Streaming

POST /v1/chat/completions and POST /v1/messages support streaming responses. The GET /v1/responses WebSocket transport is used by Codex clients on the OpenAI subscription path.
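For the chat completions case, a streaming client typically reads server-sent events line by line. The sketch below assumes Lattis follows the upstream OpenAI framing: the request body sets "stream": true, each event arrives as a "data: {json}" line carrying a delta, and the stream ends with "data: [DONE]". This page does not confirm those details, and iter_chat_deltas is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_chat_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style chat completion SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI dialect
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

The function takes any iterable of decoded lines, so it can be fed from an HTTP response (decoding each line as UTF-8 first) or from a recorded transcript in tests. Anthropic streaming uses a different event vocabulary and would need its own parser.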

Notes

The API is unauthenticated and assumes a loopback bind. Don’t expose it on a non-loopback interface without putting your own auth in front of it.
For local administration (downloads, loading models, connecting providers), see the Control API.
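Because the API carries no authentication, a client or wrapper script may want to refuse any base URL that does not point at loopback. This is a minimal sketch of such a guard using Python's standard ipaddress module; is_loopback_url is a hypothetical helper, not part of Lattis, and it deliberately does not resolve hostnames.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """Return True only when the URL's host is a loopback address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True  # conventionally loopback; not verified by resolution here
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # non-literal hostname: treated as non-loopback without resolving
```

Note that 0.0.0.0 is the unspecified address rather than loopback, so the guard rejects it, which matches the intent of the warning above.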