One endpoint, two dialects
A single local HTTP server speaks both the OpenAI and Anthropic APIs. Point any existing SDK or tool at localhost — no code changes.
Local + cloud LLM gateway
Lattis is a desktop manager and gateway for LLMs. One local endpoint speaks both the OpenAI and Anthropic APIs, routing every request to local or cloud models — with usage and cost you can actually account for.
Open source · macOS, Linux, Windows · Rust + iced
$ curl localhost:1234/v1/chat/completions \
-d '"model": "qwen3-4b-instruct-2507", ...' # local GGUF
$ curl localhost:1234/v1/messages \
-d '"model": "claude-opus-4-8", ...' # cloud, OAuth
$ curl localhost:1234/v1/chat/completions \
-d '"model": "gpt-5.5", ...' # cloud, API key
→ one endpoint · OpenAI + Anthropic APIs · local + cloud What it does
A single local HTTP server speaks both the OpenAI and Anthropic APIs. Point any existing SDK or tool at localhost — no code changes.
Download GGUF models from Hugging Face and serve them with a bundled llama-server in router mode. Keep several resident; swap on demand.
Connect Anthropic and OpenAI / Codex via OAuth or API key. Requests are translated between OpenAI, Anthropic, and Responses formats automatically.
Detects Apple's mlx_lm if installed and serves MLX models alongside the rest — Metal-accelerated, hidden when unavailable.
Per-app, per-model, and per-project token usage with estimated cost. Know exactly what each project and provider is spending.
The daemon owns storage, downloads, the inference engine, and cloud connections. Launch on login and it serves the API 24/7 — app optional.
How it routes
Behind a single endpoint, Lattis routes each request to the right engine — a resident local GGUF, an MLX model on Apple Silicon, or a connected cloud provider — translating API formats as needed so your client never has to care.
See the API →Get going
Grab the desktop app (coming soon) or build from source.
Download a GGUF from the Library, or connect a cloud provider.
Aim your client at localhost:1234 and build.