Introduction
Lattis is a desktop manager and gateway for large language models. It serves
a single local HTTP endpoint that speaks both the OpenAI and Anthropic APIs,
so existing SDKs and tools work unchanged when pointed at localhost.
Behind that one endpoint, Lattis routes each request to either:
- Local models — GGUF models you download from Hugging Face, served by a
bundled
llama-server(llama.cpp) running in router mode, which keeps several models resident and swaps them on demand; or - Cloud models — remote providers (Anthropic, OpenAI / Codex) you connect via OAuth or an API key. Requests are translated between the OpenAI, Anthropic, and Responses formats as needed.
It also tracks per-app, per-model, and per-project token usage and estimated cost, and can run as a background daemon that launches on login.
Two processes
Section titled “Two processes”The desktop app is a thin controller for a background daemon. The daemon owns everything that matters; the app just commands and observes it.
| Process | Role |
|---|---|
lattisd | Background service: storage, downloads, llama-server, cloud proxy, API. |
lattis | Desktop controller (iced); talks to the daemon over localhost. |
The daemon runs independently of the GUI. With launch-on-login enabled it serves the API 24/7; the app only needs to be open to download models, change what is being served, or connect a cloud provider.
Why a gateway
Section titled “Why a gateway”- One integration surface. Write against the OpenAI or Anthropic API once;
switch the
modelfield to move between a local 4B and a frontier cloud model. - Local-first, cloud-optional. Run entirely offline on your own hardware, or reach for a cloud model on the same endpoint when you need it.
- Accountable. Every request is attributed to an app, model, and project, with an estimated dollar cost for paid cloud models.
Where to next
Section titled “Where to next”- Installation — get Lattis running.
- Quick Start — your first request in two minutes.
- Public API — the endpoints clients call.