Introduction

Lattis is a desktop manager and gateway for large language models. It serves a single local HTTP endpoint that speaks both the OpenAI and Anthropic APIs, so existing SDKs and tools work unchanged when pointed at localhost.

Behind that one endpoint, Lattis routes each request to either:

Local models — GGUF models you download from Hugging Face, served by a bundled llama-server (llama.cpp) running in router mode, which keeps several models resident and swaps them on demand; or
Cloud models — remote providers (Anthropic, OpenAI / Codex) you connect via OAuth or an API key. Requests are translated between the OpenAI, Anthropic, and Responses formats as needed.

It also tracks per-app, per-model, and per-project token usage and estimated cost, and can run as a background daemon that launches on login.

Two processes

The desktop app is a thin controller for a background daemon. The daemon owns everything that matters; the app just commands and observes it.

Process	Role
`lattisd`	Background service: storage, downloads, `llama-server`, cloud proxy, API.
`lattis`	Desktop controller (iced); talks to the daemon over `localhost`.

The daemon runs independently of the GUI. With launch-on-login enabled it serves the API 24/7; the app only needs to be open to download models, change what is being served, or connect a cloud provider.

Why a gateway

One integration surface. Write against the OpenAI or Anthropic API once; switch the model field to move between a local 4B and a frontier cloud model.
Local-first, cloud-optional. Run entirely offline on your own hardware, or reach for a cloud model on the same endpoint when you need it.
Accountable. Every request is attributed to an app, model, and project, with an estimated dollar cost for paid cloud models.

Where to next

Installation — get Lattis running.
Quick Start — your first request in two minutes.
Public API — the endpoints clients call.