Local Models
Lattis serves local models through two engines: a bundled llama-server
(llama.cpp) for GGUF models, and an optional mlx_lm.server for MLX
models on Apple Silicon.
GGUF models (llama-server router)
Section titled “GGUF models (llama-server router)”Lattis runs llama-server in router mode as a child process. The router
keeps several models resident at once and swaps the least-recently-used out
when you exceed the resident limit, so switching models is fast.
Download from the Library
Section titled “Download from the Library”In the app’s Library, pick a model and download it. Weights are pulled directly from Hugging Face — no API token required for the curated catalogue. Each model lives in its own folder under the data directory (see Configuration & Storage).
You can also add a custom model by providing one or more direct GGUF URLs.
Resident set & swapping
Section titled “Resident set & swapping”The router loads a model on first use and keeps it resident. The maximum number of resident models is configurable; beyond it, the least-recently-used model is evicted. The set that was resident at shutdown is restored on the next start, so the system comes back into the same state.
MLX models (Apple Silicon)
Section titled “MLX models (Apple Silicon)”If Apple’s mlx_lm is installed, Lattis
serves MLX models through an mlx_lm.server child process, Metal-accelerated.
MLX is detected at startup — when it isn’t available, MLX models are simply hidden.
Today a single MLX model is served at a time; loading another swaps it.
See Installation for setup and how Lattis locates a suitable Python interpreter.
Serving controls
Section titled “Serving controls”You can drive the router from the Control API (the GUI uses the same endpoints):
# Load a model into the routercurl -X POST http://127.0.0.1:1234/control/load \ -H 'Content-Type: application/json' -d '{"id":"qwen3-4b-instruct-2507"}'
# Unload itcurl -X POST http://127.0.0.1:1234/control/unload \ -H 'Content-Type: application/json' -d '{"id":"qwen3-4b-instruct-2507"}'Local and connected-cloud models appear together in GET /v1/models. Use a
model by passing its id as the model field on any request — see the
Public API.