v1.7.0 — tool-call approval gate & VRAM headroom control

Run any local LLM engine,
auto-tuned to your GPU

Bring your own llama.cpp fork. No compiling. No Electron. No Python. Point Claude Code at your own machine in one command — fully offline.

Get Started GitHub

2.2×

Faster than other inferences

~2MB

npm package size

Engine types supported

Telemetry collected

Any engine, including forks

Point it at any llama.cpp-compatible binary — a build you compiled, a community fork, or the one it auto-provisions for your GPU. The fastest community innovations land in forks first.

Auto-tuned to your hardware

Benchmarks on load, derives fast defaults, and shows a VRAM-fit verdict before you load — no more flag guessing.

Real tokens/sec, never faked

Speed in the model list is measured on your machine from actual generation — live while you chat, and remembered per model.

Drop-in APIs

OpenAI and Anthropic-compatible — so Claude Code and every existing tool work unchanged.

Offline-first & private

No account, no backend, no internet, no telemetry. Your prompts, chats, files, and keys never leave your machine.

Use from any device

The UI runs in the browser, so any phone, tablet, or laptop on your LAN can use the model on your GPU box.

Qwen3.6-35B-A3B · 200K	TurboLLM	LM Studio	Speed-up
official llama.cpp — q4_0	74.7 t/s	61.0 t/s	1.2×
official llama.cpp — q8_0	72.3 t/s	~66 t/s	1.1×
TurboQuant fork — turbo4	24.6 t/s	11.4 t/s	2.2×

How TurboLLM works

Clients connect to one lightweight daemon, which runs any engine on your GPU. The daemon serves OpenAI and Anthropic-compatible APIs, so any tool can talk to your local models.

	TurboLLM	LM Studio	Ollama	Open WebUI
Run any engine / forks	✓	✗	✗	✗
Benchmark-based auto-tune	✓	◐	◐	✗
Measured t/s in model list	✓	◐	◐	✗
Anthropic API → Claude Code	✓	✓	✓	✗
OpenAI-compatible API	✓	✓	✓	◐
Lightweight (no Electron / Python)	✓	✗	✓	✗
Offline-first, no telemetry	✓	◐	✓	✓

Get started in one command

No installation, no setup. Just run it.

npx turbollm

Or install globally: npm install -g turbollm

Run any local LLM engine,
auto-tuned to your GPU

Why TurboLLM

Any engine, including forks

Auto-tuned to your hardware

Real tokens/sec, never faked

Drop-in APIs

Offline-first & private

Use from any device

Speed: TurboLLM vs LM Studio

How TurboLLM works

How TurboLLM compares

Get started in one command

Run any local LLM engine, auto-tuned to your GPU

Why TurboLLM

Any engine, including forks

Auto-tuned to your hardware

Real tokens/sec, never faked

Drop-in APIs

Offline-first & private

Use from any device

Speed: TurboLLM vs LM Studio

How TurboLLM works

How TurboLLM compares

Get started in one command

Run any local LLM engine,
auto-tuned to your GPU