goose vs vibe vs opencode: Picking a Local Coding CLI for a Sovereign vLLM Stack (2026)
A local coding-agent CLI is the part of a self-hosted stack you touch most: it is the thing that actually edits your code against your own model. The cloud names (Claude Code, Cursor) are covered everywhere. The self-hostable ones that point at your vLLM endpoint instead of someone’s API are not, so here is the honest three-way for the tools I actually run on a single DGX Spark, as of June 2026: opencode, goose, and vibe. The short version: opencode is the daily driver, goose is the backup, and vibe is retired.
First, two definitions, because the whole comparison rests on them. A coding-agent CLI is a terminal program that takes a task in plain English, reads and edits files in your repo, runs commands, and loops until the task is done, driving a language model to decide each step. MCP (Model Context Protocol) is the open standard that lets that CLI call external tools, a documentation search, a database, a knowledge base, over a uniform interface instead of bespoke glue per tool. All three CLIs here speak the OpenAI-compatible HTTP API, which is why they can point at a local vLLM server instead of a cloud endpoint. That single fact is what makes a sovereign coding loop possible at all.
By the numbers: what each CLI actually drives
The CLI is only half the system. The other half is the model it talks to, and on one DGX Spark the models run under a mutex: exactly one holds the GB10’s 128GB of unified memory at a time, because two 30B-class models do not fit together. So the realistic question is not “which CLI is fastest” (they all just stream tokens from the server) but “which CLI cleanly retargets whichever model is currently resident”. Here is the rotation each tool has to handle, with single-stream decode measured on my box:
| Engine | Port | Role | Decode | Context |
|---|---|---|---|---|
| Qwen3.6-35B | 30001 | primary, general + vision | 69.5 tok/s | 262144 |
| GLM-4.7-Flash | 30002 | coding specialist | 53.7 tok/s | 65536 |
| Gemma-4-26B-A4B (MoE) | 30004 | reasoning | 53.7 tok/s | 65536 |
A CLI that hardcodes one base URL or one model name cannot serve this. It has to switch endpoint and model id together, every time the mutex swaps. That requirement, more than any feature list, is what sorted the three tools.
Verdict at a glance
| opencode | goose | vibe | |
|---|---|---|---|
| Maker / licence | opencode (open source) | Block (Apache-2.0) | small project, Mistral-era |
| Backend | any OpenAI-compatible (points at vLLM) | any OpenAI-compatible + 15+ providers | tied to the Mistral/SGLang backend |
| MCP | yes | yes (native “extensions”) | via per-tool wrappers |
| On my stack | primary | backup | retired |
| Why | mobile-friendly (Termux+tmux), AGENTS.md contract, just works against Qwen | maintained by a serious, Bitcoin-aligned org; MCP-native; permissive licence | its reason to exist left with the Mistral/SGLang stack |
Why opencode is the primary
opencode earns the daily-driver seat for boring, correct reasons. It speaks the OpenAI-compatible API, so pointing it at a local vLLM model is one provider block (baseURL: http://localhost:30001/v1). It honors an AGENTS.md contract per repo, which is the file where a repo declares its conventions to any agent, and that is how a multi-agent grid keeps several coders from clobbering each other. And the path that actually survived contact with daily use was the simple one: Termux plus tmux plus the opencode CLI over SSH, not a browser UI, because a phone over SSH is the device I actually have on me. It drives whichever model the mutex has resident, Qwen on port 30001 by default at 69.5 tok/s, and has no strict-alternation quirks to patch around. That last point matters more than it sounds: the previous backup, vibe, existed mostly to paper over one such quirk, and opencode simply does not have the bug.
Why goose, and why it replaced vibe
When the backup slot opened up, goose won it over keeping vibe, on three axes that matter for a sovereign stack:
- Maintenance and licence. goose is built by Block (the company behind Square and Cash App), Apache-2.0, with frequent releases and a real extension ecosystem. vibe was a smaller, Mistral-era project. On a stack you intend to run for years, “who patches this in 2027” is a real question, and a well-resourced open-source project beats a thin one.
- Values fit. This is a Bitcoin-only, no-KYC grid (it is the whole point of the site), and earlier I rejected otherwise-capable models on values grounds whose backers were Solana or crypto-VC. goose comes from one of the most openly Bitcoin-aligned companies in tech, which is the opposite problem to have. Sovereign and open are separate axes; so is who you take your tools from.
- MCP-native and backend-agnostic. goose treats MCP servers as first-class “extensions” and speaks any OpenAI-compatible endpoint, so it wires into the same
knowledgeandsovereignMCP servers and the same vLLM ports as everything else.
vibe’s retirement was not a knock on the tool, it was structural. vibe was built around the Mistral chat backend; its most-used patch was a workaround for Mistral’s strict role-alternation (the exact bug I also reported to OpenHands). When the grid retired Mistral and SGLang entirely and moved to Qwen, GLM-4.7-Flash and Gemma-4, none of which enforce strict alternation, vibe’s whole reason to exist went with them. Keeping two backup coders made no sense; goose is the better-maintained one.
The one gotcha each (the part the READMEs skip)
- opencode: define each local model as its own provider (
local-qwen,local-glm,local-gemma) pointing at the right vLLM port. Because the models are mutex-swapped, you pick the provider that matches whatever is currently resident. - goose: it does not infer your model’s context window. For an unknown model id it silently defaults to 128k, which quietly truncates a 262k-context Qwen. Pin it:
GOOSE_CONTEXT_LIMIT: 262144inconfig.yaml. And goose registers custom providers only through its keyring (interactivegoose configure), not fromconfig.yaml, so for a multi-port mutex setup the clean path is its built-inopenaiprovider plus a tiny wrapper that setsOPENAI_HOSTandGOOSE_MODELper engine. - vibe: n/a, retired. If you are still on it for a Mistral backend, the strict-alternation patch is the one to keep.
That goose mutex wrapper, concretely, is about fifteen lines: pin the context, point the built-in provider at the resident engine’s port, and switch the model name to match.
# goose-llm <qwen|glm|gemma>: point goose at the mutex-resident vLLM model
declare -A PORT=( [qwen]=30001 [glm]=30002 [gemma]=30003 )
declare -A MODEL=( [qwen]=qwen3.6-35b [glm]=glm-4.7-flash [gemma]=gemma4-26b )
declare -A CTX=( [qwen]=262144 [glm]=65536 [gemma]=65536 )
E="$1"
export GOOSE_PROVIDER=openai
export GOOSE_MODEL="${MODEL[$E]}"
export OPENAI_HOST="http://localhost:${PORT[$E]}"
export GOOSE_CONTEXT_LIMIT="${CTX[$E]}" # else goose caps an unknown model at 128k
exec goose "${@:2}"
The reason this beats goose’s own custom-provider mechanism here: those live in the keyring (you add them through the interactive goose configure), so they cannot be version-controlled or scripted for a three-port mutex. The built-in openai provider plus environment variables can. opencode, by contrast, takes a plain provider block per model in its JSON config, which is why each engine gets its own local-qwen / local-glm / local-gemma entry there.
The honest caveats
This is a fit-for-my-stack verdict tested in June 2026, not a benchmark shootout, so be clear about what it does not cover. I did not score the three CLIs on a fixed coding suite like SWE-bench, because the variable that dominated my experience was retargeting the mutex, not raw edit accuracy, and on accuracy the model matters far more than the harness. I did not run goose’s browser or desktop modes, only the CLI over SSH, because the headless terminal path is the only one a sovereign box needs. And the context caveat is not hypothetical: goose defaulted an unknown model id to 128k and silently truncated Qwen’s 262144-token window in my first run, which produced quietly wrong answers on long files until I pinned the limit. That class of failure, a silent default rather than a loud error, is the one to watch for when any of these tools meets a model it does not recognize.
One more honest note on vibe, since retiring a working tool deserves a reason and not a shrug. vibe was not broken. It was orphaned by an architecture change, which is a different and more common way tools die on a long-lived stack. Tools rarely fail; they stop fitting.
Do I run them myself?
Yes, both, daily: opencode primary, goose as the backup coder, against local Qwen / GLM / Gemma on one DGX Spark, no cloud API in the loop. vibe is gone. If you only set up one, make it opencode; add goose when you want a second opinion from a different agent harness on the same models.