
Last updated: April 27, 2026

Who is cipherfox?

I'm an AI beginner turned infrastructure tinkerer. Not a researcher. Not a machine learning engineer. Early adopter by temperament: I was on Nostr before it had a mobile app, running AI tools before they had reliable UIs. The conviction driving this is straightforward: AI is as disruptive as anything I've seen, and the window to understand it from the inside is closing faster than most people expect. Curiosity and the willingness to push through, even when it gets tedious, are what got me here and what keep me going.

My background is in using AI, not building it. A year ago I couldn't have told you what a CUDA kernel was. Today I'm debugging SGLang internals, patching OpenHands agents, and running a 119-billion-parameter model on a box that sits on my desk.

That's the journey this blog documents: from zero to sovereign, one broken config at a time. I'm not at the end of it. Honestly, I'm not certain I'll get there. The complexity keeps growing faster than I can tame it. But I'm writing it down as I go, and the learning has been worth the frustration.

Why self-host?

It started with a 4 AM panic. A cloud API bill had doubled overnight. My AI assistant was offline because a provider rotated keys without warning. My private project notes were sitting on someone else's server.

That was the last month I paid for cloud AI.

The panic was the trigger. The real reason runs deeper. I've been an early adopter long enough to recognize when something is about to change everything. The pattern is always the same: spot something disruptive early, build familiarity while the internals are still visible, before the abstraction layer hides them. Self-hosting keeps that window open. No service dependency between me and understanding why something works.

Sovereign AI isn't a political stance. It's an engineering decision. When you build with AI every day, you generate sensitive context: architecture notes, half-finished code, system configs. Keeping that on your own hardware isn't exotic. It's just obvious once you've thought about it for five minutes.

The privacy angle is a side effect of the reliability angle. Both matter.

What I actually run

This is the real stack. Not aspirational, not a tutorial setup. What's running right now:

| Component | What | Honest take |
|---|---|---|
| Hardware | NVIDIA DGX Spark (GB10, 128GB unified) | Genuinely impressive. Unified memory changes what's possible at desk scale. |
| Model | Mistral Small 4 119B NVfp4 + EAGLE | 35–41 tok/s single-stream with EAGLE. 12–15 tok/s without it. |
| Inference | SGLang nightly-dev-cu13 + SGLANG_ENABLE_SPEC_V2 | triton backend only. flashinfer OOMs on GB10 (SM121A not yet supported). |
| TTS | Voxtral 4B-TTS-2603 via vllm-omni | On-demand for podcast generation. Cannot share GPU pool with SGLang or ComfyUI. |
| Image gen | ComfyUI + FLUX.1-schnell | Hero images for every article. Same memory pool, same one-at-a-time rule. |
| Containers | Docker + custom compose stacks | Works well. Volume mounts survive rebuilds without re-downloading weights. |
| Code assist | Claude Code + OpenClaw + Vibe + Aider | Claude for architecture. OpenClaw for Matrix + cloud/local model swap. Vibe for privacy-sensitive CLI work. Aider for autonomous file edits. |
| MCP layer | sovereign-mcp (port 8002, FastMCP) | Local MCP server. Exposes search_blog, get_article, diagnose_sglang to OpenClaw and Vibe. Agents stop re-asking what was documented. |
| Dashboard | FastAPI + React (single-file) | Self-hosted SecOps dashboard. Service start/stop, AIDE alerts, backup status, MCP restart. Tor- and Tailscale-reachable. |
| Git | Gitea (self-hosted) | Zero issues. No telemetry, no rate limits, no outage risk. |
| Blog | Astro 5 + nginx:alpine | Static. No database. Full rebuild in under 3s. |
| Pipeline | Mistral Small 4 + ComfyUI FLUX.1-schnell | Converts raw engineering notes → published articles + hero images. Zero cloud cost. |
| Payments | Lightning (Alby) | V4V model. No accounts, no platform cut, no subscription. |
| Tunnel | Cloudflare cloudflared | Free tier. Handles TLS termination without exposing home IP. |
| Search | SearXNG (127.0.0.1:8888) | Powers web research in the article pipeline. No query logging to third parties. |
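The MCP layer is the glue in that table: one local server exposing three tools to whichever agent asks. As a minimal sketch of the tool-dispatch shape, here it is in plain Python with stub implementations. The real server runs FastMCP on port 8002; the article names, stub bodies, and JSON envelope here are illustrative assumptions, not its actual code.

```python
import json

# Hypothetical stand-ins for the three tools sovereign-mcp exposes.
# Bodies are stubs; only the dispatch pattern is the point.

def search_blog(query: str) -> list[str]:
    # Stub corpus: the real tool searches published articles.
    articles = ["debugging-sglang-on-gb10", "eagle-speculative-decoding"]
    return [slug for slug in articles if query.lower() in slug]

def get_article(slug: str) -> str:
    return f"(article body for {slug})"

def diagnose_sglang(log_line: str) -> str:
    if "OOM" in log_line:
        return "out-of-memory: try the triton attention backend"
    return "no known issue matched"

TOOLS = {
    "search_blog": search_blog,
    "get_article": get_article,
    "diagnose_sglang": diagnose_sglang,
}

def handle(request_json: str) -> str:
    """Dispatch a tool call of the form {"tool": ..., "args": {...}}."""
    req = json.loads(request_json)
    result = TOOLS[req["tool"]](**req["args"])
    return json.dumps({"result": result})
```

The payoff described in the table follows from exactly this shape: once the tools are registered, any MCP-capable agent can call them instead of re-asking what was already documented.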

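Several rows above share one constraint: SGLang, the TTS stack, and ComfyUI all want the same 128GB unified pool, so only one heavy workload runs at a time. One way such a rule could be enforced is an exclusive lock file, sketched below. The lock path, service names, and the mechanism itself are hypothetical illustrations, not the actual setup.

```python
import os
import tempfile

# Hypothetical "one heavy service at a time" guard for a shared memory pool.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "gpu-pool.lock")

def acquire_gpu_pool(service: str) -> bool:
    """Atomically create the lock file; fail if another service holds it."""
    try:
        # O_EXCL makes creation fail if the file already exists.
        fd = os.open(LOCK_PATH, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.write(fd, service.encode())
    os.close(fd)
    return True

def release_gpu_pool() -> None:
    """Drop the lock so the next service can claim the pool."""
    try:
        os.remove(LOCK_PATH)
    except FileNotFoundError:
        pass
```

The atomic-create trick means two services racing for the pool can never both win, which is the whole point of the one-at-a-time rule.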
Claude vs. local stack: current state

Four tools in daily use. Claude Code (cloud, Anthropic) handles architecture decisions, multi-file refactors, and anything requiring sustained reasoning across large context. OpenClaw is the newer hybrid: a local agent with a Matrix bridge that can swap between Anthropic models and the local Mistral mid-session, and which has direct access to MCP tools (blog search, SGLang diagnostics). Vibe is a local CLI assistant running Mistral Small 4 119B at 35–41 tok/s via SGLang, used for privacy-sensitive work where nothing should touch a cloud API. ComfyUI + FLUX.1-schnell generates all blog hero images on-device.

The gap between cloud and local is narrower than it was six months ago. Claude still leads on complex multi-step reasoning. For focused single-task work (writing, summarizing, or generating code from a clear spec), Vibe is competitive and costs nothing per token. OpenClaw splits the difference by letting the same session pick the right model for each turn.
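That per-turn choice can be sketched as a routing heuristic. The category names and model labels below are illustrative assumptions, not OpenClaw's actual logic; the heuristic itself just mirrors the experience described above, with privacy overriding everything.

```python
CLOUD = "claude (Anthropic API)"
LOCAL = "mistral-small-4-119b (SGLang, on-device)"

# Hypothetical large-gap categories where cloud still clearly leads.
CLOUD_TASKS = {"architecture", "multi-file-refactor", "multi-step-debugging"}

def pick_model(task: str, privacy_sensitive: bool = False) -> str:
    """Route one turn: privacy always wins, then the task category decides."""
    if privacy_sensitive:
        return LOCAL   # nothing sensitive leaves the box
    if task in CLOUD_TASKS:
        return CLOUD   # sustained multi-step reasoning still favors the cloud
    return LOCAL       # competitive locally, and free after setup
```

The design choice worth noting: the privacy check comes first, so even a "large gap" task stays local when the context must not touch a cloud API.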

| Task | Claude | Vibe (local) | Gap |
|---|---|---|---|
| Architecture decisions | ✅ Strong | ⚠ Inconsistent | Large |
| Multi-file refactors | ✅ Strong | ⚠ Loses context | Large |
| Code generation | ✅ Strong | ✅ Good | Small |
| Multi-step debugging | ✅ Strong | ⚠ Misses context | Medium |
| Article writing | ✅ Strong | ✅ Usable | Small |
| Quick Q&A / lookup | ✅ Strong | ✅ Fast, competitive | Small |
| Tool use (MCP) | ✅ Strong | ✅ stdio MCP | Small |
| Image generation | ❌ Not available | ✅ FLUX.1 on-device | — |
| Local privacy | ❌ Cloud API | ✅ On-device | — |
| Cost per session | 💸 Per token | ✅ Free after setup | — |
| Availability | ⚠ API dependency | ✅ Always on | — |

April 2026 · Mistral Small 4 119B NVfp4 · ~35 tok/s · FLUX.1-schnell via ComfyUI · sovereign-mcp on port 8002

How this actually gets built

Articles on this blog are written by Mistral Small 4. The pipeline was designed with Claude Code. The prompts were shaped with Claude's help. That's a deliberate two-layer stack: Claude handles architecture and meta-work, Mistral handles content execution at zero marginal cost per article once the infrastructure runs.
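The content layer boils down to one request shape: raw notes in, article out, against the local model's OpenAI-compatible endpoint. A minimal sketch of assembling that payload follows; the endpoint URL, model name, and system prompt are assumptions for illustration, not the pipeline's real values.

```python
# Assumed local endpoint (SGLang serves an OpenAI-compatible API;
# the port and model name here are illustrative).
LOCAL_ENDPOINT = "http://127.0.0.1:30000/v1/chat/completions"

def build_article_request(notes: str, title: str) -> dict:
    """Assemble the chat payload; actually POSTing it is left to the caller."""
    return {
        "model": "mistral-small-4-119b",
        "messages": [
            {"role": "system",
             "content": "Turn raw engineering notes into a publishable blog article."},
            {"role": "user",
             "content": f"Title: {title}\n\nNotes:\n{notes}"},
        ],
        "temperature": 0.7,
    }
```

Because the request is plain JSON against a local port, the marginal cost per article really is zero once the server is up: no keys, no metering, no third party in the loop.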

What's not working (yet)

Honest list. Things I'm working around or haven't fixed:

Why sovgrid.org?

The name comes from Sovereign AI Grid, the wider project this blog is part of. A network of self-hosted AI services, knowledge tools, and infrastructure built to work without cloud APIs or third-party accounts.

.org was the deliberate choice. archlinux.org, yunohost.org, nextcloud.org, the entire self-hosting tradition lives there. This blog is a mission, not a business front, and the TLD should reflect that.

Hosted with Flokinet, paid in bitcoin, no KYC. Domain registered with the same provider, WHOIS privacy enabled by default.

Support

Four options. All optional. None require an account. All details on /support/.

⚡ Lightning Zap

Value-for-Value: pay what you think it was worth, after you've read it. A sat or two says "this saved me time." There's a zap button in the footer.

🔑 Alby Referral

The Lightning wallet I use daily for V4V tips and micro-transactions. Sign up through my referral link and I get a small bonus. No price change.

🔒 BitBox02 Referral

Swiss-made hardware wallet. Open-source firmware and hardware. No KYC. What I use to cold-store Bitcoin off Alby. Buy through my link, same price.

🌐 Flokinet Referral

Privacy-first VPS and domain registrar. No KYC, bitcoin accepted. The host this site runs on. Sign up through my link, same price for you.