Last updated: April 27, 2026
Who is cipherfox?
I'm an AI beginner turned infrastructure tinkerer. Not a researcher. Not a machine learning engineer. Early adopter by temperament: I was on Nostr before it had a mobile app, running AI tools before they had reliable UIs. The conviction driving this is straightforward. AI is as disruptive as anything I've seen, and the window to understand it from the inside closes faster than most people expect. Curiosity, and the willingness to stick with it through thick and thin even when it gets tedious, are what got me here and what keep me going.
My background is in using AI, not building it. A year ago I couldn't have told you what a CUDA kernel was. Today I'm debugging SGLang internals, patching OpenHands agents, and running a 119-billion-parameter model on a box that sits on my desk.
That's the journey this blog documents: from zero to sovereign, one broken config at a time. I'm not at the end of it. Honestly, I'm not certain I'll get there. The complexity keeps growing faster than I can tame it. But I'm writing it down as I go, and the learning has been worth the frustration.
Why self-host?
It started with a 4 AM panic. A cloud API bill had doubled overnight. My AI assistant was offline because a provider rotated keys without warning. My private project notes were sitting on someone else's server.
That was the last month I paid for cloud AI.
The panic was the trigger. The real reason runs deeper. I've been an early adopter long enough to recognize when something is about to change everything. The pattern is always the same: spot something disruptive early, build familiarity while the internals are still visible, before the abstraction layer hides them. Self-hosting keeps that window open. No service dependency between me and understanding why something works.
Sovereign AI isn't a political stance. It's an engineering decision. When you build with AI every day, you generate sensitive context: architecture notes, half-finished code, system configs. Keeping that on your own hardware isn't exotic. It's just obvious once you've thought about it for five minutes.
The privacy angle is a side effect of the reliability angle. Both matter.
What I actually run
This is the real stack. Not aspirational, not a tutorial setup. What's running right now:
Claude vs. local stack: current state
Four tools in daily use. Claude Code (cloud, Anthropic) handles architecture decisions, multi-file refactors, and anything requiring sustained reasoning across large context. OpenClaw is the newer hybrid: a local agent with a Matrix bridge that can swap between Anthropic models and the local Mistral mid-session, and which has direct access to MCP tools (blog search, SGLang diagnostics). Vibe is a local CLI assistant running Mistral Small 4 119B at 35–41 tok/s via SGLang, used for privacy-sensitive work where nothing should touch a cloud API. ComfyUI + FLUX.1-schnell generates all blog hero images on-device.
The gap between cloud and local is narrower than it was six months ago. Claude still leads on complex multi-step reasoning. For focused single-task work (writing, summarizing, or generating code from a clear spec), Vibe is competitive and costs nothing per token. OpenClaw splits the difference by letting the same session pick the right model for each turn.
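Picking the right model per turn, as OpenClaw does, comes down to a routing decision. A minimal sketch of that logic, assuming a simple turn dictionary and a 32k-token cutoff (both hypothetical; this is not OpenClaw's actual code):

```python
# Hypothetical per-turn routing sketch. The "private" flag, "context_tokens"
# field, and the 32k cutoff are illustrative assumptions, not OpenClaw internals.

def pick_backend(turn: dict) -> str:
    """Return 'local' (SGLang/Mistral) or 'cloud' (Anthropic) for one turn."""
    if turn.get("private"):
        # Privacy-sensitive context never leaves the box.
        return "local"
    if turn.get("context_tokens", 0) > 32_000:
        # Sustained large-context reasoning is where the cloud model still leads.
        return "cloud"
    # Default to local: zero marginal cost per token.
    return "local"
```

The point of the sketch is the ordering: privacy overrides everything, then capability, then cost.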
April 2026 · Mistral Small 4 119B NVfp4 · ~35 tok/s · FLUX.1-schnell via ComfyUI · sovereign-mcp on port 8002
Every article is scored for content quality against the same criteria Google uses. See the full breakdown: Content Insights →
Measured: throughput numbers for Mistral Small 4 on GB10 are in the SGLang setup article. Real numbers, not marketing specs.
How this actually gets built
Articles on this blog are written by Mistral Small 4. The pipeline was designed with Claude Code. The prompts were shaped with Claude's help. That's a deliberate two-layer stack: Claude handles architecture and meta-work, Mistral handles content execution at zero marginal cost per article once the infrastructure runs.
- Session memory: Mistral has no session persistence. The fix is VIBE.md, a markdown file committed to the project and pasted at session start. It covers architecture decisions, known bugs, open tasks, and active workarounds, updated by the pipeline after every run. Claude sessions use a companion BRIEFING.md. Context injection beats re-derivation every time.
- Style drift: Left to defaults, all four article styles produced structurally identical output: conclusion articles with code blocks, setup articles without specificity. The fix is config-driven: per-style code policy, max section count, and section style injected into every prompt. The model follows explicit constraints when they're explicit enough.
- Image repetition: FLUX.1-schnell defaults to the same metaphors regardless of topic (overflowing glass, lone figure at a desk), even when the prompt explicitly bans them. Negative instructions don't override strong prior distributions. What works: redirecting to a different visual domain per style, plus a rolling motif blacklist so the pipeline can't loop between sessions.
- Quality gate: Every article is scored against a style-aware weighted composite before it publishes. Four styles (Best Practice, Code Deep Dive, Strategy, Infotainment), each with its own signal weights and minimum threshold. The score grows with concrete signals (version refs, file paths, error lines, defined terms) and shrinks with AI slop (filler phrases, hedging). Articles below threshold get a second Mistral pass with targeted gap feedback.
- Affiliate model: Two sponsors. Alby for Lightning payments, the wallet I use daily for V4V tips and micro-transactions. BitBox02 for cold storage: Swiss-made, open-source hardware, no KYC. Both match the stack: local, privacy-first, no subscriptions. Referral links are marked as such. No sponsored content, no hidden pitches.
- Entry point: All pipeline operations (writing, image generation, optimization, build) run through python3 scripts/master.py. It dispatches to individual scripts with explanations of what each does and when to use it. There's also a desktop GUI (sovereign_dashboard.py) with live output streaming.
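The quality gate described above can be sketched as a weighted signal count. Everything concrete here (the regexes, weights, and thresholds) is a made-up illustration of the mechanism; the real values live in the pipeline config:

```python
# Hypothetical sketch of a style-aware quality gate. Signal patterns,
# weights, and thresholds are illustrative, not the pipeline's real config.
import re

STYLE_WEIGHTS = {
    "code_deep_dive": {"code_path": 3.0, "version_ref": 2.0, "filler": -4.0},
    "strategy":       {"code_path": 0.5, "version_ref": 1.0, "filler": -4.0},
}
THRESHOLD = {"code_deep_dive": 4.0, "strategy": 2.0}

SIGNALS = {
    # Concrete signals grow the score; slop shrinks it.
    "code_path":   re.compile(r"\b[\w./-]+\.(py|md|toml|yaml)\b"),
    "version_ref": re.compile(r"\bv?\d+\.\d+(\.\d+)?\b"),
    "filler":      re.compile(r"\b(delve|in today's world|game.changer)\b", re.I),
}

def score(text: str, style: str) -> float:
    """Weighted composite: sum of (weight x occurrence count) per signal."""
    weights = STYLE_WEIGHTS[style]
    return sum(weights[name] * len(rx.findall(text)) for name, rx in SIGNALS.items())

def passes(text: str, style: str) -> bool:
    """Gate: publish only if the composite clears the style's threshold."""
    return score(text, style) >= THRESHOLD[style]
```

Articles that fail the gate go back for a second pass, with the missing signals named in the feedback.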
New to the stack? The Sovereign AI Grid hub article covers the hardware, inference setup, knowledge base architecture, and what's being built next. Eight articles linked in reading order.
What's not working (yet)
Honest list. Things I'm working around or haven't fixed:
- flashinfer on GB10: OOM on first batch. Workaround: --attention-backend triton. SM121A isn't supported in the flashinfer build that ships with this nightly tag.
- Sequential-only GPU services: SGLang, Voxtral, and ComfyUI cannot share the 128 GB unified pool. Coordination via the dashboard with a 60-second guard between transitions. Manual context-switching cost for the operator hasn't been fully eliminated.
- OpenHands autonomous tasks: crashes on long multi-step runs. Usable for short tasks. For anything requiring more than a few sequential steps, I fall back to Aider, OpenClaw, or Claude Code.
- OpenClaw streaming watchdog reset: when switching to a new Anthropic model mid-session, the first stream sometimes drops at the 30s watchdog. Workaround: send a new message, the next stream resyncs.
- Hero image diversity: FLUX.1-schnell defaults to recurring visual metaphors across articles. The pipeline now uses per-style visual vocabularies and a rolling motif blacklist to break the pattern. Improvement is ongoing.
- reasoning_tokens reporting: SGLang always reports reasoning_tokens: 0 even when reasoning is active. The reasoning_content field in the response is populated correctly. Known SGLang reporting bug, not a model issue.
- MCP knowledge base sync: sovereign-mcp reads its KB at process start. After publishing new articles, the MCP server needs a restart to pick them up. The dashboard exposes a one-click restart, but a post-build hook would be cleaner.
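That cleaner post-build hook could be a few lines appended to the build step. A sketch, assuming sovereign-mcp runs as a systemd user unit (how the server is actually supervised isn't stated, so the restart command is an assumption):

```python
# Hypothetical post-build hook: restart sovereign-mcp after publishing so it
# re-reads the knowledge base. The systemd user unit is an assumption; adapt
# the command to however the server is actually supervised.
import subprocess

def restart_cmd(unit: str = "sovereign-mcp") -> list[str]:
    """Build the restart command for the MCP server's service unit."""
    return ["systemctl", "--user", "restart", unit]

def run_hook(cmd: list[str]) -> bool:
    """Run the restart command; True on success."""
    return subprocess.run(cmd, capture_output=True).returncode == 0
```

Wired into the end of the build script, this would replace the manual one-click restart entirely.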
Why sovgrid.org?
The name comes from Sovereign AI Grid, the wider project this blog is part of. A network of self-hosted AI services, knowledge tools, and infrastructure built to work without cloud APIs or third-party accounts.
.org was the deliberate choice. archlinux.org, yunohost.org, nextcloud.org: the entire self-hosting tradition lives there. This blog is a mission, not a business front, and the TLD should reflect that.
Hosted with Flokinet, paid in bitcoin, no KYC. Domain registered with the same provider, WHOIS privacy enabled by default.
Support
Four options. All optional. None require an account. All details on /support/.
Lightning Zap
Value-for-Value: pay what you think it was worth, after you've read it. A sat or two says "this saved me time." There's a zap button in the footer.
Alby Referral
The Lightning wallet I use daily for V4V tips and micro-transactions. Sign up through my referral link and I get a small bonus. No price change.
BitBox02 Referral
Swiss-made hardware wallet. Open-source firmware and hardware. No KYC. What I use to cold-store Bitcoin off Alby. Buy through my link, same price.
Flokinet Referral
Privacy-first VPS and domain registrar. No KYC, bitcoin accepted. The host this site runs on. Sign up through my link, same price for you.