Last updated: April 27, 2026
Who is cipherfox?
I'm an AI beginner turned infrastructure tinkerer. Not a researcher. Not a machine learning engineer. Early adopter by temperament: I was on Nostr before it had a mobile app, running AI tools before they had reliable UIs. The conviction driving this is straightforward. AI is as disruptive as anything I've seen, and the window to understand it from the inside closes faster than most people expect. Curiosity, and the willingness to stick with it through thick and thin even when it gets tedious, are what got me here and what keep me going.
My background is in using AI, not building it. A year ago I couldn't have told you what a CUDA kernel was. Today I'm debugging SGLang internals, patching OpenHands agents, and running a 119-billion-parameter model on a box that sits on my desk.
That's the journey this blog documents: from zero to sovereign, one broken config at a time. I'm not at the end of it. Honestly, I'm not certain I'll get there. The complexity keeps growing faster than I can tame it. But I'm writing it down as I go, and the learning has been worth the frustration.
Why self-host?
It started with a 4 AM panic. A cloud API bill had doubled overnight. My AI assistant was offline because a provider rotated keys without warning. My private project notes were sitting on someone else's server.
That was the last month I paid for cloud AI.
The panic was the trigger. The real reason runs deeper. I've been an early adopter long enough to recognize when something is about to change everything. The pattern is always the same: spot something disruptive early, build familiarity while the internals are still visible, before the abstraction layer hides them. Self-hosting keeps that window open. No service dependency between me and understanding why something works.
Sovereign AI isn't a political stance. It's an engineering decision. When you build with AI every day, you generate sensitive context: architecture notes, half-finished code, system configs. Keeping that on your own hardware isn't exotic. It's just obvious once you've thought about it for five minutes.
The privacy angle is a side effect of the reliability angle. Both matter.
What I actually run
This is the real stack. Not aspirational, not a tutorial setup. What's running right now:
Claude vs. local stack: current state
Four tools in daily use. Claude Code (cloud, Anthropic) handles architecture decisions, multi-file refactors, and anything requiring sustained reasoning across large context. OpenClaw is the newer hybrid: a local agent with a Matrix bridge that can swap between Anthropic models and the local Mistral mid-session, and which has direct access to MCP tools (blog search, SGLang diagnostics). Vibe is a local CLI assistant running Mistral Small 4 119B at 35–41 tok/s via SGLang, used for privacy-sensitive work where nothing should touch a cloud API. ComfyUI + FLUX.1-schnell generates all blog hero images on-device.
The gap between cloud and local is narrower than it was six months ago. Claude still leads on complex multi-step reasoning. For focused single-task work (writing, summarizing, or generating code from a clear spec), Vibe is competitive and costs nothing per token. OpenClaw splits the difference by letting the same session pick the right model for each turn.
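Picking the right model per turn, as OpenClaw does, comes down to a routing decision. A minimal sketch of that logic, assuming a simple turn dictionary and a 32k-token cutoff (both hypothetical; this is not OpenClaw's actual code):

```python
# Hypothetical per-turn routing sketch. The "private" flag, "context_tokens"
# field, and the 32k cutoff are illustrative assumptions, not OpenClaw internals.

def pick_backend(turn: dict) -> str:
    """Return 'local' (SGLang/Mistral) or 'cloud' (Anthropic) for one turn."""
    if turn.get("private"):
        # Privacy-sensitive context never leaves the box.
        return "local"
    if turn.get("context_tokens", 0) > 32_000:
        # Sustained large-context reasoning is where the cloud model still leads.
        return "cloud"
    # Default to local: zero marginal cost per token.
    return "local"
```

The point of the sketch is the ordering: privacy overrides everything, then capability, then cost.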
April 2026 · Mistral Small 4 119B NVfp4 · ~35 tok/s · FLUX.1-schnell via ComfyUI · sovereign-mcp on port 8002
Every article is scored for content quality against the same criteria Google uses. See the full breakdown: Content Insights →
Measured: throughput numbers for Mistral Small 4 on GB10 are in the SGLang setup article. Real numbers, not marketing specs.
How this actually gets built
Articles on this blog are written by Mistral Small 4. The pipeline was designed with Claude Code. The prompts were shaped with Claude's help. That's a deliberate two-layer stack: Claude handles architecture and meta-work, Mistral handles content execution at zero marginal cost per article once the infrastructure runs.
- Session memory: Mistral has no session persistence. The fix is VIBE.md, a markdown file committed to the project and pasted at session start. It covers architecture decisions, known bugs, open tasks, and active workarounds, updated by the pipeline after every run. Claude sessions use a companion BRIEFING.md. Context injection beats re-derivation every time.
- Style drift: Left to defaults, all four article styles produced structurally identical output: conclusion articles with code blocks, setup articles without specificity. The fix is config-driven: per-style code policy, max section count, and section style injected into every prompt. The model follows explicit constraints when they're explicit enough.
- Image repetition: FLUX.1-schnell defaults to the same metaphors regardless of topic (overflowing glass, lone figure at a desk), even when the prompt explicitly bans them. Negative instructions don't override strong prior distributions. What works: redirecting to a different visual domain per style, plus a rolling motif blacklist so the pipeline can't loop between sessions.
- Quality gate: Every article is scored against a style-aware weighted composite before it publishes. Four styles (Best Practice, Code Deep Dive, Strategy, Infotainment), each with its own signal weights and minimum threshold. The score grows with concrete signals (version refs, file paths, error lines, defined terms) and shrinks with AI slop (filler phrases, hedging). Articles below threshold get a second Mistral pass with targeted gap feedback.
- Affiliate model: Two sponsors. Alby for Lightning payments, the wallet I use daily for V4V tips and micro-transactions. BitBox02 for cold storage: Swiss-made, open-source hardware, no KYC. Both match the stack: local, privacy-first, no subscriptions. Referral links are marked as such. No sponsored content, no hidden pitches.
- Entry point: All pipeline operations (writing, image generation, optimization, build) run through python3 scripts/master.py. It dispatches to individual scripts with explanations of what each does and when to use it. There's also a desktop GUI (sovereign_dashboard.py) with live output streaming.
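The quality gate described above can be sketched as a weighted signal count. Everything concrete here (the regexes, weights, and thresholds) is a made-up illustration of the mechanism; the real values live in the pipeline config:

```python
# Hypothetical sketch of a style-aware quality gate. Signal patterns,
# weights, and thresholds are illustrative, not the pipeline's real config.
import re

STYLE_WEIGHTS = {
    "code_deep_dive": {"code_path": 3.0, "version_ref": 2.0, "filler": -4.0},
    "strategy":       {"code_path": 0.5, "version_ref": 1.0, "filler": -4.0},
}
THRESHOLD = {"code_deep_dive": 4.0, "strategy": 2.0}

SIGNALS = {
    # Concrete signals grow the score; slop shrinks it.
    "code_path":   re.compile(r"\b[\w./-]+\.(py|md|toml|yaml)\b"),
    "version_ref": re.compile(r"\bv?\d+\.\d+(\.\d+)?\b"),
    "filler":      re.compile(r"\b(delve|in today's world|game.changer)\b", re.I),
}

def score(text: str, style: str) -> float:
    """Weighted composite: sum of (weight x occurrence count) per signal."""
    weights = STYLE_WEIGHTS[style]
    return sum(weights[name] * len(rx.findall(text)) for name, rx in SIGNALS.items())

def passes(text: str, style: str) -> bool:
    """Gate: publish only if the composite clears the style's threshold."""
    return score(text, style) >= THRESHOLD[style]
```

Articles that fail the gate go back for a second pass, with the missing signals named in the feedback.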
New to the stack? The Sovereign AI Grid hub article covers the hardware, inference setup, knowledge base architecture, and what's being built next. Eight articles linked in reading order.
What's not working (yet)
Honest list. Things I'm working around or haven't fixed:
- flashinfer on GB10: OOM on first batch. Workaround: --attention-backend triton. SM121A isn't supported in the flashinfer build that ships with this nightly tag.
- Sequential-only GPU services: SGLang, Voxtral, and ComfyUI cannot share the 128 GB unified pool. Coordination via the dashboard with a 60-second guard between transitions. Manual context-switching cost for the operator hasn't been fully eliminated.
- OpenHands autonomous tasks: crashes on long multi-step runs. Usable for short tasks. For anything requiring more than a few sequential steps, I fall back to Aider, OpenClaw, or Claude Code.
- OpenClaw streaming watchdog reset: when switching to a new Anthropic model mid-session, the first stream sometimes drops at the 30s watchdog. Workaround: send a new message, the next stream resyncs.
- Hero image diversity: FLUX.1-schnell defaults to recurring visual metaphors across articles. The pipeline now uses per-style visual vocabularies and a rolling motif blacklist to break the pattern. Improvement is ongoing.
- reasoning_tokens reporting: SGLang always reports reasoning_tokens: 0 even when reasoning is active. The reasoning_content field in the response is populated correctly. Known SGLang reporting bug, not a model issue.
- MCP knowledge base sync: sovereign-mcp reads its KB at process start. After publishing new articles, the MCP server needs a restart to pick them up. The dashboard exposes a one-click restart, but a post-build hook would be cleaner.
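That cleaner post-build hook could be a few lines appended to the build step. A sketch, assuming sovereign-mcp runs as a systemd user unit (how the server is actually supervised isn't stated, so the restart command is an assumption):

```python
# Hypothetical post-build hook: restart sovereign-mcp after publishing so it
# re-reads the knowledge base. The systemd user unit is an assumption; adapt
# the command to however the server is actually supervised.
import subprocess

def restart_cmd(unit: str = "sovereign-mcp") -> list[str]:
    """Build the restart command for the MCP server's service unit."""
    return ["systemctl", "--user", "restart", unit]

def run_hook(cmd: list[str]) -> bool:
    """Run the restart command; True on success."""
    return subprocess.run(cmd, capture_output=True).returncode == 0
```

Wired into the end of the build script, this would replace the manual one-click restart entirely.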
Why sovgrid.org?
The name comes from Sovereign AI Grid, the wider project this blog is part of. A network of self-hosted AI services, knowledge tools, and infrastructure built to work without cloud APIs or third-party accounts.
.org was the deliberate choice. archlinux.org, yunohost.org, nextcloud.org: the entire self-hosting tradition lives there. This blog is a mission, not a business front, and the TLD should reflect that.
Hosted with Flokinet, paid in bitcoin, no KYC. Domain registered with the same provider, WHOIS privacy enabled by default.
Support
Four options. All optional. None require an account. All details on /support/.
Lightning Zap
Value-for-Value: pay what you think it was worth, after you've read it. A sat or two says "this saved me time." There's a zap button in the footer.
Alby Referral
The Lightning wallet I use daily for V4V tips and micro-transactions. Sign up through my referral link and I get a small bonus. No price change.
BitBox02 Referral
Swiss-made hardware wallet. Open-source firmware and hardware. No KYC. What I use to cold-store Bitcoin off Alby. Buy through my link, same price.
Flokinet Referral
Privacy-first VPS and domain registrar. No KYC, bitcoin accepted. The host this site runs on. Sign up through my link, same price for you.