
Timeline

Dated entries, newest first. Where a decision was later reversed, both entries stay: an engineering log doesn't get cleaned up retroactively.

First time here? Read this first

This page is the project-level narrative spine: the milestones in order, without article-length detail. Each entry deep-links to the full article where one exists.

New to self-hosted AI? Start with the Sovereign AI Stack reference architecture for the why, or the Self-Hosted AI Start Here guide for the how. Topic-specific journeys are on the reading paths.

Jargon terms get a short parenthetical translation; the link is the deep-dive path.

June 2026

A no-KYC cloud fallback joins the stack (2026-06-26). For the few jobs only a frontier model does well, the grid now reaches Claude through ppq.ai (affiliate link: you support sovgrid at no extra cost to you; see /support), an OpenAI-compatible proxy paid per query over Bitcoin Lightning with no account, wired as the fallback behind local Qwen: Qwen stays primary and Claude is reached only on failover, with the key kept in a 0600 file so that the config never stores it in plaintext. The honest accounting of where that helps and where it betrays the on-device privacy premise (a proxy still sees your prompt, encryption protects the pipe not the endpoints) is its own write-up: Frontier AI on Bitcoin. The local coding CLIs were tuned to match, with opencode and goose each carrying all four backends (Qwen, GLM, Gemma, ppq) as real switchable targets, and the whole stack was de-staled and documented end to end for a clean handoff to the local model.
The local LLM lineup is rebuilt (2026-06-25). Mistral (SGLang) and Voxtral are retired entirely, images plus weights, freeing roughly 100 GB, and replaced by a single-resident mutex rotation on the one GB10: Qwen3.6 stays the general and vision primary (about 69 tokens/sec), GLM-4.7-Flash joins for coding (53.7 tokens/sec), and a Gemma-4-26B-A4B FP8 mixture-of-experts for reasoning (53.7 tokens/sec; the dense Gemma-4-31B was measured at about 7 tokens/sec and dropped, same reasoning score at a fraction of the speed), switched one at a time via switch-llm.sh. Bringing the two new models up surfaced two real vLLM bugs on Blackwell sm_121. GLM-4.7-Flash-AWQ (compressed-tensors, MLA attention) boots healthy but dies on the first token, because the kv_b_proj.weight guard from PR #34695 is incomplete; a three-line patch completes it and the model runs at 53.7 tokens/sec. And a fresh vLLM 0.23 nightly regressed Gemma's NVFP4 path with a modelopt tie_weights NotImplementedError, so the grid rolled back to the proven 0.20. The GLM crash turned out to be already tracked upstream (vLLM #43888, fix in PR #43889), so the grid added a confirming report from this sm_121 hardware rather than filing a duplicate; the Gemma regression is tracked as #45543. The same week the long-standing Mistral strict-alternation report filed at OpenHands (#14287) was closed not-planned, now moot since the Mistral backend it described is gone from this stack.
The sovereignty essays ship: a thirteen-part numbered series on owning your own model (2026-06-22). Each essay reads one serious thinker at the source and presses the idea against one lived fact from this stack, steelmanning the strongest objection before answering it. The case is conceded where it is weakest (cost, frontier capability) and built where it is strongest (control). Hub: /philosophy. The series is also the backbone of the in-progress book, Sovereign AI: First Principles.
/learn field guide goes live (2026-06-21). A reference hub of more than 100 self-hosted-AI terms (BLOG-106), each a short definition card with a verify-it-yourself check, auto-linked from article bodies on first mention. Built as a glossary, not a curriculum: the reading order stays the engineering log.
Qwen3.6 text-primary switched from the 4.75-bit PrismaQuant build to Intel AutoRound int4-mixed in production (2026-06-11). Same served name and port, so every client moved over transparently. The swap buys +12.7% decode (about 69 tokens/sec at DFlash k=3, measured prefill-separated) at zero measurable quality cost: 18 out of 18 on the agent-bench coding gate, same as the full-precision-gate PrismaQuant. A calibrated 4.0-bit round beats a careless 4.75-bit one, and the sensitive mixture-of-experts gate layers stay at 16 bit. Full duel: AutoRound int4 vs PrismaQuant.
Per-article Value-for-Value zaps replace the old localStorage reader-vote (2026-06-07). Every article footer now offers a Lightning zap, attributed off public NIP-57 zap receipts (no payer data, no cookies) and surfaced as a sortable column on /insights/, on the reasoning that a vote which costs the sender nothing carries no signal. Decision path: the insights dashboard companion.
Spark-Arena benchmark recipes tested, and mostly did not reproduce (2026-06-07). The leaderboard's 138 and 239 tokens/sec figures for Qwen3.6 did not hold on this box; the tuned DFlash draft model at k=3 (around 71 tokens/sec) stayed as production. Nemotron-3-Super-120B was dropped on a licence check, self-hostable but not OSI-approved open source, a reminder that sovereign and open are two separate axes. Write-up: The Leaderboard Said 239 Tokens a Second. My DGX Spark Said 71.
Honest reach breakdown plus a broad-crawl filter ship on /insights/ (2026-06-06). Distinct human readers and page views across four windows (today / 7d / 30d / all-time), and a single-IP catalogue-sweep rule that caught a crawler reading 205 articles in one hour, which had been 57 percent of that day's "human" traffic. Methodology: Insights Dashboard for a DGX Business.
A second sovereign box, built for a friend on a Lenovo Legion Pro 7 Gen 10 (RTX 5080 Mobile, Blackwell) (2026-06-01). Encrypted Ubuntu, a local Ollama model with a shared second brain, and a playbook for setting someone else up without leaking your own identity. The 24-hour build log: 24 Hours Setting Up a Lenovo Legion Pro 7.
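
The failover arrangement from the 2026-06-26 entry (local Qwen primary, cloud reached only on failure, key held in a 0600 file) can be sketched roughly as follows. This is a minimal illustration, not the grid's actual wiring: the function names `read_api_key` and `complete_with_fallback` are hypothetical, and the two backends are passed in as plain callables so that no real endpoint or client library is assumed.

```python
import os
import stat
from typing import Callable


def read_api_key(path: str) -> str:
    """Read a key from `path`, refusing files that group or others can access.

    Hypothetical helper: it enforces the 0600-style permission the entry
    describes, on a POSIX filesystem.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} has mode {mode:04o}, expected 0600")
    with open(path, encoding="utf-8") as fh:
        return fh.read().strip()


def complete_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
) -> tuple[str, str]:
    """Try the local backend first; call the cloud fallback only if it raises.

    Returns (backend_label, completion) so the caller can log which path ran.
    """
    try:
        return "primary", primary(prompt)
    except Exception:
        # Assumption: any exception from the local backend counts as failover.
        return "fallback", fallback(prompt)
```

Returning the backend label alongside the completion is a deliberate choice in this sketch: since the fallback sends the prompt off the box, the caller should always be able to see when that happened.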

May 2026

watchdocker v0.1.0 released as public open-source (2026-05-30). A bash-native, systemd-timer-driven Watchtower replacement, 14 of 14 smoke tests green, MIT-licensed, later mirrored to GitHub. Write-up: watchdocker: A Bash-Native Successor to Watchtower.
Big 32-article publish from the backlog (2026-05-27). Pre-buy decision tree, four budget tiers, four industry verticals, the comparison quadrilogy, and the new hub article, the Sovereign AI Stack reference architecture. The site goes from 88 to 120 live articles, and the reading paths go from 6 to 9.
Memory-pending-audit cadence instituted (2026-05-25). One audit session uncovered five stale memory claims. Next audits: 2026-08-25, 2026-11-25, 2027-02-25, 2027-05-25. The discipline behind this is in The Engineering Honesty Manifesto (Rule 6).
Newsletter decision: no traditional email list (2026-05-25). NIP-23 long-form posts on Nostr (a decentralised publishing protocol) plus the RSS feed at /rss.xml are the sovereign-native substitute. Reasoning is in What Sovereign Actually Means in 2026 (collecting subscriber emails would be a regression on the sovereignty framework).
BLOG-057 + BLOG-058 shipped (2026-05-25). Google-friendly site-search hint (WebSite SearchAction JSON-LD) and a custom 5xx fallback page via Caddy handle_errors that survives any backend outage.
Astro 5 → 6.3.7 migration (2026-05-24, commit a16ebd0). 30-minute single sitting. Documented in Astro 6 + Caddy: The Static-First AI Blog Stack.
Cloudflared tunnel retired (2026-05-24). sovgrid.org now serves direct Caddy + Let's Encrypt on the Floki VPS, no Cloudflare in the path. Stack moves from 5/6 to 6/6 on the sovereignty framework. Full receipts in Caddy and Cloudflare Tunnel: The Reliability Pattern.
vps-healthcheck v0.1.0 plus sovereign-mcp v1.0.0 distributed as public open-source (2026-05-24). awesome-sysadmin PR #807 submitted; CHANGELOGs, GitHub topics, and CI all set up. The six-week MCP build log is at MCP for Engineers Who Hate Marketing.
/data/config migration to sovereign-ops/infra/ (2026-05-23). Symlink keeps backwards-compat. AGENTS.md (a per-repo contract for AI coding agents) consolidated across 16 Gitea repos. Multi-agent operating discipline becomes first-class infrastructure.
Floki healthcheck script + cron: 12-check daily audit of the public VPS, single Matrix push, JSON sidecar. Public OSS at github.com/cipherfoxie/vps-healthcheck.
NSM aggregator (the analytics that power /insights/) gains distributed-scraper detection. A 614-hit fake-viral spike was filtered out: one user-agent across many /24 subnets is a bot, not a reader. Operator-DSL /16 also moved out of source code into an out-of-repo side-car file (privacy).
Privacy-scrub pass: tailnet identifiers, CGNAT IPs, residential-ISP ranges removed from four blog articles and from agent memory.
Qwen3.6 PrismaQuant becomes text-primary on 2026-05-13 at 57 to 62 tokens/sec with DFlash speculative decoding, a technique where a small model drafts tokens that the large model verifies in parallel. Mistral Small 4 moves to Vision-only and text-fallback. Reasoning: Next Model Choices on DGX Spark. Head-to-head: Mistral / Qwen / GLM-5 on DGX Spark.
EAGLE on Mistral indefinitely deferred. EAGLE is a speculative-decoding scheme; the SGLang nightly regressed and made it slower than the no-EAGLE baseline. Stable safer-eagle restored 2026-05-22. Pattern: EAGLE Content-Dependent Throughput.
VibeVoice / Higgs Audio v2 / IndexTTS-2 spike kicked off after the Voxtral V6 spot-listen showed a ceiling on long-form expressivity. Pivot rationale: Voxtral Capped at 3/10: Picking the Next Open TTS.
opencode-web behind Caddy + Tailscale shipped, then deprecated for daily use in favour of Termux + tmux + opencode CLI. Lesson: the mobile path that worked was the simpler one. Context: Coding-Agent Comparison 2026.
OpenWebUI custom model "Sovereign Qwen" goes live: Qwen3.6 + system prompt + sovereign-kb retrieval + SearXNG web search + an mcpo bridge to the sovereign-mcp server.
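
The distributed-scraper rule from the NSM aggregator entry above (one user-agent seen across many /24 subnets is a bot, not a reader) can be illustrated with a short sketch. The threshold `MIN_SUBNETS`, the function names, and the `(ip, user_agent)` input shape are assumptions made for illustration; the entry does not state the aggregator's real cut-off or log format. The sketch handles IPv4 only.

```python
import ipaddress
from collections import defaultdict

# Hypothetical threshold: the source does not say how many subnets count as "many".
MIN_SUBNETS = 20


def subnet24(ip: str) -> str:
    """Collapse an IPv4 address to its enclosing /24 network, e.g. 198.51.100.0/24."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def find_distributed_scrapers(hits, min_subnets=MIN_SUBNETS):
    """Return user-agents that appear from at least `min_subnets` distinct /24s.

    `hits` is an iterable of (ip, user_agent) pairs.
    """
    subnets_by_ua = defaultdict(set)
    for ip, user_agent in hits:
        subnets_by_ua[user_agent].add(subnet24(ip))
    return {ua for ua, nets in subnets_by_ua.items() if len(nets) >= min_subnets}
```

On its own this rule would also flag any popular browser user-agent string, so a real filter presumably pairs it with further signals such as a short time window or an unusual user-agent; that pairing is a guess, not something the entry documents.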

April 2026

sovereign-mcp goes live on Floki at mcp.sovgrid.org/self-hosted-ai. Listed on the official MCP registry as org.sovgrid/self-hosted-ai (DNS-auth via ed25519: the registry verifies ownership by checking a cryptographic TXT record on the domain). Setup: Setup: Sovereign MCP. Registry submission: 100/100 on Smithery in 4 Hours.
Floki VPS bring-up: FlokiNET no-KYC EU, Debian 13, Docker CE + Caddy. sovgrid.org goes live with Astro 5. Setup: Floki VPS Setup.
DGX Spark put into daily operation. The SGLang nightly + CUDA 13 SM121A-fix combination identified as the only build that actually compiles for the Blackwell GPU. Setup: Mistral + SGLang Setup.
Mistral Small 4 NVFP4 (a 4-bit floating-point quantisation that lets a 119B-parameter model fit in 128 GB unified memory) runs on SGLang. EAGLE first attempt works pre-regression. Quantisation explainer: NVFP4 Quantisation Explained.
AGENTS.md adopted as the per-repo contract that AI coding agents read before touching the codebase. Consolidated across sovereign-blog, podcast-studio, ~/.vibe, ~/.openclaw. The older VIBE.md as a primary contract is retired.
Backup pipeline rebuilt with age encryption (a modern file-encryption tool) and dual targets: NVMe plus USB. A Floki snapshot pull was added as Step 0. Full discipline: Backing Up 119B Parameters Without Going Bankrupt.

March 2026

Voxtral-4B TTS integrated into the podcast studio. The end-to-end pipeline reaches its first full episode. The ceiling that triggered the May pivot was not yet visible.
Image-generation pipeline (FLUX.1-schnell on ComfyUI, locally on the DGX Spark, plus WebP conversion and manual rating) backfilled across all blog articles. Setup: ComfyUI + FLUX Setup.
Stylometry phase 5 adopted as anti-AI-detection editorial signals: em-dash penalty, uniform-list penalty, sentence-length-stdev reward. The honest version of what this scoring does and does not catch: The Quality Gate That Rewards Fabrication.
OpenHands BadRequestError fix landed as a volume-mounted patch; the same week, the opencode strict-alternation bug surfaces on Mistral. The same class of bug, two different agents.
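
To make the three stylometry signals from the phase 5 entry concrete (em-dash penalty, uniform-list penalty, sentence-length-stdev reward), here is a toy scorer. Everything specific in it is an assumption: the equal weights, the naive sentence splitter, the "three or more items of near-equal length" test for a uniform list, and the function names. It is not the site's actual quality gate.

```python
import re
import statistics


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def stylometry_score(text: str, list_items=()) -> float:
    """Toy score: higher means more varied prose under these three signals."""
    lengths = [len(s.split()) for s in split_sentences(text)]
    # Reward: spread of sentence lengths, measured in words.
    stdev_reward = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    # Penalty: one point per em-dash character (U+2014).
    em_dash_penalty = float(text.count("\u2014"))
    # Penalty: a list of 3+ items whose word counts barely differ.
    item_lengths = [len(item.split()) for item in list_items]
    uniform = len(item_lengths) >= 3 and statistics.pstdev(item_lengths) < 1.0
    uniform_list_penalty = 1.0 if uniform else 0.0
    return stdev_reward - em_dash_penalty - uniform_list_penalty
```

A scorer of this shape measures surface form only, which fits the entry's own caveat that the write-up covers what this kind of scoring does and does not catch.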

Earlier

Pre-DGX-Spark operations on smaller hardware. The decision tree that became the Self-Hosted AI Start Here guide was being lived through, not yet written down.
Nostr identity trio (cipherfox / sovgrid / hexabella) registered. First V4V (Value-for-Value, a Lightning-based tipping pattern) zaps received. Strategy: Zap Tracking and the Nostr Account.