How This Blog Actually Gets Built: The Full Build, Ten Weeks of Iteration, Three Hard Gates

May 27, 2026 29 min read

Update (2026-06-19). Two pieces evolved since this was written. The 35B Qwen quant is now AutoRound int4-mixed (switched from PrismaQuant on 2026-06-11; 69.2 tok/s; the PrismaQuant build is retired). Hero images no longer strictly need the GPU mutex: on the Spark’s 128 GB unified memory the FLUX pass can run beside the resident LLM when there is headroom, with the model OOM-protected, falling back to the mutex pipeline otherwise. The mutex description below remains the safe default. Live stack: /stack/.

Most “how I built my blog” posts describe a static-site generator deployed to a serverless edge with a Markdown plugin and an analytics pixel. This one describes a desk-side NVIDIA DGX Spark with 128 GB of unified memory (about 121 GB usable as one pool). The box runs a 35-billion-parameter Qwen quant under a CLI mutex, so the image and TTS models can take the GPU in turn. It drafts an article through that model with a style-aware quality gate sitting in front of it, renders the hero image through a separate FLUX-schnell pass on the same hardware, and runs a stylometric AI-detection linter against the resulting prose. It then rsyncs the static build to a no-KYC EU VPS that speaks Caddy + Let’s Encrypt directly to the open internet, and emits a Matrix push when any step fails. All of that runs from a single master.py entry point on the machine in this apartment.

The mechanism came together over about ten weeks and has been in daily operation since 2026-04-08 (the Spark itself arrived early April 2026, so the whole arc is recent and on this one box, not a multi-year build). The visible artefact (this article, all 120 articles currently live as of 2026-05-27, the /insights/ page, the /stack/ snapshot, the /upstream/ contribution log) is the tip; the iceberg is what this article documents. If you are new to running self-hosted AI, the conceptual frames are explained as they appear and the internal links point to deeper-dive articles. If you have built similar systems, the insider numbers and the named bugs are next to each section.

Update 2026-06-15: fast-forward, the next thirty-odd articles

Three claims in the intro above changed after this article shipped. Per the Engineering Honesty Manifesto, here is what changed and why, with the trade-offs, instead of a silent rewrite. The dated milestones and numbers further down are left exactly as they read on 2026-05-27; this block is the fast-forward.

1. The local model switched from PrismaQuant to Intel AutoRound int4-mixed (2026-06-11). A same-ruler quant-gate (18 of 18 agent-bench coding tasks, identical quality) showed AutoRound decoding 12.7 percent faster: 69.2 against 61.4 tok/s on the measure.py ruler with prefill separated.

Pro: faster, and the AutoRound build keeps full vision. The earlier “vision dropped” note was wrong; the vision tower was just hidden behind a stale --language-model-only launch flag.
Con: the weights had to be re-quantised and re-validated. The served name and port stayed qwen3.6-35b on :30001, so every client (opencode, OpenWebUI, the blog MCP) switched with no config change.
Knock-on: the Mistral-on-SGLang :30000 engine is now retired. The vision job that used to need Mistral runs on Qwen itself, so the second engine and its 66 GB of weights stopped earning their slot. The old “57 to 62” and “71.5” tok/s figures below were a different (llama-benchy) ruler; only same-ruler deltas are comparable.

2. factcheck became a hard gate, and a third gate joined it. The “hard publication gating is the next phase” line below shipped. factcheck.py now blocks a deploy on any hallucinated registry pin. A new gate, selffact_check.py, blocks a publish that states a known-wrong fact about the grid itself (a wrong Spark spec, a retired service named as current, a personal-ownership date that never happened), checking against one source of truth, GRID-FACTS.md.

Pro: the two failure modes that used to slip past a human read, a hallucinated version and a stale self-fact, now stop a bad deploy on their own.
Con: a false positive would block a real deploy, so the self-fact rules are tuned to zero false positives across the whole corpus, and every gate keeps an emergency skip flag for the rare genuine case. The title now reads three hard gates, not two.

3. The Knowledge MCP shipped. What the knowledge-base guide still calls “on the roadmap, not shipped” is live. Agents query the local knowledge base, now including GRID-FACTS.md and a set of ops playbooks, over MCP, so the local model can answer questions about the grid without a cloud hop.

Ten weeks of milestones, in chronological order

The version of the pipeline that drafted this article is not the version that drafted the first article on the blog. Being explicit about the arc is part of the Engineering Honesty Manifesto discipline: do not pretend the current state was the original state. Thirteen inflection points moved the system from “I built this once” to “I ship from this every day.”

Date (2026)	Milestone	What changed and why it mattered
early April	DGX Spark booted into daily operation	Hardware showed up, drivers landed, first models loaded into unified memory. Articles were written manually with the model used only for one-off Q and A and code completion. The pipeline did not exist yet.
2026-04-08	First end-to-end draft pipeline	A single Python script could turn engineering notes (terse markdown bullets in a Gitea repo) into a publishable article. No quality gate, no factcheck, no stylometric scoring. The first three articles drafted this way required heavy human editing before going live.
2026-04-22	Forum auto-silenced a post as AI spam	A bug-report post got removed within an hour by an AI-detection service. The incident forced the first stylometric-detection layer into the scoring system: em-dash count, sentence-length standard deviation, uniform 3-bullet list count. Same retry loop the shape gate used.
2026-05-03	`scripts/factcheck.py` landed (warn-only)	Every Docker image, PyPI version, and npm package mentioned in prose now gets verified against the public registry. Hallucinated version pins became visible on /insights/ as a `factcheck_warnings` counter. Not yet a blocker (false-positive rate on niche registries is still real), but trending toward one.
2026-05-13	Migration from Mistral to Qwen 3.6 PrismaQuant as primary	Qwen 3.6 PrismaQuant on vLLM hit 57 to 62 tok/s decode with DFlash speculative decoding versus Mistral’s prior 29 tok/s no-EAGLE baseline. The Mistral / Qwen / GLM-5 comparison is the article that documents that call. Mistral kept the vision capability and the fallback slot via `safer-eagle` (36.5 tok/s, EAGLE confirmed stable on 2026-05-22).
2026-05-20 to 21	90 Gitea backlog issues closed in one overnight session	Pipeline + AGENTS.md discipline mature enough to run a real cleanup pass without breaking production. Five SovEng-pattern subpages went live, the Floki VPS got hardened, the daily health-check cron landed.
2026-05-22	`switch.sh` mutex over the GPU services	Previously the dashboard mediated the LLM/TTS/image-gen handoff. The new CLI at `/data/scripts/llm/switch.sh` flips between Qwen on vLLM and Mistral on SGLang and the image stack on ComfyUI, Termux-friendly with sub-second status checks, enforcing that at most one inference engine has weights loaded into the 128 GB unified pool at a time.
2026-05-23	AGENTS.md + git-hooks rolled out to 16 repos	The multi-agent contract became a shared template, not per-repo improv. Pre-commit bulk-block, commit-msg trailer enforcement, pre-push fail-non-ff, post-commit prune, all via `core.hooksPath`. Same discipline in every repository the pipeline touches.
2026-05-24	Cloudflared retirement; direct Caddy + Let’s Encrypt on FlokiNET	Sovereignty score 5/6 to 6/6. The What Sovereign Actually Means article unpacks the six-dimensional framework that made the trade-off legible. No more rented edge layer; the public-facing surface is end-to-end controlled.
2026-05-24	Astro 5 to 6 migration in ~30 minutes	`loader: glob()` for the blog collection, `render(entry)` instead of `entry.render()`, `entry.slug` to `entry.id` across eight files. The build is now Content Collections with the glob loader; loader migration was the load-bearing change because every page that lists articles needed updating.
2026-05-27	32-article drop in one batch (88 to 120 live)	Pipeline survived its first real scaling test: 4 deploy iterations to clean state, all seven pre-deploy classes of error caught (H1 duplication, footer redundancy, tag singletons, name leak across two pre-existing articles, word-count drift, slug-naming honesty, em-dash sweep). The hub article landed at the same time; /insights/ updated within the same deploy script.
2026-06-11	Local model swapped PrismaQuant for AutoRound int4-mixed	A same-ruler quant-gate (18 of 18 agent-bench tasks) picked Intel AutoRound int4-mixed over the prior PrismaQuant build: 12.7 percent faster decode (69.2 vs 61.4 tok/s, measure.py) at identical coding quality, full vision retained. Served name and port unchanged, so all clients switched transparently; the Mistral/SGLang fallback engine was retired. See the AutoRound vs PrismaQuant duel.
2026-06-15	factcheck hardened, third gate added, grid facts externalised	factcheck.py went from warn-only to a hard deploy gate; a new selffact_check.py gate blocks publishes that state a known-wrong fact about the grid, with ground truth in a single GRID-FACTS.md that the local model also reads over the Knowledge MCP. The two-gate pipeline became a three-gate pipeline.

The point of laying out the milestones is to be honest about how recently this matured. The first stable end-to-end draft shipped on 2026-04-08, which is roughly seven weeks before the 32-article drop. Anything older than that on the blog was written by hand or with a much rougher version of the same flow. The article you are reading is the receipts for every milestone above.

The two-layer pipeline: Claude as architect, local as crew

For someone coming in cold: think of the system as a small construction firm. Cloud Claude (Anthropic’s developer agent product, accessed via Claude Code in the terminal) is the architect on retainer. The self-hosted 35B Qwen quant running on the Spark is the construction crew. The operator (me, working under the cipherfox persona on this site) is the site supervisor who signs off on every visit. The architect costs per-hour; the crew costs electricity; the supervisor stays the same regardless.

The dollar argument is the easy one. Once the Spark and the Floki VPS are paid for, a published article costs only electricity (roughly 250 W under draft load, roughly 90 W at idle, less at night). There is no per-token billing on the local layer. The harder argument is privacy: the actual source notes, the in-progress drafts, the failed attempts, the corrections, all of that material stays on the machine in this apartment. Nothing routes through a cloud API unless I am consciously inviting Claude into a specific meta-task and I have decided the content of that prompt is fine to send.

The split between the two layers is task-shaped. The cloud-vs-local capability matrix is the canonical reference for the row-by-row breakdown; the short version: architecture, multi-file refactors, novel reasoning go to Claude; drafting, single-file changes, tool calls go local; TTS, image gen, and anything privacy-sensitive go local because Claude cannot do them at all. The decision rule fell out of running both layers daily since early April, not from a benchmark study.

The split is not meant to be permanent. The cloud-Claude layer is the part of the stack that is not yet sovereign: it is rented, it runs off-box, and it sees whatever meta-task it is invited into. The explicit goal is to retire it. Every capability that still has to go to Claude (architecture, multi-file refactors, the hardest reasoning) is a line item on the list of what the local model cannot do yet, and every local-model upgrade is judged on whether it closes one of them. The day the local 35B, or its successor, handles a multi-file refactor without losing the thread is the day the cloud layer stops being load-bearing. Until then, honesty means showing the dependency, not hiding it. The direction of travel is full local sovereignty; the architect-on-retainer is a transitional role, not a fixture.

The hardware reality on GB10

The DGX Spark on the desk has 128 GB of unified memory, which means CPU and GPU share the same pool. That sounds great on the press release and it is mostly great in practice, but it has one operating consequence that shapes the whole pipeline: the inference engines, the image model, and the TTS model cannot all run at the same time. Their working sets overlap, and the 128 GB pool fills up. The mutex pattern from 2026-05-22 (switch.sh qwen|mistral|none|status) enforces this at the operator-tool layer with a 60-second guard between transitions so the kernel can actually drop the page-cache before the next service loads.
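The guard logic can be illustrated with a minimal Python sketch. This is not the actual switch.sh, which is a shell tool; the class name `GpuMutex`, the injectable clock, and the error type are assumptions made for illustration. Only the target names and the 60-second guard come from the text above.

```python
import time

GUARD_SECONDS = 60  # pause between transitions, as described above
TARGETS = {"qwen", "mistral", "none"}


class GpuMutex:
    """At most one engine is active; transitions are rate-limited by a guard."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the guard can be tested
        self.active = "none"
        self._last_switch = None   # timestamp of the last real transition

    def switch(self, target):
        if target not in TARGETS:
            raise ValueError(f"unknown target: {target}")
        if target == self.active:
            return self.active     # already there, nothing to unload
        now = self._clock()
        if self._last_switch is not None:
            waited = now - self._last_switch
            if waited < GUARD_SECONDS:
                raise RuntimeError(
                    f"guard active: wait {GUARD_SECONDS - waited:.0f}s more"
                )
        # A real tool would stop the old service and start the new one here.
        self.active = target
        self._last_switch = now
        return self.active
```

Refusing the transition, instead of sleeping inside the call, keeps a status query fast even while the guard is running; whether the real script blocks or refuses is not stated in this article.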

For pros, there are three genuine subtleties. First, NVFP4 quantisation is the only path to fitting 119B Mistral params alongside enough headroom for the inference KV-cache. Second, EAGLE speculative decoding was the difference between 29 tok/s and 36.5 tok/s on Mistral, but it required confirming that the spec model stayed numerically stable through the entire context window (verified 2026-05-22). Third, Qwen 3.6 PrismaQuant at 4.75-bit beat Mistral on tokens-per-second by roughly 2x for text-only work (57 to 62 tok/s against the 29 tok/s baseline), because the engine path is shorter through vLLM than through SGLang on this hardware. The deeper details (which kernel revisions matter, which flashinfer tag finally cooperated, which fallback the OOM-watcher uses) are in the setup article and the Mistral vs Qwen vs GLM-5 comparison. This article stays at the pipeline layer above them.

The page-cache hijack failure mode deserves its own paragraph because it has bitten me twice. After an SGLang or vLLM crash, the kernel keeps the model weights in the page cache. The next launch reads the weights faster than from disk (good), but the cache competes for the same 128 GB with the about-to-load weights (bad). The fix is to run echo 3 > /proc/sys/vm/drop_caches as root before every relaunch after a crash. The Spark page-cache hijack memory entry is the canonical short note on the discipline; the discipline survives because the operator wrote it down after the second incident.
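For anyone who prefers to script the step, here is a small sketch of the same fix in Python. The function names and the `MemAvailable` helper are illustrative assumptions, not part of the pipeline described here; the kernel path and the value 3 are the standard Linux interface.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path=DROP_CACHES):
    """Flush dirty pages, then ask the kernel to drop clean caches."""
    os.sync()  # write dirty pages out first so more of the cache is reclaimable
    with open(path, "w") as fh:
        fh.write("3\n")  # 1 = page cache, 2 = dentries and inodes, 3 = both


def mem_available_gib(meminfo_text):
    """Parse MemAvailable (reported in kB) out of /proc/meminfo content."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 * 1024)
    raise ValueError("MemAvailable not found")
```

Writing to the real path needs root. Reading /proc/meminfo before and after is one way to confirm the drop freed the headroom the next model load needs.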

Session memory: how amnesiac agents become useful

For someone new to multi-agent systems: every agent loses everything between sessions. The local LLM does not remember what it wrote last week. Claude Code does not remember last session’s architecture decisions. The Matrix bridge agent does not remember the previous deploy. The naive fix is to start every session with “let me explain the project again,” which burns 10 to 30 minutes of context budget and still misses half the detail.

The pipeline fix is per-agent context files committed to each repo, read at session start, and updated only at session end:

AGENTS.md is the multi-agent contract. It lives in the root of every repository the pipeline touches (currently 16 repos, all using the same shared template from sovereign-shared-core/git-hooks/AGENTS-block.md). It defines source-of-truth boundaries (which agent owns which files), the session-start ritual (git pull && bash scripts/preflight.sh && read AGENTS.md), and the session-end ritual (commit with persona in the trailer, or mark the work as WIP(scope)). Every agent reads it first.
VIBE.md is the local-LLM style file. It carries the anti-AI-pattern rules (em-dashes forbidden, no “leverage” or “seamlessly”, no uniform three-bullet lists, no rhetorical-question paragraph endings), the active workarounds for known bugs (Mistral strict-alternation, EAGLE numerical-stability bounds), the per-style code policy that the drafter consults, and the rolling motif blacklist for the image model. It is the second file the local agent reads.
BRIEFING.md is the optional cloud-Claude companion. When a Claude Code session opens, the operator pastes the relevant BRIEFING.md and the session starts with hours of project context loaded as a single prompt. The cipherfox BRIEFING.md covers the sovereign-blog repo; the Hexabella BRIEFING.md covers the Nostr-posting layer; each persona keeps its own.

Context injection beats re-derivation every single time. The discipline costs one afternoon to write up front and ~10 minutes per week to keep current; the alternative is a week of “where did this divergence come from” before the next major incident. For pros, the genuine subtlety is that these three files must not contradict each other. Drift between them is the equivalent of stale documentation, and the factcheck linter is what surfaces drift when the prose says one thing and the registry says another. The auto-memory system in Claude Code extends this further; the memory/MEMORY.md index acts as the long-term store for facts that span sessions but do not belong in any single repo’s AGENTS.md.

The multi-agent ritual

Several agents touch this codebase, and none can see what the others did since their last edit. Cloud Claude Code handles architecture and content polish. A rotating set of local CLI tools (current lineup on /stack/, agent-layer comparison in Coding Assistants on a Sovereign Stack) handles pipeline work and bulk drafting against the self-hosted model. A Matrix-bridged agent represents the Hexabella persona for cross-platform posting (Nostr long-form, Mastodon feedback ingestion).

Coordination is by ritual, not by shared state:

session-start:  git pull
                bash scripts/preflight.sh
                read AGENTS.md (always)
                read VIBE.md if local-LLM agent
                read BRIEFING.md if Claude session

session-end:    git commit
                  -m "scope: action

                  Co-Authored-By: <persona> <noreply@anthropic.com>"
                  -- <explicit paths>
                or git stash with WIP(scope) note

Drift between the production VPS and the repo is detected by scripts/preflight.sh. A single scripts/blog-deploy-verify.sh orchestrates: local build, rsync to the Floki VPS, the quality-signals self-heal pass, drift-commit, axe a11y verification across all pages mobile-and-desktop, and a live HTTP check that confirms the new content rendered. Two protections matter most for someone copying the pattern: git commit -- <paths> (with explicit paths) instead of git commit -a, because parallel sessions sometimes have other agents’ work in the index (verified 2026-05-18 after a contamination commit landed in production); and the persona trailer in the commit message, so the multi-agent audit trail survives every merge and rebase. The git-hooks toolkit (sovereign-shared-core/git-hooks) enforces these checks in the pre-commit and commit-msg stages, so a misconfigured agent cannot push.
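The persona-trailer check is the kind of rule a commit-msg hook can enforce. The sketch below shows one way to write it in Python; the real hooks in sovereign-shared-core/git-hooks are not shown in this article, so the function names and the allowed-persona set are assumptions.

```python
import re

# "Co-Authored-By: Name <email>" on a line of its own, case-insensitive key.
TRAILER = re.compile(
    r"^Co-Authored-By:\s*(?P<name>[^<>\n]+?)\s*<(?P<email>[^<>\s]+)>\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def persona_from_message(message):
    """Return the persona named in the trailer, or None if it is missing."""
    match = TRAILER.search(message)
    return match.group("name") if match else None


def check_commit_message(message, allowed):
    """Return an error string, or None when the message is acceptable."""
    persona = persona_from_message(message)
    if persona is None:
        return "missing Co-Authored-By trailer"
    if persona.lower() not in allowed:
        return f"unknown persona: {persona}"
    return None
```

Git passes the path of the message file to a commit-msg hook as its first argument, so a hook script would read that file, call `check_commit_message`, and exit non-zero on any returned error.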

The personas matter not because they are mascots but because they have distinct authorities. Cipherfox owns the engineering-log voice on the blog and is the only persona allowed to publish a draft to /blog/. Hexabella owns the cross-platform posting and is the only persona allowed to push to Nostr or to ingest Mastodon DMs. The Sovereign Qwen instance in OpenWebUI is a tool that either persona can call but neither can impersonate; it is bound to the sovereign-mcp server, the SearXNG web-search backend, and the sovereign-kb RAG corpus. The strategy-agents article is the canonical reference for which persona does what; the short summary is “if it touches the public internet on this domain, cipherfox; if it touches a relay on Nostr or a federation handle on Mastodon, Hexabella.”

Style configs: how generic LLMs stop sounding generic

Conceptual frame for those new to this: a generic LLM, even a quantised 35B one running on a desk, will produce structurally identical output for everything if you ask it in a generic voice. A “setup article” and a “strategy article” and a “fix-article” will all come out the same shape: same H2 count, same paragraph length distribution, same conclusion paragraph even when one is not warranted. The model has a prior distribution over “what an article looks like” and that prior wins unless something explicit overrides it.

The fix is config-driven, not prompt-driven. Each article style on this blog has its own config block in VIBE.md:

Code policy. Which articles get code blocks, how many, and what kind. Fix-articles get verbatim commands the operator could paste; strategy articles get diagrams or pseudo-code only, never literal pastable commands; setup articles get the complete config files; service articles get example invocations of the service.
Section count. 5 to 7 H2s for fix-articles, 8 to 10 for guides, no hard cap on strategy (but a soft warning at 12 H2s because beyond that the article wants to be split).
Section style. Each H2’s expected internal shape. A fix-article H2 names the symptom, then the cause, then the patch. A guide H2 builds toward an action. A strategy H2 makes a claim and defends it with at least one number or one named system.
Voice cues. Whose voice the article is in. Most articles are first-person operator (cipherfox). Some are explicitly cross-persona (when Hexabella’s Nostr-posting workflow is the subject, the relevant section can be in Hexabella’s voice).

For pros, the load-bearing detail is that the style config is injected into the prompt before the model drafts, not retrofitted onto the output as a post-hoc edit. Vague style guidance just becomes statistical noise on top of the base distribution. Explicit constraints sit in the context the model conditions on, so they change which tokens get sampled in the first place. The blog’s quality-score on /insights/ went up by an average of 35 points per article on the first batch after the per-style configs landed; the gate-failure rate dropped from roughly one in three articles to roughly one in twelve.
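A rough sketch of config-driven injection, assuming a Python representation of the per-style blocks. The actual VIBE.md format is not shown here, so the `StyleConfig` fields and the prompt wording are hypothetical; the section counts, word floors, and code policies are the ones listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StyleConfig:
    min_h2: Optional[int]       # None means no stated minimum
    max_h2: Optional[int]       # None means no hard cap
    warn_h2: Optional[int]      # soft warning threshold, if any
    word_floor: int
    code_policy: Optional[str]  # None where the text does not state one


STYLES = {
    "fix": StyleConfig(5, 7, None, 800, "verbatim commands the operator could paste"),
    "guide": StyleConfig(8, 10, None, 1200, None),
    "strategy": StyleConfig(None, None, 12, 1200,
                            "diagrams or pseudo-code only, no pastable commands"),
}


def render_constraints(style):
    """Turn one style block into explicit, checkable instructions."""
    cfg = STYLES[style]
    lines = [f"Article style: {style}"]
    if cfg.min_h2 is not None and cfg.max_h2 is not None:
        lines.append(f"Use between {cfg.min_h2} and {cfg.max_h2} H2 sections.")
    if cfg.warn_h2 is not None:
        lines.append(f"Consider splitting at {cfg.warn_h2} or more H2 sections.")
    lines.append(f"Write at least {cfg.word_floor} words.")
    if cfg.code_policy:
        lines.append(f"Code policy: {cfg.code_policy}.")
    return "\n".join(lines)


def build_prompt(style, notes):
    # Constraints go first so they condition the draft instead of patching it.
    return render_constraints(style) + "\n\nSource notes:\n" + notes
```

Because the constraints are data, the same block that builds the prompt can also drive the gate that checks the finished draft.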

Image motif blacklist: what the model wants vs what the article needs

The image model (currently FLUX.1-schnell on ComfyUI, see /stack/ for the canonical version) defaults to the same metaphors regardless of topic: an overflowing glass, a lone figure at a desk, an industrial workshop with low light and a single window, a vague gradient with abstract geometry. Even when the prompt explicitly bans those motifs, the prior distribution wins. Negative instructions (“no overflowing glass, no lone figure”) do not override strong base priors. They get parsed; they do not get followed.

What works: redirect the prompt into a different visual domain per style. Mechanical/diagrammatic for fix-articles (gears, exploded diagrams, wiring schematics). Architectural for guides (cross-sections, floor-plans, isometric buildings). Portrait/landscape with consistent lighting for strategy (a desk with specific objects, a landscape with specific weather, a wall with specific posters). Maintain a rolling motif blacklist so the pipeline cannot loop between sessions; once “wiring schematic on dark background” gets used, it goes on the blacklist for two weeks.
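The rolling blacklist can be sketched as a small time-windowed set. The class and method names are hypothetical; only the two-week window comes from the text.

```python
from datetime import date, timedelta

BLACKLIST_DAYS = 14  # "two weeks", as described above


class MotifBlacklist:
    def __init__(self):
        self._used = {}  # motif -> date it was last used

    def mark_used(self, motif, on):
        self._used[motif] = on

    def is_blocked(self, motif, today):
        last = self._used.get(motif)
        return last is not None and today - last < timedelta(days=BLACKLIST_DAYS)

    def pick(self, candidates, today):
        """Return the first candidate that is not blocked, and record it."""
        for motif in candidates:
            if not self.is_blocked(motif, today):
                self.mark_used(motif, today)
                return motif
        return None  # every candidate is still inside its window
```

Recording the motif at pick time means two prompts on the same day cannot select the identical motif. It does not catch near-duplicates that merely share vocabulary.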

The 32-article drop on 2026-05-27 was the worst-affected batch on motif collision. Eight of sixteen prompts in the most recent backfill round collided on industrial-workshop vocabulary because the blacklist was global-recent rather than within-batch. The per-batch motif-rotation fix is the open issue. For the curious, the hero image for this article was produced through the same pipeline; if it looks like a wiring schematic with too many gears, the motif-rotation fix has not landed yet.

The three quality gates, all hard

Every article passes through three independent gates before it goes live. All three block.

Gate 1, shape. A style-aware weighted score combining stylometric and structural signals: word count, sentence-length standard deviation, em-dash count (a strong negative weight, -12 per occurrence per the memory file), uniform 3-bullet structures, code-block presence by style, internal-link count, named-entity diversity (named systems, named files, named version numbers). Each style has its own weights and its own floor: currently 150 to 220 depending on style, with the higher floors on guides and strategy. Plus a per-style word-count floor: 1200 words for guides, strategy, and services; 800 for fix-articles. A score-fail OR a word-count-fail blocks the publish. The article stays visible on /insights/ with a fail badge so the gap is auditable, not hidden. The quality gate that rewards fabrication article is the case study where the gate itself was the bug, not the model; that is why the gate now scores against named-entity diversity (registry-resolvable names) rather than just against article shape.
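The blocking rule (score-fail OR word-count-fail) can be sketched as follows. The word floors and the em-dash weight are taken from the text. The exact score floor per style is an assumption chosen inside the stated 150 to 220 range, and `base_score` stands in for the weighted sum of the other signals, which is not reproduced here.

```python
EM_DASH_WEIGHT = -12  # per occurrence, from the text above

WORD_FLOOR = {"fix": 800, "guide": 1200, "strategy": 1200, "service": 1200}
# Assumed split of the 150 to 220 range; higher floors on guides and strategy.
SCORE_FLOOR = {"fix": 150, "service": 180, "guide": 220, "strategy": 220}


def gate_shape(style, base_score, em_dashes, word_count):
    """Return a list of blocking reasons; an empty list means the gate passes."""
    score = base_score + EM_DASH_WEIGHT * em_dashes
    reasons = []
    if score < SCORE_FLOOR[style]:
        reasons.append(f"score {score} below floor {SCORE_FLOOR[style]}")
    if word_count < WORD_FLOOR[style]:
        reasons.append(f"{word_count} words below floor {WORD_FLOOR[style]}")
    return reasons
```

Returning the reasons instead of a bare boolean makes it possible to show a fail badge with the specific gap on a page such as /insights/.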

Gate 2, factcheck. scripts/factcheck.py runs against every article. It walks the rendered HTML, extracts every Docker image, every PyPI version, every npm package, and every git tag mentioned in prose, and verifies each against the public registry. The warnings surface on /insights/ as a factcheck_warnings counter and feed negatively into the quality score. It started warn-only on 2026-05-03 and became a hard deploy gate on 2026-06-15, once the false-positive rate on niche registries (pre-release, private, recently-renamed) was tamed. A hallucinated pin now blocks the deploy, with an emergency --skip-factcheck flag for the rare genuine false positive. The counter on /insights/ stays publicly visible, so the gap is auditable.
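The extract-then-verify idea, reduced to PyPI pins only, might look like the sketch below. It is not the real scripts/factcheck.py: the regex, the function names, and the choice of the PyPI JSON endpoint as resolver are assumptions, and the Docker, npm, and git-tag checks are left out.

```python
import re
import urllib.error
import urllib.request

# Matches pins written as name==version in prose.
PIN = re.compile(r"\b([A-Za-z0-9][A-Za-z0-9._-]*)==([0-9][A-Za-z0-9.+]*)")


def extract_pypi_pins(text):
    """Find name==version pins, de-duplicated, in order of appearance."""
    pins = []
    for name, version in PIN.findall(text):
        pin = (name, version.rstrip("."))  # drop a sentence-ending full stop
        if pin not in pins:
            pins.append(pin)
    return pins


def pypi_exists(name, version, timeout=10):
    """Ask the PyPI JSON endpoint whether this exact release exists."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors are not evidence of a hallucination


def hallucinated_pins(text, exists=pypi_exists):
    """Return every pin the resolver cannot find."""
    return [pin for pin in extract_pypi_pins(text) if not exists(*pin)]
```

The resolver is a parameter so the gate can be exercised offline with a stub. Only a definite 404 counts as a missing release; a network outage raises instead of silently blocking or passing a deploy.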

Gate 3, self-fact. scripts/selffact_check.py blocks a publish that states a known-wrong fact about the grid itself: a wrong Spark spec, a retired component named as current, a personal-ownership date that never happened. Its rules and ground truth live in one file, /data/scripts/GRID-FACTS.md, which the local model also reads over the Knowledge MCP, so the same source of truth that feeds the gate feeds the drafter. The rule set is curated to zero false positives across the live corpus, because a gate that cries wolf gets bypassed. This is the gate that would have caught the stale “PrismaQuant is current” phrasing this very article carried until the 2026-06-15 update above.
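A curated known-wrong-fact check can be as simple as a list of patterns paired with the corrected fact. The rule format inside GRID-FACTS.md is not shown in this article, so the two rules below are hypothetical examples built from facts stated on this page.

```python
import re

# Hypothetical rules: (pattern that marks a wrong claim, the ground truth).
RULES = [
    (re.compile(r"PrismaQuant is (?:the )?current", re.IGNORECASE),
     "PrismaQuant was replaced by AutoRound int4-mixed on 2026-06-11"),
    (re.compile(r"\b(?!128\b)\d+ GB of unified memory"),
     "the Spark has 128 GB of unified memory"),
]


def selffact_violations(text):
    """Return (line number, ground truth) for every rule that fires."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, truth in RULES:
            if pattern.search(line):
                hits.append((lineno, truth))
    return hits
```

Narrow patterns like these trade recall for precision, which fits a gate that is tuned to zero false positives: a rule only fires on phrasing that is wrong in every context.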

For pros, the load-bearing nuance: none of the three gates can tell you whether the registry-verified version was the right choice for the use case described. The factcheck linter resolves “Docker image foo:1.2.3 exists” but not “Docker image foo:1.2.3 is the right pin for this article’s context.” That remains a human-attestation problem, which is why the Engineering Honesty Manifesto is the document that closes the loop. The gates catch obvious failure; the operator reads every draft before it ships. The gates are necessary; they are not sufficient.

Anti-AI-detection: stylometry beats wordlists

Conceptual frame for those new to this: modern AI-detection tools do not pattern-match on banned words. They look at structural signatures. How often the writer uses em-dashes (U+2014, the long dash that humans rarely type because most keyboards do not have a key for it). How varied the sentence lengths are. How often three-bullet lists appear in a row. How often paragraphs end with rhetorical questions or with single-sentence punchlines. LLMs cluster around the mean on every one of these signals. Humans do not.

After a forum post got auto-silenced as AI spam on 2026-04-22 (well within the first hour, by a service that scored the post at >0.93 probability AI-written), the same scoring system that handles structural shape grew stylometric signals:

em_dashes: the strongest single LLM tell. Score weight -12 per occurrence in body content; the rule is enforced site-wide in VIBE.md anti_ai_patterns and the deploy gate refuses any article with one. The character is forbidden in articles, in UI pages, in path-blurbs, and even in source-comments that leak into source mirrors.
uniform_3_lists: the structural tell. Models love a clean three-bullet rhythm. Score weight -5 per occurrence of a 3-bullet list that follows another 3-bullet list within the same article.
sentence_length_stdev: humans vary sentence length unevenly. LLMs cluster around the medium sentence. The signal is the standard deviation across all sentences in the article; below a threshold (currently 7 words of stdev), the article gets flagged.
rhetorical_question_paragraph_endings: the rhythmic tell. LLMs love to end paragraphs with a question. Tracked but not yet weight-scored.

The retry loop applies. If the local model writes too uniformly, the next pass gets explicit feedback to break the rhythm (specific examples: “your last article had 12 sentences between 12 and 16 words; vary by at least 8 words across the next draft”). Goodhart’s law applies; optimising against my own detectors is not the same as fooling Discourse AI in the real world, so the weights stay moderate and the layer runs as a linter, not an adversarial loop. The point is not to evade detection; the point is to write like a human because the alternative is bad writing. The detection signal is just the most measurable proxy for “this paragraph has rhythm.”
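The signals above are cheap to compute. The sketch below is one possible implementation, not the blog's scorer: the sentence splitter is naive, the population standard deviation is an assumed choice, and "follows another 3-bullet list" is read as "the previous list in the article also had three bullets".

```python
import re
import statistics

EM_DASH = "\u2014"
STDEV_FLOOR = 7.0  # words, the threshold named above


def sentence_lengths(text):
    """Word count per sentence, splitting on ., ! or ? followed by whitespace."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def list_sizes(markdown):
    """Sizes of runs of consecutive '-' or '*' bullet lines."""
    sizes, run = [], 0
    for line in markdown.splitlines():
        if re.match(r"\s*[-*] ", line):
            run += 1
        elif run:
            sizes.append(run)
            run = 0
    if run:
        sizes.append(run)
    return sizes


def uniform_3_lists(markdown):
    sizes = list_sizes(markdown)
    return sum(1 for prev, cur in zip(sizes, sizes[1:]) if prev == 3 and cur == 3)


def stylometric_signals(text):
    lengths = sentence_lengths(text)
    stdev = statistics.pstdev(lengths) if len(lengths) > 1 else 0.0
    em_dashes = text.count(EM_DASH)
    repeats = uniform_3_lists(text)
    return {
        "em_dashes": em_dashes,
        "uniform_3_lists": repeats,
        "sentence_length_stdev": stdev,
        "flag_uniform_rhythm": stdev < STDEV_FLOOR,
        "score_delta": -12 * em_dashes - 5 * repeats,
    }
```

A loose list with blank lines between bullets would be counted as several one-item lists here, so a production version would want a real markdown parser.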

Publishing flow: from notes to /insights/

For someone tracing the path of a single article end-to-end: the source is a markdown file in the cipherfox/sovgrid-business/articles directory in the local Gitea (loopback only, 127.0.0.1:3002), typically a few hundred words of dense engineering notes from a debugging session or a build log. The drafter (cipherfox running opencode against Qwen 3.6) reads the notes, reads VIBE.md, reads the relevant per-style config block, and produces a draft into sovereign-blog/src/content/blog/<slug>.md with the frontmatter filled in.

flow:
  notes.md (~300 words)
       → opencode + Qwen 3.6 against VIBE.md per-style config
       → draft.md (~1500 words) with frontmatter
       → scripts/score-quality.py (Gate 1 shape, hard)
       → scripts/factcheck.py (Gate 2 registry, hard)
       → scripts/selffact_check.py (Gate 3 self-fact, hard)
       → scripts/render-hero.py (FLUX-schnell, ~5 s/image)
       → bash scripts/blog-deploy-verify.sh
            → astro build (~6 s)
            → rsync to Floki
            → axe a11y check across all pages
            → live HTTP verification of the new URL
       → /insights/ updated; Matrix push if any step failed

The pipeline is sequential by design. Earlier iterations tried parallelising the draft and hero-image steps; the lesson from 2026-04-15 was that the GPU pool fights itself if both run at once. The systemd patterns article covers the service-management layer underneath this flow.
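The sequential, stop-on-first-failure shape of the flow can be sketched in a few lines. The runner below is illustrative; `run_pipeline` and the `notify` callback are assumed names, with `notify` standing in for whatever sends the Matrix push.

```python
def run_pipeline(steps, notify):
    """Run (name, callable) steps in order; stop and notify at the first failure.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed = []
    for name, step in steps:
        try:
            ok = step()
        except Exception as exc:  # a crashing step is a failed step
            notify(f"{name} raised {exc!r}")
            return completed, name
        if not ok:
            notify(f"{name} failed")
            return completed, name
        completed.append(name)
    return completed, None
```

Running one step at a time also means at most one GPU consumer is active at any moment, which is the property the parallel attempt lost.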

The cross-platform posting happens after publish, not as part of the publish flow. Once the article is live, Hexabella (running on the Matrix bridge with its own context file and its own signing keys) reads the new URL from the deploy log, generates a Nostr long-form announcement (NIP-23 kind:30023), and pushes it through /data/scripts/nostr/post.py to the relays the project uses. The Nostr posting is intentionally a separate process because the article’s blast radius is the open relay graph, not the closed sovgrid domain. The strategy decision to use NIP-23 covers why no traditional email newsletter was added.
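For readers unfamiliar with NIP-23, the sketch below builds an unsigned kind:30023 event and derives its id the way NIP-01 describes (SHA-256 over the compact JSON array of the event fields). It is not the contents of /data/scripts/nostr/post.py; the function name and the chosen tags are assumptions, and signing and relay publishing are left out.

```python
import hashlib
import json


def longform_event(pubkey_hex, slug, title, body_markdown, created_at,
                   published_at=None):
    """Build an unsigned NIP-23 long-form event with its NIP-01 id."""
    tags = [["d", slug], ["title", title]]  # "d" makes the event addressable
    if published_at is not None:
        tags.append(["published_at", str(published_at)])
    event = {
        "pubkey": pubkey_hex,
        "created_at": created_at,  # unix seconds
        "kind": 30023,
        "tags": tags,
        "content": body_markdown,
    }
    serialized = json.dumps(
        [0, event["pubkey"], event["created_at"], event["kind"],
         event["tags"], event["content"]],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    event["id"] = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return event  # still unsigned: the key holder must add "sig"
```

Because the `d` tag carries the slug, republishing with the same slug replaces the earlier version of the announcement on relays that honour addressable events. The serialization here should match NIP-01 for ordinary text; unusual control characters may need the escaping rules checked against the spec.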

The 32-article drop: the scaling test

The 32-article drop on 2026-05-27 was the first real test of the pipeline at scale. Eighty-eight articles were live going in; one hundred twenty were live coming out. The drop required four deploy iterations before the site rendered cleanly, because all seven pre-deploy classes of error fired at once. The receipts:

H1 duplication. Drafts included # Title as the first body line, which compounded with the layout’s <h1>{title}</h1> to produce two H1s per article. Mass-stripped via regex pre-deploy. The deploy gate now catches it.
Footer redundancy. Three articles repeated the footer’s “sovgrid.org is an engineering log” paragraph in the body. Mass-stripped after the user escalated. Pre-check class #2 now catches it.
Tag singletons. Sixty-one tag-pages with a single article each. My rule is “no tag with fewer than two articles.” Mass-filtered before deploy.
Name leak across two pre-existing articles. Fourteen mentions of the operator’s real first name (instead of the cipherfox persona) turned up in three articles, two of which had already been live for weeks. The fix was a token-boundary regex replacement to cipherfox (the persona used on this domain). The leak was the most serious of the seven because it was already in production before the drop started.
Word-count drift. One article fell to 1186 words after the H1-and-footer strip, below the 1200-word floor for its style. Added a 30-word paragraph about socket-activation absence in systemd, which was load-bearing context anyway. Score returned to passing.
Slug-naming honesty. I argued for keeping the old slug astro-5-caddy-static-first-ai-blog-stack after the Astro 5-to-6 migration “for SEO.” Caught myself: the article was never online under that slug, so the SEO claim was empty. Renamed to astro-6-caddy-static-first-ai-blog-stack. The rule is: do not preserve slug names for pages that never had inbound links.
Em-dash sweep. Multiple rounds across UI pages, path-blurbs, and one yaml blurb. The rule is hard: zero em-dashes site-wide.
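Three of the checks above are simple enough to sketch. The functions below are illustrative only: the names, the regex, and the data shapes are assumptions rather than the deploy script's actual code, and `Realname` in the usage note is a placeholder, not the leaked name.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# A Markdown H1 ("# Title") at the very start of the body, plus trailing blank lines.
H1_LINE = re.compile(r"\A\s*# [^\n]*(?:\n+|\Z)")


def strip_leading_h1(body: str) -> str:
    """Drop a '# Title' first body line; the layout already renders the H1."""
    return H1_LINE.sub("", body, count=1)


def singleton_tags(tags_by_slug: Dict[str, List[str]], minimum: int = 2) -> Set[str]:
    """Return tags used by fewer than `minimum` articles."""
    counts = Counter(tag for tags in tags_by_slug.values() for tag in set(tags))
    return {tag for tag, n in counts.items() if n < minimum}


def replace_token(text: str, name: str, persona: str) -> str:
    """Replace `name` only where it stands as a whole token."""
    pattern = re.compile(rf"\b{re.escape(name)}\b")
    # A function replacement avoids backslash interpretation in `persona`.
    return pattern.sub(lambda _match: persona, text)
```

For example, `replace_token("Realname wrote this", "Realname", "cipherfox")` rewrites the standalone token but leaves a longer word such as `Realnamesson` untouched, which is the point of anchoring on token boundaries.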

All seven classes are now in the pre-deploy check list, the deploy script enforces them, and the /insights/ gate-fail counter would show any drift the next morning. For pros, the load-bearing lesson is that a gate that catches one class of error in one article will catch it in 32 articles only if it runs at the right layer. The em-dash sweep, for example, had to run across src/content/blog/, src/content/paths/, src/pages/, src/components/, and src/layouts/ because the rule applies to every render-path the reader sees, not just article bodies.
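A sweep at that layer might look like the sketch below. The directory list comes from the paragraph above; the file-suffix set, the function name, and the return shape are assumptions.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

EM_DASH = "\u2014"
# Directories named in the article; the suffix set is an assumption.
RENDER_PATHS = ("src/content/blog", "src/content/paths", "src/pages",
                "src/components", "src/layouts")
SUFFIXES = {".md", ".mdx", ".astro", ".yaml", ".yml"}


def find_em_dashes(root: Path,
                   dirs: Iterable[str] = RENDER_PATHS) -> List[Tuple[str, int]]:
    """Return (relative path, line number) for every line containing an em-dash."""
    hits: List[Tuple[str, int]] = []
    for d in dirs:
        base = root / d
        if not base.is_dir():
            continue  # tolerate a directory that does not exist in this checkout
        for path in sorted(base.rglob("*")):
            if path.is_file() and path.suffix in SUFFIXES:
                text = path.read_text(encoding="utf-8", errors="replace")
                for lineno, line in enumerate(text.splitlines(), start=1):
                    if EM_DASH in line:
                        hits.append((path.relative_to(root).as_posix(), lineno))
    return hits
```

A deploy script could treat any non-empty result as a hard failure and print the hits, so the fix happens in the source file rather than in the built HTML.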

What is still genuinely broken

A short honest list, because Rule 6 of the manifesto requires it:

Multi-file refactors on the local stack. The 35B Qwen quant loses context past about four files. Anything bigger still goes to Claude. See the cloud-vs-local capability matrix for the row-by-row breakdown.
Image motif collision on multi-article batches. The blacklist is global-recent, not in-batch. The 32-article drop had eight collisions on industrial-workshop vocabulary.
Voxtral-4B TTS expressivity ceiling. Spot-listen test 0/10. Pivot spike to VibeVoice / Higgs Audio v2 / IndexTTS-2 is queued; podcast pipeline stays on Voxtral until the spike completes.
reasoning_tokens reporting on SGLang (now historical). The retired Mistral/SGLang fallback always reported reasoning_tokens: 0 even when reasoning was active, an SGLang bug, not a model bug. The primary stack is vLLM/Qwen and was never affected; with SGLang retired (2026-06-11) this no longer bites, but it stays filed upstream.
Web-search-grounded TLA translation in the Hexabella podcast pipeline. Architecture decision pending: pre-generation search-pass (deterministic, more latency) vs inline tool-calls during local-model generation (flexible, less reproducible). Plan doc not yet written.
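On the motif-collision item, the difference between a global-recent blacklist and an in-batch one is easiest to see in code. The sketch below is a hypothetical fix, not something the pipeline does today: `assign_motifs`, the candidate lists, and the motif strings are all illustrative.

```python
from typing import Dict, Iterable, List, Set


def assign_motifs(candidates_by_slug: Dict[str, List[str]],
                  recent_blacklist: Iterable[str]) -> Dict[str, str]:
    """Pick one motif per article, avoiding both recent and in-batch repeats."""
    used: Set[str] = set(recent_blacklist)  # global-recent layer
    chosen: Dict[str, str] = {}
    for slug, candidates in candidates_by_slug.items():
        for motif in candidates:
            if motif not in used:
                chosen[slug] = motif
                used.add(motif)  # in-batch layer: block it for later articles
                break
        else:
            raise ValueError(f"no unused motif left for {slug}")
    return chosen
```

With only the first layer, two articles in the same batch can both pick the same motif because neither has been published yet; adding each pick to `used` as the batch proceeds closes that gap.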

The other half of “still broken” is upstream-tracked in /upstream/, which is the public-facing index of every bug found while running this stack that got filed and patched at the source. Two open-source releases came out of the same surface: sovereign-mcp (the MCP server behind mcp.sovgrid.org) and vps-healthcheck (the daily Floki audit script). Both MIT-licensed.

The entry point and the loop

All pipeline operations run through a single command surface on the Spark:

python3 scripts/master.py

It dispatches to individual scripts with inline explanations of what each does and when to use it. The desktop GUI (sovereign_dashboard.py) wraps the same commands and streams live output for the longer-running tasks. The Matrix bridge gets push notifications from the same daemon when a build finishes or a deploy fails. The CLI mutex at /data/scripts/llm/switch.sh is reachable from Termux on a phone over Tailscale SSH, which means the operator can launch a draft from a coffee shop and walk back to the desk while it finishes.
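A dispatcher of that kind can be as small as a command table plus argparse. The sketch below is an assumption about the shape, not the real master.py: the script paths come from the publishing-flow diagram, while the subcommand names and help strings are invented for illustration.

```python
import argparse
from typing import Dict, List, Optional, Tuple

# Hypothetical command table: script paths are from the article,
# subcommand names and help strings are illustrative.
COMMANDS: Dict[str, Tuple[str, str]] = {
    "score": ("scripts/score-quality.py", "Gate 1 (shape): run before anything else"),
    "factcheck": ("scripts/factcheck.py", "Gate 2 (registry): run after the draft passes Gate 1"),
    "hero": ("scripts/render-hero.py", "Render the hero image with FLUX-schnell"),
    "deploy": ("scripts/blog-deploy-verify.sh", "Build, rsync, and verify the live URL"),
}


def resolve(argv: Optional[List[str]] = None) -> List[str]:
    """Map a subcommand to the argv that would be executed."""
    parser = argparse.ArgumentParser(prog="master.py")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, (_script, help_text) in COMMANDS.items():
        sub.add_parser(name, help=help_text)  # help text doubles as the inline explanation
    args = parser.parse_args(argv)
    script, _help = COMMANDS[args.command]
    runner = "bash" if script.endswith(".sh") else "python3"
    return [runner, script]
```

Passing the result of `resolve()` to `subprocess.call` would run the chosen script, and `--help` prints the table, so the explanations live next to the commands they describe.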

The pipeline is not optimised for someone else to copy. It is optimised for one operator to keep running it without breaking it. The discipline above (the milestones, the two-layer split, the AGENTS.md ritual, the per-style configs, the motif blacklist, the three gates, the stylometric layer, the deploy-verify script, the persona separation) is what made the difference between “I built a pipeline once” and “I ship from this every day since 2026-04-08.” If you are tracing a single article from notes to live URL, the publishing-flow diagram above is the map. If you are evaluating the system as a whole, the 2026 reference architecture is the layered narrative that cross-links into every block, and the Start Here page is the right entry if this is the first article you have opened.

The receipts for everything above are in the commit log on the Gitea instance behind the firewall, mirrored selectively to GitHub under the same identity. Dated snapshots that get a number wrong eventually get a follow-up article that prints the corrected number with the original date next to it. That is the Engineering Honesty Manifesto commitment, and that is the deal.
