The Engineering Honesty Manifesto

May 27, 2026 11 min read

Update (2026-06-19). The “Qwen 3.6 PrismaQuant” references here predate the 2026-06-11 production switch to AutoRound int4-mixed (69.2 tok/s, 12.7 percent better on the coding gate, vision retained, PrismaQuant retired). The figures are kept as the engineering-log record; the live stack is on /stack/ and the switch is measured in AutoRound int4 vs PrismaQuant.

The honesty is the product.

Everything else on this site is a packaging decision around that one commitment. The hardware, the model stack, the consulting practice, the book, the Lightning address in the footer, the engineering log of bugs I caused and bugs I fixed: all of them are downstream of the choice to write the operating reality as it happened, rather than the operating reality as it would have sold better.

This piece is the explicit version of that commitment. Six rules I hold myself to. Each rule has a receipt from the public log of this site that proves I am willing to keep the rule under load. If any of these rules is broken in a future post, the rule is being broken on purpose and the post will say so.

Rule 1: Numbers I have not measured do not appear

If I write that the DGX Spark sustains around 71 tokens per second on Qwen 3.6 PrismaQuant under DFlash speculative decoding, it is because I have measured it, on this hardware, with a known prompt distribution, and I can point to the systemd unit and the log file that produced the number. (See Spark Arena Rank 4 Made Me Add Qwen3.6 for the original 45 tok/s baseline measurement on the same hardware, before DFlash was enabled. The number moved because the configuration moved; both numbers are real.)

If I write that the Spark draws “moderate” power under load, it is because I have not put a Kill-A-Watt on the input and I refuse to quote a wattage I cannot defend. The vendor’s TDP is a public number; the lived behavior is my observation; the synthesis is honest because the parts are labelled. The cost of this rule is that I look less authoritative than writers who confidently cite numbers they pulled from a press release. The benefit is that when a reader builds a decision on my numbers, the decision is built on actual measurements.

The corollary: vendor benchmarks get cited with the configuration they were measured under, or not at all. “131 tokens per second peak” without “batched, throughput-optimized, parallel-request” is marketing, not data. The same rule applies to vendor SOTA claims (Z.ai’s “8-hour autonomous execution” claim for GLM-5.1, for instance, gets a label that says “vendor-published claim, not operator-reproduced”).
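One way to make the labelling rule mechanical is to refuse to render a number that has no configuration attached. What follows is a minimal sketch, not a description of how this site actually stores its figures: the `Figure` class, its field names, and the two source labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    """A quoted number plus the provenance it must be cited with (hypothetical type)."""
    value: float
    unit: str
    source: str  # "measured" or "vendor" (hypothetical labels)
    config: str  # configuration the number was produced under

    def __post_init__(self) -> None:
        if self.source not in ("measured", "vendor"):
            raise ValueError(f"unknown source label: {self.source!r}")
        if not self.config.strip():
            # Mirrors the corollary: cited with its configuration, or not at all.
            raise ValueError("a figure without its configuration is not citable")

    def cite(self) -> str:
        label = ("measured on this hardware" if self.source == "measured"
                 else "vendor-published claim, not operator-reproduced")
        return f"{self.value:g} {self.unit} ({label}; {self.config})"
```

Constructing `Figure(131, "tok/s", "vendor", "")` raises before the number can reach a draft, which is the point of the rule.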

Rule 2: Failures are first-class content

The site’s fixes/ archive is longer than its setup/ archive. That is on purpose.

When I broke the loudnorm filter on the podcast pipeline, the postmortem went up the same week. (See Fixes: ffmpeg volume filter eval frame.) When the SGLang restart kept hitting OOM at 95 GB because the kernel page cache was holding stale weights from the previous engine instance, the fix and the cause both got published. (See Fixes: SGLang Restart OOM Fix; the one-line fix is echo 3 > /proc/sys/vm/drop_caches before every engine relaunch.) When the vLLM MoE backend defaulted to a kernel path that froze the desktop session, I wrote the debug log before I shipped the workaround. (See Fixes: vLLM MoE Throughput sm121 Desktop Freeze; the env-var fix is VLLM_FLASHINFER_MOE_BACKEND=latency.)

The reason is not contrition theatre. The reason is that the failures are where the operational knowledge lives. A reader who is about to walk into the same wall benefits more from my postmortem than from a polished setup guide that pretends the wall does not exist. The page-cache hijack pattern is the canonical case: every Spark operator will hit it; the documented fix is one shell command; the cost of not knowing the fix is an unscheduled OOM at the worst possible time.
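The two one-line fixes quoted above are separate remedies for separate failures (the page-cache drop for the SGLang restart OOM, the environment variable for the vLLM MoE freeze). Shown side by side, they can be sketched as two small helpers. This is a minimal sketch under stated assumptions: the function names and the `cache_control` parameter are hypothetical, writing to the real `/proc` path requires root, and the engine launch command itself is left to the caller because the post does not give it.

```python
import os
from pathlib import Path

DROP_CACHES = Path("/proc/sys/vm/drop_caches")  # path from the post; writing needs root


def drop_page_cache(cache_control: Path = DROP_CACHES) -> None:
    """Flush dirty pages, then ask the kernel to drop page cache, dentries and inodes."""
    os.sync()  # write dirty pages out first so more of the cache is reclaimable
    cache_control.write_text("3\n")  # same effect as: echo 3 > /proc/sys/vm/drop_caches


def engine_env(base: dict) -> dict:
    """Return a copy of the environment with the MoE backend workaround applied."""
    env = dict(base)
    env["VLLM_FLASHINFER_MOE_BACKEND"] = "latency"  # env-var fix quoted in the post
    return env
```

A relaunch wrapper would call `drop_page_cache()` before starting the engine, and pass `engine_env(os.environ)` as the `env` argument of `subprocess.run` when the vLLM workaround applies.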

Rule 3: Citations are positioning, not neutral sourcing

When I link to an author, I am saying “this person’s broader work is consonant with the posture of this site.” When I quote a public figure, I am saying “I will be associated with this person’s reputation in the reader’s mind.” Both of those statements are positioning decisions, not neutral attribution.

The practical consequence is a small list of people whose work I will cite and a larger list of people whose work I will not, even when their technical content is good. The binding internal decision memo retired several response-article angles after I noticed I was about to position the site against its own audience, by citing a figure whose broader work points the opposite direction from where sovgrid sits. The rule cost me one article angle. The rule keeps the site readable to its audience.

Rule 4: Hedging is honest; padding is not

There is a real difference between “I have not measured this and the published figure is X” and “industry experts agree that approximately Y.” The first is a hedge, and it tells the reader exactly which part of the claim is mine to defend. The second is padding that performs authority without earning it.

The site uses hedges deliberately. “Per the vendor’s published figure,” “based on my own measured throughput on the same hardware,” “the lived experience is,” “this is observation-level, not instrumented.” Each hedge is a small honest label. The reader can decide how much weight to give each labelled part.

The site does not use padding. No “industry experts agree.” No “leading platforms support.” No anonymous attribution where named attribution would do. If I cannot name the source, I do not need the claim. (For the inverse case, see The Quality Gate That Rewards Fabrication, where I document a scorer pathology that incentivized exactly the padding pattern I am refusing here.)

Rule 5: The customer’s premises is not a metaphor

A real chunk of this site’s revenue is “sovereign-AI consulting,” which means I install AI on someone else’s hardware and leave the keys with them. If I describe that work, the description has to be faithful to what actually happens in the engagement, not to a marketing version of it.

Concretely: I will not describe consulting outcomes I have not delivered. If a piece talks about “the typical engagement,” it is referring to engagements I have run, with the customer’s identity anonymized when needed. If a piece talks about “the case for sovereign AI in healthcare” or “the financial-services use case,” I am explicit about which parts are deployed-and-measured and which parts are scoped-but-not-yet-shipped.

The temptation to imply more deployments than exist is real and constant. The discipline is to resist it, because the value of the consulting pipeline depends entirely on the customer believing that the engineer they are hiring is honest about what has and has not been done. As of May 2026, the consulting revenue is at the “scope-call SKU validation” phase, not at the “five enterprise engagements shipped” phase, and the writing reflects that.

The multi-agent operational discipline behind this rule lives in the AGENTS.md convention across all 16 Gitea repositories. Every commit carries an agent identifier in the trailer; every pipeline commits by explicit pathspec rather than running git add -A; every quarterly review walks the corpus for stale claims. The institutional honesty is what keeps the rule enforceable when multiple agents (Claude Code, opencode, and others) touch the same codebase.
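The commit convention can be sketched as a function that builds the git command line. This is a minimal sketch, assuming Git 2.32 or newer (the first release with `git commit --trailer`); the `Agent` trailer key and the function name are hypothetical, since the post does not say what the AGENTS.md convention calls them.

```python
def commit_command(message: str, agent: str, paths: list) -> list:
    """Build a git commit that records only the named paths and tags the agent."""
    if not paths:
        # An empty pathspec would commit whatever is staged, which is the
        # git add -A failure mode the convention exists to prevent.
        raise ValueError("refusing to commit without an explicit pathspec")
    return [
        "git", "commit",
        "-m", message,
        "--trailer", f"Agent: {agent}",  # hypothetical trailer key
        "--", *paths,                    # commit only these paths
    ]
```

The returned list can be passed straight to `subprocess.run`. Note that a pathspec commit only covers files git already tracks; a new file still needs a `git add` of that specific path first.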

Rule 6: Self-correction is published, not silenced

When I am wrong in print, I write the correction in a follow-up post, not by stealth-editing the original. The post that documented my original assumption that the Mistral-to-Qwen swap would be a 2.5x slowdown contains the explicit line “Wrong direction entirely” once the Spark Arena measurement landed. (See Spark Arena Rank 4 Made Me Add Qwen3.6.) The Mistral article links forward to the correction. The two posts read as one honest arc.

The alternative pattern (silent edits that erase the original claim) is convenient and dishonest. It produces a site that always looks like it was right, which is indistinguishable from a site that is unwilling to be wrong in public. Sovgrid is willing to be wrong in public. The willingness is the substrate that makes the rest of the writing trustworthy.

The institutional version of this rule is the memory-pending-audit-quarterly cadence instituted on 2026-05-25. Every quarter, the operator walks the agent-memory corpus for “wartet auf X” (“waiting for X”), “blockiert” (“blocked”), and “pending” claims and verifies each one against current reality. The cadence was instituted after a single session uncovered five stale blockers, including a two-day-stale claim that a Gitea token rotation needed physical Desktop access when it was actually a five-second docker exec command. The pattern is the same as Rule 6 at the operational level: do not let stale claims accumulate in memory or in print, because both audiences (future-self and future-reader) depend on the claims being current.

The next audits are scheduled for 2026-08-25, 2026-11-25, 2027-02-25, and 2027-05-25. Each one will publish whatever it finds.
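The audit itself has two mechanical parts: finding the lines that carry a blocker marker, and computing when the next pass is due. Both can be sketched in a few lines. This is a minimal sketch, assuming the memory corpus is plain text read line by line; the function names are hypothetical, and the verification of each hit against current reality remains manual.

```python
import re
from datetime import date

# Markers quoted in the post; "wartet auf" is German for "waiting for",
# "blockiert" for "blocked".
STALE_MARKERS = re.compile(r"wartet auf|blockiert|pending", re.IGNORECASE)


def find_open_claims(lines: list) -> list:
    """Return (line_number, text) for every line that carries a blocker marker."""
    return [(n, line.strip()) for n, line in enumerate(lines, start=1)
            if STALE_MARKERS.search(line)]


def next_audits(start: date, count: int) -> list:
    """Quarterly dates after `start`, keeping the day of month (assumes day <= 28)."""
    audits = []
    for i in range(1, count + 1):
        months = start.month - 1 + 3 * i  # zero-based month index, i quarters ahead
        audits.append(date(start.year + months // 12, months % 12 + 1, start.day))
    return audits
```

Calling `next_audits(date(2026, 5, 25), 4)` reproduces the four dates listed above.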

What this commits me to

Reading this list back, the practical commitments are concrete. The site is allowed to be slower than competitors that fabricate numbers, less authoritative than writers who quote anonymous “industry experts,” less impressive than consultants who imply ten engagements when they have run three. Those are real costs. They are the cost of the honesty being the product.

The benefit is that the readership the site attracts is the readership that wants the honesty. A reader who comes to sovgrid expecting a hype piece bounces fast. A reader who comes to sovgrid expecting that the operational details will reflect what actually happened stays, and over time becomes the kind of reader who pays for a Stack Audit or buys the book in pre-order, because they have already decided that the engineer behind the writing is the kind of engineer they want in the room. The reading list that shaped this site’s argument is at /books/; the best Bitcoin books are on Konsensus (affiliate link: you support sovgrid at no extra cost to you; see /support).

The hidden compounding effect: the honesty discipline makes the writing easier, not harder. There is no need to remember which version of the story I told which audience, because there is only one version, and it is the version the log files would tell if anyone asked. (For the broader argument about how this temperament shows up across a community, see The Quiet Pattern Among Sovereign Engineers.)

How to read the rest of the site through this lens

If you have just landed on sovgrid.org for the first time, the manifesto above is the lens for everything else. Three quick reading paths.

If you want the engineering substance, start with the Self-Hosted AI Start Here guide and follow the cross-links from there. The corpus is mirrored as a public read-only feed at sovgrid.org/rss.xml, so an offline reader can pull it without browser tracking. Most pages in the setup and fixes archives are short, postmortem-style, and link back to whatever they superseded.

If you want the strategic argument for why this work matters, the companion piece What Sovereign Actually Means in 2026 and Two Leaderboards Nobody Reads Together are the two strongest entry points.

If you want to support the work, the footer has a Lightning address. There is no paywall. The book is in pre-order. For consulting, reach me through any of the contact links in the footer (Nostr DM is the fastest; the email link is HTML-entity-encoded so it survives spam scrapers; the GitHub profile takes issues too). None of these are necessary for reading the site, because the rule that “the honesty is the product” implies that the writing has to be valuable on its own, before any commerce begins.

What you will not find here

There is no email newsletter. There is no signup form. The decision was made deliberately on 2026-05-25 after evaluating the open-source options (Listmonk, Keila, self-hosted Postfix versus rented relays through Mailgun or Postmark). The honest framing of why: an email subscriber list would put the operator into custodian-of-PII territory on Dimension 1 of the sovereignty framework, with no compensating capability the existing channels do not already cover.

The sovereign-native substitute is two channels that already work:

The RSS feed at /rss.xml: free, no account, no email collection, works with every offline reader.
Nostr long-form (NIP-23, kind 30023) posts on the cipherfox@sovgrid.org npub: every published article on sovgrid.org cross-posts as a long-form Nostr event. Readers follow the npub via any Nostr client and the next article shows up in the feed. The Nostr identity is portable across relays, so the audience reach is not held by any single platform.
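For readers unfamiliar with the second channel, a long-form Nostr post is an ordinary event with kind 30023, a `d` tag that identifies the article, and the Markdown body as its content. The unsigned event and its NIP-01 id can be sketched with the standard library alone. This is a minimal sketch, not the site's actual cross-posting code: the function name, slug, and placeholder public key are hypothetical, signing is omitted because Schnorr signatures over secp256k1 need a third-party library, and `json.dumps` is assumed to match the NIP-01 serialization for ordinary article text (unusual control characters may be escaped differently).

```python
import hashlib
import json
import time
from typing import Optional


def long_form_event(pubkey_hex: str, slug: str, title: str, markdown: str,
                    created_at: Optional[int] = None) -> dict:
    """Build an unsigned NIP-23 long-form event and compute its NIP-01 id."""
    created_at = int(time.time()) if created_at is None else created_at
    tags = [
        ["d", slug],                        # stable identifier; same slug replaces the earlier version
        ["title", title],
        ["published_at", str(created_at)],  # NIP-23: stringified unix timestamp
    ]
    # NIP-01: id = sha256 over the compact JSON array
    # [0, pubkey, created_at, kind, tags, content]
    serialized = json.dumps([0, pubkey_hex, created_at, 30023, tags, markdown],
                            separators=(",", ":"), ensure_ascii=False)
    event_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
    return {"id": event_id, "pubkey": pubkey_hex, "created_at": created_at,
            "kind": 30023, "tags": tags, "content": markdown}
```

Because the event is addressed by kind, public key, and `d` tag rather than by relay, the same article can be published to any number of relays, which is what makes the audience portable.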

The absence of an email newsletter is itself a Rule 5 statement. The decision will be revisited if the Nostr reach plateaus for more than six months while genuine email-lead-generation requests exceed ten per week. Until then, the RSS feed and the Nostr follow are the channels, and the absence of a third channel is explicit rather than implied.
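The revisit condition above is a conjunction of two thresholds, which makes it easy to state unambiguously. This is a minimal sketch; the function and parameter names are hypothetical, and how the plateau duration and the "genuine" request count get measured is outside what the post specifies.

```python
def newsletter_reopen_triggered(plateau_months: float,
                                email_requests_per_week: float) -> bool:
    """True only when both conditions from the post hold at the same time."""
    reach_plateaued = plateau_months > 6          # "more than six months"
    demand_exists = email_requests_per_week > 10  # "exceed ten per week"
    return reach_plateaued and demand_exists
```

Either condition alone leaves the decision closed: a long plateau without requests, or a burst of requests during growth, does not reopen it.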

The same logic applies to several other things sovgrid.org does not have yet: a dedicated /hire/ page, a paid tier, a podcast subscription channel, a Telegram group. Each is on the operational backlog. None of them is in print yet because none of them exists yet. (See the Reopen-Trigger table for the conditions under which each item moves from “deferred” to “active build.”)

The honesty discipline applies recursively. If a piece of marketing infrastructure is not deployed, the site does not pretend it is.
