Roadmaps, concepts, meta-reflections, and benchmarks across the sovereign AI stack.
A two-day build log from localhost to a sovereign hybrid AI site. Three failure modes, exact fixes, and the reproducibility checklist most cloud guides skip.
gpt-oss-120b pulls nearly four million downloads a month, so I assumed it was a one-command experience. Getting it to serve on a DGX Spark took a frozen box, a 25GB image pull strangled by a Tor proxy, and a 43-minute kernel compile. Then the measurement: on my own coding tasks the 120B scored 56 percent where the 35B Qwen I already run scored 100. Here is the full teardown, with every number measured on the box and the failed measurements thrown out, not published.
NVIDIA's Nemotron-3-Super-120B-A12B is tuned for Blackwell and ships an NVFP4 build that fits a single 128GB DGX Spark. I measured it where almost nobody else does: single-stream, on one GB10. The result is 23.7 tok/s, a competent but painfully verbose coder, and a genuinely strong retrieval agent. Here is the full teardown, with the published benchmarks fact-checked against what the box actually did.
I built a small, dependency-free harness that answers one question with numbers instead of vibes: does this enhancement make my agent measurably better, on my models, on my tasks? Here is the method, what I found, and why deterministic gates are the whole point.
I run Qwen3.6-35B at 4.75-bit for coding. A 4.0-bit AutoRound build promised more speed. Fewer bits usually means a dumber model, so I measured both halves: decode throughput and coding quality, the latter through my own agent-bench harness. The result settled it. Here is the duel, the bandwidth math, and why the bit count was the wrong thing to fear.
caveman has ~200k installs and claims 75% token reduction. I measured it on two local models and three Claude frontiers (Sonnet 4.6, Opus 4.8, Fable 5). The math does not work out the way the claim says it does.
Serena is one of the most-installed coding MCP servers. I tested it against two local models (Qwen3.6-35B and Mistral Small 4) on three refactor tasks with deterministic gates. The short answer is more interesting than yes or no.
Monitoring one VPS with a Prometheus stack is like hiring a security team for a garden shed. I wrote a 315-line bash script instead: one SSH session, twelve checks, one morning notification. Here is the design, the honest comparison against the usual suspects, and why detect-and-alert beats auto-fix at this scale.
Value-for-value as the monetization model for sovgrid. The architectural fact (the channel exists) versus the dollar volume (zero sats received as of the most recent ground-truth audit). The honest version of what V4V is and is not, six months in.
About a third of the people who ask me end up not buying. Six specific 'don't buy' clauses, four buyer profiles, the four real alternatives ranked, a flowchart, and the operational receipts (drop_caches=3, VLLM_FLASHINFER_MOE_BACKEND=latency, 30-minute recovery runbook) the spec sheet does not give you. Lead-magnet source; also gated as PDF.
Two days of reverse-proxy work, a full Caddy stack with Let's Encrypt TLS and basic-auth in front of opencode web, all working. Then I realized I am not the right user for it. The actual mobile answer was already on my phone, and OpenWebUI quietly took over the other half of the use case.
HackerNoon ranks coding LLMs by programming language. WhatLLM.org aggregates LiveCodeBench, Terminal-Bench and SciCode. Neither tests self-hosted models on real hardware. A self-hoster's reading protocol for coding leaderboards.
The planning post before the implementation post. FIPS is an open-source mesh protocol with cryptographic identity and transport-agnostic routing. My sovereign AI stack is sovereign at the model and the hardware, and leaks the whole workload at the network boundary. Here is why that gap matters, the five concrete pieces of work I am committing to, and why I write the plan in public before I know if it works.
I almost published 'Mistral Small 4 scores 0/30 on coding, the quant kills it'. A competent model scoring exactly zero should have been the red flag. The benchmark harness was hanging behind this stack's Tor docker proxy and never reached the model. Here is the broken-ruler story, the direct measurement that replaced it, and every Mistral-vs-Qwen3.6 number at a glance, including which one can actually read an image.
This blog gates every article behind one Python scorer before it publishes. I gave Qwen3.6 and Mistral Small 4 the same brief, the Start Here hub article this site still owes, and ran the raw output through that real gate with no editing. Both passed. Both invented hardware, processes, and benchmarks the scorer counted as quality. Here is the full method, the two source texts, and why a passing score is a floor and not a truth filter.
Six traits I keep seeing across the people who fit the sovereign-engineer description: they argue with specs, name every dependency, default to publishing, plan in decade arcs while shipping weekly, price friction honestly, and gate their optimism. Written from the outside, by the operator who runs the iron the sovereign software eventually touches.
Eleven VibeVoice renders, one Voxtral baseline, the operator's ears. The first day of the three-day TTS spike that follows the V6=0/10 verdict. Engineering-log shape, with the actual audio embedded.
Five posts a week, no marketing department, no template-substitution. Building a Nostr distribution cadence for a self-hosted blog without the templated filler readers can spot in two scrolls.
Eight engineering fixes deep, three weeks of patches, two failure modes on the same engine. The Voxtral open checkpoint has no path to release-quality podcast audio. The drama of staying with it anyway, and the three engines I plan to spike next.
My MCP-server NSM page showed 334 unique agents. One change to the aggregator (User-Agent plus IP /24 dedupe) and the truth surfaced: 86% of external hits come from a single /24 range, the rest are mostly automated probes. Headline metrics that look like reach can be five services pretending to be many.
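A minimal sketch of what a User-Agent plus IP /24 dedupe can look like, for readers who want to try the idea on their own logs. This is not the aggregator from the article: the function names, the sample hits, and the IPv4-only handling are all assumptions made for illustration.

```python
import ipaddress
from collections import Counter

def slash24(ip: str) -> str:
    """Return the /24 network containing an IPv4 address (assumes IPv4 input)."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def dedupe_agents(hits):
    """Collapse (ip, user_agent) hits into counts per (user_agent, /24) key."""
    return Counter((ua, slash24(ip)) for ip, ua in hits)

def top_range_share(hits):
    """Fraction of all hits that come from the single busiest /24 range."""
    per_range = Counter(slash24(ip) for ip, _ in hits)
    return per_range.most_common(1)[0][1] / len(hits)

# Hypothetical sample data using documentation-reserved address ranges.
hits = [
    ("203.0.113.5", "agent-a/1.0"),
    ("203.0.113.77", "agent-a/1.0"),
    ("203.0.113.200", "agent-a/1.0"),
    ("198.51.100.9", "probe/0.1"),
]

unique = dedupe_agents(hits)
print(len(hits), "raw hits ->", len(unique), "unique agents")
print(f"top /24 share: {top_range_share(hits):.0%}")
```

Keying on the pair rather than on the IP alone keeps two different clients behind one range separate, while still folding one client that rotates addresses inside a /24 into a single entry.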
Each number on the live Insights page has a formula, a business meaning, and a vanity-trap. If you are running a DGX Spark as the engine of a small AI service, here is how to read the dashboard daily without chasing growth-theatre, and which two metrics are the only ones worth waking up to check.
Qwen3.6-35B-A3B PrismaQuant at 95 tok/s on a single Spark (Spark Arena rank 4) beats my measured Mistral Small 4 at 35 tok/s by 2.7x on paper. This is the plan, not the result. SWE-Bench scores, opencode replacing vibe, why Mistral stays installed for creative prose, the Hacker News critiques on opencode I take seriously, and the two-day prep before day-2 measurements land on 2026-05-25.
Rendering a 367-character podcast turn as one Voxtral call takes 21 seconds. Split into 90-character chunks: 35 seconds. Same words, same voice, about 67 percent more wallclock.
Wired a browser search form directly into an MCP tool that AI agents already call. One afternoon, four endpoints, zero CORS, real numbers from the deploy. The mistakes that cost me an hour are documented inline.
What a DGX Spark actually draws from the wall, what that costs in Germany versus the US, how it compares to a lightbulb and a Bitcoin miner, and how many solar panels would offset it. With sources.
Cipherfox and Hexabella post curated content without human oversight, using Mistral Small 4 on a DGX Spark and a hardened signing service. Here’s how it works today.
How sovgrid.org structures its most important posts to guide readers and shape the blog’s identity.
How we’re getting the Sovereign AI MCP endpoint listed in five registries with real traffic tracking and zero KYC friction.
A no-BS breakdown of the gaps in a self-hosted AI stack and the exact next steps to plug them.
Three Nostr identities, a working zap-attribution pipeline, 44 articles live at the time of writing, and after 30 days exactly zero zaps. What I learned about V4V on a small technical blog.
Mainstream AI coverage cites only one leaderboard. arena.ai ranks quality. spark-arena.com ranks throughput on real hardware. The decision that matters lives in the third column nobody publishes.
Status snapshot of what is running on this stack today and what is being built next. For returning readers. New here? Read 'Self-Hosted AI: Start Here' first.
This technical blog maintains a single source of truth while layering machine-readable tools on top, ensuring both human readers and AI agents get accurate, up-to-date information.
Learn how to transform your technical blog into a dual-purpose knowledge base that serves both human readers and AI agents while future-proofing your content strategy.
A deep dive into the DGX Spark ecosystem, real power costs, and agent-driven tool adoption for self-hosting 119B models at home in 2026.
A hands-on comparison of AI coding tools testing local inference vs cloud dependency for privacy-first workflows.
A deep dive into optimizing Mistral Small 4 for local technical blogging, with practical solutions for session memory, image generation, and EEAT compliance.
A full-system review of our quality scoring pipeline against a rigorous philosophical framework. Three things it confirms, two things it exposes, and one concrete fix that changes the architecture.