#devops | Sovereign AI Blog

Jul 17, 2026

The ArticleZapStats Redesign: Real Readers, Git History, and the Cost of Asking a Local LLM to Care

I asked Qwen3.6 to redesign a stats component on sovgrid.org. It did, with real unique readers from NSM, git-based edit history, collapsible layout, and a dozen bug fixes along the way. The cost was eighteen commits, eight debugging iterations, and a growing sense that I was stress-testing my local LLM at the edge of its useful range for a feature whose value to me as operator was genuinely questionable.

Jun 18, 2026

sovereign-ai self-hosted mcp

The GitHub Bot That Cannot Write

I wanted a daily read of what is happening across my public repositories without handing a cloud service write access to them. The result is a sovereign GitHub assistant that runs on my own GPU, reviews incoming pull requests with a local model, and physically cannot post to GitHub. Here is the architecture, every decision behind it, the comparison with the SaaS reviewers, and the four times the build lied to me before it told the truth.

Jun 10, 2026

strategy

vps-healthcheck: Twelve Daily Checks, One SSH Session, One Notification

Monitoring one VPS with a Prometheus stack is like hiring a security team for a garden shed. I wrote a 315-line bash script instead: one SSH session, twelve checks, one morning notification. Here is the design, the honest comparison against the usual suspects, and why detect-and-alert beats auto-fix at this scale.

Jun 7, 2026

qwen dgx-spark engineering-honesty

The Leaderboard Said 239 Tokens a Second. My DGX Spark Said 71.

I took the published Spark-Arena recipes for Qwen3.6-35B on a DGX Spark, ran them on my own box, and almost none of the headline throughput reproduced. Here is what the numbers actually mean once you control for the container image, the measurement harness, and the prompt, plus a capability that was hiding inside one of the quants.

Jun 1, 2026

lenovo ubuntu docker engineering-honesty

The /data/ Convention Trap: Ubuntu-LVM Lessons That Bit Me Twice

I copied my DGX Spark /data/ convention to a standard Ubuntu laptop. Three weeks later I forgot Docker exists. Root partition filled to 96 percent. Here is the diagnosis, the surgery, and the rule I should have followed.

Jun 1, 2026

docker sovereignty ops fix

watchdocker: A Bash-Native Successor To Watchtower, Honestly Compared

Watchtower upstream is archived, but the ecosystem did not die with it. A community fork exists. So does WUD. So do half a dozen smaller projects. I built watchdocker anyway, and this is the honest write-up of why a 350-line bash script earns its place next to the survivors, plus how to fork it, contribute to it, and help it land in the hands of operators who would benefit.

May 27, 2026

agents sovereign-ai dgx-spark

How This Blog Actually Gets Built: The Full Build, Ten Weeks of Iteration, Three Hard Gates

The complete mechanism behind sovgrid.org: a DGX Spark on a desk drafting articles through a 35B-parameter Qwen quant, cloud Claude doing the architecture, AGENTS.md as the multi-agent contract, three independent quality gates, and a stylometric layer that landed after a forum auto-banned a post as AI spam. Ten weeks of milestones, the real numbers, the things that still do not work, the goal of eventually retiring the cloud layer entirely, and the entry point that ties it all together.

May 20, 2026

tutorial gitea

Gitea as Source-of-Truth for AI Pipelines

A self-hosted Gitea instance holds the prompts, the unit files, the runbooks, the customer data references, and the model identifiers for the sovgrid AI stack. The pattern is mundane and load-bearing.

May 20, 2026

strategy opencode agents

I Built a Web UI for Mobile Coding. Termux Won Anyway.

Two days of reverse-proxy work, a full Caddy stack with Let's Encrypt TLS and basic-auth in front of opencode web, all working. Then I realized I am not the right user for it. The actual mobile answer was already on my phone, and OpenWebUI quietly took over the other half of the use case.

May 18, 2026

fix

Bitcoin Connect: window.webln Stays After Disconnect

A defensive PR review exposed a 2-year-old WebLN provider leak in Bitcoin Connect's recommended pattern. The fix is three lines in the README. PR #385 merged 2026-05-07.

May 18, 2026

fix mistral podcast sglang

EAGLE Throughput Is Content-Dependent: Same Run, 14 to 31 Tokens Per Second

I added a numerical output contract to my Mistral prompt and watched throughput drop in half on the same hardware. Then the naturalize step in the same pipeline run hit 31 tok/s. Live SGLang logs explain why, and what to do about it.

May 18, 2026

fix tts voxtral

Per-Segment Loudnorm and the 3-Second Lookahead Bug

Per-block ffmpeg loudnorm averages multiple speakers to one gain, leaving the quieter voice quieter. Dynamic-mode loudnorm eats the first 3 seconds of audio.

May 18, 2026

fix dgx-spark qwen sglang

Why SGLang Never Froze My Desktop But vLLM Did: an SM 12.1 MoE-Kernel Story

A vLLM-Qwen container ran four days clean, then froze the whole GNOME desktop the moment any GPU app opened. SGLang-Mistral never did this in days of uptime. The cause: vLLM's FlashInfer MoE throughput backend has broken SM120 kernels on the DGX Spark's SM 12.1 GPU, and on unified memory a bad kernel launch takes the display down with it. One env var fixes it.

May 18, 2026

strategy qwen mistral dgx-spark

Mistral vs Qwen3.6 on DGX Spark: the 0/30 That Was a Broken Ruler

I almost published 'Mistral Small 4 scores 0/30 on coding, the quant kills it'. A competent model scoring exactly zero should have been the red flag. The benchmark harness was hanging behind this stack's Tor docker proxy and never reached the model. Here is the broken-ruler story, the direct measurement that replaced it, and every Mistral-vs-Qwen3.6 number at a glance, including which one can actually read an image.

May 18, 2026

strategy qwen mistral dgx-spark

The Quality Gate That Rewards Fabrication: I Had Qwen and Mistral Write This Blog

This blog gates every article behind one Python scorer before it publishes. I gave Qwen3.6 and Mistral Small 4 the same brief, the Start Here hub article this site still owes, and ran the raw output through that real gate with no editing. Both passed. Both invented hardware, processes, and benchmarks the scorer counted as quality. Here is the full method, the two source texts, and why a passing score is a floor and not a truth filter.

May 13, 2026

fix dgx-spark

Why hf download Lies to You at 22 GB on DGX Spark

The huggingface-hub CLI exits zero while leaving five out of six safetensor shards as .incomplete files. Three failure modes from the same model pull, and the wrapper that catches all of them.

May 11, 2026

fix

Three Self-Healing Patches in One Day, All the Same Shape

Three pipeline gaps surfaced in a single afternoon. Each was silent for weeks. The same three-part pattern fixed all of them. Concrete code, before-and-after numbers, and the discipline that keeps it from happening again.

May 11, 2026

strategy mcp

How to Read the Insights Dashboard for a DGX-Spark Business, Not a Hobby Blog

Each number on the live Insights page has a formula, a business meaning, and a vanity-trap. If you are running a DGX Spark as the engine of a small AI service, here is how to read the dashboard daily without chasing growth-theatre, and which two metrics are the only ones worth waking up to check.

May 7, 2026

fix podcast voxtral

FFmpeg Volume Filter eval=frame: A 4-Second Silent Bug

The intro music wasn't playing for the first four seconds of every podcast episode. RMS at minus infinity. The fix was one keyword, eval=frame.

May 7, 2026

strategy podcast tts voxtral

Voxtral Chunk Strategy: 38 Percent Faster Render with Whole Turns

Rendering a 367-character podcast turn as one Voxtral call takes 21 seconds. Split into 90-character chunks, the same turn takes 35 seconds. Same words, same voice, and the whole-turn render needs roughly 38 percent less wallclock.

May 3, 2026

fix podcast voxtral

Voxtral Podcast Audio: Mono 24 kHz Baseline and Three Compression Pitfalls

How we fixed loudness pumping, markup stripping, and dialogue rhythm in a self-hosted podcast pipeline

May 3, 2026

fix tts voxtral

Voxtral-TTS Blocker on GB10: The Three-Line vllm-omni Patch

How a silent AttributeError nearly killed our TTS pipeline, and why three lines of code fixed it forever.

May 3, 2026

fix mistral podcast tts voxtral

The 3.5-Hour Deadlock That Was Really an AttributeError

How a three-line Python init order bug masqueraded as a Blackwell GPU hang, and why checking raw logs beat all hardware theories.

Apr 27, 2026

fix mistral openclaw sglang

Fix OpenClaw + SGLang with Mistral: Stop the "conversation roles must alternate" 400 BadRequest

Learn how a 200-line proxy fixed a strict role-alternation bug that broke Mistral Small 4 after the first few turns

Apr 25, 2026

fix podcast tts voxtral

Voxtral Stage 1 OOM on GB10: Why --enforce-eager Is Not Enough

How a single flag killed my self-hosted TTS stack, and how I fixed it without losing a second of audio.

Apr 14, 2026

fix

How Four Silent Failures Made My Backup System a Security Theater

A senior engineer walks through the four hidden failures that made his backup system look healthy while actually failing for six weeks. Includes the exact commands, error messages, and hardware specs that turned a disaster into a reliable setup.

Mar 30, 2026

fix mistral openhands sglang

Aider Setup on DGX Spark: Mistral-via-SGLang Endpoint and Tor-Routed pip

Learn how to install and configure Aider for reliable local LLM coding sessions on ARM64 workstations with practical troubleshooting tips.

Mar 29, 2026

fix openhands

Three Silent Failures That Would Have Killed My Self-Hosted AI Stack

How a single SSH syntax error, misconfigured swappiness, and container limits almost took down my Sovereign AI stack, and the exact commands I used to fix them.

Mar 28, 2026

fix

Cloudflared in Astro's Docker Network: The Hostname-Resolution Fix

Resolving Docker network isolation between cloudflared and an Astro static site container to restore Cloudflare Zero Trust tunnel functionality.

Mar 27, 2026

fix openhands

Reclaiming 20 GB: Dead Docker Images and Why Caddy Runs Better as systemd

Learn how to reclaim disk space from unused Docker images and optimize your stack by running Caddy as a systemd service instead of in Docker.

Mar 26, 2026

fix gitea openhands

OpenHands and Gitea Integration: Docker-Network Hostname Fix

Resolve Docker networking failures where containers can't resolve names or access volumes, with a single `.gitconfig` tweak that fixes both issues.

Mar 25, 2026

fix mistral openhands sglang

Fix: OpenHands BadRequestError: Mistral Alternating Roles

OpenHands crashes after 10 minutes with a BadRequestError. Here’s exactly how to fix the alternating roles bug in Mistral Small 4 and why the default config is broken.

Mar 24, 2026

fix sglang

OpenWebUI Port Conflict on DGX Spark: Why 8080 Was Already Taken

Learn how to diagnose and resolve Docker port conflicts with practical troubleshooting steps and configuration fixes.

Mar 23, 2026

fix mistral sglang vibe

Vibe 400 Bad Request Fix: Mistral Alternating Roles and reasoning_effort

Three separate 400 Bad Request causes in Mistral Vibe with SGLang, their root causes, and update-safe fixes

Mar 22, 2026

fix gitea mcp mistral vibe

Vibe write_file Overwrite Bug: When Edits Silently Replace Whole Files

How strict workflow rules and tool constraints prevent AI agents from destroying your codebase during file edits.

Mar 21, 2026

fix mistral sglang

SGLang Restart OOM Fix: Unified Memory Cleanup on GB10/DGX Spark

How I wasted three days debugging SIGKILL 137 after every SGLang restart, until I learned that GPU memory isn’t freed instantly and Docker’s `--rm` and `--restart` hate each other.

Mar 20, 2026

fix mistral sglang vibe

SGLang on DGX Spark: 35-41 tok/s with EAGLE Speculative Decoding

How we got Mistral Small 4 119B inference working on NVIDIA DGX Spark's ARM64 GB10 chip with SGLang, including backend selection, speculative decoding, and Vibe CLI optimizations.

Mar 19, 2026

fix

Four Bugs That Only Showed Up Under Load: Fixing a FastAPI Dashboard

Async event loop blocking, N+1 Docker calls, systemd ProtectSystem conflicts, and stacking frontend polling: four independent bugs in one FastAPI app, all invisible at idle.
