#sglang | Sovereign AI Blog

I added a numerical output contract to my Mistral prompt and watched throughput drop by half on the same hardware. Then the naturalize step in the same pipeline run hit 31 tok/s. Live SGLang logs explain why, and what to do about it.

May 18, 2026

EAGLE Throughput Is Content-Dependent: Same Run, 14 to 31 Tokens Per Second

A vLLM-Qwen container ran four days clean, then froze the whole GNOME desktop the moment any GPU app opened. SGLang-Mistral never did this in days of uptime. The cause: vLLM's FlashInfer MoE throughput backend has broken SM120 kernels on the DGX Spark's SM 12.1 GPU, and on unified memory a bad kernel launch takes the display down with it. One env var fixes it.

May 18, 2026

fix, dgx-spark, qwen, devops

Why SGLang Never Froze My Desktop But vLLM Did: an SM 12.1 MoE-Kernel Story

A hands-on guide to installing and configuring OpenClaw on NVIDIA DGX Spark, switching between cloud and local models, and wiring MCP servers.

May 3, 2026

setup, mcp, mistral, openclaw

OpenClaw Setup on DGX Spark for Sovereign AI Agents

Learn how to run a self-hosted MCP server for your blog’s knowledge base, integrate it with OpenClaw and Vibe, and avoid the pitfalls I hit while migrating from cloud to Sovereign AI.

May 3, 2026

setup, mcp, openclaw, vibe

Sovereign MCP Server: Local Setup, Integration, and Hard Lessons

How we’re getting the Sovereign AI MCP endpoint listed in five registries with real traffic tracking and zero KYC friction.

May 3, 2026

strategy, mcp

MCP Registry Distribution: Submission Plan & Tracking

Learn how a 200-line proxy fixed a strict role-alternation bug that broke Mistral Small 4 after the first few turns.

Apr 27, 2026

fix, devops, mistral, openclaw

Fix OpenClaw + SGLang with Mistral: Stop the "conversation roles must alternate" 400 BadRequest

This technical blog maintains a single source of truth while layering machine-readable tools on top, ensuring both human readers and AI agents get accurate, up-to-date information.

Apr 27, 2026

strategy, mcp

A Self-Hosted AI Blog That Serves Both Humans and Machines

Learn how to transform your technical blog into a dual-purpose knowledge base that serves both human readers and AI agents while future-proofing your content strategy.

Apr 26, 2026

strategy, mcp, mistral

From Blog to Agent Tools: How One Knowledge Base Powers Both Humans and AI

Run Mistral Small 4 119B on NVIDIA GB10 with SGLang nightly: exact flags, real benchmarks, every gotcha that costs a day

Apr 19, 2026

setup, mistral

Self-Host Mistral Small 4 with SGLang on NVIDIA DGX Spark (GB10): What Actually Works

Optimized workflow for running FLUX.1-schnell and Mistral sequentially on NVIDIA DGX Spark with 128GB unified memory

Apr 18, 2026

setup, comfyui, flux, mistral

ComfyUI plus FLUX.1-schnell on DGX Spark: Per-Style Visual Vocabularies

Lessons learned from a failed LLM self-review experiment that broke our validation pipeline and how we fixed it with deterministic checks.

Apr 16, 2026

setup, mistral

Self-Hosted AI Content Pipeline: What Works and What Doesn’t

Deploy a privacy-respecting AI coding assistant with Mistral Small 4 and SearXNG using Docker on ARM64 hardware.

Apr 12, 2026

setup, mistral, openclaw, openhands

OpenHands Setup with Mistral-via-SGLang: The Multi-Arch Container Recipe

A hardened local AI development stack using OpenHands, Aider, and Gitea over Tor with Mistral Small 4 inference

Apr 5, 2026

services, gitea, mistral, openhands

Privacy-Hardened AI Stack: OpenHands, Aider, and Gitea over Tor

How to run OpenHands and Aider locally with Mistral Small 4 and Qwen3 Coder Next for reliable, private AI-assisted development.

Apr 4, 2026

services, mistral, openhands

SOVEREIGN DEV STUDIO v2: Self-Hosted AI Coding Agents That Actually Work

A practical guide to configuring a secure, self-hosted Docker development stack with OpenHands, Gitea, and model caching for Sovereign AI.

Apr 1, 2026

services, mistral, openhands

Docker Dev Stack on DGX Spark: Compose Patterns for Sovereign AI

Learn how to install and configure Aider for reliable local LLM coding sessions on ARM64 workstations with practical troubleshooting tips.

Mar 30, 2026

fix, devops, mistral, openhands

Aider Setup on DGX Spark: Mistral-via-SGLang Endpoint and Tor-Routed pip

OpenHands crashes after 10 minutes with a BadRequestError. Here’s exactly how to fix the alternating roles bug in Mistral Small 4 and why the default config is broken.

Mar 25, 2026

fix, devops, mistral, openhands

Fix: OpenHands BadRequestError: Mistral Alternating Roles

Learn how to diagnose and resolve Docker port conflicts with practical troubleshooting steps and configuration fixes.

Mar 24, 2026

fix, devops

OpenWebUI Port Conflict on DGX Spark: Why 8080 Was Already Taken

Three separate 400 Bad Request causes in Mistral Vibe with SGLang, their root causes, and update-safe fixes

Mar 23, 2026

fix, devops, mistral, vibe

Vibe 400 Bad Request Fix: Mistral Alternating Roles and reasoning_effort

How I wasted three days debugging SIGKILL 137 after every SGLang restart, until I learned that GPU memory isn’t freed instantly and Docker’s `--rm` and `--restart` hate each other.

Mar 21, 2026

fix, devops, mistral

SGLang Restart OOM Fix: Unified Memory Cleanup on GB10/DGX Spark

How we got Mistral Small 4 119B inference working on NVIDIA DGX Spark's ARM64 GB10 chip with SGLang, including backend selection, speculative decoding, and Vibe CLI optimizations.

Mar 20, 2026

fix, devops, mistral, vibe

SGLang on DGX Spark: 35-41 tok/s with EAGLE Speculative Decoding
