#comparison | Sovereign AI Blog

Jun 12, 2026

Three Quants of One 35B Qwen on a DGX Spark. The Fastest Build Was the Only One That Could Still See.

Same model, same box, three ways to shrink it: Intel's AutoRound int4, a 4.75-bit PrismaQuant, and FP8. I measured all three on decode speed, coding accuracy, and vision, with one ruler per axis and the failed runs thrown out. AutoRound won every column that mattered, and the surprise was vision: the leanest build kept its eyes while the others went blind or broke. Here is the teardown.

May 27, 2026

dgx-spark, sovereign-ai

Cloud vs Local AI: Where Each Actually Wins in 2026

An honest capability matrix between cloud Claude and a self-hosted GB10 stack across 13 tasks, plus entry points into the deeper-dive articles. Claude still leads on multi-step reasoning; the local stack now covers two things Claude cannot do at all.

May 25, 2026

funnel, dgx-spark

Self-Hosted AI vs Cloud APIs: The Real Total Cost

Amortized hardware, power-by-jurisdiction, opportunity cost, and the value of privacy, modelled at 10/100/1000/10000 calls per day. Break-even sits between 700 and 1,200 calls per day depending on the cloud tier you actually need, but the inputs that move the line are not the ones the listicles emphasize.

May 24, 2026

hardware, dgx-spark, services, budget-build

What I'd Buy in 2026 for €15,000: A Pro-Studio Sovereign AI Build

Three honest paths at €15k for the one-person consultancy or small studio that has outgrown a single box: dual RTX 5090 on a Threadripper Pro workstation, DGX Spark plus a dedicated inference second box, or a refurbished pro-workstation route. Current Geizhals prices, UPS sizing, and the cases where this tier is genuinely the floor.

May 24, 2026

affiliate, hardware, budget-build

What I'd Buy in 2026 for €2,000: A Beginner Sovereign AI Build

A used RTX 3090 plus a current AM5 platform gets you a real local-inference box for under €2k in 2026. Component picks with current Geizhals prices, honest power-cost math for Germany, the US, and India, and a list of models this build runs well and the ones it does not.

May 24, 2026

affiliate, hardware, budget-build

What I'd Buy in 2026 for €4,000: A Mid-Tier Sovereign AI Build

Two honest €4k paths: a new RTX 4090 24 GB on AM5, or a used RTX A6000 48 GB on a Threadripper-class platform. Component picks with current Geizhals prices, the workload that breaks each path, and a side-by-side with DGX Spark at the same money.

May 24, 2026

affiliate, hardware, dgx-spark, budget-build

What I'd Buy in 2026 for €8,000: A Premium Sovereign AI Build

At €8k the binding question stops being VRAM ceiling and becomes architecture choice. A DGX Spark plus accessories on one side, an RTX 5090 32 GB workstation on the other. I run the Spark; here is the comparison from the inside, with current Geizhals prices captured 2026-05-22.

May 23, 2026

dgx-spark, hardware

DGX Spark vs Apple Mac Studio: Which Wins for Local LLMs?

The Spark wins on MoE-class language models and the developer-tooling pipeline. The Mac Studio wins on silence, daily-driver ergonomics, and memory ceiling (up to 512 GB on M3 Ultra). The choice depends on which column is binding for your workload.

May 23, 2026

qwen, mistral, dgx-spark

Mistral Small 4 vs Qwen 3.6 vs GLM-5.1 on a Single DGX Spark

Three production-class open-weights models, all weighed against one Spark. Qwen wins on coding throughput and now sustains around 71 tok/s under DFlash. Mistral holds the creative-prose and verified-vision slot as a safer fallback. GLM-5.1 at 754B does not fit, and the reason it does not fit is the most useful lesson in this comparison.

May 22, 2026

hardware, authority

NVIDIA Playbooks: Where They Help and Where They Don't

NVIDIA's published reference playbooks are excellent for the workflows they cover and quietly misleading for the workflows they do not. Three categories of help, three categories of trap, and the rule for telling them apart before you copy a configuration into production.

May 22, 2026

opencode, openclaw

Coding Assistants on a Sovereign Stack: Claude Code, opencode, Aider, OpenClaw (and why Vibe got retired)

Four assistants still on the table in 2026, plus one I uninstalled. Claude Code wins on raw capability, Aider wins on git discipline, opencode is now the local primary against Qwen 3.6, and OpenClaw stays as the Mistral specialty. Vibe is in the postmortem column.

May 21, 2026

ops

Tailscale vs Headscale for Multi-Box Sovereign Stacks

Tailscale is the right pick if your sovereignty budget is finite and the rented coordination server is an acceptable trade. Headscale is the right pick if the coordination server's vendor risk is the dimension you cannot accept. Both ship the same WireGuard underneath.

I Built OpenAI's gpt-oss-120b on a Single DGX Spark. My 35B Qwen Out-Coded It.
