All articles tagged "engineering-honesty" : self-hosted AI fixes, setups, and architecture notes.
Standing up two large models on a DGX Spark, my own measurements tried to deceive me three separate ways: a harness that scored a working model at zero, a one-shot test that framed the model for a bug that was mine, and a cold reading that undersold decode speed by 35 percent. None of the wrong numbers were random. Each had a cause, a tell, and a fix. Here is the field guide.
Read article →
I took the published Spark-Arena recipes for Qwen3.6-35B on a DGX Spark, ran them on my own box, and almost none of the headline throughput reproduced. Here is what the numbers actually mean once you control for the container image, the measurement harness, and the prompt, plus a capability that was hiding inside one of the quants.
Giving a local 8B model persistent memory and retrieval good enough to replace a cloud assistant for daily coding. The architecture is mem0 plus a RAG knowledge base over ChromaDB. The honest part is the two bugs that made the first version forget you and answer the wrong question with full confidence.
The default AIDE configuration on Debian and Ubuntu selects the entire root filesystem, which means your tripwire is checksumming your home directory, your models, and your downloaded films every night. Here is how I caught it on a friend's machine and the scope file that fixed it.
Honest minute-by-minute log of building a friend's sovereign-AI workstation from a stock Lenovo with Windows to a fully self-hosted KI-stack with custom dashboard, MCP-routed RAG, and bidirectional cross-tailnet sharing. With the mistakes.
I had a 600-line dashboard that worked technically and went unopened socially. Rebuilding it as a teaching surface changed everything. This post is the design pattern: info-buttons on every metric, persona-cross-references on every model, a glossary tab that explains every acronym, and a doctor tab with one-button fixes. Sample backend and frontend code.
I copied my DGX Spark /data/ convention to a standard Ubuntu laptop. Three weeks later I forgot Docker exists. Root partition filled to 96 percent. Here is the diagnosis, the surgery, and the rule I should have followed.
Most sovereign-AI guides assume the operator is the same person as the user. What changes when the operator is your friend who has zero Linux experience? The discipline is identity separation at every layer, default-local privacy, and a vibe-sustaining onboarding pattern that survives day three.
A May 2026 memo of mine said local 8B models cannot reliably do MCP tool-use. I retested in late May. The memo was specifically wrong about WHY. Direct OpenAI-format API calls work fine. The bridge layer was the broken part.