// hardware + software inventory

The Stack

What actually runs, where it runs, and what it talks to. Snapshot date is printed below; check the engineering log for in-flight changes.

Snapshot: 2026-05-27

Hardware

Primary host
NVIDIA DGX Spark: GB10 Blackwell, 128 GB LPDDR5x unified memory, 4 TB NVMe, Armv9.2-A. In daily operation since 2026-04.
Public VPS
FlokiNET no-KYC Romania VPS: Debian 13, 1.9 GiB RAM, 50 GB disk, Docker CE + Caddy. Serves sovgrid.org and mcp.sovgrid.org.
Lightning node
Raspberry Pi 4 (4 GB): Alby Hub + LNbits for V4V zaps.
Mobile companion
Pixel Android: Termux + Tailscale + SSH on port 2222 into the Spark, running opencode in tmux.

Inference

Text primary
Qwen3.6-35B-A3B PrismaQuant 4.75-bit via vLLM on port 30001: 57 to 62 tok/s decode with DFlash speculative decoding (verified 2026-05-22), FlashInfer MoE latency backend, no thinking traces.
Text fallback + Vision
Mistral-Small-4 NVFP4 119B via SGLang on port 30000: 36.5 tok/s decode with safer-eagle speculative decoding, up from a 29 tok/s no-EAGLE baseline (EAGLE confirmed stable in the 2026-05-22 switch.sh cleanup session), mem-fraction 0.65, ctx 32k (see Mistral / Qwen / GLM-5 comparison).
Mutex
A switch.sh CLI at /data/scripts/llm/switch.sh flips between Qwen on vLLM and Mistral on SGLang, enforcing that at most one inference engine has loaded weights into unified memory at a time. Termux-friendly, sub-second status check. Replaces the older dashboard-mediated mutex.
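
The at-most-one-engine rule can be sketched in a few lines of Python. This is an illustration of the invariant only, not the contents of switch.sh: the engine labels, the function name plan_switch, and the action strings are all hypothetical.

```python
from typing import Optional

# Hypothetical labels for the two engines named above.
ENGINES = ("qwen-vllm", "mistral-sglang")


def plan_switch(active: Optional[str], target: str) -> list[str]:
    """Return the ordered actions needed so that only `target` holds weights.

    `active` is the engine currently loaded, or None if nothing is loaded.
    """
    if target not in ENGINES:
        raise ValueError(f"unknown engine: {target}")
    if active == target:
        return []  # already in the desired state, nothing to do
    actions = []
    if active is not None:
        # Stop first: two engines must never hold weights at the same time.
        actions.append(f"stop {active}")
    actions.append(f"start {target}")
    return actions
```

The ordering is the point: the running engine is always stopped before the other one starts, so there is no window in which both have weights in unified memory.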
Image generation
FLUX.1-schnell on ComfyUI (SparkyUI image, CUDA 13 + SM121A SageAttention). Sub-second per image on the Spark; 32-hero blog batches render in ~5 minutes. Runs sequentially with the LLM stack under the switch.sh mutex: either image generation or an LLM engine is loaded, never both.
TTS
Voxtral-4B (text-only fork) via transformers. The V6 spot-listen showed a ceiling on long-form expressivity; a VibeVoice / Higgs Audio v2 / IndexTTS-2 spike is in progress.

Agents & clients

Local CLI
opencode 0.8.12 against Qwen3.6 (no strict-alternation issues), Aider with Misti-Dev persona, vibe with the alternating-roles patch.
UI
OpenWebUI behind Tailscale + Caddy mkcert: custom model "Sovereign Qwen" with system prompt + sovereign-kb RAG + SearXNG web search + mcpo bridge to sovereign-mcp.
Mobile
Termux + SSH into a tmux session running opencode against Qwen3.6. The browser-based opencode-web behind Caddy on the tailnet is the secondary path, not the daily driver.
MCP
Claude Code targets sovgrid via Streamable-HTTP (mcp.sovgrid.org). Vibe targets sovereign, gitea, monitoring, knowledge. OpenClaw targets knowledge + sovereign.

Publishing & operations

Blog
Astro 6 static site, no database, full rebuild under 3 s. Built locally, rsynced to the Floki VPS, served by Caddy. Source mirrored on Gitea (canonical) and GitHub (public). 120 articles live as of 2026-05-27.
Article pipeline
Local LLM drafts articles from engineering notes; scripts/update_blog_from_gitea.py applies quality-signal scoring + style gates; FLUX.1-schnell renders the matching hero image on the same Spark; scripts/blog-deploy-verify.sh orchestrates build, rsync, drift-commit, axe a11y, and NSM refresh. Zero cloud cost per article once the infrastructure runs.
Dashboard
FastAPI + React single-file SecOps dashboard on the Spark. Service start/stop, AIDE alerts, backup status, MCP restart. Reachable over Tor (.onion) and Tailscale; not exposed on the public internet.
Payments
Lightning over the Alby Hub on the Pi 4 (V4V model: no accounts, no platform cut, no subscriptions). Lightning Address in every page footer. IBAN on every invoice. No Stripe, no PayPal, no third-party processor in the path.
Web research backend
SearXNG on 127.0.0.1:8888 powers web search for the article pipeline and the "Sovereign Qwen" custom model in OpenWebUI. No query logging to third parties; meta-search results are aggregated locally.

Storage & secrets

Layout
/ai/models (LLM weights, HF cache), /data/config, /data/scripts, /data/projects, /data/secrets (chmod 700).
Backups
age-encrypted tarball, dual-target: local /data/backups + USB. Nightly Floki snapshot pull (NSM stats + Caddy logs) feeds the same backup.
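
A minimal sketch of the encrypt-once, copy-to-two-targets pattern, assuming the age CLI is on PATH. The function names, the archive name, and the recipient handling are hypothetical and are not taken from the real backup script; only the `age -r RECIPIENT -o OUTPUT INPUT` invocation is standard age usage.

```python
import shutil
import subprocess
import tarfile
from pathlib import Path


def make_tarball(sources: list[Path], out: Path) -> Path:
    """Pack the given files or directories into a gzip-compressed tarball."""
    with tarfile.open(out, "w:gz") as tar:
        for src in sources:
            tar.add(src, arcname=src.name)
    return out


def age_command(recipient: str, src: Path, dst: Path) -> list[str]:
    """Build the age invocation that encrypts src to dst for one recipient."""
    return ["age", "-r", recipient, "-o", str(dst), str(src)]


def backup(sources: list[Path], recipient: str, targets: list[Path],
           workdir: Path, name: str = "backup.tar.gz") -> list[Path]:
    """Encrypt one tarball and place a copy in every target directory."""
    plain = make_tarball(sources, workdir / name)
    first = targets[0] / f"{name}.age"
    subprocess.run(age_command(recipient, plain, first), check=True)
    written = [first]
    for target in targets[1:]:
        # Copy the already-encrypted file so all targets are byte-identical.
        written.append(Path(shutil.copy2(first, target / first.name)))
    plain.unlink()  # do not leave the unencrypted archive behind
    return written
```

Encrypting once and copying the ciphertext keeps both targets identical, which makes a later checksum comparison between the local copy and the USB copy meaningful.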
Code
Local Gitea on 127.0.0.1:3002 (loopback only). Two public mirrors on github.com/cipherfoxie: sovereign-mcp (MCP server, MIT) and vps-healthcheck (ops audit, MIT). Per agent-mirroring policy: public-facing tooling only; internal persona-MCPs stay in Gitea.

Public edge

Domain
sovgrid.org, registered through a FlokiNET-friendly registrar, points at the Floki VPS.
Reverse proxy
Caddy with Let's Encrypt: both sovgrid.org and mcp.sovgrid.org, HTTP/2 only (no QUIC over DERP).
Hardening
UFW default-deny + fail2ban (sshd, caddy-mcp jails). SSH key-only, AllowUsers cipherfox, MaxAuthTries 3. unattended-upgrades active on Debian-Security origin. Caddy edge-block on confirmed-scraper IP ranges (currently 69.12.0.0/16), audit-trail kept in the NSM aggregator.
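
The edge-block decision amounts to a CIDR membership test. The sketch below shows that test in Python for illustration; the actual block lives in the Caddy configuration, and the names BLOCKED_NETS and is_blocked are hypothetical. Only the 69.12.0.0/16 range comes from the line above.

```python
import ipaddress

# Range taken from the hardening note above; the list structure is an assumption.
BLOCKED_NETS = [ipaddress.ip_network("69.12.0.0/16")]


def is_blocked(ip: str) -> bool:
    """Return True if the client address falls inside any blocked range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETS)
```

For example, is_blocked("69.12.34.56") is true, while is_blocked("69.13.0.1") is false because the second octet falls outside the /16.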
Analytics
NSM (North Star Metric) aggregator scans raw Caddy logs into /insights/: 30-day external tool-calls, distinct agents (UA + /24 deduped), self-traffic filtered out by CIDR, and distributed-scraper clusters filtered out by UA-over-many-/24s signature. No JS pixel, no cookies, no third-party tracker.
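
The three filters described above can be sketched as follows. This is a hedged illustration, not the aggregator's code: it assumes log lines are already parsed into (ip, user_agent) pairs, handles IPv4 only, uses a documentation range as a placeholder for the real self-traffic CIDRs, and picks an arbitrary threshold for what counts as "many" /24s.

```python
import ipaddress
from collections import defaultdict

# Placeholder values, not the real configuration.
SELF_NETS = [ipaddress.ip_network("203.0.113.0/24")]
SCRAPER_MIN_SUBNETS = 20  # hypothetical cutoff for a distributed-scraper cluster


def subnet24(ip: str) -> str:
    """Collapse an IPv4 address to its enclosing /24, e.g. 198.51.100.0/24."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


def distinct_agents(hits):
    """Return the set of (user_agent, /24) pairs that count as distinct agents.

    hits: iterable of (ip, user_agent) tuples.
    """
    subnets_by_ua = defaultdict(set)
    for ip, ua in hits:
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in SELF_NETS):
            continue  # self-traffic, filtered out by CIDR
        subnets_by_ua[ua].add(subnet24(ip))

    agents = set()
    for ua, subnets in subnets_by_ua.items():
        if len(subnets) >= SCRAPER_MIN_SUBNETS:
            continue  # one UA spread over many /24s: treat as a scraper cluster
        agents.update((ua, s) for s in subnets)
    return agents
```

Deduplicating on the (UA, /24) pair means one client hopping between addresses in the same subnet counts once, while the per-UA subnet count is what exposes a scraper rotating through many networks.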
Daily audit
floki-healthcheck.sh runs daily at 07:30 Berlin time from the Spark over SSH: 12 checks, a single Matrix push, and a JSON sidecar at /api/floki-health.json.
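
The many-checks, one-message, one-JSON-file shape can be sketched like this. The real audit is a shell script; the Python below is only an illustration, and the check names, field names, and message wording are hypothetical.

```python
import json
from datetime import datetime, timezone


def summarize(results: dict[str, bool]) -> dict:
    """Collapse per-check pass/fail results into one summary record."""
    failed = sorted(name for name, ok in results.items() if not ok)
    return {
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "total": len(results),
        "failed": failed,
        "ok": not failed,
    }


def matrix_message(summary: dict) -> str:
    """Render the single notification line for the whole run."""
    total = summary["total"]
    if summary["ok"]:
        return f"floki-healthcheck: {total}/{total} checks passed"
    passed = total - len(summary["failed"])
    return (f"floki-healthcheck: {passed}/{total} passed, "
            f"failing: {', '.join(summary['failed'])}")


def render_sidecar(summary: dict) -> str:
    """Serialize the summary as the JSON document served next to the site."""
    return json.dumps(summary, indent=2)
```

Building the message and the sidecar from the same summary record keeps the push notification and the published JSON from disagreeing about which checks failed.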