// hardware + software inventory

The Stack

What actually runs, where it runs, and what it talks to. Snapshot date is printed below; check the engineering log for in-flight changes.

Snapshot: 2026-06-19

Hardware

Primary host: NVIDIA DGX Spark: GB10 Blackwell, 128 GB LPDDR5x unified memory, 4 TB NVMe, ARM v9.2-A. In daily operation since 2026-04. The buy-or-skip reasoning is in Should You Buy a DGX Spark in 2026.
Public VPS: FlokiNET no-KYC EU VPS: Debian 13, 1.9 GiB RAM, 50 GB disk, Docker CE + Caddy. Serves sovgrid.org and mcp.sovgrid.org (Floki VPS setup).
Lightning node: Raspberry Pi 4 (4 GB): Alby Hub + LNbits for V4V zaps (self-hosted Lightning operator's guide).
Mobile companion: Pixel Android: Termux + Tailscale + SSH-on-2222 into Spark, running opencode in tmux (mobile terminal setup).

Inference

Text primary: Qwen3.6-35B-A3B AutoRound int4-mixed via vLLM on port 30001: about 69 tok/s single-stream decode with DFlash speculative decoding (k=3), measured prefill-separated (median of 3, temperature 0). The -mixed build keeps the sensitive MoE gate layers at 16-bit; it lands +12.7% decode over the previous PrismaQuant 4.75-bit build with no measurable quality loss (18/18 on the agent-bench gate). FlashInfer MoE latency backend, no thinking traces. Production since 2026-06-11; see the quant duel write-up. Serves vision in production too: the AutoRound weights keep the full vision tower, so image-reading runs on this same endpoint.
Coding / agentic: GLM-4.7-Flash (30B-A3B MoE, compressed-tensors W4A16, MLA attention) via vLLM on port 30002: about 53.7 tok/s single-stream with MTP speculative decoding. The cyankiwi AWQ-named build is actually compressed-tensors and needs an MLA kv_b_proj.weight guard (a three-line patch completing PR #34695) to run on sm_121 at all.
Reasoning / math: Gemma-4-31B NVFP4 (text-only, modelopt) via vLLM on port 30003: dense, so memory-bandwidth-bound at about 7 tok/s single-stream. vLLM auto-forces the TRITON_ATTN backend for its heterogeneous head dims. A 26B-A4B MoE variant in FP8 is the faster alternative under evaluation.
Mutex: A switch-llm.sh CLI at /data/scripts/llm/switch-llm.sh arbitrates which single engine holds the GB10 (qwen|glm|gemma|none), each at gpu-memory-utilization 0.50 (0.80 OOMs the desktop, because that fraction is of the total unified memory the OS shares). It stops the peer engine's container and its systemd service, so nothing resurrects mid-switch. Termux-friendly, sub-second status.
Image generation: FLUX.1-schnell on ComfyUI (SparkyUI image, CUDA 13 + SM121A SageAttention). Sub-second per image on the Spark; 32-hero blog batches render in ~5 minutes. Runs sequentially via the switch-llm.sh mutex, or beside the resident LLM when unified memory has headroom (the LLM is OOM-protected, ComfyUI is the sacrifice). Setup: ComfyUI + FLUX.
TTS: Qwen3-TTS-1.7B-CustomVoice loads beside the resident LLM in unified memory (no mutex needed, about 3.4 GB). It replaced the retired Voxtral after the V6 spot-listen showed a ceiling on long-form expressivity (the Voxtral ceiling and the pivot).

Agents & clients

Local CLI: opencode 0.8.12 against the resident vLLM model as the primary coder, with goose (Block, Apache-2.0) as the backup; both OpenAI-compatible, no strict-alternation issues. The old Mistral-tied vibe CLI was retired. Tradeoffs: coding-agent comparison.
UI: OpenWebUI behind Tailscale + Caddy mkcert: custom model "Sovereign Qwen" with system prompt + sovereign-kb RAG + SearXNG web search + mcpo bridge to sovereign-mcp.
Mobile: Termux + SSH into a tmux session running opencode against Qwen3.6. The browser-based opencode-web behind Caddy on the tailnet is the secondary path, not the daily driver.
MCP: Claude Code targets sovgrid via Streamable-HTTP (mcp.sovgrid.org). opencode, goose and OpenClaw target the local knowledge + sovereign MCP servers. When the MCP is worth installing: the honest MVP.

Publishing & operations

Blog: Astro 6 static site, no database, full rebuild under 3 s. Built locally, rsynced to the Floki VPS, served by Caddy. Source mirrored on Gitea (canonical) and GitHub (public). Around 155 articles live as of June 2026. Stack writeup: Astro 6 + Caddy.
Article pipeline: Local LLM drafts articles from engineering notes; scripts/update_blog_from_gitea.py applies quality-signal scoring + style gates; FLUX.1-schnell renders the matching hero image on the same Spark; scripts/blog-deploy-verify.sh orchestrates build, rsync, drift-commit, axe a11y, and NSM refresh. Zero cloud cost per article once the infrastructure runs. Full mechanism: how this blog actually gets built.
Dashboard: FastAPI + React single-file SecOps dashboard on the Spark. Service start/stop, AIDE alerts, backup status, MCP restart. Reachable over Tor (.onion) and Tailscale; not exposed on the public internet.
Payments: Lightning over the Alby Hub on the Pi 4 (V4V model: no accounts, no platform cut, no subscriptions). Lightning Address in every page footer. Bank transfer on every invoice. No Stripe, no PayPal, no third-party processor in the path.
Web research backend: SearXNG on 127.0.0.1:8888 powers web search for the article pipeline and the "Sovereign Qwen" custom model in OpenWebUI. No query logging to third parties; meta-search results are aggregated locally.

Storage & secrets

Layout: /ai/models (LLM weights, HF cache), /data/config, /data/scripts, /data/projects, /data/secrets (chmod 700).
Backups: age-encrypted tarball, dual-target: local /data/backups + USB. Nightly Floki snapshot pull (NSM stats + Caddy logs) feeds the same backup (the backup rebuild).
Code: Local Gitea on 127.0.0.1:3002 (loopback only). Two public mirrors on github.com/cipherfoxie: sovereign-mcp (MCP server, MIT) and vps-healthcheck (ops audit, MIT). Per agent-mirroring policy: public-facing tooling only; internal persona-MCPs stay in Gitea.

Public edge

Domain: sovgrid.org via FlokiNET-friendly registrar, points at the Floki VPS.
Reverse proxy: Caddy with Let's Encrypt: both sovgrid.org and mcp.sovgrid.org, HTTP/2 only (no QUIC over DERP).
Hardening: UFW default-deny + fail2ban (sshd, caddy-mcp jails). SSH key-only, AllowUsers cipherfox, MaxAuthTries 3. unattended-upgrades active on Debian-Security origin. Caddy edge-block on confirmed-scraper IP ranges (currently 69.12.0.0/16), audit-trail kept in the NSM aggregator.
Analytics: NSM (North Star Metric) aggregator scans raw Caddy logs into /insights/: 30-day external tool-calls, distinct agents (UA + /24 deduped), self-traffic filtered out by CIDR, and distributed-scraper clusters filtered out by UA-over-many-/24s signature. No JS pixel, no cookies, no third-party tracker. Methodology: Insights Dashboard for a DGX Business.
Daily audit: floki-healthcheck.sh runs daily from Spark via SSH: 12 checks, single Matrix push, JSON sidecar at /api/floki-health.json.