Reading path

I'm operating this stack day-to-day

Day-zero is one thing. Day-N is the part nobody writes about: service lifecycle, power events, backup, network hardening, file integrity, and the mental model of the memory you are actually managing.

10 articles, in reading order

  1. systemd Patterns for Self-Hosted AI Services

    Service lifecycle: six unit-file patterns that make a multi-service AI stack survive crashes and reboots without operator intervention.

  2. Power Failure Recovery on a DGX Spark: The 30-Minute Procedure

    Power-event recovery: the thirty-minute procedure from cold boot to inference resumed, with the specific failure modes that show up in the first ten minutes.

  3. Backing Up 119B Parameters Without Going Bankrupt on Storage

    Backup discipline: keeping a 119B-parameter model and the surrounding state durable without paying for object storage at model-weight scale.

  4. Caddy + Cloudflare Tunnel: The Reliability Pattern

    Network edge: the Caddy-plus-tunnel pattern that survives ISP-side reconfiguration, with the receipts from the 2026-05 Cloudflared retirement.

  5. AIDE + Tripwire for AI Boxes: When File Integrity Matters

    File integrity: when AIDE or Tripwire on an AI box is the right intrusion-detection layer, and the operational rules for running them without alert fatigue.

  6. The Unified-Memory Inference Mental Model

    The resource you are actually managing: a unified memory pool shared across LLM, TTS, and image services, with the mental model that makes the sequencing rules feel obvious instead of arbitrary.

  7. Self-Hosted Observability for a One-Person AI Stack

    Observability for a one-person stack: which Prometheus exporters earn their keep, which dashboards stay quiet on a good day, and the alert-fatigue ceiling that defines what gets watched.

  8. Tailscale vs Headscale for Multi-Box Sovereign Stacks

    Network plane: Tailscale managed versus Headscale self-hosted, the control-plane sovereignty trade-off, and the DERP-relay decision that sits underneath both.

  9. The Operator's Guide to Self-Hosted Lightning

    Lightning ops for the daily driver: channel selection, fee-policy updates, channel-backup automation, and the inbound-liquidity bootstrap problem.

  10. Gitea as Source-of-Truth for AI Pipelines

    Source-of-truth for AI pipelines: why Gitea on loopback over GitHub, the pull-before-read ritual, and the webhook-versus-polling decision.
