Build a Self-Hosted AI Blog with Astro, Mistral, and ComfyUI on One Machine
New to self-hosting AI? The Self-Hosted AI: Start Here hub walks the hardware-decision tree, inference-engine choice, and the operational gotchas that bite hardest in the first three months. Read it before or after this one, whichever fits your stage.
I spent months fighting cloud costs and vendor lock-in before realizing my blog didn’t need them. What I needed was a machine that could write, illustrate, and serve content without phoning home. This stack runs entirely on a single DGX Spark with 128GB unified memory, using Astro for the site, Mistral Small 4 for writing, and ComfyUI FLUX for images. No external APIs. No recurring fees. Just a machine that works while I sleep.
Quick Take
- A self-hosted AI pipeline that writes, illustrates, and serves content
- Runs on one DGX Spark with 128GB RAM
- Uses Astro for the site, Mistral Small 4 for writing, ComfyUI FLUX for images
- No cloud APIs, no recurring fees, just a machine doing the work
The Stack That Runs It All
docker compose config
This isn’t a theoretical setup. It’s what powers my blog at vivalacompra.com. The stack is split into two containers: one for Astro (the site), and one for cloudflared (the tunnel). No Node process runs in production; nginx serves the static build directly.
Key components defined:
- Astro 5 + Tailwind v3 refers to the static site generator and CSS framework used to build the blog.
- nginx:alpine is a lightweight web server that serves the static Astro build without running Node.js in production.
Why this matters: keeping Node out of production containers reduces attack surface and simplifies deployments. The Astro build runs locally, then gets mounted into the container at /usr/share/nginx/html.
In practice, this means I can rebuild the site locally and push it to production without ever touching the container runtime.
Build Workflow: One Command, Zero Friction
# Build and deploy in one step
npm run build
docker compose up -d --build
The magic here is the volume mount. The dist/ directory, where Astro outputs the static site, is mounted directly into the nginx container. No rebuilds needed for content changes.
Why this works: Docker volumes let you inject local files into containers without rebuilding images. The Astro build runs on your machine, then appears instantly in the container.
For example, if I fix a typo in a blog post, I just run npm run build and the change goes live within seconds.
Content Pipeline: Write, Generate, Publish
# Run the full pipeline: fetch, write, generate images, build
python3 scripts/update_blog_from_gitea.py --run-now
The pipeline has three phases:
- Fetch content from Gitea (or any Git repo)
- Generate text using Mistral Small 4 running on port 30000
- Generate images using ComfyUI FLUX (sequential, not parallel)
Why it’s sequential: ComfyUI FLUX and Mistral Small 4 share the same 128GB RAM. Running them together crashes the system.
In practice, this means I stop Mistral before generating images, then restart it afterward. It’s a manual step, but it keeps the machine stable.
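A minimal sketch of that stop, generate, restart sequence, assuming Mistral runs as a Docker container. The container name "mistral" and the helper names are hypothetical; only scripts/generate_blog_images.py comes from this post. The context manager restarts the writer even if the image step fails.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, Sequence

Runner = Callable[[Sequence[str]], None]


def run_checked(cmd: Sequence[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit."""
    subprocess.run(list(cmd), check=True)


@contextmanager
def service_stopped(name: str, run: Runner = run_checked) -> Iterator[None]:
    """Stop a Docker container for the duration of the block, then start it again."""
    run(["docker", "stop", name])
    try:
        yield
    finally:
        # Restart even when the wrapped step raises, so the writer comes back up.
        run(["docker", "start", name])


def generate_images_exclusively(run: Runner = run_checked) -> None:
    """Free the shared memory, run the image step, then bring the writer back."""
    # "mistral" is a hypothetical container name; substitute the real one.
    with service_stopped("mistral", run):
        run(["python3", "scripts/generate_blog_images.py"])
```

Passing the runner as a parameter keeps the sequence easy to dry-run: hand it a function that only records the commands and check the order before letting it touch the real containers.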
Image Pipeline: FLUX, WebP, and Size Control
# Generate images from prompts
python3 scripts/generate_blog_images.py
Images are generated at WebP quality 82, which keeps file sizes between 20 and 160 KB per hero image. The pipeline skips generation if the image already exists and has a score ≥ 3.
Why WebP quality 82: it’s the sweet spot between visual quality and file size. Higher quality adds kilobytes without noticeable gains.
For example, if I regenerate a hero image and it comes out at 450 KB, I’ll manually tweak the prompt or lower the quality slightly.
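The skip rule and the size check can be sketched as two small functions. This is an illustration under assumptions, not the actual generate_blog_images.py: the function names, the minimum quality of 70, and the step of 4 are hypothetical. The encode argument is any callable that turns a quality value into encoded bytes, for example a wrapper around an image library's WebP export.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

MAX_HERO_BYTES = 160 * 1024  # upper end of the size range quoted above
MIN_SCORE = 3                # images scoring at least this are kept


def should_generate(path: Path, score: Optional[int]) -> bool:
    """Return True when the image is missing, unscored, or scored below MIN_SCORE."""
    if not path.exists():
        return True
    return score is None or score < MIN_SCORE


def encode_within_budget(
    encode: Callable[[int], bytes],
    start_quality: int = 82,
    min_quality: int = 70,   # hypothetical floor
    step: int = 4,           # hypothetical step size
    max_bytes: int = MAX_HERO_BYTES,
) -> Tuple[int, bytes]:
    """Lower the quality stepwise until the output fits; return (quality, data)."""
    quality = start_quality
    data = encode(quality)
    while len(data) > max_bytes and quality - step >= min_quality:
        quality -= step
        data = encode(quality)
    return quality, data
```

If the result is still over budget at the minimum quality, the function returns the oversized data anyway, which is the cue to change the prompt rather than compress further.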
Insights Dashboard: Track What Matters
---
// src/pages/insights.astro
const posts = await Astro.glob('../content/blog/*.md');
---
The dashboard pulls metrics directly from the blog’s frontmatter: EEAT scores, image scores, and actual file sizes. No backend is required because everything is calculated at build time.
Why this matters: it turns subjective quality checks into objective data. A score of 4 for expertise means something concrete, not just a gut feeling.
In practice, this helps me spot weak spots in my content pipeline before readers do.
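The same aggregation can be sketched outside Astro to show what "calculated at build time" amounts to. The field names eeat_score and title, and the threshold of 3.0, are assumptions for illustration; the post does not name the frontmatter keys. The parser only handles flat key: value pairs and leaves any quotes around values in place.

```python
from statistics import mean
from typing import Dict, List


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Parse flat `key: value` pairs between the leading --- fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(posts: List[Dict[str, str]], min_score: float = 3.0) -> Dict[str, object]:
    """Average the (hypothetical) eeat_score field and list posts below min_score."""
    scored = [p for p in posts if "eeat_score" in p]
    weak = [
        p.get("title", "(untitled)")
        for p in scored
        if float(p["eeat_score"]) < min_score
    ]
    avg = mean(float(p["eeat_score"]) for p in scored) if scored else None
    return {"count": len(posts), "avg_eeat": avg, "weak": weak}
```

Feeding it the parsed frontmatter of every post gives a post count, an average score, and a list of titles that fall under the threshold.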
What I Actually Use
- Astro 5 + Tailwind v3: because it turns Markdown into a fast static site without Node in production
- Mistral Small 4: because it writes drafts I can edit, not polished corporate fluff
- ComfyUI FLUX: because it generates images that don’t look like AI junk
What this stack does NOT do, on purpose
Three things were left out of the original setup, and they came up often enough in the first months to be worth naming explicitly.
First, no comment system. The article surface is one-way for now, with replies routed through Nostr (NIP-22 native comments planned, tracked separately). The reasoning: every commercial comment system either runs JavaScript on every page-load, leaks reader IPs to a third party, or both. NIP-22 is the only architecture that respects the privacy floor the rest of the stack maintains. Until that ships, the reply surface is Nostr-direct.
Second, no analytics in the conventional sense. The only signal source is Caddy access logs, aggregated nightly into the NSM strip on /insights/. That gives blog views, MCP tool-call rate, unique IPs, and rate-limit hit count. It does not give heatmaps, scroll depth, time-on-page, or anything else that requires client-side JavaScript. The tradeoff is intentional: we know less, the reader is tracked less, the site stays static.
Third, no per-article images at hub-grade quality. The hero images come from FLUX.1-schnell on the same DGX Spark, generated on demand. Quality is good enough for the medium (illustrative banner, not photographic editorial), bad enough that nobody is going to confuse them with stock-photo licensing material. That is a deliberate choice: pipeline-generated images stay legible at thumb-scale and never accidentally suggest a level of production budget that would set wrong expectations.
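The nightly aggregation of access logs mentioned under the second point could look roughly like this. It is a sketch under assumptions: it expects Caddy's JSON access log format with one object per line, reads the client address from request.remote_ip, counts a "blog view" as a 200 response under a /blog/ path, and treats HTTP 429 as a rate-limit hit. None of those rules are stated in this post, and the MCP tool-call rate is left out because its log signature is not described.

```python
import json
from typing import Dict, Iterable


def aggregate_access_log(lines: Iterable[str], blog_prefix: str = "/blog/") -> Dict[str, int]:
    """Count blog views, unique client IPs, and 429 responses from JSON log lines."""
    views = 0
    rate_limited = 0
    ips = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        request = entry.get("request") or {}
        ip = request.get("remote_ip")
        if ip:
            ips.add(ip)
        status = entry.get("status")
        if status == 429:
            rate_limited += 1
        elif status == 200 and str(request.get("uri", "")).startswith(blog_prefix):
            views += 1
    return {
        "blog_views": views,
        "unique_ips": len(ips),
        "rate_limit_hits": rate_limited,
    }
```

Everything here runs server-side on logs the web server already writes, which is what keeps the pages themselves free of tracking scripts.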
Where this setup will need to change
Two scaling thresholds will force decisions in the next 12-18 months.
At ~200 published articles the search-and-retrieval pattern that works today (paste URL into Claude, let it scan) starts hurting agent-side context budgets. That is the threshold where the MCP server’s search_blog tool stops being redundant. The infrastructure to make that transition is already in place; only the corpus needs to grow.
At the point where Floki VPS becomes the bottleneck (sustained traffic above what a small no-KYC tier handles), the path is either upgrading the VPS tier, adding a second VPS for Caddy load-balancing, or moving to a different no-KYC provider. The architecture is portable because nothing depends on Floki specifically; the configuration is in the repo, the deploy is rsync-based, the move would take an afternoon if the receiving box already has Caddy and Docker set up.
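To make the portability claim concrete, here is a sketch of what an rsync-based deploy step might look like. The host, destination path, and the choice to sync the dist/ directory are hypothetical; the post only says the deploy is rsync-based. The flags used are standard rsync options: -a for archive mode, -z for compression, and --delete to remove files on the target that no longer exist in the source.

```python
import subprocess
from typing import List


def rsync_command(source_dir: str, host: str, dest_dir: str) -> List[str]:
    """Build an rsync invocation that mirrors source_dir's contents to host:dest_dir."""
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    src = source_dir.rstrip("/") + "/"
    return ["rsync", "-az", "--delete", src, f"{host}:{dest_dir}"]


def deploy(source_dir: str, host: str, dest_dir: str) -> None:
    """Run the sync and raise if rsync exits non-zero."""
    subprocess.run(rsync_command(source_dir, host, dest_dir), check=True)
```

Moving to a different provider then means changing the host argument, provided the receiving box is already set up as described above.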
The honest single-line takeaway across this whole setup is that the cost of running a self-hosted AI blog is mostly upfront discipline, not ongoing operations. Once the pipeline is wired the marginal cost per new article is the editorial time itself, plus a few minutes of compute. The rest of the stack runs without daily attention; the moments where attention is needed (a Caddy renewal that fails to auto-restart, a Mistral container that wedges, a deploy where the factcheck-gate trips on a new hallucination) are visible because the alerting is in place, not because the system is fragile.
The shortest summary that does the article justice: this is what a small but serious self-hosted publishing surface looks like in 2026, with the tradeoffs named honestly and the corners cut deliberately rather than by accident.
Figure: Self-Hosted AI Blog Architecture (Astro + Mistral + ComfyUI on a single machine)