From Blog to Agent Tools: How One Knowledge Base Powers Both Humans and AI
Coming from outside the stack? The Self-Hosted AI: Start Here hub article maps where strategy decisions like this one land in the actual deploy: hardware tree, inference engine, what hurts most. Useful as the operational anchor for the framing here.
The moment you realize AI agents are eating your affiliate traffic isn’t when you see the stats drop. It’s when you ask an agent a question and get a perfect answer, without a single click.
That’s the inflection point. The old model of writing for humans and hoping they click is over. The new model is writing once, serving twice: a blog for people, and machine-readable tools for agents. One knowledge base, two interfaces. No double work.
Quick Take
- AI agents bypass blogs entirely: your content is consumed raw, not clicked
- One knowledge base feeds both humans (blog) and machines (MCP tools)
- The first tool validates SGLang configs for DGX Spark users, live since April 2026
- Revenue streams now include tool calls, not just clicks
Polish Notice (2026-05-03): the architecture section, KPI examples, and caveats below have been corrected against the real Sovereign AI Grid stack (Astro v5, FastMCP 1.27, FlokiNET ↗ no-KYC privacy VPS on Debian 13, Mistral Small 4 as a 119B MoE served by SGLang on DGX Spark). The strategic thesis is unchanged. The original draft contained several speculative version pins; those have been removed or corrected.
Why the Affiliate Model Is Dying in Real Time
Last month, I ran a test: asked three agents the same question about setting up Mistral Small 4 on a DGX Spark. Claude (via Claude Code), Perplexity, and a self-hosted Mistral-via-OpenClaw all returned correct answers with direct commands. Not one link clicked. The affiliate revenue? Zero.
# Example command returned by agents
docker run --gpus all --shm-size=16g \
-v /data/models:/models \
mistralai/Mistral-Small-Instruct-2501 \
--port 8000 --tensor-parallel-size 4
This isn’t hypothetical. The data is already here. Agents are trained on your content, but they don’t send traffic back. They execute directly. Your blog becomes a knowledge source, not a destination.
Watch Out Agents may return outdated commands if your blog hasn’t been updated in >30 days. Always pin to dated model snapshots (e.g., Mistral-Small-Instruct-2501) rather than rolling tags, to prevent version drift.
The response isn’t to fight it. It’s to be on both sides.
The blog remains the foundation: first-person stories, exact commands, real failures. But now it also powers machine-readable tools via MCP. Agents don’t need affiliate links. They need validated configurations, health checks, and compatibility matrices. That’s what the tools provide.
# Example MCP tool response (FastMCP server)
# "--quantization int8" is listed as forbidden because it is known to cause OOM on DGX Spark
{
  "status": "valid",
  "command": "docker run --gpus all --shm-size=16g -v /data/models:/models mistralai/Mistral-Small-Instruct-2501 --port 8000",
  "flags": ["--tensor-parallel-size 4"],
  "forbidden": ["--quantization int8"],
  "source": "/blog/setup-mistral-sglang-setup/"
}
The experiment: track three revenue streams in parallel (affiliate clicks, Value-for-Value Lightning tips, and paid MCP tool calls) and publish the trends live. No vision statements. Just numbers.
The Architecture That Actually Works Today
The stack is simple because it has to be. One VPS, one blog, one MCP server. No cloud lock-in, no vendor sprawl.
The blog runs on Astro v5, static build, hosted on a no-KYC privacy VPS at FlokiNET ↗ (Debian 13, Caddy reverse proxy with Let’s Encrypt). It is the source of truth for both humans and tools. Every article passes a quality gate (style-specific minimum score) before publish:
# Example quality block in Astro frontmatter
The MCP server runs on FastMCP 1.27, exposed via Streamable HTTP at https://mcp.sovgrid.org/self-hosted-ai, stateless and LLM-agnostic. It does not set up servers; it returns information about them: search results from the blog corpus, the article body for a slug, the tag list, and a diagnostic-pattern matcher for SGLang errors on GB10/SM121A hardware.
Watch Out Stateless MCP tools cannot track user sessions. If you need per-user validation later (paid tier, rate limiting beyond IP), add a lightweight session store. Today the deployment is fully stateless because the use case does not need state yet.
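If per-caller limits are ever needed, the session store can stay very small. What follows is a minimal sketch, assuming an in-memory fixed-window counter keyed by some caller identifier such as an API key. The `FixedWindowLimiter` class, its parameters, and the caller IDs are hypothetical and not part of the deployed server.

```python
import time


class FixedWindowLimiter:
    """In-memory per-caller call counter (hypothetical, not the live server)."""

    def __init__(self, max_calls, window_seconds, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._windows = {}  # caller_id -> (window_start, calls_in_window)

    def allow(self, caller_id):
        """Return True if this call fits into the caller's current window."""
        now = self.clock()
        entry = self._windows.get(caller_id)
        if entry is None or now - entry[0] >= self.window_seconds:
            entry = (now, 0)  # first call, or the old window expired
        start, count = entry
        if count >= self.max_calls:
            return False
        self._windows[caller_id] = (start, count + 1)
        return True
```

Counts live only in process memory here, so a restart resets every caller. That is probably acceptable for a free tier and would need a persistent store for anything paid.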
Gotcha FastMCP 1.27 changed the import path from mcp.server.fastmcp (1.0.x) to plain fastmcp. If a published code snippet still imports from the old path, it is from a pre-1.x article and needs an update before pasting into a new project.
The Tools That Agents Actually Need
The first tool is live: diagnose_sglang. It takes hardware specs, model version, and current flags, then returns a validated docker run command or flags that will fail. It’s built from the same knowledge base as the blog: setup articles, fix articles, error logs.
# Example tool input/output
{
"hardware": "nvidia-dgx-spark",
"model": "Mistral-Small-Instruct-2501",
"flags": ["--quantization int8", "--tensor-parallel-size 8"],
"output": {
"status": "invalid",
"error": "OOM on GB10 with this combination",
"suggestion": "Use a smaller tensor-parallel size or switch quantization",
"source": "/blog/fixes-sglang-vibe-performance-benchmark/"
}
}
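A check like this can be approximated with a small rule table. This is a minimal sketch, assuming each rule names a hardware target and a set of flags known to fail together. The `Rule` structure and the `check_flags` function are hypothetical and not the actual diagnose_sglang implementation; the single rule mirrors the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    hardware: str
    bad_flags: frozenset  # flags that fail when used together on this hardware
    error: str
    source: str  # article the rule was derived from


# Hypothetical rule table; the one entry mirrors the example above.
RULES = [
    Rule(
        hardware="nvidia-dgx-spark",
        bad_flags=frozenset({"--quantization int8", "--tensor-parallel-size 8"}),
        error="OOM on GB10 with this combination",
        source="/blog/fixes-sglang-vibe-performance-benchmark/",
    ),
]


def check_flags(hardware, flags):
    """Return a verdict for the flags, citing the article behind a matching rule."""
    given = set(flags)
    for rule in RULES:
        # A rule fires only when every one of its bad flags is present.
        if rule.hardware == hardware and rule.bad_flags <= given:
            return {"status": "invalid", "error": rule.error, "source": rule.source}
    # "valid" here only means that no known rule matched.
    return {"status": "valid"}
```

The function keeps no state between calls, which fits the stateless deployment: the same input always yields the same verdict and the same citation.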
The next tools on the roadmap (tracked in Gitea Issue #13) are diagnostic-class extensions of the same pattern: diagnose_voxtral for TTS output quality issues, diagnose_openclaw for alternating-roles and Side-Car-Proxy edge cases, stack_inventory for dated system-version reporting from KB metadata, related_articles for TF-IDF-graph hops across the corpus. None ship today. The pattern is identical: pattern-match a real problem against an article-derived rule set, return a citation-bearing answer.
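The shared pattern is compact enough to sketch. The version below assumes rules are stored as regular expressions paired with advice and an article slug. The regexes, the advice strings, their pairing with slugs, and the `diagnose` function are all invented for illustration and are not the rules used by the live tools.

```python
import re

# Hypothetical rule set: (pattern, advice, article slug). Illustrative only.
LOG_RULES = [
    (
        re.compile(r"CUDA out of memory", re.IGNORECASE),
        "Reduce memory pressure before retrying",
        "/blog/fixes-sglang-vibe-performance-benchmark/",
    ),
    (
        re.compile(r"no kernel image is available", re.IGNORECASE),
        "The build may not target this GPU architecture",
        "/blog/setup-mistral-sglang-setup/",
    ),
]


def diagnose(log_text):
    """Return one citation-bearing finding per rule that matches the log text."""
    findings = []
    for pattern, advice, source in LOG_RULES:
        match = pattern.search(log_text)
        if match:
            findings.append(
                {"matched": match.group(0), "advice": advice, "source": source}
            )
    return findings
```

An empty list means no rule matched. A caller should read that as "unknown", not as "healthy".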
What is not shipping is a generic “ARM64 LLM compatibility checker” that tries to be a knowledge base on its own. That is just web search dressed up as a tool, and the agent calling it would do better with a real web fetch.
Each tool is stateless. Each tool is LLM-agnostic. Each tool is built from the same articles that humans read. No duplication, no drift.
The KPIs That Matter, For Humans and Machines
For humans, the metrics are EEAT scores and affiliate clicks. But for agents, the numbers are different.
Tool-call rate tracks how often the MCP server is called per day. Unique-IP rate tracks how many distinct callers reach it (proxies like Smithery and Glama collapse many real users into a few IPs, so this is a floor, not a ceiling). User-agent breakdown distinguishes direct claude-code callers from gateway-mixed traffic. HTTP-200 rate tracks whether the tool actually returned a result.
# Real Caddy log shape (anonymized)
{
"timestamp": "2026-05-03T14:30:00Z",
"user_agent": "claude-code/1.x",
"remote_ip": "<asn-mapped-to-direct-or-gateway>",
"method": "POST",
"path": "/self-hosted-ai",
"status": 200,
"latency_ms": 38
}
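The four KPIs can be computed from entries of this shape in a few lines. This is a minimal sketch, assuming one JSON object per line with the field names shown in the anonymized example; raw server logs may need to be mapped into that shape first. The `summarize` function is hypothetical.

```python
import json
from collections import Counter


def summarize(lines):
    """Compute KPIs for one batch of JSON log lines (for example, one day)."""
    total = 0
    ok = 0
    ips = set()
    agents = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entry = json.loads(line)
        total += 1
        if entry["status"] == 200:
            ok += 1
        ips.add(entry["remote_ip"])
        # Group by client name only, dropping the version after the slash.
        agents[entry["user_agent"].split("/", 1)[0]] += 1
    return {
        "calls": total,  # tool-call rate when the batch covers one day
        "unique_ips": len(ips),  # a floor, since gateways collapse many users
        "http_200_rate": ok / total if total else 0.0,
        "user_agents": dict(agents),
    }
```

Feeding it one day of log lines at a time gives a daily series for each number, which is the form the live trend reporting would need.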
The affiliate stream is live (FlokiNET ↗, Alby ↗, BitBox ↗), but conversion tracking is not yet wired. The Value-for-Value stream is live as a Lightning address; zaps received so far: zero. The MCP free tier is live and being called. The honest snapshot: technical foundation works, monetization signal is empty, distribution effort is the actual bottleneck.
Watch Out MCP tools don’t respect robots.txt. If you’re scraping your own blog for tool data, exclude /blog/tools/ to avoid recursion.
The plan is to add L402 payments later, after adoption proof. A Lightning Node will handle the microtransactions. But first, we need to see the numbers move.
The Meta-Experiment: Publishing the Trends Live
The highest-priority article is the meta-experiment itself: “Watching the Old Internet Die and the New One Emerge: Live Revenue Data.” It will track three streams in parallel: affiliate clicks declining, MCP calls rising, Lightning tips trickling in. No estimates. No projections. Just real numbers from this domain.
The hook is simple: “These are the real numbers from this site. Watch the shift from click economy to execution economy happen in real time.”
Watch Out Live revenue tracking requires strict separation of streams. Mixing affiliate clicks with MCP calls in analytics will corrupt your data.
The next articles will dive into llms.txt as the new robots.txt, explain L402 payments, and compare MCP tools to RAG. Each one will be built from the same knowledge base, with no extra work.
The goal isn’t to predict the future. It’s to build it, measure it, and publish it. The old model is dying. The new one is here. The tools are live. The next step is to watch the numbers.