A Self-Hosted AI Blog That Serves Both Humans and Machines
The web is being rewritten by AI agents that bypass websites entirely. This site fights back by serving both people and machines from the same knowledge base.
Quick Take
- Hybrid strategy keeps human content and machine tools in sync
- MCP tools turn blog posts into actionable APIs for agents
- Revenue shifts from affiliate clicks to pay-per-execution calls
Why Dual Layers Exist
Affiliate revenue drops when agents stop clicking links. Meanwhile, new revenue appears where agents need answers they can execute immediately. The solution isn’t to abandon human readers; it’s to layer machine-readable tools on top of existing content without duplicating effort.
⚠️ Gotcha: If you duplicate content for machines, you’ll create a maintenance nightmare. The blog must remain the single source of truth. Human articles get polished for EEAT while MCP tools expose structured data from the same markdown files. One update, two interfaces.
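A minimal sketch of that single-source pattern, assuming posts are markdown files with YAML frontmatter (the file path, field names, and rendering step are illustrative, not the site's actual code):

```python
# Sketch: one markdown post feeds both the human page and the machine tool.
from pathlib import Path
import yaml  # PyYAML

def load_post(path: str):
    """Split a '---'-delimited YAML frontmatter block from the article body."""
    _, frontmatter, body = Path(path).read_text().split("---", 2)
    return yaml.safe_load(frontmatter), body

meta, body = load_post("content/sglang-gb10-setup.md")  # hypothetical post

# Human interface: the blog engine renders `body` to HTML as usual.
# Machine interface: the same file's structured fields become a tool response.
machine_view = {"title": meta.get("title"), "tools": meta.get("tools", [])}
```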
⚠️ Watch Out: Agents won’t tolerate stale data. If your tools pull from outdated guides (e.g., referencing v1.3.2 of SGLang when v1.5.0 is current), agents will fail silently or return incorrect configurations. Always validate tool data against the latest release notes.
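A cheap guard is to compare the version the guides were written against with the latest published release before trusting the tool data. A rough sketch using the public GitHub releases endpoint (the repo and pinned version are examples):

```python
# Sketch: flag stale tool data by checking the latest SGLang release tag.
import json
import urllib.request

KNOWN_VERSION = "1.5.0"  # version the guides and tool data were written against

def latest_release_tag(repo: str = "sgl-project/sglang") -> str:
    url = f"https://api.github.com/repos/{repo}/releases/latest"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["tag_name"].lstrip("v")

if latest_release_tag() != KNOWN_VERSION:
    print("WARNING: tool data may be stale; re-check the release notes")
```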
⚠️ Limitation: Not all content is suitable for machine consumption. Deep dives into philosophical implications of sovereign AI (e.g., AI Sovereignty: The Final Frontier of Self-Sufficiency) won’t translate well into JSON responses. Reserve these for human readers.
The Machine-Readable Layer
Agents don’t need storytelling. They need validation, diagnostics, and configuration snippets. Three tools emerged from the DGX Spark community’s pain points:
1. SGLang GB10 Config Validator
GB10 hardware crashes when users copy-paste flags from old guides. This tool returns a pre-validated docker command plus forbidden flags that trigger OOM errors. The knowledge comes from setup guides and troubleshooting posts already on the site.
# Example MCP tool response for GB10 Config Validator
{
  "validated_command": "docker run --gpus all --shm-size=16g -p 8080:8080 ghcr.io/sgl-project/sglang:v1.5.0 --model-path /models/llama-3-70b --port 8080",
  "forbidden_flags": ["--max-seq-len=8192", "--tensor-parallel-size=4"],
  "error_examples": [
    "OOM on GB10 with --tensor-parallel-size=4 (requires 48GB VRAM per GPU)"
  ]
}
⚠️ Gotcha: The validator must account for hardware-specific quirks. For example, the GB10’s 24GB VRAM limit means `--max-seq-len=16384` will fail at batch sizes above 2, even if your guide says it works. Test against real hardware before publishing.
⚠️ Watch Out: If your markdown files reference relative paths (e.g., `./configs/gb10.yaml`), the tool must resolve them to absolute paths (`/etc/blog/configs/gb10.yaml`) to avoid path resolution failures in production.
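A small sketch of that resolution step, assuming a configurable content root (`BLOG_ROOT` is an example, not an existing setting):

```python
# Sketch: resolve relative paths from markdown against the deployed content root.
from pathlib import Path

BLOG_ROOT = Path("/etc/blog")  # example production content root

def resolve_config_path(raw: str) -> str:
    """Turn './configs/gb10.yaml' into '/etc/blog/configs/gb10.yaml'."""
    p = Path(raw)
    return str(p if p.is_absolute() else (BLOG_ROOT / p).resolve())
```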
2. Sovereign Stack Health Check
This tool queries service status, known bugs, and recommended versions across SGLang, OpenHands, and related tools. The data lives in VIBE.md and fix articles; no separate database is required.
# Example VIBE.md entry for SGLang v1.5.0
sglang:
  version: "1.5.0"
  status: "stable"
  known_bugs:
    - "CUDA 12.1 + GB10 causes kernel panics (workaround: use CUDA 11.8)"
  recommended_versions:
    openhands: "0.2.3"
    vllm: "0.4.1"
⚠️ Limitation: The health check can’t predict future breaking changes. For example, if SGLang releases v1.6.0 with a new memory allocator, your tool will return stale recommendations until you manually update VIBE.md.
⚠️ Gotcha: Always include version strings in tool outputs. Agents need to know whether they’re calling v1.4.2 (which has a critical bug) or v1.5.0 (which fixes it). Example:
{ "current_version": "1.5.0", "upgrade_available": true, "latest_version": "1.5.1" }
3. ARM64 LLM Compatibility Checker
This tool answers whether a specific model-quantization-hardware combo actually works. The output includes workarounds and recommended flags pulled directly from setup guides.
# Example ARM64 compatibility check output
$ ./arm64-checker --model llama-3-8b --quantization int4 --hardware jetson-orin
✅ Supported with flags: --tensor-parallel-size=1 --max-seq-len=4096
⚠️ Workaround: Use `--rope-scaling-factor 1.0` to avoid attention errors
❌ Unsupported: --flash-attn (not available on ARM64)
⚠️ Watch Out: ARM64 support is fragmented. A model that works on Jetson Orin might fail on Raspberry Pi 5 due to different NEON optimizations. Always specify the exact hardware in tool outputs.
⚠️ Limitation: Quantization formats aren’t universally supported. For example, `bitsandbytes` int8 works on x86 but fails on ARM64 with “CUDA error: invalid device function”. The tool must include hardware-specific caveats.
Each tool is stateless and LLM-agnostic. They return JSON responses that any agent can parse, whether it’s Claude calling natively or Goose using the MCP protocol.
# Example MCP tool schema for ARM64 checker
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "model": {"type": "string"},
    "quantization": {"type": "string", "enum": ["int4", "int8", "fp16"]},
    "hardware": {"type": "string"},
    "supported": {"type": "boolean"},
    "recommended_flags": {"type": "array", "items": {"type": "string"}}
  }
}
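On the agent side, any JSON Schema validator can check a tool response against that schema before acting on it. A sketch using the `jsonschema` Python package (the response values are made up):

```python
# Sketch: agent-side validation of a tool response against the published schema.
from jsonschema import validate  # pip install jsonschema

ARM64_SCHEMA = {
    "type": "object",
    "properties": {
        "model": {"type": "string"},
        "quantization": {"type": "string", "enum": ["int4", "int8", "fp16"]},
        "hardware": {"type": "string"},
        "supported": {"type": "boolean"},
        "recommended_flags": {"type": "array", "items": {"type": "string"}},
    },
}

response = {  # example payload an agent might receive
    "model": "llama-3-8b",
    "quantization": "int4",
    "hardware": "jetson-orin",
    "supported": True,
    "recommended_flags": ["--tensor-parallel-size=1"],
}

validate(instance=response, schema=ARM64_SCHEMA)  # raises ValidationError on mismatch
```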
How Revenue Shifts
Human traffic still earns affiliate clicks and Lightning Zaps. But the real change comes from machine calls:
Free-tier tools answer questions from the standard knowledge base. Paid-tier tools will require L402 payments per execution, settled over the Lightning Network once adoption proves demand. The plan is to start with free tools to build volume, then layer on microtransactions once agents show a willingness to pay.
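What the paid tier could look like in practice: reject unauthenticated calls with an HTTP 402 challenge and accept a paid token on retry. This is only a rough sketch; the `WWW-Authenticate` layout follows my reading of the L402 spec, and `issue_challenge` plus the verification step are placeholders, not a working payment flow:

```python
# Sketch: gate a paid tool behind an L402-style HTTP 402 challenge.
# issue_challenge() and the token check are placeholders, not a real implementation.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse

app = FastAPI()

def issue_challenge() -> tuple[str, str]:
    """Placeholder: mint a macaroon and a Lightning invoice for this call."""
    return "macaroon-placeholder", "lnbc1...invoice-placeholder"

@app.post("/api/tools/paid/gb10-validator")
async def paid_gb10_validator(request: Request):
    auth = request.headers.get("Authorization", "")
    if not auth.startswith("L402 "):
        macaroon, invoice = issue_challenge()
        return JSONResponse(
            status_code=402,
            content={"error": "payment required"},
            headers={"WWW-Authenticate": f'L402 macaroon="{macaroon}", invoice="{invoice}"'},
        )
    # Placeholder: verify the macaroon + payment preimage here before running the tool.
    return {"validated_command": "docker run --gpus all ..."}
```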
⚠️ Gotcha: L402 payments require Lightning Network liquidity. If your node has insufficient inbound capacity (e.g., <10,000 sats), agents will fail to pay. Check per-channel inbound capacity on an lnd node with:
lncli listchannels | grep "remote_balance"
⚠️ Watch Out: Agents may game the free tier by making excessive calls. Implement rate limiting per IP (e.g., 100 requests/hour) to prevent abuse. Example FastAPI middleware:
from fastapi import Request
from fastapi.responses import JSONResponse
from redis import Redis

redis = Redis()  # counter store shared across workers
RATE_LIMIT = 100  # requests per hour

@app.middleware("http")
async def rate_limit_middleware(request: Request, call_next):
    ip = request.client.host
    key = f"rate_limit:{ip}"
    current = redis.incr(key)
    if current == 1:
        redis.expire(key, 3600)  # start a one-hour window on the first request
    if current > RATE_LIMIT:
        # Raising HTTPException inside middleware bypasses FastAPI's handlers,
        # so return the 429 response directly.
        return JSONResponse(status_code=429, content={"detail": "Rate limit exceeded"})
    return await call_next(request)
Transparency matters. Only relative trends get published: percentage changes in affiliate clicks versus MCP calls, never absolute dollar amounts. This keeps the experiment honest while preserving privacy.
⚠️ Limitation: Tracking machine vs. human traffic is imprecise. Some agents spoof user-agent strings, while humans may use ad-blockers that obscure analytics. Use a combination of signals (a rough classifier sketch follows the list):
- User-agent parsing (e.g., `curl`, `Claude-Code`)
- Request headers (e.g., `X-MCP-Client`)
- Behavioral patterns (e.g., JSON responses vs. HTML)
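A minimal sketch of that heuristic, combining the three signals above (the marker strings are examples, and real traffic will need more care):

```python
# Sketch: tag an incoming request as machine or human traffic from simple signals.
AGENT_MARKERS = ("curl", "python-requests", "claude", "goose")  # example substrings

def classify_request(headers: dict) -> str:
    """headers: lower-cased header names mapped to values."""
    if headers.get("x-mcp-client"):                         # explicit MCP client header
        return "machine"
    user_agent = headers.get("user-agent", "").lower()
    if any(marker in user_agent for marker in AGENT_MARKERS):
        return "machine"
    if "application/json" in headers.get("accept", ""):     # agents typically ask for JSON
        return "machine"
    return "human"

print(classify_request({"user-agent": "Claude-Code/1.0", "accept": "application/json"}))  # machine
```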
Next Steps
The immediate task is stabilizing the tool pipeline. A grep-based checker currently validates docker commands before they reach production. The next step is building the FastAPI MCP server that exposes these tools at `/api/tools/{name}`.
# Example FastAPI MCP server endpoint
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

class ToolRequest(BaseModel):
    model: str
    quantization: str
    hardware: str

@app.post("/api/tools/arm64-checker")
async def arm64_checker(request: ToolRequest):
    # Validate input
    if request.hardware not in ["jetson-orin", "raspberry-pi-5"]:
        raise HTTPException(status_code=400, detail="Unsupported hardware")
    # Fetch data from the markdown-derived knowledge base
    data = load_from_markdown(f"/etc/blog/data/{request.model}.yaml")
    return {
        "supported": request.model in data["supported_models"],
        "recommended_flags": data["flags"][request.hardware],
    }
⚠️ Gotcha: The MCP server must handle malformed requests gracefully. For example, if an agent sends `"quantization": "int99"` (invalid), return a 400 error with a clear message: `{ "error": "Invalid quantization: int99. Valid options: int4, int8, fp16" }`
After that, the focus shifts to documentation. Every article needs a tool block in its frontmatter so agents can discover capabilities automatically. The schema is ready; the implementation comes next.
# Example frontmatter for a blog post
---
title: "Deploying SGLang on GB10: A Step-by-Step Guide"
tools:
  - name: "sglang-gb10-validator"
    description: "Validates GB10 docker commands"
    input_schema:
      type: object
      properties:
        docker_command:
          type: string
    output_schema:
      type: object
      properties:
        validated_command:
          type: string
        forbidden_flags:
          type: array
          items:
            type: string
---
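Once every post carries that block, discovery can be a plain scan over the content directory that collects each `tools` entry into a registry the MCP server exposes. A sketch, with the content path and registry shape as assumptions:

```python
# Sketch: build a tool registry by scanning frontmatter across all posts.
from pathlib import Path
import yaml  # PyYAML

CONTENT_DIR = Path("content")  # example blog content directory

def discover_tools(content_dir: Path = CONTENT_DIR) -> list[dict]:
    registry = []
    for post in content_dir.glob("**/*.md"):
        text = post.read_text()
        if not text.startswith("---"):
            continue  # no frontmatter, nothing to expose
        frontmatter = yaml.safe_load(text.split("---", 2)[1])
        for tool in (frontmatter or {}).get("tools", []):
            registry.append({"source": str(post), **tool})
    return registry

# e.g. expose the registry at /api/tools so agents can enumerate capabilities
```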
The experiment runs live. When both channels are active, the site will publish monthly updates showing how revenue shifts as machines replace humans as the primary consumers.
⚠️ Watch Out: If machine traffic dominates, human readers may feel alienated. Include a human-readable summary section in tool outputs to maintain accessibility. Example:
## For Humans

This tool is designed for AI agents. For human-readable guides, see:

- [SGLang GB10 Setup Guide](https://www.glukhov.org/tags/ai-coding/)
- [Troubleshooting OOM Errors](https://vps.us/blog/vps-tutorials/page/3/)