Sovereign MCP Server: Local Setup, Integration, and Hard Lessons
New to self-hosting AI? The Self-Hosted AI: Start Here hub walks through the hardware decision tree, the choice of inference engine, and the operational gotchas that bite hardest in the first three months. Read it before or after this one, whichever fits your stage.
Your AI agent keeps asking whether your blog already covers a topic, even though you’ve written 42 posts about it.
Quick Take
- Replaces manual context checks with automated, local knowledge retrieval
- Runs on a single port without conflicting with other services
- Diagnoses SGLang configuration issues without manual stack explanations
- Integrates with OpenClaw via HTTP and Vibe via stdio without extra servers
What the Sovereign MCP Server Actually Does
The server exposes three tools for local AI agents:
| Tool | What it does |
|---|---|
| `search_blog` | Runs TF-IDF full-text search across all published articles |
| `get_article` | Returns the full text of an article by its slug |
| `diagnose_sglang` | Applies seven rules to self-diagnose SGLang configuration issues |
Without MCP, agents ask every time whether a topic is already covered. With MCP, they query the knowledge base directly, avoid duplicates, and build on existing content. When SGLang fails, they fetch the diagnostic rules without requiring manual stack explanations each time.
In practice, this means OpenClaw agents can now reference your blog posts without pinging external APIs or relying on stale context.
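For a concrete sense of the wire format, here is what a raw `search_blog` call looks like over Streamable HTTP. This is a hedged sketch: real clients like OpenClaw or the MCP Inspector handle the initialize handshake and session headers first, and the query string here is purely illustrative.

```bash
# Sketch of a raw tools/call against the Streamable HTTP endpoint.
# A real client performs the initialize handshake before this.
curl -s http://127.0.0.1:8002/self-hosted-ai \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/json, text/event-stream' \
  -d '{
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
      "name": "search_blog",
      "arguments": {"query": "sglang out of memory"}
    }
  }'
```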
How the Server Is Built and Where It Lives
The server lives in `/data/projects/sovereign-mcp/` and starts with `src/main.py`, which creates a `FastMCP("sovereign-ai-blog")` instance. It uses Streamable HTTP (MCP spec 2025-03-26) via `mcp.streamable_http_app()` and runs as an ASGI app under uvicorn. The knowledge base is `data/knowledge-base.json`, auto-generated by `scripts/generate_knowledge_base.py` from the sovereign-blog project. Each article entry contains slug, title, description, date, tags, and body. The TF-IDF index loads on startup; no external vector backend is needed.
In practice, the server starts in under a second and serves the index from memory, which is fine for up to 500 articles.
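For orientation, the skeleton of `src/main.py` looks roughly like this. The tool signatures and bodies are simplified sketches, not the real file; only the overall shape (one `FastMCP` instance, three decorated tools, an ASGI export) matches the description above.

```python
# Sketch of the server's shape; real tool bodies do TF-IDF scoring
# and rule matching. Signatures here are illustrative assumptions.
import json
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("sovereign-ai-blog")

# Loaded once at startup; ~42 articles fit comfortably in memory.
with open("data/knowledge-base.json") as f:
    ARTICLES = json.load(f)

@mcp.tool()
def search_blog(query: str, top_k: int = 5) -> list[dict]:
    """TF-IDF full-text search across all published articles."""
    ...  # score ARTICLES against query, return the top_k hits

@mcp.tool()
def get_article(slug: str) -> dict:
    """Return the full text of an article by its slug."""
    ...  # look up slug in ARTICLES

@mcp.tool()
def diagnose_sglang(symptom: str) -> list[str]:
    """Apply seven rules to self-diagnose SGLang configuration issues."""
    ...  # match symptom against the rule table

# Exposed as an ASGI app so uvicorn can serve it (Streamable HTTP).
app = mcp.streamable_http_app()
```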
Running It as a System Service
Here’s the systemd unit that keeps it running:
```ini
[Unit]
Description=Sovereign AI MCP Server
After=network.target

[Service]
Type=simple
User=cipherfox
WorkingDirectory=/data/projects/sovereign-mcp
ExecStart=/data/projects/sovereign-mcp/.venv/bin/uvicorn src.main:app --host 127.0.0.1 --port 8002 --workers 1
Restart=on-failure
RestartSec=5
Environment=PYTHONUNBUFFERED=1

[Install]
WantedBy=multi-user.target
```

Install and enable it:

```bash
sudo cp sovereign-mcp.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable sovereign-mcp.service
sudo systemctl start sovereign-mcp.service
```
Why port 8002 instead of 8001? Because Voxtral TTS uses port 8001 for its OpenAI-compatible API. Voxtral runs on-demand while Sovereign MCP runs permanently; if both claimed the same port, the always-on MCP server would block Voxtral from starting.
In practice, permanent services and on-demand services must not share ports, so 8001 stays reserved for Voxtral and Sovereign MCP moves to 8002.
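A quick way to see which local ports in that range are already taken before choosing one:

```bash
# Show listening TCP sockets in the 8000 range and the processes holding them.
ss -ltnp | grep -E ':800[0-9]'
```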
Giving the Dashboard Control Without a Password
To let the sovereign-mcp dashboard restart the service without prompting for a password, add a sudoers entry:
```bash
echo 'cipherfox ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart sovereign-mcp.service' | sudo tee /etc/sudoers.d/sovereign-mcp
sudo chmod 440 /etc/sudoers.d/sovereign-mcp
```
This is intentionally service-specific; a wildcard like `NOPASSWD: /usr/bin/systemctl restart *` would expose far more attack surface.
In practice, the dashboard can now restart the server on demand, which is useful when you push new blog content and want agents to pick up the updated knowledge base immediately.
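You can verify the rule works before wiring it into the dashboard; `sudo -n` fails instead of prompting if a password would be required:

```bash
# Restarts silently only if the NOPASSWD rule is in effect.
sudo -n systemctl restart sovereign-mcp.service && echo "NOPASSWD rule active"
```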
Hooking Up OpenClaw via HTTP
OpenClaw supports MCP servers natively via HTTP (`url` field) and stdio (`command` + `args`). Here’s how both transports look in practice; the second entry is a separate stdio-based knowledge server, shown for contrast:

```bash
openclaw mcp set sovereign '{"url":"http://127.0.0.1:8002/self-hosted-ai"}'
openclaw mcp set knowledge '{"command":"python3","args":["/home/cipherfox/.vibe/mcp-servers/knowledge_mcp.py"]}'
openclaw mcp list
# - sovereign
# - knowledge
```

The configuration is stored in `~/.openclaw/openclaw.json` under `mcp.servers`.
In practice, OpenClaw agents can now call the Sovereign MCP tools over HTTP, which is faster and more reliable than stdio for local services.
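The resulting file looks roughly like this; the exact key layout is an assumption inferred from the `mcp.servers` path above, not a dump of the real file:

```json
{
  "mcp": {
    "servers": {
      "sovereign": { "url": "http://127.0.0.1:8002/self-hosted-ai" },
      "knowledge": {
        "command": "python3",
        "args": ["/home/cipherfox/.vibe/mcp-servers/knowledge_mcp.py"]
      }
    }
  }
}
```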
Getting Vibe to Talk to It via stdio
Vibe only speaks stdio, so we wrap the FastMCP server in a stdio-compatible script:
```bash
#!/bin/bash
# ~/.vibe/mcp-servers/sovereign_mcp.sh
cd /data/projects/sovereign-mcp
exec .venv/bin/python3 -c "from src.main import mcp; mcp.run(transport='stdio')"
```
The trick is reusing the same `mcp = FastMCP(...)` instance from `src.main` and starting it in stdio mode. There is no second server process; it’s the same code running under a different transport.
Vibe’s config points to this wrapper:
```toml
# ~/.vibe/config.toml
[[mcp_servers]]
name = "sovereign"
transport = "stdio"
command = "/home/cipherfox/.vibe/mcp-servers/sovereign_mcp.sh"
args = []
```
Test it with:

```bash
echo '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"test","version":"0"}}}' \
  | timeout 5 ~/.vibe/mcp-servers/sovereign_mcp.sh 2>/dev/null | head -1
# → {"jsonrpc":"2.0","id":1,"result":{...,"serverInfo":{"name":"sovereign-ai-blog",...}}}
```
In practice, Vibe agents can now use the same knowledge base and diagnostics without any extra infrastructure.
Keeping the Knowledge Base Fresh
The knowledge base is stored in `data/knowledge-base.json` and generated by the sovereign-blog project:

```bash
cd /data/projects/sovereign-blog
python3 scripts/generate_knowledge_base.py   # after each build
cp public/knowledge-base.json /data/projects/sovereign-mcp/data/
sudo systemctl restart sovereign-mcp.service # loads the new KB at startup
```
Right now it’s manual. The plan is to add a post-build hook in master.py that copies the file and restarts MCP automatically.
In practice, you only need to run two commands after publishing a new post to make it searchable by agents.
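A sketch of what that post-build hook could look like; the function name and its call site in master.py are hypothetical, so adapt both to the real build script:

```python
# Hypothetical post-build hook for master.py: copy the fresh KB and
# restart the MCP service so agents see new posts immediately.
import shutil
import subprocess

def refresh_mcp_knowledge_base() -> None:
    shutil.copy(
        "/data/projects/sovereign-blog/public/knowledge-base.json",
        "/data/projects/sovereign-mcp/data/knowledge-base.json",
    )
    # Relies on the service-specific NOPASSWD sudoers rule from earlier.
    subprocess.run(
        ["sudo", "systemctl", "restart", "sovereign-mcp.service"],
        check=True,
    )
```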
Performance Numbers You Can Trust
With 42 articles in the knowledge base:
| Operation | Latency |
|---|---|
| `/health` GET | <5ms |
| `tools/list` | <10ms |
| `search_blog` (TF-IDF, top 5) | 30 to 50ms |
| `get_article` (slug lookup) | <10ms |
| `diagnose_sglang` (7 rules) | <10ms |
Single-process, single-worker, no caching needed at this scale. Query latency scales linearly with knowledge-base size because every search scores the full corpus; the index lives only in memory and is rebuilt at startup, never persisted. Beyond 500 articles, you’ll want to persist the index or switch to a ranking backend like BM25 or Whoosh.
In practice, these latencies mean agents can query your blog in real time without noticeable delays.
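If you do hit that scale, precomputing the document matrix once at startup is the cheapest fix. Here is a sketch using scikit-learn; this is an assumption for illustration, as the server’s actual TF-IDF implementation may be hand-rolled and differ in detail:

```python
# Sketch: build the TF-IDF matrix once at startup instead of scoring
# the raw corpus per query. Assumes scikit-learn is available.
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

with open("data/knowledge-base.json") as f:
    articles = json.load(f)

vectorizer = TfidfVectorizer(stop_words="english")
doc_matrix = vectorizer.fit_transform([a["body"] for a in articles])  # one-time cost

def search(query: str, top_k: int = 5) -> list[dict]:
    scores = cosine_similarity(vectorizer.transform([query]), doc_matrix)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [articles[i] for i in ranked if scores[i] > 0]
```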
What I Actually Use
- Mistral Small 4: my go-to local model for most tasks because it balances speed and quality well.
- OpenClaw: the agent framework that lets me plug in local MCP servers without rewriting integrations.
- Vibe: my daily driver for quick experiments and iterative development.
Streamable HTTP vs Stdio: the choice nobody documents
The MCP spec offers two transports for agent-server communication: Stdio (the agent spawns the server as a subprocess and pipes JSON-RPC over stdin/stdout) and Streamable HTTP (the server runs as a long-lived process and accepts HTTP POST per tool call). For a hosted server reachable from many agents, Streamable HTTP is the only viable choice. Stdio assumes the agent owns the server process lifecycle, which falls apart the moment two agents want to query the same corpus or the server outlives the agent session.
Streamable HTTP also lets Caddy do TLS termination, lets Prometheus scrape the FastMCP /metrics endpoint, and lets the existing reverse-proxy logging pipeline collect per-tool-call telemetry without server-side changes. Stdio gives you none of that. The penalty for Streamable HTTP is one network hop of latency per tool call, which on a localhost or LAN setup is in the noise (under one millisecond) and on a public endpoint adds the round-trip time the agent’s user is already used to.
If you are building a personal-only MCP that one agent on one machine will ever touch, Stdio is fine and simpler. Anything else, default to Streamable HTTP.
FastMCP 1.27 is a hard cut from the legacy SDK
If you have read older MCP examples that use `from mcp.server import Server` and pass `InitializationOptions`, that is the legacy `mcp` SDK from late 2024. FastMCP 1.x is a different package with a different surface. The migration is mostly mechanical (decorators replace the imperative tool registration), but the Pydantic-based input/output schemas are the load-bearing change: tool inputs and outputs are now declared as Pydantic models, the JSON Schema is generated automatically, and the MCP Inspector reads them back cleanly without manual schema authoring. The four-line refactor that Smithery’s quality scorer rewards in the 100/100 post was exactly this: replace hand-written `inputSchema` dicts with typed Pydantic models and ship the same tools with measurably better introspection. If you are starting fresh, start on FastMCP 1.27 and ignore the older SDK skeletons.
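The refactor looks roughly like this; a hedged sketch with illustrative names, not the exact diff from that post:

```python
# FastMCP style: declare inputs as a Pydantic model and the JSON Schema
# is derived automatically, versus hand-writing an inputSchema dict in
# the legacy SDK. Names here are illustrative.
from pydantic import BaseModel, Field
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example")

class SearchInput(BaseModel):
    query: str = Field(description="Full-text query against the blog corpus")
    top_k: int = Field(default=5, ge=1, le=20)

@mcp.tool()
def search_blog(params: SearchInput) -> list[dict]:
    """Schema, field types, and descriptions come from SearchInput."""
    ...  # tool body elided
```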
What happened next
This post documented the build. Two follow-ups close the arc:
- How we hit 100/100 on Smithery (and what the score actually measures), the listing/scoring story across Smithery, Glama, and the awesome-mcp PR.
- Why the Sovereign AI Blog MCP is mostly redundant today, the honest MVP/POC follow-up about when (and when not) to actually install this server.
*Diagram: Sovereign MCP Server, local AI knowledge integration architecture*