Each number on the live Insights page has a formula, a business meaning, and a vanity-trap. If you are running a DGX Spark as the engine of a small AI service, here is how to read the dashboard daily without chasing growth-theatre, and which two metrics are the only ones worth waking up to check.

How to Read the Insights Dashboard for a DGX-Spark Business, Not a Hobby Blog

The Insights page on this site is intentionally small. Four NSM cards across the top, six content KPIs underneath, no charts, no fancy widgets, no JavaScript pixels. Everything is computed from Caddy access logs. Everything is one Python function deep.

That austerity is the point. If you are running a DGX Spark as the engine of an AI service that needs to find product-market-fit in the agent-discoverability era, what you measure decides what you build. A dashboard that flatters you with vanity numbers will steer you into hobby-blog territory inside a quarter. This article is the deep-dive companion to that page: each metric defined, the formula behind it, the business decision it should influence, and the trap that the metric itself sets if you let it.

The audience here is not “blog operator”. It is “DGX-Spark-as-infrastructure operator who is trying to discover whether their MCP server is on the path to being a product”.

If you only check one thing per day

Open /insights/. Compare today’s “MCP tool calls” to yesterday’s. If it moved more than 50%, check “Top distinct agents” in the raw API at /api/nsm-stats.json. If one IP-block is responsible for most of the move, the metric moved; your reach did not. Close the tab.
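A minimal sketch of that check as a script. The host is a placeholder and the field names (tool_calls_daily, top_agents, ip_prefix, share) are hypothetical, not the real /api/nsm-stats.json schema:

import json
import urllib.request

STATS_URL = "https://example.com/api/nsm-stats.json"  # placeholder host; the path is from the article

with urllib.request.urlopen(STATS_URL) as resp:
    stats = json.load(resp)

# "tool_calls_daily" and "top_agents" are hypothetical field names
today, yesterday = stats["tool_calls_daily"][-1], stats["tool_calls_daily"][-2]
delta = (today - yesterday) / max(yesterday, 1)

if abs(delta) > 0.5:
    top = stats["top_agents"][0]
    print(f"moved {delta:+.0%}; top /24 {top['ip_prefix']} holds {top['share']:.0%} of calls, reach probably unchanged")
else:
    print(f"moved {delta:+.0%}; nothing to do, close the tab")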

That is the entire daily ritual. Everything below explains why.

The fundamental question

Before any specific metric, you need an honest answer to this: what are you actually trying to learn from these numbers?

Self-hosted AI as a business has three plausible revenue paths today: V4V tipping (Lightning, no rent-seeking), affiliate links (referral commissions, only meaningful with audience), and paid MCP access (L402 pay-per-call, gated tool quotas). All three depend on the same prior: AI agents finding your service useful enough to come back, and humans behind those agents (or the agents’ own logic) eventually triggering a settlement.

You do not know yet which signal predicts that. That ignorance is normal at the pre-distribution stage. What you can do is measure the closest leading indicator to “agent found this useful and acted on it”. Then watch which sub-metric moves first when something actually changes. The point of the dashboard is not to be impressed by today’s numbers. It is to make tomorrow’s surprise small.

The Primary NSM: MCP tool calls

Formula (in pseudo-Python, from nsm-aggregate-floki.py):

tool_calls = sum(
    1 for e in mcp_external
    if e.method == "POST" and e.path in ("/mcp", "/self-hosted-ai")
)
# where mcp_external = [e for e in mcp_log if not is_internal_ua(e.user_agent)]

What it measures. Every time an external AI agent successfully invokes a tool on the MCP server (search_blog, get_article, list_tags, diagnose_sglang), one POST hits one of these two endpoints. Calls from your own personas (cipherfox, hexabella, claude-code, vibe, openclaw) are filtered out via User-Agent prefix match and counted separately as internal_agent_calls. Calls from your own home-DSL range (env-configurable, default 91.59.0.0/16) are also broken out as self_traffic_tool_calls. The headline number reflects external reach, not your own dogfooding.
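A minimal sketch of the two filters, assuming prefix matching on the persona UAs named above and an environment variable for the home range (the variable name SELF_IP_RANGE is hypothetical; the article only says the range is env-configurable):

import os
import ipaddress

# persona UAs from the article, matched as User-Agent prefixes
INTERNAL_UA_PREFIXES = ("cipherfox", "hexabella", "claude-code", "vibe", "openclaw")
# SELF_IP_RANGE is a hypothetical variable name
SELF_RANGE = ipaddress.ip_network(os.environ.get("SELF_IP_RANGE", "91.59.0.0/16"))

def is_internal_ua(user_agent: str) -> bool:
    # own personas are split out into internal_agent_calls, not the headline
    return user_agent.lower().startswith(INTERNAL_UA_PREFIXES)

def is_self_ip(client_ip: str) -> bool:
    # home-DSL traffic is split out into self_traffic_tool_calls
    try:
        return ipaddress.ip_address(client_ip) in SELF_RANGE
    except ValueError:
        return False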

Why this is the NSM. It is the closest log-line you can capture to “an agent decided to act based on something my service offered”. Page-loads do not require action. A crawler can fetch every URL on the site without making a decision. A tool call costs the agent something: the agent has to know your MCP exists, has to have it registered, has to compose a JSON-RPC payload, has to wait for the response, has to parse it. Each tool call is an agent voting with its compute budget.

Healthy ranges by stage (operator-calibrated heuristics, not industry benchmarks; recalibrate to your own context after two weeks of data):

The trap. Tool-call volume is exquisitely sensitive to a single heavy client. One automation script in a single AWS Lambda can put 10,000 calls a month through your MCP without anyone reading the result. The number goes up, your service feels found, your sense of reach is wrong. The defense is the next metric.

Distinct Agents: the de-vanitized version

Formula:

agent_fingerprint = lambda e: (e.user_agent[:60], ip_prefix_24(e.client_ip))
distinct_agents_no_self = len({
    agent_fingerprint(e)
    for e in mcp_external
    if not is_self_ip(e.client_ip)
})

What it measures. External clients deduped by (User-Agent first 60 characters, IP /24-prefix). A rotating Lambda fleet behind one /24 with one UA collapses to one agent. A different bot in the same /24 with a different UA stays separate. Your own DSL range is excluded.
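A minimal sketch of the /24 helper, assuming IPv4 input (the /48 fallback for IPv6 is an assumption, not something the article specifies):

import ipaddress

def ip_prefix_24(client_ip: str) -> str:
    # one rotating fleet behind one /24 collapses to a single fingerprint component
    addr = ipaddress.ip_address(client_ip)
    prefix = 24 if addr.version == 4 else 48  # /48 for IPv6 is an assumption
    return str(ipaddress.ip_network(f"{client_ip}/{prefix}", strict=False))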

Why this matters more than raw IP count. The original aggregator counted distinct IP strings. Every rotating cloud IP became its own agent. The number was honest about what it counted, but it answered the wrong question. The dedup-and-filter version answers the question you actually have: how many distinct services have integrated my MCP enough to call it from their infrastructure?

Business decision the metric should drive:

When I checked, the dedup-and-filter pass only shaved the headline from 334 to 314, a 6% drop. The story shift was elsewhere: the top single /24 in the audit was responsible for 86% of all external hits, and once that one IP-block came into view the whole “reach” framing collapsed. 314 distinct agents was technically accurate. It implied an audience that did not exist. The full receipts are in Why 334 Unique IPs Was Really 5 Services in Trench Coats. If you have an analytics dashboard you have not audited this way recently, audit it.
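The audit itself is a few lines. A minimal sketch, reusing the mcp_external entries and ip_prefix_24 helper from the formulas above:

from collections import Counter

# share of all external hits attributable to the single busiest /24
hits_by_block = Counter(ip_prefix_24(e.client_ip) for e in mcp_external)
top_block, top_hits = hits_by_block.most_common(1)[0]
share = top_hits / sum(hits_by_block.values())
print(f"{top_block} accounts for {share:.0%} of external hits across {len(hits_by_block)} blocks")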

Rate-limit hits: paradoxical signal

Formula:

rate_limited_429 = sum(1 for e in mcp_log if e.status == 429)

What it measures. HTTP 429 responses returned by Caddy when a client exceeds the configured rate (60 requests per minute per IP on this site). This is the bot-pressure-rejection counter.

Why “higher is fine” is the right framing. A 429 is a defense working. The bot got knocked back at the boundary. The compute behind the endpoint (which is real on a DGX Spark when MCP calls hit the inference layer) was not consumed.

When to worry. Audit the top blocked IPs once a week. If a real customer (browser UA, low per-minute call volume sustained over a long session) is hitting 429s, your rate-limit is mis-calibrated. The 60/min default is generous for human use, even for tab-heavy browsing; a real customer should never see it. If they do, the cause is either a bug in the customer’s client or a too-narrow window in your limiter.

When zero is suspicious. Either you have no bot pressure (unlikely at any non-zero scale) or your rate-limiter is not configured (worse). On my site, the steady-state 429 number is in the low hundreds per month, mostly from one specific vulnerability scanner that gets dropped at the firewall before Caddy even sees it now (UFW-blocked after the second wave).
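A minimal sketch of the weekly 429 audit described above, again assuming the parsed log entries used in the formulas:

from collections import Counter

# who is actually eating the 429s, and does any of them look like a human browser?
blocked = Counter(
    (e.client_ip, e.user_agent[:40])
    for e in mcp_log
    if e.status == 429
)
for (ip, ua), n in blocked.most_common(5):
    flag = "  <- browser-like UA, check the limiter" if ua.startswith("Mozilla/") else ""
    print(f"{n:5d}  {ip:15s}  {ua}{flag}")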

Blog page-loads: deliberately NOT a KPI

Formula:

blog_page_loads = sum(
    1 for e in blog_log
    if e.path.startswith("/blog/")
    and e.path.endswith("/")
    and e.path.count("/") == 3
    and not is_bot(e.user_agent)
)

What it measures. Human-browser GETs on /blog/<slug>/. Bot UAs are filtered out via prefix and pattern match.
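A minimal sketch of the bot filter; the pattern list here is illustrative, the real prefix and pattern tables live in the aggregator:

import re

# illustrative pattern list, not the aggregator's real table
BOT_UA_RE = re.compile(r"bot|crawler|spider|curl|wget|python-requests|Go-http-client", re.IGNORECASE)

def is_bot(user_agent: str) -> bool:
    # an empty UA is treated as a bot too
    return not user_agent or bool(BOT_UA_RE.search(user_agent))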

Why deliberately not a KPI. Page-loads do not pay rent. The thesis behind the MCP server is that AI-agent-discoverable content is the path to revenue, not human page-loads to a banner-ad-free V4V tip-jar that nobody zaps anyway (also documented honestly: the zap-tracking and the blog’s Nostr account show zero zaps after 30 days).

The discipline. Do not optimize for blog page-loads. Do not write articles aimed at maximizing human time-on-page. Do not chase Hacker News spikes. They are bursts that do not compound. Optimize for: every new article should compound the surface area that an agent can discover and act on via search_blog. Every new tool on the MCP should make the existing content more useful at a higher density (get_article(slug) is more useful when there are 200 articles in the corpus than when there are 20).

If page-loads grow at 30 percent month-over-month and tool-calls do not, you are building a hobby blog, not an AI service.

Content KPIs: quality-first

The build-time half of the dashboard reports on the content corpus directly: total articles, average quality score, percent that passed the quality gate, percent with a hero image, percent rated manually, reader thumbs-up via localStorage.

Total articles is the simplest. Just count(*.md) in src/content/blog/. Don’t game it.

Average quality score is a weighted composite across 13 signals: word count, code blocks, version references, file paths, error lines, caveats, H2 count, concrete numbers, comparison terms, concrete examples, lexical diversity, filler phrases (negative weight), hedging phrases (negative weight). Each signal is regex-extractable from the article body. The weights are style-specific (best_practice_learnings, werthaltige_code_beispiele, smart_infotainment, conclusion). The full table is in config/pipeline-config.json and the rationale per signal is in the content-quality manifest evaluation.
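A minimal sketch of how such a weighted composite can be computed. The extractors and weights below are illustrative stand-ins for four of the thirteen signals; the real table is in config/pipeline-config.json:

import re

# illustrative extractors, not the real signal set
SIGNALS = {
    "word_count":       lambda body: len(body.split()),
    "h2_count":         lambda body: len(re.findall(r"^## ", body, re.MULTILINE)),
    "concrete_numbers": lambda body: len(re.findall(r"\b\d+(?:\.\d+)?\b", body)),
    "filler_phrases":   lambda body: len(re.findall(r"\b(?:in conclusion|it is important to note)\b", body, re.IGNORECASE)),
}

# illustrative per-style weights; filler carries negative weight, as described above
WEIGHTS = {"word_count": 0.005, "h2_count": 1.0, "concrete_numbers": 0.2, "filler_phrases": -3.0}

def quality_score(body: str) -> float:
    return sum(WEIGHTS[name] * extract(body) for name, extract in SIGNALS.items())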

Passed quality gate is the binary version: did the article clear score >= style_min AND word_count >= style_min_wc? If not, the article is built into the dev preview but excluded from production (src/utils/publishable.ts decides). The article still exists in the repo. It just stays invisible until it earns its slot. This caught one article on this site as recently as today. The postmortem of the unblock is in Three Self-Healing Patches in One Day, All the Same Shape.

Hero-image coverage is count(articles with /images/blog/<slug>/hero.webp) / total. Below 95 percent means the image pipeline missed articles that should not be missing. The fix is the Backfill-Heroes pipeline; see the same article for the self-healing scan that closes that gap on every Full Pipeline run.
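A minimal sketch of that coverage check; the public/ prefix for the image directory is an assumption about the repository layout:

from pathlib import Path

articles = list(Path("src/content/blog").glob("*.md"))
covered = sum(
    1 for a in articles
    if Path(f"public/images/blog/{a.stem}/hero.webp").exists()  # assumption: images live under public/
)
coverage = covered / len(articles) if articles else 0.0
print(f"hero coverage: {coverage:.0%} ({covered}/{len(articles)})")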

Images rated is the percent of articles where a manual one-to-five star rating exists in frontmatter (img_score). This is the human-quality-check on FLUX output. Mistral can auto-rate, but the manual rating is the one that decides whether an image gets regenerated.

Reader votes are stored in browser localStorage. This browser only. Not aggregated server-side. Why: the article exists to be useful to the reader in front of it; whether other readers liked it is not data I want to collect, and pretending to collect it would be tracking by another name. The thumbs-up state is for the reader to remember which articles they already validated for themselves.

How to read the dashboard at different cadences

Daily glance (under 2 minutes): open /insights/, look at “MCP tool calls” and “Distinct agents”. That is it. Other metrics do not change fast enough to need daily attention. If both numbers are moving up in parallel, today is a good day. If one is moving and the other is not, you have a question to ask later (which one moved, and what moved it?).

Weekly review (10 minutes): check rate-limit hits for surprises (a new top blocked IP, a sudden burst), reader-vote skew (which articles are getting consistent thumbs-up from your own browsing; those are the ones your subject-matter-self thinks landed), top distinct agents (run whois on the top three new ones if the top-N list changed).

Monthly review (30 minutes): quality-gate-pass-rate (is it drifting down? Then recent articles are getting thinner), hero-image coverage (any drift means image pipeline is gappy), content mix (count articles per content-type, look for over-investment in one bucket), distinct-agents trend slope over four weeks (linear, accelerating, decelerating?).
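A minimal sketch of the four-week slope check, assuming the weekly distinct-agent counts are already extracted (the numbers below are illustrative):

# four weekly distinct-agent counts, oldest first (illustrative numbers)
weeks = [41, 44, 52, 67]

deltas = [b - a for a, b in zip(weeks, weeks[1:])]        # week-over-week growth
curvature = [b - a for a, b in zip(deltas, deltas[1:])]   # is the growth itself growing?

if all(c > 0 for c in curvature):
    trend = "accelerating"
elif all(c < 0 for c in curvature):
    trend = "decelerating"
else:
    trend = "roughly linear"
print(f"weekly deltas {deltas} -> {trend}")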

Quarterly: re-evaluate which topics actually drove distinct-agents growth. Some posts pull traffic that compounds, some pull traffic that bursts and dies. The pattern is only readable at quarter-scale.

Anti-vanity-metric discipline

Four rules I try to actually follow.

If a number looks suspiciously good, audit the top contributor first. Today’s example again: 334 distinct agents was 5 services in trench coats. Without the audit I would have made decisions based on a number that did not measure what I thought it measured.

If a metric grew 10x in one week, ask “did one bot find me” before asking “did I get found”. The first cause is more common than the second at small scale. Tighten the question before tightening the rate-limit.

If a metric is flat for three months, ask “is the metric measuring something that actually changes” before asking “is my service failing”. Some metrics on small sites are below the noise floor of the underlying signal. Reader-votes on this site is the obvious example.

If you build a new metric, define it in one Python function with a formula in the comment, deploy the function with no GUI flourish, and let it run for a month before touching it. Premature visualization is a form of premature optimization.

The compound lesson

Caddy access logs are the source-of-truth on this site. Every NSM and every content KPI is computed from those logs (NSM) or from src/content/blog/*.md frontmatter (build-time). There are no JavaScript pixels, no analytics SDKs, no Google whatever, no Cloudflare Insights. Every metric is auditable by reading one Python file.

That austerity is a competitive advantage, not a feature absence. The reason your own metrics looked suspiciously good in the past is that they were measured by software that wanted them to look good. When you measure your own numbers in your own one-line Python you cannot lie to yourself for long.

The honest version of the dashboard is smaller than the dishonest version. That is by design.

Try the live POC, then read the architectural critique

The MCP server that produces most of the numbers above is not just instrumentation. It is the proof-of-concept of the whole thesis: that small, well-defined tools on a self-hosted MCP server are the right entry point for self-hosted AI businesses, before any of the L402 / paid-tier infrastructure is needed.

You can try the same tools an external AI agent would call:
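For example, invoking the search tool over the MCP endpoint looks roughly like this. The tools/call shape follows the MCP JSON-RPC spec, but the host is a placeholder and the "query" argument name and the omission of the initialize handshake are assumptions about this particular server:

import json
import urllib.request

# hypothetical tools/call; argument names and missing handshake are assumptions
payload = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "search_blog", "arguments": {"query": "insights dashboard"}},
}
req = urllib.request.Request(
    "https://example.com/mcp",  # placeholder host; /mcp is one of the two endpoints the NSM counts
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json", "Accept": "application/json, text/event-stream"},
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode())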

If you are building from a DGX as your infrastructure spine, the search MCP server is the smallest concrete shape of the whole business plan. The Insights page is how you watch it grow without lying to yourself about the growth. Every other dashboard you have ever seen makes the opposite trade. Pick the one that lets you sleep.

Illustration: How to Read the Insights Dashboard for a DGX-Spark Business, Not a Hobby Blog