Why 334 Unique IPs Was Really 5 Services in Trench Coats
My MCP-server Insights page had been showing 334 unique agents and 3253 tool calls in a rolling 30-day window. That sounds like reach. I had been quoting the number internally as “external discovery is working”. (The architectural argument behind every number on that page lives in the companion deep-dive How to Read the Insights Dashboard for a DGX-Spark Business. The story below is the receipts for one of its claims: that the “Unique agents” tile was lying to me.)
If you run any service that counts “unique users” or “unique IPs”, open your access log right now. One line:
awk '{print $1}' access.log | cut -d. -f1-3 | sort | uniq -c | sort -rn | head -10
That is the top 10 /24 ranges by hit count, ignoring User-Agent. If one /24 is responsible for more than 30% of your traffic, your “unique users” headline is being dominated by one client. Read on for what to do about it that is not “block them” (usually wrong) and not “ignore it” (also wrong).
I patched the aggregator to dedupe agents by (User-Agent, IP /24). One change. The number dropped to 318, then to 314 after self-traffic was filtered. More importantly, the top contributors became visible:
2860 hits 152.233.42.0/24 node # 86% of all external hits
2465 hits 91.59.53.0/24 Chrome # operator's home DSL, now excluded
355 hits 152.236.8.0/24 node
162 hits 204.93.227.0/24 node
162 hits 216.246.40.0/24 node
108 hits 64.34.84.0/24 node
Five of the top six are Node.js HTTP clients in US data centers. Whois on the leading one:
$ whois 152.233.42.201
inetnum: 152.233.0.0 - 152.233.127.255
hostname: unn-152-233-42-201.datapacket.com
org: AS60068 Datacamp Limited
city: Ashburn, Virginia
DataPacket is a cloud-hosting reseller. Their address space backs NordVPN exit-nodes, Smartproxy services, and a long list of self-hosted automation setups. The node user-agent rules out browsers. This is one automated client (or one Lambda fleet behind a static IP block) making roughly 95 calls a day to my MCP server. For 30 days. From a single /24.
I was reporting that as 334 distinct agents.
What the original aggregator counted
The pre-patch version of nsm-aggregate.py did this:
"unique_ips": len({remote_ip(e) for e in mcp_external if remote_ip(e)}),
Set-of-strings on the raw client IP. Every distinct IP equals one distinct agent. Two consequences:
- Rotating cloud-IP services inflate the number. A Lambda fleet rotating through 50 IPs in the same /24 shows up as 50 agents. Same code, same UA, same operator, counted fifty times.
- The operator’s own DSL inflates it too. My home Telekom range (91.59.0.0/16) rotates internally over the months. The aggregator had no way to recognize my own traffic as a single “agent”, or, ideally, to exclude it entirely from the external-reach number.
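To make the inflation concrete, here is a minimal, self-contained sketch (the IPs are invented for illustration) of how a 50-IP rotating fleet in one /24 counts under the old set-of-strings approach versus prefix grouping:

```python
import ipaddress

# A simulated Lambda fleet: same operator, same UA, 50 rotating IPs in one /24.
fleet_ips = [f"152.233.42.{host}" for host in range(1, 51)]

# Old counting: set of raw IP strings -> 50 "agents".
raw_count = len(set(fleet_ips))

# Grouped counting: collapse each IP to its /24 network -> 1 "agent".
prefixes = {str(ipaddress.ip_network(f"{ip}/24", strict=False)) for ip in fleet_ips}

print(raw_count)      # 50
print(len(prefixes))  # 1
print(prefixes)       # {'152.233.42.0/24'}
```

Same code, same operator, a 50x difference in the headline number depending on which key you put in the set.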
Before the patch, my self-traffic accounted for about 31% of MCP tool calls. Stopping a /health-polling loop on my end (after a separate audit confirmed I was the source) brought that to under 1% within a day. But the headline metric still treated those calls as external discovery.
The patch
Two helpers and one substitution. Helpers first:
import ipaddress, os

SELF_NETS = [
    ipaddress.ip_network(s.strip(), strict=False)
    for s in os.environ.get("NSM_SELF_IPS", "91.59.0.0/16").split(",")
    if s.strip()
]

def is_self_ip(ip_str: str) -> bool:
    try:
        return any(ipaddress.ip_address(ip_str) in net for net in SELF_NETS)
    except ValueError:
        return False

def ip_prefix_24(ip_str: str) -> str:
    """/24 for IPv4, /64 for IPv6. Groups rotating cloud-IPs as one agent."""
    try:
        ip = ipaddress.ip_address(ip_str)
        prefix = 24 if isinstance(ip, ipaddress.IPv4Address) else 64
        return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    except ValueError:
        return ip_str
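A quick sanity check of how the prefix helper behaves on IPv4, IPv6, and garbage input (the helper is repeated here verbatim so the snippet runs standalone; the addresses are examples):

```python
import ipaddress

def ip_prefix_24(ip_str: str) -> str:
    # Same helper as above, repeated so this snippet is self-contained.
    try:
        ip = ipaddress.ip_address(ip_str)
        prefix = 24 if isinstance(ip, ipaddress.IPv4Address) else 64
        return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False))
    except ValueError:
        return ip_str

print(ip_prefix_24("152.233.42.201"))      # 152.233.42.0/24
print(ip_prefix_24("2a01:4f8:c010:1::1"))  # 2a01:4f8:c010:1::/64
print(ip_prefix_24("not-an-ip"))           # not-an-ip (passed through unchanged)
```

The ValueError fallback matters: a malformed remote-IP field degrades to raw-string counting for that entry instead of crashing the whole aggregation run.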
Then the aggregation:
from collections import Counter

agent_fingerprints = set()
agent_fp_no_self = set()
agent_fp_counter = Counter()

for e in mcp_external:
    ip = remote_ip(e)
    if not ip:
        continue
    fp = (user_agent(e)[:60], ip_prefix_24(ip))
    agent_fingerprints.add(fp)
    agent_fp_counter[fp] += 1
    if not is_self_ip(ip):
        agent_fp_no_self.add(fp)
fp is a tuple of (UA-first-60-chars, IP-/24-prefix). A rotating Lambda fleet collapses to one entry. A different bot in the same /24 but with a different UA stays separate. My own DSL gets filtered out of the external count.
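The collapse-versus-separate behavior is easy to verify in isolation. This sketch (UAs and IPs invented) feeds three synthetic hits through the same fingerprinting logic:

```python
import ipaddress
from collections import Counter

def fp(ua: str, ip: str) -> tuple:
    # Mirror of the aggregator's fingerprint: (UA prefix, /24 network).
    net = str(ipaddress.ip_network(f"{ip}/24", strict=False))
    return (ua[:60], net)

hits = [
    ("node", "152.233.42.10"),        # rotating fleet, IP 1
    ("node", "152.233.42.77"),        # rotating fleet, IP 2 -> same fingerprint
    ("curl/8.5.0", "152.233.42.99"),  # different UA, same /24 -> stays separate
]

counter = Counter(fp(ua, ip) for ua, ip in hits)
print(len(counter))                          # 2 distinct agents
print(counter[("node", "152.233.42.0/24")])  # 2 hits from the fleet
```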
I also exposed the top contributors so the totals stay auditable:
top_distinct_agents = [
    {
        "ua": (ua or "<empty>")[:50],
        "ip_prefix": ip_prefix,
        "hits": cnt,
        "is_self": is_self_ip(ip_prefix.split("/")[0]),
    }
    for (ua, ip_prefix), cnt in agent_fp_counter.most_common(20)
]
The Insights page now renders distinct_agents_no_self as the headline metric, with raw unique-IPs as a sub-line for back-compat readers. The full top_distinct_agents list is in the raw API at /api/nsm-stats.json for anyone who wants to inspect.
Why I’m not blocking the top bot
152.233.42.0/24 makes about 95 calls per day. My rate limit is 60 per minute per IP. Averaged over a day, the bot sits orders of magnitude under that ceiling. Tightening the rate limit by any reasonable factor would not constrain this client and would knock out Smithery and Glama discovery probes, which run at 3 to 5 per hour each. Those probes are my distribution channel. Killing them to block one indifferent scraper would be self-injury.
The content the bot reads is also already public. curl https://sovgrid.org/blog/<slug>/ returns the same article body that search_blog() and get_article() return through MCP. There is no information leak to plug. The MCP server is a more structured way to access the same content the website serves anyone with a browser. You can try the same search_blog tool an AI agent would call: /search runs it live in the page, against the same indexed corpus, with no auth wall. The architectural reasoning behind why this small search-server is the right MCP proof-of-concept (and why it is honest about being mostly redundant today) is in The Sovereign AI Blog MCP Is Mostly Redundant Today, And That Will Change and the strategy companion Why a Self-Hosted Blog Search Is the Right MCP Proof-of-Concept.
What I lose to this bot today is bandwidth and a slot in the headline number. The bandwidth is negligible (sub-millisecond responses on a 1 Mbps-capable VPS). The headline number was wrong anyway. The patch above made it correct.
What I cannot recover from this bot is a V4V tip, an affiliate click, a Lightning channel, a newsletter signup. The bot is not a customer today. It might be a customer tomorrow if I run a self-hosted L402 tier (pay-per-call Lightning HTTP-402 metering), which is on the roadmap for Q3 2026. Until then it costs me nothing and signals nothing.
Early warning instead of rate-tightening
The real risk isn’t this bot. It’s a bot like this that suddenly does 10,000 calls in an hour because someone forked a script and forgot the rate-limit. That’s the case where blocking matters.
So instead of constraining the average case, I added an anomaly watcher that runs every hour:
# Reads /api/nsm-stats.json, tracks per-/24 hit counts in a state file.
# Threshold: delta >= 500 hits AND rate >= 100/h since last run.
# Triggers a matrix-room push via the existing notify-matrix.sh.
if delta >= MIN_DELTA and rate >= MIN_RATE:
    alerts.append((prefix, ua, delta, rate, hits))
Smithery and Glama probes stay well below the threshold. My self-IPs are excluded. A real scrape-burst (1000+ calls in 1-2 hours) lights up Matrix immediately, with the IP, the UA, the rate, and a one-line UFW-block suggestion. The first action is always a human decision.
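For reference, a stripped-down sketch of the watcher's core logic with the thresholds from above. The function signature, field names, and the sample numbers are invented for illustration; the real script also persists state between runs and does the Matrix push:

```python
MIN_DELTA, MIN_RATE = 500, 100.0  # hits, and hits per hour

def find_bursts(stats: dict, prev: dict, now: float) -> list:
    """Compare the current top_distinct_agents list against the previous
    run's per-prefix hit counts; return (prefix, ua, delta, rate, hits)
    for anything over BOTH thresholds. Self-ranges never alert."""
    hours = max((now - prev["ts"]) / 3600, 1e-6)  # guard against zero interval
    alerts = []
    for agent in stats["top_distinct_agents"]:
        if agent["is_self"]:
            continue
        delta = agent["hits"] - prev["hits"].get(agent["ip_prefix"], 0)
        rate = delta / hours
        if delta >= MIN_DELTA and rate >= MIN_RATE:
            alerts.append((agent["ip_prefix"], agent["ua"], delta, rate, agent["hits"]))
    return alerts

# A forked-script burst: 1200 new hits in one hour trips both thresholds.
stats = {"top_distinct_agents": [
    {"ua": "node", "ip_prefix": "152.233.42.0/24", "hits": 4060, "is_self": False},
    {"ua": "Glama", "ip_prefix": "204.93.227.0/24", "hits": 170, "is_self": False},
]}
prev = {"ts": 0.0, "hits": {"152.233.42.0/24": 2860, "204.93.227.0/24": 165}}
print(find_bursts(stats, prev, now=3600.0))
# [('152.233.42.0/24', 'node', 1200, 1200.0, 4060)]
```

Requiring both an absolute delta and a rate keeps slow-but-steady clients (and hourly discovery probes) quiet while still catching a one-hour scrape burst.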
Lesson
The aggregator was not lying. It counted exactly what it said it counted: distinct remote-IP strings. It was answering the wrong question.
I was using the number as a proxy for “how many distinct services have discovered my MCP.” The right way to count that is (UA, network-prefix) tuples, with my own ranges excluded. The dedupe is one line of Python. The self-filter is an env-var. The difference between the two answers was 86% concentrated in one /24.
The pattern is general. If a headline metric on your dashboard is going up faster than your subjective sense of reach, audit the top contributor. If a single client is doing 80% of the hits, your number is measuring the client, not the audience.
Five services in trench coats can dress up as 334 agents if you count them by their IP-pant-leg instead of their face. Now the number on my page is smaller and the story is truer.
What I’m watching for next
A few failure modes the new aggregator still cannot catch, ranked by how much they would mislead the next person reading my Insights page:
- One operator behind multiple /24 ranges across multiple cloud providers. A determined scraper distributes its calls across AWS, GCP, and Azure /24 prefixes. The aggregator sees ten “distinct” agents where there is one. ASN-level grouping would catch most of this, and the next iteration of the script will pull AS-numbers from a Team Cymru lookup when the daily cron runs.
- One human visitor through a privacy proxy whose exit-node rotates. Apple Private Relay, NordVPN, and similar services each rotate /24s aggressively. The aggregator counts each exit-IP separately. This direction biases the number upward, which is the safer error for a “reach” metric, but worth knowing.
- Bot networks that share a UA string. Two unrelated services both shipping User-Agent: node from completely different /24s correctly count as two distinct agents in my current dedupe. That is the right answer for “how many distinct services”. If I ever want “how many distinct operators” I would need to look at request-timing fingerprints, which is its own research problem.
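The ASN grouping from the first bullet is mostly plumbing. Team Cymru's IP-to-ASN service answers TXT queries against reversed octets under origin.asn.cymru.com; a sketch of the query-name builder and response parser is below. The actual DNS lookup would go through dnspython or dig in the daily cron, and the record values in the docstring are illustrative, not real lookup results:

```python
def cymru_query_name(ipv4: str) -> str:
    """DNS name for Team Cymru's IP-to-ASN mapping: octets reversed,
    under origin.asn.cymru.com, queried as a TXT record."""
    return ".".join(reversed(ipv4.split("."))) + ".origin.asn.cymru.com"

def parse_cymru_txt(txt: str) -> dict:
    """Parse the pipe-separated TXT payload, shaped like
    'ASN | prefix | country | registry | allocated'.
    Some answers list several ASNs space-separated; keep the first."""
    asn, prefix, cc, registry, allocated = [f.strip() for f in txt.split("|")]
    return {"asn": int(asn.split()[0]), "prefix": prefix, "cc": cc,
            "registry": registry, "allocated": allocated}

print(cymru_query_name("152.233.42.201"))
# 201.42.233.152.origin.asn.cymru.com
info = parse_cymru_txt("60068 | 152.233.0.0/17 | GB | ripencc | 2015-01-01")
print(info["asn"])  # 60068
```

With the ASN as the grouping key instead of the /24, ten “distinct” prefixes rented from one hosting reseller collapse back into one agent.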
The patch is not the final word. It is a strictly more honest answer than the previous one. The next ratchet up in honesty (ASN grouping) is queued. Everything below that is asking the data for more certainty than the access log can provide.