Caddy + Cloudflare Tunnel: The Reliability Pattern

May 21, 2026 12 min read

Status note (2026-05-24): The sovgrid.org stack migrated away from Cloudflare Tunnel. The current production edge is Caddy + Let’s Encrypt direct on FlokiNET (affiliate link: you support sovgrid at no extra cost to you; see /support), with no Cloudflare in the path. This article documents the prior pattern and the reasoning behind both the adoption and the retirement. The migration path described in the final section is what was actually executed.

The pattern: Caddy as the reverse proxy on a small VPS, Cloudflare Tunnel as the secure pipe between Cloudflare’s edge and that VPS, a private Tailscale link from the VPS to the home Spark, and no inbound ports open at the residential ISP. The trade is that Cloudflare’s edge handles the TLS handshake and absorbs the DDoS-class abuse you cannot economically absorb yourself. That trade is named, the migration path exists, and the rented dimension is acceptable for the current threat model.

Quick Take

Architecture: Caddy on Floki VPS as the public-facing reverse proxy. Cloudflare Tunnel daemon runs on the Floki VPS, registering an outbound connection to Cloudflare’s edge. No inbound ports on the residential ISP.

Why this works: Cloudflare absorbs the DDoS-class abuse, Caddy handles the host- and path-level routing and serves the static site, and the Spark serves the dynamic backends over a private Tailscale link.

Why this is sovereign-enough: Cloudflare is a named, rented dimension, and the migration path away from it exists. The tunnel leg is encrypted in transit, but Cloudflare terminates TLS at its edge, so it sees the metadata (which domain is being served, URLs, headers) and is technically able to read the content as well. For a public blog that exposure is acceptable.

The latency cost: ~50 to 150 ms of added round-trip vs direct connection. For static content with Cloudflare’s edge cache, the cache hit reduces this; for dynamic content the cost is real.

The migration trigger: if Cloudflare changes its terms unacceptably, or if a customer requires “no Cloudflare in the path,” switch to Caddy direct on a hardened VPS with port 443 exposed and your own DDoS mitigation.

Why Caddy instead of nginx

Caddy handles TLS certificate provisioning automatically via ACME, which is why there is no cron job, no certbot script, and no certificate expiry event waiting to surprise you at 02:00. nginx requires you to wire certbot (or an equivalent) and manage renewal separately. In practice, certificate expiry is one of the most common causes of unplanned outages for self-hosted sites. That operational burden is the reason Caddy is the right choice for a one-operator stack.

Caddy’s configuration language is also significantly shorter for the common reverse-proxy case. The Caddyfile shown below is roughly 20 lines. The equivalent nginx config with TLS, headers, and logging is closer to 60-80 lines, each line a potential misconfiguration. For a solo operator the shorter config surface means fewer places to get something wrong.

The specific tradeoff: nginx has more tunable knobs for high-throughput production deployments. Caddy does not yet match nginx on raw performance at 10,000+ requests per second. For sovgrid.org’s traffic profile, which peaks around 1,500 requests per day on a busy article, nginx’s performance ceiling is irrelevant, which is why it does not appear in this stack.
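To put that gap in numbers, here is a back-of-the-envelope sketch in Python. The two figures are the ones quoted above; the assumption that traffic is spread evenly over the day is mine, and real traffic is burstier.

```python
# Compare sovgrid.org's busy-day traffic with the request rate at which
# nginx's extra tuning headroom is said to start mattering.
SECONDS_PER_DAY = 24 * 60 * 60

peak_requests_per_day = 1_500   # busy-article peak quoted above
nginx_relevant_rps = 10_000     # threshold quoted above

# Assumption: requests are spread evenly across the day.
average_rps = peak_requests_per_day / SECONDS_PER_DAY
headroom = nginx_relevant_rps / average_rps

print(f"average load: {average_rps:.4f} requests/second")
print(f"headroom before the 10,000 rps regime: {headroom:,.0f}x")
```

Even allowing for bursts that are a hundred times the daily average, the load stays several orders of magnitude below the range where the choice of proxy would show up in measurements.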

Why not just expose the Spark directly

The naive alternative is to expose port 443 on the residential router, point sovgrid.org at the home IP, and serve traffic directly from the Spark. This works in the happy case and breaks in three predictable ways.

Residential ISPs prohibit it. Most consumer ISPs prohibit running public services on residential connections in their terms of service. The prohibition is rarely enforced for low-traffic sites but is enforced when traffic spikes. A successful blog post that gets a wave of visitors can trigger ISP action that takes you offline.

The home IP is unstable. Even with a static-IP residential service, the address can change without warning during ISP infrastructure events. DNS update lag can then put you offline for anywhere from minutes to hours. In my case the German residential connection had three address changes in a single month during ISP infrastructure maintenance, each with a 15-30 minute window before a DNS update propagated.

DDoS-class abuse is unmitigated. A residential connection has no DDoS protection. A single abusive actor with a botnet can take you offline whenever, and for as long as, they choose. Hosting a service that anyone can reach without an intermediate that can absorb abuse is operationally untenable in 2026.

The Cloudflare Tunnel pattern fixes all three. The home connection makes outbound connections only (which residential ISPs allow). The public address is Cloudflare’s anycast network (which is stable and DDoS-protected). The abuse is absorbed before it reaches the home connection.

The architecture

Five steps in the path from user to content and back:

User’s browser makes a request to sovgrid.org.
Cloudflare’s edge receives the request, terminates TLS at the edge, applies caching rules, and routes the request through Cloudflare Tunnel.
Cloudflare Tunnel daemon (running on the Floki VPS) receives the request from Cloudflare and forwards it to Caddy.
Caddy on Floki VPS routes the request by path/host to either the static site content (served from local disk) or to the backend MCP server (proxied to the Spark over a private Tailscale connection).
Backend services on the Spark or on Floki itself produce the response, which travels back through Caddy, the Cloudflare Tunnel, and Cloudflare’s edge to the user’s browser.

The Spark itself does not have a public IP. Outbound connections only. The Tunnel daemon registers with Cloudflare and maintains a persistent outbound connection that Cloudflare can route through.

The Caddy configuration

Caddy on the Floki VPS handles the host-based routing. The Caddyfile:

sovgrid.org, www.sovgrid.org {
    root * /srv/sovgrid
    encode gzip zstd
    file_server

    @api path /api/* /mcp/*
    reverse_proxy @api spark.tailnet.local:3000

    log {
        output file /var/log/caddy/sovgrid-access.log
        format json
    }
}

mcp.sovgrid.org {
    reverse_proxy spark.tailnet.local:8888
    log {
        output file /var/log/caddy/mcp-access.log
        format json
    }
}

Key points:

The static site is served from local disk on Floki, not from the Spark. Static content is cached at the edge by Cloudflare; only cache misses traverse the tunnel.
The MCP server is proxied over Tailscale to the Spark. The Tailscale connection is encrypted independent of the Cloudflare path.
Path-based routing splits the API/MCP calls from the static-content calls within sovgrid.org. Access logs are separate per site block, so sovgrid.org and mcp.sovgrid.org each write to their own file.
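The routing decisions in that Caddyfile can be modeled in a few lines of Python. This is an illustrative sketch of the decision logic only, not Caddy code; the function name and the returned strings are made up for the example, while hosts, paths, and upstreams are copied from the Caddyfile.

```python
# Illustrative model of which handler the Caddyfile above picks for a request.
API_PREFIXES = ("/api/", "/mcp/")   # the @api matcher: /api/* and /mcp/*

def route(host: str, path: str) -> str:
    """Return a label for the handler that would serve this request."""
    if host == "mcp.sovgrid.org":
        # Second site block: everything goes to the MCP backend.
        return "reverse_proxy spark.tailnet.local:8888"
    if host in ("sovgrid.org", "www.sovgrid.org"):
        # First site block: API paths are proxied, the rest is static.
        if path.startswith(API_PREFIXES):
            return "reverse_proxy spark.tailnet.local:3000"
        return "file_server /srv/sovgrid"
    return "no matching site block"

for request in [("sovgrid.org", "/posts/caddy/"),
                ("sovgrid.org", "/api/status"),
                ("mcp.sovgrid.org", "/")]:
    print(request, "->", route(*request))
```

The order of the checks mirrors the fact that the proxied paths are carved out of the static site rather than the other way round: anything not explicitly matched falls through to the file server.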

The Cloudflare Tunnel configuration

The Tunnel daemon configuration lives on the Floki VPS. The daemon registers an outbound connection, and Cloudflare routes specified hostnames through it.

/etc/cloudflared/config.yml:

tunnel: sovgrid-tunnel
credentials-file: /etc/cloudflared/sovgrid-tunnel.json

ingress:
  - hostname: api.sovgrid.org
    service: http://127.0.0.1:3000
  - hostname: mcp.sovgrid.org
    service: http://127.0.0.1:8888
  - service: http_status:404

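The tunnel is named in the Cloudflare admin console, and the credentials file is generated when the tunnel is created. The ingress rules map hostnames to services. cloudflared evaluates them top to bottom, and the final rule without a hostname is the required catch-all, here returning a 404.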
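The matching behavior of the ingress list can be sketched as a first-match lookup. This is a simplified Python model under stated assumptions: real cloudflared rules also support wildcard hostnames and path patterns, which are left out here, and the names INGRESS and resolve are hypothetical.

```python
# Simplified model of cloudflared ingress matching: rules are checked top to
# bottom, the first match wins, and a rule with no hostname matches anything.
from typing import Optional

INGRESS: list[tuple[Optional[str], str]] = [
    ("api.sovgrid.org", "http://127.0.0.1:3000"),
    ("mcp.sovgrid.org", "http://127.0.0.1:8888"),
    (None, "http_status:404"),   # catch-all, must come last
]

def resolve(hostname: str) -> str:
    """Return the service that the first matching rule points at."""
    for rule_host, service in INGRESS:
        if rule_host is None or rule_host == hostname:
            return service
    raise LookupError("ingress list has no catch-all rule")

print(resolve("mcp.sovgrid.org"))
print(resolve("sovgrid.org"))
```

With the rules exactly as listed, a request for sovgrid.org itself falls through to the catch-all. A hostname only reaches a service if it has its own rule above the final line.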

The systemd unit:

[Unit]
Description=Cloudflare Tunnel
After=network-online.target
Wants=network-online.target

[Service]
ExecStart=/usr/local/bin/cloudflared tunnel --config /etc/cloudflared/config.yml run
Restart=on-failure
RestartSec=10s

[Install]
WantedBy=multi-user.target

The Tunnel daemon reconnects automatically if the connection to Cloudflare’s edge drops, and with Restart=on-failure systemd restarts the process 10 seconds after a crash. Together these make the path self-healing.

Why Cloudflare Tunnel over a static-IP VPS (at the time)

The alternative was to rent a VPS with a static IP, open port 443, point the DNS there, and serve traffic directly from the Floki VPS without any Cloudflare in the path. That is, in fact, what sovgrid.org runs today, which is why the comparison is grounded.

The reason Cloudflare Tunnel was chosen in the first phase: DDoS mitigation. A fresh VPS with no mitigation upstream can be taken offline by a volumetric attack for roughly EUR 5-10 per hour on the commodity booter market. Cloudflare’s free tier absorbs that volume because it routes traffic through its anycast network before the request ever reaches the origin. That protection is not available on a raw VPS at the same price point. For a new operator without an established traffic profile, that is the dominant risk.

The second reason: Cloudflare Tunnel does not require the origin to have a publicly routable IP or any open inbound port. This matters when the origin is a home machine on a residential ISP, specifically because the ISP does not guarantee that port 443 stays reachable. The tunnel is an outbound connection, which ISPs do not block.

The limitation: Cloudflare Tunnel required the DNS for the tunneled hostnames to be managed via Cloudflare’s nameservers. That is a real lock-in, not a theoretical one. Moving away required a DNS migration, not just a config change.

Why the migration away from Cloudflare made sense in May 2026

By May 2026, sovgrid.org had 88 articles live and a traffic profile that made the threat model legible. Average volume was under 2,000 requests per day. At that scale, the DDoS risk is real but not existential. A basic fail2ban setup plus rate limiting at Caddy (an add-on module, since the standard Caddy build does not include a rate limiter) handles the realistic attack surface.
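For readers unfamiliar with what per-client rate limiting does, here is a minimal sliding-window sketch in Python. It is not the implementation used by Caddy or fail2ban, and the class name, limits, and example IP address are all hypothetical; it only illustrates the idea of capping requests per client over a time window.

```python
# Minimal sliding-window rate limiter, keyed by client IP.
from collections import defaultdict, deque

class SlidingWindowLimiter:
    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)   # client IP -> recent timestamps

    def allow(self, client_ip: str, now: float) -> bool:
        """Record a request at time `now` and say whether it is allowed."""
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

# Hypothetical limit: 3 requests per 60 seconds per client.
limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
decisions = [limiter.allow("203.0.113.7", now=t) for t in (0, 1, 2, 3, 61)]
print(decisions)  # [True, True, True, False, True]
```

A limiter like this handles a single noisy client or a small scraper. It does nothing against a volumetric attack that saturates the VPS uplink before requests reach the proxy, which is the part the operator accepts as residual risk.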

The reason the migration happened when it did: the Cloudflare dependency had accumulated. Cloudflare was in the DNS path, the TLS path, and the CDN path simultaneously. That is not a named dependency; that is a single point of failure with a third-party name. When Cloudflare has an outage (which it does, roughly 3-4 times per year based on their public incident history), the site goes down regardless of what the operator does. The migration replaced that dependency with a direct Let’s Encrypt cert on the FlokiNET VPS. Latency for edge-cached content increased by roughly 20-30 ms for European users (because Cloudflare’s edge was often geographically closer than the VPS), but the uptime dependency is now the VPS, which the operator controls.

Why this is sovereign-enough

The rented dimension is named: Cloudflare operates the edge and the tunnel matchmaking. Cloudflare sees the metadata (which domain is being served) and the request headers. In the default tunnel configuration, Cloudflare terminates TLS at the edge and re-encrypts into the tunnel, so it is also technically able to read request and response bodies. Origin certificates protect the leg between Cloudflare and the origin; they do not make the path end-to-end encrypted. This is a real piece of dependency.

The trade is acceptable for the current threat model. The benefit is the DDoS protection that the operator cannot build alone. The cost is the rented dependency. The decision is named and re-examinable.

The migration trigger: if Cloudflare changes its terms unacceptably, if Cloudflare terminates the sovgrid account, or if a customer engagement requires that no Cloudflare be in the path. The migration target: Caddy on a hardened VPS with port 443 open, the Spark accessible via a separate VPN, and either a smaller-scale DDoS mitigation (anycast via a different provider) or accepting that the site might be temporarily unavailable under DDoS.

Caveats on the sovereign framing:

The tunnel’s “data is encrypted” claim has a limit. Cloudflare terminates TLS at the edge. The request headers, URL, and timing metadata are visible to Cloudflare. For a public blog this is not a meaningful concern. For an operator running private API endpoints or user-authenticated services through the tunnel, this is a limitation that changes the threat model materially.

The sovereignty score for this pattern, compared to running direct on a VPS with no CDN, is lower. The Cloudflare path trades sovereignty for DDoS resilience and operational simplicity. That is an honest trade. It is the wrong choice for an operator whose threat model includes Cloudflare itself as an adversary (for example, a journalist publishing sensitive material in a jurisdiction where Cloudflare can be compelled to disclose request metadata).

The latency cost and DDoS-absorption tradeoff

Direct connection from a Central-European user to the FlokiNET EU VPS: roughly 15-35 ms round-trip, measured with curl --write-out '%{time_total}' over 10 requests. That is the baseline.
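The same kind of measurement can be scripted. The following is a rough Python sketch, not the procedure actually used for the numbers above: the helper names are hypothetical, and each sample opens a fresh connection and reads the full body, so it includes DNS, TCP, and TLS setup in the same way a fresh curl invocation’s time_total does.

```python
# Time repeated requests and summarize the samples.
import statistics
import time
import urllib.request
from typing import Callable

def fetch_url(url: str) -> None:
    """Fetch a URL over a new connection and read the whole body."""
    with urllib.request.urlopen(url, timeout=10) as response:
        response.read()

def sample_latency_ms(fetch: Callable[[], None], runs: int = 10) -> list[float]:
    """Call `fetch` `runs` times and return each duration in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000)
    return samples

def summarize(samples: list[float]) -> tuple[float, float, float]:
    """Return (min, median, max) of the samples."""
    return min(samples), statistics.median(samples), max(samples)

# Example (performs real network requests):
#   samples = sample_latency_ms(lambda: fetch_url("https://sovgrid.org/"))
#   print(summarize(samples))
```

Reporting the median alongside the minimum and maximum keeps a single slow sample, such as a cold DNS lookup, from skewing the baseline.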

Cloudflare Tunnel path: roughly 50-150 ms additional, depending on which Cloudflare edge the user hits and where the tunnel daemon is connected. The variance is large because Cloudflare routes through its nearest PoP (point of presence), and the tunnel leg from that PoP to the VPS adds another hop. For a user in Frankfurt hitting a Frankfurt PoP, the overhead was measured at 55-70 ms in testing. For a user in Southeast Asia hitting a Singapore PoP, the overhead was closer to 120-140 ms.

For static content cached at Cloudflare’s edge, the cache hit eliminates the tunnel traversal entirely: the edge serves the cached HTML or image directly, typically in under 10 ms. Cache hit rate for sovgrid.org’s static assets was above 80% during the Cloudflare period, which means the added latency only hit the remaining 20% of requests (mostly first-load HTML on new articles).
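As a worked example, the average added latency per request follows from the hit rate and the miss overhead. This Python sketch uses the figures quoted above and makes two simplifying assumptions of mine: the hit rate is taken as exactly 80%, and a cache hit is treated as adding no tunnel overhead at all.

```python
# Expected added latency per request when only cache misses cross the tunnel.
cache_hit_rate = 0.80            # "above 80%" quoted above, taken as 80%
tunnel_overhead_ms = (50, 150)   # added round-trip on a cache miss

def expected_added_latency_ms(hit_rate: float, miss_overhead_ms: float) -> float:
    """Average added latency per request, assuming hits add nothing."""
    return (1 - hit_rate) * miss_overhead_ms

low = expected_added_latency_ms(cache_hit_rate, tunnel_overhead_ms[0])
high = expected_added_latency_ms(cache_hit_rate, tunnel_overhead_ms[1])
print(f"expected added latency: {low:.0f}-{high:.0f} ms per request")
```

Averaged over all requests the tunnel costs roughly 10-30 ms, but that average hides the shape of the distribution: most requests pay nothing and one in five pays the full overhead.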

For dynamic content (an API call, an MCP tool request), the tunnel traversal is the full additional cost. There is no caching. An MCP tool call that takes 800 ms of LLM inference on the Spark adds 100-150 ms of tunnel overhead on top. At that latency scale the overhead is not the bottleneck.

The DDoS-absorption tradeoff: Cloudflare’s free tier can absorb Layer 3/4 volumetric attacks in the range of hundreds of Gbps. A raw VPS has no equivalent. The cost is that this protection disappears the moment Cloudflare has an incident, changes its terms, or terminates the account. For a public blog where a few hours of downtime is not critical, the protection is worth the dependency. For a service where availability is a hard requirement, this caveat matters: Cloudflare outages have lasted 2-6 hours in past incidents, and the operator has no control over the recovery timeline.
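The dependency can be put in availability terms with a small calculation. This Python sketch combines the outage frequency quoted earlier in the article (3-4 per year) with the 2-6 hour durations above; treating every outage as fully affecting the site is my assumption, so the result is an upper bound on lost time.

```python
# Annual downtime and availability implied by upstream outages alone.
HOURS_PER_YEAR = 365 * 24

outages_per_year = (3, 4)   # frequency quoted earlier in the article
outage_hours = (2, 6)       # duration range quoted above

def availability(downtime_hours: float) -> float:
    """Fraction of the year the site is reachable."""
    return 1 - downtime_hours / HOURS_PER_YEAR

best_case_hours = outages_per_year[0] * outage_hours[0]    # 3 x 2 h
worst_case_hours = outages_per_year[1] * outage_hours[1]   # 4 x 6 h

print(f"downtime: {best_case_hours}-{worst_case_hours} hours per year")
print(f"availability: {availability(worst_case_hours):.2%} "
      f"to {availability(best_case_hours):.2%}")
```

That works out to between 6 and 24 hours a year, or roughly 99.73% to 99.93% availability from this one dependency, before counting any downtime of the VPS itself.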

The latency is acceptable for the blog, acceptable for the MCP server (where cold-start LLM inference is the dominant latency component at 1-3 seconds), and would not be acceptable for a low-latency interactive workload where users notice anything above 200 ms. Pick the architecture by the latency budget, not by the simpler configuration path.

Where this fits

For the broader sovereignty test framework, see What Sovereign Actually Means in 2026. For the reference architecture, see The Sovereign AI Stack in 2026. For the Caddy-first static blog stack, see Astro + Caddy: Static-First AI Blog Stack.

The migration away from Cloudflare

The follow-up walkthrough covers the migration path: the alternative DDoS posture, the VPS hardening, and the cutover procedure to Caddy + Let’s Encrypt direct. That is the pattern sovgrid.org runs today. Follow cipherfox@sovgrid.org on Nostr or subscribe to the RSS feed to track updates.
