The GitHub Bot That Cannot Write

June 18, 2026

The pitch for an AI code reviewer is always the same: connect your repository, give the bot a token, and it comments on every pull request within minutes. The unspoken part is the token. To comment, the bot needs write access. To run instantly, it needs a webhook pointed at someone else’s cloud. To review your private diffs, it sends them to a model you do not host. Three quiet concessions, all of them pointing the wrong way for anyone who runs their own substrate.

I run a small fleet of public repositories and a local model on a DGX Spark. The model sits idle most of the day. The question was narrow: can I get a useful daily read of my GitHub presence, including a real review of any pull request a stranger opens, without giving anything write access and without a single byte leaving the building? The answer turned into a system that is worth describing, partly because it works and partly because the path to making it work corrected four wrong assumptions I held at the start.

What it does

Every morning a job wakes up inside my private Git server. It reads three things from public GitHub: the notification inbox, the pull requests I still have open against other people’s projects, and any issue or pull request a stranger has opened on one of my repositories. A local model triages all of it into a short briefing. If a stranger opened a pull request, a second tool reads the diff and writes a line-level review. The briefing lands in my Matrix client before I have finished coffee. Nothing is posted back to GitHub. Nothing is sent to a third party.

Once a week the same job also scans the upstream projects I have contributed to, pulls their open issues, and asks the model which ones match my skills well enough to be worth a fix. That last part turns a maintenance chore into a contribution pipeline, which is the difference between a tool that watches and a tool that earns its keep.

The shape of the thing

Five decisions define the architecture. Each one had an obvious default that I rejected for a specific reason.

Polling, not webhooks. The instant-review experience needs GitHub to call you when a pull request opens. That means a public endpoint listening for GitHub’s events, which means an inbound ear on your infrastructure that exists to react to the outside world. For a setup whose entire premise is sovereignty, that is the wrong shape. A scheduled poll reaches out, reads, and hangs up. The cost is latency: a pull request opened at 10:00 is seen at the next run, not at 10:01. For a personal fleet with single-digit pull request volume, that trade is free. The repositories are quiet. The poll is cheap. Nobody is waiting.
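
Here is a minimal sketch of the polling side. It assumes the job fetches each repository's issues as JSON from GitHub's REST API, where pull requests appear in the issues list with an extra pull_request key, and that it remembers when it last ran. The owner login and the sample records are hypothetical.

```python
from datetime import datetime, timezone

def parse_ts(value: str) -> datetime:
    # GitHub timestamps look like "2026-06-18T10:00:00Z". Older Pythons'
    # fromisoformat rejects the trailing "Z", so normalize it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))

def new_from_strangers(items, owner, last_run):
    """Return (kind, number, title) for items opened by others since last_run."""
    fresh = []
    for item in items:
        if item["user"]["login"] == owner:
            continue  # my own activity is not an incoming contribution
        if parse_ts(item["created_at"]) <= last_run:
            continue  # already seen on an earlier poll
        kind = "pr" if "pull_request" in item else "issue"
        fresh.append((kind, item["number"], item["title"]))
    return fresh

# Hypothetical records shaped like GitHub's issue JSON.
items = [
    {"number": 7, "title": "Fix typo", "user": {"login": "stranger"},
     "created_at": "2026-06-18T10:00:00Z", "pull_request": {}},
    {"number": 6, "title": "Old question", "user": {"login": "stranger"},
     "created_at": "2026-06-16T09:00:00Z"},
    {"number": 8, "title": "My own PR", "user": {"login": "me"},
     "created_at": "2026-06-18T11:00:00Z", "pull_request": {}},
]
last_run = datetime(2026, 6, 17, 6, 0, tzinfo=timezone.utc)
print(new_from_strangers(items, owner="me", last_run=last_run))
# [('pr', 7, 'Fix typo')]
```

The pull request opened at 10:00 surfaces on the next run rather than at 10:01. That delay is the entire cost of the poll.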

A sandboxed CI job, not a cron line. The naive way to schedule this is a systemd timer running a script on the host. It works, and it is invisible the moment anything goes wrong. I already run a Gitea instance with an Actions runner, so the job lives there instead: a scheduled workflow in a container, with the automation itself stored as versioned YAML, logs in a web UI, and secrets in a proper store instead of a dotfile. The runner was already up. The marginal cost of using it was zero, and the gain was a sandbox plus an audit trail.

Read-only by architecture, not by good behavior. This is the load-bearing decision. A prompt that says “do not post anything” is a suggestion. I wanted a guarantee. The triage model is handed a fixed set of tools that each run one hardcoded read command, with no shell access and no way to construct a write. The pull request reviewer runs with publishing explicitly disabled. And the token itself is meant to be a fine-grained credential with read scopes only. That makes three independent layers, each designed to stop a write on its own. The model never gets to decide whether to post, because posting is not in its vocabulary.
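
The first layer can be sketched as a whitelist of fixed argument vectors. This is a minimal illustration, not the author's code. The tool names and the exact gh invocations are assumptions. The point is the structure: the model selects a tool by name, never supplies arguments, and nothing passes through a shell.

```python
import subprocess

# Each tool maps to one fixed argv. The model chooses by name only and
# cannot add arguments, so it has no way to assemble a write command.
# The specific gh invocations are illustrative assumptions.
READ_ONLY_TOOLS = {
    "notifications": ["gh", "api", "notifications"],
    "my_open_prs": ["gh", "search", "prs", "--author=@me", "--state=open",
                    "--json", "url,title"],
    "my_repos": ["gh", "repo", "list", "--json", "name"],
}

def run_tool(name, runner=subprocess.run):
    """Run a whitelisted read command and return its stdout."""
    argv = READ_ONLY_TOOLS.get(name)
    if argv is None:
        raise ValueError(f"unknown tool: {name!r}")
    # List argv with the default shell=False: no shell parsing, no injection.
    # check=True raises on a failing command instead of returning empty output.
    result = runner(argv, capture_output=True, text=True, check=True)
    return result.stdout
```

The runner parameter exists only so the dispatcher can be tested without a real gh binary. In production it is plain subprocess.run.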

A private boundary that never reconciles with the public one. The workflow lives in my private Git server and is never mirrored to GitHub. Data flows one way: the private side reaches out to read public GitHub, and nothing private flows back. The workflow file sits under Gitea’s .gitea/workflows directory rather than GitHub’s .github/workflows, so the public platform could never interpret it even by accident. Internal endpoints, host names, and paths stay on the private side. This is a rule I hold for the whole stack: a tool can exist as a generic public repository and a personal instance with real paths at the same time, and the two must never be forced to match.

Deliver the briefing, do not file it. The first working version wrote a clean report and left it in the job log. A report nobody opens is not a briefing. The final step pushes the result to Matrix, the one channel I actually read. This sounds trivial. It is the difference between a system that produces value and a system that produces artifacts.
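
Delivery is a single call to the Matrix client-server API. The sketch below builds that request with the standard library: a PUT to /_matrix/client/v3/rooms/{roomId}/send/m.room.message/{txnId} carrying a bearer token. The homeserver address, room ID, and token are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request
import uuid

def build_matrix_message(homeserver, room_id, token, text, txn_id=None):
    """Build (but do not send) a request that posts plain text to a room."""
    # The transaction ID is unique per message so the server can dedupe retries.
    txn_id = txn_id or uuid.uuid4().hex
    path = "/_matrix/client/v3/rooms/{}/send/m.room.message/{}".format(
        urllib.parse.quote(room_id, safe=""),  # room IDs contain "!" and ":"
        urllib.parse.quote(txn_id, safe=""),
    )
    body = json.dumps({"msgtype": "m.text", "body": text}).encode()
    return urllib.request.Request(
        homeserver.rstrip("/") + path,
        data=body,
        method="PUT",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )

def send(request, timeout=10):
    # urlopen raises on HTTP errors and connection failures, so an
    # unreachable homeserver fails the step instead of passing silently.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)["event_id"]

# Hypothetical values; nothing is sent until send() is called.
req = build_matrix_message("http://matrix:8008", "!abc:example.org",
                           "ACCESS_TOKEN", "Morning briefing: 2 new PRs")
```

The build and send steps are separated so the request can be inspected or tested without a live homeserver.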

Why not the obvious tools

The reviewer at the center of this is not mine. The line-level pull request review is done by an existing open-source tool, pointed at my local model through its OpenAI-compatible endpoint. Reinventing that would have been vanity. What I built is the sovereign wrapper around it: the account-level briefing, the read-only guarantees, the delivery, the private boundary. Here is how that stacks up against the alternatives I considered.

| Criterion | This system | SaaS reviewer | Cloud CI plus self-hosted runner | Hand-rolled script |
|---|---|---|---|---|
| Model | your local GPU | their cloud | your local GPU | your choice |
| Diffs leave the building | no | yes | no | no |
| Can post to GitHub | no, by design | yes | yes | depends |
| Inbound exposure | none | none | webhook or runner | none |
| Runs untrusted PR code | no | no | yes, the real risk | no |
| Account-level briefing | yes | no, per PR only | no | if you write it |
| Versioned and sandboxed | yes | n/a | yes | rarely |
| Cost | electricity | subscription | electricity | your time |

The SaaS reviewers are genuinely good at the review itself. They are also a subscription, a data exfiltration path, and a bot with write access to your code. For a sovereign stack they are a non-starter on all three counts.

The mainstream self-hosted path is a cloud CI workflow that calls a local model through a self-hosted runner. It gives you instant reviews. It also runs the workflow on every pull request, including pull requests from forks, and a self-hosted runner executing untrusted fork code on a public repository is a documented remote code execution risk. My reviewer only needs the diff, never the right to run the contributor’s code. Polling from my own side, reading the diff through an API, executing nothing, is the safer shape even before you count the sovereignty argument.
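
Reading a diff without executing anything comes down to one GET request. GitHub's REST API returns a pull request as a unified diff when you request the application/vnd.github.diff media type. The gh CLI's gh pr diff command does the same from the shell. This sketch uses hypothetical owner, repository, and token values.

```python
import urllib.request

def pr_diff_request(owner, repo, number, token):
    """Build a GET request that returns a pull request as a unified diff."""
    url = f"https://api.github.com/repos/{owner}/{repo}/pulls/{number}"
    return urllib.request.Request(url, headers={
        # This media type makes GitHub return the raw diff instead of JSON.
        "Accept": "application/vnd.github.diff",
        "Authorization": f"Bearer {token}",
    })

def fetch_diff(request, timeout=30):
    # The diff arrives as text. Nothing is checked out, built, or run.
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```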

The hand-rolled option is where I started, and the honest assessment is that the review quality of a purpose-built tool beats my own prompts. The right division of labor is to keep the lightweight account briefing as mine and delegate the deep review to the tool that already does it well.

Features, in plain terms

A daily briefing covering your inbox, your outreach pull requests, repository hygiene, and any incoming contribution.
Line-level review of incoming pull requests by a local model, with publishing disabled.
A weekly scan that ranks open upstream issues by how well they match your skills, with the skill profile derived from your own repositories’ topics rather than hardcoded.
The repository list is discovered live from the API on every run. Add a repository and the system adapts the next morning. Nothing is hardcoded.
Delivery to Matrix, so the briefing reaches you instead of a log file.
Failure delivery on the same channel, so a broken run is loud rather than silent.
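
Deriving the skill profile from repository topics can be as simple as counting. This is a sketch under assumptions. The repository records are hypothetical, although GitHub's repository JSON does expose a topics list. The prompt wording is also invented for illustration. The actual ranking is left to the model, as the weekly scan above describes.

```python
from collections import Counter

def skill_profile(repos, top_n=5):
    """Rank topics by how many of my repositories carry them."""
    counts = Counter(topic for repo in repos for topic in repo.get("topics", []))
    # most_common orders by count; ties keep first-encountered order.
    return [topic for topic, _ in counts.most_common(top_n)]

def ranking_prompt(profile, issues):
    """Build the text the model is asked to rank (hypothetical wording)."""
    lines = [f"My skills, strongest first: {', '.join(profile)}.",
             "Rank these open issues by how well they match, best first:"]
    lines += [f"- {i['repo']}#{i['number']}: {i['title']}" for i in issues]
    return "\n".join(lines)

repos = [  # hypothetical repository records
    {"name": "a", "topics": ["python", "llm", "matrix"]},
    {"name": "b", "topics": ["python", "gitea"]},
    {"name": "c", "topics": ["llm", "python"]},
]
print(skill_profile(repos, top_n=3))
# ['python', 'llm', 'matrix']
```

The profile is recomputed from live topics on every run, so tagging a new repository changes next week's ranking without any code edit.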

The four times the build lied

A system is not hardened by writing it. It is hardened every time it reports green while broken and you catch it. This one did that four times.

The first lie was the shell. The job steps used set -euo pipefail, the runner defaulted to dash, and dash has no pipefail. The first step died on its second line, the job went red, and the actual logic never ran. An easy fix, but a reminder that the default shell is not the shell you think it is.
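
One common remedy in Actions-style workflows is to declare bash explicitly as the step's shell rather than trusting the runner's default. A sanity step can also check the assumption directly. The probe below is a hypothetical illustration. It asks a given shell to accept set -o pipefail and reports whether that worked.

```python
import subprocess

def supports_pipefail(shell="sh"):
    """Return True if `shell` accepts `set -o pipefail`."""
    try:
        proc = subprocess.run([shell, "-c", "set -o pipefail"],
                              capture_output=True)
    except FileNotFoundError:
        return False  # the shell is not installed at all
    return proc.returncode == 0

for shell in ("sh", "bash"):
    print(shell, supports_pipefail(shell))
```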

The second lie was the false green. The container image shipped without the GitHub CLI, so every command that needed it failed. The report script caught those errors and exited zero anyway, and the job reported success while producing a report with no data in it. A success that did no work is worse than a failure, because nobody looks at it. The fix was a sanity step that fails hard if the model is unreachable or the CLI is missing, so a broken run can no longer masquerade as a healthy one.
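
A sanity step of that kind might look like the sketch below. It assumes the model server exposes an OpenAI-compatible /v1/models route at a hypothetical address. It collects every problem before deciding, so one run reports all missing pieces at once. A non-zero return code is what turns the job red.

```python
import shutil
import sys
import urllib.request

def preflight(model_url, required_cmds=("gh",), timeout=5):
    """Return a list of problems; an empty list means the run may proceed."""
    problems = [f"missing command: {cmd}" for cmd in required_cmds
                if shutil.which(cmd) is None]
    try:
        with urllib.request.urlopen(model_url, timeout=timeout) as resp:
            if resp.status != 200:
                problems.append(f"model endpoint returned HTTP {resp.status}")
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        problems.append(f"model endpoint check failed: {exc}")
    return problems

def main(model_url):
    """Print every problem to stderr and return a shell-style exit code."""
    problems = preflight(model_url)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0  # non-zero fails the job instead of a false green
```

In the workflow this would run as its own step, for example sys.exit(main("http://model:8000/v1/models")) with the hypothetical address replaced by the real one.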

The third lie was the quietest and the most dangerous. The reviewer tool requires Python 3.12. The container’s base image shipped Python 3.10. When the job installed the tool at runtime, the package manager silently fell back to a four-year-old version of it: the modern releases declare that they need 3.12, so the installer skipped them and took the newest release that still accepted 3.10. My own test of the modern tool had run on the host, on 3.12, against a throwaway pull request, and passed. The job would have run a different, ancient version that I never validated. The only reason I caught it was building a proper image and watching the install fail loudly on a version pin. The fix doubled as an optimization: a baked image with the CLI and a pinned reviewer preinstalled in an isolated environment, so the daily job needs no network install, cannot drift, and runs the exact version I tested.
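
An exact pin such as package==X.Y.Z turns pip's silent fallback into a hard "no matching distribution" error. A runtime guard adds a second check after installation. The sketch below compares the running interpreter and the installed version against what was tested. The package name and version you would pass in are hypothetical, since the source does not name the tool.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

def check_runtime(package, pinned, min_python=(3, 12)):
    """Return a list of drift problems; empty means the tested setup is in place."""
    problems = []
    if sys.version_info < min_python:
        problems.append(
            f"Python {sys.version_info.major}.{sys.version_info.minor} "
            f"< required {min_python[0]}.{min_python[1]}")
    try:
        installed = version(package)
    except PackageNotFoundError:
        problems.append(f"{package} is not installed")
    else:
        if installed != pinned:
            problems.append(f"{package} {installed} != pinned {pinned}")
    return problems
```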

The fourth lie was the delivery. The briefing step reported that it sent the message. It did not. The Matrix server binds to loopback on the host, and a container reaching the host gateway cannot see a loopback-only port. The model endpoint worked because that service listens on all interfaces; the chat server failed because it listens only on loopback. The fix was to put the chat server on the same Docker network as the job and address it by name on its internal port. Reachability is not a property of a host. It is a property of an interface.
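
A TCP probe run from inside the job container makes the interface question concrete. This is a hypothetical helper, not part of the original job. The same port can be reachable or unreachable depending on which address you dial, and the answer is only valid from the network namespace where the probe runs.

```python
import socket

def can_connect(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds from this network namespace."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run inside the container, a probe against the host gateway address would fail for a loopback-bound service. A probe against the chat server's container name would succeed once both containers share a Docker network. The host and container names involved are deployment-specific.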

None of these were visible from reading the code. All of them were visible from running it and refusing to trust the word “success.”

What is still open

Honesty is cheaper than a second incident, so here is the gap list.

The write path is untested. Setting repository topics, cutting a release: the model could propose these, but I have only validated the read path. Any autonomy on writes waits behind a tested write loop and an explicit approval step.

The token is not yet read-only. During the build the job runs with a broader credential, with the no-publish flag as the active guard. Rotating it to a true read-only fine-grained token is the next desktop task, and a reminder is already scheduled to land on Monday.

The chat delivery depends on a runtime network attachment that does not survive a container rebuild. The durable fix is to declare that network in the chat server’s own configuration. Until then the briefing degrades quietly: the job still runs, you just stop seeing the message, which is exactly the kind of silent failure I spent the rest of the build eliminating. It is on the list for that reason.

Planned optimizations

Two are already designed. A rule-aware review would feed my own publishing rules into the pull request reviewer, so it flags a contribution that would leak identity or reconcile the public and private sides of a repository, which a generic reviewer cannot know to look for. A weekly digest would turn the daily snapshots into trend lines: stars over time, pull request velocity, which repository is drawing attention.
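
The digest's arithmetic is small. The sketch below assumes each daily snapshot records a date, a repository name, and a star count. That format is hypothetical, since the source does not describe what the snapshots contain. It reduces a week of snapshots to per-repository star deltas, largest first.

```python
from collections import defaultdict

def star_trends(snapshots):
    """Turn daily {date, repo, stars} snapshots into per-repo deltas, biggest first."""
    series = defaultdict(list)
    # ISO dates sort correctly as strings.
    for snap in sorted(snapshots, key=lambda s: s["date"]):
        series[snap["repo"]].append(snap["stars"])
    # Change between the earliest and latest snapshot in the window.
    deltas = {repo: counts[-1] - counts[0] for repo, counts in series.items()}
    return sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)

week = [  # hypothetical snapshots
    {"date": "2026-06-12", "repo": "a", "stars": 40},
    {"date": "2026-06-18", "repo": "a", "stars": 43},
    {"date": "2026-06-12", "repo": "b", "stars": 10},
    {"date": "2026-06-18", "repo": "b", "stars": 19},
]
print(star_trends(week))  # [('b', 9), ('a', 3)]
```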

The larger question is whether the novel part of this deserves to become its own tool. The review engine is not novel; that ground is taken. The account-level sovereign briefing, read-only by architecture, delivered to your own channel, running against your own model, is a gap nobody fills. The discipline that has served the rest of my stack is to ship a tool only after it has run in my own production for weeks, not before. So the plan is to dogfood it, and if the daily briefing keeps earning its place, extract the briefing and leave the review to the tool that already does it.

What it taught me

Three lessons outlived the code.

The first is that a 35-billion-parameter model running locally is entirely capable of a multi-step, tool-driven read of an account: fetch the playbook, check the inbox, list the pull requests, write the summary, with zero invented repositories. The old belief that local models are too weak for agentic work was true for small models inside a heavy terminal harness. It is not true for a competent model driven through a direct API with a tight tool set.
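
The loop that drives such a read is short. Below is a minimal sketch against an OpenAI-compatible /v1/chat/completions endpoint, using the standard tools and tool_calls message shapes. The base URL, model name, and tool set are hypothetical. Each tool takes no parameters, so the model can only choose which read to run.

```python
import json
import urllib.request

def tool_schema(name, description):
    # OpenAI-style function tool with no parameters: the model can only pick a name.
    return {"type": "function",
            "function": {"name": name, "description": description,
                         "parameters": {"type": "object", "properties": {}}}}

def make_post(base_url, timeout=300):
    """Return a function that POSTs a payload to an OpenAI-compatible server."""
    def post(payload):
        req = urllib.request.Request(
            base_url.rstrip("/") + "/v1/chat/completions",
            data=json.dumps(payload).encode(),
            headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.load(resp)
    return post

def agent_loop(post, tools, messages, model="local-model", max_steps=8):
    """Call the model until it answers without requesting a tool.

    tools maps name -> (description, zero-argument callable returning a string).
    """
    schemas = [tool_schema(n, d) for n, (d, _) in tools.items()]
    for _ in range(max_steps):
        reply = post({"model": model, "messages": messages, "tools": schemas})
        msg = reply["choices"][0]["message"]
        messages.append(msg)
        calls = msg.get("tool_calls") or []
        if not calls:
            return msg.get("content")  # the final summary
        for call in calls:
            name = call["function"]["name"]
            output = tools[name][1]() if name in tools else f"error: unknown tool {name}"
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": output})
    raise RuntimeError("model did not finish within max_steps")
```

The max_steps bound stops a confused model from looping indefinitely, and an unknown tool name is returned to the model as an error rather than executed.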

The second is that the value was never in the review. It was in the delivery. The same report that died in a log file for two iterations became useful the instant it arrived in a channel I read. Build the pipe to the human first.

The third is the one I keep relearning: a green check mark is a claim, not a fact. This system told me it succeeded four times while doing something wrong. Each fix made the next lie harder to tell. That is what hardening actually is. Not the absence of failure, but the steady removal of every way the system can fail quietly.

The bot reviews my repositories every morning and cannot touch them. That constraint is not a limitation I am working around. It is the entire point.