Automate Better Blog Posts: Self-Hosted Article Optimization That Actually Works

April 15, 2026 6 min read

New to self-hosting AI? The Self-Hosted AI: Start Here hub walks the hardware-decision tree, inference-engine choice, and the operational gotchas that bite hardest in the first three months. Read it before or after this one, whichever fits your stage.

The Optimizer Script

cd /data/projects/sovereign-blog && python3 scripts/optimize_articles.py --eeat-below 4 --no-research

This command optimizes all articles with EEAT scores below 4, skipping web research for faster runs. After completion, rebuild your site:

npm run build

The script checks links, analyzes EEAT gaps, and updates articles in place while preserving your original structure and voice.

How Link Repair Works

def check_external_links(content: str) -> list[tuple[str, int]]:
    broken_links = []
    for link in extract_external_links(content):
        status = head_request(link)
        if status in {404, 410, "timeout", "error"}:
            broken_links.append((link, status))
    return broken_links

The script:

Scans your article for external links
Sends HEAD requests to each URL
Logs broken links (404/410/timeouts)
Uses SearXNG to find replacements from the same domain or relevant pages
If no replacement is found, removes the link but keeps the anchor text

Internal links (localhost, host.docker.internal) are skipped, they’re not meant to be externally accessible.

EEAT Gap Analysis Without Guesswork

def compute_eeat_gaps(content: str) -> dict[str, int]:
    return {
        "expertise": max(0, 8 - count_code_blocks(content)),
        "experience": max(0, 12 - count_specificity_markers(content)),
        "authority": max(0, 1200 - count_words(content)),
        "trust": max(0, 9 - count_caveats(content))
    }

The script calculates exact gaps:

Expertise: Needs 8+ code blocks (```)
Experience: Needs 12+ specificity markers (version strings, file paths, error outputs)
Authority: Needs 1200+ words total
Trust: Needs 9+ caveats/warnings/notes

Each gap generates a specific instruction like “Authority gap: 648 words short, expand the most technical sections with additional detail and context.” or “Trust gap: only 1 caveat found, add 8 more honest limitation notes.”

Web Research That Actually Helps

curl "http://localhost:8888/search?q=$(encode_query "$title $tags")&format=json" | jq -r '.[0:4] | .[].content'

When enabled, the optimizer:

Queries SearXNG with your article title and tags
Takes the top 4 snippets as fresh context
Passes them to Mistral to verify facts and update references

Disable with --no-research if SearXNG is down or you want deterministic output.

Mistral’s Role: Update, Don’t Rewrite

prompt = f"""
Update this article:
{article_body}

Fix these broken links:
{broken_links}

Fill these EEAT gaps:
{eeat_feedback}

Use this web context to verify facts:
{web_context}

Rules:
- Preserve structure and voice
- Make minimal targeted edits
- Never rewrite from scratch
- Temperature: 0.2 (conservative)
"""

Mistral receives the full article body plus specific instructions. The prompt explicitly forbids rewriting, only repair, update, and optimize. Temperature 0.2 ensures conservative edits with minimal hallucination risk.

What Stays Unchanged

date: 2026-04-16
heroImage: /images/cleanup-hero.webp
protected: false
tags: [linux, cleanup]

The optimizer never touches:

Publish dates
Images
Featured status
Protection flags
Tags
Affiliate links
Diagram references

Protected articles (protected: true) are always skipped, even with --force.

Cron Schedule for Zero Effort

0 4 * * 0 cd /data/projects/sovereign-blog && \
python3 scripts/optimize_articles.py --eeat-below 5 --no-research >> /data/logs/optimizer.log 2>&1 && \
npm run build

Run weekly on Sunday at 4 AM:

After the main pipeline publishes new content
When Mistral is less busy
With --no-research for speed
Logs output for debugging

Articles with perfect 5/5/5/5 EEAT scores are skipped automatically, making each run faster over time.

What I Actually Use

Mistral Small 4: Handles targeted edits without rewriting my voice

SearXNG: Finds fresh sources without tracking or ads

Astro: Builds static sites with zero runtime overhead

What this script does NOT optimize for

Naming what is not in scope is as important as naming what is. The optimizer script is targeted at structural and shape signals: word count, code-block density, section depth, em-dash count, filler-phrase count. It does not check claim accuracy. A confident-sounding article full of plausible-but-wrong version numbers will optimize cleanly because the score gate measures shape, not truth.

The factcheck-gate added in May 2026 closes that hole on the registry-existence side (Docker images, PyPI versions, npm packages). It does not close the hole on operational claims like “this configuration delivers 35-41 tok/s” which the optimizer cannot test. Those claims still need human attestation, and the optimizer is not the right place to add that check.

When to run this script and when to leave articles alone

The optimizer is meant for two narrow scenarios: a freshly-imported article from Mistral that needs a polish-pass before going live, and a backfill across older articles after a scoring-rule change. Outside those two cases, running the optimizer regularly is a good way to slowly grind every article into the same template-shape, which is exactly what the optimizer is supposed to prevent. The right cadence is event-driven (after rule changes, after pipeline updates, after author-flagged drift) rather than calendar-driven.

The one optimization that matters more than the script does

Most of the score-uplift across the corpus during the May 2026 polish-pass came not from any single signal the optimizer measures, but from removing AI-tells that the optimizer flags as negatives: em-dashes, filler phrases, three-bullet repetition. Those are the easiest signals to fix mechanically, and they happen to also be the highest-impact signals on reader-perception. Mistral generates them by default; the optimizer reduces them by gating; humans can preempt them in the prompt-template. The most efficient long-term improvement is upstream prompt-template work, not downstream optimizer-pass work.

The link-repair sub-feature, honestly

The link-repair pattern (find broken internal links, suggest replacements) is the optimizer feature with the highest false-positive rate. About one in three suggested replacements is structurally correct but semantically wrong (the suggested article is the right slug-shape but the wrong topic for the context the link sits in). Human review on every suggested replacement is non-optional. Treating the link-repair output as automatic is how broken backlinks turn into worse-broken backlinks pointing to unrelated articles.

The pattern this script most reliably catches is the slow drift toward Mistral-template prose: paragraph after paragraph of plausible-sounding sentences that read fine in isolation and do not actually carry information when read as a sequence. The optimizer does not detect that drift directly (no LLM-judges-LLM loop is reliable enough at this scale), but the proxy signals it does measure (filler-phrase count, hedging-phrase count, sentence-length stdev, three-bullet repetition) collectively flag the drift indirectly with enough precision to be useful as a signal-to-author rather than as an automatic-fix. The author-side intervention is to read the flagged paragraphs out loud, which is the simplest reliable test for “does this prose actually say something”. If reading it out loud produces no surprise or no learning, it goes.

The optimizer’s most under-stated property is that it makes the cost of bad writing visible at exactly the moment it would otherwise become invisible. A draft that scores 100 against a 130 floor with three filler-phrase warnings and an em-dash spike is obviously not ready; the optimizer surfaces that obvious-on-inspection state without requiring a human to inspect every draft. That visibility-of-state-by-default property is the thing that scales editorial discipline beyond what manual review can sustain at production cadence. The optimizer is not a quality engine, it is a state-visibility tool, and that distinction is the reason it earns its place in the pipeline rather than feeling like extra ceremony around a process that should just work.