A senior engineer walks through the four hidden failures that made his backup system look healthy while actually failing for six weeks. Includes the exact commands, error messages, and hardware specs that turned a disaster into a reliable setup.

How Four Silent Failures Turned My Backup System into Security Theater

Last week my backup system failed me because a systemd timer showed green while silently eating every backup job.

Quick Take

  • Four independent failures masqueraded as “working” for six weeks
  • Systemd timer status ≠ job success; journalctl was the only truth
  • FAT32 USB stick silently capped every file at 4 GB
  • age encryption keys lived under /root while the script looked for them under /data/secrets

The Setup That Wasn’t Working

The backup system was “active”: systemctl status sovereign-backup.timer showed green, but no backup had ever completed.

ExecStart=/usr/local/bin/backup.sh

The script lived at /data/projects/sovereign-backup/backup.sh, but the service pointed to a non-existent path. Systemd happily reported the timer as enabled and active, but every launch silently failed because the executable didn’t exist.

Why this breaks: A green timer status means the timer is loaded and firing on schedule, not that the job succeeded. The timer and the service are separate units, so a service that cannot even launch its binary (systemd records this as status=203/EXEC on the service) leaves the timer status untouched. The only signal was in the service logs, which nobody checked.
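One way to ask systemd about the job rather than the timer is to read the service unit's own result fields. This is a minimal sketch, not what I originally ran: the helper name service_run_ok is hypothetical, and it only parses the KEY=VALUE lines that systemctl show prints for the Result and ExecMainStatus properties.

```shell
#!/usr/bin/env bash
# service_run_ok (hypothetical helper)
# Reads KEY=VALUE lines on stdin, as printed by:
#   systemctl show -p Result -p ExecMainStatus sovereign-backup.service
# Succeeds only if the last run ended with Result=success and exit status 0.
service_run_ok() {
  local result="" status="" key value
  while IFS='=' read -r key value; do
    case "$key" in
      Result)         result="$value" ;;
      ExecMainStatus) status="$value" ;;
    esac
  done
  [ "$result" = "success" ] && [ "$status" = "0" ]
}

# Usage on the real host (not executed here):
#   systemctl show -p Result -p ExecMainStatus sovereign-backup.service \
#     | service_run_ok || echo "last backup run failed"
```

A unit that has never run may also report success here, so this check is best paired with a look at when the service last started.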

The Four Silent Killers

Missing Binary Execution

ExecStart=/usr/local/bin/backup.sh  # does not exist

The service tried to run a script that wasn’t there. No error in the timer status, no loud failure: just a silent skip every time the timer fired.
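A cheap guard against this class of failure is to check, at deploy time, that the ExecStart target really exists. The sketch below assumes a simple unit file with one ExecStart= line; the helper name execstart_is_runnable is hypothetical, and the unit file location in the usage comment is an assumption about where the unit is installed.

```shell
#!/usr/bin/env bash
# execstart_is_runnable UNIT_FILE (hypothetical helper)
# Pulls the first ExecStart= line out of a unit file and checks that its
# first word is an existing, executable file.
execstart_is_runnable() {
  local unit_file="$1" line cmd
  line=$(grep -m1 '^ExecStart=' "$unit_file") || {
    echo "no ExecStart= line in $unit_file" >&2
    return 2
  }
  cmd=${line#ExecStart=}
  # Drop systemd's optional prefix characters (such as "-" or "@"),
  # then keep only the first word, which is the executable path.
  cmd=$(printf '%s\n' "$cmd" | sed -e 's/^[-@:+!]*//' -e 's/[[:space:]].*$//')
  if [ ! -x "$cmd" ]; then
    echo "ExecStart target is missing or not executable: $cmd" >&2
    return 1
  fi
}

# Usage (path is an assumption):
#   execstart_is_runnable /etc/systemd/system/sovereign-backup.service
```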

Missing Encryption Tool

The backup script checked for age before running. If age wasn’t installed, the script exited cleanly with a preflight error that went straight to the journal: a place nobody monitored.
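The safer pattern is a preflight that fails loudly: if it ends the script with a non-zero status, systemd marks the run as failed instead of recording a clean exit. A minimal sketch, assuming the script needs tar, pigz, and age as in the pipeline shown later; the helper name require_cmd is hypothetical.

```shell
#!/usr/bin/env bash
# require_cmd NAME... (hypothetical helper)
# Returns non-zero and writes to stderr if any named command is not on PATH.
require_cmd() {
  local missing=0 name
  for name in "$@"; do
    if ! command -v "$name" >/dev/null 2>&1; then
      echo "preflight: required command not found: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# In a backup script, a failed preflight should end the run with a failure:
#   require_cmd tar pigz age || exit 1
```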

apt install age

Key Path Mismatch

Keys were generated under /root/.age-identity and /root/.age-recipient, but the script expected them under /data/secrets/age-identity. The disconnect between documentation and implementation meant the script couldn’t encrypt anything, even if it ran.
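When the documented and the expected paths disagree, it helps to have the script say which candidate location actually holds the key. The sketch below is one way to do that; first_readable is a hypothetical helper name, and the two paths in the usage comment are the ones named above.

```shell
#!/usr/bin/env bash
# first_readable PATH... (hypothetical helper)
# Prints the first candidate path that exists and is readable.
# Fails with a message on stderr if none of them is.
first_readable() {
  local path
  for path in "$@"; do
    if [ -r "$path" ]; then
      printf '%s\n' "$path"
      return 0
    fi
  done
  echo "none of the candidate paths is readable: $*" >&2
  return 1
}

# Usage: report where the identity really lives before relying on it.
#   first_readable /data/secrets/age-identity /root/.age-identity
```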

FAT32 USB Stick Limit

The first USB stick was formatted as FAT32, which cannot store any single file larger than 4 GB (strictly, 4 GiB minus one byte). Compressed backups grew to about 1.2 GB per day. By the third backup, a write hit that per-file limit and failed with “file too large,” but the error was never surfaced.
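The limit is easy to test for before writing anything. This sketch encodes the FAT32 per-file maximum of 2^32 - 1 bytes and flags the filesystem type names that carry it; both helper names are hypothetical, and the findmnt call in the usage comment assumes util-linux is installed.

```shell
#!/usr/bin/env bash
# FAT32 stores file sizes in 32 bits, so the largest possible file
# is 2^32 - 1 bytes.
FAT32_MAX_FILE_BYTES=4294967295

# fits_on_fat32 SIZE_IN_BYTES (hypothetical helper)
# Succeeds only if a file of that size can exist on FAT32.
fits_on_fat32() {
  [ "$1" -le "$FAT32_MAX_FILE_BYTES" ]
}

# has_4gib_file_cap FSTYPE (hypothetical helper)
# Succeeds if the filesystem type is a FAT variant; Linux reports
# FAT32 mounts as "vfat".
has_4gib_file_cap() {
  case "$1" in
    vfat|msdos|fat) return 0 ;;
    *)              return 1 ;;
  esac
}

# Usage on the real host (not executed here):
#   fstype=$(findmnt -no FSTYPE --target /mnt/sovereign-usb)
#   has_4gib_file_cap "$fstype" && echo "warning: 4 GiB per-file limit applies"
```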

The Fix That Actually Worked

1. Point Service to the Real Script

# sovereign-backup.service (fixed):
ExecStart=/data/projects/sovereign-backup/backup.sh

# Hardening flags added:
ProtectSystem=strict
ReadWritePaths=/data/backups /var/log
NoNewPrivileges=true

2. Install age and Relocate Keys

apt install age
cp /root/.age-identity /data/secrets/age-identity
cp /root/.age-recipient /data/secrets/age-recipient
chmod 600 /data/secrets/age-identity

3. Replace FAT32 with ext4 and exFAT

A 256 GB Samsung USB-C stick was repartitioned:

Partition  Size     Format  Mount
sdb1       40 GB    ext4    /mnt/sovereign-usb (backups)
sdb2       ~199 GB  exFAT   /mnt/sovereign-usb-media (media)

exFAT removes the 4 GB file limit and plays nicely with macOS, Windows, and Android.

4. Atomic Writes with Error Traps

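set -o pipefail  # a failure in tar or pigz must fail the whole pipeline
TMP_FILE="${FINAL_FILE}.tmp"
# On any error: remove the partial file, log it, and stop before the mv below
trap 'rm -f "$TMP_FILE"; log "Aborted"; exit 1' ERR

tar ... | pigz -c | age --recipient ... --output "$TMP_FILE"
mv "$TMP_FILE" "$FINAL_FILE"  # rename is atomic on the same filesystem
trap - ERR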

This prevents partial or corrupted backups if the process aborts mid-write.
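The same write-then-rename idea can be isolated into a small function that is easy to test on its own. This is a sketch under assumptions, not the script's real code: atomic_write is a hypothetical helper, and it wraps any command whose standard output should become the final file.

```shell
#!/usr/bin/env bash
# atomic_write FINAL_FILE CMD [ARGS...] (hypothetical helper)
# Runs CMD with stdout sent to FINAL_FILE.tmp and renames it to FINAL_FILE
# only if CMD succeeds. On failure the partial file is removed and CMD's
# exit status is returned.
atomic_write() {
  local final_file="$1" tmp_file rc
  shift
  tmp_file="${final_file}.tmp"
  if "$@" > "$tmp_file"; then
    mv -- "$tmp_file" "$final_file"
  else
    rc=$?
    rm -f -- "$tmp_file"
    echo "atomic_write: aborted, removed partial $tmp_file" >&2
    return "$rc"
  fi
}

# Usage: the final name appears only after a complete, successful write.
#   atomic_write /tmp/example.txt printf 'hello\n'
```

Because the temporary file sits next to the final one, the rename stays on one filesystem, which is what makes it atomic.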

5. Add ReadWritePaths for systemd Hardening

The dashboard service used ProtectSystem=strict, which blocked writes to /mnt/sovereign-usb. The fix was to whitelist the mount point:

ReadWritePaths=/data /var/log /var/lib/tor /var/lib/aide /mnt/sovereign-usb /mnt/sovereign-usb-media
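A write probe makes this kind of block visible instead of silent. The sketch below is an illustration with a hypothetical helper name, can_write_dir. It only tells the truth when it runs under the same unit settings as the service, for example as an early step of the service's own script, which is an assumption about how you would wire it in.

```shell
#!/usr/bin/env bash
# can_write_dir DIR (hypothetical helper)
# Creates and removes a probe file to confirm DIR is writable by the
# current process, under whatever sandboxing that process runs with.
can_write_dir() {
  local dir="$1" probe
  probe=$(mktemp "$dir/.write-probe.XXXXXX" 2>/dev/null) || {
    echo "cannot write to $dir" >&2
    return 1
  }
  rm -f -- "$probe"
}

# Usage inside the service's script:
#   can_write_dir /mnt/sovereign-usb || exit 1
```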

6. Dashboard and Desktop Controls

Backups can now be triggered from the Grid Dashboard (http://localhost:8443) or a desktop app with two options: NVMe (daily automated) or USB (manual, 30-day retention).

How to Check Your Own Backup Is Actually Backing Up

Four checks I now run weekly. Each catches a different failure mode the green-timer status hides. I missed all four for six weeks. You should not.

1. Did the service succeed, not just the timer fire?

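journalctl -u sovereign-backup.service --since "24 hours ago" \
  | grep -E "Starting|Started|Finished|Succeeded|Deactivated successfully|Failed|Main process exited"

A healthy run produces both a start line for sovereign-backup.service and a success line; the exact wording varies by systemd version (“Succeeded” on older releases, “Deactivated successfully” on newer ones). A start with no success line, or a “Failed with result” line, means the run died. One caveat: a script that exits 0 after a preflight error still counts as a success here, which is the exact trap that hid my failures for six weeks, so do not stop at this check.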

2. Did a file actually land on disk, and is it the right size?

ls -lht /mnt/sovereign-usb/backups/ | head -5

Two things to verify: the newest mtime is within your expected interval, and the size is consistent with prior backups (within ±20% is normal day-to-day variance). A file that is suddenly 200 KB when yesterday’s was 1.2 GB means the archive truncated.
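Both checks can be scripted so they do not depend on someone eyeballing a listing. A minimal sketch, assuming a find that supports -mmin (GNU and BSD find do); the helper names modified_within_minutes and size_within_pct are hypothetical, and the 20% tolerance is the rule of thumb above, not a universal constant.

```shell
#!/usr/bin/env bash
# modified_within_minutes FILE MINUTES (hypothetical helper)
# Succeeds if FILE was modified less than MINUTES minutes ago.
modified_within_minutes() {
  find "$1" -maxdepth 0 -mmin "-$2" | grep -q .
}

# size_within_pct NEW_BYTES OLD_BYTES PCT (hypothetical helper)
# Succeeds if NEW_BYTES differs from OLD_BYTES by at most PCT percent
# of OLD_BYTES. Integer arithmetic only.
size_within_pct() {
  local new="$1" old="$2" pct="$3" diff
  diff=$(( new > old ? new - old : old - new ))
  [ $(( diff * 100 )) -le $(( old * pct )) ]
}

# Usage: compare today's archive with yesterday's.
#   modified_within_minutes "$LATEST" 1500 || echo "backup is stale"
#   size_within_pct "$new_bytes" "$old_bytes" 20 || echo "size changed sharply"
```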

3. Can the encryption key still be read by the service user?

sudo -u root stat -c '%U:%G %a %n' /data/secrets/age-identity

The output must show the service user as owner (or root, if the unit runs as root) and a mode that lets that user read the file. If a package update or chmod sweep left the key at mode 0400 owned by cipherfox, the systemd service running as root still reads it fine, but a Docker container running as UID 1000 gets permission denied, silently.
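The ownership and mode check can also be turned into a pass/fail test. This sketch assumes GNU stat (the -c option) and compares numeric UIDs so it works for container users without a passwd entry; key_private_to_uid is a hypothetical helper name. Note that root can read the file regardless, so the check matters most for non-root consumers.

```shell
#!/usr/bin/env bash
# key_private_to_uid FILE UID (hypothetical helper, assumes GNU stat)
# Succeeds if FILE is owned by the numeric UID and its mode is 400 or 600,
# meaning only the owner can read it.
key_private_to_uid() {
  local file="$1" want_uid="$2" owner mode
  owner=$(stat -c '%u' "$file") || return 2
  mode=$(stat -c '%a' "$file") || return 2
  if [ "$owner" != "$want_uid" ]; then
    echo "$file is owned by UID $owner, expected $want_uid" >&2
    return 1
  fi
  case "$mode" in
    400|600) return 0 ;;
    *) echo "$file has mode $mode, expected 400 or 600" >&2
       return 1 ;;
  esac
}

# Usage: check the key for a service that runs as root (UID 0).
#   key_private_to_uid /data/secrets/age-identity 0
```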

4. Is the latest backup actually restorable, end to end?

LATEST=$(ls -t /mnt/sovereign-usb/backups/*.tar.age | head -1)
age --decrypt --identity /data/secrets/age-identity "$LATEST" \
  | tar -tzf - | head -5

This decrypts, decompresses, and lists the first five archive entries: exercising every layer of the pipeline. If age rejects the key, if tar reports “unexpected EOF in archive”, or if you get zero entries, you know now, when there is time to fix it, not the day you need the backup.
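Piping into head stops reading after five entries, so a truncation further into the archive can slip past. A stricter variant reads the whole stream and counts entries. This is a sketch with a hypothetical helper name, count_tar_entries; it takes the already-decrypted gzip stream on standard input, so the age command from the check above stays unchanged in front of it.

```shell
#!/usr/bin/env bash
# count_tar_entries (hypothetical helper)
# Reads a gzip-compressed tar stream on stdin, lists every entry, and prints
# the number of entries. Fails if tar reports an error (for example a
# truncated stream) or if the archive is empty.
count_tar_entries() {
  local listing count
  listing=$(tar -tzf -) || return 1
  count=$(printf '%s\n' "$listing" | grep -c . || true)
  if [ "$count" -eq 0 ]; then
    echo "archive contains no entries" >&2
    return 1
  fi
  printf '%s\n' "$count"
}

# Usage on the real host (not executed here):
#   age --decrypt --identity /data/secrets/age-identity "$LATEST" \
#     | count_tar_entries
```

Reading the full archive takes longer than listing five entries, which is the price of catching damage anywhere in the file.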

A backup that passes all four checks every week is the only kind that earns the word “backup”. Anything less is theater.

What to Watch Out For

What I Actually Use

  • age: Asymmetric encryption for Sovereign AI backups
  • ext4 + exFAT USB stick: 40 GB for backups, rest for media
  • systemd with atomic writes: Prevents partial backups on failure
[Figure: “Silent Backup Failures: From undetected failures to robust fixes.” A ten-step flow from the five failure signals (missing binary, green timer status, silent preflight errors, key path mismatch, FAT32 limit) to the five fixes (correct script path, install age, relocate keys, replace FAT32, atomic writes).]
