How Four Silent Failures Made My Backup System a Security Theater
Last week my backup system turned out to be broken: a systemd timer showed green while every backup job silently failed.
Quick Take
- Four independent failures masqueraded as “working” for six weeks
- Systemd timer status ≠ job success; journalctl was the only truth
- FAT32 USB stick silently capped backups at 4 GB
- age encryption keys lived in two different directories
The Setup That Wasn’t Working
The backup system was “active”: systemctl status sovereign-backup.timer showed green, but no backup had ever completed.
ExecStart=/usr/local/bin/backup.sh
The script lived at /data/projects/sovereign-backup/backup.sh, but the service pointed to a non-existent path. Systemd happily reported the timer as enabled and active, but every launch silently failed because the executable didn’t exist.
Why this breaks: A green timer status means the timer is scheduled and firing, not that the job it triggers succeeded. The missing binary made the service fail on every launch, but that failure is recorded against the service unit, so the timer status showed nothing useful. The only signal was in the service logs, which nobody checked.
The Four Silent Killers
Missing Binary Execution
ExecStart=/usr/local/bin/backup.sh # does not exist
The service tried to run a script that wasn’t there. No error in the timer status, no loud failure: just a silent skip every time the timer fired.
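One way to catch this before the timer ever fires is to check that the unit's ExecStart target exists and is executable. This is a minimal sketch, not the author's tooling: the function name check_execstart is hypothetical, and it assumes a unit file with a simple single-line ExecStart= entry.

```bash
# check_execstart UNIT_FILE
# Returns 0 if the first ExecStart= command in UNIT_FILE is an executable file.
check_execstart() {
    local unit_file="$1" line cmd
    # Take the first ExecStart= line; bail out if the unit has none.
    line=$(grep -m1 '^ExecStart=' "$unit_file") || {
        echo "no ExecStart= in $unit_file" >&2
        return 2
    }
    cmd=${line#ExecStart=}
    # Drop systemd's optional prefix characters (-, @, +, !, :) and any arguments.
    cmd=$(printf '%s\n' "$cmd" | sed 's/^[-@+!:]*//; s/[[:space:]].*$//')
    if [ -x "$cmd" ] && [ ! -d "$cmd" ]; then
        echo "ok: $cmd"
    else
        echo "missing or not executable: $cmd" >&2
        return 1
    fi
}
```

Run against the broken unit from this post, it would report /usr/local/bin/backup.sh as missing. systemd-analyze verify is the built-in route to a similar check, if your systemd version ships it.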
Missing Encryption Tool
The backup script checked for age before running. If age wasn’t installed, the script exited cleanly with a preflight error that went straight to the journal: a place nobody monitored.
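A preflight check only helps if a failure is visible to systemd, which means exiting non-zero. The sketch below shows one way to do that; require_cmd is a hypothetical helper name, not something from the original script.

```bash
# require_cmd NAME...
# Returns non-zero if any named command is missing from PATH, so the calling
# script can exit with a failure status that systemd records as a failed run.
require_cmd() {
    local missing=0 name
    for name in "$@"; do
        if ! command -v "$name" >/dev/null 2>&1; then
            echo "preflight: required command not found: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# In the backup script (tar, pigz and age are the tools named in this post):
# require_cmd tar pigz age || exit 1
```

With the non-zero exit, the service unit ends up in a failed state instead of looking like a clean run.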
apt install age
Key Path Mismatch
Keys were generated under /root/.age-identity and /root/.age-recipient, but the script expected them under /data/secrets/age-identity. The disconnect between documentation and implementation meant the script couldn’t encrypt anything, even if it ran.
FAT32 USB Stick Limit
The first USB stick was formatted as FAT32. Compressed backups grew to about 1.2 GB per day. By the third backup, the stick hit the 4 GB file size limit and failed with “file too large,” but the error was never surfaced.
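FAT32's limit applies to each individual file: the largest file it can hold is 2^32 - 1 bytes, just under 4 GiB. A guard like the following could refuse a copy that cannot fit. It is a sketch under assumptions: the function names are hypothetical, findmnt (util-linux) and GNU stat are assumed to be available, and Linux is assumed to report FAT32 as vfat.

```bash
# Largest file FAT32 can hold: 2^32 - 1 bytes.
FAT32_MAX_BYTES=4294967295

# fits_on_fstype FSTYPE SIZE_BYTES
# Returns 1 only when the filesystem is FAT (reported as "vfat" on Linux)
# and the file is larger than the FAT32 per-file limit.
fits_on_fstype() {
    local fstype="$1" size="$2"
    if [ "$fstype" = "vfat" ] && [ "$size" -gt "$FAT32_MAX_BYTES" ]; then
        return 1
    fi
    return 0
}

# check_destination FILE DEST_DIR
# Looks up the filesystem type of DEST_DIR with findmnt and applies the check.
check_destination() {
    local file="$1" dest="$2" fstype size
    fstype=$(findmnt -n -o FSTYPE --target "$dest") || return 2
    size=$(stat -c %s "$file") || return 2
    fits_on_fstype "$fstype" "$size" || {
        echo "refusing: $file ($size bytes) exceeds the FAT32 file limit on $dest" >&2
        return 1
    }
}
```

A 1.2 GB archive passes on any filesystem; a 5 GB archive passes on ext4 or exFAT and is refused on vfat.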
The Fix That Actually Worked
1. Point Service to the Real Script
# sovereign-backup.service (fixed):
ExecStart=/data/projects/sovereign-backup/backup.sh
# Hardening flags added:
ProtectSystem=strict
ReadWritePaths=/data/backups /var/log
NoNewPrivileges=true
2. Install age and Relocate Keys
apt install age
cp /root/.age-identity /data/secrets/age-identity
cp /root/.age-recipient /data/secrets/age-recipient
chmod 600 /data/secrets/age-identity
3. Replace FAT32 with ext4 and exFAT
A 256 GB Samsung USB-C stick was repartitioned:
| Partition | Size | Format | Mount |
|---|---|---|---|
| sdb1 | 40 GB | ext4 | /mnt/sovereign-usb (backups) |
| sdb2 | ~199 GB | exFAT | /mnt/sovereign-usb-media (media) |
Neither filesystem has FAT32’s 4 GB file limit. ext4 holds the backups, and exFAT keeps the media partition readable on macOS, Windows, and Android.
4. Atomic Writes with Error Traps
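set -euo pipefail  # without pipefail, a failing tar or pigz would not trigger the trap
TMP_FILE="${FINAL_FILE}.tmp"
# On any failed command, remove the partial file and log the abort
trap 'rm -f "$TMP_FILE"; log "Aborted"' ERR
tar ... | pigz -c | age --recipient ... --output "$TMP_FILE"
# Rename only after the whole pipeline succeeded; a rename on the same filesystem is atomic
mv "$TMP_FILE" "$FINAL_FILE"
trap - ERR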
This prevents partial or corrupted backups if the process aborts mid-write.
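The same write-then-rename pattern can be packaged as a reusable function. This is an illustrative sketch, not the post's actual script: atomic_write is a hypothetical name, and it assumes the producer command writes the archive to stdout.

```bash
# atomic_write FINAL_FILE COMMAND [ARGS...]
# Runs COMMAND with stdout redirected to FINAL_FILE.tmp, then renames the
# temp file into place only if COMMAND succeeded.
atomic_write() {
    local final_file="$1" tmp_file rc
    shift
    tmp_file="${final_file}.tmp"
    if "$@" > "$tmp_file"; then
        # A rename within one filesystem is atomic: readers see either the
        # old file or the complete new one, never a partial write.
        mv "$tmp_file" "$final_file"
    else
        rc=$?
        rm -f "$tmp_file"
        echo "aborted: $* (exit $rc)" >&2
        return "$rc"
    fi
}
```

To use it with the pipeline above, wrap tar, pigz, and age in a function that writes to stdout, and enable pipefail inside it so a failure in any stage counts.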
5. Add ReadWritePaths for systemd Hardening
The dashboard service used ProtectSystem=strict, which blocked writes to /mnt/sovereign-usb. The fix was to whitelist the mount point:
ReadWritePaths=/data /var/log /var/lib/tor /var/lib/aide /mnt/sovereign-usb /mnt/sovereign-usb-media
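Before a hardened unit runs, it is possible to check whether a target directory is covered by its ReadWritePaths list. The helper below is a simplified sketch: path_allowed is a hypothetical name, and it ignores paths containing spaces and the optional "-" and "+" prefixes that systemd accepts.

```bash
# path_allowed TARGET "SPACE SEPARATED READWRITEPATHS"
# Returns 0 if TARGET equals or sits beneath one of the listed directories.
path_allowed() {
    local target="$1" allowed="$2" dir
    for dir in $allowed; do
        dir=${dir%/}   # tolerate a trailing slash in the list
        case "$target" in
            "$dir"|"$dir"/*) return 0 ;;
        esac
    done
    return 1
}
```

The list can come from the unit file or, assuming your systemd exposes the property, from systemctl show -p ReadWritePaths --value for the unit in question. The match is on whole path components, so /mnt/sovereign-usb does not accidentally cover /mnt/sovereign-usb-media.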
6. Dashboard and Desktop Controls
Backups can now be triggered from the Grid Dashboard (http://localhost:8443) or a desktop app with two options: NVMe (daily automated) or USB (manual, 30-day retention).
How to Check Your Own Backup Is Actually Backing Up
Four checks I now run weekly. Each catches a different failure mode the green-timer status hides. I missed all four for six weeks. You should not.
1. Did the service succeed, not just the timer fire?
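journalctl -u sovereign-backup.service --since "24 hours ago" \
| grep -E "Started|Finished|Succeeded|Deactivated successfully|Failed|Main process exited"
A healthy run produces a “Started sovereign-backup.service” line (or “Finished” for a oneshot unit) and a success line: “Succeeded” on older systemd releases, “Deactivated successfully” on newer ones. A start line followed by “Failed”, or by nothing at all, means the run did not complete. A preflight check that exits 0 still counts as success to systemd, so read the script’s own output too: that is the exact trap that hid my failures for six weeks.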
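To make this check scriptable, the journal output can be reduced to a single verdict. This is a rough sketch under assumptions: classify_run is a hypothetical name, and the patterns cover the systemd messages I am aware of, which may differ on your version.

```bash
# classify_run: reads journal lines for one service on stdin and prints
# "failed", "ok", or "no-run". Any failure line wins over a success line.
classify_run() {
    local log
    log=$(cat)
    if printf '%s\n' "$log" | grep -Eq 'Failed with result|Failed to start'; then
        echo "failed"
    elif printf '%s\n' "$log" | grep -Eq 'Succeeded\.|Deactivated successfully|Finished '; then
        echo "ok"
    else
        echo "no-run"
    fi
}

# Usage:
# journalctl -u sovereign-backup.service --since "24 hours ago" --no-pager | classify_run
```

A "no-run" verdict is worth alerting on as much as "failed": it means the timer never started the service in that window.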
2. Did a file actually land on disk, and is it the right size?
ls -lh /mnt/sovereign-usb/backups/ | head -5
Two things to verify: the newest mtime is within your expected interval, and the size is consistent with prior backups (within ±20% is normal day-to-day variance). A file that is suddenly 200 KB when yesterday’s was 1.2 GB means the archive truncated.
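Both checks can be automated. The sketch below uses a fixed minimum size as a simpler stand-in for the ±20% comparison; check_latest is a hypothetical name, and find's -mmin option is assumed (it is available in GNU and BSD find).

```bash
# check_latest DIR MAX_AGE_MINUTES MIN_BYTES
# Passes only if the newest *.tar.age file in DIR was modified within
# MAX_AGE_MINUTES and is at least MIN_BYTES in size.
check_latest() {
    local dir="$1" max_age="$2" min_bytes="$3" latest size
    latest=$(ls -t "$dir"/*.tar.age 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no backups found in $dir" >&2
        return 1
    fi
    # find prints the file only if its mtime is newer than max_age minutes.
    if [ -z "$(find "$latest" -mmin "-$max_age")" ]; then
        echo "stale: $latest" >&2
        return 1
    fi
    # Arithmetic expansion strips any padding some wc implementations add.
    size=$(($(wc -c < "$latest")))
    if [ "$size" -lt "$min_bytes" ]; then
        echo "too small: $latest is $size bytes" >&2
        return 1
    fi
    echo "ok: $latest ($size bytes)"
}
```

For a daily backup of about 1.2 GB, something like check_latest /mnt/sovereign-usb/backups 1500 900000000 would flag both a missed day and a truncated archive.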
3. Can the encryption key still be read by the service user?
sudo -u root stat /data/secrets/age-identity
The output must show an owner and mode that let the service user read the file (root, if the unit runs as root). If a package update or chmod sweep left the key at mode 0400 owned by cipherfox, a systemd service running as root still reads it fine, but a Docker container running as UID 1000 gets permission denied, and nothing surfaces the error.
4. Is the latest backup actually restorable, end to end?
LATEST=$(ls -t /mnt/sovereign-usb/backups/*.tar.age | head -1)
age --decrypt --identity /data/secrets/age-identity "$LATEST" \
| tar -tzf - | head -5
This decrypts, decompresses, and lists the first five archive entries: exercising every layer of the pipeline. If age rejects the key, if tar reports “unexpected EOF in archive”, or if you get zero entries, you know now, when there is time to fix it, not the day you need the backup.
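Because head stops reading after five lines, a truncation near the end of the archive can slip past the quick check above. A slower variant reads the stream to the end. This is a sketch: verify_tar_gz_stream is a hypothetical name, and it assumes the decrypted stream is a gzip-compressed tar, as pigz produces.

```bash
# verify_tar_gz_stream: reads a gzip-compressed tar stream on stdin to the
# end and prints the number of entries. Fails on a truncated or corrupt
# stream, or on an archive with zero entries.
verify_tar_gz_stream() {
    local listing count
    # The command substitution's status is tar's exit status.
    listing=$(tar -tzf -) || return 1
    # grep -c exits 1 when it counts zero lines; keep the "0" it printed.
    count=$(printf '%s' "$listing" | grep -c . || true)
    [ "$count" -gt 0 ] || return 1
    echo "$count"
}

# Usage with the variables from the check above:
# age --decrypt --identity /data/secrets/age-identity "$LATEST" | verify_tar_gz_stream
```

If age cannot decrypt the file, tar receives an empty or broken stream and the function fails, so a single exit status covers the key, the compression, and the archive.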
A backup that passes all four checks every week is the only kind that earns the word “backup”. Anything less is theater.
What to Watch Out For
- Timer status ≠ job success: Run journalctl -u sovereign-backup.service regularly; don’t trust the green timer.
- Preflight checks in the journal aren’t enough: For critical services, add Matrix or email alerts on failure.
- Key paths must be consistent: Where keys are generated must match where they’re referenced.
- Filesystem limits bite silently: FAT32’s 4 GB file cap will fail backups without warning.
- Hardened systemd can block writes: ProtectSystem=strict plus a missing ReadWritePaths entry will break scripts that write outside the allowed paths.
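For the alerting point above, one lightweight approach is a wrapper that calls an alert hook whenever the backup command exits non-zero. This is a sketch only: run_with_alert is a hypothetical name, and the hook that actually posts to Matrix or sends email is left to you.

```bash
# run_with_alert ALERT_CMD COMMAND [ARGS...]
# Runs COMMAND; if it exits non-zero, calls ALERT_CMD with a one-line message
# and returns COMMAND's exit status so systemd still sees the failure.
run_with_alert() {
    local alert_cmd="$1" rc=0
    shift
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        # A broken alert hook must not mask the original exit status.
        "$alert_cmd" "backup failed (exit $rc): $*" || true
    fi
    return "$rc"
}

# Example hook (an assumption; swap in your Matrix or mail command):
# alert_to_syslog() { logger -p user.err -t sovereign-backup "$1"; }
# run_with_alert alert_to_syslog /data/projects/sovereign-backup/backup.sh
```

systemd's OnFailure= unit setting is another route to the same result, handled by the service manager instead of the script.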
What I Actually Use
- age: Asymmetric encryption for Sovereign AI backups
- ext4 + exFAT USB stick: 40 GB for backups, rest for media
- systemd with atomic writes: Prevents partial backups on failure