Backup System Rebuilt from Scratch: The Night I Found Out Six Months of Backups Were Fake
My backup system looked perfect for six months. systemctl status sovereign-backup.timer showed green. The timer was enabled. The service was active. But when I tried to restore a file last week, I realized nothing had ever been backed up.
Quick Take
- Systemd timer showed green but never actually ran the backup script
- age encryption tool wasn’t installed, failing silently in the background
- Keys lived in /root/.age-identity but script expected /data/secrets/age-identity
- USB stick formatted as FAT32 would have failed at 3.4 GB due to 4 GB file limit
The Invisible Failure: Systemd Timer vs Actual Execution
ExecStart=/usr/local/bin/backup.sh
The service file pointed to /usr/local/bin/backup.sh which didn’t exist. The real script lived at /data/projects/sovereign-backup/backup.sh. I had planned to symlink it but never did.
ls -l /usr/local/bin/backup.sh
# ls: cannot access '/usr/local/bin/backup.sh': No such file or directory
The timer showed active because a timer unit’s status only reflects the timer itself: it is enabled and waiting for its next trigger. It says nothing about whether the service it starts can find its script or exits successfully.
systemctl status sovereign-backup.timer
# ● sovereign-backup.timer - Daily Sovereign AI Backups
# Loaded: loaded (/etc/systemd/system/sovereign-backup.timer; enabled; vendor preset: enabled)
# Active: active (waiting) since ...
The real failure only showed up in the service logs.
journalctl -u sovereign-backup.service --no-pager | grep -i "failed\|error"
# Failed at step ExecStart: No such file or directory
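A check like the following would have caught the missing script on day one. It is a minimal sketch, not something from my original setup: check_execstart is a hypothetical helper name, and it assumes a unit file with plain ExecStart= lines and no spaces in the command path.

```shell
# Sketch: flag ExecStart= commands in a unit file that are missing or not executable.
# check_execstart is a hypothetical helper name, not part of systemd.
check_execstart() {
    unit_file=$1
    broken=0
    # Drop the key and any systemd prefix characters (-, @, :, +, !), keep the first word.
    for cmd in $(sed -n 's/^ExecStart=[-@:+!]*\([^[:space:]]*\).*/\1/p' "$unit_file"); do
        if [ ! -x "$cmd" ]; then
            echo "BROKEN: $cmd"
            broken=1
        fi
    done
    return "$broken"
}
```

Calling check_execstart /etc/systemd/system/sovereign-backup.service before enabling the timer returns non-zero if the script is not where the unit says it is. On a running system, systemctl show -p Result sovereign-backup.service and the LAST column of systemctl list-timers report on the service run rather than on the timer.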
Why Silent Failures Happen: Preflight Checks That Don’t Alert
The backup script included a preflight check for age encryption.
command -v age
# /usr/bin/age
Wait, it was installed. But that was on the machine where I ran the check. The backup script runs on a different host, where age wasn’t present.
which age
# /usr/bin/age
# On target host:
which age
# /usr/bin/which: no age in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin)
The script exited with an error code but didn’t trigger any alert because I never set up notifications for failed preflight checks.
grep -r "Preflight check failed" /var/log/sovereign-backup.log
# 2024-04-14T02:00:01Z sovereign-backup[1234]: Preflight check failed: age not found
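A preflight check only helps if its failure is loud. Here is a minimal sketch of one that lists every missing tool in a single message on stderr and returns non-zero; require_tools is a hypothetical helper name, and the message format only loosely mirrors the log line above.

```shell
# Sketch: check all required tools up front and fail loudly, listing everything missing.
# require_tools is a hypothetical helper name.
require_tools() {
    missing=""
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "Preflight check failed: not found:$missing" >&2
        return 1
    fi
    return 0
}
```

In the backup script this would be called as require_tools age tar pigz || exit 1. The non-zero exit marks the systemd service as failed, which is the state a notification can hook into, for example through the OnFailure= setting of the unit.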
The Fix: Making Backups Actually Happen
First, point the service to the real script.
# sovereign-backup.service (fixed):
ExecStart=/data/projects/sovereign-backup/backup.sh
ProtectSystem=strict
ReadWritePaths=/data/backups /var/log
NoNewPrivileges=true
Then install age and copy the keys to the location the script expects.
apt install age
cp /root/.age-identity /data/secrets/age-identity
cp /root/.age-recipient /data/secrets/age-recipient
chmod 600 /data/secrets/age-identity
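Since the key path drifted once, it is worth checking at the start of every run. This is a minimal sketch under my own assumptions: check_key_file is a hypothetical helper name, and "private" here means no permission bits at all for group and others.

```shell
# Sketch: confirm a key file exists at the path the script expects and is private.
# check_key_file is a hypothetical helper name.
check_key_file() {
    path=$1
    if [ ! -r "$path" ]; then
        echo "missing or unreadable: $path" >&2
        return 1
    fi
    # Characters 5-10 of the ls mode string are the group and other permission bits.
    perms=$(ls -ld "$path" | cut -c5-10)
    if [ "$perms" != "------" ]; then
        echo "too permissive: $path" >&2
        return 1
    fi
    return 0
}
```

Running check_key_file /data/secrets/age-identity || exit 1 early in the script turns a wrong path into an immediate, visible failure. The recipient file holds only a public key, so a plain readability test is enough for it.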
For the USB stick, repartition to avoid FAT32’s 4 GB file limit.
lsblk
# NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
# sdb 8:16 1 238.4G 0 disk
# └─sdb1 8:17 1 238.4G 0 part
sudo parted /dev/sdb
(parted) mklabel gpt
(parted) mkpart sovereign-backup ext4 1MiB 40GiB
(parted) mkpart sovereign-media 40GiB 100%
(parted) quit
sudo mkfs.ext4 /dev/sdb1
sudo mkfs.exfat /dev/sdb2
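For reference, the FAT32 limit is easy to test against before copying anything. A minimal sketch, assuming the archive size is known in bytes; fits_on_fat32 is a hypothetical helper name.

```shell
# Sketch: would a file of this size fit on FAT32?
# The per-file maximum on FAT32 is 4 GiB minus 1 byte.
FAT32_MAX_BYTES=4294967295

# fits_on_fat32 is a hypothetical helper name; $1 is a size in bytes.
fits_on_fat32() {
    [ "$1" -le "$FAT32_MAX_BYTES" ]
}
```

Fed the archive size in bytes, the function succeeds for 3400000000 (3.4 GB) and fails for anything from 4294967296 upward, so an archive of that size still fits but has under 0.9 GB of headroom before copies start failing.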
Add the mount points to systemd’s read-write paths.
ReadWritePaths=/data /var/log /var/lib/tor /var/lib/aide /mnt/sovereign-usb /mnt/sovereign-usb-media
Finally, use atomic writes to prevent partial backups.
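# Abort on the first error, and let a failure anywhere in the pipeline count as an error.
set -eo pipefail
TMP_FILE="${FINAL_FILE}.tmp"
trap 'rm -f "$TMP_FILE"; log "Backup aborted"' ERR
# tar writes an uncompressed stream (no -z) so that pigz is the only compression step.
tar --exclude='*.tmp' -cf - /data | pigz -c | age --recipients-file /data/secrets/age-recipient --output "$TMP_FILE"
mv "$TMP_FILE" "$FINAL_FILE"
trap - ERR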
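The same write-then-rename idea can be pulled into a small reusable function. This is a minimal sketch rather than what my script does: atomic_write is a hypothetical helper name, and it assumes the command writes its result to stdout and that the temporary file sits on the same filesystem as the destination, so that mv is a rename.

```shell
# Sketch: run a command, capture its stdout in a temp file, and rename into place
# only on success. atomic_write is a hypothetical helper name.
atomic_write() {
    dest=$1
    shift
    tmp="${dest}.tmp"
    if "$@" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        # Leave any previous good file untouched and clean up the partial one.
        rm -f "$tmp"
        return 1
    fi
}
```

A failed run never replaces an existing good backup, and no half-written file is left under the final name. If the command is a pipeline, it still needs pipefail (or an equivalent check) so that an early stage failing is reported.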
What Went Wrong: Lessons Hard Learned
I trusted green status lights more than logs. Systemd timers show enabled and active, not whether the job actually ran. Preflight checks that exit with errors don’t help if no one sees them. Key paths drift when documentation and implementation diverge. Filesystem limits like FAT32’s 4 GB file cap break silently until the third backup fails.
The most dangerous failures are the ones that look healthy.
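One way to stop trusting status lights is to check for the thing itself: a recent backup file. A minimal sketch, assuming backups land as regular files in one directory; has_recent_backup is a hypothetical helper name, and the 24-hour window is my assumption based on the daily schedule.

```shell
# Sketch: succeed only if the directory holds a file modified within the last 24 hours.
# has_recent_backup is a hypothetical helper name.
has_recent_backup() {
    backup_dir=$1
    # -mtime -1 matches files whose age, counted in whole days, is less than one.
    recent=$(find "$backup_dir" -type f -mtime -1 2>/dev/null | head -n 1)
    [ -n "$recent" ]
}
```

Run from a separate timer or cron job, has_recent_backup /data/backups || echo "no backup in the last 24 hours" >&2 fails when nothing was written, regardless of what the backup timer reports about itself. A file existing is still not proof that it restores, so a periodic test restore remains the real check.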
What I Actually Use
- DGX Spark ARM64 server: Runs daily backups at 02:00 via systemd with 14-day retention
- 256 GB Samsung USB-C stick: Formatted with ext4 for backups and exFAT for media, mounted at /mnt/sovereign-usb
- Mistral Small 4: Encrypts backups using age with asymmetric keys stored in /data/secrets/