I copied my DGX Spark /data/ convention to a standard Ubuntu laptop. Three weeks later I forgot Docker exists. Root partition filled to 96 percent. Here is the diagnosis, the surgery, and the rule I should have followed.

The /data/ Convention Trap: Ubuntu-LVM Lessons That Bit Me Twice

I spent the day building a sovereign-AI setup on a Lenovo Legion laptop for someone else. The laptop runs a standard Ubuntu 26.04 LVM install. The DGX Spark on my desk runs NVIDIA’s preinstalled image on a single large ext4 partition. The convention I use across both is /data/ for project data, AI models, and config I want separated from the system. Here is the thing I had not examined closely enough: on both machines /data/ is just a directory on the root filesystem, not a separate volume. The convention looked structural and never was. What kept it from biting on the Spark was disk size, not disk layout, and I did not understand that until the laptop nearly filled up.

I bind-mounted /data/ai/ onto the home partition early in setup, because I did at least know the laptop’s /data/ had no volume of its own. I felt clever, and stopped there. Three weeks of container pulls later, the root partition hit 96 percent full and the system was minutes away from refusing writes. The bind-mount strategy was right. The bind-mount list was incomplete. This post is the post-mortem.

What the Spark layout actually looks like

I have to correct the assumption I started with, because it was wrong in a way that turns out to be the whole point of this post. The DGX Spark is not hand-partitioned. It ships with NVIDIA’s preinstalled image on a single large ext4 partition: a small EFI partition and then one root filesystem spanning the entire 3.7 TB NVMe. No LUKS, no LVM, no separate /data or /var volumes. I never custom-partitioned it, because the NVIDIA image was preinstalled and I was not going to wipe a working AI box just to re-slice the disk.

So /data/ on the Spark is not a separate volume. It is a directory on root, exactly like everything else. /data/projects/, /data/ai/, /data/secrets/, /data/scripts/ all sit on the same single filesystem as /, /home, and /var/lib/docker. The convention is load-bearing for everything I write: tools have absolute paths hard-coded, KB scripts read from /data/projects/sovereign-kb/, and the MCP server config references /data/projects/kb-stack/. None of it knows or cares where /data physically sits, because there is nowhere else for it to sit.

Here is the part I got wrong for months. I believed the /data/ convention gave me storage isolation. It never did. The reason it never bit me on the Spark is not that /data was a protected volume. It is that the Spark has a 3.7 TB disk. There is so much room that Docker can write hundreds of gigabytes of image layers and root never gets close to full. The convention was always just a naming habit, and the disk size hid that fact.

This is the assumption I carried to the laptop. The laptop does not have 3.7 TB of headroom to hide behind.

What the Ubuntu installer actually gives you

The friend’s laptop came as a stock Lenovo with Windows. I wiped, shrank Windows to 300G, set up LUKS on the remaining 681G, and ran the Ubuntu 26.04 installer’s LVM-in-LUKS path. The installer made three volumes:

No /data. No /var. Just root and home. Standard.

This is a perfectly reasonable layout for a personal laptop. Most users keep their files in /home, install software into /usr, and never think about it. The 80G root will hold the system and a reasonable amount of /var for logs and a few packages. The 513G home will hold user files. This is the layout the installer ships because it works for most people.

It does not work for the convention I wanted. The convention needs /data to absorb large project-shaped writes without touching root.

I made the choice to keep the convention rather than rewrite the tools. I created /data/ai/ as an empty directory on the root volume, then bind-mounted /home/USER/.ai/ onto it. The bind-mount lives in /etc/fstab with filesystem type none and options bind,nofail (the trailing dump and pass fields are 0 0). The effect: anything written to /data/ai/foo lands physically on the home volume. The path stays compatible with the Spark convention. The disk math stays sane.
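Spelled out as a complete fstab line, the entry would look roughly like what this sketch prints. The field order is the standard fstab one (source, mount point, type, options, dump, pass). The source and mount point are inferred from the effect described above, and the helper name fstab_bind_entry is hypothetical.

```python
def fstab_bind_entry(source: str, target: str, options: str = "bind,nofail") -> str:
    """Format one /etc/fstab line for a bind mount.

    Fields: <source> <mount point> <type> <options> <dump> <pass>.
    """
    for path in (source, target):
        # fstab fields are whitespace-separated, so keep the sketch to simple paths.
        if not path.startswith("/") or " " in path:
            raise ValueError(f"expected an absolute path without spaces: {path!r}")
    return f"{source} {target} none {options} 0 0"


if __name__ == "__main__":
    # /home/USER/.ai holds the data on the home volume;
    # /data/ai is the mount point the tools keep using.
    print(fstab_bind_entry("/home/USER/.ai", "/data/ai"))
```

The nofail option matters on a laptop: if the home volume is not available at boot, the system still comes up instead of dropping to an emergency shell.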

This worked. For two of the three large eaters. I missed the third.

The three large eaters

On any modern Linux laptop running an AI workload, there are three large eaters that grow without bound if you do not constrain them:

The first is AI model storage. Ollama models, ComfyUI checkpoints, faster-whisper models, Piper TTS voices (Thorsten + the English defaults). On this laptop today, that totals 19G. That landed in /data/ai/ and got bind-mounted to home. Solved on day one.

The second is Python virtual environments for AI tooling. The kb-stack venv on this laptop is 5.5G because it contains torch, transformers, sentence-transformers, chromadb, and the long tail of CUDA bindings each of those drag in. That landed in /data/projects/kb-stack/.venv and got bind-mounted to home. Solved on day one.

The third is Docker container storage. The OpenWebUI image is 6.7G. ComfyUI is 16.7G. SearXNG is 375M but the speaker container is 12G and faster-whisper is 8.6G. Gitea is 245M. Sum: about 45G of images, plus some layer-cache and metadata, currently 29G actual on-disk because of layer sharing.

Docker writes all of that into /var/lib/docker/ by default. /var/lib/docker/ lives on root. I did not bind-mount it.

Three weeks of container pulls later, the laptop’s root partition was at 96 percent full and rising. I had 4G of breathing room. The first time I noticed was when an apt update failed with “No space left on device” trying to extract a security patch. That is roughly the worst time to notice.
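A check like the following would have surfaced the problem long before apt did. It is a minimal sketch, assuming an 85 percent warning threshold (the soft threshold this post uses for the dashboard). The function names are hypothetical, and the percentage will differ slightly from what df prints, because df leaves root-reserved blocks out of its calculation.

```python
import shutil


def percent_used(total: int, used: int) -> float:
    """Used space as a percentage of total capacity."""
    return 100.0 * used / total


def check_path(path: str, warn_at: float = 85.0) -> tuple:
    """Return (percent used, True if at or above the warning threshold)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    pct = percent_used(usage.total, usage.used)
    return pct, pct >= warn_at


if __name__ == "__main__":
    pct, warn = check_path("/")
    status = "WARN" if warn else "ok"
    print(f"/ is {pct:.1f}% full [{status}]")
```

Run from cron or a systemd timer, the same function can watch the home volume by passing a path that lives on it.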

Why bind-mounting /var/lib/docker/ is harder than /data/ai/

I tried the obvious fix first: stop Docker, rsync /var/lib/docker/ to /home/USER/.docker-data/, bind-mount it back. Same pattern that worked for /data/ai/.

It did not work. The rsync copied 633K of metadata and zero bytes of actual image content.

The reason involves Docker’s storage driver. Modern Docker (29.5 on Ubuntu 26.04) uses overlayfs as the default storage driver. overlayfs does not store image layers as ordinary files in the on-disk path. It assembles layer mounts at daemon startup, using metadata to overlay the layers on top of each other to produce the running container’s view. When the daemon is stopped, those overlay mounts are torn down. The merged views that du was counting disappear with them, and the on-disk path looks nearly empty.

du -sh /var/lib/docker/ while the daemon is running: 29G. After stopping the daemon: under 1M. The rsync sees the second state.

This is a problem if you wanted to migrate Docker storage via bind-mount the same way you migrate any other directory.

The deeper reason is that overlayfs works through a mount-time composition. The on-disk layout under /var/lib/docker/overlay2/<sha>/ looks like a set of diff/ directories and a lower link file that points to other layer paths. None of those diff/ directories contain the merged filesystem view that a running container sees. The merge happens when Docker calls mount -t overlay overlay -o lowerdir=...:...:...,upperdir=...,workdir=... target. The merged view exists at runtime, in kernel memory, and is exposed at the container’s rootfs mountpoint.
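To make the mount-time composition concrete, here is a small sketch that builds the option string for that mount call from a list of layer directories. The layer paths are made up for illustration and the function name is hypothetical. The ordering rule (the leftmost lowerdir entry is the top-most layer) is how overlayfs reads the list.

```python
def overlay_mount_options(lower_dirs, upper_dir, work_dir):
    """Build the -o argument for `mount -t overlay`.

    lower_dirs is ordered top-most layer first; overlayfs stacks the
    colon-separated list from right (bottom) to left (top).
    """
    if not lower_dirs:
        raise ValueError("need at least one lower layer")
    lower = ":".join(lower_dirs)
    return f"lowerdir={lower},upperdir={upper_dir},workdir={work_dir}"


if __name__ == "__main__":
    # Illustrative paths only; real layer directories are named by hash.
    opts = overlay_mount_options(
        ["/layers/app/diff", "/layers/base/diff"],
        "/layers/container/diff",
        "/layers/container/work",
    )
    print(f"mount -t overlay overlay -o {opts} /layers/container/merged")
```

Nothing in the sketch touches the kernel. It only shows that the merged tree is described by a list of directories and assembled on demand.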

When you rsync the on-disk path while the daemon is stopped, you copy the diff/ directories. The disk usage is small because each diff/ only contains the per-layer delta against its parent. When you rsync the same path while the daemon is running, the kernel still hands rsync the on-disk diff/ data, not the merged mountpoint view. Either way, rsync gets the unmerged delta files. To “see” the full image content, you would need to walk the layer chain and reconstruct the merged tree, which is exactly what the storage driver does at mount time.

The practical takeaway is that for any non-trivial Docker migration, the storage driver is not your friend. Use Docker’s own tools: docker save and docker load, or docker volume for named volumes, or the data-root daemon config for full-storage migration. Do not try to do filesystem-level surgery on a path that is not actually a filesystem.

The right migration path

Docker provides a daemon-level config option data-root that tells it where to put its storage. The migration is:

  1. Save all the images you want to keep as tarballs with docker save -o image.tar IMAGE:TAG. The daemon has to be running for this, so do it first.
  2. Stop Docker.
  3. Edit /etc/docker/daemon.json to set "data-root": "/home/USER/.docker-data".
  4. Move or delete the old /var/lib/docker/.
  5. Start Docker. It will create the new data-root and recreate its internal directories there.
  6. Load the images back with docker load -i image.tar for each.
  7. Recreate the containers from your compose files. They will reattach to your bind-mounted volumes automatically.
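The daemon.json step is the one most likely to go wrong by hand, because the file may already contain other settings that must survive the edit. This is a minimal sketch, assuming the file is either empty or a JSON object. The function name is hypothetical and the log-driver entry in the usage example is only a stand-in for whatever the file already holds.

```python
import json


def with_data_root(existing_json: str, data_root: str) -> str:
    """Return daemon.json content with "data-root" set, keeping other keys."""
    config = json.loads(existing_json) if existing_json.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    config["data-root"] = data_root
    return json.dumps(config, indent=2) + "\n"


if __name__ == "__main__":
    current = '{"log-driver": "journald"}'  # stand-in for the existing file
    print(with_data_root(current, "/home/USER/.docker-data"), end="")
```

Writing the result to a temporary file and moving it into place avoids leaving a half-written config behind if something interrupts the edit.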

I have not run this yet on the friend’s laptop. There is currently 16G of headroom on root after pruning unused images, and I judged that the right time for the migration was a planned maintenance window rather than mid-setup-day. The plan lives in the laptop’s ~/docs/plans/2026-05-29-docker-storage-move.md, written step by step with verification points.

What I want to flag for anyone reading this who is in the same situation: do not try the rsync-then-bind-mount trick on /var/lib/docker/. Use data-root. The migration takes about 30 minutes, costs you the disk for a temporary tarball cache, and gives you a clean reproducible setup.

The complete bind-mount list

For anyone replicating this convention on a standard Ubuntu LVM install, here is the complete list of paths that I should have bind-mounted from day one:

/data/ai          -> /home/USER/.ai
/data/projects    -> /home/USER/.projects
/var/lib/docker   -> /home/USER/.docker        (via data-root, not bind-mount)
/var/log          -> /home/USER/.logs           (optional, slow-growing)
/var/cache/apt    -> /home/USER/.apt-cache      (optional, resettable)
/var/lib/snapd    -> /home/USER/.snapd          (only if you use snaps)

The first two are bind-mounts in /etc/fstab. The third is a daemon config change. The remaining three are optional depending on what services you run and how aggressive you want to be about keeping root clean.

The rule that would have saved me three weeks: on a standard Ubuntu LVM install, every directory that can grow without bound must be redirected to the home volume before the service that writes to it is installed. Not after, not “when I notice”, but before.
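One way to enforce that rule is to check which of the growth-prone paths still share a filesystem with root. This is a minimal sketch that compares device IDs from os.stat. The function names are hypothetical and the path list is taken from the list above. One caveat: Docker moved via data-root leaves /var/lib/docker behind on root, so that entry needs reading with the daemon config in mind.

```python
import os


def on_same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths report the same device ID."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def paths_still_on_root(paths, root="/"):
    """Return the existing paths that live on the same filesystem as root."""
    return [p for p in paths if os.path.exists(p) and on_same_filesystem(p, root)]


if __name__ == "__main__":
    candidates = [
        "/data/ai",
        "/data/projects",
        "/var/lib/docker",
        "/var/log",
        "/var/cache/apt",
        "/var/lib/snapd",
    ]
    for path in paths_still_on_root(candidates):
        print(f"still on root: {path}")
```

A path that is bind-mounted from the home volume reports the home volume's device ID, so it drops out of the list once the mount is active.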

Why I am not switching to custom partitioning

The cleanest fix for this whole class of problem would have been: do not use the stock LVM layout. Hand-partition the install with vg0-root (30G), vg0-var (100G), vg0-data (100G), vg0-home (rest). That gives you four real volumes, each with the right purpose, no bind-mount dance required.

I am not going to do this on the laptop, for two reasons.

First, the laptop user is not me. He is a friend who needs to operate this system without me looking over his shoulder. A standard layout means standard tools work, standard recovery procedures work, and standard advice from the Ubuntu community applies. A custom layout means I am the only person in his life who can answer questions about disk geometry. That is exactly the dependency I am trying not to create.

Second, the bind-mount strategy works. It is uglier than the custom layout but functionally equivalent. The cost is the discipline of maintaining the complete bind-mount list, which I now have written down in this article and in the laptop’s docs directory.

If I were setting up a new server for myself today, I would use the custom layout. For someone else’s daily-driver laptop, the standard layout plus a complete bind-mount list is the right tradeoff.

The bigger lesson about portable conventions

The deeper trap here was treating /data/ as a portable convention without examining what made it work in the first place. I assumed it worked on the Spark because /data was a real volume with its own size and its own boundary. It is not. It is a directory on a single 3.7 TB filesystem, and it worked only because that filesystem is large enough that nothing ever fills it. The path was a label for free space all along. On the laptop, /data/ started as the same label on an 80 GB root, and the difference between 3.7 TB and 80 GB is the difference between a convention that looks structural and one that bites in three weeks.

This kind of mistake is recurrent across infrastructure work. A pattern that works on one host gets copied to another, keeps its shape, loses whatever was quietly holding it up, and silently degrades. The fix is not to ban portable conventions. The fix is to ask, every time you transplant one: what actually made this work on the old host? If the honest answer is “nothing structural, the old box just had room to spare,” then the convention is a naming habit, not a safeguard, and the new host will expose that the moment it has less slack. If you want real isolation, build a real boundary (a separate volume, a quota, a cgroup limit). Do not let a directory name stand in for one.

I am now applying the same question to every other convention I share between Spark and laptop. /etc/sovereign/ for sovereign-AI service configuration: same on both hosts, no special mechanism needed, transplants fine. /data/secrets/ with chmod 700 and chown root:root: same on both hosts, but the laptop user has different UIDs than the Spark, so I had to explicitly add the laptop user to a secrets group rather than rely on root-only access. /data/scripts/ symlinked into /usr/local/bin/: same on both hosts, but the laptop ships some scripts that the Spark does not, so I keep the canonical list in a MANIFEST.md per host rather than assuming the paths match.

The result is that each convention now has a stated mechanism, an explicit list of what is shared and what is host-local, and a paragraph in the dashboard’s Learning tab explaining the trade-off. The friend who owns the laptop does not need to know any of this. But the next time I look at the laptop in six months and wonder why something is structured the way it is, the answer is on the system, not in my memory.

The migration timeline

I am tracking the Docker data-root migration as a calendar item rather than an emergency. Current root usage is 64 percent. Growth over the last week was about 1.5 percentage points, driven mostly by apt cache and journal growth. At that rate, root will hit 85 percent (my soft threshold for the dashboard) in 14 weeks, and 95 percent in about 21 weeks. The migration takes about 30 minutes if all goes well, plus a backup buffer.
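The projection is plain linear extrapolation, and it is easy to redo when the growth rate changes. The sketch below assumes growth stays constant, which a single large image pull would immediately invalidate. The function name is hypothetical.

```python
def weeks_until(current_pct: float, threshold_pct: float, growth_per_week: float) -> float:
    """Weeks until usage reaches threshold_pct at a constant weekly growth rate."""
    if growth_per_week <= 0:
        raise ValueError("growth_per_week must be positive")
    return max(0.0, (threshold_pct - current_pct) / growth_per_week)


if __name__ == "__main__":
    # 64 percent today, growing 1.5 percentage points per week.
    print(f"{weeks_until(64, 85, 1.5):.1f}")  # 14.0
    print(f"{weeks_until(64, 95, 1.5):.1f}")  # 20.7
```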

The plan is to schedule it inside a maintenance window where the friend is not actively using the laptop for AI work, run it with him watching so he sees the procedure, and then check it into the laptop’s docs/runbooks/ as the canonical procedure for next time. The maintenance window is also a chance to do a full apt full-upgrade, a kernel-cleanup, and an aide --update pass. These are the housekeeping operations that benefit from being batched.

I considered moving the data-root proactively at setup time. I decided against it because the friend should not start his ownership with an unfamiliar Docker layout. The standard /var/lib/docker is what every Docker tutorial on the internet references. The non-standard /home/USER/.docker-data is what my dashboard Learning tab will eventually explain. There is a sequencing argument here: standard first, custom only once the user understands what the customization buys him.

What I learned later (2026-05-30 update)

I ran the Docker data-root migration the day after this post was drafted. It half-worked. The other half is a longer story and I am appending it rather than rewriting the section above, because the original section reflects what I believed at the time and the update reflects what I learned by trying.

The data-root config moves Docker. It does not move containerd. I edited /etc/docker/daemon.json, set "data-root": "/home/USER/.docker-data", restarted Docker, and the images and containers migrated cleanly. Then I checked root usage expecting the big drop. The drop was about half of what I expected. Containerd, which Docker uses as its lower-layer runtime in Docker 29, has its own state at /var/lib/containerd/. That directory is not affected by Docker’s data-root setting because containerd is a separate daemon with its own config. The image-layer storage that I thought lived under /var/lib/docker/overlay2/ is actually split: the high-level metadata that Docker manages does move, and the low-level snapshot data that containerd manages does not. The disk savings from the Docker migration alone were modest. The full migration required also editing /etc/containerd/config.toml to change the containerd root directory and restarting containerd before restarting Docker, in that order. The right migration path on Docker 29 is two daemons, two config files, two restarts. The original section above implies one.
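The containerd half of the change is a single top-level key in config.toml. The sketch below sets that key with a line-based edit. It is not a TOML parser, it assumes string values, and the function name is hypothetical. The point it illustrates is that the key has to sit before the first table header, otherwise it would belong to that table instead of the top level.

```python
def set_top_level_key(toml_text: str, key: str, value: str) -> str:
    """Set a top-level string key in TOML text (line-based, not a full parser)."""
    lines = toml_text.splitlines()
    new_line = f'{key} = "{value}"'
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("["):
            # First table header reached: top-level keys must come before it.
            lines.insert(i, new_line)
            break
        name = stripped.split("=", 1)[0].strip()
        if "=" in stripped and name == key:
            # Existing, uncommented top-level assignment: replace it in place.
            lines[i] = new_line
            break
    else:
        # No table header and no existing key: append at the end.
        lines.append(new_line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = 'version = 2\nroot = "/var/lib/containerd"\n\n[grpc]\n'
    print(set_top_level_key(sample, "root", "/home/USER/.containerd"), end="")
```

Commented-out defaults such as a line starting with #root are left alone, because the key comparison only matches an uncommented assignment.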

Watchtower 1.7.1 crashed against Docker 29.5 with an API mismatch. Watchtower had been silently working through the setup-day week, and I noticed today that it had been crash-looping for 48 hours after a Docker minor-version update. The crash was an API-version mismatch: Watchtower 1.7.1 calls Docker API v1.43, and Docker 29.5 dropped support for that version. Watchtower upstream had not shipped a release for it as of the writing of this update. The pragmatic response, since I cannot let an unmaintained auto-updater run unsupervised against the friend’s daily-driver, was to disable Watchtower and write a small replacement in an afternoon. The replacement is called composewarden, reads each compose project’s directory, pulls images, and recreates containers with a label-opt-in semantic so I can keep the conservative default of “only update containers that explicitly opt in.” It is about 400 lines of Go and does the one thing Watchtower used to do correctly. The replacement is now what runs on the friend’s box. I left the failed Watchtower container stopped for one week as evidence in the Doktor tab, then removed it.
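The label-opt-in semantic is simple enough to show in a few lines. This is an illustration in Python, not the Go code of composewarden itself. The label name composewarden.enable and the container dictionaries are assumptions made up for the example.

```python
OPT_IN_LABEL = "composewarden.enable"  # hypothetical label name


def opted_in(labels: dict) -> bool:
    """A container opts in only with an explicit "true" label value."""
    return str(labels.get(OPT_IN_LABEL, "")).strip().lower() == "true"


def select_for_update(containers) -> list:
    """Return names of containers that explicitly opted in to updates."""
    return [c["name"] for c in containers if opted_in(c.get("labels", {}))]


if __name__ == "__main__":
    containers = [
        {"name": "openwebui", "labels": {OPT_IN_LABEL: "true"}},
        {"name": "gitea", "labels": {OPT_IN_LABEL: "false"}},
        {"name": "comfyui", "labels": {}},
    ]
    print(select_for_update(containers))  # ['openwebui']
```

The conservative default falls out of the comparison: a missing label, an empty value, or anything other than "true" means the container is left alone.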

The two takeaways for the original post. First, the complete bind-mount list in the section above is still right for everything except Docker. The Docker entry needs an asterisk and a footnote that says “and also containerd, see config notes.” I will edit the post directly before publish to add the containerd line. Second, the implicit assumption I had that “Docker manages all its storage in one place” was wrong on Docker 29. The split between Docker and containerd is the kind of architectural detail that the docs mention in passing but that does not show up in any of the migration guides I had read. The lesson lives in the laptop’s KB now as ~/kb/docker/containerd-root-split.md and the Learning tab in the dashboard will eventually reference it.

Honest correction: the original section above says “The migration takes about 30 minutes if all goes well.” It took closer to two hours, because the containerd half was not in my plan and Watchtower’s incidental death happened in the same maintenance window. The 30-minute estimate is wrong for the Docker-29-on-Ubuntu-26 reality. A realistic estimate is two hours for a first-time migration with the containerd step included, plus another half-day if your auto-updater also turns out to be incompatible. Plan accordingly.

The full containerd migration (2026-05-30 afternoon receipts)

I ran the complete two-daemon migration later the same afternoon. Numbers, since the original section above was specifically wrong about how much disk this actually frees.

Before, root partition. 58 GB used of 79 GB, 78 percent. Home was at 76 GB of 513 GB, 16 percent. That is the state after the morning’s Docker-only data-root move and the image prune. Root usage had stopped rising but the number had not dropped the way I expected, because containerd was still writing snapshots into /var/lib/containerd/.

The two-step sequence I actually ran.

  1. docker compose down across all four active stacks (openwebui, comfyui, gitea, mcpo). Four stacks, four directories with compose files, four down commands. Took about 90 seconds total.
  2. systemctl stop docker docker.socket containerd. Both daemons fully stopped. This is the state where containerd’s overlay snapshots are unmounted and the on-disk path actually reflects all the data, not just the unmerged deltas.
  3. Edit /etc/containerd/config.toml. Set root = "/home/USER/.containerd" and state = "/run/containerd" explicitly. The root setting is the one that moves the snapshot store. The state setting I left on /run/containerd because that is tmpfs and irrelevant to disk pressure.
  4. rsync -aHAX /var/lib/containerd/ /home/USER/.containerd/. With the daemon stopped, this copied 42 GB of real files. No overlay-driver weirdness because no overlay mounts were active. The rsync took 8 minutes on the NVMe-to-NVMe internal copy.
  5. systemctl start containerd docker. Containerd came up against the new root. Docker came up against its already-migrated data-root from the morning. Both daemons logged clean.
  6. docker compose up -d in each of the four stack directories. Containers recreated against the same volume bind-mounts as before, no data loss, no recreation of named volumes.
  7. Verification pass: docker ps showed all 6 containers Up and healthy. curl http://localhost:8080/ and the other endpoint pokes returned 200 for every web service. The composewarden dry-run still discovered all 4 compose projects and reported “would pull, no upgrade pending” as expected for the just-restarted state.
  8. Cleanup: rm -rf /var/lib/docker /var/lib/containerd after a 5-minute soak period where I watched logs to make sure no daemon was still writing to the old paths. Reclaimed the inode entries and freed the actual blocks.

After, root partition. 17 GB used of 79 GB, 22 percent. Home at 118 GB of 513 GB, 25 percent. The drop on root was 41 GB, exactly the size of the containerd snapshot store plus a few hundred MB of Docker metadata that the morning move had not caught. The full containerd-plus-Docker migration reclaimed what the docker-only morning attempt had failed to reclaim.

Downtime. About 15 to 20 minutes of container services unavailable, almost entirely the rsync window plus the cautious “start one daemon, watch logs, start the next” cadence. The friend was not actively using the laptop during the window. For anyone scheduling this against active use, plan a 30-minute maintenance window and you will have buffer.

The critical insight, restated for searchability. Docker 29 delegates image and snapshot storage to containerd. Setting data-root in /etc/docker/daemon.json moves Docker’s view of the world. It does not move containerd’s view. Containerd has its own daemon, its own config file (/etc/containerd/config.toml), its own root setting. Both must be migrated for the full disk win. The morning’s data-root-only approach was insufficient. The afternoon’s two-daemon approach was the complete fix. If you only do one, you will see modest savings and confusion. If you do both, you reclaim the full image footprint to wherever you want it to live.

Sibling posts on this thread

24 Hours Setting Up a Lenovo Legion Pro 7 Gen 10 is the full day-of mechanics post that names this trap as one of the day’s surprises.

Sovereign Friend-Setup explains why “standard layout plus bind-mounts” was the right call for someone else’s daily-driver laptop.

Dashboard As Learning-Cockpit shows where the disk-usage warnings surface to the friend, and how the Learning tab will eventually explain the data-root migration.

We Were Wrong About Local 8B Tool-Use is the model-side technical post that depends on /data/ai/ being a place where multi-gigabyte models can land without filling root.

What survived the day

The friend’s laptop currently has 16G of headroom on root, three weeks of breathing room before the next migration becomes urgent, and a written plan with step-by-step verification for the Docker data-root move when it gets there. The convention /data/ai/, /data/projects/, /data/secrets/ is preserved exactly. The Spark and the laptop now look the same from the perspective of any tool that reads from those paths.

The one thing I wish I had: the discipline to make the bind-mount list complete before deploying any service that writes to /var/. Future me, please write that on a sticky note before the next setup day. The note can replace this article.
