ComfyUI plus FLUX.1-schnell on DGX Spark: Per-Style Visual Vocabularies
New to self-hosting AI? The Self-Hosted AI: Start Here hub walks the hardware-decision tree, inference-engine choice, and the operational gotchas that bite hardest in the first three months. Read it before or after this one, whichever fits your stage.
I picked FLUX.1-schnell over the alternatives after a week of testing it against FLUX.1-dev and SDXL side by side on the same prompts. Schnell loses some peak quality on portrait work, but for the per-article hero images this blog needs (illustrative, legible at thumbnail scale, generated in 4-8 seconds), it is the right tradeoff. The decisions below are what fell out of that choice, in the order they came up.
Architecture Decisions
Why sequential instead of parallel?
The DGX Spark’s 128GB unified memory pool is shared between CPU and GPU. Mistral consumes ~93GB during inference, while ComfyUI requires ~25-30GB. Together that is 118-123GB, which leaves too little headroom for the OS and page cache, so running both at once triggers OOM kills. Sequential execution gives each service the whole pool during its runtime window.
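The budget arithmetic can be sketched as a small shell check. The function name fits_in_pool and the 8GB reserve for the OS are assumptions made for illustration, not measured values; 93 and 30 are the upper footprints quoted above.

```shell
# fits_in_pool POOL_GB RESERVE_GB FOOTPRINT_GB...
# Succeeds (exit 0) when the summed footprints fit into the pool after
# setting aside a reserve for the OS and page cache.
# The reserve is an assumption for illustration, not a measured value.
fits_in_pool() {
  pool_gb="$1"; reserve_gb="$2"; shift 2
  total_gb=0
  for footprint_gb in "$@"; do
    total_gb=$((total_gb + footprint_gb))
  done
  [ "$total_gb" -le $((pool_gb - reserve_gb)) ]
}

# Mistral (~93 GB) plus ComfyUI (up to ~30 GB) on a 128 GB pool:
if fits_in_pool 128 8 93 30; then
  echo "both fit"
else
  echo "run sequentially"
fi
```

With these numbers the check lands on the sequential branch: 123GB of demand against 120GB of usable pool.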
Blog workflow:
- Write article with Mistral → generate image prompts
- Desktop link “ComfyUI” → stops Mistral, opens ComfyUI
- Generate images
- Desktop link “Mistral” → stops ComfyUI, starts Mistral
- Merge article + images → publish
#!/bin/bash
sudo systemctl stop sglang-mistral4 sglang-healthcheck.timer
sleep 3
bash ~/bin/start-comfyui-optimized.sh
xdg-open http://127.0.0.1:8188
Watch out: The DGX Spark’s unified memory architecture means GPU and CPU compete for the same pool. Even if total memory appears sufficient, fragmentation can cause allocation failures during large model loads.
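The fixed sleep 3 in the script above is a guess at how long the pool takes to drain. A possible alternative, sketched here under the assumption that MemAvailable in /proc/meminfo reflects the memory released by the stopped service, is to poll until enough is free. Both function names and the 40GB threshold in the usage note are hypothetical.

```shell
# mem_available_gb [MEMINFO_FILE]
# Prints MemAvailable in whole GiB, read from /proc/meminfo by default.
mem_available_gb() {
  awk '/^MemAvailable:/ { print int($2 / 1048576) }' "${1:-/proc/meminfo}"
}

# wait_for_free_memory NEED_GB TIMEOUT_S [MEMINFO_FILE]
# Polls once per second until at least NEED_GB is available.
# Returns 1 if the timeout expires first.
wait_for_free_memory() {
  need_gb="$1"; remaining_s="$2"; meminfo="${3:-/proc/meminfo}"
  while :; do
    avail_gb="$(mem_available_gb "$meminfo")"
    [ "${avail_gb:-0}" -ge "$need_gb" ] && return 0
    [ "$remaining_s" -le 0 ] && return 1
    sleep 1
    remaining_s=$((remaining_s - 1))
  done
}
```

In the switch script this would replace the sleep, for example wait_for_free_memory 40 30 before launching ComfyUI.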
Why FLUX.1-schnell?
FLUX.1-schnell is released under the Apache 2.0 license, which permits commercial use without licensing fees. This contrasts with many diffusion models that impose non-commercial clauses or require paid licenses. The generated images are yours to use without legal encumbrances.
Gotcha: While FLUX.1-schnell is Apache 2.0 licensed, some derivative workflows or custom nodes in ComfyUI may have separate licenses. Always verify the license status of any custom nodes you import.
Why SparkyUI instead of standard Docker?
Standard ComfyUI Docker images target amd64, so they do not run natively on the DGX Spark’s ARM64 CPU and are not built for the GB10’s compute capability 12.1 (SM121A). The SparkyUI image includes SageAttention kernels compiled specifically for compute capability 12.1, which provides 2-3x speed improvements on the GB10 Blackwell GPU. Building the image locally (~20 minutes) ensures compatibility with both the ARM64 architecture and the GPU.
Limitation: The SageAttention compilation process requires CUDA 13+ toolkit and specific NVIDIA driver versions. Mismatched versions will cause build failures or runtime errors.
Why systemctl instead of docker stop for Mistral?
Mistral runs under sglang-mistral4.service with an unless-stopped restart policy. A plain docker stop leaves the systemd unit active, so the container is brought straight back up. All scripts must use sudo systemctl stop sglang-mistral4 to stop the service properly.
Caveat: The unless-stopped policy means systemd will restart the service if it crashes or exits unexpectedly. This can mask underlying issues during development.
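Because the service can come back on its own, it can help to confirm that the unit is really down before handing the memory to ComfyUI. This is a minimal sketch, not one of the scripts from this setup: stop_unit_and_confirm and the SYSTEMCTL override are hypothetical names, and the only systemd calls used are systemctl stop and systemctl is-active --quiet.

```shell
# stop_unit_and_confirm UNIT [TRIES]
# Stops a systemd unit, then polls `systemctl is-active` once per second
# until the unit is no longer reported as active.
# SYSTEMCTL can be overridden, e.g. SYSTEMCTL="sudo systemctl";
# that override exists only in this sketch.
stop_unit_and_confirm() {
  unit="$1"; tries="${2:-10}"
  ${SYSTEMCTL:-systemctl} stop "$unit" || return 1
  while [ "$tries" -gt 0 ]; do
    if ! ${SYSTEMCTL:-systemctl} is-active --quiet "$unit"; then
      return 0
    fi
    sleep 1
    tries=$((tries - 1))
  done
  return 1
}
```

Usage would look like setting SYSTEMCTL="sudo systemctl" and then calling stop_unit_and_confirm sglang-healthcheck.timer followed by stop_unit_and_confirm sglang-mistral4.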
Scripts & Desktop Links
| Action | Script |
|---|---|
| Start ComfyUI (stops Mistral) | ~/bin/start-comfyui-safe.sh / Desktop link |
| Stop ComfyUI | bash /data/scripts/stop-comfyui.sh |
| Start Mistral (stops ComfyUI) | bash /data/scripts/start-mistral.sh / Desktop link |
| Stop all AI services | ~/bin/stop-ai-services.sh / Desktop link |
Mistral start script (start-mistral.sh):
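#!/bin/bash
# Free the unified memory pool first; ignore the error if ComfyUI is not running.
docker stop comfyui || true
sleep 3
sudo systemctl start sglang-healthcheck.timer sglang-mistral4
# -f makes curl fail on HTTP errors, so the loop only ends on a healthy response.
until curl -sf http://127.0.0.1:8000/health >/dev/null; do sleep 5; done
echo "Mistral ready"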
Watch out: The health check endpoint may take 30-60 seconds to become available after service startup. Scripts should implement proper wait loops rather than assuming immediate readiness.
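The wait loop in the start script never gives up if the service fails to come up. A bounded variant could look like the sketch below; wait_until is a hypothetical helper, and the elapsed time is approximate because it counts sleep intervals only, not the runtime of the probed command.

```shell
# wait_until TIMEOUT_S INTERVAL_S COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds. Returns 1 once roughly TIMEOUT_S
# seconds of waiting have passed without a success.
wait_until() {
  timeout_s="$1"; interval_s="$2"; shift 2
  elapsed_s=0
  until "$@"; do
    elapsed_s=$((elapsed_s + interval_s))
    [ "$elapsed_s" -gt "$timeout_s" ] && return 1
    sleep "$interval_s"
  done
}
```

For the Mistral health check this might be called as wait_until 120 5 curl -sf -o /dev/null http://127.0.0.1:8000/health, where the 120-second limit is my assumption, picked to sit comfortably above the 30-60 seconds mentioned above.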
One-Time Installation
1. Download models (~29 GB)
bash /data/scripts/install-comfyui-optimized.sh
Gotcha: The FLUX.1-schnell model file (flux1-schnell.safetensors) is about 24GB and makes up most of the ~29GB download. Downloading over IPv6 may hang due to CDN restrictions. Use wget -4 to force IPv4 connections.
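The install script itself is not reproduced here. As a rough sketch of the download pattern it relies on (IPv4 only, resumable, finished files skipped on a re-run), something like the following would work. fetch_model, the .part suffix, and the WGET override are hypothetical and not taken from the real script.

```shell
# fetch_model URL DEST
# Downloads URL to DEST over IPv4 with resume support. A finished file is
# skipped on re-runs; partial data stays in DEST.part until complete.
# WGET can be overridden for testing; that hook is part of this sketch only.
fetch_model() {
  url="$1"; dest="$2"
  if [ -s "$dest" ]; then
    echo "skip: $dest already present"
    return 0
  fi
  mkdir -p "$(dirname "$dest")"
  # -4 forces IPv4, -c resumes a partial download, -O names the output file.
  ${WGET:-wget} -4 -c -O "$dest.part" "$url" || return 1
  mv "$dest.part" "$dest"
}
```

Writing to a .part file first means an interrupted download is never mistaken for a complete model on the next run.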
2. Build Docker image (once, ~20 min)
cd /data/projects/comfyui
sudo docker compose build
This compiles SageAttention for SM121A (TORCH_CUDA_ARCH_LIST="12.1"). Only needed on updates or when changing base images.
Limitation: The build process requires approximately 20GB of temporary disk space. Ensure /tmp has sufficient free space or configure Docker to use an alternate temp directory.
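A quick pre-flight check for that space, sketched with POSIX df. The helper names are hypothetical, and the parsing assumes the filesystem name contains no spaces.

```shell
# free_gb DIR
# Prints the free space on the filesystem holding DIR, in whole GiB.
free_gb() {
  df -Pk "$1" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# require_free_gb DIR NEED_GB
# Returns non-zero (with a message on stderr) if DIR has less than NEED_GB free.
require_free_gb() {
  have_gb="$(free_gb "$1")"
  if [ "${have_gb:-0}" -lt "$2" ]; then
    echo "only ${have_gb:-0} GB free in $1, need $2 GB" >&2
    return 1
  fi
}
```

Chained in front of the build, this reads require_free_gb /tmp 20 && sudo docker compose build.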
3. Configuration (/data/projects/comfyui/.env)
COMFYUI_HOST_PATH=/ai/models/ComfyUI
SPARKYUI_DATA_PATH=/data/comfyui
COMFYUI_PORT=8188
COMFYUIMINI_PORT=8189
COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc --dont-upcast-attention --use-sage-attention
Caveat: The --disable-pinned-memory flag reduces unified fabric overhead but may increase latency for some operations. Benchmark your specific workload before deployment.
GB10 Optimizations
ComfyUI Flags
--disable-pinned-memory # Cuts unified fabric overhead
--force-fp16 # SageAttention only supports FP16
--fp16-unet --fp16-vae --fp16-text-enc
--dont-upcast-attention # Keep attention layers in FP16
--use-sage-attention # Use SageAttention backend (SM121A)
Avoid these:
- --gpu-only, fights the unified memory fabric
- --cache-none, disables natural caching
- --highvram, hurts GB10 performance
- --bf16-*, SageAttention doesn’t support BF16
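Flags tend to creep back in when copying launch commands from other guides, so a small lint over the flag string can be useful. The sketch below only encodes the avoid list above; find_discouraged_flags is a hypothetical helper, not part of ComfyUI.

```shell
# find_discouraged_flags FLAGS...
# Prints each argument that appears on the avoid list above and
# returns non-zero if any was found.
find_discouraged_flags() {
  found=0
  for flag in "$@"; do
    case "$flag" in
      --gpu-only|--cache-none|--highvram|--bf16-*)
        echo "discouraged: $flag"
        found=1
        ;;
    esac
  done
  [ "$found" -eq 0 ]
}
```

Called as find_discouraged_flags $COMFYUI_FLAGS (unquoted on purpose, so the string splits into separate flags), it stays silent for the configuration shown earlier.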
Watch out: The --use-sage-attention flag requires PyTorch 2.3+ with SageAttention patches. Older versions will fail silently or fall back to standard attention mechanisms.
Docker Environment
TORCH_COMPILE_DISABLE=1 # Triton has no SM121A support yet
TORCHDYNAMO_DISABLE=1
PYTORCH_NO_CUDA_MEMORY_CACHING=1
CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
OMP_NUM_THREADS=20 # Use all ARM cores
Limitation: The TORCH_COMPILE_DISABLE=1 setting disables PyTorch’s compilation optimizations, which can reduce performance by 15-20% on some workloads. This is a necessary tradeoff for SM121A compatibility.
FLUX.1-schnell Workflow in ComfyUI
For blog images (4-step, ~10-30s after warmup):
DualCLIPLoader
├── clip_name1: clip_l.safetensors (type: clip_l)
└── clip_name2: t5xxl_fp8_e4m3fn.safetensors (type: t5)
└── CLIPTextEncode (positive prompt)
UNETLoader
└── flux1-schnell.safetensors (weight_dtype: default)
└── ModelSamplingFlux → BasicGuider
VAELoader → ae.safetensors
EmptyLatentImage (1024×1024)
KSampler (steps=4, sampler=euler, scheduler=simple, cfg=1.0)
VAEDecode → SaveImage
Gotcha: The FLUX.1-schnell model requires specific CLIP and T5 text encoder variants. Using incorrect versions will produce distorted outputs or runtime errors.
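Before loading the workflow, it is cheap to confirm that the four files it names are actually on disk. The sketch below searches by file name anywhere under a given root, so it makes no assumption about ComfyUI’s subdirectory layout. missing_models is a hypothetical helper.

```shell
# missing_models MODELS_ROOT
# Prints every file named in the workflow above that cannot be found
# anywhere under MODELS_ROOT; returns non-zero if something is missing.
missing_models() {
  root="$1"; missing=0
  for name in flux1-schnell.safetensors clip_l.safetensors \
              t5xxl_fp8_e4m3fn.safetensors ae.safetensors; do
    # -L follows symlinks, which matters when model folders are relocated.
    if [ -z "$(find -L "$root" -name "$name" -print 2>/dev/null | head -n 1)" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

On this setup the root would presumably be the COMFYUI_HOST_PATH from the .env file, so missing_models /ai/models/ComfyUI. That mapping is my assumption.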
Known Issues & Fixes
| Problem | Cause | Fix |
|---|---|---|
| Mistral restarts after docker stop | systemd unless-stopped | sudo systemctl stop sglang-mistral4 |
| PyTorch warning about SM121A | PyTorch doesn’t officially support SM121A yet | Harmless, ignore it |
| torch.compile disabled | Triton lacks SM121A support | Expected; TORCH_COMPILE_DISABLE=1 |
| IPv6 download hangs | HF CDN blocks IPv6 | wget -4 |
| Download interrupted | Network dropout | wget -c, rerun script, it skips finished files |
| ComfyUI not responding immediately | Still initializing | Wait 30-60s |
| 403 on HF download | Gated repo; terms not accepted | Visit huggingface.co/black-forest-labs/FLUX.1-schnell → “Agree and access” |
| SageAttention build fails | Missing CUDA 13+ toolkit | Install CUDA 13.0+ and set CUDA_HOME |
| Systemd service fails to start | Port conflict | Check sudo lsof -i :8000 and adjust ports |
| ComfyUI crashes on startup | Outdated PyTorch version | Upgrade to PyTorch 2.3+ with SageAttention support |
Watch out: The DGX Spark’s ARM64 architecture means some x86-optimized Python packages may fail. Always verify package compatibility before installation.
What I Actually Use
- NVIDIA DGX Spark: ARM64 server with 128GB unified memory and GB10 Blackwell
- SparkyUI: Docker image with SageAttention for SM121A
- FLUX.1-schnell: Apache 2.0 model for unrestricted commercial use
Where the FLUX-on-DGX-Spark setup gets sharp
Three details turned out to matter more than the install steps suggest.
First, the model snapshot path matters. FLUX.1-schnell’s HF download is large and the default ComfyUI cache directory is in the home filesystem, which on the DGX Spark setup is a smaller partition than /data. Symlinking ComfyUI’s models/checkpoints/ directory to /data/models/comfyui-checkpoints/ before the first download saves a “no space left on device” surprise three minutes into the first generation.
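The relocation itself is a few lines of shell. This is a sketch rather than the exact commands I ran: relocate_dir is a hypothetical helper, and both paths should be given without a trailing slash.

```shell
# relocate_dir SRC DST
# Moves the contents of SRC to DST and leaves a symlink at SRC.
# A no-op when SRC is already a symlink.
relocate_dir() {
  src="$1"; dst="$2"
  [ -L "$src" ] && return 0
  mkdir -p "$dst" || return 1
  if [ -d "$src" ]; then
    # Copy existing files (including dotfiles) before removing the original.
    cp -a "$src"/. "$dst"/ || return 1
    rm -rf "$src"
  fi
  mkdir -p "$(dirname "$src")"
  ln -s "$dst" "$src"
}
```

For the case above, the call would be relocate_dir followed by the full path to ComfyUI’s models/checkpoints directory and /data/models/comfyui-checkpoints.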
Second, the workflow-JSON shape is opinionated. ComfyUI’s UI exports a JSON that includes node-positioning metadata and a lot of UI state that is not load-bearing for headless runs. For the per-article hero generation I strip the JSON down to just the model load, prompt encode, sampler, decode, and save-image nodes. Result: workflow files that are diffable in git, comprehensible six months later, and load in milliseconds rather than seconds.
Third, the per-style visual vocabularies (landscape for conclusion articles, precision instruments for best-practice, scaffolding for code) live in the prompt templates, not in the workflow JSON. That makes switching style a prompt-template change rather than a workflow-JSON edit, a split that the pace of prompt experimentation made obvious within the first week of running this pipeline.
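To make the split concrete, here is a minimal sketch of a template lookup keyed by article type. The three style names come from the list above; the function name style_prompt and the exact prompt wording are invented for illustration and are not the templates used on this blog.

```shell
# style_prompt ARTICLE_TYPE SUBJECT
# Prints an image prompt that wraps SUBJECT in the visual vocabulary for
# the article type. The template wording is invented for illustration.
style_prompt() {
  case "$1" in
    conclusion)    vocabulary="wide landscape, open horizon" ;;
    best-practice) vocabulary="precision instruments, clean workbench" ;;
    code)          vocabulary="scaffolding, structural framework" ;;
    *) echo "unknown article type: $1" >&2; return 1 ;;
  esac
  printf '%s, %s, illustrative, legible at thumbnail scale\n' "$2" "$vocabulary"
}
```

The workflow JSON stays untouched. Only the string fed into the positive-prompt node changes.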