ComfyUI + FLUX.1-schnell Setup
Architecture Decisions
Why sequential instead of parallel?
The DGX Spark’s 128GB unified memory pool is shared between CPU and GPU. Mistral consumes ~93GB during inference, while ComfyUI requires ~25-30GB. Running both simultaneously triggers OOM kills because the system cannot allocate sufficient contiguous memory blocks. Sequential execution ensures each service receives the full 128GB allocation during its runtime window.
Blog workflow:
- Write article with Mistral → generate image prompts
- Desktop link “ComfyUI” → stops Mistral, opens ComfyUI
- Generate images
- Desktop link “Mistral” → stops ComfyUI, starts Mistral
- Merge article + images → publish
```bash
#!/bin/bash
sudo systemctl stop sglang-mistral4 sglang-healthcheck.timer
sleep 3
bash ~/bin/start-comfyui-optimized.sh
xdg-open http://127.0.0.1:8188
```
Watch out: The DGX Spark’s unified memory architecture means GPU and CPU compete for the same pool. Even if total memory appears sufficient, fragmentation can cause allocation failures during large model loads.
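A minimal pre-flight sketch of that constraint: check the unified pool's actual headroom before launching ComfyUI. The 30GB threshold reflects the ~25-30GB footprint noted above, but both the threshold and the `/proc/meminfo` probe are assumptions, not part of the actual start scripts.

```shell
#!/bin/bash
# Hypothetical pre-flight check before starting ComfyUI on the DGX Spark.
# check_headroom AVAILABLE_KB REQUIRED_KB -> succeeds if there is enough room.
check_headroom() {
  [ "$1" -ge "$2" ]
}

required_kb=$((30 * 1024 * 1024))   # assumed ~30GB ComfyUI footprint, in kB
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)

if check_headroom "$avail_kb" "$required_kb"; then
  echo "Enough unified-memory headroom; safe to start ComfyUI."
else
  echo "Only $((avail_kb / 1024 / 1024))GB available; stop Mistral first." >&2
fi
```

Note that `MemAvailable` reports free memory, not contiguity, so this guards against the common failure mode (Mistral still resident) rather than fragmentation itself.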
Why FLUX.1-schnell?
FLUX.1-schnell operates under Apache 2.0 license, granting unrestricted commercial usage rights without attribution requirements. This contrasts with many diffusion models that impose non-commercial clauses or require licensing fees. The generated images are yours to use without legal encumbrances.
Gotcha: While FLUX.1-schnell is Apache 2.0 licensed, some derivative workflows or custom nodes in ComfyUI may have separate licenses. Always verify the license status of any custom nodes you import.
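To audit that, here is a small sketch that walks ComfyUI's `custom_nodes` directory and reports each node's license file. The `/ai/models/ComfyUI` root matches the `.env` below and `custom_nodes` is the standard ComfyUI layout, but treat both paths as assumptions for your install.

```shell
# Hypothetical helper: report the license file (or its absence) for each
# installed ComfyUI custom node.
scan_licenses() {
  local nodes_dir="$1" dir name lic
  for dir in "$nodes_dir"/*/; do
    [ -d "$dir" ] || continue
    name=$(basename "$dir")
    lic=$(find "$dir" -maxdepth 1 -iname 'LICENSE*' -print -quit)
    if [ -n "$lic" ]; then
      echo "$name: $(head -n1 "$lic")"
    else
      echo "$name: NO LICENSE FILE - verify before commercial use"
    fi
  done
}

# Illustrative usage: scan_licenses /ai/models/ComfyUI/custom_nodes
```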
Why SparkyUI instead of standard Docker?
Standard ComfyUI Docker images target amd64 architectures and lack support for the GB10's SM121A (compute capability 12.1). The SparkyUI image includes SageAttention optimizations specifically compiled for compute capability 12.1, which provides 2-3x speed improvements on the GB10 Blackwell GPU. Building the image locally (~20 minutes) ensures compatibility with the DGX Spark's ARM64 architecture.
Limitation: The SageAttention compilation process requires CUDA 13+ toolkit and specific NVIDIA driver versions. Mismatched versions will cause build failures or runtime errors.
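A hedged sketch of a pre-build version gate for that requirement. The `nvcc` parsing line is illustrative only; the 13+ minimum comes from the limitation above.

```shell
# cuda_major "13.0" -> 13; need_cuda VERSION MIN_MAJOR -> succeeds if new enough.
cuda_major() { echo "$1" | cut -d. -f1; }
need_cuda()  { [ "$(cuda_major "$1")" -ge "$2" ]; }

# Illustrative usage against the local toolkit (run on the Spark):
# ver=$(nvcc --version | grep -oP 'release \K[0-9.]+')
# need_cuda "$ver" 13 || echo "CUDA $ver is too old for the SageAttention build" >&2
```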
Why systemctl instead of docker stop for Mistral?
Mistral runs under `sglang-mistral4.service` with an `unless-stopped` restart policy. A plain `docker stop` triggers systemd's auto-restart mechanism, causing immediate service resurrection. All scripts must use `sudo systemctl stop sglang-mistral4` to properly terminate the service.
Caveat: The `unless-stopped` policy means systemd will restart the service if it crashes or exits unexpectedly. This can mask underlying issues during development.
Scripts & Desktop Links
| Action | Script |
|---|---|
| Start ComfyUI (stops Mistral) | `~/bin/start-comfyui-safe.sh` / Desktop link |
| Stop ComfyUI | `bash /data/scripts/stop-comfyui.sh` |
| Start Mistral (stops ComfyUI) | `bash /data/scripts/start-mistral.sh` / Desktop link |
| Stop all AI services | `~/bin/stop-ai-services.sh` / Desktop link |
Mistral start script (`start-mistral.sh`):

```bash
#!/bin/bash
docker stop comfyui || true
sleep 3
sudo systemctl start sglang-healthcheck.timer sglang-mistral4
until curl -s http://127.0.0.1:8000/health >/dev/null; do sleep 5; done
echo "Mistral ready"
```
Watch out: The health check endpoint may take 30-60 seconds to become available after service startup. Scripts should implement proper wait loops rather than assuming immediate readiness.
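A reusable wait-with-timeout sketch along those lines; the name `wait_for` and the 24×5s budget are my own, not taken from the scripts above.

```shell
# wait_for CMD TRIES DELAY -> succeeds as soon as CMD does, fails after TRIES.
wait_for() {
  local cmd="$1" tries="$2" delay="$3" i
  for i in $(seq 1 "$tries"); do
    if eval "$cmd"; then return 0; fi
    sleep "$delay"
  done
  return 1
}

# Example: give the health endpoint up to 2 minutes instead of looping forever.
# wait_for 'curl -s http://127.0.0.1:8000/health >/dev/null' 24 5 \
#   || echo "Mistral did not come up in time" >&2
```

Unlike the bare `until` loop in `start-mistral.sh`, this fails loudly when the service never becomes healthy instead of hanging the script.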
One-Time Installation
1. Download models (~29 GB)
```bash
bash /data/scripts/install-comfyui-optimized.sh
```
Gotcha: The FLUX.1-schnell model file (`flux1-schnell.safetensors`) is 12GB. Downloading over IPv6 may hang due to CDN restrictions. Use `wget -4` to force IPv4 connections.
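The same advice as a small wrapper. The flag set is a suggestion; the `resolve/main` URL form is the standard Hugging Face download path, and since the repo is gated (see Known Issues) you must accept the terms first and may need an auth token.

```shell
# fetch_model URL DEST: force IPv4 (-4), resume partial downloads (-c), retry.
fetch_model() {
  local url="$1" dest="$2"
  wget -4 -c --tries=5 -O "$dest" "$url"
}

# Illustrative usage (on the Spark, after accepting the model's terms):
# fetch_model \
#   "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors" \
#   /ai/models/ComfyUI/models/unet/flux1-schnell.safetensors
```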
2. Build Docker image (once, ~20 min)
```bash
cd /data/projects/comfyui
sudo docker compose build
```
This compiles SageAttention for SM121A (`TORCH_CUDA_ARCH_LIST="12.1"`). Rebuilding is only needed after updates or when changing base images.
Limitation: The build process requires approximately 20GB of temporary disk space. Ensure `/tmp` has sufficient free space or configure Docker to use an alternate temp directory.
3. Configuration (/data/projects/comfyui/.env)
```bash
COMFYUI_HOST_PATH=/ai/models/ComfyUI
SPARKYUI_DATA_PATH=/data/comfyui
COMFYUI_PORT=8188
COMFYUIMINI_PORT=8189
COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory \
  --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc \
  --dont-upcast-attention --use-sage-attention
```
Caveat: The `--disable-pinned-memory` flag reduces unified fabric overhead but may increase latency for some operations. Benchmark your specific workload before deployment.
GB10 Optimizations
ComfyUI Flags
```bash
--disable-pinned-memory  # Cuts unified fabric overhead
--force-fp16             # SageAttention only supports FP16
--fp16-unet --fp16-vae --fp16-text-enc
--dont-upcast-attention  # Keep attention layers in FP16
--use-sage-attention     # Use SageAttention backend (SM121A)
```
Avoid these:
- `--gpu-only` – fights the unified memory fabric
- `--cache-none` – disables natural caching
- `--highvram` – hurts GB10 performance
- `--bf16-*` – SageAttention doesn't support BF16
Watch out: The `--use-sage-attention` flag requires PyTorch 2.3+ with SageAttention patches. Older versions will fail silently or fall back to standard attention mechanisms.
Docker Environment
```bash
TORCH_COMPILE_DISABLE=1            # Triton has no SM121A support yet
TORCHDYNAMO_DISABLE=1
PYTORCH_NO_CUDA_MEMORY_CACHING=1
CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
OMP_NUM_THREADS=20                 # Use all ARM cores
```
Limitation: The `TORCH_COMPILE_DISABLE=1` setting disables PyTorch's compilation optimizations, which can reduce performance by 15-20% on some workloads. This is a necessary tradeoff for SM121A compatibility.
FLUX.1-schnell Workflow in ComfyUI
For blog images (4-step, ~10-30s after warmup):
```
DualCLIPLoader
├── clip_name1: clip_l.safetensors (type: clip_l)
└── clip_name2: t5xxl_fp8_e4m3fn.safetensors (type: t5)
    └── CLIPTextEncode (positive prompt)
UNETLoader
└── flux1-schnell.safetensors (weight_dtype: default)
    └── ModelSamplingFlux → BasicGuider
VAELoader → ae.safetensors
EmptyLatentImage (1024×1024)
KSampler (steps=4, sampler=euler, scheduler=simple, cfg=1.0)
VAEDecode → SaveImage
```
Gotcha: The FLUX.1-schnell model requires specific CLIP and T5 text encoder variants. Using incorrect versions will produce distorted outputs or runtime errors.
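A quick pre-flight sketch for that: check that the four files the workflow names actually exist under the model root. The `clip/`, `unet/`, and `vae/` subfolders follow the conventional ComfyUI layout and the root matches the `.env` above, but verify both against your install.

```shell
# check_models ROOT: report any of the workflow's model files that are missing.
check_models() {
  local root="$1" f missing=0
  for f in clip/clip_l.safetensors clip/t5xxl_fp8_e4m3fn.safetensors \
           unet/flux1-schnell.safetensors vae/ae.safetensors; do
    if [ ! -f "$root/models/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Illustrative usage: check_models /ai/models/ComfyUI
```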
Known Issues & Fixes
| Problem | Cause | Fix |
|---|---|---|
| Mistral restarts after `docker stop` | systemd `unless-stopped` | `sudo systemctl stop sglang-mistral4` |
| PyTorch warning about SM121A | PyTorch doesn't officially support SM121A yet | Harmless, ignore it |
| torch.compile disabled | Triton lacks SM121A support | Expected; `TORCH_COMPILE_DISABLE=1` |
| IPv6 download hangs | HF CDN blocks IPv6 | `wget -4` |
| Download interrupted | Network dropout | `wget -c`, rerun script; it skips finished files |
| ComfyUI not responding immediately | Still initializing | Wait 30-60s |
| 403 on HF download | Gated repo; terms not accepted | Visit huggingface.co/black-forest-labs/FLUX.1-schnell → "Agree and access" |
| SageAttention build fails | Missing CUDA 13+ toolkit | Install CUDA 13.0+ and set `CUDA_HOME` |
| Systemd service fails to start | Port conflict | Check `sudo lsof -i :8000` and adjust ports |
| ComfyUI crashes on startup | Outdated PyTorch version | Upgrade to PyTorch 2.3+ with SageAttention support |
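For the port-conflict row, here is a hedged alternative using `ss` (from iproute2) for systems where `lsof` isn't installed; `port_in_use` is my own name, not an existing tool.

```shell
# port_in_use PORT: succeeds if something is listening on that TCP port.
port_in_use() {
  ss -ltn "sport = :$1" 2>/dev/null | tail -n +2 | grep -q .
}

# Illustrative usage before starting the SGLang service:
# port_in_use 8000 && echo "port 8000 busy; find the holder: sudo lsof -i :8000"
```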
Watch out: The DGX Spark’s ARM64 architecture means some x86-optimized Python packages may fail. Always verify package compatibility before installation.
What I Actually Use
- NVIDIA DGX Spark: ARM64 server with 128GB unified memory and GB10 Blackwell
- SparkyUI: Docker image with SageAttention for SM121A
- FLUX.1-schnell: Apache 2.0 model for unrestricted commercial use