Optimized workflow for running FLUX.1-schnell and Mistral sequentially on NVIDIA DGX Spark with 128GB unified memory

ComfyUI + FLUX.1-schnell Setup


Architecture Decisions

Why sequential instead of parallel?

The DGX Spark’s 128GB unified memory pool is shared between CPU and GPU. Mistral consumes ~93GB during inference, while ComfyUI requires ~25-30GB. Running both simultaneously triggers OOM kills because the system cannot allocate sufficient contiguous memory blocks. Sequential execution ensures each service receives the full 128GB allocation during its runtime window.
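
Before switching services, it can help to sanity-check that the pool has actually been released. A minimal sketch (not part of the actual scripts), assuming Linux's /proc/meminfo and using MemAvailable as a rough proxy for the unified pool:

```shell
#!/bin/bash
# Illustrative guard: refuse to load a model unless the unified pool has
# room for it. The ~93GB (Mistral) / ~30GB (ComfyUI) figures are from above.
required_gb=${1:-0}

# On DGX Spark, CPU and GPU draw from the same pool, so MemAvailable is a
# reasonable first-order proxy for what a model load can claim.
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
avail_gb=$((avail_kb / 1024 / 1024))

if [ "$avail_gb" -lt "$required_gb" ]; then
  echo "Only ${avail_gb}GB free, need ${required_gb}GB - stop the other service first" >&2
  exit 1
fi
echo "OK: ${avail_gb}GB free"
```

Note that this cannot detect fragmentation, only total headroom, so it is a pre-check rather than a guarantee.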

Blog workflow:

  1. Write article with Mistral → generate image prompts
  2. Desktop link “ComfyUI” → stops Mistral, opens ComfyUI
  3. Generate images
  4. Desktop link “Mistral” → stops ComfyUI, starts Mistral
  5. Merge article + images → publish
ComfyUI start script (the “ComfyUI” desktop link target):

#!/bin/bash
# Free the unified memory pool: stop Mistral and its health-check timer
sudo systemctl stop sglang-mistral4 sglang-healthcheck.timer
sleep 3
bash ~/bin/start-comfyui-optimized.sh
xdg-open http://127.0.0.1:8188

Watch out: The DGX Spark’s unified memory architecture means GPU and CPU compete for the same pool. Even if total memory appears sufficient, fragmentation can cause allocation failures during large model loads.

Why FLUX.1-schnell?

FLUX.1-schnell operates under Apache 2.0 license, granting unrestricted commercial usage rights without attribution requirements. This contrasts with many diffusion models that impose non-commercial clauses or require licensing fees. The generated images are yours to use without legal encumbrances.

Gotcha: While FLUX.1-schnell is Apache 2.0 licensed, some derivative workflows or custom nodes in ComfyUI may have separate licenses. Always verify the license status of any custom nodes you import.

Why SparkyUI instead of standard Docker?

Standard ComfyUI Docker images target amd64, so they neither run on the DGX Spark’s ARM64 CPU nor ship kernels for the GB10’s SM121A architecture (compute capability 12.1). The SparkyUI image includes SageAttention optimizations compiled specifically for compute capability 12.1, which provides 2-3x speed improvements on the GB10 Blackwell GPU. Building the image locally (~20 minutes) ensures compatibility with the DGX Spark’s ARM64 architecture.

Limitation: The SageAttention compilation process requires CUDA 13+ toolkit and specific NVIDIA driver versions. Mismatched versions will cause build failures or runtime errors.

Why systemctl instead of docker stop for Mistral?

Mistral runs under sglang-mistral4.service with unless-stopped restart policy. A plain docker stop triggers systemd’s auto-restart mechanism, causing immediate service resurrection. All scripts must use sudo systemctl stop sglang-mistral4 to properly terminate the service.

Caveat: The unless-stopped policy means systemd will restart the service if it crashes or exits unexpectedly. This can mask underlying issues during development.
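
For reference, a hypothetical sketch of how such a unit could be wired (this is not the actual sglang-mistral4.service; the image name and arguments are placeholders):

```ini
[Unit]
Description=SGLang Mistral server (sketch)
After=docker.service
Requires=docker.service

[Service]
# systemd owns the container lifecycle: any exit, including a manual
# 'docker stop', lands here and triggers a restart.
Restart=always
RestartSec=5
ExecStart=/usr/bin/docker run --rm --name sglang-mistral4 IMAGE ARGS
ExecStop=/usr/bin/docker stop sglang-mistral4

[Install]
WantedBy=multi-user.target
```

Stopping through systemctl clears the restart logic; stopping the container directly does not.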


Action                         | Script
Start ComfyUI (stops Mistral)  | ~/bin/start-comfyui-safe.sh / Desktop link
Stop ComfyUI                   | bash /data/scripts/stop-comfyui.sh
Start Mistral (stops ComfyUI)  | bash /data/scripts/start-mistral.sh / Desktop link
Stop all AI services           | ~/bin/stop-ai-services.sh / Desktop link

Mistral start script (start-mistral.sh):

#!/bin/bash
docker stop comfyui || true
sleep 3
sudo systemctl start sglang-healthcheck.timer sglang-mistral4
until curl -s http://127.0.0.1:8000/health >/dev/null; do sleep 5; done
echo "Mistral ready"

Watch out: The health check endpoint may take 30-60 seconds to become available after service startup. Scripts should implement proper wait loops rather than assuming immediate readiness.
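
A more defensive variant of that loop (a sketch; the probe command, timeout, and interval are placeholders) gives up instead of hanging forever when the service never comes up:

```shell
# Run a probe command until it succeeds or a timeout expires.
wait_for() {
  local probe=$1 timeout=${2:-120} interval=${3:-5} elapsed=0
  until eval "$probe" >/dev/null 2>&1; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "probe '$probe' still failing after ${timeout}s" >&2
      return 1
    fi
  done
}

# Usage in start-mistral.sh would look like:
# wait_for "curl -sf http://127.0.0.1:8000/health" 120 5 && echo "Mistral ready"
```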


One-Time Installation

1. Download models (~29 GB)

bash /data/scripts/install-comfyui-optimized.sh

Gotcha: The FLUX.1-schnell model file (flux1-schnell.safetensors) is 12GB. Downloading over IPv6 may hang due to CDN restrictions. Use wget -4 to force IPv4 connections.
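
The download pattern the install script presumably follows can be sketched as a small helper (the destination path and the Hugging Face "resolve" URL form are illustrative assumptions):

```shell
# Download with the fixes above baked in: -4 forces IPv4, -c resumes
# partial files (files already complete are skipped).
fetch() {
  local url=$1 dir=$2
  mkdir -p "$dir"
  wget -4 -c -P "$dir" "$url"
}

# Example:
# fetch "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors" \
#       /ai/models/ComfyUI/checkpoints
```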

2. Build Docker image (once, ~20 min)

cd /data/projects/comfyui
sudo docker compose build

This compiles SageAttention for SM121A (TORCH_CUDA_ARCH_LIST="12.1"). Only needed on updates or when changing base images.

Limitation: The build process requires approximately 20GB of temporary disk space. Ensure /tmp has sufficient free space or configure Docker to use an alternate temp directory.

3. Configuration (/data/projects/comfyui/.env)

COMFYUI_HOST_PATH=/ai/models/ComfyUI
SPARKYUI_DATA_PATH=/data/comfyui
COMFYUI_PORT=8188
COMFYUIMINI_PORT=8189
COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory \
  --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc \
  --dont-upcast-attention --use-sage-attention

Caveat: The --disable-pinned-memory flag reduces unified fabric overhead but may increase latency for some operations. Benchmark your specific workload before deployment.


GB10 Optimizations

ComfyUI Flags

--disable-pinned-memory     # Cuts unified fabric overhead
--force-fp16                # SageAttention only supports FP16
--fp16-unet --fp16-vae --fp16-text-enc
--dont-upcast-attention     # Keep attention layers in FP16
--use-sage-attention        # Use SageAttention backend (SM121A)


Watch out: The --use-sage-attention flag requires PyTorch 2.3+ with SageAttention patches. Older versions will fail silently or fall back to standard attention mechanisms.

Docker Environment

TORCH_COMPILE_DISABLE=1      # Triton has no SM121A support yet
TORCHDYNAMO_DISABLE=1
PYTORCH_NO_CUDA_MEMORY_CACHING=1
CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
OMP_NUM_THREADS=20           # Use all ARM cores

Limitation: The TORCH_COMPILE_DISABLE=1 setting disables PyTorch’s compilation optimizations, which can reduce performance by 15-20% on some workloads. This is a necessary tradeoff for SM121A compatibility.


FLUX.1-schnell Workflow in ComfyUI

For blog images (4-step, ~10-30s after warmup):

DualCLIPLoader
  ├── clip_name1: clip_l.safetensors         (type: clip_l)
  └── clip_name2: t5xxl_fp8_e4m3fn.safetensors  (type: t5)
      └── CLIPTextEncode (positive prompt)

UNETLoader
  └── flux1-schnell.safetensors  (weight_dtype: default)
      └── ModelSamplingFlux → BasicGuider

VAELoader → ae.safetensors
EmptyLatentImage (1024×1024)
KSampler (steps=4, sampler=euler, scheduler=simple, cfg=1.0)
VAEDecode → SaveImage

Gotcha: The FLUX.1-schnell model requires specific CLIP and T5 text encoder variants. Using incorrect versions will produce distorted outputs or runtime errors.
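
Once the graph works in the UI, it can also be queued headlessly through ComfyUI's HTTP API (POST /prompt on port 8188). The payload below is a deliberately abbreviated sketch, not a full exported graph; a real workflow should be exported via "Save (API Format)":

```shell
# Minimal shape of a /prompt request; the node id and inputs are
# illustrative and omit the model/latent connections a real graph needs.
cat > /tmp/flux-prompt.json <<'EOF'
{"prompt": {
  "3": {"class_type": "KSampler",
        "inputs": {"steps": 4, "cfg": 1.0,
                   "sampler_name": "euler", "scheduler": "simple"}}
}}
EOF

curl -s -X POST http://127.0.0.1:8188/prompt \
     -H "Content-Type: application/json" \
     -d @/tmp/flux-prompt.json || echo "ComfyUI is not running"
```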


Known Issues & Fixes

Problem                             | Cause                                        | Fix
Mistral restarts after docker stop  | systemd unless-stopped                       | sudo systemctl stop sglang-mistral4
PyTorch warning about SM121A        | PyTorch doesn’t officially support SM121A yet | Harmless, ignore it
torch.compile disabled              | Triton lacks SM121A support                  | Expected; TORCH_COMPILE_DISABLE=1
IPv6 download hangs                 | HF CDN blocks IPv6                           | wget -4
Download interrupted                | Network dropout                              | wget -c, rerun script, it skips finished files
ComfyUI not responding immediately  | Still initializing                           | Wait 30-60s
403 on HF download                  | Gated repo; terms not accepted               | Visit huggingface.co/black-forest-labs/FLUX.1-schnell → “Agree and access”
SageAttention build fails           | Missing CUDA 13+ toolkit                     | Install CUDA 13.0+ and set CUDA_HOME
Systemd service fails to start      | Port conflict                                | Check sudo lsof -i :8000 and adjust ports
ComfyUI crashes on startup          | Outdated PyTorch version                     | Upgrade to PyTorch 2.3+ with SageAttention support

Watch out: The DGX Spark’s ARM64 architecture means some x86-optimized Python packages may fail. Always verify package compatibility before installation.
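
A quick preflight sketch before pip-installing anything inside the container (output varies by host):

```shell
# Show the machine architecture and whether pip reports aarch64 wheel tags;
# packages that only publish x86_64 wheels will not install here.
uname -m
python3 -m pip debug --verbose 2>/dev/null | grep -m3 aarch64 \
  || echo "no aarch64 wheel tags reported"
```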


What I Actually Use

  • NVIDIA DGX Spark: ARM64 server with 128GB unified memory and GB10 Blackwell
  • SparkyUI: Docker image with SageAttention for SM121A
  • FLUX.1-schnell: Apache 2.0 model for unrestricted commercial use
Stack

Self-hosted AI workflow on NVIDIA DGX Spark, bottom to top:

  1. Hardware: NVIDIA DGX Spark (128GB unified memory)
  2. OS layer: Linux with CUDA 13+ support
  3. Runtime: Docker with SparkyUI (SM121A optimized)
  4. Model: FLUX.1-schnell (Apache 2.0 licensed)
  5. Workflow: Sequential service switching via systemd
  6. Management: systemctl scripts for clean service control