Optimized workflow for running FLUX.1-schnell and Mistral sequentially on NVIDIA DGX Spark with 128GB unified memory

ComfyUI + FLUX.1-schnell Setup


Architecture Decisions

Why sequential instead of parallel?

The DGX Spark’s 128GB unified memory pool is shared between CPU and GPU. Mistral consumes ~93GB during inference, while ComfyUI requires ~25-30GB. Running both simultaneously triggers OOM kills because the system cannot allocate sufficient contiguous memory blocks. Sequential execution ensures each service receives the full 128GB allocation during its runtime window.
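
Before switching services, it can help to sanity-check that the pool has actually been released. A minimal sketch (not part of the actual scripts), assuming Linux's /proc/meminfo and using MemAvailable as a rough proxy for the unified pool:

```shell
#!/bin/bash
# Illustrative guard: refuse to load a model unless the unified pool has
# room for it. The ~93GB (Mistral) / ~30GB (ComfyUI) figures are from above.
required_gb=${1:-0}

# On DGX Spark, CPU and GPU draw from the same pool, so MemAvailable is a
# reasonable first-order proxy for what a model load can claim.
avail_kb=$(awk '/MemAvailable/ {print $2}' /proc/meminfo)
avail_gb=$((avail_kb / 1024 / 1024))

if [ "$avail_gb" -lt "$required_gb" ]; then
  echo "Only ${avail_gb}GB free, need ${required_gb}GB - stop the other service first" >&2
  exit 1
fi
echo "OK: ${avail_gb}GB free"
```

Note that this cannot detect fragmentation, only total headroom, so it is a pre-check rather than a guarantee.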

Blog workflow:

  1. Write article with Mistral → generate image prompts
  2. Desktop link “ComfyUI” → stops Mistral, opens ComfyUI
  3. Generate images
  4. Desktop link “Mistral” → stops ComfyUI, starts Mistral
  5. Merge article + images → publish
ComfyUI start script (the “ComfyUI” desktop link target):

#!/bin/bash
# Free the unified memory pool: stop Mistral and its health-check timer
sudo systemctl stop sglang-mistral4 sglang-healthcheck.timer
sleep 3
bash ~/bin/start-comfyui-optimized.sh
xdg-open http://127.0.0.1:8188

Watch out: The DGX Spark’s unified memory architecture means GPU and CPU compete for the same pool. Even if total memory appears sufficient, fragmentation can cause allocation failures during large model loads.

Why FLUX.1-schnell?

FLUX.1-schnell operates under Apache 2.0 license, granting unrestricted commercial usage rights without attribution requirements. This contrasts with many diffusion models that impose non-commercial clauses or require licensing fees. The generated images are yours to use without legal encumbrances.

Gotcha: While FLUX.1-schnell is Apache 2.0 licensed, some derivative workflows or custom nodes in ComfyUI may have separate licenses. Always verify the license status of any custom nodes you import.

Why SparkyUI instead of standard Docker?

Standard ComfyUI Docker images target amd64, so they neither run on the DGX Spark’s ARM64 CPU nor ship kernels for the GB10’s SM121A architecture (compute capability 12.1). The SparkyUI image includes SageAttention optimizations compiled specifically for compute capability 12.1, which provides 2-3x speed improvements on the GB10 Blackwell GPU. Building the image locally (~20 minutes) ensures compatibility with the DGX Spark’s ARM64 architecture.

Limitation: The SageAttention compilation process requires CUDA 13+ toolkit and specific NVIDIA driver versions. Mismatched versions will cause build failures or runtime errors.

Why systemctl instead of docker stop for Mistral?

Mistral runs under sglang-mistral4.service with unless-stopped restart policy. A plain docker stop triggers systemd’s auto-restart mechanism, causing immediate service resurrection. All scripts must use sudo systemctl stop sglang-mistral4 to properly terminate the service.

Caveat: The unless-stopped policy means systemd will restart the service if it crashes or exits unexpectedly. This can mask underlying issues during development.
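
For reference, a hypothetical sketch of how such a unit could be wired (this is not the actual sglang-mistral4.service; the image name and arguments are placeholders):

```ini
[Unit]
Description=SGLang Mistral server (sketch)
After=docker.service
Requires=docker.service

[Service]
# systemd owns the container lifecycle: any exit, including a manual
# 'docker stop', lands here and triggers a restart.
Restart=always
RestartSec=5
ExecStart=/usr/bin/docker run --rm --name sglang-mistral4 IMAGE ARGS
ExecStop=/usr/bin/docker stop sglang-mistral4

[Install]
WantedBy=multi-user.target
```

Stopping through systemctl clears the restart logic; stopping the container directly does not.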


Action                         | Script
Start ComfyUI (stops Mistral)  | ~/bin/start-comfyui-safe.sh / Desktop link
Stop ComfyUI                   | bash /data/scripts/stop-comfyui.sh
Start Mistral (stops ComfyUI)  | bash /data/scripts/start-mistral.sh / Desktop link
Stop all AI services           | ~/bin/stop-ai-services.sh / Desktop link

Mistral start script (start-mistral.sh):

#!/bin/bash
docker stop comfyui || true
sleep 3
sudo systemctl start sglang-healthcheck.timer sglang-mistral4
until curl -s http://127.0.0.1:8000/health >/dev/null; do sleep 5; done
echo "Mistral ready"

Watch out: The health check endpoint may take 30-60 seconds to become available after service startup. Scripts should implement proper wait loops rather than assuming immediate readiness.
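
A more defensive variant of that loop (a sketch; the probe command, timeout, and interval are placeholders) gives up instead of hanging forever when the service never comes up:

```shell
# Run a probe command until it succeeds or a timeout expires.
wait_for() {
  local probe=$1 timeout=${2:-120} interval=${3:-5} elapsed=0
  until eval "$probe" >/dev/null 2>&1; do
    sleep "$interval"
    elapsed=$((elapsed + interval))
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "probe '$probe' still failing after ${timeout}s" >&2
      return 1
    fi
  done
}

# Usage in start-mistral.sh would look like:
# wait_for "curl -sf http://127.0.0.1:8000/health" 120 5 && echo "Mistral ready"
```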


One-Time Installation

1. Download models (~29 GB)

bash /data/scripts/install-comfyui-optimized.sh

Gotcha: The FLUX.1-schnell model file (flux1-schnell.safetensors) is 12GB. Downloading over IPv6 may hang due to CDN restrictions. Use wget -4 to force IPv4 connections.
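
The download pattern the install script presumably follows can be sketched as a small helper (the destination path and the Hugging Face "resolve" URL form are illustrative assumptions):

```shell
# Download with the fixes above baked in: -4 forces IPv4, -c resumes
# partial files (files already complete are skipped).
fetch() {
  local url=$1 dir=$2
  mkdir -p "$dir"
  wget -4 -c -P "$dir" "$url"
}

# Example:
# fetch "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors" \
#       /ai/models/ComfyUI/checkpoints
```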

2. Build Docker image (once, ~20 min)

cd /data/projects/comfyui
sudo docker compose build

This compiles SageAttention for SM121A (TORCH_CUDA_ARCH_LIST="12.1"). Only needed on updates or when changing base images.

Limitation: The build process requires approximately 20GB of temporary disk space. Ensure /tmp has sufficient free space or configure Docker to use an alternate temp directory.

3. Configuration (/data/projects/comfyui/.env)

COMFYUI_HOST_PATH=/ai/models/ComfyUI
SPARKYUI_DATA_PATH=/data/comfyui
COMFYUI_PORT=8188
COMFYUIMINI_PORT=8189
COMFYUI_FLAGS=--listen 0.0.0.0 --port 8188 --disable-pinned-memory \
  --force-fp16 --fp16-unet --fp16-vae --fp16-text-enc \
  --dont-upcast-attention --use-sage-attention

Caveat: The --disable-pinned-memory flag reduces unified fabric overhead but may increase latency for some operations. Benchmark your specific workload before deployment.


GB10 Optimizations

ComfyUI Flags

--disable-pinned-memory     # Cuts unified fabric overhead
--force-fp16                # SageAttention only supports FP16
--fp16-unet --fp16-vae --fp16-text-enc
--dont-upcast-attention     # Keep attention layers in FP16
--use-sage-attention        # Use SageAttention backend (SM121A)


Watch out: The --use-sage-attention flag requires PyTorch 2.3+ with SageAttention patches. Older versions will fail silently or fall back to standard attention mechanisms.

Docker Environment

TORCH_COMPILE_DISABLE=1      # Triton has no SM121A support yet
TORCHDYNAMO_DISABLE=1
PYTORCH_NO_CUDA_MEMORY_CACHING=1
CUDA_MANAGED_FORCE_DEVICE_ALLOC=1
OMP_NUM_THREADS=20           # Use all ARM cores

Limitation: The TORCH_COMPILE_DISABLE=1 setting disables PyTorch’s compilation optimizations, which can reduce performance by 15-20% on some workloads. This is a necessary tradeoff for SM121A compatibility.


FLUX.1-schnell Workflow in ComfyUI

For blog images (4-step, ~10-30s after warmup):

DualCLIPLoader
  ├── clip_name1: clip_l.safetensors         (type: clip_l)
  └── clip_name2: t5xxl_fp8_e4m3fn.safetensors  (type: t5)
      └── CLIPTextEncode (positive prompt)

UNETLoader
  └── flux1-schnell.safetensors  (weight_dtype: default)
      └── ModelSamplingFlux → BasicGuider

VAELoader → ae.safetensors
EmptyLatentImage (1024×1024)
KSampler (steps=4, sampler=euler, scheduler=simple, cfg=1.0)
VAEDecode → SaveImage

Gotcha: The FLUX.1-schnell model requires specific CLIP and T5 text encoder variants. Using incorrect versions will produce distorted outputs or runtime errors.
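
Once the graph works in the UI, it can also be queued headlessly through ComfyUI's HTTP API (POST /prompt on port 8188). The payload below is a deliberately abbreviated sketch, not a full exported graph; a real workflow should be exported via "Save (API Format)":

```shell
# Minimal shape of a /prompt request; the node id and inputs are
# illustrative and omit the model/latent connections a real graph needs.
cat > /tmp/flux-prompt.json <<'EOF'
{"prompt": {
  "3": {"class_type": "KSampler",
        "inputs": {"steps": 4, "cfg": 1.0,
                   "sampler_name": "euler", "scheduler": "simple"}}
}}
EOF

curl -s -X POST http://127.0.0.1:8188/prompt \
     -H "Content-Type: application/json" \
     -d @/tmp/flux-prompt.json || echo "ComfyUI is not running"
```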


Known Issues & Fixes

Problem                             | Cause                                        | Fix
Mistral restarts after docker stop  | systemd unless-stopped                       | sudo systemctl stop sglang-mistral4
PyTorch warning about SM121A        | PyTorch doesn’t officially support SM121A yet | Harmless, ignore it
torch.compile disabled              | Triton lacks SM121A support                  | Expected; TORCH_COMPILE_DISABLE=1
IPv6 download hangs                 | HF CDN blocks IPv6                           | wget -4
Download interrupted                | Network dropout                              | wget -c, rerun script, it skips finished files
ComfyUI not responding immediately  | Still initializing                           | Wait 30-60s
403 on HF download                  | Gated repo; terms not accepted               | Visit huggingface.co/black-forest-labs/FLUX.1-schnell → “Agree and access”
SageAttention build fails           | Missing CUDA 13+ toolkit                     | Install CUDA 13.0+ and set CUDA_HOME
Systemd service fails to start      | Port conflict                                | Check sudo lsof -i :8000 and adjust ports
ComfyUI crashes on startup          | Outdated PyTorch version                     | Upgrade to PyTorch 2.3+ with SageAttention support

Watch out: The DGX Spark’s ARM64 architecture means some x86-optimized Python packages may fail. Always verify package compatibility before installation.
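
A quick preflight sketch before pip-installing anything inside the container (output varies by host):

```shell
# Show the machine architecture and whether pip reports aarch64 wheel tags;
# packages that only publish x86_64 wheels will not install here.
uname -m
python3 -m pip debug --verbose 2>/dev/null | grep -m3 aarch64 \
  || echo "no aarch64 wheel tags reported"
```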


What I Actually Use

  • NVIDIA DGX Spark: ARM64 server with 128GB unified memory and GB10 Blackwell
  • SparkyUI: Docker image with SageAttention for SM121A
  • FLUX.1-schnell: Apache 2.0 model for unrestricted commercial use
Stack

Self-hosted AI workflow on NVIDIA DGX Spark, bottom to top:

  1. Hardware: NVIDIA DGX Spark (128GB unified memory)
  2. OS layer: Linux with CUDA 13+ support
  3. Runtime: Docker with SparkyUI (SM121A optimized)
  4. Model: FLUX.1-schnell (Apache 2.0 licensed)
  5. Workflow: Sequential service switching via systemd
  6. Management: systemctl scripts for clean service control