The 3.5-Hour Deadlock That Was Really an AttributeError
The vllm-omni TTS container froze for 3.5 hours on a DGX Spark with NVIDIA GB10 Blackwell, showing zero GPU usage and no errors.
Quick Take
- A single AttributeError in VoxtralTTSConfig looked like a GPU hang because the crash was hidden behind a long startup_future.result(timeout=300).
- The bug was introduced when transformers 5.5.4 added validate_token_ids() to PretrainedConfig.__init__, which accessed self.text_config before vllm-omni had set it.
- The fix required moving three lines of code: no hardware tweaks, no new images, no rebuilds.
The Silent Container Freeze
We migrated the podcast pipeline’s TTS module from mistral_inference to vllm-omni using the official Docker image:
FROM vllm/vllm-openai:latest
RUN pip install git+https://github.com/vllm-project/vllm-omni.git
After the 7.5 GB model download completed, the container printed:
(APIServer pid=1) INFO [weight_utils.py:50] Using model weights format ['*']
Then it sat. Docker stats showed 0.16 % CPU, 2.3 GB RAM, 0 B network I/O, and frozen block I/O; nvidia-smi reported 0 MiB of GPU memory in use. The process list inside the container (ps -eo pid,stat) showed 127 threads, all in S (sleeping) state, and no further log line appeared for 3.5 hours.
This looked like a deadlock: no progress, no errors, no crash.
The Wrong Suspects
Our first hypothesis was Blackwell-specific: a PTX-JIT compilation hang on Compute Capability 12.1 (sm_121a). The evidence:
- torch.cuda.get_arch_list() returned ['sm_80', 'sm_90', 'sm_100', 'sm_120', 'compute_120']
- PyTorch warned on startup: “Maximum cuda capability supported is (8.0) - (12.0)”
- This meant CUDA would JIT-compile PTX kernels for sm_121a at runtime, which can stall on fresh hardware without prebuilt kernels (a quick check for this is sketched below).
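The check amounts to comparing the device's compute capability against the architectures the installed PyTorch build ships compiled kernels for. A minimal sketch, assuming a CUDA-enabled PyTorch inside the container:

import torch

# Compare the GPU's compute capability against the architectures this
# PyTorch build was compiled for; a mismatch means CUDA falls back to
# PTX-JIT compilation the first time kernels are loaded.
major, minor = torch.cuda.get_device_capability(0)  # e.g. (12, 1) on GB10
built_for = torch.cuda.get_arch_list()              # e.g. ['sm_80', ..., 'sm_120']

if f"sm_{major}{minor}" not in built_for:
    print(f"sm_{major}{minor} is not in {built_for}; expect PTX-JIT on first launch")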
We drew up six plausible fixes:
| # | Fix | Cost | Rationale |
|---|---|---|---|
| 1 | --enforce-eager | 20-30 % slower | disables torch.compile and CUDA Graphs |
| 2 | VLLM_DISABLED_KERNELS=cutlass_moe_mm,cutlass_scaled_mm + TORCH_CUDA_ARCH_LIST=12.0 | none | CUTLASS lacks enable_sm120_family |
| 3 | Prebuilt Blackwell images | none | community images already exist |
| 4 | Pin versions vllm==0.18.0 + vllm-omni==0.18.0 | none | older combos are more stable |
| 5 | Prebuilt PyTorch wheels for sm_121a | none | replaces source builds |
| 6 | Source build with TORCH_CUDA_ARCH_LIST=12.1a | 25-45 min build | last resort |
We tried options 1 and 2. Both failed at the same log line:
(APIServer pid=1) INFO [weight_utils.py:50] Using model weights format ['*']
[... silence ...]
At this point it was tempting to conclude that, since the issue survived the torch.compile tweaks, it had to be PTX-JIT, and that the next step was a full source build.
The Breakthrough Came from Raw Logs
The key was widening the log filter:
docker logs -f voxtral 2>&1 | grep -E --line-buffered \
"Traceback|Error|RuntimeError|AttributeError|Uvicorn|Loading|ready|CUDA"
After 5 minutes the filter timed out with no matches. But when I checked the container status immediately afterward, it had exited with code 1. The container had crashed, not hung. Our log filter had missed the crash because the traceback only appeared in full when we dumped the entire log:
docker logs voxtral 2>&1 | tail -80
Before this raw dump, our output summarizer had collapsed the log to “0 errors, 6 warnings,” leading us to conclude, wrongly, that the process was still running.
The last lines revealed the real error:
File "vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py",
line 36, in get_text_config
return self.text_config
File "transformers/configuration_utils.py", line 422, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'VoxtralTTSConfig' object has no attribute 'text_config'.
Did you mean: 'get_text_config'?
RuntimeError: Orchestrator initialization failed:
'VoxtralTTSConfig' object has no attribute 'text_config'
No deadlock: a crash hidden behind a long startup_future.result(timeout=...) wait. In our first session the process had sat for 3.5 hours, most likely the same crash taking that long to propagate through the orchestration layers before the traceback surfaced.
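Stripped of vllm's internals, the shape of the failure is roughly the following; an illustrative sketch with invented names, not vllm-omni's actual startup code:

import concurrent.futures
import time

def init_engine():
    time.sleep(2)  # stand-in for weight download and loading
    # Fails early while building the config object:
    raise AttributeError("'VoxtralTTSConfig' object has no attribute 'text_config'")

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
startup_future = executor.submit(init_engine)

# The parent keeps quietly wiring up routes and health checks. Nothing about
# the failure reaches the main log stream until the future is collected;
# from the outside, the container just looks idle.
try:
    startup_future.result(timeout=300)
except Exception as exc:
    raise RuntimeError(f"Orchestrator initialization failed: {exc}") from exc

With a generous timeout and several wrapper layers between the worker and the log stream, the only externally visible signal is a pile of sleeping threads.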
Why This Breaks
The code in vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py defines:
class VoxtralTTSConfig(PretrainedConfig):
    model_type = "voxtral_tts"

    def __init__(
        self,
        text_config: PretrainedConfig | dict | None = None,
        audio_config: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)  # (1)
        if isinstance(text_config, PretrainedConfig):
            self.text_config = text_config  # (2)
        elif isinstance(text_config, dict):
            self.text_config = PretrainedConfig.from_dict(text_config)
        else:
            self.text_config = PretrainedConfig()
        self.audio_config = audio_config or {}

    def get_text_config(self, **kwargs: Any) -> PretrainedConfig:
        return self.text_config  # (3)
At first glance this looks correct: self.text_config is set in all three branches. The problem is (1): super().__init__(**kwargs) runs before self.text_config exists.
In transformers 5.5.4, PretrainedConfig.__init__ now runs HuggingFace Hub’s dataclass validator (huggingface_hub/dataclasses.py:251: validator(self)), which includes validate_token_ids() (transformers/configuration_utils.py:446). This method calls self.get_text_config(decoder=True), which in (3) reads self.text_config, but (2) hasn’t executed yet because (1) happens first.
This is a classic Python initialization order bug combined with library drift. vllm-omni was written against an older transformers version where the parent __init__ didn’t access subclass attributes. The new transformers 5.x added this validation, and the existing order breaks.
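Stripped of both libraries, the failure reduces to a parent __init__ that calls back into a subclass method before the subclass has set the attribute that method reads. A minimal, library-free reproduction (Parent stands in for PretrainedConfig and validate() for the new validator; none of these names are the real transformers API):

class Parent:
    def __init__(self, **kwargs):
        self.validate()                        # new parent behaviour: validate inside __init__

    def validate(self):
        self.get_text_config()                 # assumes subclass state already exists

class Child(Parent):
    def __init__(self, text_config=None):
        super().__init__()                     # (1) validator runs here ...
        self.text_config = text_config or {}   # (2) ... before this assignment

    def get_text_config(self):
        return self.text_config                # (3) AttributeError: no 'text_config' yet

Child()  # raises AttributeError, just like VoxtralTTSConfig under transformers 5.5.4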
Three Lines to Fix It
Reorder the initialization so self.text_config exists before the parent __init__ runs:
def __init__(self, text_config=None, audio_config=None, **kwargs):
    if isinstance(text_config, PretrainedConfig):
        self.text_config = text_config
    elif isinstance(text_config, dict):
        self.text_config = PretrainedConfig.from_dict(text_config)
    else:
        self.text_config = PretrainedConfig()
    super().__init__(**kwargs)  # now safe to call
    self.audio_config = audio_config or {}
That’s it. No rebuilds, no hardware changes, no new images. The container starts in under a minute.
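To confirm the reorder without launching the whole server, constructing the config directly is enough. A sketch, assuming the container's Python environment and the pre-#3065 import path quoted above:

# If this constructs and round-trips without an AttributeError,
# the init-order fix is in place.
from vllm_omni.model_executor.models.voxtral_tts.configuration_voxtral_tts import (
    VoxtralTTSConfig,
)

cfg = VoxtralTTSConfig()  # no args: falls back to a bare PretrainedConfig
assert cfg.get_text_config() is cfg.text_config
print("VoxtralTTSConfig constructs cleanly")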
What I Actually Use
- vllm/vllm-openai:latest, the official image for serving open-weight models
- vllm-omni main, the TTS extension that integrates Voxtral models
- NVIDIA GB10 Blackwell with 128 GB unified memory, the hardware that exposed the init-order bug
Status update (2026-05-04): fixed upstream
The init-order bug this article documents has been fixed in vllm-omni main as of 2026-04-25 by PR #3065 (“Migrate Voxtral TTS config and parser registry”). The same PR also moved the file from vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py to vllm_omni/transformers_utils/configs/voxtral_tts.py. A follow-up PR #3232 added an explanatory comment in the source documenting why assignment-before-super is required.
To be clear about credit: the upstream fix was authored by yuanheng-zhao independently of this article. I documented the bug separately on this blog while running into it locally, but I did not file an issue or PR upstream for this specific fix, so the merge did not come from my contribution. I am recording it here to keep the timeline honest and to point readers at the canonical fix.
What this means for operators:
- On a post-2026-04-25 vllm-omni snapshot, the bug is gone; no local patch is needed.
- The local patch_voxtral_config.py in this repo (/data/config/voxtral/patch_voxtral_config.py) is now a no-op that exits cleanly when it cannot find the old file path. It is kept in the tree for anyone still running a pre-#3065 build.
- The article remains valid as a record of the diagnosis, the symptom, and the fix shape, and is useful for understanding what the upstream PR actually changed, even if the fix is no longer something you apply yourself.
The earlier postscript on this article (about the patch being baked into the Dockerfile build step) is still factually correct for pre-#3065 vllm-omni builds, but is operationally obsolete on current upstream.