The 3.5-Hour Deadlock That Was Really an AttributeError
The vllm-omni TTS container froze for 3.5 hours on a DGX Spark with NVIDIA GB10 Blackwell, showing zero GPU usage and no errors.
Quick Take
- A single AttributeError in VoxtralTTSConfig looked like a GPU hang because the crash was hidden behind a long startup_future.result(timeout=300).
- The bug was introduced when transformers 5.5.4 added validate_token_ids() to PretrainedConfig.__init__, which accessed self.text_config before vllm-omni had set it.
- The fix required moving three lines of code: no hardware tweaks, no new images, no rebuilds.
The Silent Container Freeze
We migrated the podcast pipeline’s TTS module from mistral_inference to vllm-omni using the official Docker image:
FROM vllm/vllm-openai:latest
RUN pip install git+https://github.com/vllm-project/vllm-omni.git
After the 7.5 GB model download completed, the container printed:
(APIServer pid=1) INFO [weight_utils.py:50] Using model weights format ['*']
Then it sat. Docker stats showed 0.16 % CPU, 2.3 GB RAM, 0 B network I/O, and frozen block I/O; nvidia-smi reported 0 MiB of GPU memory in use. The process list inside the container (ps -eo pid,stat) showed 127 threads, all in S (sleeping) state, and no further log line appeared for 3.5 hours.
This looked like a deadlock: no progress, no errors, no crash.
The Wrong Suspects
Our first hypothesis was Blackwell-specific: a PTX-JIT compilation hang on Compute Capability 12.1 (sm_121a). The evidence:
- torch.cuda.get_arch_list() returned ['sm_80', 'sm_90', 'sm_100', 'sm_120', 'compute_120']
- PyTorch warned on startup: “Maximum cuda capability supported is (8.0) - (12.0)”
- This meant CUDA would JIT-compile PTX kernels for sm_121a at runtime, which can stall on fresh hardware without prebuilt kernels (a quick check for this is sketched below).
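The check amounts to comparing the device's compute capability against the architectures the installed PyTorch build ships compiled kernels for. A minimal sketch, assuming a CUDA-enabled PyTorch inside the container:

import torch

# Compare the GPU's compute capability against the architectures this
# PyTorch build was compiled for; a mismatch means CUDA falls back to
# PTX-JIT compilation the first time kernels are loaded.
major, minor = torch.cuda.get_device_capability(0)  # e.g. (12, 1) on GB10
built_for = torch.cuda.get_arch_list()              # e.g. ['sm_80', ..., 'sm_120']

if f"sm_{major}{minor}" not in built_for:
    print(f"sm_{major}{minor} is not in {built_for}; expect PTX-JIT on first launch")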
We drew up six plausible fixes:
| # | Fix | Cost | Rationale |
|---|---|---|---|
| 1 | --enforce-eager | 20-30 % slower | disables torch.compile and CUDA Graphs |
| 2 | VLLM_DISABLED_KERNELS=cutlass_moe_mm,cutlass_scaled_mm + TORCH_CUDA_ARCH_LIST=12.0 | none | CUTLASS lacks enable_sm120_family |
| 3 | Prebuilt Blackwell images | none | community images already exist |
| 4 | Pin versions vllm==0.18.0 + vllm-omni==0.18.0 | none | older combos are more stable |
| 5 | Prebuilt PyTorch wheels for sm_121a | none | replaces source builds |
| 6 | Source build with TORCH_CUDA_ARCH_LIST=12.1a | 25-45 min build | last resort |
We tried options 1 and 2. Both failed at the same log line:
(APIServer pid=1) INFO [weight_utils.py:50] Using model weights format ['*']
[... silence ...]
At this point it was tempting to conclude that, since the issue survived the torch.compile tweaks, it had to be PTX-JIT, and that the next step was a full source build.
The Breakthrough Came from Raw Logs
The key was widening the log filter:
docker logs -f voxtral 2>&1 | grep -E --line-buffered \
"Traceback|Error|RuntimeError|AttributeError|Uvicorn|Loading|ready|CUDA"
After 5 minutes the filter timed out with no matches. But when I checked the container status immediately afterward, it had exited with code 1. The container had crashed, not hung. Our log filter had missed the crash because the traceback only appeared in full when we dumped the entire log:
docker logs voxtral 2>&1 | tail -80
Before this raw dump, our output summarizer had collapsed the log to “0 errors, 6 warnings,” leading us to conclude, wrongly, that the process was still running.
The last lines revealed the real error:
File "vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py",
line 36, in get_text_config
return self.text_config
File "transformers/configuration_utils.py", line 422, in __getattribute__
return super().__getattribute__(key)
AttributeError: 'VoxtralTTSConfig' object has no attribute 'text_config'.
Did you mean: 'get_text_config'?
RuntimeError: Orchestrator initialization failed:
'VoxtralTTSConfig' object has no attribute 'text_config'
No deadlock: a crash hidden behind a long startup_future.result(timeout=...) wait. In our first session the process had sat for 3.5 hours, most likely the same crash taking that long to propagate through the orchestration layers before the traceback surfaced.
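Stripped of vllm's internals, the shape of the failure is roughly the following; an illustrative sketch with invented names, not vllm-omni's actual startup code:

import concurrent.futures
import time

def init_engine():
    time.sleep(2)  # stand-in for weight download and loading
    # Fails early while building the config object:
    raise AttributeError("'VoxtralTTSConfig' object has no attribute 'text_config'")

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
startup_future = executor.submit(init_engine)

# The parent keeps quietly wiring up routes and health checks. Nothing about
# the failure reaches the main log stream until the future is collected;
# from the outside, the container just looks idle.
try:
    startup_future.result(timeout=300)
except Exception as exc:
    raise RuntimeError(f"Orchestrator initialization failed: {exc}") from exc

With a generous timeout and several wrapper layers between the worker and the log stream, the only externally visible signal is a pile of sleeping threads.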
Why This Breaks
The code in vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py defines:
class VoxtralTTSConfig(PretrainedConfig):
    model_type = "voxtral_tts"

    def __init__(
        self,
        text_config: PretrainedConfig | dict | None = None,
        audio_config: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)  # (1)
        if isinstance(text_config, PretrainedConfig):
            self.text_config = text_config  # (2)
        elif isinstance(text_config, dict):
            self.text_config = PretrainedConfig.from_dict(text_config)
        else:
            self.text_config = PretrainedConfig()
        self.audio_config = audio_config or {}

    def get_text_config(self, **kwargs: Any) -> PretrainedConfig:
        return self.text_config  # (3)
At first glance this looks correct: self.text_config is set in all three branches. The problem is (1): super().__init__(**kwargs) runs before self.text_config exists.
In transformers 5.5.4, PretrainedConfig.__init__ now runs HuggingFace Hub’s dataclass validator (huggingface_hub/dataclasses.py:251: validator(self)), which includes validate_token_ids() (transformers/configuration_utils.py:446). This method calls self.get_text_config(decoder=True), which in (3) reads self.text_config, but (2) hasn’t executed yet because (1) happens first.
This is a classic Python initialization order bug combined with library drift. vllm-omni was written against an older transformers version where the parent __init__ didn’t access subclass attributes. The new transformers 5.x added this validation, and the existing order breaks.
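Stripped of both libraries, the failure reduces to a parent __init__ that calls back into a subclass method before the subclass has set the attribute that method reads. A minimal, library-free reproduction (Parent stands in for PretrainedConfig and validate() for the new validator; none of these names are the real transformers API):

class Parent:
    def __init__(self, **kwargs):
        self.validate()                        # new parent behaviour: validate inside __init__

    def validate(self):
        self.get_text_config()                 # assumes subclass state already exists

class Child(Parent):
    def __init__(self, text_config=None):
        super().__init__()                     # (1) validator runs here ...
        self.text_config = text_config or {}   # (2) ... before this assignment

    def get_text_config(self):
        return self.text_config                # (3) AttributeError: no 'text_config' yet

Child()  # raises AttributeError, just like VoxtralTTSConfig under transformers 5.5.4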
Three Lines to Fix It
Reorder the initialization so self.text_config exists before the parent __init__ runs:
def __init__(self, text_config=None, audio_config=None, **kwargs):
    if isinstance(text_config, PretrainedConfig):
        self.text_config = text_config
    elif isinstance(text_config, dict):
        self.text_config = PretrainedConfig.from_dict(text_config)
    else:
        self.text_config = PretrainedConfig()
    super().__init__(**kwargs)  # now safe to call
    self.audio_config = audio_config or {}
That’s it. No rebuilds, no hardware changes, no new images. The container starts in under a minute.
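To confirm the reorder without launching the whole server, constructing the config directly is enough. A sketch, assuming the container's Python environment and the pre-#3065 import path quoted above:

# If this constructs and round-trips without an AttributeError,
# the init-order fix is in place.
from vllm_omni.model_executor.models.voxtral_tts.configuration_voxtral_tts import (
    VoxtralTTSConfig,
)

cfg = VoxtralTTSConfig()  # no args: falls back to a bare PretrainedConfig
assert cfg.get_text_config() is cfg.text_config
print("VoxtralTTSConfig constructs cleanly")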
What I Actually Use
- vllm/vllm-openai:latest, the official image for serving open-weight models
- vllm-omni main, the TTS extension that integrates Voxtral models
- NVIDIA GB10 Blackwell with 128 GB unified memory, the hardware that exposed the init-order bug
Status update (2026-05-04): fixed upstream
The init-order bug this article documents has been fixed in vllm-omni main as of 2026-04-25 by PR #3065 (“Migrate Voxtral TTS config and parser registry”). The same PR also moved the file from vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py to vllm_omni/transformers_utils/configs/voxtral_tts.py. A follow-up PR #3232 added an explanatory comment in the source documenting why assignment-before-super is required.
To be clear about credit: the upstream fix was authored by yuanheng-zhao independently of this article. I documented the bug separately on this blog while running into it locally, but I did not file an issue or PR upstream for this specific fix, so the merge did not come from my contribution. I am recording it here to keep the timeline honest and to point readers at the canonical fix.
What this means for operators:
- On a post-2026-04-25 vllm-omni snapshot, the bug is gone; no local patch is needed.
- The local patch_voxtral_config.py in this repo (/data/config/voxtral/patch_voxtral_config.py) is now a no-op that exits cleanly when it cannot find the old file path. It is kept in the tree for anyone still running a pre-#3065 build.
- The article remains valid as a record of the diagnosis, the symptom, and the fix shape, and is useful for understanding what the upstream PR actually changed, even if the fix is no longer something you apply yourself.
The earlier postscript on this article (about the patch being baked into the Dockerfile build step) is still factually correct for pre-#3065 vllm-omni builds, but is operationally obsolete on current upstream.