How a three-line Python init order bug masqueraded as a Blackwell GPU hang, and why checking raw logs beat all hardware theories.

The 3.5-Hour Deadlock That Was Really an AttributeError

The vllm-omni TTS container froze for 3.5 hours on a DGX Spark with an NVIDIA GB10 Blackwell GPU, showing zero GPU usage and no errors.

Quick Take

  • A single AttributeError in VoxtralTTSConfig looked like a GPU hang because the crash was hidden behind a long startup_future.result(timeout=300).
  • The bug was introduced when transformers 5.5.4 added validate_token_ids() to PretrainedConfig.__init__, which accessed self.text_config before vllm-omni had set it.
  • The fix required moving three lines of code: no hardware tweaks, no new images, no rebuilds.

The Silent Container Freeze

We migrated the podcast pipeline’s TTS module from mistral_inference to vllm-omni using the official Docker image:

FROM vllm/vllm-openai:latest
RUN pip install git+https://github.com/vllm-project/vllm-omni.git

After the 7.5 GB model download completed, the container printed:

(APIServer pid=1) INFO [weight_utils.py:50] Using model weights format ['*']

Then it sat. docker stats showed 0.16 % CPU, 2.3 GB RAM, 0 B network I/O, and frozen block I/O. nvidia-smi reported 0 MiB of GPU memory in use. The process list inside the container (ps -eo pid,stat) showed 127 threads, all in S (sleeping) state, and no further log lines appeared for 3.5 hours.
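
For reference, the checks behind those numbers were roughly the following (container name voxtral, as used in the log commands later in the article):

docker stats --no-stream voxtral          # CPU %, memory, network and block I/O
nvidia-smi                                # GPU memory in use
docker exec voxtral ps -eo pid,stat       # thread states (S = sleeping)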

This looked like a deadlock: no progress, no errors, no crash.

The Wrong Suspects

Our first hypothesis was Blackwell-specific: a PTX-JIT compilation hang on Compute Capability 12.1 (sm_121a).

We tested six plausible fixes:

| # | Fix | Cost | Rationale |
|---|-----|------|-----------|
| 1 | --enforce-eager | 20-30 % slower | disables torch.compile and CUDA Graphs |
| 2 | VLLM_DISABLED_KERNELS=cutlass_moe_mm,cutlass_scaled_mm + TORCH_CUDA_ARCH_LIST=12.0 | none | CUTLASS lacks enable_sm120_family |
| 3 | Prebuilt Blackwell images | none | community images already exist |
| 4 | Pin versions vllm==0.18.0 + vllm-omni==0.18.0 | none | older combos are more stable |
| 5 | Prebuilt PyTorch wheels for sm_121a | none | replaces source builds |
| 6 | Source build with TORCH_CUDA_ARCH_LIST=12.1a | 25-45 min build | last resort |
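
For concreteness, here is a sketch of how options 1 and 2 plug into the container start. The image tag and model name are placeholders, not the exact values we used; the flags themselves (--gpus, -e, vLLM's --model and --enforce-eager) are standard:

docker run --gpus all -p 8000:8000 \
  -e VLLM_DISABLED_KERNELS=cutlass_moe_mm,cutlass_scaled_mm \
  -e TORCH_CUDA_ARCH_LIST=12.0 \
  voxtral-tts:local \
  --model <voxtral-tts-model> \
  --enforce-eager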

We tried options 1 and 2. Both failed at the same log line:

(APIServer pid=1) INFO [weight_utils.py:50] Using model weights format ['*']
[... silence ...]

At this point it was tempting to conclude that since the issue survived the torch.compile tweaks, it had to be PTX-JIT compilation itself; the next step would have been a full source build.

The Breakthrough Came from Raw Logs

The key was widening the log filter:

docker logs -f voxtral 2>&1 | grep -E --line-buffered \
  "Traceback|Error|RuntimeError|AttributeError|Uvicorn|Loading|ready|CUDA"

After 5 minutes the filter timed out with no matches. But when I checked the container status immediately afterward, it had exited with code 1. The container had crashed, not hung. Our log filter had missed the crash because the traceback only appeared in full when we dumped the entire log:

docker logs voxtral 2>&1 | tail -80

Without this raw dump, our log summarizer had collapsed everything to “0 errors, 6 warnings,” leading us to wrongly conclude that the process was still running.
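
A cheap guard against this trap is to check the container's actual state before trusting any filtered or summarized log view:

docker inspect -f '{{.State.Status}} exit={{.State.ExitCode}}' voxtral
# prints e.g. "exited exit=1" once the server process has crashed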

The last lines of the full dump revealed the real error:

File "vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py",
  line 36, in get_text_config
    return self.text_config
File "transformers/configuration_utils.py", line 422, in __getattribute__
    return super().__getattribute__(key)
AttributeError: 'VoxtralTTSConfig' object has no attribute 'text_config'.
  Did you mean: 'get_text_config'?

RuntimeError: Orchestrator initialization failed:
  'VoxtralTTSConfig' object has no attribute 'text_config'

No deadlock: a crash hidden behind a long startup_future.result(timeout=...) wait. In our first session the process had sat for 3.5 hours on what was most likely the same crash, with the traceback surfacing only after the timeouts had propagated up through the layers.
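
Here is a minimal, self-contained sketch of that failure mode (not vllm-omni's actual startup code): if the worker that should resolve the future crashes before setting a result or an exception, the waiter simply blocks until its timeout and the server looks hung.

import concurrent.futures
import threading

startup_future = concurrent.futures.Future()

def start_engine():
    # The config crash happens here, before the future is ever resolved.
    raise AttributeError("'VoxtralTTSConfig' object has no attribute 'text_config'")
    startup_future.set_result("ready")  # never reached
    # A more defensive version would wrap the body in try/except and call
    # startup_future.set_exception(exc) so the waiter fails immediately.

threading.Thread(target=start_engine, daemon=True).start()

try:
    startup_future.result(timeout=5)  # 300 s in the real server
except concurrent.futures.TimeoutError:
    print("looks like a hang, but the worker already crashed")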

Why This Breaks

The code in vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py defines:

from typing import Any

from transformers import PretrainedConfig

class VoxtralTTSConfig(PretrainedConfig):
    model_type = "voxtral_tts"

    def __init__(
        self,
        text_config: PretrainedConfig | dict | None = None,
        audio_config: dict[str, Any] | None = None,
        **kwargs: Any,
    ) -> None:
        super().__init__(**kwargs)   # (1)

        if isinstance(text_config, PretrainedConfig):
            self.text_config = text_config   # (2)
        elif isinstance(text_config, dict):
            self.text_config = PretrainedConfig.from_dict(text_config)
        else:
            self.text_config = PretrainedConfig()

        self.audio_config = audio_config or {}

    def get_text_config(self, **kwargs: Any) -> PretrainedConfig:
        return self.text_config   # (3)

At first glance this looks correct: self.text_config is set in all three branches. The problem is at line (1), calling super().__init__(**kwargs) before self.text_config exists.

In transformers 5.5.4, PretrainedConfig.__init__ now runs HuggingFace Hub’s dataclass validator (huggingface_hub/dataclasses.py:251: validator(self)), which includes validate_token_ids() (transformers/configuration_utils.py:446). This method calls self.get_text_config(decoder=True), which in (3) reads self.text_config, but (2) hasn’t executed yet because (1) happens first.

This is a classic Python initialization-order bug combined with library drift: vllm-omni was written against an older transformers version whose parent __init__ never touched subclass attributes, and the validation added in transformers 5.x breaks the existing order.
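
Stripped of the transformers specifics, the pattern reduces to a parent __init__ that calls a method the subclass overrides, before the subclass has assigned the attribute that method reads:

class Parent:
    def __init__(self):
        # Parent "validation" dispatches to the subclass override below.
        self.validate()

    def validate(self):
        pass

class Child(Parent):
    def __init__(self, text_config=None):
        super().__init__()              # (1) triggers Child.validate() too early
        self.text_config = text_config  # (2) runs only after (1) returns

    def validate(self):
        return self.text_config         # (3) AttributeError: no 'text_config' yet

Child()  # raises the same kind of error as VoxtralTTSConfig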

Three Lines to Fix It

Reorder the initialization so self.text_config exists before the parent __init__ runs:

def __init__(self, text_config=None, audio_config=None, **kwargs):
    # Assign text_config first so the parent's validation hook can read it
    # through get_text_config().
    if isinstance(text_config, PretrainedConfig):
        self.text_config = text_config
    elif isinstance(text_config, dict):
        self.text_config = PretrainedConfig.from_dict(text_config)
    else:
        self.text_config = PretrainedConfig()

    super().__init__(**kwargs)   # now safe to call

    self.audio_config = audio_config or {}

That’s it. No rebuilds, no hardware changes, no new images. The container starts in under a minute.
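
A quick smoke test of the reordered constructor, using the pre-#3065 module path quoted above (this assumes the patched file is importable inside the container):

from vllm_omni.model_executor.models.voxtral_tts.configuration_voxtral_tts import (
    VoxtralTTSConfig,
)

cfg = VoxtralTTSConfig()              # no text_config given, falls back to PretrainedConfig()
print(type(cfg.get_text_config()))    # should print a config class instead of raising AttributeError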

What I Actually Use

  • vllm/vllm-openai:latest, the official image for serving open-weight models
  • vllm-omni main, the TTS extension that integrates Voxtral models
  • NVIDIA GB10 Blackwell with 128 GB unified memory, the hardware that exposed the init-order bug

Status update (2026-05-04): fixed upstream

The init-order bug this article documents has been fixed in vllm-omni main as of 2026-04-25 by PR #3065 (“Migrate Voxtral TTS config and parser registry”). The same PR also moved the file from vllm_omni/model_executor/models/voxtral_tts/configuration_voxtral_tts.py to vllm_omni/transformers_utils/configs/voxtral_tts.py. A follow-up PR #3232 added an explanatory comment in the source documenting why assignment-before-super is required.

To be clear about credit: the upstream fix was authored by yuanheng-zhao independently of this article. I documented the bug separately on this blog while running into it locally, but I did not file an issue or PR upstream for this specific fix, so the merge is not my contribution. I'm recording it here to keep the timeline honest and to point readers at the canonical fix.

What this means for operators: the earlier postscript on this article (about the patch being baked into the Dockerfile build step) is still factually correct for pre-#3065 vllm-omni builds, but it is operationally obsolete on current upstream.

Flow

Silent Freeze Debug: from hidden AttributeError to minimal fix

  1. Symptom: container froze for 3.5 h with 0 GPU usage
  2. Misdiagnosis: PTX-JIT compilation hang on Blackwell
  3. Log analysis: widening the log filter revealed the hidden AttributeError
  4. Root cause: transformers 5.5.4 init accessed the unset text_config
  5. Fix: moved 3 lines of code in VoxtralTTSConfig
  6. Result: no hardware changes, no rebuilds needed