All articles tagged "voxtral" — self-hosted AI fixes, setups, and architecture notes.
Eight engineering fixes deep, three weeks of patches, two failure modes on the same engine. The Voxtral open checkpoint has no path to release-quality podcast audio. The drama of staying with it anyway, and the three engines I plan to spike next.
Per-block ffmpeg loudnorm averages multiple speakers to one gain, leaving the quieter voice quieter. Dynamic-mode loudnorm eats the first 3 seconds of audio.
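A minimal sketch of why one gain per multi-speaker block leaves the quieter voice quieter. It uses plain RMS in dBFS as a stand-in for loudnorm's EBU R128 loudness, which is a simplification (loudnorm measures LUFS with gating). The helper names, the -20 dBFS target, and the square-wave "voices" are illustrative assumptions.

```python
import math

def rms(samples):
    """Root-mean-square level of float samples in [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def to_dbfs(level):
    """Linear level to dBFS; digital silence maps to -inf."""
    return 20 * math.log10(level) if level > 0 else float("-inf")

def gain_for_target(samples, target_dbfs):
    """Linear gain that moves the RMS of `samples` to `target_dbfs`."""
    return 10 ** ((target_dbfs - to_dbfs(rms(samples))) / 20)

def apply(samples, gain):
    return [s * gain for s in samples]

# Two speakers sharing one block: a loud voice and a quiet one.
loud = [0.5, -0.5] * 500
quiet = [0.1, -0.1] * 500
TARGET = -20.0

# One gain for the whole block: both voices move together, so the gap survives.
g_block = gain_for_target(loud + quiet, TARGET)
loud_b, quiet_b = apply(loud, g_block), apply(quiet, g_block)

# One gain per speaker turn: each voice lands on the target.
loud_s = apply(loud, gain_for_target(loud, TARGET))
quiet_s = apply(quiet, gain_for_target(quiet, TARGET))

print(round(to_dbfs(rms(loud_b)) - to_dbfs(rms(quiet_b)), 2))  # prints 13.98
```

The block-level gap stays at 20·log10(0.5/0.1), about 14 dB, whatever the target, because a single multiplier cannot change the ratio between the two voices.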
The intro music wasn't playing for the first four seconds of every podcast episode. RMS at minus infinity. The fix was one keyword, eval=frame.
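The blurb does not say which filter needed `eval=frame`. In ffmpeg's `volume` filter, for example, that option makes the volume expression re-evaluate on every frame instead of once at initialization, which is the default. Independent of the fix, the symptom (RMS at minus infinity over the opening seconds) is easy to catch automatically. A minimal sketch, assuming decoded mono float samples; the function names and the -60 dBFS floor are hypothetical choices.

```python
import math

def window_rms_dbfs(samples, sample_rate, start_s, end_s):
    """RMS level in dBFS of the [start_s, end_s) window; all-zero audio gives -inf."""
    lo, hi = int(start_s * sample_rate), int(end_s * sample_rate)
    window = samples[lo:hi]
    if not window:
        raise ValueError("empty window")
    mean_sq = sum(s * s for s in window) / len(window)
    # 10*log10(mean square) equals 20*log10(RMS).
    return 10 * math.log10(mean_sq) if mean_sq > 0 else float("-inf")

def assert_intro_audible(samples, sample_rate, seconds=4.0, floor_dbfs=-60.0):
    """Fail loudly if the opening `seconds` of an episode sit below `floor_dbfs`."""
    level = window_rms_dbfs(samples, sample_rate, 0.0, seconds)
    if level < floor_dbfs:
        raise AssertionError(f"first {seconds}s at {level} dBFS, below {floor_dbfs}")

# Usage: assert_intro_audible(decoded_samples, 48000) after rendering each episode.
```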
Voxtral 4B advertises voice cloning, accepts ref_audio in the API, then crashes the engine because the encoder weights live only in Mistral's hosted product.
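One defensive pattern for this failure mode is to reject `ref_audio` at the API boundary when the loaded checkpoint cannot serve it, so a single request gets an error instead of taking the engine down. A minimal sketch: `EngineCapabilities`, `has_voice_encoder`, `UnsupportedFeature`, `validate_tts_request`, and the `input` payload key are all hypothetical names, not part of any Voxtral or Mistral API.

```python
from dataclasses import dataclass

@dataclass
class EngineCapabilities:
    # Hypothetical: what the locally loaded checkpoint can actually do.
    has_voice_encoder: bool = False

class UnsupportedFeature(ValueError):
    """Raised for request fields the local engine cannot serve."""

def validate_tts_request(payload: dict, caps: EngineCapabilities) -> dict:
    """Reject voice-cloning input before it reaches the engine."""
    if payload.get("ref_audio") is not None and not caps.has_voice_encoder:
        raise UnsupportedFeature(
            "ref_audio requires voice-encoder weights this checkpoint does not ship"
        )
    return payload
```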
Rendering a 367-character podcast turn as one Voxtral call takes 21 seconds. Split into 90-character chunks, it takes 35 seconds. Same words, same voice, 14 seconds (about 67 percent) more wallclock.
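The chunking cost can be worked out from the two timings alone. A minimal sketch; the helper names are hypothetical, and reading the extra time as a fixed cost per additional call is an assumption the timings do not prove.

```python
import math

def min_chunks(text_len: int, max_chars: int) -> int:
    """Fewest calls if no chunk exceeds max_chars (word-boundary splits can need more)."""
    return math.ceil(text_len / max_chars)

def chunking_overhead(single_s: float, chunked_s: float) -> tuple[float, float]:
    """Extra wallclock seconds and the fractional increase over the single call."""
    extra = chunked_s - single_s
    return extra, extra / single_s

calls = min_chunks(367, 90)                  # ceil(4.08) = 5 calls instead of 1
extra, frac = chunking_overhead(21.0, 35.0)  # 14.0 s, about 0.667
# Assumption: if all extra time were fixed per-call cost, each of the
# additional calls would cost about extra / (calls - 1) seconds.
per_extra_call = extra / (calls - 1)
print(calls, extra, f"{frac:.0%}", per_extra_call)  # prints 5 14.0 67% 3.5
```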
How we fixed loudness pumping, markup stripping, and dialogue rhythm in a self-hosted podcast pipeline.
How a silent AttributeError nearly killed our TTS pipeline, and why three lines of code fixed it forever.
How a three-line Python init-order bug masqueraded as a Blackwell GPU hang, and why checking raw logs beat all hardware theories.
How a single flag killed my self-hosted TTS stack, and how I fixed it without losing a second of audio.