All articles tagged "tts" — self-hosted AI fixes, setups, and architecture notes.
Eleven VibeVoice renders, one Voxtral baseline, the operator's ears. The first day of the three-day TTS spike that follows the V6=0/10 verdict. Engineering-log shape, with the actual audio embedded.
Read article →
Eight engineering fixes deep, three weeks of patches, two failure modes on the same engine. The Voxtral open checkpoint has no path to release-quality podcast audio. The drama of staying with it anyway, and the three engines I plan to spike next.
Per-block ffmpeg loudnorm averages multiple speakers to one gain, leaving the quieter voice quieter. Dynamic-mode loudnorm eats the first 3 seconds of audio.
Voxtral 4B advertises voice cloning, accepts ref_audio in the API, then crashes the engine because the encoder weights live only in Mistral's hosted product.
Rendering a 367-character podcast turn as one Voxtral call takes 21 seconds. Split into 90-character chunks: 35 seconds. Same words, same voice, 38 percent more wallclock.
How a silent AttributeError nearly killed our TTS pipeline, and why three lines of code fixed it forever.
How a three-line Python init order bug masqueraded as a Blackwell GPU hang, and why checking raw logs beat all hardware theories.
How a single flag killed my self-hosted TTS stack, and how I fixed it without losing a second of audio.