WER: how often the words come out wrong

WER (word error rate) scores the intelligibility of a text-to-speech (TTS) model. You run the generated audio through a speech recognizer, compare its transcript against the text you asked for, and count the words that came out wrong (substituted, dropped, or added) as a fraction of the words in that text. A lower WER means the words came out clearly; it says nothing about whether the voice sounds human.

How does WER work?

WER closes a loop. You give the TTS model a sentence, record what it speaks, then feed that audio into a speech recognizer and read back its transcript. You line the transcript up against the original text and count three kinds of mistake: words that were swapped for the wrong word, words that were dropped, and words that were inserted. The rate is the total number of errors divided by the number of words in the original text. Because inserted words count as errors, a very bad transcript can score above 100%.
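The counting step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the names `WerResult` and `word_error_rate` are made up for this example, and it assumes the recognizer's transcript is already in hand as a string. Normalization here is only lowercasing and splitting on whitespace, which is an assumption; real evaluations usually also strip punctuation and normalize numbers. The alignment is a standard word-level edit distance.

```python
from typing import NamedTuple


class WerResult(NamedTuple):
    substitutions: int
    deletions: int
    insertions: int
    reference_words: int

    @property
    def wer(self) -> float:
        errors = self.substitutions + self.deletions + self.insertions
        return errors / self.reference_words


def word_error_rate(reference: str, transcript: str) -> WerResult:
    """Align transcript against reference and count the three error kinds."""
    # Assumed normalization: lowercase, split on whitespace only.
    ref = reference.lower().split()
    hyp = transcript.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # cost[i][j] = fewest word edits that turn ref[:i] into hyp[:j]
    cost = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        cost[i][0] = i  # drop every reference word
    for j in range(len(hyp) + 1):
        cost[0][j] = j  # insert every transcript word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back through the table to label each edit. When several
    # alignments tie, the split between kinds can differ, but the
    # total number of errors is always the same.
    subs = dels = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        mismatch = i > 0 and j > 0 and ref[i - 1] != hyp[j - 1]
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + mismatch:
            subs += mismatch
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return WerResult(subs, dels, ins, len(ref))


result = word_error_rate("the quick brown fox jumps", "the quick brown box")
print(result, result.wer)
```

In this made-up example the transcript has one swapped word and one dropped word against a five-word reference, so the rate is 2 / 5 = 0.4.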

Because the whole thing is mechanical, WER is cheap to run at scale and gives you a single number to sort models by. The catch is that it inherits the recognizer’s own blind spots. If the recognizer mishears a fine but unusual pronunciation, that counts against the TTS model even though a human would have understood it.

Why does WER matter, and where does it stop?

WER is the floor you want every voice to clear. A model that scores badly is dropping or garbling words, and no amount of pleasant tone fixes a sentence you cannot follow. So WER is a good gate: it catches the systems that are unstable or unintelligible before you waste time on anything else.

What WER does not tell you is whether the voice sounds alive. A flat, robotic reading of every word in the right order scores beautifully. That is the trap: a model can win on WER and still feel lifeless, so you pair it with measures of naturalness and prosody before deciding a voice is actually good.
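One way to picture the gate-then-pair idea is the sketch below. Everything in it is hypothetical: the `VoiceScore` fields, the `shortlist` helper, the 0.05 cutoff, and the example numbers are placeholders rather than recommended values, and the naturalness score stands in for whatever listening test or metric you actually use.

```python
from dataclasses import dataclass


@dataclass
class VoiceScore:
    name: str
    wer: float          # lower is better
    naturalness: float  # higher is better; the scale is an assumption


def shortlist(scores: list[VoiceScore], max_wer: float = 0.05) -> list[VoiceScore]:
    """Drop voices that fail the WER gate, then rank the rest by naturalness."""
    passed = [s for s in scores if s.wer <= max_wer]
    return sorted(passed, key=lambda s: s.naturalness, reverse=True)


# Illustrative numbers only, not measurements of any real system.
candidates = [
    VoiceScore("flat-but-clear", wer=0.01, naturalness=2.9),
    VoiceScore("expressive", wer=0.03, naturalness=4.2),
    VoiceScore("garbled", wer=0.22, naturalness=4.5),
]

for voice in shortlist(candidates):
    print(voice.name, voice.wer, voice.naturalness)
```

The voice with the lowest WER does not come out on top here: it clears the gate, but the ranking among voices that clear it is decided by naturalness. The garbled voice never reaches that ranking, however pleasant it sounds.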
