This study evaluates a method for determining the quality of
synthetic speech systems. The method uses an
auditory lexical decision task to compare the quality of synthetic
speech generators with each other and with natural speech,
based on differences in reaction time and error rate. Seven voices were
evaluated; four synthesizers provided six voices (DECtalk 1.8 Perfect
Paul, DECtalk 1.8 Beautiful Betty, DECtalk 2.0 Perfect Paul, DECtalk 2.0
Beautiful Betty, Votrax Personal Speech, Votrax Type'n'Talk) and
natural speech provided the seventh voice. Both reaction times and
error rates were higher for the low-quality synthetic speech systems.
The results indicate that DECtalk can currently be considered a
high-quality synthesizer and that the Personal Speech and the
Type'n'Talk can be considered low-quality synthesizers. The results
obtained with this method can be explained by the
Activation-Verification model (Paap, McDonald, Schvaneveldt, and
Noel, 1986). Within the framework of this model, the results of this
study suggest that the verification phase is the bottleneck in
processing words produced by synthetic speech generators. This
interpretation suggests that emphasizing the differences between
phonemes, to make them more uniquely identifiable, rather
than concentrating on making them more "natural", might lead to
improved results with synthesized speech.