Example utterance types (Festival Speech Synthesis System)

14.3 Example utterance types

A number of utterance types are currently supported. It is easy to add new ones, but the standard distribution includes the following.

Text

Raw text as a string.

(Utterance Text "This is an example")
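As a minimal sketch, a Text utterance can be built, synthesized and played from the Festival interpreter as follows. The variable name utt1 is illustrative only, and the example assumes that a voice is loaded and an audio device is configured.

```scheme
;; Build a Text utterance; nothing is synthesized at this point.
(set! utt1 (Utterance Text "This is an example"))
;; Run the synthesis modules defined for the Text utterance type.
(utt.synth utt1)
;; Play the resulting waveform on the configured audio device.
(utt.play utt1)
```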

Words

A list of words.

(Utterance Words (this is an example))

Words may be atoms, or lists if further features need to be specified. For example, to specify a word and its part of speech, you can use

(Utterance Words (I (live (pos v)) in (Reading (pos n) (tone H-H%))))

Note: the use of the tone feature requires an intonation mode that supports it.

Any feature and value named in the input will be added to the Word item.
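One way to check which features ended up on the Word items is to list them after synthesis. This is only a sketch: utt2 is an illustrative name, the tone feature is left out so that no particular intonation mode is needed, and later modules in the current voice may set or change feature values.

```scheme
;; Words utterance with explicit part-of-speech features on two words.
(set! utt2 (Utterance Words (I (live (pos v)) in (Reading (pos n)))))
(utt.synth utt2)
;; Pair each Word item's name with the value of its pos feature.
(mapcar
 (lambda (w) (list (item.name w) (item.feat w "pos")))
 (utt.relation.items utt2 'Word))
```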

Phrase

This allows explicit phrasing and features on Tokens to be specified. The input consists of a list of phrases, each of which contains a list of tokens.

(Utterance
 Phrase
 ((Phrase ((name B))
   I saw the man
   (in ((EMPH 1)))
   the park)
  (Phrase ((name BB))
   with the telescope)))

ToBI tones and accents may also be specified on Tokens, but these will only take effect if the selected intonation method uses them.

Segments

This allows specification of segments, durations and F0 target values.

(Utterance
 Segments
 ((# 0.19 )
  (h 0.055 (0 115))
  (@ 0.037 (0.018 136))
  (l 0.064 )
  (ou 0.208 (0.0 134) (0.100 135) (0.208 123))
  (# 0.19)))

Note that the times are in seconds, NOT milliseconds. Each segment entry consists of the segment name, its duration in seconds, and a list of target values. Each target value is a pair: the position of the point within the segment (in seconds) and the F0 value in Hz.
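A Segments utterance of this form can be synthesized and written to a file, as in the following sketch. The variable name utt3 and the output file name hello.wav are illustrative, and the example assumes that the current voice's phone set contains the segment names used (#, h, @, l, ou).

```scheme
;; Each entry: segment name, duration in seconds, optional (position F0) targets.
(set! utt3
      (Utterance
       Segments
       ((# 0.19)
        (h 0.055 (0 115))
        (@ 0.037 (0.018 136))
        (l 0.064)
        (ou 0.208 (0.0 134) (0.100 135) (0.208 123))
        (# 0.19))))
;; Generate the waveform from the given segments, durations and targets.
(utt.synth utt3)
;; Save the waveform in RIFF (.wav) format.
(utt.save.wave utt3 "hello.wav" 'riff)
```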

Phones

This allows a simple specification of a list of phones. Synthesis uses fixed durations (specified in FP_duration, default 100 ms) and monotone intonation (specified in FP_F0, default 120 Hz). This may be used for simple checks of waveform synthesizers, etc.

(Utterance Phones (# h @ l ou #))

Note that the function SayPhones allows lists of phones to be synthesized and played through this utterance type.
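The following sketch changes the fixed prosody before calling SayPhones. It assumes that FP_duration and FP_F0 are ordinary global Scheme variables that can be changed with set!, and that the phones belong to the current voice's phone set; the values 80 and 110 are arbitrary.

```scheme
;; Assumption: FP_duration (in ms) and FP_F0 (in Hz) are global variables.
(set! FP_duration 80)
(set! FP_F0 110)
;; Synthesize and play the phone list through a Phones utterance.
(SayPhones '(# h @ l ou #))
```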

Wave

A waveform file. Synthesis here simply involves loading the file.

(Utterance Wave fred.wav)
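As a sketch, a Wave utterance can be loaded and played back like any other utterance. The name utt4 is illustrative, and fred.wav is assumed to exist in the current directory in a waveform format that Festival can read.

```scheme
;; Create the utterance; the file is loaded when the utterance is synthesized.
(set! utt4 (Utterance Wave fred.wav))
(utt.synth utt4)
;; Play the loaded waveform.
(utt.play utt4)
```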

Other types are supported, as defined in lib/synthesis.scm, but they are used internally by various parts of the system. These include Tokens, used in TTS, and SegF0, used by utt.resynth.