Utterance I/O (Festival Speech Synthesis System)

14.7 Utterance I/O

A number of functions allow an utterance's structure to be made available to other programs.

The whole structure, including all relations, items and features, may be saved in an ASCII format using the function utt.save. This file may be reloaded using the utt.load function. Note that the waveform is not saved in this form.
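As a minimal sketch of the save and reload cycle described above, the following could be typed at the Festival prompt. The filename "utt1.utt" and the variable names are hypothetical, and passing nil as the first argument to utt.load assumes the documented behaviour of creating a new utterance.

```scheme
;; Synthesize an utterance so that it has relations, items and features.
(set! utt1 (utt.synth (Utterance Text "Hello world.")))

;; Save the whole structure in ASCII form ("utt1.utt" is a hypothetical name).
(utt.save utt1 "utt1.utt")

;; Reload it; nil asks utt.load to create and return a new utterance.
(set! utt2 (utt.load nil "utt1.utt"))

;; List the relations that were restored. The waveform is not among
;; the saved data, so utt2 has nothing to play.
(utt.relationnames utt2)
```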

Individual aspects of an utterance may be selectively saved. The waveform itself may be saved using the function utt.save.wave. This saves the waveform in the named file in the format specified by the Parameter Wavefiletype. All formats supported by the Edinburgh Speech Tools are valid, including nist, esps, sun, riff, aiff, raw, alaw and ulaw. Note that the functions utt.wave.rescale and utt.wave.resample may be used to change the gain and sample frequency of the waveform before saving it.

A waveform may be imported into an existing utterance with the function utt.import.wave. This is specifically designed to allow external methods of waveform synthesis. However, if you just wish to play an external waveform or make it into an utterance, you should consider the utterance Wave type.
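The waveform functions mentioned above might be combined as in the following sketch. The filenames, the gain factor 0.8 and the 8000 Hz rate are illustrative assumptions, not recommended values.

```scheme
(set! utt1 (utt.synth (Utterance Text "Hello world.")))

;; Select the output file format through the Wavefiletype parameter.
(Parameter.set 'Wavefiletype 'riff)

;; Optionally adjust gain and sample frequency before saving
;; (0.8 and 8000 are arbitrary example values).
(utt.wave.rescale utt1 0.8)
(utt.wave.resample utt1 8000)

;; Write the waveform ("hello.wav" is a hypothetical name).
(utt.save.wave utt1 "hello.wav")

;; Import an externally produced waveform into the existing utterance
;; ("external.wav" is a hypothetical file).
(utt.import.wave utt1 "external.wav")

;; To simply play an external file, the Wave utterance type is simpler.
(utt.play (utt.synth (Utterance Wave "external.wav")))
```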

The segments of an utterance may be saved in a file using the function utt.save.segs, which saves the segments of the named utterance in xlabel format. Any other relation may also be saved using the more general utt.save.relation, which takes the additional argument of a relation name. The name of each item and the end feature of each item are saved in the named file, again in xlabel format; other features are saved in extra fields. For more elaborate saving methods you can easily write a Scheme function to save data in an utterance in whatever format is required. See the file lib/mbrola.scm for an example.
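The sketch below shows the two built-in saving functions followed by a small hand-written saver of the kind suggested above. The function save_seg_ends, its output layout and all filenames are hypothetical and are not part of Festival; the sketch also assumes each Segment item carries an end feature, as it does after normal synthesis.

```scheme
(set! utt1 (utt.synth (Utterance Text "Hello world.")))

;; Built-in savers: segments, then an arbitrary relation (here Word).
(utt.save.segs utt1 "hello.lab")
(utt.save.relation utt1 'Word "hello.words")

;; Hypothetical custom saver: writes one line per segment containing
;; the segment name followed by its end time.
(define (save_seg_ends utt filename)
  (let ((fd (fopen filename "w")))
    (mapcar
     (lambda (seg)
       (format fd "%s %2.4f\n" (item.name seg) (item.feat seg "end")))
     (utt.relation.items utt 'Segment))
    (fclose fd)
    utt))

(save_seg_ends utt1 "hello.ends")
```

Writing the saver in Scheme keeps the output format entirely under your control, at the cost of walking the relation yourself.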

The function display provides a simple way to display an utterance in Entropic's Xwaves tool. It simply saves the waveform and the segments and sends the appropriate commands to the (already running) Xwaves and xlabel programs.

A function to synthesize an externally specified utterance is provided by utt.resynth, which takes two filename arguments: an xlabel segment file and an F0 file. This function loads an utterance from these files, then synthesizes and plays it. The loading is provided by the underlying function utt.load.segf0.