Next: Singing Synthesis, Up: Examples [Contents][Index]
This example shows how to use part of the standard synthesis process to tokenize and tag a file of text. This section does not cover training and setting up a part of speech tag set (see POS tagging), only how to use the standard POS tagger on text.
This example also shows how to use Festival as a simple scripting language, and how to modify various methods used during text to speech.
The file examples/text2pos contains an executable shell script which reads arbitrary ASCII text from standard input and writes each word with its part of speech (one per line) to standard output.
A Festival script, like any other UNIX script, must start with the characters #! followed by the name of the festival executable. For scripts the option -script is also required. Thus our first line looks like
#!/usr/local/bin/festival -script
Note that the pathname may need to be different on your system.
Following this we have copious comments, to keep our lawyers happy, before we get into the real script.
The basic idea is that the tts process segments text into utterances; those utterances are then passed to a list of functions, as defined by the Scheme variable tts_hooks. Normally this variable contains a list of two functions, utt.synth and utt.play, which synthesize and play the resulting waveform. In this case we instead wish to predict the part of speech value and then print it out.
The first function we define replaces the normal synthesis function utt.synth. It runs the standard Festival utterance modules used in the synthesis process, up to the point where POS is predicted. This function looks like
(define (find-pos utt)
  "Main function for processing TTS utterances. Predicts POS and prints words with their POS"
  (Token utt)
  (POS utt)
)
The normal text-to-speech process first tokenizes the text, splitting it into “sentences”. The utterance type of these is Token. Then we call the Token utterance module, which converts the tokens to a stream of words. Then we call the POS module to predict part of speech tags for each word. Normally we would call other modules, ultimately generating a waveform, but in this case we need no further processing.
The second function we define prints out the words and their parts of speech.
(define (output-pos utt)
  "Output the word/pos for each word in utt"
  (mapcar
   (lambda (pair)
     (format t "%l/%l\n" (car pair) (car (cdr pair))))
   (utt.features utt 'Word '(name pos))))
This uses the utt.features function to extract features from the items in a named stream of an utterance. In this case we want the name and pos features for each item in the Word stream. Then for each pair we print out the word’s name, a slash and its part of speech followed by a newline.
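If a different output layout is wanted, only the format string needs to change. The following is a minimal sketch of a tab-separated variant, not part of examples/text2pos: the name output-pos-tsv is hypothetical, and it assumes Festival's format accepts the %s directive and that string literals understand the \t escape. It returns utt as its last expression so that it could also be followed by other functions in a hook list.

```scheme
;; Sketch only: output-pos-tsv is a hypothetical name, not defined by Festival
;; or by examples/text2pos.
(define (output-pos-tsv utt)
  "Output word<TAB>pos for each word in utt, then return utt"
  (mapcar
   (lambda (pair)
     ;; pair is (name pos), in the order requested from utt.features
     (format t "%s\t%s\n" (car pair) (car (cdr pair))))
   (utt.features utt 'Word '(name pos)))
  ;; Return the utterance itself rather than the list built by mapcar
  utt)
```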
Our next job is to redefine the functions to be called during text to speech. The variable tts_hooks is defined in lib/tts.scm. Here we set it to our two newly-defined functions
(set! tts_hooks (list find-pos output-pos))
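The same mechanism can mix printing with normal synthesis. As a hedged sketch, assuming output-pos has been defined as above and that each hook is handed the value returned by the previous one, a small wrapper (the hypothetical output-pos-and-pass) can print the tags and then pass the utterance on to utt.play. Since utt.synth runs the full set of synthesis modules, POS has already been predicted by the time the wrapper is called.

```scheme
;; Sketch only: output-pos-and-pass is a hypothetical helper.
;; It relies on output-pos being defined earlier in the same script.
(define (output-pos-and-pass utt)
  "Print word/pos pairs for utt, then return utt for the next hook"
  (output-pos utt)
  ;; Return the utterance so that the following hook receives it
  utt)

;; Synthesize as usual, print the tags, then play the waveform
(set! tts_hooks (list utt.synth output-pos-and-pass utt.play))
```

With this setting the script would speak the text as well as print the tags, so it is only useful where audio output is available.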
So that garbage collection messages do not appear on the screen, we turn them off with the following command
(gc-status nil)
The final stage is to start the tts process running on standard input. Because we have redefined what functions are to be run on the utterances, it will no longer generate speech but just predict part of speech and print it to standard output.
(tts_file "-")
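For quick experiments the two functions can also be applied to a single string rather than to standard input. The sketch below assumes the find-pos and output-pos definitions above and a Festival version in which a Text utterance is prepared with the Initialize and Text modules before tokens are converted to words; the variable name utt1 and the sample sentence are illustrative only.

```scheme
;; Sketch only: utt1 and the sample sentence are illustrative.
;; Relies on find-pos and output-pos as defined earlier in the script.
(set! utt1 (Utterance Text "The cat sat on the mat."))
(Initialize utt1)   ; set up the utterance for processing
(Text utt1)         ; split the raw text into tokens
(find-pos utt1)     ; run the Token and POS modules
(output-pos utt1)   ; print one word/pos pair per line
```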