High Level Parameter Speech Synthesis System - $500
- Uses 13 higher-level parameters to produce more natural-sounding speech
- Uses your Windows-compatible audio card; no other hardware necessary
- Runs under Windows® 2000/XP/Vista
- Developed and used every day by speech researchers at Sensimetrics
- Volume discounts are available
Speech Synthesis for Research
HLsyn is a parametric synthesizer designed and produced by Sensimetrics
Corporation. Since its original release in 1995, numerous improvements have been made, and
all are now incorporated in the present version 2.2. The most important advance has been
the addition of three new control parameters for aerodynamic and source control.
Improvements have also been made in the algorithms employed and in user documentation,
which is extensive.
Background
HLsyn may reasonably be termed a quasi-articulatory synthesizer. The design
of the HLsyn synthesizer is based on the observation that the values of the
40-odd parameters used to control SenSyn and similar Klatt-type synthesizers are
not independent, but are subject to inter-parameter constraints. The constraints arise
because speech production, as a physical process, permits only certain combinations of
synthesis parameters to arise, and also limits the rates at which the parameter values can
change with time. To express these constraints, a small set of 10 high-level (HL)
parameters was originally proposed. This set has now been expanded to include 13 HL
parameters, providing improved aerodynamic and source control. The HL parameters are more
closely related to the actual states and articulatory movements in the vocal tract than
are lower-level (Klatt) parameters. The principle employed in HLsyn is simple: a set of
mapping relations within HLsyn transforms the HL parameters into the values of
the corresponding lower-level parameters that control SenSyn.
The HL parameters
There are thirteen HL parameters. These include five
constriction areas within the vocal tract:
- an, the cross-sectional area of the opening to the nasal cavity;
- ag, the average area of the glottal opening bounded by the membranous portion of the glottis;
- ap, the area of the glottal opening bounded by the cartilaginous portion of the glottis;
- al, the cross-sectional area of a constriction formed by the lips; and
- ab, the cross-sectional area of a constriction formed by the tongue blade.
Parameters al and ab come into play only during the
production of consonants.
Another HL parameter is ue, the active rate of change of vocal-tract
volume during obstruent consonants. This parameter can act to expand the vocal-tract
volume in order to facilitate glottal vibration during obstruent consonants, or to prevent
the expansion of the vocal-tract volume in order to inhibit glottal vibration. The dc
parameter controls the compliance of the vocal tract walls and the vocal folds. This
parameter can simulate compliance changes, such as those observed during unvoiced
consonants. Subglottal pressure, ps, can be employed
to control source amplitudes, which is important for emphasis in running speech.
Five of the HL parameters are similar (but not necessarily identical) to lower-level
(Klatt) parameters. These are the fundamental frequency f0 and the four
formant frequencies (f1, f2, f3, and f4).
The latter specify the natural frequencies of the vocal tract, without accounting for
acoustic coupling of the vocal tract to the trachea or nasal cavity, provided that there
is no localized constriction formed by the tongue blade or the lips.
The time-varying HL formant-frequency parameters, in effect, specify how the shape of the
vocal tract changes with time, independent of any nasal or tracheal coupling or local
constriction. If there is nasal or tracheal coupling or a local constriction (as specified
by the parameters an, ag, ap, al,
and ab), then the mapping relations automatically modify the HL formant
parameters to yield the lower-level formant parameters used to control the synthesizer.
The HL parameters f1, f2, f3, and f4
actually describe the aspects of vocal-tract shape that are determined by tongue-body
position, jaw position, pharyngeal shape, and possible lip rounding.
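As a way of keeping the thirteen parameters straight, the sketch below collects them into one record per time frame and groups them as the text above does. It is only an illustration: the class name HLFrame, the GROUPS table, the grouped helper, the units in the comments, and the sample values are all assumptions made here, not part of HLsyn or its file formats.

```python
from dataclasses import dataclass, fields


@dataclass
class HLFrame:
    """One time frame of the thirteen HL parameters (hypothetical container)."""
    # Similar (but not necessarily identical) to lower-level Klatt parameters
    f0: float  # fundamental frequency (assumed Hz)
    f1: float  # first formant frequency (assumed Hz)
    f2: float  # second formant frequency (assumed Hz)
    f3: float  # third formant frequency (assumed Hz)
    f4: float  # fourth formant frequency (assumed Hz)
    # Constriction areas within the vocal tract (units assumed, e.g. mm^2)
    an: float  # opening to the nasal cavity
    ag: float  # membranous portion of the glottis (average area)
    ap: float  # cartilaginous portion of the glottis
    al: float  # constriction formed by the lips
    ab: float  # constriction formed by the tongue blade
    # Aerodynamic and source control
    ue: float  # active rate of change of vocal-tract volume
    dc: float  # compliance of the vocal tract walls and vocal folds
    ps: float  # subglottal pressure


# Grouping that mirrors the description in the text.
GROUPS = {
    "klatt_like": ("f0", "f1", "f2", "f3", "f4"),
    "areas": ("an", "ag", "ap", "al", "ab"),
    "aerodynamic_source": ("ue", "dc", "ps"),
}


def grouped(frame):
    """Return the frame's values as a dict of dicts, keyed by group name."""
    return {
        group: {name: getattr(frame, name) for name in names}
        for group, names in GROUPS.items()
    }


# Illustrative values only; these are not HLsyn defaults.
frame = HLFrame(
    f0=120.0, f1=500.0, f2=1500.0, f3=2500.0, f4=3500.0,
    an=0.0, ag=4.0, ap=0.0, al=100.0, ab=100.0,
    ue=0.0, dc=0.0, ps=8.0,
)
print(grouped(frame)["areas"])
```

Keeping the formant parameters in their own group reflects the point made above: they describe vocal-tract shape, while any adjustment for nasal coupling, tracheal coupling, or a local constriction is left to the mapping relations.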
HLsyn includes a facility for manually entering time-varying parameters, for displaying
the parameters in graphical or numerical form, for playing back the results of the
synthesis and for displaying the synthesized sound pattern in the form of a spectrogram.
Speaker-dependent constraints can be specified by the user to enable the synthesizer to
simulate different female and male voices.
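A time-varying parameter entered by hand can be thought of as a short list of (time, value) breakpoints that is expanded into one value per frame. The sketch below does this with straight-line interpolation. Both the linear scheme and the function name interpolate_track are assumptions made for illustration; the text does not say how HLsyn itself interpolates or what frame interval it uses.

```python
from bisect import bisect_right


def interpolate_track(breakpoints, step_ms):
    """Linearly interpolate (time_ms, value) breakpoints onto a regular grid.

    Hypothetical helper: times are integer milliseconds, and the grid runs
    from the first breakpoint to the last in steps of step_ms.
    """
    if not breakpoints:
        raise ValueError("at least one breakpoint is required")
    if step_ms <= 0:
        raise ValueError("step_ms must be positive")

    points = sorted(breakpoints)
    times = [t for t, _ in points]
    samples = []
    t = times[0]
    while t <= times[-1]:
        # Index of the last breakpoint at or before time t.
        i = bisect_right(times, t) - 1
        if i >= len(points) - 1:
            # At the final breakpoint: use its value directly.
            samples.append(points[-1][1])
        else:
            (t0, v0), (t1, v1) = points[i], points[i + 1]
            samples.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        t += step_ms
    return samples


# A falling f0 contour from 120 to 100 over 200 ms, sampled every 100 ms.
f0_track = interpolate_track([(0, 120.0), (200, 100.0)], step_ms=100)
print(f0_track)  # [120.0, 110.0, 100.0]
```

The same helper could be applied to each of the thirteen parameters in turn to build a full frame-by-frame parameter table.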
Current research at Sensimetrics includes the formulation of rules for generating the
time-varying HL parameters from a phonetic string, so that HLsyn can be incorporated into
a complete text-to-speech system.
HLsyn Synthesis Examples
The following are examples of copy synthesis made with HLsyn. "The Raven" has been
fine-tuned by adjusting HLsyn's lower-level Klatt parameters.
The Raven - .wav 164KB
The Raven - .aif 164KB
"She gave the kitten..." - .wav 33KB
"She gave the kitten..." - .aif 33KB
"Five women..." - .wav 31KB
"Five women..." - .aif 31KB