increasing naturalness of vowel synthesis

Jyrki Tuomainen
 

Hello list.
I've created a source signal for vowel synthesis from scratch, using a PitchTier and a PointProcess.

The "To Sound (phonation)" command provides a couple of parameters for manipulating the naturalness of the glottal source, and I've had some success with them, but I would still like to make the result more natural. My problem is that I still get this attack-like onset, the vowel sounds monotonous, and the voice is "tight". I think I've got the pitch contour right, but does anyone have suggestions for other parameters I could try (e.g. how to add some jitter), and/or for good values for the parameters that are available in the To Sound (phonation) command?


Thanks, and best wishes, -Jyrki

--
New temporary address from July 2006 to Autumn 2007

Jyrki Tuomainen, University College London, Human Communication Science
Remax House, 31/32 Alfred Place, London WC1E 7DP, UK
Tel: +44 (0)20 7679 4214 (Internal 24214), Fax: +44 (0)207 679 4238
j.tuomainen@..., http://www.hcs.ucl.ac.uk/
--

Boersma Paul
 

At 11:21 +0000 11-12-06, Jyrki Tuomainen wrote:
> I still get this attack-like onset
Did you already try lowering the "adaptation factor" to 0.5 or so? If not, then try that; if you did, then a multiplication with an IntensityTier (for example, one that rises gradually at the start) may help.
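
As a rough illustration of what multiplying by a rising intensity contour does to the onset, here is a minimal sketch in Python rather than Praat. The function name, the 30 ms fade length, the -40 dB starting level and the linear-in-dB ramp are all assumptions chosen for illustration; they are not values from Praat or from this thread.

```python
def apply_fade_in(samples, sample_rate, fade_seconds=0.03, floor_db=-40.0):
    """Scale the start of a signal with a gain that rises linearly in dB.

    All defaults are illustrative assumptions, not Praat settings.
    """
    fade_len = min(len(samples), int(round(fade_seconds * sample_rate)))
    out = list(samples)
    for i in range(fade_len):
        # Gain in dB runs from floor_db at the first sample up towards 0 dB.
        gain_db = floor_db * (1.0 - i / fade_len)
        # Convert dB to a linear amplitude factor before multiplying.
        out[i] = samples[i] * 10.0 ** (gain_db / 20.0)
    return out
```

Samples after the fade are left untouched, so only the onset is softened.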

> the vowel sounds monotonous
This should depend on little more than the PitchTier that you used to create the voice pulses. You could include a short rise at the beginning. Other suggestions are welcome.
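
To make the "short rise at the beginning" concrete, the sketch below (Python, not Praat) interpolates a piecewise-linear pitch contour and places pulses one period apart. It only approximates the idea of deriving a PointProcess from a PitchTier and is not Praat's algorithm; the function names and the contour values (110 to 125 Hz over 80 ms, then a slow fall to 105 Hz) are hypothetical.

```python
def pitch_at(t, points):
    """Linearly interpolate F0 (Hz) at time t from sorted (time, Hz) points."""
    if t <= points[0][0]:
        return points[0][1]
    for (t0, f0), (t1, f1) in zip(points, points[1:]):
        if t <= t1:
            return f0 + (f1 - f0) * (t - t0) / (t1 - t0)
    return points[-1][1]


def pulses_from_pitch(points, duration):
    """Place pulses so that each period equals 1 / F0 at the current pulse."""
    times, t = [], 0.0
    while t < duration:
        times.append(t)
        t += 1.0 / pitch_at(t, points)
    return times


# Illustrative contour: a short rise over the first 80 ms, then a slow fall.
contour = [(0.0, 110.0), (0.08, 125.0), (0.5, 105.0)]
pulse_times = pulses_from_pitch(contour, 0.5)
```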

> and the voice is "tight".
This is a spectral issue. Make sure that you apply at least 10 formants. I assume that you did that all right, so here I'm waiting for suggestions from others.

> how to add some jitter
Select the PointProcess, choose "To Matrix", then "Formula":
self + randomGauss (0, 0.0001)
then "To PointProcess". Perhaps this is a good moment for me to make a Formula command for PointProcesses.
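
For readers who want to see what that formula does, here is a minimal Python sketch of the same idea: each pulse time (in seconds) gets an independent Gaussian offset with mean 0 and standard deviation 0.0001 s, i.e. 0.1 ms. The function name and the `seed` parameter are hypothetical, and the final sort is an added safeguard to keep the times in order, not part of the Praat recipe.

```python
import random


def add_jitter(pulse_times, sd_seconds=0.0001, seed=None):
    """Perturb each pulse time with zero-mean Gaussian noise.

    Mirrors the formula self + randomGauss (0, 0.0001) applied per pulse.
    """
    rng = random.Random(seed)
    jittered = [t + rng.gauss(0.0, sd_seconds) for t in pulse_times]
    # Keep the times increasing in case two neighbours ever swap places.
    return sorted(jittered)


# Usage: 50 pulses at a steady 100 Hz, then the same train with jitter.
regular = [i * 0.01 for i in range(50)]
jittered = add_jitter(regular, seed=1)
```

With periods of around 10 ms, a 0.1 ms standard deviation perturbs each period by roughly one percent.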

> what would be good values for those parameters that are available in the To Sound (phonation) command?
Setting "power 1" and "power 2" to 2 and 3 (rather than the defaults of 3 and 4) would yield something close to Rosenberg's glottal source model, which was also used in the Klatt synthesizer. But 3 and 4 have been chosen as defaults because they seem to better approximate the known relations between human glottal parameters.
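
To compare the two settings numerically, the sketch below assumes that the open-phase glottal flow has the form x^power1 - x^power2 for normalized time x between 0 and 1. That form is an assumption about how the two powers are used, so check it against the Praat manual; the function names are hypothetical.

```python
def glottal_flow(x, power1=3.0, power2=4.0):
    """Assumed open-phase flow shape for normalized time x in [0, 1]."""
    return x ** power1 - x ** power2


def peak_position(power1, power2):
    """Normalized time of maximum flow, from setting the derivative to zero.

    power1 * x**(power1 - 1) = power2 * x**(power2 - 1)
    gives x = (power1 / power2) ** (1 / (power2 - power1)).
    """
    return (power1 / power2) ** (1.0 / (power2 - power1))


rosenberg_like_peak = peak_position(2.0, 3.0)  # powers 2 and 3
default_peak = peak_position(3.0, 4.0)         # powers 3 and 4
```

Under this assumption, powers of 3 and 4 place the flow maximum later in the open phase (x = 3/4) than powers of 2 and 3 (x = 2/3), so the pulse is more skewed towards closure.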

Any better wisdom on this would indeed be very welcome.
--

Paul Boersma
Phonetic Sciences, University of Amsterdam
Spuistraat 210, room 303
1012VT Amsterdam, The Netherlands
http://www.fon.hum.uva.nl/paul/
phone +31-20-5252385