Completely different F0, F1, F2 values and formant distances for the same voice in two distinct recordings
epoqe
Hi all,
For the purpose of a voice comparison, I am comparing two recordings of the same known voice.
I'm getting significant differences, and I can't figure out the cause.
For example, these are the averages for the vowel "I", each taken over at least 20 measurements:
Rec 1: F0 111 Hz, F1 320 Hz, F2 2141 Hz
Rec 2: F0 101 Hz, F1 1891 Hz, F2 2511 Hz
These are the averages for the vowel "O":
Rec 1: F0 107 Hz, F1 542 Hz, F2 1001 Hz
Rec 2: F0 122 Hz, F1 719 Hz, F2 2660 Hz
Since one useful criterion in voice comparison is how similar the distance between F1 and F2 is in the two voices, if I hadn't been sure that both recordings came from the same person, I would have concluded that they were two different people.
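To make the comparison concrete, here is a minimal Python sketch that computes the F2 - F1 distance for each vowel and recording from the averages posted above. The dictionary layout and the function name are hypothetical, chosen only for illustration; the numbers are the posted averages.

```python
# Averaged formant values (Hz) as posted in this thread.
measurements = {
    "I": {"Rec 1": {"F1": 320, "F2": 2141}, "Rec 2": {"F1": 1891, "F2": 2511}},
    "O": {"Rec 1": {"F1": 542, "F2": 1001}, "Rec 2": {"F1": 719, "F2": 2660}},
}


def f2_f1_distance(formants):
    """Return the F2 - F1 distance in Hz for one averaged measurement."""
    return formants["F2"] - formants["F1"]


# Distance per vowel and per recording.
distances = {
    vowel: {rec: f2_f1_distance(values) for rec, values in recs.items()}
    for vowel, recs in measurements.items()
}

for vowel, recs in distances.items():
    print(vowel, recs)
```

The distance for "I" shrinks from 1821 Hz in Rec 1 to 620 Hz in Rec 2, while the one for "O" grows from 459 Hz to 1941 Hz, which is the mismatch the question is about.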
What could cause such different distances?
I have already ascertained that it does not depend on the sampling frequency.
There are no big differences in noise between the two files, and noise shouldn't affect the distances this much anyway.
Thanks for any advice.
Damien Hall
Dear epoqe
It looks to me as if Praat may not have detected a formant here, in at least one, if not two cases.
For “I” (I presume this is /i/?)
Recording 1 has F1 320; Rec 2 has F1 1891, which is more towards the F3 range, if quite low. So could it be that Praat has not detected the real F2 here?
For “O” (is this /o/?)
Recording 1 has F2 1001; Rec 2 has F2 2660. Again, could the detected formant in Rec 2 actually be F3?
Just some quick thoughts.
Best wishes
-- Newcastle University (UK), (French) Linguistics, Tel. +44 191 208 8521
epoqe
Hi,
Thank you for your feedback. Actually, I took at least 20 measurements each for /i/ and /o/; as I stated above, those numbers are the averages of 20 measurements. The average F3 for /i/, for example, was 2727 Hz in Rec 1 and 3214 Hz in Rec 2. I've also tried adding noise and reverb, and applying the same high-cut filter to Rec 2, but there are still big differences in F1 and F2 between the two recordings of the known voice. Is there any other advice, or a test I can do?
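One way to probe the suggestion above that a formant was missed is to check whether Rec 2's formant list lines up better with Rec 1's when shifted by one index. The sketch below is a crude heuristic made up for illustration (it is not a Praat function, and with only three formants per recording it is far from a formal test); the variable and function names are hypothetical. It uses the /i/ averages from this thread, including the F3 values just reported.

```python
# /i/ averages (Hz) reported in this thread: F1 and F2 from the first post,
# F3 from the follow-up message.
rec1_i = [320, 2141, 2727]   # Rec 1: F1, F2, F3
rec2_i = [1891, 2511, 3214]  # Rec 2: F1, F2, F3 as labelled by the tracker


def mean_relative_mismatch(reference, candidate, offset):
    """Average relative difference between candidate[k] and reference[k + offset].

    offset=0 assumes both recordings label formants the same way;
    offset=1 assumes the candidate's k-th formant is really the reference's
    (k+1)-th, i.e. one low formant was skipped in the candidate.
    """
    pairs = [
        (reference[k + offset], candidate[k])
        for k in range(len(candidate))
        if 0 <= k + offset < len(reference)
    ]
    return sum(abs(cand - ref) / ref for ref, cand in pairs) / len(pairs)


scores = {offset: mean_relative_mismatch(rec1_i, rec2_i, offset) for offset in (0, 1)}
for offset, score in scores.items():
    print(f"offset {offset}: mean relative mismatch {score:.3f}")
```

Shifting Rec 2 by one index cuts the mismatch from about 1.75 to about 0.10, which is consistent with Rec 2's "F1" actually being F2. It does not prove it: real formants also vary between recordings of the same speaker.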
Boersma Paul
I have too little information to answer this. The second series seems to be unusually far off. How exactly did you measure the formants?
Also, if you send me the recordings (off the Praat List), I can have a look.
best wishes,
Paul
_____
Paul Boersma
Professor of Phonetic Sciences
University of Amsterdam
Spuistraat 134, room 632, 1012VB Amsterdam, The Netherlands, http://www.fon.hum.uva.nl/paul/
Bogdan Rozborski
On 28.11.2022 at 08:02, epoqe wrote:
Hi "Epoqe". There are many reasons why you cannot expect two formant measurements of the same vowel to be identical, even for the same speaker. It simply cannot happen, as the same speaker will ALWAYS pronounce each instance of a measured vowel differently. Besides that, one cannot successfully identify a speaker by means of just a single phonetic feature such as formant frequency. And there is something fundamentally wrong with your data: a human speaker's F1 does not spread over such a wide range, namely 320 Hz and 1891 Hz, even if two different vowels are measured. It looks like your measurement procedure catches the value of the second formant instead of the first one. Best, Bogdan.
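A simple range check can catch the kind of value described here before any comparison is made. This is a minimal sketch under a loudly stated assumption: the 1100 Hz limit is a hypothetical rule of thumb chosen for illustration (adult vowel F1 rarely goes much above about 1 kHz), not a Praat setting or a published threshold. The function name is hypothetical too.

```python
# Hypothetical plausibility limit (an assumption for illustration, not a
# Praat setting): adult vowel F1 values rarely go much above about 1 kHz.
MAX_PLAUSIBLE_F1_HZ = 1100


def suspicious_f1(f1_hz, f2_hz):
    """Return a list of reasons why an (F1, F2) pair looks mislabelled."""
    reasons = []
    if f1_hz > MAX_PLAUSIBLE_F1_HZ:
        reasons.append(f"F1 = {f1_hz} Hz is above {MAX_PLAUSIBLE_F1_HZ} Hz; it may be F2")
    if f2_hz <= f1_hz:
        reasons.append("F2 is not above F1")
    return reasons


# (F1, F2) averages posted in this thread.
posted = {
    ("I", "Rec 1"): (320, 2141),
    ("I", "Rec 2"): (1891, 2511),
    ("O", "Rec 1"): (542, 1001),
    ("O", "Rec 2"): (719, 2660),
}

flagged = {}
for key, (f1, f2) in posted.items():
    reasons = suspicious_f1(f1, f2)
    if reasons:
        flagged[key] = reasons

print(flagged)
```

Only Rec 2's /i/ is flagged. The check cannot catch the /o/ case raised earlier (Rec 2's F2 of 2660 Hz possibly being F3), so it is a first filter at best.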
epoqe
I will send 15 seconds of samples A and B if you email me.
epoqe
SOLVED, with a Bug Reporting!
In the formant settings I had set the number of formants to 4, and somehow the F1 calculation was missing. After setting the number of formants back to 5, F1 was calculated correctly. On the same audio portion, F1 comes out differently whenever the number of formants in the formant settings is other than 5. Thank you all for your feedback. With my best,
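A textbook intuition for why the formant count matters: if more resonances lie below the ceiling than there are formant slots, one of them may end up skipped or merged. The sketch below is not Praat's Burg algorithm. It assumes a uniform tube closed at one end (resonances at odd multiples of c/4L), a hypothetical 17.5 cm vocal tract, c = 35,000 cm/s, and Praat's default ceiling of 5500 Hz; we do not know which ceiling was actually used here. The function names are hypothetical.

```python
# Assumptions for illustration only: a uniform tube closed at the glottis and
# open at the lips, a 17.5 cm vocal tract (a common textbook value for an adult
# male) and c = 35,000 cm/s. Real vowels are not uniform tubes.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def tube_resonances_below(ceiling_hz, tract_length_cm=17.5):
    """Quarter-wavelength resonances F_n = (2n - 1) c / (4 L) below the ceiling."""
    resonances = []
    n = 1
    while True:
        f = (2 * n - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * tract_length_cm)
        if f >= ceiling_hz:
            return resonances
        resonances.append(f)
        n += 1


def formant_count_mismatch(requested_formants, ceiling_hz, tract_length_cm=17.5):
    """Expected resonances below the ceiling minus the number of formants requested.

    A positive value means there are more resonances below the ceiling than
    formant slots, so the analysis may skip or merge one of them.
    """
    return len(tube_resonances_below(ceiling_hz, tract_length_cm)) - requested_formants


for n in (4, 5):
    print(n, formant_count_mismatch(n, ceiling_hz=5500))
```

Under these assumptions, asking for 4 formants with a 5500 Hz ceiling leaves one resonance unaccounted for, while 5 matches, which fits the behaviour reported above without explaining Praat's internals.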
Bogdan Rozborski
On 30.11.2022 at 10:12, epoqe wrote: "SOLVED, with a Bug Reporting!"
Hi "Epoqe". Solved by accident. First, are you sure you have exactly five formants in your speech signal (I mean in vowels where formants are measurable)? As far as I can see, you are only looking for the first three formants, so you can simply set "Number of formants" to 3, and do not forget to set "Formant ceiling" to 3000 Hz for a male voice or 3300 Hz for a female voice (this procedure is well described in Praat's manual). Try that out. Best, Bogdan.
epoqe
If I set the number of formants to 3, it calculates the wrong F1: the F2 values end up in F1 instead.
Check these two pictures: the same segment analysed with 3 formants and then with 7 formants.
Boersma Paul
It is best to always ask for five formants, even if one is interested only in the first two. This is because F2 and F3 depend strongly on the shape of the oral and pharyngeal cavities, so that e.g. F3 will jump below and above the formant ceiling of the analysis
(e.g. 3000 Hz?). F5 is much more stable, and it is therefore more likely that the formant ceiling lies between F5 and F6, so that the 5 formants that are detected below the ceiling make sense.
(yes, for [u] the 5-formant ceiling still wants to be 600 Hz lower than for [i], but this effect is even larger if you choose a 3-formant ceiling)
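The "ceiling between F5 and F6" idea can be illustrated with a small sketch: put the ceiling halfway between the n-th and (n+1)-th formant, so that exactly n formants lie below it. The resonance values come from the same hypothetical uniform-tube assumption (17.5 cm tract, c = 35,000 cm/s), not from any recording in this thread, and the function names are hypothetical.

```python
# Assumption for illustration: uniform tube, 17.5 cm, c = 35,000 cm/s.
SPEED_OF_SOUND_CM_PER_S = 35_000.0


def tube_formants(count, tract_length_cm=17.5):
    """First `count` resonances F_k = (2k - 1) c / (4 L) of a uniform tube."""
    return [
        (2 * k - 1) * SPEED_OF_SOUND_CM_PER_S / (4 * tract_length_cm)
        for k in range(1, count + 1)
    ]


def midpoint_ceiling(formants_hz, n):
    """Ceiling halfway between the n-th and (n+1)-th formant (1-based),
    so that exactly n of the given formants fall below it."""
    ordered = sorted(formants_hz)
    if len(ordered) <= n:
        raise ValueError("need at least n + 1 formants to place the ceiling")
    return (ordered[n - 1] + ordered[n]) / 2


neutral = tube_formants(6)  # 500, 1500, ..., 5500 Hz
for n in (3, 5):
    print(n, midpoint_ceiling(neutral, n))
```

For this tube the midpoint works out to n·c/(2L): 3000 Hz for 3 formants (the value suggested above for a male voice) and 5000 Hz for 5 formants (the value the Praat manual suggests for an adult male). The tube model cannot show the actual point of the message, namely that F2 and F3 move much more between vowels than F5 does.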
epoqe
Well noted.
Thank you for the clear explanation. It makes perfect sense. A good lesson for future extractions. With my best,
Bogdan Rozborski
On 30.11.2022 at 12:22, Boersma Paul via groups.io wrote:
"It is best to always ask for five formants, even if one is interested only in the first two. (...)"
Well, my "good practice" is to view the spectrogram of a given segment (a vowel) before setting up the formant measurement. If I see that F3 exceeds 3000 Hz, I increase the formant ceiling setting accordingly. Most often I work with degraded speech, where F3 is the highest formant I can observe within the analysed signal. Though it takes a lot more time, I always divide my recordings into segments and then do the formant analysis manually for each individual segment. Best, Bogdan.
Boersma Paul
Yes, that makes sense in that case.
Henning Reetz
There are two articles (a recent one and an older one) on the accuracy of formant measurements that might be of interest:
Whalen, D. H., Chen, W.-R., Shadle, C. H., and Fulop, S. A. (2022). “Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986),” The Journal of the Acoustical Society of America 152, 933–941.
Monsen, R. B., and Engebretson, A. M. (1983). “The accuracy of formant frequency measurements: A comparison of spectrographic analysis and linear prediction,” Journal of Speech, Language, and Hearing Research 26, 89–97.
Best wishes,
Henning Reetz