Completely different formants and distances (F0, F1, F2) for the same voice in two distinct recordings


epoqe
 

Hi all,
For a voice comparison, I am comparing two recordings that are known to come from the same speaker.
I am getting significant differences, and I can't figure out the cause.
These are the averages (in Hz) for the vowel "I", for example, each taken over at least 20 measurements:
 
Rec 1 vowel:
F0 111
F1 320
F2 2141
 
Rec 2 vowel:
F0 101
F1 1891
F2 2511
 
 
These are the averages for the Vowel "O":
 
Rec 1 vowel:
F0 107
F1 542
F2 1001
 
Rec 2 vowel:
F0 122
F1 719
F2 2660
 
A useful measure in voice comparison is how similar the distance between F1 and F2 is across the two voices. In this case, if I had not been sure that both recordings came from the same person, I would have thought they were two distinct speakers.
 
What could cause such different distances?
I have already verified that it does not depend on the sampling frequency.
There are no big noise differences between the two files, and noise should not affect the distance this much anyway.
 
Thanks for any advice
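
For readers following the numbers, here is a minimal Python sketch that computes the F2 - F1 distance from the averages quoted in this post. The values are copied from the post; the dictionary layout and the function name are only illustrative choices, not part of any Praat workflow.

```python
# Averages in Hz, copied from the post above.
AVERAGES = {
    ("I", "Rec 1"): {"F0": 111, "F1": 320, "F2": 2141},
    ("I", "Rec 2"): {"F0": 101, "F1": 1891, "F2": 2511},
    ("O", "Rec 1"): {"F0": 107, "F1": 542, "F2": 1001},
    ("O", "Rec 2"): {"F0": 122, "F1": 719, "F2": 2660},
}


def f2_f1_distance(values):
    """Return the F2 - F1 distance in Hz for one set of averages."""
    return values["F2"] - values["F1"]


# One distance per (vowel, recording) pair.
DISTANCES = {key: f2_f1_distance(vals) for key, vals in AVERAGES.items()}

if __name__ == "__main__":
    for (vowel, rec), dist in DISTANCES.items():
        print(f"{vowel} {rec}: F2 - F1 = {dist} Hz")
```

The distances come out at 1821 Hz versus 620 Hz for "I" and 459 Hz versus 1941 Hz for "O", which is the mismatch described above.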


Damien Hall
 

Dear epoqe

 

It looks to me as if Praat may not have detected a formant here, in at least one, if not both, of the cases.

 

For “I” (I presume this is /i/?)

 

Recording 1 has F1 320; Rec 2 has F1 1891, which is more towards the F3 range, if quite low. So could it be that Praat has not detected the real F2 here?

 

For “O” (is this /o/?)

 

Recording 1 has F2 1001; Rec 2 has F2 2660. Again, could the detected formant in Rec 2 actually be F3?

 

Just some quick thoughts.

 

Best wishes


Damien Hall

 

--

Damien Hall

Newcastle University (UK)

(French) Linguistics

Tel. +44 191 208 8521


 

 


epoqe
 

Hi,
thank you for your feedback.
Actually, I took at least 20 measurements each for /i/ and /o/.
As I stated above, those numbers are the means of 20 measurements.
The average F3 for /i/, for example, was 2727 in rec 1 and 3214 in rec 2.

I have also tried adding noise and reverb and applying the same high-cut filter to rec 2, but there are still big differences in F1 and F2 between the two known voices.
Is there any other advice, or any other test I can do?


Boersma Paul
 

I have too little information to answer this. The second series seems to be unusually far off. How exactly did you measure the formants?

Also, if you send me the recordings (off the Praat List), I can have a look.

best wishes,
Paul


_____

Paul Boersma
Professor of Phonetic Sciences
University of Amsterdam
Spuistraat 134, room 632
1012VB Amsterdam, The Netherlands
http://www.fon.hum.uva.nl/paul/


Bogdan Rozborski
 


Hi "Epoqe".

There are many reasons why you cannot expect two formant measurements of the same vowel to be identical, even for the same speaker. It simply cannot happen, as each instance of a measured vowel will ALWAYS be pronounced differently by the same speaker. Besides that, one cannot successfully identify a speaker by means of just a single phonetic feature such as formant frequency. And there is something fundamentally wrong with your data. A human F1 does not spread over such a wide frequency range, namely from 320 Hz to 1891 Hz, even if two different vowels are measured. It looks as if your measurement procedure picks up the value of the second formant instead of the first one.

Best, Bogdan .
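
The kind of sanity check described above can be sketched in a few lines of Python. The plausibility window used here (150 to 1100 Hz for an adult F1) is an assumption made for illustration, not a figure from this thread or from the Praat manual; the F1 averages are the ones quoted in the original post.

```python
# Assumed rough plausibility window for an adult F1, in Hz.
# These bounds are illustrative and are not taken from the thread.
F1_MIN_HZ = 150.0
F1_MAX_HZ = 1100.0


def f1_is_plausible(f1_hz, low=F1_MIN_HZ, high=F1_MAX_HZ):
    """Return True if the reported F1 lies inside the assumed window."""
    return low <= f1_hz <= high


# F1 averages in Hz as reported in the original post.
REPORTED_F1 = {
    "I Rec 1": 320,
    "I Rec 2": 1891,
    "O Rec 1": 542,
    "O Rec 2": 719,
}

# Labels whose F1 falls outside the window and deserves a second look.
SUSPECT = [label for label, f1 in REPORTED_F1.items() if not f1_is_plausible(f1)]

if __name__ == "__main__":
    print("Suspect F1 values:", SUSPECT)
```

With these bounds only the Rec 2 value for "I" is flagged. The check looks at F1 alone, so it says nothing about whether the F2 values are labelled correctly.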


epoqe
 

I will send 15 seconds of samples A/B if you write to me by email.

Here are some measurements just for the vowel E:

 


epoqe
 

SOLVED, with a bug report!

In the formant settings I had set the number of formants to 4, and somehow the F1 calculation was missing.
When I set the number of formants back to 5, F1 was calculated correctly.
For the same audio portion, F1 comes out differently if the number of formants in the formant settings is anything other than 5.
Thank you all for your feedback.

With my best


Bogdan Rozborski
 


Hi "Epoqe".

Solved by accident. First, are you sure there are exactly five formants in your speech signal (I mean in vowels where formants are measurable)? As far as I can see, you are looking for just the first three formants, so you can simply set "Number of formants" to 3; do not forget to set "Formant ceiling" to 3000 Hz for a male voice or 3300 Hz for a female voice (this procedure is well described in Praat's manual). Try that out.

Best, Bogdan.


epoqe
 

If I set the number of formants to 3, the wrong F1 is calculated: the values reported as F1 are actually the F2 values.
Check these two pictures: the same segment with a 3-formant calculation

and then with a 7-formant calculation.
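
The slot shift described here can be illustrated with a short Python sketch. It assumes, purely as a hypothesis, that the true F1 was not detected, so every tracked value sits one slot too low. The function name is made up for this example; the numbers are the Rec 2 averages for /i/ quoted earlier in the thread.

```python
def relabel_if_f1_missed(tracked_hz):
    """Shift every label up by one slot.

    Assumes the true F1 was not detected, so the first tracked
    value is really F2, the second is really F3, and so on.
    """
    return {f"F{i + 2}": value for i, value in enumerate(tracked_hz)}


# Rec 2 averages for /i/ in Hz, in the F1, F2, F3 slots as reported.
REC2_I_TRACKED = [1891, 2511, 3214]

RELABELED = relabel_if_f1_missed(REC2_I_TRACKED)

if __name__ == "__main__":
    # For comparison, Rec 1 reported F2 = 2141 Hz and F3 = 2727 Hz for /i/.
    print(RELABELED)
```

Under that hypothesis the Rec 2 values become F2 = 1891 Hz and F3 = 2511 Hz, which is much closer to the Rec 1 figures than the original labelling. This is only a consistency check, not proof of what the tracker did.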


Boersma Paul
 

It is best to always ask for five formants, even if one is interested only in the first two. This is because F2 and F3 depend strongly on the shape of the oral and pharyngeal cavities, so that e.g. F3 will jump below and above the formant ceiling of the analysis (e.g. 3000 Hz?). F5 is much more stable, and it is therefore more likely that the formant ceiling lies between F5 and F6, so that the 5 formants that are detected below the ceiling make sense.

(yes, for [u] the 5-formant ceiling still wants to be 600 Hz lower than for [i], but this effect is even larger if you choose a 3-formant ceiling)
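
To see why the number of formants and the ceiling have to match, here is a minimal Python sketch based on the textbook uniform-tube idealisation of a neutral vocal tract. It is not Praat's algorithm. The tract length of 17.5 cm and the speed of sound of 350 m/s are assumptions, and the function names are invented for this example.

```python
# Assumed speed of sound in warm, moist air: about 350 m/s, in cm/s.
SPEED_OF_SOUND_CM_S = 35000.0


def tube_resonances(length_cm, count):
    """Resonances of a uniform tube closed at one end.

    The n-th resonance is (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


def resonances_below(ceiling_hz, length_cm=17.5, count=10):
    """List the idealised resonances that lie below the given ceiling."""
    return [f for f in tube_resonances(length_cm, count) if f < ceiling_hz]


if __name__ == "__main__":
    print(tube_resonances(17.5, 5))
    print(len(resonances_below(5000.0)), "resonances below 5000 Hz")
```

For this idealised tube the resonances are about 1000 Hz apart, so five of them lie below 5000 Hz. Asking the analysis for four formants under such a ceiling would mean fitting fewer peaks than the signal is expected to contain in that range.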



epoqe
 

Well noted.
Thank you for the clear explanation. It makes absolute sense.

Good lesson for future extractions.

with my best


Bogdan Rozborski
 


Well, my "good practice" is to view the spectrogram of a given segment (a vowel) before setting up the formant measurement. If I see that F3 exceeds 3000 Hz, I increase the formant ceiling setting accordingly. Most often I work with degraded speech, where F3 is the highest formant I can observe in the analysed signal. Though it takes a lot more time, I always divide my recordings into segments and then do the formant analysis manually for each individual segment.

Best, Bogdan.
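
The per-segment rule described above could be written down roughly as follows. This is a sketch of one possible reading of it: the 3000 Hz default comes from the message, but the 300 Hz margin and the function name are hypothetical, and the F3 value still has to be read off the spectrogram by eye.

```python
def pick_ceiling(observed_f3_hz, default_ceiling_hz=3000.0, margin_hz=300.0):
    """Choose a formant ceiling for one segment in a 3-formant analysis.

    Keeps the default ceiling unless the F3 seen in the spectrogram
    exceeds it, in which case the ceiling is raised to F3 plus a margin.
    The margin is a hypothetical value chosen for illustration.
    """
    if observed_f3_hz > default_ceiling_hz:
        return observed_f3_hz + margin_hz
    return default_ceiling_hz


if __name__ == "__main__":
    # Average F3 values for /i/ quoted earlier in the thread, in Hz.
    for f3 in (2727.0, 3214.0):
        print(f3, "->", pick_ceiling(f3))
```

With the F3 averages quoted earlier for /i/, the first segment would keep the 3000 Hz ceiling and the second would be raised to 3514 Hz.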


Boersma Paul
 

yes, that makes sense in that case.



Henning Reetz
 

There are two articles (a recent one and an older one) that deal with accuracy of formant measurements, which might be of interest:

Whalen, D. H., Chen, W.-R., Shadle, C. H., and Fulop, S. A. (2022). “Formants are easy to measure; resonances, not so much: Lessons from Klatt (1986),” The Journal of the Acoustical Society of America 152, 933–941.

Monsen, R. B., and Engebretson, A. M. (1983). “The accuracy of formant frequency measurements: A comparison of spectrographic analysis and linear prediction,” Journal of Speech, Language, and Hearing Research 26, 89–97.

Best wishes, Henning Reetz

