Same problem here

wenzel_lisa@...
 

Dear all,
I've got pretty much the same problem as Milena:

For a research project, we want to use the F0 range and the F0 mean as measures of emotional arousal.
Per person, there are multiple voice recordings, each containing at least 3 minutes of speech. We also have to cut out some parts in between, and we will remove the algorithm's detection errors by hand by comparing the pitch line with the speaker's voice and with the spectrogram.

As the calculation of the pitch depends on the graphical display, merely zooming into a segment of speech makes the calculated mean/max/min for that segment differ greatly (e.g. a 20 Hz difference for the mean, a 100 Hz difference for the max).

As we need the mean/max/min for the whole 3 minutes, I am now unsure whether the values we get by selecting the whole 3-minute sound segment and choosing "Get maximum pitch" etc. are still meaningful or rather random. I don't know how the algorithm works when you zoom out: does it consider all the data by summarizing them into mean values, or does it pick, say, every 10th data point and ignore the rest of the data that is taken into account when zoomed in?

Also, I read above that it would be helpful to extract a Pitch object, but for cutting out the algorithm's mistakes, I need to be able to see the spectrogram behind it. That's not possible, is it?

Thank you very much for reading this and also for your answers given above - this page is already very helpful.

Best wishes!

Lisa

Boersma Paul
 

We are not talking about a "problem" with pitch measurement. As noted often on these pages, as well as in the Intro and the FAQ in the manual, measurements depend on where you take them, and they are all equally good. Please read the Intro and the FAQ before speculating further about how and where pitch is measured.

The problem is in what you define as a "range". Should any local hiccup matter? For measuring variability, it's much better to take the 10% and 90% quantiles, for instance, than the minimum and maximum. You will notice that those quantiles are much more robust against shifting the precise measurement times.
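
To illustrate the robustness point, here is a minimal Python sketch (not Praat code). The contour values are invented for the example, and `range_measures` is a hypothetical helper; it only compares how a single outlier frame affects the maximum versus the 90% quantile.

```python
import statistics

def range_measures(f0_values):
    """Return min, max, and the 10% and 90% quantiles of voiced F0 values (Hz)."""
    voiced = sorted(f0_values)
    # n=10 yields the nine decile cut points; the first and the last of
    # them are the 10% and 90% quantiles.
    deciles = statistics.quantiles(voiced, n=10, method="inclusive")
    return {"min": voiced[0], "max": voiced[-1],
            "q10": deciles[0], "q90": deciles[8]}

# Invented contour: 99 frames between 180 and 220 Hz.
clean = [180 + (i % 41) for i in range(99)]
# The same contour with a single octave-jump error appended.
with_hiccup = clean + [440]

before = range_measures(clean)
after = range_measures(with_hiccup)
print("max shift:", after["max"] - before["max"], "Hz")
print("q90 shift:", after["q90"] - before["q90"], "Hz")
```

With these invented numbers, the one bad frame moves the maximum by 220 Hz, while the 90% quantile moves by less than 1 Hz.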

And I advise against deriving large numbers of measurements from the Sound window. That's poorly reproducible, not because of the zooming (which shouldn't matter much) but because you have little reproducibility of the measurement settings. Try automating this with the help of a TextGrid; see the scripting examples.

> As the calculation of the pitch depends on the graphical display, merely zooming into a segment of speech makes the calculated mean/max/min for that segment differ greatly (e.g. a 20 Hz difference for the mean, a 100 Hz difference for the max).

Yes, zooming in to a small segment causes all the invisible stuff to the left and right of the visible window to be ignored. With the TextGrid scripting example, one global pitch curve is computed, and means (and quantiles) can be computed locally from that curve.
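
The idea of "one global curve, local statistics" can be sketched in Python as follows. This is not the Praat scripting example itself: the `frames` list stands in for a pitch curve computed once over the whole recording, the `intervals` list stands in for labelled TextGrid intervals, and all names and values are assumptions made up for illustration.

```python
import statistics

def local_stats(frames, intervals):
    """Compute mean and 10%/90% quantiles per interval from one pitch curve.

    frames:    list of (time_s, f0_hz), with f0_hz None for unvoiced frames
    intervals: list of (label, start_s, end_s)
    """
    results = {}
    for label, start, end in intervals:
        # Select voiced frames inside the interval; the curve itself is
        # never recomputed, so the interval edges cannot change the F0 values.
        values = [f0 for t, f0 in frames
                  if start <= t < end and f0 is not None]
        if len(values) < 2:
            results[label] = None  # too few voiced frames to summarize
            continue
        deciles = statistics.quantiles(values, n=10, method="inclusive")
        results[label] = {"mean": statistics.fmean(values),
                          "q10": deciles[0], "q90": deciles[8]}
    return results

# Invented curve: 10 ms frames, 200 Hz, then an unvoiced gap, then 150 Hz.
frames = [(i / 100, 200.0 if i < 50 else None if i < 60 else 150.0)
          for i in range(100)]
# Invented intervals, playing the role of TextGrid labels.
intervals = [("a", 0.0, 0.5), ("b", 0.6, 1.0)]

stats = local_stats(frames, intervals)
for label, s in stats.items():
    print(label, s)
```

Because every interval reads from the same precomputed curve, the result for one interval does not depend on what is visible in any window.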
_____

Paul Boersma
Professor of Phonetic Sciences, University of Amsterdam

Visiting address: Spuistraat 134, room 632, Amsterdam
Mail: P.O. Box 1642, 1000BP Amsterdam, The Netherlands
Website: http://www.fon.hum.uva.nl/paul/