
[espeak-ng:master] reported: Misrendering of the word "wherever"


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By RhyanJohnson:

I'm having a bizarre issue with the rendering of the word "wherever" as well!

I discovered the issue when I tried decoding some espeak-ng results and found a couple of unexpected, non-English characters in them (namely ɕ in the transcription 'ɛkskləɕeɪʃən' and ʁ in the transcription 'æɹəʁɪklɛɾɚsɪksɛfɛf'). On further investigation, the unexpected characters always seemed to follow the word 'wherever', so I collected the lines of my dataset that contain 'wherever' into a sample text file.

INPUT (sample)

I lay eggs wherever I happen to be, said the hen, ruffling her feathers and then shaking them into place.
and Trot fed it a handful of fresh blue clover and smoothed and petted it until the lamb was eager to follow her wherever she might go.
The habits of mind that characterize a person strongly disposed toward critical thinking include a desire to follow reason and evidence wherever they may lead,
But wherever they fought - in North Africa or the South Pacific or Western Europe -- the infantry bore the brunt of the fighting on the ground -- and seven out of ten suffered casualties.
scurrying for major stories whenever and wherever they could be found.

When running espeak-ng on this file, I found a variety of unexpected phonemes after the word 'wherever'. They consistently appeared immediately after 'wherever', and in the transcription of every line of text containing 'wherever'.

COMMAND:

espeak-ng -ven-us --ipa=3 -q -f sample --sep=_
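Because the output of this command varied between runs, it may help to measure how often each variant occurs. The following is a minimal POSIX shell sketch, not part of espeak-ng: tally_outputs is a hypothetical helper that runs a command a given number of times and counts how many runs produced each distinct output (each output is identified by its cksum checksum).

```shell
#!/bin/sh
# Hypothetical helper (not part of espeak-ng): run a command N times and
# print one line per distinct output, prefixed by how many runs produced it.
# Each run's output is reduced to a POSIX cksum line ("checksum size").
tally_outputs() {
    runs=$1
    shift
    i=0
    while [ "$i" -lt "$runs" ]; do
        "$@" | cksum
        i=$((i + 1))
    done | sort | uniq -c | sort -rn
}

# Usage with the command from this report (assumes espeak-ng is on PATH
# and the sample file is in the current directory):
#   tally_outputs 100 espeak-ng -ven-us --ipa=3 -q -f sample --sep=_
```

A single output line means every run agreed; several lines, with their counts, would show how often each corrupted variant appeared.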

EXPECTED OUTPUT: (Received ~20% of the time)

a_ɪ lˈe_ɪ ˈɛɡz wɛɹˈɛvɚɹ a_ɪ hˈæpən tə bˈiː
 sˈɛd ðə hˈɛn
 ɹˈʌflɪŋ hɜː fˈɛðɚz ænd ðˈɛn ʃˈe_ɪkɪŋ ðˌɛm ˌɪntʊ plˈe_ɪs
 ænd tɹˈɑːt fˈɛd ɪɾ ɐ hˈændfə_l ʌv fɹˈɛʃ blˈuː klˈo_ʊvɚ ænd smˈuːðd ænd pˈɛɾᵻd ɪɾ ʌntˈɪl ðə lˈæm wʌz ˈiːɡɚ tə fˈɑːlo_ʊ hɜː wɛɹˈɛvɚ ʃiː mˌa_ɪt ɡˈo_ʊ
 ðə hˈæbɪts ʌv mˈa_ɪnd ðæt kˈæɹɪktɚɹˌa_ɪz ɐ pˈɜːsən stɹˈɔŋli dɪspˈo_ʊzd təwˈɔː_ɹd kɹˈɪɾɪkə_l θˈɪŋkɪŋ ɪŋklˈuːd ɐ dɪzˈa_ɪ_ɚ tə fˈɑːlo_ʊ ɹˈiːzən ænd ˈɛvɪdəns wɛɹˈɛvɚ ðe_ɪ mˈe_ɪ lˈiːd
 bˌʌt wɛɹˈɛvɚ ðe_ɪ fˈɔːt ɪn nˈɔː_ɹθ ˈæfɹɪkə ɔː_ɹ ðə sˈa_ʊθ pɐsˈɪfɪk ɔː_ɹ wˈɛstɚn jˈʊ_ɹɹəp ðɪ ˈɪnfəntɹi bˈoː_ɹ ðə bɹˈʌnt ʌvðə fˈa_ɪɾɪŋ ɔnðə ɡɹˈa_ʊnd ænd sˈɛvən ˌa_ʊɾəv tˈɛn sˈʌfɚd kˈæʒuːə_lɾiz
 skˈɜːɹiɪŋ fɔː_ɹ mˈe_ɪd_ʒɚ stˈoːɹiz wɛnˌɛvɚɹ ænd wɛɹˈɛvɚ ðe_ɪ kʊd biː fˈa_ʊnd

I have attached a file containing many samples of the actual output, as the corrupted phonemes took many different forms. A few examples are included below.

ACTUAL OUTPUT:

a_ɪ lˈe_ɪ ˈɛɡz wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf a_ɪ hˈæpən tə bˈiː
 sˈɛd ðə hˈɛn
 ɹˈʌflɪŋ hɜː fˈɛðɚz ænd ðˈɛn ʃˈe_ɪkɪŋ ðˌɛm ˌɪntʊ plˈe_ɪs
 ænd tɹˈɑːt fˈɛd ɪɾ ɐ hˈændfə_l ʌv fɹˈɛʃ blˈuː klˈo_ʊvɚ ænd smˈuːðd ænd pˈɛɾᵻd ɪɾ ʌntˈɪl ðə lˈæm wʌz ˈiːɡɚ tə fˈɑːlo_ʊ hɜː wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf ʃiː mˌa_ɪt ɡˈo_ʊ
 ðə hˈæbɪts ʌv mˈa_ɪnd ðæt kˈæɹɪktɚɹˌa_ɪz ɐ pˈɜːsən stɹˈɔŋli dɪspˈo_ʊzd təwˈɔː_ɹd kɹˈɪɾɪkə_l θˈɪŋkɪŋ ɪŋklˈuːd ɐ dɪzˈa_ɪ_ɚ tə fˈɑːlo_ʊ ɹˈiːzən ænd ˈɛvɪdəns wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf ðe_ɪ mˈe_ɪ lˈiːd
 bˌʌt wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf ðe_ɪ fˈɔːt ɪn nˈɔː_ɹθ ˈæfɹɪkə ɔː_ɹ ðə sˈa_ʊθ pɐsˈɪfɪk ɔː_ɹ wˈɛstɚn jˈʊ_ɹɹəp ðɪ ˈɪnfəntɹi bˈoː_ɹ ðə bɹˈʌnt ʌvðə fˈa_ɪɾɪŋ ɔnðə ɡɹˈa_ʊnd ænd sˈɛvən ˌa_ʊɾəv tˈɛn sˈʌfɚd kˈæʒuːə_lɾiz
 skˈɜːɹiɪŋ fɔː_ɹ mˈe_ɪd_ʒɚ stˈoːɹiz wɛnˌɛvɚɹ ænd wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf ðe_ɪ kʊd biː fˈa_ʊnd
 a_ɪ lˈe_ɪ ˈɛɡz wɛɹˈɛvɚɹ ˈæɹəʁɪklˌɛɾɚsˈɪkssˈɛvənˈɛf tˈuː slˈæʃ a_ɪ hˈæpən tə bˈiː
 sˈɛd ðə hˈɛn
 ɹˈʌflɪŋ hɜː fˈɛðɚz ænd ðˈɛn ʃˈe_ɪkɪŋ ðˌɛm ˌɪntʊ plˈe_ɪs
 ænd tɹˈɑːt fˈɛd ɪɾ ɐ hˈændfə_l ʌv fɹˈɛʃ blˈuː klˈo_ʊvɚ ænd smˈuːðd ænd pˈɛɾᵻd ɪɾ ʌntˈɪl ðə lˈæm wʌz ˈiːɡɚ tə fˈɑːlo_ʊ hɜː wɛɹˈɛvɚɹ ˈæɹəʁɪklˌɛɾɚsˈɪkssˈɛvənˈɛf tˈuː slˈæʃ ʃiː mˌa_ɪt ɡˈo_ʊ
 ðə hˈæbɪts ʌv mˈa_ɪnd ðæt kˈæɹɪktɚɹˌa_ɪz ɐ pˈɜːsən stɹˈɔŋli dɪspˈo_ʊzd təwˈɔː_ɹd kɹˈɪɾɪkə_l θˈɪŋkɪŋ ɪŋklˈuːd ɐ dɪzˈa_ɪ_ɚ tə fˈɑːlo_ʊ ɹˈiːzən ænd ˈɛvɪdəns wɛɹˈɛvɚɹ ˈæɹəʁɪklˌɛɾɚsˈɪkssˈɛvənˈɛf tˈuː slˈæʃ ðe_ɪ mˈe_ɪ lˈiːd
 bˌʌt wɛɹˈɛvɚɹ ˈæɹəʁɪklˌɛɾɚsˈɪkssˈɛvənˈɛf tˈuː slˈæʃ ðe_ɪ fˈɔːt ɪn nˈɔː_ɹθ ˈæfɹɪkə ɔː_ɹ ðə sˈa_ʊθ pɐsˈɪfɪk ɔː_ɹ wˈɛstɚn jˈʊ_ɹɹəp ðɪ ˈɪnfəntɹi bˈoː_ɹ ðə bɹˈʌnt ʌvðə fˈa_ɪɾɪŋ ɔnðə ɡɹˈa_ʊnd ænd sˈɛvən ˌa_ʊɾəv tˈɛn sˈʌfɚd kˈæʒuːə_lɾiz
 a_ɪ lˈe_ɪ ˈɛɡz wɛɹˈɛvɚ ˈæt kˈo_ʊləŋkˌa_ɪ kˈo_ʊləŋki hˌæʃslɐʃwˈʌnsˈɛvəntˈuːwˈʌnwˈʌnsˈɛvəntˈuːwˈʌntɪldə twˈɛnti wˈʌn twˈɛnti wˈʌn tˈuː tˈuː slˌæʃko_ʊləntˈuːslɐʃtˌɪldəsˈɛvənslˌæʃko_ʊləntˈuːlpɚsˈɛntko_ʊl a_ɪ hˈæpən tə bˈiː
 sˈɛd ðə hˈɛn
 ɹˈʌflɪŋ hɜː fˈɛðɚz ænd ðˈɛn ʃˈe_ɪkɪŋ ðˌɛm ˌɪntʊ plˈe_ɪs
 ænd tɹˈɑːt fˈɛd ɪɾ ɐ hˈændfə_l ʌv fɹˈɛʃ blˈuː klˈo_ʊvɚ ænd smˈuːðd ænd pˈɛɾᵻd ɪɾ ʌntˈɪl ðə lˈæm wʌz ˈiːɡɚ tə fˈɑːlo_ʊ hɜː wɛɹˈɛvɚ ˈæt kˈo_ʊləŋkˌa_ɪ kˈo_ʊləŋki hˌæʃslɐʃwˈʌnsˈɛvəntˈuːwˈʌnwˈʌnsˈɛvəntˈuːwˈʌntɪldə twˈɛnti wˈʌn twˈɛnti wˈʌn tˈuː tˈuː slˌæʃko_ʊləntˈuːslɐʃtˌɪldəsˈɛvənslˌæʃko_ʊləntˈuːlpɚsˈɛntko_ʊl ʃiː mˌa_ɪt ɡˈo_ʊ
 ðə hˈæbɪts ʌv mˈa_ɪnd ðæt kˈæɹɪktɚɹˌa_ɪz ɐ pˈɜːsən stɹˈɔŋli dɪspˈo_ʊzd təwˈɔː_ɹd kɹˈɪɾɪkə_l θˈɪŋkɪŋ ɪŋklˈuːd ɐ dɪzˈa_ɪ_ɚ tə fˈɑːlo_ʊ ɹˈiːzən ænd ˈɛvɪdəns wɛɹˈɛvɚ ˈæt kˈo_ʊləŋkˌa_ɪ kˈo_ʊləŋki hˌæʃslɐʃwˈʌnsˈɛvəntˈuːwˈʌnwˈʌnsˈɛvəntˈuːwˈʌntɪldə twˈɛnti wˈʌn twˈɛnti wˈʌn tˈuː tˈuː slˌæʃko_ʊləntˈuːslɐʃtˌɪldəsˈɛvənslˌæʃko_ʊləntˈuːlpɚsˈɛntko_ʊl ðe_ɪ mˈe_ɪ lˈiːd
 bˌʌt wɛɹˈɛvɚ ˈæt kˈo_ʊləŋkˌa_ɪ kˈo_ʊləŋki hˌæʃslɐʃwˈʌnsˈɛvəntˈuːwˈʌnwˈʌnsˈɛvəntˈuːwˈʌntɪldə twˈɛnti wˈʌn twˈɛnti wˈʌn tˈuː tˈuː slˌæʃko_ʊləntˈuːslɐʃtˌɪldəsˈɛvənslˌæʃko_ʊləntˈuːlpɚsˈɛntko_ʊl ðe_ɪ fˈɔːt ɪn nˈɔː_ɹθ ˈæfɹɪkə ɔː_ɹ ðə sˈa_ʊθ pɐsˈɪfɪk ɔː_ɹ wˈɛstɚn jˈʊ_ɹɹəp ðɪ ˈɪnfəntɹi bˈoː_ɹ ðə bɹˈʌnt ʌvðə fˈa_ɪɾɪŋ ɔnðə ɡɹˈa_ʊnd ænd sˈɛvən ˌa_ʊɾəv tˈɛn sˈʌfɚd kˈæʒuːə_lɾiz
 skˈɜːɹiɪŋ fɔː_ɹ mˈe_ɪd_ʒɚ stˈoːɹiz wɛnˌɛvɚɹ ænd wɛɹˈɛvɚ ˈæt kˈo_ʊləŋkˌa_ɪ kˈo_ʊləŋki hˌæʃslɐʃwˈʌnsˈɛvəntˈuːwˈʌnwˈʌnsˈɛvəntˈuːwˈʌntɪldə twˈɛnti wˈʌn twˈɛnti wˈʌn tˈuː tˈuː slˌæʃko_ʊləntˈuːslɐʃtˌɪldəsˈɛvənslˌæʃko_ʊləntˈuːlpɚsˈɛntko_ʊl ðe_ɪ kʊd biː fˈa_ʊnd

Thank you!

espeak_wherever_samples.txt


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By valdisvi:

For the word "wherever", there are strange rules in the en_list file:

whereever   $2
wherever    $text whereever

My suggestion is to replace them with a single plain entry (with no rule at all, the word is pronounced incorrectly):

wherever    we@r'Ev3


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By rhdunn:

That change was made in 1.48.08 (commit 992b4cf06884e84ad3b3ee29779aed6e8dbb773b), which I identified using git blame. In the diff for that commit, the relevant part is the change from:

wherever     w%e@r-Ev3
whereever    w%e@r-Ev3

to:

whereever  $2
wherever  $text whereever

Note that, according to Wiktionary, "whereever" is an obsolete variant of "wherever".

These rules look correct to me: stress "where|ever" on the second syllable ($2), and use "where|ever" when pronouncing "wherever" ($text), so that the spelling rules can be used. Those rules may include some regional differences, although I haven't checked whether that is the case.


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By rhdunn:

The key issue here is why "wherever I":

wɛɹˈɛvɚɹ a_ɪ

is sometimes transformed into one of the following:

wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf a_ɪ
wɛɹˈɛvɚɹ ˈæɹəʁɪklˌɛɾɚsˈɪkssˈɛvənˈɛf tˈuː slˈæʃ a_ɪ
wɛɹˈɛvɚ ˈæt kˈo_ʊləŋkˌa_ɪ

This is most likely due to the use of the $text attribute, and could therefore affect other words. I also wonder whether it is related to #690, which occurred when a rule was missing.

The first step would be to check whether this occurs with espeak 1.48.08 and, if not, run a bisect to identify which commit introduced the bug. Alternatively, we need to understand the behaviour: what is espeak-ng doing when it inserts those pronunciations?
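If 1.48.08 turns out not to show the problem, the bisect could be automated with git bisect run, which treats a script's exit status 0 as good, 125 as "skip this commit", and other codes from 1 to 127 as bad. Below is a minimal POSIX shell sketch that assumes a reference output file captured from a known-good build; matches_reference is a hypothetical helper, and the build commands and file paths in the comments are assumptions, not taken from the espeak-ng repository.

```shell
#!/bin/sh
# Hypothetical helper for a `git bisect run` script: run a command N times
# and compare each run's output byte-for-byte with a reference file.
# Returns 0 if every run matched, 1 as soon as one run differs.
matches_reference() {
    ref=$1
    runs=$2
    shift 2
    i=0
    while [ "$i" -lt "$runs" ]; do
        # cmp -s reads the command's output from stdin ("-") silently;
        # the pipeline's exit status is that of its last command (cmp).
        "$@" | cmp -s - "$ref" || return 1
        i=$((i + 1))
    done
    return 0
}

# Assumed body of the bisect script (build steps and paths are guesses):
#   ./autogen.sh && ./configure && make || exit 125   # skip unbuildable commits
#   matches_reference ../wherever-output.txt 50 \
#       src/espeak-ng -ven-us -xq -f ../wherever-input.txt || exit 1
#   exit 0
```

Because the corruption is intermittent and may depend on the machine or compiler, such a script would need to run on a system that actually shows the problem, and a higher run count lowers the chance of marking a bad commit as good.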

Ideally, a short reproducible test case should be added to the tests to ensure that the bug is not reintroduced.
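A check of that kind might look like the following POSIX shell sketch. assert_contains is a hypothetical helper, not part of the espeak-ng test suite; it runs a command once and requires that the output contains a given fixed string. The expected substring is copied from the correct output quoted in this issue and has not been re-verified against a current build.

```shell
#!/bin/sh
# Hypothetical regression check: run a command once and require that its
# output contains an expected fixed string. Returns 0 on success; on failure
# prints the expected and actual text to stderr and returns 1.
assert_contains() {
    expected=$1
    shift
    actual=$("$@")
    case $actual in
        *"$expected"*) return 0 ;;   # quoted, so matched literally
    esac
    printf 'expected to contain: %s\nactual: %s\n' "$expected" "$actual" >&2
    return 1
}

# Assumed usage (expected text taken from the correct output in this issue):
#   assert_contains 'wɛɹˈɛvɚɹ a_ɪ hˈæpən' \
#       espeak-ng -ven-us --ipa=3 -q --sep=_ 'I lay eggs wherever I happen to be'
```

Checking that "wherever" is directly followed by the next word catches all three corrupted forms reported here, since each inserts extra phonemes between them. A single run may still pass by chance, so a real test would probably repeat it.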


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By rhdunn:

I've run this 1000 times on my Debian Linux system and not seen any differences:

seq 1 1000 | while read N ; do src/espeak-ng -ven-us -xq -f ../wherever-input.txt > ../wherever-output-2.txt ; diff ../wherever-output.txt ../wherever-output-2.txt ; done

This may be a compiler-dependent bug (e.g. when building with MSVC on Windows), or a bug in that version that has since been fixed.


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By RhyanJohnson:

> The key issue here is why "wherever I":

To clarify my test input: it was 5 lines, each containing the word 'wherever':

 1. I lay eggs wherever I happen to be, said the hen, ruffling her feathers and then shaking them into place.

 2. and Trot fed it a handful of fresh blue clover and smoothed and petted it until the lamb was eager to follow her wherever she might go.

 3. The habits of mind that characterize a person strongly disposed toward critical thinking include a desire to follow reason and evidence wherever they may lead,

 4. But wherever they fought - in North Africa or the South Pacific or Western Europe -- the infantry bore the brunt of the fighting on the ground -- and seven out of ten suffered casualties.

 5. scurrying for major stories whenever and wherever they could be found.

Also, the unexpected transformation was consistent for an entire run: if the command espeak-ng -ven-us --ipa=3 -q -f sample --sep=_ transformed 'wherever' into wɛɹˈɛvɚ sᵻɹˈɪlɪklˌɛɾɚfˈoː_ɹbˈiːˈɛf, it did so for every occurrence of 'wherever' on every line of the sample input.


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By rhdunn:

Thanks for your help trying to track down the issue.

Can you try the test with -X instead of --sep=_? That should provide more information about what is happening. Output from just one of the failing runs should be enough.


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By RhyanJohnson:

Output from running espeak-ng -ven-us --ipa=3 -q -f sample -X > sample_output.txt:

sample_output_2.txt sample_output_6.txt sample_output_7.txt sample_output.txt


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By rhdunn:

Thanks. The relevant bit is:

Replace: wherever  whereever 
Flags: whereever $2
Unpronouncable? 'whereever'
 46	_) wh (Y [w]
Translate 'whereever'
 1	w    [w]
 22	wh    [w]
 46	_) wh (Y [w]
 85	where  [we@]

 99	ever (_ [Ev3]
 1	e    [E]

Translate '='
Found: '=' [_:i:kw@Lz_:]  $max3

Translate 'ˆ1:vY	'
Found: '_1' [w'02n]
 22	:    [koUl@n]

 1	v    [v]

Translate ''

So espeak-ng is processing additional content after it has translated "whereever". This doesn't look like #690, as it is not a failure to translate a given string.
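To compare -X traces from several failing runs, it may help to list just the strings espeak-ng reports translating. A minimal sketch, assuming the trace format matches the "Translate '...'" lines quoted above; translated_strings is a hypothetical helper, not an espeak-ng feature.

```shell
#!/bin/sh
# Hypothetical helper: print, in order, the strings that an espeak-ng -X
# trace file reports translating, based on its "Translate '...'" lines.
translated_strings() {
    # sed -n prints only the lines where the substitution matched.
    sed -n "s/^Translate '\(.*\)'\$/\1/p" "$1"
}

# Usage with one of the attached traces:
#   translated_strings sample_output.txt
```

Applied to the excerpt above saved as a file, this would print whereever, =, the garbled ˆ1:vY string, and an empty line, making the extra strings processed after "whereever" easy to spot.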


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By RhyanJohnson:

Right. So it is somehow seeing miscellaneous characters such as =^? after 'wherever' and then attempting to translate them, which gives wɛɹˈɛvɚɹ ˈikwəlz or something similar? Is that right?


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By rhdunn:

That's correct. It is related to the $text attribute, which instructs espeak to translate a different string instead: "whereever", with two 'e's, so that "where" and "ever" are matched in the pronunciation rules. Somehow, the translation is not stopping at the end of that word but continuing for a bit longer.


[espeak-ng:master] New Comment on Issue #437 Misrendering of the word "wherever"
By RhyanJohnson:

Just confirming on my end:

Running your command with espeak 1.48.03:

seq 1 1000 | while read N ; do src/espeak -ven-us -xq -f ../wherever-input.txt > ../wherever-output-2.txt ; diff ../wherever-output.txt ../wherever-output-2.txt ; done

resulted in consistent, clean output on the same machine that showed the issue with espeak-ng 1.49.2.

Thanks!