[espeak-ng:master] reported: Capital letter indication missing nouns after prepositions in some languages #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #949 Capital letter indication missing nouns after prepositions in some languages
By jaacoppi:

I can look into this as well, but expect to wait a few weeks rather than a few days.

Notes: Looks like the $combine keyword in dictsource/sk_rules is the cause of this problem. There's also FLAG_COMBINE and LOPT_COMBINE_WORDS.

The code block starting around line 1436 of translate.c has calls to TranslateWord() that probably cause changes in word flags.


[espeak-ng:master] reported: Capital letter indication missing nouns after prepositions in some languages #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #949 Capital letter indication missing nouns after prepositions in some languages
By pvagner:

Looking at translate.c, there is a check of option_capitals on line 1573. However, it only takes effect when the first letter of a word is a capital. The code block that starts at line 1436 in the same file combines prepositions with the main word for some languages, e.g. Czech, Slovak, Hungarian and perhaps some others, and doing so affects that later check. It seems there is some other place where capital letter indication in the middle of a word is handled, but I can't find it. It's very unlikely I can fix this on my own.


[espeak-ng:master] reported: Capital letter indication missing nouns after prepositions in some languages #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #949 Capital letter indication missing nouns after prepositions in some languages
By jaacoppi:

Looks like a bug that needs changes in the source code. Do you want to try to fix the code by yourself?

The command line parameter -k is handled by int option_capitals. The relevant code is in either readclause.c or translate.c


[espeak-ng:master] new issue: Capital letter indication missing nouns after prepositions in some languages #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by pvagner:
#949 Capital letter indication missing nouns after prepositions in some languages

So far I am able to reproduce this with the Slovak or Czech voices. When eSpeak is asked to indicate capital letters with a sound, the sound is missing after prepositions such as 'na', 'od', 'pred', 'po', 'za' and similar. So when any of these prepositions stands before a person's name, the first capital letter of that name is not indicated with the sound. When capital letters are indicated with the word "capital", it works as expected. See these examples:

espeak -v cs -k 1 "od Libora"
espeak -v cs -k 1 "bez Libora"
espeak -v cs -k 1 "na Libora"

Change -k 1 to -k 2 to see that indication with the word "capital" works as it should. I suspect this might be because of some rules, but I can't figure it out on my own, so I'd appreciate any hints you can offer.


[espeak-ng:master] reported: espeak crashes with sum strings #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #824 espeak crashes with sum strings
By jbowler:

The OP reported five strings as causing a problem. Presumably this was on Windows because the attached file (strings.txt) was in DOS format. At the moment I can reproduce the problem with three of the strings, numbers 1, 3 and 5. The repro does not depend on OS or compiler (as discussed elsewhere).


[espeak-ng:master] reported: Temporary fix for issue #945 #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Pull Request #948 Temporary fix for issue #945
By jbowler:

Thinking about it, because of its nature (I'm not even going to use the two word phrase, I'm sure crackers continuously scan github for it), your downstreams (your customers) are going to prefer a patch that can be applied in isolation to existing releases.

The circular buffer solution is good for this reason but even that may be too extensive. You could try replacing the abort/assert with a return but 'return' is not valid in an expression in C, neither is "goto". Hacking every single use of n_ph_list2 to be followed by a check seems the minimal safe alternative (ph_list2 has to have an extra entry for this to be safe):

if (n_ph_list2 > N_PHONEME_LIST) return FLAG_OVERFLOW;

I guess there are only three calls to the function, so checking for FLAG_OVERFLOW might offer a potential recovery. It looks like TranslateClause updates the actual parsing state so that repeated calls to TranslateWord2 just reparse the same input, but I can't tell for sure; there might be a good recovery there.
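A minimal sketch of the "check every append and return a flag" idea, not the actual espeak-ng code. All names here (DEMO_LIST_MAX, demo_phoneme, demo_list, DEMO_FLAG_OVERFLOW and both functions) are hypothetical stand-ins for ph_list2, n_ph_list2, N_PHONEME_LIST and FLAG_OVERFLOW; the real structures carry more fields.

```c
#include <stddef.h>

/* Hypothetical names: none of these are espeak-ng identifiers. */
#define DEMO_LIST_MAX 8
#define DEMO_FLAG_OVERFLOW 0x1

typedef struct {
    unsigned char phcode;
} demo_phoneme;

typedef struct {
    demo_phoneme items[DEMO_LIST_MAX];
    size_t count;
} demo_list;

/* Append one phoneme. Returns 0 on success, or DEMO_FLAG_OVERFLOW when the
   list is already full; the list is never written past its last slot. */
static int demo_list_add(demo_list *list, unsigned char phcode)
{
    if (list->count >= DEMO_LIST_MAX)
        return DEMO_FLAG_OVERFLOW;
    list->items[list->count++].phcode = phcode;
    return 0;
}

/* Append a run of phonemes, stopping at the first overflow so the caller
   can decide how to recover. */
static int demo_list_add_all(demo_list *list, const unsigned char *codes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int flags = demo_list_add(list, codes[i]);
        if (flags != 0)
            return flags;
    }
    return 0;
}
```

The point of returning a flag instead of aborting is that the caller keeps control; whether the callers of TranslateWord2 can actually recover is exactly the open question in the comment above.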


[espeak-ng:master] reported: Temporary fix for issue #945 #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Pull Request #948 Temporary fix for issue #945
By jbowler:

You don't have a choice; the fix is definitely temporary; it doesn't recover from the overwrite and, indeed, without adding I think 2 entries to the end of ph_list2 it does not actually prevent it. Once espeak-ng is in this state it can be exploited, a cracker can potentially take over the user's machine. The only recourse is to immediately exit the program. I.e. the problem is not that espeak-ng fails, it is that espeak-ng, responding to externally supplied data, allows the user's machine to be infected by malware.

#include <assert.h>, then use assert(!"ph buffer overflow"), 0 (or , NULL to avoid the gcc warnings...)

The problem with the assert is that it calls, as a minimum, write(2) and, in fact, I think it may call fprintf(3); I think it is safe to do that in this case with modern GCC because the strings, while global, are read-only (so cannot be maliciously overwritten).


[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

The fundamental problem is that the code in TranslateWord2 is not reliably checking for end-of-output-buffer. Unless you do that the only suggestion above that works is (4). Even then there has to be code that, somehow, checks for the need for a new allocation - even with (4) each time a phoneme is added to the end of the list some code has to check for a new allocation. So I'm agreeing with your last paragraph but I think (3) offers some possibilities that maybe avoid this.

Suggestion (3) is easy and safe; it doesn't modify any code and it reliably detects overflow but it has no recovery. In the test case and in my equation.txt a circular buffer would get overwritten in the absence of some form of checking. Nevertheless it's an ingenious suggestion that has the merit that it self-evidently avoids the overflow (so it removes the exploit).

If (2) can be done and you combine it with (3) that might work. I think something like the test case would still fail - in the test case the "words" are ridiculously long because of the paucity of space (ASCII 0x20) characters, which are the word separators. That said, (2) seems attractive because, why wait to produce output? The current code does seem to search backward from end-of-clause to handle stress, but is that really necessary? How much phoneme look-ahead is really required before any given phoneme is output? Arabic might be a problem; IIRC it is necessary to see the whole word to know how to select the glyphs, and I don't know if that applies to phonemes too.

I don't understand (1); that's an ASCII \055 hyphen yet "-1" is a number and there is no pause after the "-" in English, whereas "2-1" is three words and there are two pauses. "+1" is also a number, "2-+1" and so on are valid. The rules in other languages may be completely different and actually speaking equations, even simple linear arithmetic, is effectively another language.

I suspect you are fine with a simplistic number parser that just recognizes decimal fractions. If someone wants to work out how to pronounce an engineering or scientific format number, e.g. "1.23E6", or something more mathematical using base 10 superscripts, "1.23×10⁶", that strikes me as a much bigger and separate thing. My equation.txt example shows that mathematical/computer usages such as the word "iff" (pronounced "if and only if" ;-)) and ">number" or "<number" are not currently handled.

If espeak-ng was using a simple circular buffer of size 1024 (i.e. slightly larger than at present) it wouldn't suffer the overwrite and it wouldn't fail in any case where it doesn't at present. I think I could relatively easily submit a patch for this without adding a complete class-based implementation of PHONEME_LIST2. (2) requires changes I don't understand but sounds simple, potentially it would allow ph_list2 to be reduced to, say, 128 phonemes; antidisestablishmentarianism only has 12 phonemes ;-)
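To make the circular-buffer proposal concrete, here is a rough sketch under stated assumptions: ring_phoneme, ring_list, ring_push and ring_get are invented names, the entry type is reduced to one field, and only the size 1024 comes from the comment above. It shows why the index can never leave the array, and also why old entries are silently lost unless something checks for that.

```c
#include <stddef.h>

#define RING_SIZE 1024 /* the size suggested in the comment above */

/* Hypothetical, simplified entry; the real PHONEME_LIST2 has more fields. */
typedef struct {
    unsigned char phcode;
} ring_phoneme;

typedef struct {
    ring_phoneme items[RING_SIZE];
    size_t head;  /* index of the oldest stored entry */
    size_t count; /* number of entries currently stored */
} ring_list;

/* Store one phoneme. Every index is reduced modulo RING_SIZE, so the write
   always lands inside items[]. When the ring is full the oldest entry is
   overwritten; the function returns 1 in that case and 0 otherwise. */
static int ring_push(ring_list *r, unsigned char phcode)
{
    size_t tail = (r->head + r->count) % RING_SIZE;
    r->items[tail].phcode = phcode;
    if (r->count == RING_SIZE) {
        r->head = (r->head + 1) % RING_SIZE;
        return 1;
    }
    r->count++;
    return 0;
}

/* Read the i-th oldest entry; the caller must ensure i < r->count. */
static unsigned char ring_get(const ring_list *r, size_t i)
{
    return r->items[(r->head + i) % RING_SIZE].phcode;
}
```

The return value of ring_push is the "some form of checking" hook: a caller that ignores it gets memory safety but still loses phonemes.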


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #912 GPLv3 license violation
By valdisvi:

This site of the eSpeak NG project is used for collaboration between eSpeak NG users and developers. It is meant neither for court nor police. If the mentioned software violates the license, you can report it at https://gpl-violations.org/ or https://www.gnu.org/licenses/gpl-violation.html, or a similar place. Also note that the most eager ones, who look for somebody doing something illegal, are competitors.


[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By rhdunn:

The word buffer size and phoneme list being of comparable size (800 for the source size and 1000 for the phoneme list size) is predicated on the assumption that the number of phonemes will never be greater than the number of characters. This only really works if the word doesn't contain a lot of numbers or emoji that significantly increase the phoneme count compared to the word/character count.

Some things we could do:
1. Change the word splitting/detection algorithm to handle this case better -- e.g. when a - is followed by a number.
2. Flush the phonemes up to the previous word (so it is still affected by the next word) when it reaches a threshold (e.g. within 100 of the last phonemes) -- that is, set the word separator ph value to null (end of phonemes); call Generate(); copy the remaining phonemes (including the space) to the start of the phoneme list.
3. Make phoneme_list a circular buffer -- this would allow us to avoid the copying to shift the phonemes to the start of the list -- it would be worth doing a performance check on the result to see if this doesn't adversely affect performance.
4. Make phoneme_list a dynamic array (malloc/calloc, using realloc/reallocarray to resize, where on failure (e.g. ENOMEM) the original buffer should be unchanged and an espeak errno error should be returned from the API call) -- this should be conditional to support memory-constrained devices like riscos, and enabled when compiling for one of those platforms.

I would start by creating an API around the phoneme list -- e.g. phoneme_list_add, phoneme_list_current, phoneme_list_get, phoneme_list_next, etc. (which should be added based on how the phoneme_list is used) -- That way, we can add guards and/or change it to a circular/dynamic buffer more easily.
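As a rough illustration of such an API combined with option 4 (a dynamic array), the sketch below uses the function names phoneme_list_add and phoneme_list_get from the comment above, but their signatures, the phoneme_entry and phoneme_list types, the growth policy and phoneme_list_free are all assumptions made for this example, not existing espeak-ng code.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified entry and container types. */
typedef struct {
    unsigned char phcode;
} phoneme_entry;

typedef struct {
    phoneme_entry *items;
    size_t count;
    size_t capacity;
} phoneme_list;

/* Append one entry, growing the storage when it is full. Returns 0 on
   success or ENOMEM; on failure the existing contents are left unchanged
   because realloc does not free the old block when it fails. */
static int phoneme_list_add(phoneme_list *list, unsigned char phcode)
{
    if (list->count == list->capacity) {
        size_t new_capacity = list->capacity ? list->capacity * 2 : 16;
        phoneme_entry *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return ENOMEM;
        list->items = grown;
        list->capacity = new_capacity;
    }
    list->items[list->count++].phcode = phcode;
    return 0;
}

/* Bounds-checked read: returns NULL when index is out of range. */
static const phoneme_entry *phoneme_list_get(const phoneme_list *list, size_t index)
{
    return index < list->count ? &list->items[index] : NULL;
}

/* Release the storage and reset the list to its empty state. */
static void phoneme_list_free(phoneme_list *list)
{
    free(list->items);
    list->items = NULL;
    list->count = 0;
    list->capacity = 0;
}
```

Because callers only go through these functions, the storage could later be swapped for a fixed array or a circular buffer on memory-constrained platforms without touching the call sites.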


[espeak-ng:master] reported: Temporary fix for issue #945 #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Pull Request #948 Temporary fix for issue #945
By rhdunn:

Aborting is the wrong thing to do here, as it would cause applications like NVDA, Orca, or screen-dispatcher to stop working unexpectedly. That would a) make it hard to know what is going on, b) annoy users of those applications, and c) possibly prevent those users from using the device. Thus, this is just as bad as the crash.


[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

I pushed a temporary fix. Please check the pull request.

Verified against 8.4.0; make all (-O0), make check. I've pulled to 11.1.0 and verified; fine there too.


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request opened by jbowler:

#948 Temporary fix for issue #945

Signed-off-by: John Bowler jbowler@...


[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

I pushed a temporary fix. Please check the pull request.

Verified against 8.4.0; make all (-O0), make check. I'll pull to 11.1.0 and verify but that seems slightly irrelevant.


Pull Request Closed #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request closed by jbowler:

#946 Workround for #945; check for 'ru sum strings'

The change made to 'tests/translate.test' to detect bug #824 also demonstrates what has been described to me as a compiler dependency in the output; on my system every occurrence of "a " in the phoneme output is actually output as "a# ".

This is reported in issue #945 (I could find no prior report); however, the issue does not seem likely to be fixed anytime soon. It has been around since the test was added in 30214437fc0bb0d067bb60cd550e192edcd2a626 on Dec. 20, 2020.

The test terminates 'make check' in the case in question and therefore obscures any following errors and makes it very difficult to debug unrelated changes to any part of espeak-ng or the build system.

This change adds support for tests/common:test_phon for a MESSAGE argument "ru sum strings" which was added to the test_phon call by 17e6bd0672421467554dcca95f04c2a63b70c510 on the 16th inst. (although that change doesn't seem to do anything).

If "ru sum strings" is passed every occurrence of "a# " in the output received from espeak-ng is changed to "a ". Since "a# " does not occur in the correct output this makes no difference to the original check.

Obviously I've only tested this on my system, I know the issue is known but since there did not seem to be a bug report until I entered #945 I don't know what output other systems generate (other than the correct output). The change is intended to allow "make check" to skip this known error and therefore detect other issues.

Signed-off-by: John Bowler jbowler@...


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

Add this line after the declaration of n_ph_list2 in translate.c to stop malware:

#define n_ph_list2 (*(n_ph_list2 > N_PHONEME_LIST ? abort(),0 : &n_ph_list2))

I think even a function call to a meaningful error message would be safe at that point too, in practice, but abort() or _exit() are certain to be safe.

With those two changes (though I think the first is irrelevant; the #define is what counts) "make check" passes with gcc 11.1.0 if I message the relevant test_phon ru line as "broken" and 'ignore' the ssml audio checksum mismatch. So this can be done safely without annoying other developers.


[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

The attached patch does not protect against malware (crackers) but it should detect the overwrite with moderate to good reliability. translate.c.patch.txt

Some manner of malware protection could be achieved using a random byte for ucheck, though not much because the damage would already be done...
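The attached patch is not reproduced in this thread, so the following is only a guess at the general shape of such a check, with invented names (GUARD_LIST_MAX, GUARD_BYTE, guarded_list and both functions). It reserves one extra slot after the usable entries, stores a known value there, and tests that value after the buffer has been written to.

```c
#include <string.h>

#define GUARD_LIST_MAX 4 /* hypothetical number of usable slots */
#define GUARD_BYTE 0xA5  /* hypothetical fixed check value */

typedef struct {
    /* One extra slot at the end acts as the guard entry. */
    unsigned char slots[GUARD_LIST_MAX + 1];
} guarded_list;

/* Clear the usable slots and plant the check value in the guard slot. */
static void guarded_init(guarded_list *g)
{
    memset(g->slots, 0, GUARD_LIST_MAX);
    g->slots[GUARD_LIST_MAX] = GUARD_BYTE;
}

/* Returns 1 while the guard slot still holds its check value, 0 once
   something has written one entry past the usable range. */
static int guarded_intact(const guarded_list *g)
{
    return g->slots[GUARD_LIST_MAX] == GUARD_BYTE;
}
```

A check like this only reports the overwrite after the fact, and an overwrite that happens to store the same value goes unnoticed, which is why a random check value would help a little but would not make the check a real defence.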


[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

It seems to be happening because of all the Latin "C" characters; they switch the phoneme table. There are checks in TranslateWord2 for overflow of ph_list2 in some places but not all. The places that do check look for space for four phonemes, but when the space runs out the code just drops through to more code that assumes there is space for four more phonemes. I suspect the "expected.txt" is actually heavily truncated as a result - there's no proper error recovery if the buffer space runs out, the code just keeps on skipping some phonemes and writing others.


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

Here's the debug approach with 8.4.0 compiled -O0, starting with a breakpoint on TranslateClause:

(gdb) run -xq -v ru -f /tmp/fragment.txt
Starting program: /home/jbowler/src/espeak-ng/install-8.4.0-debug/bin/espeak-ng -xq -v ru -f /tmp/fragment.txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff74bc640 (LWP 2248)]

Thread 1 "espeak-ng" hit Breakpoint 1, TranslateClause (tr=0x555555594810, 
    tone_out=0x7fffffffd2b4, voice_change=0x7fffffffd2b8)
    at src/libespeak-ng/translate.c:1985
1985    {
(gdb) watch ((char*)&count_words)[2]
Hardware watchpoint 4: ((char*)&count_words)[2]
(gdb) c
Continuing.

Thread 1 "espeak-ng" hit Hardware watchpoint 4: ((char*)&count_words)[2]

Old value = 0 '\000'
New value = 21 '\025'
SetPlist2 (p=0x7ffff7fbe820 <count_words>, phcode=21 '\025')
    at src/libespeak-ng/translate.c:1226
1226            p->stresslevel = 0;
(gdb) print /x count_words
$6 = 0x15006a
(gdb) next
1227            p->tone_ph = 0;
(gdb) print /x count_words
$7 = 0x15006a
(gdb) next
1228            p->synthflags = embedded_flag;
(gdb) print /x count_words
$8 = 0x15006a
(gdb) next
1229            p->sourceix = 0;
(gdb) print /x count_words
$9 = 0x150000
(gdb) next
1230            embedded_flag = 0;
(gdb) print /x count_words
$10 = 0x150000
(gdb) next
1231    }
(gdb) print /x count_words
$11 = 0x150000


[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By rhdunn:

Great! Thanks for doing the investigation. I agree that a# should be the correct phoneme in the output.


[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

The problem is a write-beyond-end of ph_list2. This contains 1000 elements and with both 8.4.0 and 11.1.0 the code seems to write beyond the end. The difference is that the list is followed by the global count_words in 8.4.0 whereas in 11.1.0 it is followed by option_punctlist. There does seem to be code to truncate the processing; the string in question is generating many more phonemes than there are characters/bytes in the input, but it doesn't seem to be quite correct. With 8.4.0 the overwrite extends into the 'replace_phonemes' global, with 11.1.0 it stops before that.

Here are copies of two sets of debug output obtained at a breakpoint on SubstitutePhonemes, first with 11.1.0 (which, I believe, is producing the correct output despite the overwrite):

Input file: [fragment.txt](https://github.com/espeak-ng/espeak-ng/files/6528761/fragment.txt)

(gdb) run -xq -v ru -f /tmp/fragment.txt
Starting program: /home/jbowler/src/espeak-ng/install/bin/espeak-ng -xq -v ru -f /tmp/fragment.txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff74bb640 (LWP 13887)]

Thread 1 "espeak-ng" hit Breakpoint 1, SubstitutePhonemes (
    plist_out=0x7fffffff1b00) at src/libespeak-ng/phonemelist.c:56
56              int n_plist_out = 0;
(gdb) print n_ph_list2
$33 = 998
(gdb) print ph_list2[1000]
$34 = {synthflags = 0, phcode = 21 '\025', stresslevel = 0 '\000', 
  sourceix = 0, wordstress = 0 '\000', tone_ph = 11 '\v'}
(gdb) print ph_list2+1000
$35 = (PHONEME_LIST2 *) 0x7ffff7fba3e0 <option_punctlist>
(gdb) print option_punctlist
$36 = L"\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\xb000000\x150000\x2b000000\x90000˞\x90000˞", '\000' <repeats 25 times>
(gdb) print count_words
$37 = 232

With 8.4.0, however, we can see that "option_punctlist" is all zero, as it should be (uninitialized/unwritten), but that count_words has been damaged:

(gdb) run -xq -v ru -f /tmp/fragment.txt
Starting program: /home/jbowler/src/espeak-ng/install-8.4.0-debug/bin/espeak-ng -xq -v ru -f /tmp/fragment.txt
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff74bc640 (LWP 13879)]

Thread 1 "espeak-ng" hit Breakpoint 1, SubstitutePhonemes (
    plist_out=0x7fffffff1ac0) at src/libespeak-ng/phonemelist.c:56
56              int n_plist_out = 0;
(gdb) print n_ph_list2
$67 = 1000
(gdb) print ph_list2[1000]
$68 = {synthflags = 126, phcode = 21 '\025', stresslevel = 0 '\000', 
  sourceix = 0, wordstress = 0 '\000', tone_ph = 11 '\v'}
(gdb) print ph_list2+1000
$69 = (PHONEME_LIST2 *) 0x7ffff7fbe820 <count_words>
(gdb) print option_punctlist
$70 = L'\000' <repeats 59 times>
(gdb) print count_words
$71 = 1376382
(gdb) print /x count_words
$72 = 0x15007e

I experimented with English and a file of 314 bytes (though slightly fewer characters) that expands to slightly more than 998 phonemes: equation.txt. It does not show the problem; both builds truncate the output at 998 phonemes. The same holds if I translate that file into Russian (ru-eq.txt, via Google Translate), except that in that case the output isn't truncated, it is split.

I increased N_PHONEME_LIST to 10000, which avoids the need to split the string, and got fragment.out.txt. I'm pretty sure this is the correct string; however, in scenarios where the bug might show, the string should be split into two TranslateClause outputs (they are separated by a new line).

Anyway, without the increase, the first overwrite is at line 1226 of translate.c inside SetPlist2

(gdb) print p
$12 = (PHONEME_LIST2 *) 0x7ffff7fba3e0 <option_punctlist>

Someone who understands how the truncation/splitting of overlong input is meant to work probably needs to look at it now. It's easy to trap on more recent gcc's because option_punctlist is not normally modified, so a simple "watch" catches the problem (and it's a hardware watch, so it is fast). Catching it with 8.x or earlier is more of a problem since count_words changes; I'll see if I can trap it with (char*)&count_words+2 (since count_words shouldn't go over 65535).
