[espeak-ng:master] reported: The change to tests/translate.test to detect #824 causes unexplained failures on some builds #github

espeak-ng@groups.io Integration <espeak-ng@...>

[espeak-ng:master] New Comment on Issue #945 The change to tests/translate.test to detect #824 causes unexplained failures on some builds
By jbowler:

The fundamental problem is that the code in TranslateWord2 does not reliably check for the end of the output buffer. Unless it does, the only suggestion above that works is (4). Even then, some code has to check whether a new allocation is needed: with (4), every time a phoneme is added to the end of the list, something has to decide whether more space must be allocated. So I agree with your last paragraph, but I think (3) offers some possibilities that might avoid this.
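Assuming (4) means a dynamically grown phoneme list, a minimal C sketch shows why every append still needs a capacity check. The struct and function names (`phoneme_entry`, `phoneme_list`, `phoneme_list_append`) are hypothetical and much simpler than espeak-ng's real PHONEME_LIST2.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one PHONEME_LIST2 entry; the real struct has more fields. */
typedef struct {
    unsigned char phcode;       /* phoneme code */
    unsigned char stresslevel;  /* stress assigned to this phoneme */
} phoneme_entry;

typedef struct {
    phoneme_entry *items;
    size_t count;     /* entries in use */
    size_t capacity;  /* entries allocated */
} phoneme_list;

/* Append one phoneme, growing the allocation when the list is full.
   Returns 0 on success, -1 if memory could not be obtained (list unchanged). */
static int phoneme_list_append(phoneme_list *list, phoneme_entry ph)
{
    if (list->count == list->capacity) {
        /* The check that cannot be avoided, even with a growable list. */
        size_t new_capacity = list->capacity ? list->capacity * 2 : 64;
        phoneme_entry *grown = realloc(list->items, new_capacity * sizeof *grown);
        if (grown == NULL)
            return -1;
        list->items = grown;
        list->capacity = new_capacity;
    }
    list->items[list->count++] = ph;
    return 0;
}

static void phoneme_list_free(phoneme_list *list)
{
    free(list->items);
    list->items = NULL;
    list->count = list->capacity = 0;
}
```

Doubling the capacity keeps the amortized cost per append constant, but the `count == capacity` test still runs on every call.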

Suggestion (3) is easy and safe: it doesn't modify any code, and it reliably detects overflow, but it has no recovery. In the test case, and with my equation.txt, a circular buffer would get overwritten in the absence of some form of checking. Nevertheless, it's an ingenious suggestion with the merit that it self-evidently avoids the overflow (so it removes the exploit).
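Assuming (3) is a fixed-size circular buffer, the following sketch shows both properties: the masked index means a write can never land outside the array, yet once it wraps the oldest phonemes are silently lost. The names are hypothetical, and the size of 1024 is taken from the last paragraph of this comment.

```c
#include <stddef.h>

#define RING_SIZE 1024u            /* power of two, so the index wraps with a mask */
#define RING_MASK (RING_SIZE - 1u)

typedef struct {
    unsigned char phcode[RING_SIZE];
    size_t written;  /* total phonemes ever written; not reset by wrapping */
} phoneme_ring;

/* Store a phoneme. The masked index keeps every write inside the array;
   after RING_SIZE writes the oldest entries are overwritten (no recovery). */
static void ring_put(phoneme_ring *r, unsigned char phcode)
{
    r->phcode[r->written & RING_MASK] = phcode;
    r->written++;
}

/* Overflow is still detectable after the fact: true once any entry was overwritten. */
static int ring_overflowed(const phoneme_ring *r)
{
    return r->written > RING_SIZE;
}
```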

If (2) can be done, combining it with (3) might work. I think something like the test case would still fail: in the test case the "words" are ridiculously long because of the paucity of space (ASCII 0x20) characters, which are the word separators. That said, (2) seems attractive: why wait to produce output? The current code does seem to search backward from the end of the clause to handle stress, but is that really necessary? How much phoneme look-ahead is really required before any given phoneme is output? Arabic might be a problem; IIRC it is necessary to see the whole word to know how to select the glyphs, and I don't know whether that applies to phonemes too.
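Reading (2) as "emit each phoneme as soon as enough look-ahead is available", combined with a (3)-style ring, gives something like the sketch below. `WINDOW` echoes the 128-phoneme figure mentioned later; `LOOKAHEAD` is a placeholder, since how much look-ahead is really needed is exactly the open question. All names are hypothetical.

```c
#include <stddef.h>

#define WINDOW 128u   /* hypothetical buffer size, as floated for ph_list2 */
#define LOOKAHEAD 8u  /* hypothetical look-ahead; the real requirement is unknown */

_Static_assert(LOOKAHEAD < WINDOW, "look-ahead must fit in the window");

typedef struct {
    unsigned char ph[WINDOW];
    size_t head;   /* slot of the oldest phoneme not yet emitted */
    size_t count;  /* phonemes buffered but not yet emitted */
} lookahead_queue;

/* Remove and return the oldest buffered phoneme, or -1 if the queue is empty.
   Called directly at the end of a clause to flush the remainder. */
static int lq_pop(lookahead_queue *q)
{
    if (q->count == 0)
        return -1;
    int ph = q->ph[q->head];
    q->head = (q->head + 1) % WINDOW;
    q->count--;
    return ph;
}

/* Buffer a phoneme. Once the oldest phoneme has LOOKAHEAD successors in the
   buffer (context for, e.g., a stress decision), it is returned for output;
   otherwise -1 is returned. count never exceeds LOOKAHEAD + 1, so the buffer
   cannot overflow however long the "word" is. */
static int lq_push(lookahead_queue *q, unsigned char phcode)
{
    q->ph[(q->head + q->count) % WINDOW] = phcode;
    q->count++;
    return q->count > LOOKAHEAD ? lq_pop(q) : -1;
}
```

The memory needed is bounded by the look-ahead, not by word or clause length, which is why a small fixed buffer would suffice if stress can be decided from a bounded window.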

I don't understand (1). That's an ASCII \055 (0x2D) hyphen, yet "-1" is a number and there is no pause after the "-" in English, whereas "2-1" is three words and there are two pauses. "+1" is also a number, and "2-+1" and similar forms are valid too. The rules in other languages may be completely different, and actually speaking equations, even simple linear arithmetic, is effectively another language.
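For English, the distinction between a sign and an operator can be sketched with a simple heuristic: a "-" or "+" directly before a digit is a sign if it starts the text or follows another operator. `is_sign` is a hypothetical helper, not espeak-ng code, and as noted above other languages may need different rules.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if text[i] ('-' or '+') is the sign of a number, 0 if it is a
   binary operator or is not followed by a digit. Hypothetical heuristic. */
static int is_sign(const char *text, size_t i)
{
    if (text[i] != '-' && text[i] != '+')
        return 0;
    if (!isdigit((unsigned char)text[i + 1]))
        return 0;
    /* Look back past spaces for the previous significant character. */
    size_t j = i;
    while (j > 0 && text[j - 1] == ' ')
        j--;
    if (j == 0)
        return 1;                                    /* "-1": leading sign */
    return strchr("+-*/=<>(", text[j - 1]) != NULL;  /* "2-+1": '+' after '-' */
}
```

With this rule "-1" and "+1" are single numbers, the "-" in "2-1" is an operator (three words), and in "2-+1" the "-" is the operator while the "+" is a sign.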

I suspect you are fine with a simplistic number parser that just recognizes decimal fractions. Working out how to pronounce an engineering or scientific format number, e.g. "1.23E6", or something more mathematical using base-10 superscripts, e.g. "1.23×10⁶", strikes me as a much bigger and separate thing. My equation.txt example shows that mathematical/computer usages such as the word "iff" (pronounced "if and only if" ;-) and ">number" or "<number" are not currently handled.
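A simplistic decimal-fraction recognizer of the kind described might look like this sketch: an optional sign, digits, and an optional "." followed by digits. Exponents are deliberately left unconsumed, so "1.23E6" is recognized only as "1.23". The function name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Length of the longest prefix of s that is a plain decimal fraction:
   optional sign, one or more digits, optionally '.' followed by digits.
   Returns 0 if s does not start with such a number. */
static size_t decimal_prefix_len(const char *s)
{
    size_t i = 0;
    if (s[i] == '-' || s[i] == '+')
        i++;
    size_t int_start = i;
    while (isdigit((unsigned char)s[i]))
        i++;
    if (i == int_start)
        return 0;  /* no integer digits: ".5" and "abc" are rejected */
    if (s[i] == '.' && isdigit((unsigned char)s[i + 1])) {
        i++;
        while (isdigit((unsigned char)s[i]))
            i++;
    }
    return i;  /* a trailing "E6" or "×10⁶" is left for other handling */
}
```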

If espeak-ng were using a simple circular buffer of size 1024 (i.e. slightly larger than at present), it wouldn't suffer the overwrite, and it wouldn't fail in any case where it doesn't fail at present. I think I could fairly easily submit a patch for this without adding a complete class-based implementation of PHONEME_LIST2. (2) requires changes I don't understand but sounds simple; potentially it would allow ph_list2 to be reduced to, say, 128 phonemes. Antidisestablishmentarianism only has 12 syllables ;-)
