I do agree with you, all the more since the tokenization source code is currently really hard to read and often mixes language-specific logic. A clean, customizable tokenization algorithm would be great, as it would also allow espeak-ng to provide structured output. This PR is really just an emergency hack to fix an annoying corner case.

