[espeak-ng:master] reported: Option to keep punctuation signs in IPA output ? #github
email@example.com Integration <espeak-ng@...>
[espeak-ng:master] New Comment on Issue #275 Option to keep punctuation signs in IPA output ?
Hi Neil, I had a look at it when I opened this issue, but it soon looked like too difficult a task for me. IIRC, the problem was that I could not find an easy way to do it with the current architecture: espeak first uses the punctuation to perform a structural analysis and split sentences into chains of clauses, and it drops the punctuation at that early stage. We could mark each clause as "beginning with this punctuation sign" or "ending with this punctuation sign", but that would not be sufficient, because there can be punctuation signs in the middle of a clause that do not affect clause splitting. Keeping those signs would interfere with the next stage, where clauses are transcribed into phonemes. Additionally, if you want to keep spaces and carriage returns, things get even more difficult.
For my needs, I work in a scripted environment, so I have implemented a hacky solution around espeak, with a pre-processor and a post-processor. The pre-processor inserts tag characters into the input around the punctuation (using rarely used Unicode code points such as U+F8E0, etc.), and it saves the list of encountered tags and punctuation signs in an array for later. Additionally, I declare each of these tag characters as a word in the dictionaries, with a phoneme transcription that is the tag itself, and finally I declare these phonemes in the phoneme files as virtual phonemes whose IPA output is (again) the tag itself. That way, the tags are carried all along the chain down to the IPA output, and because they are declared virtual, they do not seem to interfere with the clause analysis or with the interaction between phonemes. At the end, in the post-processor, I replace the tags with the original punctuation signs that the pre-processor saved in the array.
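To make the idea more concrete, here is a minimal Python sketch of the pre-processor and post-processor only. It is not my actual script: the names `preprocess`, `postprocess`, `PUNCTUATION` and `TAG_BASE` are made up for illustration, the set of punctuation signs and the choice of one tag code point per sign starting at U+F8E0 are assumptions, and whether the tag needs a space before it is also an assumption. The espeak call itself, and the dictionary and phoneme declarations that let the tags survive down to the IPA, are not shown.

```python
import re
from typing import List, Tuple

# Assumption: the signs to preserve, and one private-use code point per sign,
# starting at U+F8E0 (hypothetical choices, for illustration only).
PUNCTUATION = ",.;:!?"
TAG_BASE = 0xF8E0

PUNCT_RE = re.compile("[" + re.escape(PUNCTUATION) + "]")
# Matches a tag character together with any whitespace just before it.
TAG_RE = re.compile(
    r"\s*[" + chr(TAG_BASE) + "-" + chr(TAG_BASE + len(PUNCTUATION) - 1) + "]"
)


def preprocess(text: str) -> Tuple[str, List[str]]:
    """Insert a tag before each punctuation sign and remember the signs in order."""
    saved: List[str] = []

    def add_tag(match: "re.Match[str]") -> str:
        sign = match.group(0)
        saved.append(sign)
        tag = chr(TAG_BASE + PUNCTUATION.index(sign))
        # The sign itself is kept so that clause splitting still sees it.
        return " " + tag + sign

    return PUNCT_RE.sub(add_tag, text), saved


def postprocess(ipa: str, saved: List[str]) -> str:
    """Replace each tag found in the IPA output with the next saved sign."""
    signs = iter(saved)
    # If a tag has no saved sign left, it is left in place rather than guessed.
    return TAG_RE.sub(lambda m: next(signs, m.group(0)), ipa)
```

In use, `preprocess("Hello, world!")` returns the tagged text plus the list `[",", "!"]`; the tagged text goes through espeak, and `postprocess` is then applied to the IPA string that comes back, with the same list.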
As I said, this is totally ugly (luckily there seems to be no vomiting smiley on GitHub).
I couldn't see any other way of doing it with the current architecture, and that's why I've referenced #369: with a different architecture, and a more structured analysis and representation of the input (like a chain of tokens), this would probably be far easier. It seems to me that it is quite a lot of work, but IMHO it is the way to go.
I don't know if this really helps. I'd be happy to have a more informed opinion from the maintainers too; I'm definitely not at ease with espeak's source code.