Pull Request Updated #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request updated by BenTalagan:

#676 Rule alignment fixes for non compliant platforms / Fix for emscripten demo

This is a fix for #584, but the scope of the PR may be larger: without this fix, the handling of compiled rules is not guaranteed to behave consistently across platforms, since a cast to int* may happen on an unaligned char*, which has to be avoided.

Some minor options also have to be added to the emscripten compilation workflow to make it work again with newer versions.

[espeak-ng:master] reported: Rule alignment fixes for non compliant platforms / Fix for emscripten demo #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Pull Request #676 Rule alignment fixes for non compliant platforms / Fix for emscripten demo
By BenTalagan:

Hmm, the checks failed, but I've verified locally and it looks like they were already broken before these changes. Is that expected?

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request opened by BenTalagan:

#676 Rule alignment fixes for non compliant platforms / Fix for emscripten demo

This is a fix for #584, but the scope of the PR may be larger: without this fix, the handling of compiled rules is not guaranteed to behave consistently across platforms, since a cast to int* may happen on an unaligned char*, which has to be avoided.

Some minor options also have to be added to the emscripten compilation workflow to make it work again with newer versions.


[espeak-ng:master] New Comment on Issue #584 emscripten demo broken, probably highlights underlying problem linked to dictionary compilation
By BenTalagan:

@rhdunn: Thanks for your answer! I have prepared a PR (#676), and limited myself to adding a function that tests whether sequential bytes are all zero. It's very close to what was originally intended and non-intrusive (the original code only tests four bytes at a time, but after that they are still read one by one, not four at a time).
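
For readers following along, a byte-wise zero test of the kind described above could look roughly like the sketch below. The names bytes_all_zero and skip_to_zero_run are hypothetical and are not taken from the PR; the sketch only assumes that the rule data contains a run of four zero bytes somewhere after the starting position.

```c
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: returns true when the `len` bytes starting at `p`
// are all zero. It reads one byte at a time, so `p` needs no alignment.
static bool bytes_all_zero(const char *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0)
			return false;
	}
	return true;
}

// Advances to the first position where four consecutive zero bytes start.
// The caller must guarantee that such a run exists inside the buffer.
static const char *skip_to_zero_run(const char *p)
{
	while (!bytes_all_zero(p, 4))
		p++;
	return p;
}
```

Because only single bytes are read, the result does not depend on how the platform handles unaligned word loads.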

[espeak-ng:master] reported: emscripten demo broken, probably highlights underlying problem linked to dictionary compilation #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #584 emscripten demo broken, probably highlights underlying problem linked to dictionary compilation
By rhdunn:

Thanks for the analysis. It looks like a version of Read4Bytes (https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/readclause.c#L280) that takes a const char * is needed to fix this -- renaming Read4Bytes to fread_uint32 and creating a read_uint32 function. The code would then need to be audited to avoid direct casting to unsigned int *.
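
As an illustration of the suggestion above, a read_uint32 for a const char * might be sketched as follows. The signature and the byte order (least significant byte first) are assumptions made for the example, not something confirmed in this thread.

```c
#include <stdint.h>

// Hypothetical read_uint32: assembles a 32-bit value from four bytes,
// least significant byte first (the byte order is an assumption here).
// Only single-byte loads are used, so `p` may sit at any address.
static uint32_t read_uint32(const char *p)
{
	const unsigned char *b = (const unsigned char *)p;
	return (uint32_t)b[0]
	     | ((uint32_t)b[1] << 8)
	     | ((uint32_t)b[2] << 16)
	     | ((uint32_t)b[3] << 24);
}
```

A call such as read_uint32(p) == 0 could then stand in for a test written as *(unsigned int *)p == 0, without requiring p to be aligned.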

[espeak-ng:master] reported: emscripten demo broken, probably highlights underlying problem linked to dictionary compilation #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #584 emscripten demo broken, probably highlights underlying problem linked to dictionary compilation
By BenTalagan:

OK, after fixing the condition in FindReplacementChars, it seems I get working generation/transcription with emscripten again. I'd still like someone with more expertise to tell me whether I'm missing other, similar alignment problems.

[espeak-ng:master] reported: emscripten demo broken, probably highlights underlying problem linked to dictionary compilation #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #584 emscripten demo broken, probably highlights underlying problem linked to dictionary compilation
By BenTalagan:

After implementing a temporary fix:

			// keep advancing until p points at four consecutive zero bytes
			while (p[0] != 0 || p[1] != 0 || p[2] != 0 || p[3] != 0) {
				p++;
			}

the parsing of the rules looks OK now, but the translation is still messed up. I found at least one suspicious place (within commit 55c6403):

https://github.com/espeak-ng/espeak-ng/blob/48719ad642f8a27d352983ab5964463a8c1e033e/src/libespeak-ng/translate.c#L1793-L1799

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #584 emscripten demo broken, probably highlights underlying problem linked to dictionary compilation
By BenTalagan:

After taking some time to investigate, I think I have found the problem. It comes from the following lines:

https://github.com/espeak-ng/espeak-ng/blob/48719ad642f8a27d352983ab5964463a8c1e033e/src/libespeak-ng/dictionary.c#L153-L154

They behave differently when compiled with llvm and with emscripten. Under llvm, as with gcc, this has what I would call the 'expected' behaviour: reading an unsigned int through a cast from any position in the char* buffer takes into account the fact that the position is not aligned to a multiple of 4 bytes. Under emscripten it does not: offsets of n+0, n+1, n+2 and n+3 bytes all give the same result when read through the cast. One of the rules of the 'en' dictionary hits this case, so the condition of having 4 successive zero bytes is never met and the rule parser blows up.

@rhdunn, I'd like your opinion on this issue: should we implement a simple fix (such as testing the four bytes individually instead of casting to unsigned int), and are there any other parts of the code that may be affected?
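
One common way to read a word from an arbitrary position without relying on the cast is to copy the bytes out with memcpy. The sketch below is only an illustration of that general technique; the function name load_u32_unaligned is hypothetical and does not appear in espeak-ng.

```c
#include <stdint.h>
#include <string.h>

// Loads a 32-bit value in the host's native byte order from a possibly
// unaligned address. memcpy has no alignment requirement, whereas
// dereferencing (unsigned int *)p at an unaligned p is undefined behaviour.
static uint32_t load_u32_unaligned(const char *p)
{
	uint32_t value;
	memcpy(&value, p, sizeof value);
	return value;
}
```

For the "four successive zero bytes" test the byte order does not matter, since the value is zero only when all four bytes are zero.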


[espeak-ng:master] New Comment on Issue #584 emscripten demo broken, probably highlights underlying problem linked to dictionary compilation
By BenTalagan:

Addendum: after reading a bit online, it really looks like this should be rewritten. Some references:

https://stackoverflow.com/questions/26995151/how-to-cast-char-array-to-int-at-non-aligned-position

https://stackoverflow.com/questions/13881487/should-i-worry-about-the-alignment-during-pointer-casting

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

2 New Commits:

[espeak-ng:master] By BenTalagan <ben_talagan@...>:
3e0150a34fd4: Fixing ungetc bad behavior under macOS Catalina by avoiding to ungetc a different char from the last getc

Modified: src/libespeak-ng/compiledata.c


[espeak-ng:master] By Reece H. Dunn <msclrhd@...>:
48719ad642f8: Merge remote-tracking branch 'BenTalagan/master'

Modified: src/libespeak-ng/compiledata.c


[espeak-ng/espeak-ng] Pull request closed by rhdunn:

#675 Fixing ungetc bad behavior under macOS Catalina

This is a fix for (#674). For the record, the problem was the following: it seems that the ungetc implementation under Catalina interferes with ftell/fseek when ungetc pushes back a character that differs from the one preceding the current file position.

The fix is to avoid that situation.


[espeak-ng:master] New Comment on Pull Request #675 Fixing ungetc bad behavior under macOS Catalina
By rhdunn:

Merged. Thanks.


[espeak-ng:master] Label added to issue #674 Build fails on MacOS Catalina by BenTalagan.


[espeak-ng:master] Issue #674 Build fails on MacOS Catalina closed by BenTalagan.

[espeak-ng:master] reported: Build fails on MacOS Catalina #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

PR ready (#675) :-) Thanks a lot for taking so much time to help!

Pull Request Opened #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request opened by BenTalagan:

#675 Fixing ungetc bad behavior under macOS Catalina

This is a fix for (#674). For the record, the problem was the following: it seems that the ungetc implementation under Catalina interferes with ftell/fseek when ungetc pushes back a character that differs from the one preceding the current file position.

The fix is to avoid that situation.

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

That works on my machine, so feel free to create a patch.

Are there any other problems?


[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

I don't think so. I remember having a problem with emscripten a few months ago (#584): the compiled js was unable to parse the bundled data correctly. I don't know whether it is related or not. I will give it another try later, but I will prepare a PR for now.


[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

Great. Thanks.

[espeak-ng:master] reported: Build fails on MacOS Catalina #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

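OK! I think I got it:

	// skip whitespace after the item
	while (isspace(c))
		c = get_char();

	item_terminator = ' ';
	if ((c == ')') || (c == '(') || (c == ','))
		item_terminator = c;

	// consume ')' and ',' instead of pushing back a substituted ' ', so that
	// unget_char() only ever receives the character that was just read
	if ((c == ')') || (c == ','))
		c = ' ';
	else if (!feof(f_in))
		unget_char(c);

This allows the full phoneme files to compile. This is my result: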

Refs 4021,  Reused 3068
Compiled phonemes: 0 errors.
touch dictsource/en_extra
  DICT      espeak-ng-data/en_dict
Can't read dictionary file: '/Users/ben/poub/espeak-ng/espeak-ng-data/en_dict'
Using phonemetable: 'en'
Compiling: 'en_list'
	5458 entries
Compiling: 'en_emoji'
	1690 entries
Compiling: 'en_extra'
	0 entries
Compiling: 'en_rules'
	6743 rules, 103 groups (0)

Is that OK?

[espeak-ng:master] reported: Build fails on MacOS Catalina #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

I think it's this part :

https://github.com/espeak-ng/espeak-ng/blob/f4c2ad3b7f3c29b40cd6029c5f82f2ba1156fff7/src/libespeak-ng/compiledata.c#L790-L801

Line 798 modifies the character. I tried inverting the last line, but pushing back the ')' now results in an infinite loop. I guess pushing back a space was a trick to skip parsing the ')'.

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

One additional note: I could not find the source for Catalina, but in previous versions of macOS the ungetc code differs depending on whether or not we push back the same character.

https://opensource.apple.com/source/Libc/Libc-1272.250.1/stdio/FreeBSD/ungetc.c.auto.html

In the first case, it's a simple rewind of the file pointer. In the second case, a buffer is used. That could explain why I see different behaviour depending on whether we push back the same character that was read, and why it can interfere with ftell.


[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

Interesting. Thanks.

I wonder what is causing espeak to unget a character different to the previously read character. Maybe addressing that will fix the issue you are seeing on the Mac (and possibly on other BSD-based platforms).

Github push to espeak-ng:espeak-ng #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

1 New Commit:

[espeak-ng:master] By Valdis Vitolins <valdis.vitolins@...>:
f4c2ad3b7f3c: docs: add missing languages to the list

Modified: docs/languages.md

Github push to espeak-ng:espeak-ng #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

1 New Commit:

[espeak-ng:master] By Valdis Vitolins <valdis.vitolins@...>:
656bb42c39e9: uz: add initial support for Uzbek language

Added: dictsource/uz_list
Added: dictsource/uz_rules
Added: espeak-ng-data/lang/trk/uz
Added: phsource/ph_uzbek
Modified: CHANGELOG.md
Modified: Makefile.am
Modified: phsource/phonemes

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

Will do, thanks.


[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Note: one other potential problem I see is that f_in can be switched to another file in the stack (so the buffered byte for one file may interfere with another file). I don't know whether that should be taken into account or not.

[espeak-ng:master] reported: Build fails on MacOS Catalina #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Ok! Just tell me when you're good and if you want me to perform some tests.

[espeak-ng:master] reported: Build fails on MacOS Catalina #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

I'm aware of that limitation, but IIUC, the compile phoneme code does not need a larger ungetc buffer -- it should only be restoring the last read character on a false branch of a loop (e.g. when reading a comment line).

I've adjusted for fseek in the subsequent comment.

I think this is similar to how the Mac logic is behaving, given the errors. Therefore, if we can figure out how to make this work (possibly in relation to the ftell call), it should hopefully shed some light on what needs to be modified to get the Mac implementation to work.
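
To make the discussion concrete, a one-slot push-back layer of the general kind being discussed might look like the sketch below. This is not the implementation referred to in the thread; the type and function names (PushbackFile, pb_getc, pb_ungetc, pb_tell, pb_seek) are hypothetical, and the ftell/fseek adjustments are one possible choice among several.

```c
#include <stdio.h>

// Hypothetical one-slot push-back layer over a FILE*, bypassing ungetc().
typedef struct {
	FILE *f;
	int pending; // pushed-back character, or EOF when the slot is empty
} PushbackFile;

static int pb_getc(PushbackFile *pb)
{
	if (pb->pending != EOF) {
		int c = pb->pending;
		pb->pending = EOF;
		return c;
	}
	return fgetc(pb->f);
}

// Returns c on success, EOF if the single slot is already occupied.
static int pb_ungetc(PushbackFile *pb, int c)
{
	if (c == EOF || pb->pending != EOF)
		return EOF;
	pb->pending = c;
	return c;
}

// Logical position: the underlying offset minus the pending character.
static long pb_tell(PushbackFile *pb)
{
	long pos = ftell(pb->f);
	if (pos > 0 && pb->pending != EOF)
		pos--;
	return pos;
}

// Seeking discards the pending character before moving the stream.
static int pb_seek(PushbackFile *pb, long offset)
{
	pb->pending = EOF;
	return fseek(pb->f, offset, SEEK_SET);
}
```

The single slot is enough when only the last read character is ever restored, but the position reported to callers has to account for the pending character, which is where the interaction with ftell and fseek comes in.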

Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

> Can you check what ungetc is returning, and if it is returning EOF then what is the errno value and associated message?

OK, I've had a look at this. The result of ungetc looks fine: it always returns the value of the character that was pushed back, even in the suspicious cases.


[espeak-ng:master] New Comment on Issue #652 Incorrect pronounciation of atelier
By valdisvi:

I added this as another word in en_list. Does atelier.wav.zip sound right?


[espeak-ng:master] New Comment on Issue #655 Esperanto: pronunciation of A
By valdisvi:

I changed the definition of a sound. Does this Esperanto.wav.zip sound better?


[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

However, I can see some potential problems with your implementation:

  • Multiple sequential calls to ungetc will not work (the buffer depth is 1)
  • If used in conjunction with fseek (as in UngetItem) or ftell (as at the start of NextItem), this may do strange things

Maybe one possible implementation would be to work with one big buffer instead of a file stream?
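
A minimal sketch of the "one big buffer" idea, assuming the whole source file has already been loaded into memory, could look like this. The MemSource type and the mem_* functions are hypothetical names chosen for the example and do not exist in espeak-ng.

```c
#include <stddef.h>
#include <stdio.h> // for EOF only

// Hypothetical in-memory source: the whole file is loaded once into `data`.
typedef struct {
	const char *data;
	size_t size;
	size_t pos;
} MemSource;

static int mem_getc(MemSource *s)
{
	if (s->pos >= s->size)
		return EOF;
	return (unsigned char)s->data[s->pos++];
}

// Un-reading is just stepping back; any number of calls works.
static void mem_ungetc(MemSource *s)
{
	if (s->pos > 0)
		s->pos--;
}

static size_t mem_tell(const MemSource *s)
{
	return s->pos;
}

// Clamps the target so the position never runs past the end of the data.
static void mem_seek(MemSource *s, size_t pos)
{
	s->pos = pos <= s->size ? pos : s->size;
}
```

With this layout, un-reading, telling and seeking all operate on the same index, so they cannot disagree with each other, and each source keeps its own position when several files are stacked.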