[espeak-ng:master] reported: Build fails on MacOS Catalina #github


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

That's strange. It should work on 64-bit.

The error messages relate to compiling the phsource/phonemes file, but the line numbers look wrong. See CompilePhonemeFiles in src/libespeak-ng/compiledata.c.

"Expected a number" is from https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/compiledata.c#L819.

"Expected ')'" is from https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/compiledata.c#L877.

The "The phoneme feature is not recognised" messages come from the ENS_UNKNOWN_PHONEME_FEATURE error code. It is coming from a call to the phoneme_add_feature function in phoneme.c at https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/compiledata.c#L2024.

Looking at that code, we know several things: 1. the values passed to phoneme_add_feature appear to be missing the first character (e.g. "engthmod" instead of "lengthmod"); 2. from the code, NextItem (https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/compiledata.c#L740) is returning a value less than 0 as that is what this if branch is checking; 3. this appears to be after processing whitespace.

I would start by printing values and control flow in NextItem to see what is happening there. NOTE: item_string is the string value of the token being read.


My intuition given the above information would be that the issue is at https://github.com/espeak-ng/espeak-ng/blob/master/src/libespeak-ng/compiledata.c#L760 in that it is checking for \n when identifying the end of a comment, so is likely incorrectly processing the file when using Mac (\r) line endings.

There are likely other cases like that, which would also explain why the line numbers are wrong in the output.
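One way to test this hypothesis is to make the comment-skipping step accept any of the three common line endings. The sketch below is only an illustration: skip_to_end_of_line and first_char_after_line are hypothetical helpers that do not exist in compiledata.c, and the real code also tracks linenum, which is left out here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of espeak-ng: consume the rest of the
 * current line, accepting LF, CR, or CRLF as the line ending.
 * Returns 1 if a line ending was consumed, 0 if end of file came first. */
static int skip_to_end_of_line(FILE *f)
{
	int c;

	while ((c = fgetc(f)) != EOF) {
		if (c == '\n')
			return 1;
		if (c == '\r') {
			/* Treat CRLF as a single ending: look at the next character
			 * and push it back unchanged if it is not the LF. */
			int next = fgetc(f);
			if (next != '\n' && next != EOF)
				ungetc(next, f);
			return 1;
		}
	}
	return 0;
}

/* Helper for trying it out: write text to a temporary file, skip one
 * line, and return the first character of the following line. */
static int first_char_after_line(const char *text)
{
	FILE *f = tmpfile();
	int c;

	assert(f != NULL); /* tmpfile can fail in restricted environments */
	fputs(text, f);
	rewind(f);
	skip_to_end_of_line(f);
	c = fgetc(f);
	fclose(f);
	return c;
}
```

Calling first_char_after_line with "// a\nb", "// a\rb" or "// a\r\nb" should return 'b' in each case.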


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Hello Reece, and thanks a lot for taking the time to provide some clues. I am currently investigating. Adding a printf("%s\n", item_string); at https://github.com/espeak-ng/espeak-ng/blob/fe7aa87422689dc33f7ed4f332de9383a3ff3834/src/libespeak-ng/compiledata.c#L810

yields the following output:

... a few dozens of numbers
35
5
1
1
75
1
150
1
200
1
10
1
10
3
welStarts
phonemes(129): Expected a number
phonemes(129): Expected ')'
... etc ... (after that the parsing is messy)

So everything looks good until the parsing gets messy. The strangest thing is that the first badly parsed item is welStarts, which looks like a chunk of 'NextVowelStarts'. I'm continuing my investigation.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Second remark, following the LF ('\n') and CR ('\r') track: I have found only two phoneme files containing '\r', namely ph_nahuatl and ph_setswana. I have normalized these files to LF, but this has no impact, so the problem might be elsewhere.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Ok, further investigation. Adding some debug output here:

https://github.com/espeak-ng/espeak-ng/blob/fe7aa87422689dc33f7ed4f332de9383a3ff3834/src/libespeak-ng/compiledata.c#L789

printf("%d: %s : %zu\n", type, item_string, strlen(item_string));

together with a trace at the start of each phoneme compilation, gives me this:

... everything looks ok before ...
Compile phoneme: _;_
7: pause : 5
7: starttype : 9
5: _ : 1
7: endtype : 7
5: _ : 1
7: lengthmod : 9
3: 1 : 1
7: length : 6
3: 200 : 3
7: endphoneme : 10
7: phoneme : 7
2: _^_ : 3
Compile phoneme: _^_
7: pause : 5
7: starttype : 9
5: _ : 1
7: endtype : 7
5: _ : 1
7: lengthmod : 9
3: 1 : 1
7: length : 6
3: 10 : 2
7: endphoneme : 10
7: phoneme : 7
2: _X1 : 3
Compile phoneme: _X1
7: pause : 5
7: starttype : 9
5: _ : 1
7: endtype : 7
5: _ : 1
7: lengthmod : 9
3: 1 : 1
7: length : 6
3: 10 : 2
7: endphoneme : 10
7: phoneme : 7
2: ? : 1
Compile phoneme: ?
7: vls : 3
7: glt : 3
7: stp : 3
7: lengthmod : 9
3: 3 : 1
7: nolink : 6
7: Vowelin : 7
7: glstop : 6
7: Vowelout : 8
7: Vowelout : 8
7: glstop : 6
7: WAV : 3
7: WAV : 3
2: ustop/null : 10
7: brk : 3
7: FMT : 3
2: r3/r_trill : 10
7: EndSwitch : 9
7: VowelEnding : 11
2: w/xw : 4
4: welStarts : 9
phonemes(129): Expected a number
phonemes(129): Expected ')'
... etc ...

The parsing of ? does not go well: it reads some tokens twice (Vowelout and WAV), and after that it never recovers (no endphoneme). For reference, here is the phoneme file around that point:

phoneme  _X1  //  a language specific action
  pause
  starttype _ endtype _
  lengthmod 1
  length 10
endphoneme 

phoneme ?  // glottal stp
  vls glt stp
  lengthmod 3   // ??
  nolink
  Vowelin  glstop
  Vowelout glstop
  WAV(ustop/null)
endphoneme

The odd thing is that the next token read, brk, is much further on in the file.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

My suspicion falls on the ungetc function. This is how I instrumented it:

static unsigned int get_char()
{
	unsigned int c;
	c = fgetc(f_in);
	if (c == '\n')
		linenum++;

	printf("Got '%c'\n", c);

	return c;
}

static void unget_char(unsigned int c)
{
	ungetc(c, f_in);
	if (c == '\n')
		linenum--;

	printf("Ungot '%c'\n", c);
}

For the parsing of the ? phoneme this is what I get :

Compile phoneme: ?
Got '/'
Got '/'
Got ' '
Got 'g'
Got 'l'
Got 'o'
Got 't'
Got 't'
Got 'a'
Got 'l'
Got ' '
Got 's'
Got 't'
Got 'p'
Got '
'
Got ' '
Got ' '
Got 'v'
Got 'l'
Got 's'
Got ' '
Got 'g'
7: vls -> 'g'
Ungot 'g'
Got 'g'
Got 'l'
Got 't'
Got ' '
Got 's'
7: glt -> 's'
Ungot 's'
Got 's'
Got 't'
Got 'p'
Got '
'
Got ' '
Got ' '
Got 'l'
7: stp -> 'l'
Ungot 'l'
Got 'l'
Got 'e'
Got 'n'
Got 'g'
Got 't'
Got 'h'
Got 'm'
Got 'o'
Got 'd'
Got ' '
Got '3'
7: lengthmod -> '3'
Ungot '3'
Got '3'
Got ' '
Got ' '
Got ' '
Got '/'
3: 3 -> '/'
Ungot '/'
Got '/'
Got '/'
Got ' '
Got '?'
Got '?'
Got '
'
Got ' '
Got ' '
Got 'n'
Got 'o'
Got 'l'
Got 'i'
Got 'n'
Got 'k'
Got '
'
Got ' '
Got ' '
Got 'V'
7: nolink -> 'V'
Ungot 'V'
Got 'V'
Got 'o'
Got 'w'
Got 'e'
Got 'l'
Got 'i'
Got 'n'
Got ' '
Got ' '
Got 'g'
7: Vowelin -> 'g'
Ungot 'g'
Got 'g'
Got 'l'
Got 's'
Got 't'
Got 'o'
Got 'p'
Got '
'
Got ' '
Got ' '
Got 'V'
7: glstop -> 'V'
Ungot 'V'
Got 'V'
Got 'o'
Got 'w'
Got 'e'
Got 'l'
Got 'o'
Got 'u'
Got 't'
Got ' '
Got 'g'
7: Vowelout -> 'g'
Ungot 'g'
Got 'V'
Got 'o'
Got 'w'
Got 'e'
Got 'l'
Got 'o'
Got 'u'
Got 't'
Got ' '
Got 'g'
7: Vowelout -> 'g'
Ungot 'g'
Got 'g'
Got 'l'
Got 's'
Got 't'
Got 'o'
Got 'p'
Got '
'
Got ' '
Got ' '
Got 'W'
7: glstop -> 'W'
Ungot 'W'
Got 'W'
Got 'A'
Got 'V'
Got '('
7: WAV -> '('
Ungot '('
Got 'W'
Got 'A'
Got 'V'
Got '('
7: WAV -> '('
Ungot '('
Got '('
Got 'u'
Got 's'
Got 't'
Got 'o'
Got 'p'
Got '/'
Got 'n'
Got 'u'
Got 'l'
Got 'l'
Got ')'
2: ustop/null -> ')'
Ungot ' '
Got ' '
Got 'b'
Got 'r'
Got 'k'
Got '
'
Got ' '
Got ' '
Got 'F'
7: brk -> 'F'
Ungot 'F'
Got 'F'
Got 'M'
Got 'T'
Got '('
7: FMT -> '('
Ungot '('
Got '('
Got 'r'
Got '3'
Got '/'
Got 'r'
Got '_'
Got 't'
Got 'r'
Got 'i'
Got 'l'
Got 'l'
Got ')'
2: r3/r_trill -> ')'
Ungot ' '
Got ' '
Got ' '
Got ' '
Got ' '
Got 'E'
Got 'n'
Got 'd'
Got 'S'
Got 'w'
Got 'i'
Got 't'
Got 'c'
Got 'h'
Got '
'
Got '
'
Got ' '
Got ' '
Got ' '
Got ' '
Got 'V'
7: EndSwitch -> 'V'

We can clearly see that after Vowelout and WAV we do not get back the character we pushed back. Worse, there is a full jump after ustop/null that sends us far beyond the current position.

What's your opinion on this?

Edit: This might be a wrong guess; there is also the UngetItem function, which may interfere here. Still investigating.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

What I am seeing with the version on commit 07012f60736016e533a0428360797b08047a22e6 (which is working for me on Debian linux) is:

Compile phoneme: ?
7: vls : 3
7: glt : 3
7: stp : 3
7: lengthmod : 9
3: 3 : 1
7: nolink : 6
7: Vowelin : 7
7: glstop : 6
7: Vowelout : 8
7: Vowelout : 8
7: glstop : 6
7: WAV : 3
7: WAV : 3
2: ustop/null : 10
7: endphoneme : 10
7: phoneme : 7
2: : : 1
Compile phoneme: :


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

Those are what I am seeing as well.

So it looks like it could be the code before reading the item_string in NextItem that skips whitespace. Maybe the comment skipping logic? Try something like:

            while (!feof(f_in) && ((c = get_char()) != '\n') && (c != '\r'))
                ;

The Mac get_char may be normalising '\n' characters.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

ftell is called at the start of NextItem, so if my hypothesis is correct, it could mess with ungetc.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

I'm not sure why ungetc is not working properly on Mac in this case. It is most likely a bug in their implementation.

You are probably right. I'm a bit surprised that it broke recently. Or maybe the way espeak uses ftell + ungetc + fseek is not really legitimate?
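On the ftell + ungetc + fseek question, the C standard says that a successful call to a file positioning function (fseek, fsetpos or rewind) discards any pushed-back characters, and that for a text stream the position reported after ungetc is unspecified until the pushed-back characters have been read or discarded. The sketch below only demonstrates the first point on a temporary file. The function names are made up for this example, and it does not reproduce the macOS behaviour discussed in this thread.

```c
#include <assert.h>
#include <stdio.h>

/* Reads two characters of "abcdef", pushes 'X' back, seeks to the given
 * absolute offset, and returns the next character read. */
static int read_after_pushback_and_seek(long offset)
{
	FILE *f = tmpfile();
	int c;

	assert(f != NULL); /* tmpfile can fail in restricted environments */
	fputs("abcdef", f);
	rewind(f);
	fgetc(f);          /* 'a' */
	fgetc(f);          /* 'b' */
	ungetc('X', f);    /* push back a character */
	if (fseek(f, offset, SEEK_SET) != 0) {
		fclose(f);
		return EOF;
	}
	c = fgetc(f);      /* the pushed-back 'X' has been discarded */
	fclose(f);
	return c;
}

/* Same sequence without the seek: the pushed-back character comes first. */
static int read_after_pushback_only(void)
{
	FILE *f = tmpfile();
	int c;

	assert(f != NULL);
	fputs("abcdef", f);
	rewind(f);
	fgetc(f);
	fgetc(f);
	ungetc('X', f);
	c = fgetc(f);
	fclose(f);
	return c;
}
```

Here read_after_pushback_only() returns 'X', while read_after_pushback_and_seek(2) returns 'c' because the seek drops the pending character. Code that saves a position with ftell, pushes a character back, and later seeks to the saved position is therefore relying on the seek to cancel the push-back.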


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

Adding:

@@ -881,6 +886,7 @@ static int NextItemBrackets(int type, int control)
 static void UngetItem()
 {
        fseek(f_in, f_in_displ, SEEK_SET);
+       f_in_ungetc = EOF;
        linenum = f_in_linenum;
 }

to account for the seek behaviour of ungetc results in errors like:

phonemes(124): The phoneme feature is not recognised: 'owelout'.
phonemes(359): The phoneme feature is not recognised: 'owelout'.
phonemes(359): The phoneme feature is not recognised: '2'.
phonemes(359): The phoneme feature is not recognised: '1600'.

Note that seeking to f_in_displ - 1 does not work. Maybe it is only occasionally off by one (i.e. when it is trying to read a keyword like Vowelout or length). I haven't yet tracked down where this offset adjustment would be needed.
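For readers following along, here is a minimal sketch of what a single-slot push-back emulation could look like. It is an assumption about the general shape only: the thread shows just the f_in_ungetc = EOF line of the patch, and struct reader, reader_get, reader_unget, reader_seek and reader_demo are hypothetical names that do not appear in espeak-ng.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical single-slot push-back wrapper around a FILE. */
struct reader {
	FILE *f;
	int pending; /* EOF when the slot is empty */
};

static int reader_get(struct reader *r)
{
	if (r->pending != EOF) {
		int c = r->pending;
		r->pending = EOF;
		return c;
	}
	return fgetc(r->f);
}

static void reader_unget(struct reader *r, int c)
{
	r->pending = c; /* one slot only: a second unget overwrites the first */
}

static int reader_seek(struct reader *r, long offset)
{
	r->pending = EOF; /* seeking discards the pushed-back character */
	return fseek(r->f, offset, SEEK_SET);
}

/* Reads 'a' from "abc", pushes 'Z' back, optionally seeks to offset 0,
 * and returns the next character delivered by the wrapper. */
static int reader_demo(int seek_before_read)
{
	struct reader r;
	int c;

	r.f = tmpfile();
	r.pending = EOF;
	assert(r.f != NULL); /* tmpfile can fail in restricted environments */
	fputs("abc", r.f);
	rewind(r.f);
	reader_get(&r);
	reader_unget(&r, 'Z');
	if (seek_before_read)
		reader_seek(&r, 0);
	c = reader_get(&r);
	fclose(r.f);
	return c;
}
```

With this layout reader_demo(0) returns 'Z' and reader_demo(1) returns 'a'. Note that while the slot is occupied, ftell on the underlying file reports a position one character past the logical read position, which may be related to the occasional off-by-one described above.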


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By rhdunn:

I'm aware of that limitation, but IIUC, the compile phoneme code does not need a larger ungetc buffer -- it should only be restoring the last read character on a false branch of a loop (e.g. when reading a comment line).

I've adjusted for fseek in the subsequent comment.

I think this is similar to how the Mac logic is behaving, given the errors. Therefore, if we can figure out how to make this work (possibly in relation to the ftell call), it should hopefully shed some light on what needs to be modified to get the Mac implementation to work.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Ok! Just tell me when you're good and if you want me to perform some tests.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

I think it's this part :

https://github.com/espeak-ng/espeak-ng/blob/f4c2ad3b7f3c29b40cd6029c5f82f2ba1156fff7/src/libespeak-ng/compiledata.c#L790-L801

Line 798 modifies the character. I tried inverting the last line, but pushing back the ')' now results in an infinite loop. I guess pushing back a space was a trick to avoid parsing the ')' again.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

Ok! I think I got it :

	while (isspace(c))
		c = get_char();
   
	item_terminator = ' ';
	if ((c == ')') || (c == '(') || (c == ','))
		item_terminator = c;

	if ((c == ')') || (c == ','))
		c = ' ';
	else if(!feof(f_in))
		unget_char(c);

This allows the full phoneme files to compile. Here is my result:

Refs 4021,  Reused 3068
Compiled phonemes: 0 errors.
touch dictsource/en_extra
  DICT      espeak-ng-data/en_dict
Can't read dictionary file: '/Users/ben/poub/espeak-ng/espeak-ng-data/en_dict'
Using phonemetable: 'en'
Compiling: 'en_list'
	5458 entries
Compiling: 'en_emoji'
	1690 entries
Compiling: 'en_extra'
	0 entries
Compiling: 'en_rules'
	6743 rules, 103 groups (0)

Is it OK?
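To make the effect of the change easier to see in isolation, here is a simplified tokenizer built around the same terminator handling. It is a sketch under assumptions, not the real NextItem: next_item and collect_items are hypothetical, bracket handling is reduced to skipping a leading '(', and only the rule "consume ')' and ',', push anything else back unchanged" is taken from the snippet above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

static int item_terminator; /* same name as in the thread, for readability */

/* Hypothetical, simplified stand-in for NextItem: reads one token into buf.
 * Returns 1 if a token was read, 0 at end of file. */
static int next_item(FILE *f, char *buf, size_t size)
{
	size_t n = 0;
	int c;

	/* Skip leading whitespace and an opening bracket (simplification). */
	do {
		c = fgetc(f);
	} while (c != EOF && (isspace(c) || c == '('));
	if (c == EOF)
		return 0;

	/* Collect the token up to whitespace or a delimiter. */
	while (c != EOF && !isspace(c) && c != '(' && c != ')' && c != ',') {
		if (n + 1 < size)
			buf[n++] = (char)c;
		c = fgetc(f);
	}
	buf[n] = '\0';

	while (c != EOF && isspace(c))
		c = fgetc(f);

	item_terminator = ' ';
	if (c == ')' || c == '(' || c == ',')
		item_terminator = c;

	/* ')' and ',' are consumed; any other character is pushed back
	 * exactly as it was read, never replaced by a space. */
	if (c != ')' && c != ',' && c != EOF)
		ungetc(c, f);
	return 1;
}

/* Tokenizes text through a temporary file; returns the number of items. */
static int collect_items(const char *text, char items[][32], int max_items)
{
	FILE *f = tmpfile();
	int n = 0;

	assert(f != NULL); /* tmpfile can fail in restricted environments */
	fputs(text, f);
	rewind(f);
	while (n < max_items && next_item(f, items[n], sizeof items[n]))
		n++;
	fclose(f);
	return n;
}
```

For the input "WAV(ustop/null)\nendphoneme" this yields the three items WAV, ustop/null and endphoneme. The only characters ever passed to ungetc are ones that were just read from the stream, which is the property the fix introduces.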


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #674 Build fails on MacOS Catalina
By BenTalagan:

PR ready (#675) :-) Thanks a lot for having taken such time to help!