[espeak-ng:master] new issue: Chinese dictionary multiple match #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by rongcuid:
#606 Chinese dictionary multiple match

Problem:

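If the dictionary defines a multi-character word entry, such as (地 面) di4mian4, the characters are matched more than once: after the whole word has been translated, 面 is looked up and translated again on its own, so it is spoken twice: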

ESPEAK_DATA_PATH=$PWD/espeak-ng-data LD_LIBRARY_PATH=src:${LD_LIBRARY_PATH} ./src/.libs/espeak-ng -v zh -x -X "地面"
Replace: 地 面   di4mian4 
Translate 'di4mian4'
  1	   94:	d [t]

  1	  132:	i [i]

 22	  299:	4 [51]

  1	  165:	m [m]

  1	  132:	i [i]
 22	  138:	ia [iA]
 65	  139:	ia (DnK [iE]

  1	  169:	n [n]

 22	  299:	4 [51]

Replace: 面   mian4 
Translate 'mian4'
  1	  165:	m [m]

  1	  132:	i [i]
 22	  138:	ia [iA]
 65	  139:	ia (DnK [iE]

  1	  169:	n [n]

 22	  299:	4 [51]

ti53m'iE51n_| m'iE51n_|

I have tried my best to isolate the problem, and I think it comes from dictionary.c:LookupDict2, which sets the global variable dictionary_skipwords. In a GDB session, I noticed that dictionary_skipwords is set to 1 instead of the expected 2, which would explain why the characters are still looked up and translated individually after the whole-word match.

The code is difficult to follow and I can't quite understand what LookupDict2 does, so this is as far as I can go.

Note: I worked on my fork, which has a minor change to load the listx file after the list file:

https://github.com/rongcuid/espeak-ng/commit/20967bc54ac33a49bf4bbd63a8cc77254e32013f

This is done so that zh_listx is actually loaded; in this repository, zh_listx is not loaded and has no effect. If you want to investigate this issue without applying my commit, you may need to add the entry described above to zh_extra instead.

The branch in my fork has some improvements to Mandarin, but I cannot file a pull request until this problem is solved.
