[espeak-ng:master] reported: Unable to compile zhy dictionay on Windows #github


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By jaacoppi:

Short answer: Try "yue" for Cantonese. Note that Cantonese support is very basic. There are open issues about how to deal with mixed Chinese characters and Latin letters. We welcome all contributions to both Cantonese and Mandarin.

Long answer:

Confusingly, Mandarin and Cantonese have multiple names across the codebase.

docs/languages.md and espeak-ng --voices list them as:
- "yue" for Cantonese
- "cmn" for Mandarin

When calling espeak-ng, both "cmn" and "zh" are accepted for Mandarin. Even more strangely, the files related to Cantonese are in dictsource/zhy* and those for Mandarin are in dictsource/zh*. This doesn't make any sense to me.

The history behind these names can probably be found in the git log. I suspect it has something to do with the BCP 47 classification standard mentioned in the documentation. I think the naming convention should be clarified.

Hope this helps.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By rhdunn:

Yes, that's correct. The zhy name is not a valid BCP 47 name. The IANA language subtag registry (based on ISO 639-* for language codes) lists yue for Cantonese and cmn for Mandarin. Other voices have a similar change as well.

In the voice files, the old espeak names are listed as options for compatibility reasons.

The code still refers to them by the old names because it hasn't been refactored to align with the changes. The naming of the phoneme files is not consistent either (espeak used the language names, e.g. ph_dutch, but not for Cantonese and Mandarin for some reason). I just hadn't got around to addressing it, as other things like emoji support were higher priority and I got burned out after that.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By feerrenrut:

Thanks for the explanations. We are currently splitting the "zhy_dict" to get the language to switch to. I'll add an exception for that language.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By feerrenrut:

Thanks for the explanations. We are currently splitting the "zhy_rules" to get the language to switch to. I'll add an exception for that language.
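
A minimal sketch of the approach described above, deriving a language code from a *_rules file name with one exception for zhy. This is not NVDA's actual build code: the function name, the exception table, and the error handling are hypothetical and exist only for illustration. The only facts taken from this thread are that zhy_rules holds the Cantonese rules and that the Cantonese voice is named "yue".

```python
# Hypothetical illustration, not NVDA's real implementation.
# Assumption: rules files are named "<prefix>_rules" and the prefix is
# normally also the voice language code.
LEGACY_EXCEPTIONS = {
    "zhy": "yue",  # zhy_rules holds Cantonese; the voice is listed as "yue"
}


def language_from_rules_file(file_name: str) -> str:
    """Return the voice language code for a file named '<prefix>_rules'."""
    prefix, sep, suffix = file_name.rpartition("_")
    if not sep or suffix != "rules":
        raise ValueError(f"not a rules file: {file_name!r}")
    # Fall back to the prefix itself when no exception is registered.
    return LEGACY_EXCEPTIONS.get(prefix, prefix)


if __name__ == "__main__":
    for name in ("zhy_rules", "zh_rules"):
        print(name, "->", language_from_rules_file(name))
```

No exception is registered for zh here because, per the earlier comment, "zh" is still accepted as a name for Mandarin.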


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By jaacoppi:

Would it be easier for NVDA if there was yue_dict and cmn_dict instead of current zhy_dict and zh_dict?

The change for us should be easy since it's mostly about renaming files, not about changing code.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By feerrenrut:

I misspoke in my last comment (now edited), we are deriving the language codes from the *_rules files not from *_dict files. But if the same offer applies to those files, that would also simplify this.

I have a potential workaround for this in nvaccess/nvda#12370. However, perhaps we are only creating one dictionary when we should be creating two?


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By jaacoppi:

I'll try to rename the files to yue_ and cmn_ this weekend or next week. Looks like it's a bit more complicated than I thought because changes are needed in the Android and Windows project files and in the language configuration files.

The dictionary files for Cantonese and Mandarin are different, you'll have to create both.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By feerrenrut:

Ok, thanks @jaacoppi. We might go ahead with the workaround for now, but it would be good to be able to remove it in the future. On the other hand, perhaps we should have an explicit listing of the voices to use with language rules to produce dictionaries.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By feerrenrut:

So to confirm, I should compile the dictionaries using voice lang yue with zhy rules as well as using voice lang cmn with zhy rules to produce two dictionaries. Doing this seems to produce zhy_dict and zh_dict. If that is the case, it might be better for us just to have an explicit mapping rather than iterate over the files.


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By jaacoppi:

> So to confirm, I should compile the dictionaries using voice lang yue with zhy rules as well as using voice lang cmn with zhy rules to produce two dictionaries. Doing this seems to produce zhy_dict and zh_dict. If that is the case, it might be better for us just to have an explicit mapping rather than iterate over the files.

Yes, except for the typo (cmn uses zh rules, not zhy).

To say this another way, at the moment:
1) "espeak-ng --compile=yue" uses zhy_rules (and other zhy* files) to produce zhy_dict (Cantonese)
2) both "espeak-ng --compile=cmn" and "espeak-ng --compile=zh" use zh_rules (and other zh* files) to produce zh_dict (Mandarin)

Once I'm done with the restructuring:
1) --compile=zh will be deprecated (so make sure to start using --compile=cmn for Mandarin to avoid problems in the future)
2) zhy* files will be renamed yue* (you'll need to rename them in your build scripts)
3) zh* files will be renamed cmn* (you'll need to rename them in your build scripts)
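
The explicit mapping suggested earlier in the thread could look roughly like the sketch below. It only records the current state described above (voice name, rules file, resulting dictionary) and builds the matching command lines without running them. The DictionaryBuild type, the BUILDS table, and the compile_command helper are hypothetical names introduced for illustration. The only espeak-ng option used is --compile=<voice>, as quoted above.

```python
from typing import List, NamedTuple


class DictionaryBuild(NamedTuple):
    """One dictionary to compile (hypothetical helper type)."""
    voice: str       # value passed to --compile
    rules_file: str  # source rules file in dictsource
    dict_file: str   # dictionary produced by the compile step


# Current (pre-rename) state as described in this thread.
BUILDS = [
    DictionaryBuild("yue", "zhy_rules", "zhy_dict"),  # Cantonese
    DictionaryBuild("cmn", "zh_rules", "zh_dict"),    # Mandarin
]


def compile_command(build: DictionaryBuild) -> List[str]:
    """Build the argument list for compiling one dictionary."""
    return ["espeak-ng", f"--compile={build.voice}"]


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for build in BUILDS:
        print(" ".join(compile_command(build)), "->", build.dict_file)
```

With a table like this, the planned rename would only require updating the rules_file and dict_file entries to the yue* and cmn* names. The voice names passed to --compile would stay the same.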


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By jaacoppi:

I've refactored zhy to yue without problems. Two questions for @rhdunn before I push the changes:

1) Are the extended dictionaries and original espeak compatibility still relevant? Do we keep them or can we simplify and just use one _list file per language like for most languages? See commit f67221120 to refresh your memory.

2) The codebase has instances of "zh" (like the switch case in tr_languages.c and the language tags in espeak-ng-data/lang/sit/cmn and espeak-ng-data/lang/sit/zhy). There's "language zh-cmn", "language zh 8" and so on. Can I get rid of everything that mentions zh so that the code explicitly uses either Mandarin or Cantonese, not generic Chinese?


espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #933 Unable to compile zhy dictionay on Windows
By rhdunn:

The extended dictionaries are still relevant. When enabled, they add a lot of entries generated from a dictionary (which I believe is from the Unicode Unihan database, but I'm not 100% sure on that). This allows distributions that want to save space to ignore the listx files. It also keeps the generated lists separate from the custom exception lists. -- Ideally, the listx files should be autogenerated from the Unihan list, but I haven't figured out how to do that yet, what version was used for the original list (to compare when using a script to generate the list), and what changes (if any) were made to that process.

Ideally, espeak compatibility where possible is important. Especially around the use of espeak. So keeping the old names in the lang files allows users/applications that have e.g. zh set as their TTS voice for Orca/speech-dispatcher/etc. to still work when using espeak-ng. Likewise if/when they are using the espeak API. Therefore, the language zh 8, etc. should stay, but changing the dictionary/language file names should be OK as users don't directly interact with those.