By rhdunn:

Generating the xx_emoji files is effectively what the Makefile already does. Those rules are there for maintainers to use when regenerating the files. If you can make that process easier, go ahead.

IIRC, not all CLDR files correspond to languages supported in espeak/espeak-ng, and a language can have a different name in the CLDR from the one used in espeak. So if a new CLDR file with tts entries appears, adding it is not necessarily easy to automate.

As for basing them on the en_emoji file, that is so the entries are in the correct order and grouped. The script could handle that, as the ordering is Unicode code point order and the grouping is by the Unicode Block property. The additional groups (country flags, keycaps, family, gendered sequences, etc.) are harder to group in an automated way, but it should be doable.
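
A minimal sketch of the ordering and grouping described above, in Python. The block table is only a small excerpt of the ranges in the Unicode `Blocks.txt` file, and `block_of` and `group_by_block` are hypothetical helper names, not part of espeak-ng. It only handles single code point emoji; the sequence-based groups (flags, keycaps, family, etc.) would need separate handling.

```python
# Excerpt of Unicode block ranges (from Blocks.txt); a real script would
# load the full file rather than hard-code a subset like this.
BLOCKS = [
    (0x2600, 0x26FF, "Miscellaneous Symbols"),
    (0x2700, 0x27BF, "Dingbats"),
    (0x1F300, 0x1F5FF, "Miscellaneous Symbols and Pictographs"),
    (0x1F600, 0x1F64F, "Emoticons"),
    (0x1F680, 0x1F6FF, "Transport and Map Symbols"),
    (0x1F900, 0x1F9FF, "Supplemental Symbols and Pictographs"),
]


def block_of(ch):
    """Return the block name for a single character, or "Other" if unlisted."""
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return "Other"


def group_by_block(names):
    """Group a {emoji: name} dict by block, in code point order.

    Returns a list of (block_name, [(emoji, name), ...]) tuples.
    Keys must be single code points; sequences are not handled here.
    """
    groups = []
    for ch in sorted(names, key=ord):
        block = block_of(ch)
        # Start a new group whenever the block changes.
        if not groups or groups[-1][0] != block:
            groups.append((block, []))
        groups[-1][1].append((ch, names[ch]))
    return groups
```

Each group could then be written out under a comment line naming the block, followed by its entries.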

I would start by generating the en_emoji file from the CLDR 33.1 data, as that is the version espeak currently uses. Ideally, you should not see any differences from the existing file. The same should hold when you extend it to the other languages. Once that works, you should be able to update to newer CLDR data.
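
A rough sketch of that check in Python, assuming the CLDR annotation files use `<annotation cp="..." type="tts">` elements for the spoken names. The one-entry-per-line layout (emoji, tab, name) is an assumption for illustration and may not match the real en_emoji format exactly; `tts_names`, `render`, and `diff_against` are hypothetical names.

```python
import difflib
import xml.etree.ElementTree as ET

# Tiny stand-in for a CLDR annotations file (U+1F600 and U+1F680).
SAMPLE = (
    "<ldml><annotations>"
    '<annotation cp="\U0001F600">face | grin</annotation>'
    '<annotation cp="\U0001F600" type="tts">grinning face</annotation>'
    '<annotation cp="\U0001F680" type="tts">rocket</annotation>'
    "</annotations></ldml>"
)


def tts_names(xml_text):
    """Extract {emoji: tts name} from CLDR annotation XML text."""
    root = ET.fromstring(xml_text)
    return {
        a.get("cp"): a.text
        for a in root.iter("annotation")
        if a.get("type") == "tts"  # skip the keyword-only annotations
    }


def render(names):
    """Render entries as lines (assumed layout: emoji, tab, name)."""
    return [f"{cp}\t{name}" for cp, name in sorted(names.items())]


def diff_against(existing_lines, names):
    """Return unified diff lines; an empty list means no differences."""
    return list(
        difflib.unified_diff(
            existing_lines,
            render(names),
            fromfile="existing",
            tofile="generated",
            lineterm="",
        )
    )
```

An empty result from `diff_against` for the 33.1 data would be the signal that the generator reproduces the current file.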

Another thing to note is that the CLDR has various regional locales for the emoji names (e.g. en_AU, fr_CA, and de_CH). Ideally, the ones that differ from the base locale should be included as variants in the xx_emoji file. Currently, some of these locales have no variant number allocated, so those would need to be added as appropriate.
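
Finding which regional names differ from the base locale could be sketched like this in Python. `locale_overrides` is a hypothetical helper, and the dictionaries in the usage example hold placeholder names, not real CLDR data.

```python
def locale_overrides(base, regional):
    """Return the regional entries whose name differs from the base locale.

    Both arguments map an emoji to its name. Entries that the regional
    locale merely repeats from the base are dropped.
    """
    return {
        cp: name
        for cp, name in regional.items()
        if base.get(cp) != name
    }


# Placeholder data for illustration only.
base = {"\U0001F600": "name A", "\U0001F680": "name B"}
regional = {"\U0001F600": "name A", "\U0001F680": "name B (regional)"}
overrides = locale_overrides(base, regional)
```

Only the entries left in `overrides` would need a variant line in the xx_emoji file.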

What I would like is to support lines like "test tEst $lang=en-CA" (I'm not sure about the syntax for that) in the emoji files, so that country-dependent variants of the emoji names (and other things like letter pronunciations) can be defined more easily. I have various issues open relating to supporting this.

