
[espeak-ng:master] reported: Can I contribute if I am not a programmer? #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #291 Can I contribute if I am not a programmer?
By orschiro:

Thanks for your intro! This is all very new to me.

Let's start from the end with a concrete example.

I want to improve the narrate feature of Firefox which relies on espeak. I want the voice to sound more pleasant and more human to listen to.

Where do I start to make this better? Or do I need to be a programmer for this?



Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #291 Can I contribute if I am not a programmer?
By valdisvi:

Thanks for your interest!

Anybody can contribute pronunciation rules for a language they know (e.g. German, Polish, English). You can simply submit issues about incorrect pronunciation, giving the correct form in phonetic notation, e.g. "car" should be said as "k'A@".

Better still would be to work on the translation rules, which are described in the dictionary.md file. The configuration files for English, for example, are en_rules and en_list. You can test a fairly recent development version on the espeak-ng online page.

There are some open issues about documentation and languages, but these are quite technical and require an understanding of the espeak-ng infrastructure.

eSpeakNG uses formant synthesis to produce sound, so a recorded voice can only be used to obtain a voice spectrum, which eSpeakNG can then approximately emulate. There are other open source synthesizers that use recorded voice, e.g. Festival, but even for these, making the recordings is just the first step in the data analysis process for a particular language and voice.

Improving eSpeakNG requires a specific set of technical skills, so the list of its contributors is not long. But if there were considerable interest, we could create e.g. a Patreon page to pay its contributors.


[espeak-ng:master] New Comment on Issue #293 Last audio chunk seems to be cut off in the emscripten js demo
By valdisvi:

Can't it be related to issue #278?


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #293 Last audio chunk seems to be cut off in the emscripten js demo
By BenTalagan:

Note: increasing the size of the buffer makes the cut happen sooner in the generated audio. Decreasing it makes the cut vanish. So the last chunk is probably being dropped.


[espeak-ng:master] New Comment on Issue #293 Last audio chunk seems to be cut off in the emscripten js demo
By BenTalagan:

Note: increasing the size of the PushAudioNode buffer makes the cut happen sooner in the generated audio. Decreasing it makes the cut vanish. So the last chunk is probably really being dropped.
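
The hypothesis in this comment (a partial last chunk that never gets delivered) can be illustrated with a minimal C sketch. It is not the demo's actual code: `chunk_player`, `CHUNK_SIZE` and the three functions are hypothetical names invented for the illustration. It only shows why a consumer that delivers full chunks needs an explicit flush at the end of the stream, and why a larger chunk size can leave a longer tail behind.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK_SIZE 4  /* hypothetical chunk size, in samples */

/* Hypothetical sink that counts how many samples were actually delivered. */
typedef struct {
    short  pending[CHUNK_SIZE]; /* samples waiting for a full chunk */
    size_t pending_count;
    size_t played;              /* samples delivered so far */
} chunk_player;

static void player_init(chunk_player *p)
{
    memset(p, 0, sizeof *p);
}

/* Deliver samples in full chunks only; leftovers stay in `pending`. */
static void player_push(chunk_player *p, const short *samples, size_t n)
{
    assert(p != NULL);
    for (size_t i = 0; i < n; i++) {
        p->pending[p->pending_count++] = samples[i];
        if (p->pending_count == CHUNK_SIZE) {
            p->played += CHUNK_SIZE;
            p->pending_count = 0;
        }
    }
}

/* Deliver whatever is left; if this is never called, the tail is lost. */
static void player_flush(chunk_player *p)
{
    assert(p != NULL);
    p->played += p->pending_count;
    p->pending_count = 0;
}
```

With 10 samples and a chunk size of 4, only 8 samples are delivered until `player_flush` is called. Whether the demo drops its tail for this reason is an assumption that would need checking against the real code.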


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

1 New Commit:

[espeak-ng:master] By Ben <ben@...>:
c98ed977f29c: Adding IPA output to emscripten js library and demo

Modified: emscripten/demo.html
Modified: emscripten/espeakng_glue.cpp
Modified: emscripten/espeakng_glue.idl
Modified: emscripten/js/demo.js
Modified: emscripten/js/espeakng.js
Modified: emscripten/post.js


[espeak-ng/espeak-ng] Pull request closed by rhdunn:

#292 Adding IPA output to emscripten js library and demo

This adds the IPA output feature to the emscripten js library and demo.

It uses espeak_SetPhonemeTrace to dump the result of espeak_Synth to a temporary virtual file in emscripten's virtual file system; the content is then passed up to the JavaScript.


[espeak-ng:master] New Comment on Issue #289 Add phonemes/IPA output to the emscripten js library
By rhdunn:

I have merged the PR.

Thanks for implementing this feature.


[espeak-ng:master] Issue #289 Add phonemes/IPA output to the emscripten js library closed by BenTalagan.


[espeak-ng:master] New Issue Created by BenTalagan:
#289 Add phonemes/IPA output to the emscripten js library

Hello all,

One really nice feature would be, imho, the ability to retrieve the phonemes for a synthesis through the js library (I am in fact more specifically interested in the IPA output).

Is it hard to do? It seems to me that everything is available, since espeak_TextToPhonemes is exposed through the API, right? So it should just need to be converted to an asynchronous method of the js worker (by extending the existing glue)?

Thanks!


[espeak-ng:master] Label added to issue #289 Add phonemes/IPA output to the emscripten js library by BenTalagan.


[espeak-ng:master] New Comment on Issue #289 Add phonemes/IPA output to the emscripten js library
By BenTalagan:

Awesome. Thanks, Reece!


[espeak-ng:master] reported: Add phonemes/IPA output to the emscripten js library #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #289 Add phonemes/IPA output to the emscripten js library
By BenTalagan:

Ok, found the answer to my question: tasks executed from the worker seem to be queued in the worker's message queue and executed one by one.

But since each command may change the configuration of espeak, each command should prepare its own configuration carefully and do some cleanup at the end. (My IPA synthesis did not do this at first: enabling the IPA output also switched on the console output without disabling it at the end.)

I have written a PR, #292, for this feature; the demo will now generate IPA output too.
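
The "prepare your own configuration, then clean up" rule described in this comment can be sketched in C as follows. This is an illustration only, not the glue code from the PR: `engine_config`, `g_config`, `run_ipa_command` and `run_queue` are hypothetical names, and the queue is a plain array standing in for the worker's message queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for engine-wide settings shared by all commands. */
typedef struct {
    bool phoneme_trace;   /* IPA/phoneme dump enabled */
    bool console_output;  /* trace echoed to the console */
} engine_config;

static engine_config g_config = { false, false };

/* Hypothetical IPA command: sets what it needs, then restores the rest. */
static void run_ipa_command(void)
{
    const engine_config saved = g_config; /* snapshot before touching anything */

    g_config.phoneme_trace = true;
    g_config.console_output = false;
    /* ... text-to-phoneme synthesis would happen here ... */

    g_config = saved; /* cleanup: later commands see the original settings */
}

typedef void (*command_fn)(void);

/* Run queued commands strictly one after another, so no two commands
   ever modify the shared configuration at the same time. */
static void run_queue(const command_fn *queue, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        assert(queue[i] != NULL);
        queue[i]();
    }
}
```

Because the commands run one at a time and each restores what it changed, a single shared instance is enough in this sketch; a second instance would only be needed if commands could overlap.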


[espeak-ng:master] new issue: Last audio chunk seems to be cut off in the emscripten js demo #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by BenTalagan:
#293 Last audio chunk seems to be cut off in the emscripten js demo

Hi,

I've noticed that the last audio chunk in the emscripten js library seems to be cut. That behaviour can be observed in the online demo here: https://www.readbeyond.it/espeakng/

If you enter:

Hey, speakers

you will not hear the last phoneme(s), but if you enter

Hey, speakers .

with some punctuation, you will hear them.

It's not happening with the command line version of espeak-ng.

I'm not sure if it's a problem in the API or in the way it is used by the emscripten glue (or even at a higher level, in the demo.js callback).


Pull Request Updated #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request updated by BenTalagan:

#292 Adding IPA output to emscripten js library and demo

This adds the IPA output feature to the emscripten js library and demo.

It uses espeak_SetPhonemeTrace to dump the result of espeak_Synth to a temporary virtual file in emscripten's virtual file system; the content is then passed up to the JavaScript.


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request opened by BenTalagan:

#292 Adding IPA output to emscripten js library and demo

This adds the IPA output feature to the emscripten js library and demo.

It uses espeak_SetPhonemeTrace to dump the result of espeak_Synth to a temporary virtual file in emscripten's virtual file system.


[espeak-ng:master] New Comment on Issue #289 Add phonemes/IPA output to the emscripten js library
By BenTalagan:

Ok, found the answer to my question: tasks executed from the worker seem to be queued in the worker's message queue and executed one by one.

But since each command may change the configuration of espeak, each command should prepare its own configuration carefully and do some cleanup at the end. (My synthesis output did not do this, and the IPA output was enabling console output without disabling it.)

I have written a PR, #292, for this feature.


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #289 Add phonemes/IPA output to the emscripten js library
By BenTalagan:

Ok, I'm trying to add this feature myself and make a Pull Request. I just need a piece of advice. It's kind of working: I'm using the espeak_SetPhonemeTrace function from the API, plus emscripten's virtual file system to write the result to a file and read it back. This is more or less the heart of the feature (really simple):

    espeak_SetPhonemeTrace(phoneme_conf, f_phonemes_out);
    espeak_Synth(aText, 0, 0, POS_CHARACTER, 0, 0, NULL, NULL);
    espeak_SetPhonemeTrace(phoneme_conf, NULL);

My question concerns the architecture of the demo. Currently, I am adding the feature as an asynchronous method of the espeakNG worker (as is done for the rest of the demo). The problem that arises is that I'm using the same worker instance to call both the synthesize (audio) and synthesize_ipa (text) methods, namely the one created by:

 tts = new eSpeakNG('js/espeakng.worker.js',...)

I'm not totally sure about what happens under the hood of the demo, but it seems to me that if this instance is shared, a text synthesis and an audio synthesis running at the same time may interfere, right? (I.e. it's not a good idea to turn the phoneme trace on and off while an audio synthesis is running.) Should I create a second eSpeakNG instance dedicated to IPA synthesis?

Thanks!


[espeak-ng:master] new issue: Can I contribute if I am not a programmer? #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by orschiro:
#291 Can I contribute if I am not a programmer?

My apologies if this is the wrong place to ask this question.

Basically, can I, and if so how, contribute to this great project if I am not a programmer?

Can I read something aloud, for instance, that can help improve espeak?

Thanks!


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by rhdunn:
#290 Use proper structures to represent the hash chain used in dictionaries.

The compiledict.c and dictionary.c code read/write a hash chain data structure. The problem with this is that:

  1. the structure is managed using opaque char * buffers, instead of a proper hash_chain_entry data structure;
  2. the data structure is partially written in compile_dictlist_file (pointer to the current/previous entry in the list) and partially in compile_line (length and flags).

This leads to bugs like issue #287 and #271. It makes the code harder to read and maintain.

The fix for this is to:

  1. make compile_line return the content length and flags as output parameters, and make compile_dictlist_file write that data to the hash chain entry buffer;
  2. create a proper hash_chain_entry struct that is used by both compiledict.c and dictionary.c, using something like:

```
hash_chain_entry *e = malloc(sizeof(hash_chain_entry) + length);
e->next = hash_chains[hash];
e->length = length;
e->flags = flags;

char *data = (char *)(e+1);
memcpy(data, dict_line, length);

hash_chains[hash] = e;
```

  3. factor the data structure code into a hashchain.[ch] file, to share the code and use names like hash_chain_clear.
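
To make the proposal more concrete, here is a minimal, self-contained C sketch of what such a struct-based hash chain might look like. It is not the espeak-ng implementation. The field names `next`, `length` and `flags` and the name `hash_chain_clear` come from the issue text; everything else (`N_HASH`, `hash_chain_add`, `hash_chain_data`, and the choice of `uint8_t` for the two fields) is an assumption made for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define N_HASH 1024  /* assumed table size, for illustration only */

typedef struct hash_chain_entry {
    struct hash_chain_entry *next;
    uint8_t length;  /* payload length in bytes (assumed to fit in one byte) */
    uint8_t flags;
    /* the payload bytes follow the struct in the same allocation */
} hash_chain_entry;

static hash_chain_entry *hash_chains[N_HASH];

/* Pointer to the payload stored directly after the entry header. */
static char *hash_chain_data(hash_chain_entry *e)
{
    return (char *)(e + 1);
}

/* Prepend a new entry to the chain for `hash`; returns NULL on allocation failure. */
static hash_chain_entry *hash_chain_add(unsigned hash, const char *dict_line,
                                        uint8_t length, uint8_t flags)
{
    assert(hash < N_HASH);
    hash_chain_entry *e = malloc(sizeof *e + length);
    if (e == NULL)
        return NULL;
    e->next = hash_chains[hash];
    e->length = length;
    e->flags = flags;
    memcpy(hash_chain_data(e), dict_line, length);
    hash_chains[hash] = e;
    return e;
}

/* Free every entry in every chain and reset the table. */
static void hash_chain_clear(void)
{
    for (unsigned i = 0; i < N_HASH; i++) {
        hash_chain_entry *e = hash_chains[i];
        while (e != NULL) {
            hash_chain_entry *next = e->next;
            free(e);
            e = next;
        }
        hash_chains[i] = NULL;
    }
}
```

Keeping the length in an unsigned field of the struct, rather than in the first byte of a char buffer, removes the signedness question that the related bugs ran into.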


[espeak-ng:master] Label added to issue #290 Use proper structures to represent the hash chain used in dictionaries. by rhdunn.


[espeak-ng:master] new issue: Option to keep punctuation signs in IPA output ? #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by BenTalagan:
#275 Option to keep punctuation signs in IPA output ?

Hi all,

It seems to me not at all trivial to implement, but what do you think of having an option to keep punctuation signs (or, more generally, uninterpreted signs) in the output when dumping IPA? The general structure of the processed document (line feeds and paragraphs) could also be kept, instead of dumping the clauses one after another.

Currently, I'm working around the issue with a set of fragile scripts that insert tokens into the original input document and remove their images from the IPA output, but I think it's really hacky and not as good as having the feature inside the core of the engine (because these tokens may interfere with the general analysis done by espeak and influence its processing).

Cheers!


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] Label added to issue #276 'ucd/ucd.h' file not found by Erhannis.


[espeak-ng:master] Issue #276 'ucd/ucd.h' file not found closed by Erhannis.


[espeak-ng:master] New Issue Created by rhdunn:
#271 Fix building emoji for ml, my, ne, pa and ta

The following languages cause a segfault when compiling/using the dictionary with emoji support added to them:

  • [ ] ml (Malayalam)
  • [ ] my (Myanmar/Burmese)
  • [ ] ne (Nepali)
  • [ ] pa (Punjabi)
  • [ ] ta (Tamil)


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

1 New Commit:

[espeak-ng:master] By Reece H. Dunn <msclrhd@...>:
22270bd25952: Fix reading hash table entries > 128. This is related to ebfa320956169e3419234b72fee51bd596867661, but when reading the hash chain entry length, not writing it. If char is signed, then before this change the length would be negative, causing problems loading the dictionary.

Modified: src/libespeak-ng/dictionary.c


[espeak-ng:master] New Comment on Issue #287 Buffer overflow when compiling dictionaries
By rhdunn:

I have also fixed the equivalent issue when reading dictionary files, which should address issue #271, allowing me to add back emoji support for those languages.


[espeak-ng:master] Label added to issue #287 Buffer overflow when compiling dictionaries by feerrenrut.


[espeak-ng:master] New Issue Created by feerrenrut:
#287 Buffer overflow when compiling dictionaries

Background:

While updating the NVDA espeak-ng submodule to commit fb97d1bd7564c2ff7c305cf7fcbdd29132234846 we have run into some problems compiling the dictionaries. The build system was occasionally halting with a python crash. After some investigation I found that a missing new line at the end of 'ar_listx' was causing a buffer overrun, which I have worked around with https://github.com/nvaccess/espeak-ng/commit/0994206f710a4defc1eecfb78ab70ff57c58fcda.

For this problem (missing new line character); assuming there is no technical reason that a new line character must be present, I suggest that the dictionary compilation is modified to accept files with missing new lines. Otherwise, to save time in debugging and accidental newlines I suggest that a missing newline is detected and reported during dictionary compilation.

There still seems to be a crash or sometimes the process runs indefinitely. Further investigation has led me to find that in LoadDictionary length is sometimes negative. Trying to match up the signed / unsignedness of 'length' (particularly in compile_dictlist_end) I found in some cases it is actually negative when written. On my system I can reliably reproduce this when running through 'compile_dictlist_end()' for the 'bn' dictionary, at hash number 497 we eventually get a length value of -117 (signed) / 4294967179 (unsigned).

I don't really understand how length comes to be -117. I added some asserts (and then built espeak with /SDL /RTC1 /MDd /Od /D_DEBUG MSVC flags) to make debugging this a little easier, see the full diff of espeak: https://github.com/espeak-ng/espeak-ng/compare/master...nvaccessfixEspeakCrashDuringDictCompilation

Also the branch of NVDA that is used to build the fixEspeakCrashDuringDictCompilation is updateEspeak-debugBranch specifically commit: https://github.com/nvaccess/nvda/commit/6bf896248798ee041c034372cc0b18efa5c611dc

Key points:

  • Missing new line at the end of 'ar_listx' causes buffer overrun
  • Signed / unsigned mismatch of length in compile_dictlist_end / LoadDictionary
  • This value appears to be either too large or negative, indicating a problem.

To reproduce locally:

  • On a windows machine (with the required NVDA dependencies)
  • Clone https://github.com/nvaccess/nvda repository
  • Checkout 6bf896248798ee041c034372cc0b18efa5c611dc
  • Build nvda with python scons.py source
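
The two options suggested in this issue for files without a final newline (accept them, or detect and report them) could be sketched in C as below. This is a hedged illustration, not the fix that was merged: `missing_trailing_newline` and `count_lines` are hypothetical helpers operating on a buffer already read into memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when a non-empty buffer does not end in '\n'; a compiler could
   use this to report the problem instead of overrunning the buffer. */
static bool missing_trailing_newline(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return len > 0 && buf[len - 1] != '\n';
}

/* Count lines while tolerating an unterminated last line: the final
   line is counted even when no '\n' follows it. */
static size_t count_lines(const char *buf, size_t len)
{
    size_t lines = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n')
            lines++;
    }
    if (missing_trailing_newline(buf, len))
        lines++;
    return lines;
}
```

Both helpers take an explicit length and never read past it, which is the property the original overrun lacked.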


[espeak-ng:master] Label added to issue #289 Feature : Add phonemes/IPA output to the emscripten js library by BenTalagan.


[espeak-ng:master] reported: Buffer overflow when compiling dictionaries #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #287 Buffer overflow when compiling dictionaries
By rhdunn:

Thanks for the investigation you have done here, I have used that to identify and hopefully fix the problem. I have merged your fix for dictionary files with no trailing newlines, but have not merged the rest of that branch.

In the emoji for bn, some of the entries were longer than 128 bytes, so were overflowing dict_line. I added a check for that overflow in d5d980862e3a487db22f8ef7e761f0890f1b6bed and fixed that overflow in 2a00ca79f635d8e7979ad09846901d59a4ea4ee0, resulting in the issue you are seeing.

The espeak code stores the length of the line in the first byte, and is using char to point to that line buffer. If char is signed, this will result in negative values like you are seeing. I have addressed that in ebfa320956169e3419234b72fee51bd596867661 by casting the buffer to uint8_t before reading the value.
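
The signedness problem described here can be reproduced in a few lines of C. The sketch below is not the espeak-ng code: the two helper names are hypothetical, and the byte value 139 is an assumption, chosen because 139 - 256 = -117 matches the value reported in the issue. Using `signed char` explicitly makes the behaviour the same on every platform, whatever the signedness of plain `char`.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Read the length byte through a signed char pointer (the problematic case). */
static int length_via_signed_char(const void *entry)
{
    assert(entry != NULL);
    return *(const signed char *)entry;
}

/* Read the same byte as uint8_t, which always yields a value in 0..255. */
static int length_via_uint8(const void *entry)
{
    assert(entry != NULL);
    return *(const uint8_t *)entry;
}
```

For a length byte of 139, the signed read gives -117 and the unsigned read gives 139; converting -117 to a 32-bit unsigned value gives 4294967179, the other number quoted in the issue.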


Github push to espeak-ng:espeak-ng #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

5 New Commits:

[espeak-ng:master] By Reef Turner <reef@...>:
0994206f710a: address buffer overrun when dict listx file has no trailing newline. See ar_listx as an example.

Modified: src/libespeak-ng/compiledict.c


[espeak-ng:master] By Reece H. Dunn <msclrhd@...>:
921229259d58: Use int to store the value from GetFileLength. This fixes the clang warning: comparison of unsigned expression < 0 is always false [-Wtautological-compare] Reported by Reef Turner

Modified: src/libespeak-ng/dictionary.c
Modified: src/libespeak-ng/synthdata.c
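
The clang warning quoted in this commit message comes from a general C pitfall, which the following sketch illustrates. It does not use the real GetFileLength: `get_length_or_error` is a hypothetical helper assumed to return a negative value on failure.

```c
#include <stdbool.h>

/* Hypothetical helper: returns a length, or a negative value on error. */
static int get_length_or_error(bool fail)
{
    return fail ? -1 : 42;
}

/* Storing the result in an int keeps the sign, so the error check works. */
static bool failed_with_int(bool fail)
{
    int length = get_length_or_error(fail);
    return length < 0;
}

/* Storing it in an unsigned int turns -1 into a huge positive value.
   A `length < 0` test on it can never be true, which is what
   -Wtautological-compare reports; here the error looks like a valid length. */
static bool looks_valid_with_unsigned(bool fail)
{
    unsigned int length = (unsigned int)get_length_or_error(fail);
    return length > 0;
}
```

With an `int`, a failed call is detected; with an `unsigned int`, the same failure is indistinguishable from a very large length.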


[espeak-ng:master] By Reece H. Dunn <msclrhd@...>:
0383e3525a1a: Merge commit '0994206f710a4defc1eecfb78ab70ff57c58fcda'

Modified: src/libespeak-ng/compiledict.c


[espeak-ng:master] By Reece H. Dunn <msclrhd@...>:
e7ac4b819daa: hash_counts is never used, so remove it. Reported by Reef Turner

Modified: src/libespeak-ng/compiledict.c


[espeak-ng:master] By Reece H. Dunn <msclrhd@...>:
ebfa32095616: Fix storing the line length in the hash chain.

The length is stored as the first byte in the output from compile_line. As the data pointer is char, if char is signed then length could be negative resulting in undefined behaviour. This commit fixes the issue by reading and writing that byte as a uint8_t.

This bug was caused by 2a00ca79f635d8e7979ad09846901d59a4ea4ee0. Previously, the entries could only be a maximum of 128 bytes, and would not be negative on platforms with signed chars. That commit was made to support long emoji entries, especially for non-Latin languages where the utf-8 representations could be longer than 128 bytes.

This change also adds some documentation to make it clearer what is going on.

NOTE: The code should really be using actual struct objects instead of writing to opaque char buffers.

Reported by Reef Turner

Modified: src/libespeak-ng/compiledict.c


[espeak-ng:master] new issue: Feature : Add phonemes/IPA output to the emscripten js library #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Issue Created by BenTalagan:
#289 Feature : Add phonemes/IPA output to the emscripten js library

Hello all,

One really nice feature would be, imho, the ability to retrieve the phonemes for a synthesis through the js library (I am in fact more specifically interested in the IPA output).

Is it hard to do? It seems to me that everything is available, since espeak_TextToPhonemes is exposed through the API, right? So it should just need to be converted to an asynchronous method of the js worker (by extending the existing glue)?

Thanks!


Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

2 New Commits:

[espeak-ng:master] By Valdis Vitolins <valdis.vitolins@...>:
669b9f16c2f8: Improvements for Sindhi and Urdu Languages by Ejaz Shah

Modified: dictsource/sd_list
Modified: dictsource/sd_rules
Modified: dictsource/ur_list
Modified: dictsource/ur_rules


[espeak-ng:master] By Valdis Vitolins <valdis.vitolins@...>:
fb64332f6606: Merge branch 'master' of https://github.com/espeak-ng/espeak-ng

Modified: android/jni/Android.mk
Modified: dictsource/af_list
Modified: dictsource/af_rules
Modified: dictsource/en_list
Modified: dictsource/en_rules
Modified: dictsource/fa_list
Modified: dictsource/fa_rules
Modified: src/ucd-tools/.gitignore
Modified: src/ucd-tools/CHANGELOG.md
Modified: src/ucd-tools/Makefile.am
Modified: src/ucd-tools/configure.ac
Modified: src/ucd-tools/src/case.c
Modified: src/ucd-tools/src/categories.c
Modified: src/ucd-tools/src/ctype.c
Modified: src/ucd-tools/src/include/ucd/ucd.h
Modified: src/ucd-tools/src/proplist.c
Modified: src/ucd-tools/src/scripts.c
Modified: src/ucd-tools/src/tostring.c
Modified: src/ucd-tools/tests/printcdata.c
Modified: src/ucd-tools/tests/printucddata.c
Modified: src/ucd-tools/tools/case.py
Modified: src/ucd-tools/tools/categories.py
Modified: src/ucd-tools/tools/printdata.py
Modified: src/ucd-tools/tools/scripts.py


[espeak-ng/espeak-ng] Pull request closed by rhdunn:

#288 Improvements for Sindhi and Urdu Languages by Ejaz Shah


Pull Request Opened #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request opened by valdisvi:

#288 Improvements for Sindhi and Urdu Languages by Ejaz Shah


Re: Updated Dictionary Files for Sindhi and Urdu

Valdis Vitolins
 

Thanks for contribution!

You can review merged changes here:
https://github.com/valdisvi/espeak-ng/commit/669b9f16c2f84d3a6654c2fe8206b8142642e8de

Valdis

I have made some more changes in the attached sd_list, sd_rules, ur_list
and ur_rules. I hope they can be included in eSpeak-ng.