Updates to Github #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng:master] New Comment on Issue #694 Phone separator sometimes adds additional separators trailing the end of a word
By RhyanJohnson:

Also around the conjunction 'which': "He wanted to see her about a plan **which** he had in his head." gives h*iː w*ɑː*n*t*ᵻ*d t*ə s*iː h*ɜː*ɹ ɐ*b*aʊ*t ɐ p*l*æ*n** w*ɪ*tʃ h*iː h*ɐ*d ɪ*n h*ɪ*z h*ɛ*d
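The bug is easy to spot mechanically. With `*` as the phoneme separator, separators should only appear between phonemes, so no word in the output should end with `*`. A minimal Python sketch, assuming the output is a plain whitespace-separated string and the separator is a single character (the helper name `trailing_separators` is hypothetical, not part of espeak-ng), that lists each offending word and how many separators trail it:

```python
def trailing_separators(phonemes: str, sep: str = "*") -> list[tuple[str, int]]:
    """Return (word, count) for every word that ends with the separator.

    In correctly separated output such as 'h*iː' the separator appears only
    between phonemes, so a word should never end with it. `sep` is assumed
    to be a single character.
    """
    found = []
    for word in phonemes.split():
        # Number of separator characters hanging off the end of the word.
        extra = len(word) - len(word.rstrip(sep))
        if extra:
            found.append((word, extra))
    return found


which = ("h*iː w*ɑː*n*t*ᵻ*d t*ə s*iː h*ɜː*ɹ ɐ*b*aʊ*t ɐ p*l*æ*n** "
         "w*ɪ*tʃ h*iː h*ɐ*d ɪ*n h*ɪ*z h*ɛ*d")
print(trailing_separators(which))  # [('p*l*æ*n**', 2)]
```

The count distinguishes the doubled separator after 'plan' from the single trailing separator reported for 'of the'.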


[espeak-ng:master] New Comment on Issue #694 Phone separator sometimes adds additional separators trailing the end of a word
By RhyanJohnson:

Single trailing separator (on phonemic 'of the'): "editions of the evening papers" gives ɪ*d*ɪ*ʃ*ə*n*z ʌ*v*ð*ɪ* iː*v*n*ɪ*ŋ p*eɪ*p*ɚ*z


3 New Commits:

[espeak-ng:master] By John Bowler <jbowler@...>:
bfef0120683a: Fixed UTF8 BOM and consequent damage to !v files

The six modified files all had spurious characters introduced, apparently
as a result of files with the UTF-8 byte order mark (BOM), U+FEFF, which
is conventionally used at the start of text files to indicate a UTF-8
file and is invisible under normal circumstances (e.g. when the file is
opened in a text editor).

None of these files are recognized by espeak-ng on Linux systems because
espeak-ng sees the 'language variant' line as starting with an extra,
unexpected character.
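To see why a leading BOM breaks keyword matching, here is a small Python illustration. It is not espeak-ng's C voice-file reader; it only assumes that the reader compares the first token on the line with a keyword such as `language`. The helper `first_token` is hypothetical.

```python
import codecs


def first_token(data: bytes, encoding: str = "utf-8") -> str:
    """Decode `data` and return its first whitespace-separated token."""
    return data.decode(encoding).split()[0]


# A voice-variant file whose first line is preceded by a UTF-8 BOM.
voice = codecs.BOM_UTF8 + b"language variant\n"

# Plain UTF-8 decoding keeps U+FEFF, which is not whitespace, so the first
# token is "\ufefflanguage" rather than "language".
print(first_token(voice) == "language")               # False

# The "utf-8-sig" codec skips a leading BOM while decoding.
print(first_token(voice, "utf-8-sig") == "language")  # True
```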

'gustave' is an uncorrupted file: it correctly starts with the BOM encoded
in UTF-8 (three bytes, EF BB BF). However, even though it is correct,
espeak-ng does not read it (this may be a separate bug!)

'marcelo' somehow got the BOM character replaced by a literal '?'. Note
that 'git diff' on these changes will, indeed, show the removed character
in 'gustave' as a literal '?'. Note also that the character in question,
the BOM, is actually the Unicode 'zero width no-break space', so it is
effectively invisible.
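The byte-level facts in the two paragraphs above can be checked directly in Python. The last line shows one plausible way a BOM turns into a literal '?' (re-encoding through a codec that cannot represent U+FEFF, with replacement enabled); it is an assumption for illustration, not a claim about what actually happened to 'marcelo'.

```python
import codecs
import unicodedata

BOM = "\ufeff"  # the byte order mark code point

# Encoded as UTF-8, the BOM is the three bytes EF BB BF, as in 'gustave'.
print(BOM.encode("utf-8").hex())               # efbbbf
print(BOM.encode("utf-8") == codecs.BOM_UTF8)  # True

# Its Unicode name explains why editors display nothing for it.
print(unicodedata.name(BOM))                   # ZERO WIDTH NO-BREAK SPACE

# A lossy re-encode replaces the unencodable BOM with '?'.
print(BOM.encode("ascii", errors="replace"))   # b'?'
```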

The remaining files seem to have suffered major corruption, possibly as a
result of DOS-to-Unix style conversions. The line endings, normally <lf>
on Unix or <cr><lf> on Windows, had been converted to <cr><lf><lf>, and
the BOM had been replaced by a <tab> character.
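The damage described above is regular enough to undo mechanically. A minimal Python sketch, assuming every voice-variant file should begin with the keyword `language` (the function `repair_voice_file` is hypothetical and is not how this commit was made): it collapses <cr><lf><lf> and <cr><lf> to <lf>, always drops a real UTF-8 BOM, and drops a leading <tab> or '?' only when it sits directly before `language`, so legitimate content is left alone.

```python
import codecs


def repair_voice_file(data: bytes) -> bytes:
    """Undo BOM and line-ending damage in a voice file (illustrative only)."""
    # 1. Line endings: <cr><lf><lf> -> <lf>, then any remaining <cr><lf> -> <lf>.
    #    Blank lines survive: "\r\n\n\r\n\n" becomes "\n\n".
    data = data.replace(b"\r\n\n", b"\n").replace(b"\r\n", b"\n")

    # 2. A genuine UTF-8 BOM can always be removed.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]

    # 3. A <tab> or '?' standing in for the BOM is removed only when it is
    #    directly followed by the expected first keyword (an assumption).
    if data[:1] in (b"\t", b"?") and data[1:].startswith(b"language"):
        return data[1:]

    return data
```

Restricting step 3 to the expected keyword is a deliberate trade-off: it may miss an unusually corrupted file, but it never deletes a character from a file that merely happens to start with '?' or a tab.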

Signed-off-by: John Bowler <jbowler@...>

Modified: espeak-ng-data/voices/!v/Andy
Modified: espeak-ng-data/voices/!v/AnxiousAndy
Modified: espeak-ng-data/voices/!v/Lee
Modified: espeak-ng-data/voices/!v/gustave
Modified: espeak-ng-data/voices/!v/linda
Modified: espeak-ng-data/voices/!v/marcelo


[espeak-ng:master] By Valdis Vitolins <valdis.vitolins@...>:
53f65b51eb46: Add a test to ensure source files do not have Byte Order Mark

Added: tests/bom.test
Modified: .gitignore
Modified: Makefile.am
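The contents of `tests/bom.test` are not reproduced in this notification. As a hedged sketch of the same idea in Python (the names `files_with_bom` and `check` are hypothetical), a scan that reports every file starting with the three UTF-8 BOM bytes and returns a test-style status:

```python
import codecs
from pathlib import Path


def files_with_bom(root: Path) -> list[Path]:
    """Return every regular file under `root` whose first bytes are a UTF-8 BOM."""
    found = []
    for path in sorted(root.rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue  # skip Git's own metadata and directories
        with path.open("rb") as f:
            if f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8:
                found.append(path)
    return found


def check(roots: list[Path]) -> int:
    """Print each offending file and return 0 if none were found, else 1."""
    offenders = [p for root in roots for p in files_with_bom(root)]
    for path in offenders:
        print(f"BOM found: {path}")
    return 1 if offenders else 0
```

Called as `sys.exit(check([Path("dictsource"), Path("espeak-ng-data")]))` from the repository root, a nonzero exit status would fail the test run.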


[espeak-ng:master] By Valdis Vitolins <valdis.vitolins@...>:
a99937f6c43e: Remove Byte Order Marks from source files

Modified: android/res/values-es-rUS/strings.xml
Modified: dictsource/af_list
Modified: dictsource/af_rules
Modified: dictsource/an_list
Modified: dictsource/bn_list
Modified: dictsource/bn_rules
Modified: dictsource/bs_list
Modified: dictsource/bs_rules
Modified: dictsource/da_list
Modified: dictsource/da_rules
Modified: dictsource/de_list
Modified: dictsource/de_rules
Modified: dictsource/en_list
Modified: dictsource/es_list
Modified: dictsource/fr_list
Modified: dictsource/gn_list
Modified: dictsource/hr_list
Modified: dictsource/hr_rules
Modified: dictsource/ht_list
Modified: dictsource/it_list
Modified: dictsource/it_listx
Modified: dictsource/ja_list
Modified: dictsource/ja_rules
Modified: dictsource/kn_list
Modified: dictsource/lfn_rules
Modified: dictsource/ne_list
Modified: dictsource/nl_list
Modified: dictsource/pa_list
Modified: dictsource/pa_rules
Modified: dictsource/pt_list
Modified: dictsource/sr_list
Modified: dictsource/sr_rules
Modified: espeak-ng-data/voices/!v/sandro
Modified: phsource/ph_italian
Modified: src/windows/data.vcxproj
Modified: src/windows/data.vcxproj.filters
Modified: src/windows/espeak-ng.sln
Modified: src/windows/espeak-ng.vcxproj
Modified: src/windows/espeak-ng.vcxproj.filters
Modified: src/windows/installer/installer.wixproj
Modified: src/windows/libespeak-ng.vcxproj
Modified: src/windows/libespeak-ng.vcxproj.filters


[espeak-ng/espeak-ng] Pull request closed by valdisvi:

#693 Fixed UTF8 BOM and consequent damage to !v files


[espeak-ng:master] New Comment on Pull Request #693 Fixed UTF8 BOM and consequent damage to !v files
By valdisvi:

Thanks for the contribution! Note that the files you fixed were not the only files with byte order marks. I also added a test for that and fixed the other files.