Updates to GitHub #github

espeak-ng@groups.io Integration <espeak-ng@...>
 

[espeak-ng/espeak-ng] Pull request opened by TediPapajorgji:

#729 formatted -X output in an easier way to parse by phonemizer

The idea here is that we need to be able to map phonemized words back to their original text. When running espeak-ng with -X, the original output looked like this:

tedi@Tedis-MacBook-Pro bin % ./espeak-ng  "this is a to be test" -X
Found: 'this' [DIs]  $u+ $strend $verbsf $nounf
Found: 'is' [Iz]   $pastf $only
Flags:  a   $nounf
Translate 'a'
  1	a        [a]
 26	_) a (_  [a#]

Found: 'to be
' [t@bi]   $pastf
Translate 'test'
  1	t        [t]

  1	e        [E]

  1	s        [s]

  1	t        [t]

 DIs Iz a# t@bi t'Est
tedi@Tedis-MacBook-Pro bin %

The new output will look like this:

tedi@Tedis-MacBook-Pro bin % ./espeak-ng "this is a to be test" -X
this~|||~DIs
is~|||~Iz
to be~|||~t@bi

The modified output is much easier for the Python wrapper inside Phonemizer to parse: Phonemizer can read each line from the espeak-ng standard output and split it on the ~|||~ delimiter (e.g. string.split("~|||~") ). The text before the delimiter is the input text, and the text after the delimiter is the ASCII IPA phonemized text. This achieves the goal of mapping phonemized text back to its original text.
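
As a rough illustration of the parsing step described above, here is a minimal Python sketch. The function name parse_espeak_lines and the constant DELIMITER are made up for this example and are not part of Phonemizer or espeak-ng; the sample string simply reuses the three lines shown in the new output.

```python
from typing import List, Tuple

# Delimiter used by the modified -X output described in this pull request.
DELIMITER = "~|||~"


def parse_espeak_lines(output: str) -> List[Tuple[str, str]]:
    """Split each delimited line into (original text, phonemized text).

    Hypothetical helper for illustration only; lines without the
    delimiter are skipped rather than treated as errors.
    """
    pairs = []
    for line in output.splitlines():
        if DELIMITER not in line:
            continue
        # maxsplit=1 keeps any later delimiter characters in the phoneme part.
        text, phonemes = line.split(DELIMITER, 1)
        pairs.append((text, phonemes))
    return pairs


# Sample taken from the new output shown above.
sample_output = "this~|||~DIs\nis~|||~Iz\nto be~|||~t@bi\n"
for text, phonemes in parse_espeak_lines(sample_output):
    print(repr(text), "->", repr(phonemes))
```

Returning pairs in output order keeps multi-word entries such as "to be" together, which is what the tag mapping needs.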

This is required for SSML processing inside Resembletron. Users apply tags to whole text words, and the text words are then phonemized. If there is a 1 to 1 relationship between the # of text words and the # of phonemized words, the tags can be translated easily (e.g. hello, after phonemization, becomes həlˈəʊ). When there is not a 1 to 1 relationship between the # of text words and the # of phonemized words, a mapping of phonemized words to text words is required, which is now possible with this modification. For example, the words "to be" become one phonemized word "təbi" in certain sentences. With this text word to phonemized word mapping, we can identify the original text behind each phonemized word.
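
To make the tag translation concrete, here is a small Python sketch under stated assumptions. How Resembletron actually stores SSML tags is not described here, so the tags dictionary (word index to tag name), the tag name "emphasis", and the function map_tags are all hypothetical. Word indices count the words across the pair texts in order.

```python
from typing import Dict, List, Tuple


def map_tags(
    pairs: List[Tuple[str, str]], tags: Dict[int, str]
) -> List[Tuple[str, List[str]]]:
    """Attach each text word's tag to the phonemized word that covers it.

    Hypothetical helper: `pairs` holds (text, phonemes) entries and `tags`
    maps a text word index to a tag name.
    """
    result = []
    word_index = 0
    for text, phonemes in pairs:
        # One phonemized word may cover several text words, e.g. "to be".
        n_words = len(text.split())
        covered = range(word_index, word_index + n_words)
        result.append((phonemes, [tags[i] for i in covered if i in tags]))
        word_index += n_words
    return result


pairs = [("this", "DIs"), ("is", "Iz"), ("to be", "t@bi")]
tags = {2: "emphasis"}  # assumed tag on the text word "to" (index 2)
print(map_tags(pairs, tags))
```

In this sketch a tag on either "to" or "be" ends up on the single phonemized word t@bi, which is the case the 1 to 1 assumption cannot handle.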


[espeak-ng:master] New Comment on Pull Request #729 formatted -X output in an easier way to parse by phonemizer
By TediPapajorgji:

Sorry.... wrong repository...


[espeak-ng/espeak-ng] Pull request closed by TediPapajorgji:

#729 formatted -X output in an easier way to parse by phonemizer


[espeak-ng/espeak-ng] Pull request updated by TediPapajorgji:

#729 formatted -X output in an easier way to parse by phonemizer

WRONG REPOSITORY SORRY!
