
Convert .trs to .TextGrid?

Antje Schweitzer
 

Dear list,

I've recently come across a corpus which provides annotations in a
format called Transcriber format (filename extension is .trs).

I found a Perl script called trans2praat.pl, which is supposed to be able
to convert .trs to TextGrids here:
http://wwwhomes.uni-bielefeld.de/~gibbon/EGA/Tools/index.html, but it is
from 2007 and does not work as expected (produces empty tiers).

Does anyone happen to know how best to convert .trs files to Praat TextGrids?

Any help or hints would be greatly appreciated!

Best regards,
Antje


Here's some more information in case you might be able to help:

The Transcriber format seems to encode a lot of information, including
speaker identities, turn structure, and transcriptions of speech and
non-speech events, all with time stamps.
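As a rough illustration of that structure, the speakers and turns of a .trs file can be listed with Python's standard xml.etree.ElementTree. This is a minimal sketch based only on the element and attribute names that appear in the N4 example further down (Speaker, Turn, startTime, endTime, speaker); the function name is hypothetical and other .trs files may use more of the format.

```python
import xml.etree.ElementTree as ET


def list_speakers_and_turns(trs_data):
    """Return ({speaker id: name}, [(start, end, speaker id), ...]) for a .trs document.

    trs_data is the raw file content as bytes, so that the parser honours the
    encoding declared in the XML header (ISO-8859-1 in the N4 example).
    """
    root = ET.fromstring(trs_data)
    # <Speaker id="..." name="..."/> entries inside <Speakers>
    speakers = {spk.get("id"): spk.get("name") for spk in root.iter("Speaker")}
    # every <Turn> carries its own start/end time and speaker id
    turns = [
        (float(turn.get("startTime")), float(turn.get("endTime")), turn.get("speaker", ""))
        for turn in root.iter("Turn")
    ]
    return speakers, turns
```

For the N4 example, this should return {"spk1": "DE_1_A2W"} and a single turn (0.0, 3.6006875, "spk1").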

The .trs format seems to be used by (and maybe even to have originated
with) the Transcriber software:

https://sourceforge.net/projects/trans/?source=typ_redirect

I would like to convert the .trs files to TextGrids, retaining as much
information as possible (at least the transcription of the speech
including times).

Here is the content of one example .trs file from the N4 database:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Trans SYSTEM "trans-13.dtd">
<Trans scribe="(unknown)" audio_filename="de_0001" version="2"
version_date="020227">
<Speakers>
<Speaker id="spk1" name="DE_1_A2W" check="no" type="male"
dialect="nonnative" accent="" scope="global"/>
</Speakers>
<Episode>
<Section type="report" startTime="0" endTime="3.6006875">
<Turn startTime="0" endTime="3.6006875" speaker="spk1" mode="planned"
fidelity="high" channel="studio">
<Sync time="0"/>

<Event desc="en" type="language" extent="begin"/>
this is alfa two whisky roger out
</Turn>
</Section>
</Episode>
</Trans>

For this file (and all others I've tried), the trans2praat.pl script
outputs an empty TextGrid.
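For completeness: if none of the ready-made converters fit, the structure of the example above is simple enough to convert directly. The following minimal Python sketch (standard library only; the function names are hypothetical and it is not taken from any of the tools discussed in this thread) treats each <Sync> as the start of a new interval that runs until the next <Sync> or the end of the <Turn>, and writes one interval tier per speaker in Praat's long text format. It assumes single-speaker, non-overlapping turns starting at time 0, and it simply drops <Event> markup such as the language tag in the example.

```python
import xml.etree.ElementTree as ET


def _clean(pieces):
    """Join text fragments and collapse whitespace and line breaks."""
    return " ".join(" ".join(pieces).split())


def trs_segments(root):
    """Yield (tier name, start, end, text) for each Sync-delimited stretch of a Turn."""
    names = {s.get("id"): s.get("name") or s.get("id") for s in root.iter("Speaker")}
    for turn in root.iter("Turn"):
        spk = turn.get("speaker", "")
        # Assumption: one speaker per turn; overlapping turns ("spk1 spk2") keep the raw ids.
        tier = names.get(spk, spk or "no speaker")
        start = float(turn.get("startTime"))
        pieces = [turn.text or ""]
        for child in turn:
            if child.tag == "Sync":
                t = float(child.get("time"))
                if t > start:
                    yield tier, start, t, _clean(pieces)
                    pieces = []
                start = t
            # Event, Comment, Who and Background elements are dropped; their tail text is kept.
            pieces.append(child.tail or "")
        end = float(turn.get("endTime"))
        if end > start:
            yield tier, start, end, _clean(pieces)


def to_textgrid(root):
    """Return Praat long-format TextGrid text with one IntervalTier per speaker."""
    tiers = {}
    for name, start, end, text in trs_segments(root):
        tiers.setdefault(name, []).append((start, end, text))
    xmax = max((iv[1] for ivs in tiers.values() for iv in ivs), default=0.0)
    out = ['File type = "ooTextFile"', 'Object class = "TextGrid"', "",
           "xmin = 0", f"xmax = {xmax}", "tiers? <exists>",
           f"size = {len(tiers)}", "item []:"]
    for i, (name, ivs) in enumerate(tiers.items(), 1):
        filled, t = [], 0.0
        for start, end, text in sorted(ivs):
            start = max(start, t)  # assumption: clip any overlap within one speaker's tier
            if end <= start:
                continue
            if start > t:
                filled.append((t, start, ""))  # pause between turns -> empty interval
            filled.append((start, end, text))
            t = end
        if t < xmax:
            filled.append((t, xmax, ""))
        label = name.replace('"', '""')  # Praat escapes a double quote by doubling it
        out += [f"    item [{i}]:", '        class = "IntervalTier"',
                f'        name = "{label}"', "        xmin = 0",
                f"        xmax = {xmax}", f"        intervals: size = {len(filled)}"]
        for j, (a, b, text) in enumerate(filled, 1):
            text = text.replace('"', '""')
            out += [f"        intervals [{j}]:", f"            xmin = {a}",
                    f"            xmax = {b}", f'            text = "{text}"']
    return "\n".join(out) + "\n"
```

For a file on disk (the name de_0001.trs is only an assumption based on the audio_filename attribute), to_textgrid(ET.parse("de_0001.trs").getroot()) should give the TextGrid text, which can then be saved as UTF-8. For the N4 example, this yields a single tier named DE_1_A2W with one interval from 0 to 3.6006875 s containing "this is alfa two whisky roger out".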

Thanks for any help!


--
Dr. Antje Schweitzer
IMS Uni Stuttgart
0711-685 81376
http://www.ims.uni-stuttgart.de/~schweitz

Daniel Hirst
 

Antje

You might like to take a look at the SPPAS home page: http://www.sppas.org/

SPPAS is free, open-source software for aligning speech with transcriptions, but it also has a set of tools for converting between different formats, including .trs and Praat TextGrid.

best

daniel

PS: let us know if it works!!

On 15 May 2018, at 11:24, Antje Schweitzer antje.schweitzer@... [praat-users] <praat-users-noreply@...> wrote:

Thorsten Brato
 

Dear Antje,

As an alternative to SPPAS, you could also try ELAN, which can import from
Transcriber (http://www.mpi.nl/corpus/html/elan/ch04s03s01.html) and export
to Praat.

Best wishes,

Thorsten

____________________
Dr. Thorsten Brato
Universität Regensburg
Institut für Anglistik und Amerikanistik
Englische Sprachwissenschaft
Universitätsstraße 31
93053 Regensburg
Tel.: +49 941 943 3503

Robert Fromont
 

You could also try this Java program:

I wrote it quite a long time ago, but it seems to still work.

Good luck!
Robert.

Antje Schweitzer
 

Dear praat users,

thanks for your support regarding my question on how to best convert
.trs files to .TextGrid files! Here's a summary of the options that were
suggested to me and that I tested. I tested them on .trs files from the
N4 (NATO Native and Non Native) database only, so no guarantee that they
work for other .trs files!

Option 1: SPPAS http://www.sppas.org/.

A GUI-based tool for annotation, analysis and more. It supports import
and export of a variety of formats (TextGrid, PitchTier, IntensityTier,
ELAN files, Annotation Pro files, Phonedit files, Sclite files, HTK,
subtitles, CSV and TXT) and can also import ANVIL, XTrans and, most
importantly for me today, Transcriber .trs files.

The developer version can already handle the .trs files from the N4
database; the next stable version, 1.9.6, which includes this, is expected
within a few days. Thanks to Daniel Hirst for pointing me there, and
thanks to the developer, Brigitte Bigi, who very quickly adapted the tool
so that it now successfully converts all .trs files from the N4 database.

A big advantage of SPPAS for my use case: it can read all .trs files in
a directory in one go and export them all to .TextGrid in one go as well,
so there is no need to convert files one by one.

Option 2:

LaBB-CAT
https://sourceforge.net/projects/labbcat/files/utilities/trs2grid.jar/download

Thanks to Robert Fromont for this hint (and for writing the tool, too).
It can be run as a GUI but, more conveniently for my use case, also as a
shell command that processes many files at once, like this:

java -jar trs2grid.jar *.trs

Worked like a charm for me, and the command line version is super easy
and convenient. No installation needed, just the .jar file to download.

Option 3:

ELAN http://www.mpi.nl/corpus/html/elan/

Thanks to Thorsten Brato for this hint. ELAN does indeed convert my .trs
files correctly. However, I am not sure whether it is possible to convert
many files at once. From what I saw, each .trs file has to be opened
together with the accompanying media file before it can be converted.
Opening many files at once did not work, but single files were exported
to .TextGrid correctly.
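Since the original problem with trans2praat.pl was that it silently produced empty tiers, it may be worth spot-checking the batch output of whichever tool is used. The following is a minimal Python sketch, not part of SPPAS, trs2grid or ELAN; the function names are hypothetical. It assumes Praat's long text TextGrid format with each label on a single line (short-format files and multi-line labels are not handled), and it accepts files saved either as UTF-8 or as UTF-16 with a byte-order mark.

```python
import re
from pathlib import Path

# Label lines in Praat's long text format: name = "..." (tier names),
# text = "..." (interval labels) and mark = "..." (point labels).
LABEL = re.compile(r'^\s*(name|text|mark) = "(.*)"\s*$')


def nonempty_labels(textgrid_text):
    """Map each tier name to its number of non-empty labels (same-named tiers are merged)."""
    counts, tier = {}, None
    for line in textgrid_text.splitlines():
        m = LABEL.match(line)
        if not m:
            continue
        key, value = m.groups()
        if key == "name":
            tier = value
            counts.setdefault(tier, 0)
        elif tier is not None and value.strip():
            counts[tier] += 1
    return counts


def is_empty(textgrid_text):
    """True if no tier contains any non-empty label."""
    return sum(nonempty_labels(textgrid_text).values()) == 0


def read_textgrid(path):
    """Read a TextGrid saved as UTF-16 (with BOM) or as UTF-8."""
    raw = Path(path).read_bytes()
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        return raw.decode("utf-16")
    return raw.decode("utf-8-sig", errors="replace")


def find_empty(folder):
    """List the .TextGrid files in folder that contain no labels at all."""
    return [p.name for p in sorted(Path(folder).glob("*.TextGrid"))
            if is_empty(read_textgrid(p))]
```

Calling find_empty on the output directory should list only the converted files in which every tier came out empty.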

That's it! Hope this is useful to some of you. Thanks for pointing me to
these tools!

Best regards,
Antje
