I can't tell for sure, but from the look of those screenshots the
lines do not actually end with a true " character. It appears that
the closing double quote that is supposed to terminate each quoted
string value has at some point been converted into a "curly quote"
or "smart quote" (”). The CSV parser does not see that character as
a proper terminator for the quoted string, and so it complains that
the quoted field has not been properly terminated. If you convert
the curly ” back into a normal " then it should load properly.
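As a rough sketch of that conversion, assuming the file is UTF-8 text and that no curly double quotes are wanted anywhere in the data, a few lines of Python could do the replacement before you run the populator. The function names and the file paths are placeholders of mine, not anything provided by GATE.

```python
# Map the common "smart" double quotes back to the plain ASCII double quote.
CURLY_DOUBLE_QUOTES = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
}


def straighten_quotes(text: str) -> str:
    """Return text with curly double quotes replaced by a plain double quote."""
    return text.translate(str.maketrans(CURLY_DOUBLE_QUOTES))


def straighten_file(src_path: str, dst_path: str, encoding: str = "utf-8") -> None:
    """Copy src_path to dst_path, straightening quotes on the way.

    Both paths are hypothetical; the UTF-8 default is an assumption about the file.
    """
    with open(src_path, encoding=encoding, newline="") as src:
        cleaned = straighten_quotes(src.read())
    with open(dst_path, "w", encoding=encoding, newline="") as dst:
        dst.write(cleaned)
```

Note that this replaces every curly double quote, including any that sit in the middle of a post, and a plain " in the middle of a quoted value would then need to be doubled to keep the file valid.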
"CSV" is a specific file format with
rules that the data must follow:
- Values are separated by the "column separator" (by default a
comma character)
- The "quote character" (by default double quote) must
be placed around any value that contains the column separator,
the quote character itself, or a line break. Items that do
not contain any of these may also be quoted but do not
have to be.
- Quote characters within a quoted string must be doubled
(e.g. this is a "quoted" example -> "this is a ""quoted""
example")
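To see these rules in action, here is a small illustration using Python's standard csv module. This is not the parser GATE uses, but with its default settings it follows the same conventions (comma separator, double quote as the quote character, doubled quotes inside a quoted value). The helper names are my own.

```python
import csv
import io


def to_csv_line(values):
    """Serialise one row with the default separator (comma) and quote character (")."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(values)
    return buffer.getvalue().rstrip("\n")


def from_csv_line(line):
    """Parse one CSV line back into its list of values."""
    return next(csv.reader([line]))


row = ["plain", "has, comma", 'this is a "quoted" example']
line = to_csv_line(row)
print(line)                 # only the values that need quoting are quoted
print(from_csv_line(line))  # parsing gives back the original values
```

The first value is written as-is, the second is quoted because it contains the separator, and the third is quoted with its inner quote characters doubled.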
If what you have is not really a "CSV file" that follows these
quoting rules but just a text file with one item per line and no
line breaks within a single item, then you should still be able to
import it using the CSV populator with a little trick. You would
need to change the column separator and quote character to some
obscure Unicode characters that are guaranteed not to appear
anywhere in any of the actual values, such as \uE100 and \uE101 (a
couple of random characters I've pulled from the "private use"
area of the Unicode table - the column separator and quote
character boxes accept these \uNNNN escape sequences). That way
the CSV reader will see each line of the file as a single "column"
and not get confused by mismatched quotes.
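If you want to convince yourself that the two placeholder characters really are unused, and to see the effect of the trick, something like the following sketch would do. It uses Python's csv module purely as a stand-in for the CSV populator's reader, and the function names and sample lines are made up for illustration.

```python
import csv

SEPARATOR = "\ue100"  # private-use character standing in for the column separator
QUOTE = "\ue101"      # private-use character standing in for the quote character


def lines_using_placeholders(lines):
    """Return the 1-based numbers of lines that contain either placeholder character."""
    return [n for n, line in enumerate(lines, start=1)
            if SEPARATOR in line or QUOTE in line]


def read_one_column(lines):
    """Parse lines so that each one becomes a single value, commas and quotes included."""
    reader = csv.reader(lines, delimiter=SEPARATOR, quotechar=QUOTE)
    return [row[0] for row in reader if row]


# Made-up sample: one line ending in a curly quote, one with a stray plain quote.
sample = ['"first post, with a comma\u201d', 'second post with a stray " quote']

print(lines_using_placeholders(sample))  # an empty list means the trick is safe
print(read_one_column(sample))           # each line comes back whole
```

Because neither placeholder occurs in the text, the reader never splits a line or opens a quoted field, so commas and mismatched quotes pass through untouched.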
Ian
On 25/09/2022 07:42, bluesunny wrote:
Hello Greenwood,
As you told me, I tried 'Populate from CSV file' on my
corpus and got an error message like below.
So, I tried changing 'Quote Character' from " to ' and
succeeded in converting multiple rows of my csv file to
multiple separate documents, but some of the text is
missing after the comma like this.
This is the whole text of the row that I mentioned in my
csv file. As you can see, the content is missing after the
first comma.
Can you suggest any solution for this? I found out that all
the text rows in the csv file ended with " but have no idea
how to solve this problem.
Thank you very much for your help.
Sincerely yours,
Yes, if you create a new document (i.e. right click on
language resources and choose to make a new document) then
you will always end up with a single document. If you want
to take a single file and produce multiple documents then
you need to first create a corpus, and then use a
populator of some form (populators usually turn up on the
right click menu of the corpus). The only one I know of
that will work with the data you have is the CSV
populator, as it specifically has an option for
creating one document per row.
Mark
On 22/09/2022 08:29, yr Noh wrote:
Dear Greenwood,
Thanks a lot for your reply.
Actually, I converted the 'csv' file to 'xlsx'
format, loaded that file as an LR and made a
corpus with it.
Each row (individual post) is distinguished
like this.
Is it necessary to use the 'Format: CSV' plugin if
I want to turn each row into a separate document?
Sincerely yours,
YR
I'm not sure how you loaded the CSV file into
GATE but if you use the CSV populator from the
"Format: CSV" plugin, then you can specify that
each row should be used to create a separate
document. You can find full details of how to do
this in the manual: https://gate.ac.uk/userguide/sec:creole:csv
Hope that helps,
Mark
Hi,
My project is about developing an ontology and
processing annotation on social postings using the
ontology.
I crawled hundreds of social posts into one csv file
and proceeded to annotate them using the OntoRoot
gazetteer.
I want to analyze each social post (each row)
individually, but GATE Developer seems to
recognize the csv file as a single document.
Is there any way to separate the rows in the csv file?
Do I have to upload each single post as an LR
individually? (so time consuming..)
Please help me with this problem.
Thank you!
Sincerely,
YR
--
Ian Roberts | Department of Computer Science
i.roberts@... | University of Sheffield, UK