Add-on issue - Getting text from a PDF

Stefano Bringhenti
 

Hello,

I am writing to ask for advice on getting text from a PDF file. Specifically, I would like my add-on to get and change the spoken text of a PDF file while the user navigates it with the arrow keys. The add-on works perfectly in any text editor: it suppresses the default speech, then uses "event_caret" and a function that gets the text at the caret position, modifies it, and speaks it. The same approach does not work for a PDF file, because the caret does not seem to move at all when the arrow keys are used. I think the problem could be solved by using something other than the caret for PDF files, but I do not know how to change the add-on so that it also works for PDF files. If anyone knows how to solve this problem, or knows of another add-on with a similar aim, please write back.
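
To make the question concrete, here is a minimal sketch of the get, modify, speak flow described above, written so that the source of the text can be swapped. It assumes nothing about NVDA itself: `speak_modified`, `editor_line`, `pdf_line` and the `spoken` list are hypothetical stand-ins invented for illustration, not NVDA APIs.

```python
from typing import Callable, List


def speak_modified(get_text: Callable[[], str],
                   modify: Callable[[str], str],
                   speak: Callable[[str], None]) -> str:
    """Fetch text from any source, modify it, then pass it to the speaker."""
    text = modify(get_text())
    speak(text)
    return text


# Hypothetical stand-ins so the sketch runs outside a screen reader.
spoken: List[str] = []


def editor_line() -> str:
    # Stands in for "text at the caret position in a text editor".
    return "hello world"


def pdf_line() -> str:
    # Stands in for "text obtained from some other position in a PDF view".
    return "hello pdf"


# The same modification is applied regardless of where the text came from.
speak_modified(editor_line, str.upper, spoken.append)
speak_modified(pdf_line, str.upper, spoken.append)
print(spoken)
```

With this shape, only the `get_text` callable would need to differ between a text editor and a PDF document; the modification and the speaking step stay the same.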

Thanks in advance,
Stefano Bringhenti

Brian's Mail list account
 

The big issue with PDF files is how they were made and whether they are protected.
I'm not well versed in coding myself, but I have been displaying PDFs in the PDF reader from the WebbIE site. It is extremely simple, copes very well with unprotected, text-based, tagged PDFs, and lets you cut and paste the text into any editor you like.
However, if it's a picture of text then it won't work, and neither will the Adobe product. You need to OCR it, and even then you can end up with a reading order that runs left to right, top to bottom. If the reading order changes inside a document, the OCR will never know, so the result will be garbage.



Brian

bglists@...
Sent via blueyonder.
Please address personal E-mail to:-
briang1@..., putting 'Brian Gaff'
in the display name field.
Newsgroup monitored: alt.comp.blind-users

----- Original Message -----
From: "Stefano Bringhenti" <stefano.bringhenti.work@...>
To: <nvda-devel@groups.io>
Sent: Tuesday, November 05, 2019 11:55 AM
Subject: [nvda-devel] Add-on issue - Getting text from a PDF

James Scholes
 

Modifying speech sequences is what the speech dictionary is for. It supports regular expressions to allow you to adapt it to your more advanced needs. PDF documents are often rendered inside a virtual buffer, making what you're trying to do potentially difficult. Is there a reason the speech dictionary won't work?
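
As a rough illustration of what a regular-expression rule can do, here is a minimal sketch using Python's `re` module. It is not NVDA's implementation; the `RULES` list and the `apply_rules` function are hypothetical names chosen for this example.

```python
import re

# Hypothetical rules: (pattern, replacement) pairs, applied in order.
# They are illustrative only and are not taken from NVDA.
RULES = [
    (re.compile(r"\bPDF\b"), "P D F"),
    (re.compile(r"(\d+)\s*km\b"), r"\1 kilometres"),
]


def apply_rules(text, rules=RULES):
    """Apply each regular-expression rule to the text in turn."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(apply_rules("The PDF says 12 km."))
```

Rules of this kind cover pattern-based substitutions; changes that depend on counting or on context beyond a pattern match would need custom code instead.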

Regards,

James Scholes


Stefano Bringhenti
 

Thanks for your answers. Regarding the PDF, as far as I know the particular file I am working on should have the text that generates the document embedded in it, so there should be a way of getting that text (the problem is how). I will take a look at WebbIE to see whether it can help solve the problem. Concerning the speech dictionary, the problem is that the modifications I would like to make to the text cannot be expressed as regular expressions. I will see whether it is possible to work with the virtual buffer.

Stefano Bringhenti