Introducing our Google Summer of Code 2020 student: Shubham Jain


Noelia Ruiz
 

Hello:

In my opinion, the approach chosen in the current project is good. To me,
trying to recognize images and objects does not conflict with common or
less frequent interfaces, since an image can be associated with, or
contain, a long piece of text (which we could treat as a common interface
element, like a read-only edit box or a document), links, buttons (for
example, OCR can sometimes be used to activate the buttons of a dialog in
an inaccessible program), or even several of these kinds of controls at
once, such as image maps on web pages.
I think developers need to provide semantic information, and screen
readers can use it through known APIs and standards like HTML, to show
headings for example, so we can navigate with the h key.
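
To make that concrete, here is a minimal sketch (using the third-party BeautifulSoup library purely as an illustration; the markup is invented) of the semantic structure the h key relies on, and of how an image without alt text simply disappears from it:

from bs4 import BeautifulSoup

# Invented markup: two headings a screen reader user can jump between
# with the h key, one described image, and one undescribed image.
html = """
<h1>Annual report</h1>
<img src="chart.png" alt="Sales grew 12% year over year">
<h2>Results</h2>
<img src="logo.png">
"""

soup = BeautifulSoup(html, "html.parser")
for heading in soup.find_all(["h1", "h2", "h3", "h4", "h5", "h6"]):
    print(heading.name, heading.get_text(strip=True))
for img in soup.find_all("img"):
    # An img without alt text exposes nothing useful to assistive technology.
    print("image:", img.get("alt") or "<no alt text>")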
Of course images, like audio transcriptions and perhaps other elements,
need to be described correctly, and human, manual work is likely required
to produce an accurate result for screen readers to present. But that is
not always possible, so I think this project goes in the right direction;
it doesn't mean images stop being used as common interface elements.
Images, and the objects and text they contain, can live inside those
elements, so they are not something separate from common controls but a
subset of them, and their text can be recognized reasonably well, so
detecting the content of images seems right to me.
Regards




Rui Fontes
 

I agree! Such a technique, and perhaps an attempt to decipher the control name given by the developer, will make some programs much easier to use!


Rui Fontes




Marlon Brandão de Sousa
 

I would focus instead on recognition of common interface elements: buttons, checkboxes, label associations and so on.
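
A minimal sketch of that idea, assuming a generic object detector fine-tuned on labelled screenshots; the class list and the weights file below are hypothetical placeholders, not anything NVDA ships:

import torch
import torchvision
from PIL import ImageGrab
from torchvision.transforms.functional import to_tensor

# Hypothetical set of UI element classes the detector was trained on.
UI_CLASSES = ["background", "button", "checkbox", "edit", "label", "link"]

# Standard torchvision detector, assumed fine-tuned on screenshots;
# "ui_detector.pth" is a hypothetical weights file.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=len(UI_CLASSES))
model.load_state_dict(torch.load("ui_detector.pth"))
model.eval()

screenshot = ImageGrab.grab()  # capture the whole screen as a PIL image
with torch.no_grad():
    (prediction,) = model([to_tensor(screenshot)])

for box, label, score in zip(prediction["boxes"], prediction["labels"], prediction["scores"]):
    if float(score) >= 0.7:  # keep confident detections only
        left, top, right, bottom = (round(v) for v in box.tolist())
        print(f"{UI_CLASSES[int(label)]} at ({left}, {top})-({right}, {bottom}), score {float(score):.2f}")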


Although less glamorous for the end user, I believe screen readers will have to use this approach sooner or later, because nobody can keep up with the pace of technology. Accessibility will keep getting more broken, in this sense: the interval between one technology becoming mature in accessibility terms and being replaced by a newer, immature one keeps shrinking, while the time a new technology needs to become accessibility-mature stays the same or grows, given that more resources tend to be allocated to developing new things than to maturing current ones.


This is a market tendency and there is nothing we can do about it. Think about how accessibility and usability as a whole have declined on Apple systems because the marketing pressure to release new features keeps growing.


Today Microsoft is spending a lot of resources on accessibility. This has made life easier than ever for screen readers on Windows, but who knows how long it will last. It might be forever; it might be six months before the company redirects its efforts to other priorities. What if a rival company appears and starts pressuring Microsoft over new Windows features, just as marketing is moving faster and faster in the mobile arena?


The fact of life is that the only thing we can assume will always be maintained is the visual interface for sighted people. It will never become inaccessible to the sighted, for obvious reasons, and my understanding is that visual interfaces are standardized enough to be recognizable: a button looks much the same in Qt, GTK, Win32, Windows Forms, or exposed through a remote desktop session, because people must be able to recognize it as a button, and when clicked it behaves like a button. If sighted people can recognize it as a button, then so should image-recognition AI; unless screen readers adopt an AI approach, they won't be able to survive in the long run.


Of course this doesn't solve every possible problem; system focus, context information, OS events and the like wouldn't be covered. But at least one could spend more effort on scripts that correlate things, rather than on querying apps to extract descriptions of visual elements, which ultimately depends on developers who, as history shows, are usually unable to keep up in time, whether for lack of knowledge, resources or will.




Joseph Lee

Hi,

His name is Oliver and his email address is oliver.edholm@.... Also, if you want to introduce yourself to the add-ons community, the NVDA Add-ons list can be found at:

https://nvda-addons.groups.io/g/nvda-addons

 

I advise talking to Oliver first so he can give you some feedback about the proposal.

Cheers,

Joseph

 




Shubham Jain
 

Hello again Joseph,

I would love to talk to the author of the add-on! I would be very grateful if you could connect me to him.
Thank you so much!

regards,
Shubham Jain


Joseph Lee

Hi,

I see. In this case, would you like to talk to the author of this add-on (unless you have already) to get some advice and early feedback? If yes, I'll forward this conversation to the NVDA Add-ons list.

Cheers,

Joseph

 



Shubham Jain
 

Hello Joseph!

Image Describer is a big inspiration for this project.


Shubham Jain
 

Hi Pawel!

Yes, Chrome already supports getting image descriptions, and it is amazing! Given the resources companies like Google have at their disposal, our models cannot be as good, but I think this is a step in the right direction, and we can always improve the implementation later as computation gets cheaper. Besides, it is best not to rely on third-party services, as their terms might change. With a built-in implementation, we can provide this function not just in Chrome but in other browsers, and in local photo-viewing apps!
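
As a rough illustration of what a local, service-free pipeline can look like (a sketch using a pretrained BLIP model from the Hugging Face transformers library as a stand-in; it is not the model this project has committed to):

from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Downloaded once, then run locally; no third-party web service involved.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("photo.jpg").convert("RGB")  # any local image
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))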


Joseph Lee

Hi,
Image Describer used to be a useful NVDA add-on for this purpose until things changed.
Cheers,
Joseph




Pawel Urbanski
 

If only Chrome did not have this feature already. You just tell it to
automatically get descriptions for images that have no alt attribute
specified...




enes sarıbaş
 

Hello Shubham,

I believe that even a few covered cases could account for all or most of the captchas outside Google that offer no audio option. Such a feature would therefore make the lives of NVDA users even better.

I wish you the best on your project.

Enes



Shubham Jain
 

Hello Enes,

Thank you for your interest in my project!
There has been a lot of research and work on solving captchas using ML techniques, and it is certainly possible. However, these models are specific to the types of captchas they were trained on and will not work with other types. I understand that having it work in only a few specific cases is still a big advantage. Hopefully, Google's reCAPTCHA v3 will do away with this problem entirely, as it does not rely on any explicit user input.
Yours is a brilliant idea nonetheless and definitely one to look into!

regards,
Shubham Jain
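
To make that type-specificity concrete, here is a minimal sketch: generic OCR (the pytesseract wrapper around Tesseract) can sometimes read the simplest text captchas, and fails on any style it was not built for, which is exactly the limitation described above:

from PIL import Image
import pytesseract

captcha = Image.open("captcha.png").convert("L")  # load and convert to greyscale
# Crude binarization to strip a light, noisy background; this already
# only works for the simplest captcha styles.
captcha = captcha.point(lambda p: 255 if p > 140 else 0)
print(pytesseract.image_to_string(captcha, config="--psm 8").strip())  # single-word mode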


enes sarıbaş
 

Hello Shubham,

Your project is very interesting and will be a game changer for NVDA users' access to information. I apologize if I am being too overeager, but I would like to know whether such an image recognition system could be used to solve captchas that cannot be solved today because no audio cues are available.




Shubham Jain
 

Hello Noelia!

I am glad that you find my project interesting :)
It will certainly be a challenge to detect all images in browsers and to generate useful, correct descriptions for them. I hope this project turns out to be as useful and comfortable to use as the existing features.
The links you have shared are very helpful, thank you so much!

Looking forward to more collaboration with you.

regards,
Shubham Jain


Shubham Jain
 

Thank you for the introduction Reef!

I am very excited to be working on this project and getting to know the community better! As Reef mentioned, you can find details about the project at the above link or you can just contact me.

The ultimate goal of this project is to help and benefit the community and users, and so, I would love any and all feedback, tips and guidance you might have to offer!
It will not be possible for this project to be a success without your input!

Looking forward to working with all of you!

regards,
Shubham Jain


Noelia Ruiz
 

Hello Shubham, I am a Spanish user of NVDA and your project is really
interesting. In my opinion, NVDA's interface (the possibility of using
customized gestures to display messages in different ways, like braille
and speech) is comfortable and useful, and knowing the meaning of images
is very important. Of course, they should be described by people in most
cases.
A possible challenge may be detecting images that have no alt text, so
that we can request a description for them via NVDA.
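
As a rough sketch of the NVDA side of that idea (it assumes the 2020-era NVDA add-on API; the describeImage backend is a hypothetical stub, not part of NVDA):

import api
import controlTypes
import globalPluginHandler
import ui
from scriptHandler import script


def describeImage(obj):
    # Hypothetical backend: would capture the object's screen rectangle
    # and run the recognition model on it. Stubbed for illustration.
    return "no description available yet"


class GlobalPlugin(globalPluginHandler.GlobalPlugin):

    @script(gesture="kb:NVDA+shift+d",
            description="Describe the current navigator object if it is an image without alt text")
    def script_describeImage(self, gesture):
        obj = api.getNavigatorObject()
        if obj.role != controlTypes.ROLE_GRAPHIC:
            ui.message("Not on an image")
        elif obj.name:
            ui.message(obj.name)  # the author already provided a description
        else:
            ui.message(describeImage(obj))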
I followed a course about web accessibility recommended by Quentin in
the In-Process blog of NV Access, and there I learned about WAVE, an
extension for web browsers which can report the presence of images not
detected by screen readers (when they don't have alternative text):
https://wave.webaim.org/extension/

I had this difficulty using a wonderful add-on created some time ago (no
longer maintained, as far as I know) by Larry Wang:
https://addons.nvda-project.org/addons/onlineOCR.en.html
I hope you enjoy it, and thanks for your interest in these projects.





Akash Kakkar
 

Welcome Shubham.





Reef Turner
 

Hi all,

On behalf of NV Access I would like to officially welcome Shubham Jain to the NVDA project, as our Google Summer of Code 2020 student.

Between now and the end of August, Shubham will be working on "Image captioning and Object recognition modules for NVDA". You can read about NV Access on Google Summer of Code, and Shubham's project abstract, at https://summerofcode.withgoogle.com/projects/#6039693356957696

Please join me in making Shubham welcome!

Regards,
Reef.