Re: Introducing our Google Summer of Code 2020 student: Shubham Jain

Noelia Ruiz


IMO, the approach chosen in the current project is good. For me,
trying to recognize images and objects does not conflict with
recognizing common or less frequent interface elements, since images
can be associated with or contain long pieces of text (which we could
treat as a common interface element, like a read-only edit box or a
document), links, buttons (for example, OCR can sometimes be used to
activate the buttons of a dialog in an inaccessible program), or even
several of these kinds of controls, as in image maps on web pages.
I think that developers need to provide semantic information, and
screen readers can use it through known APIs and standards like HTML,
for example to expose headings so that we can navigate with the H key.
Of course images, like audio transcriptions and maybe other elements,
need to be correctly described, and manual human work is likely
required to produce an accurate result to present in screen readers.
But this is not always possible, so I think this project is going in
the right direction. Recognizing images does not mean that images are
not used as common interface elements. I think images, and the objects
and text in them, can be contained in these elements, so this is not
something separate or different from common elements but a subset of
them. Text is also better recognized, so for me detecting the content
of images is the right choice.

2020-05-09 17:09 GMT+02:00, Rui Fontes <rui.fontes@...>:

I agree! Such a technique and, perhaps, an attempt to decipher the
control name given by the developer will make the use of some programs much

Rui Fontes

At 12:56 on 09/05/2020, Marlon Brandão de Sousa wrote:

I would focus instead on recognizing common interface elements, for
example buttons, checkboxes, label associations, etc.

Although this is less glamorous for the end user, I think screen
readers will have to use this approach sooner or later, because nobody
can keep up with the pace of technology. Accessibility will be
increasingly broken, because the window in which a technology is both
current and accessible keeps shrinking: the time between a technology
becoming mature in accessibility and being replaced by newer, immature
technology gets shorter each time, while the time a new technology
needs to become mature in accessibility stays the same or grows, given
that more resources tend to go into developing new things than into
maturing current ones.

This is a marketing trend and there is nothing we can do about it.
Think about how accessibility and usability as a whole have decreased
in Apple systems because the pressure to release new features, imposed
by marketing, keeps growing.

Today Microsoft is spending a lot of resources on accessibility. This
has made life easier than ever for screen readers on Windows, but who
knows how long this will last. It might be forever, or it might be six
months before the company redirects its efforts to other priorities.
What if another company arises and starts putting pressure on
Microsoft to deliver new things for Windows, just as the market is
moving faster and faster in the mobile arena?

The fact of life is that the only thing we can assume will always be
taken care of is the visual interface for sighted people. These
interfaces will never become inaccessible to the sighted, for obvious
reasons, and my understanding is that they are standardized enough to
be recognizable (a button looks relatively the same in Qt, GTK, Win32,
Windows Forms, or on a remote desktop screen): people can recognize it
as a button, and when it is clicked it behaves like a button. If
sighted people can recognize it as a button, then so should image
recognition AI. Unless screen readers start to use an AI approach,
they won't be able to survive in the long run.

Of course this doesn't solve all the possible problems: system focus,
context information, OS events, and so on wouldn't be covered. But at
least one could focus more on scripts that correlate things than on
querying apps to extract visual element descriptions, which ultimately
depends on developers who, history shows, are usually unable to keep
up in time, whether because they lack knowledge, resources, or will.

On 05/05/2020 15:49, @ShubhamJain wrote:
Thank you for the introduction Reef!

I am very excited to be working on this project and getting to know
the community better! As Reef mentioned, you can find details about
the project at the above link or you can just contact me.

The ultimate goal of this project is to help and benefit the
community and users, and so, I would love any and all feedback, tips
and guidance you might have to offer!
It will not be possible for this project to be a success without your

Looking forward to working with all of you!

Shubham Jain
