Blindness and AI - Opportunities and Risks

Welcome to the presentation "Blindness and Artificial Intelligence". First of all, thank you for the opportunity to speak here. My name is Domingos de Oliveira. I have been blind since birth and have only a small amount of residual vision. For several years, I have been working in the field of digital accessibility – that is, I advise, provide training, and test applications and systems. I worked freelance for many years and am now a permanent employee at adesso. Over the years, this has given me excellent access to technology – both from the perspective of a digital accessibility expert and from the personal perspective of someone affected.

Naturally, the advances in this area are particularly exciting for me. Firstly, because I support them professionally, and secondly, because I myself rely on assistive devices on a daily basis. Therefore, I follow with great curiosity what developments are taking place and what improvements they can bring to me, both in my professional and private life. For this very reason, I am delighted to be able to speak about this topic today.

I have brought three topics with me:

  1. What is already possible today.
  2. What problems and challenges still exist – especially with regard to developments in the last year.
  3. What we can do to further improve the situation.

What works today

I would like to start with some examples of what already works quite well and is already used regularly by many blind people.

A good example is the automatic generation of image descriptions. This technology has been around for a while – Facebook and Instagram introduced it early on, as did Microsoft Office and Chrome. For a long time, however, the results ranged from mediocre to simply unusable, and it couldn't seriously be recommended to anyone.

This has improved significantly. Today, various systems – such as ChatGPT or other models – allow you to upload images. These can then be described in detail upon request, often in much greater detail than before. I find the option to ask follow-up questions particularly appealing: using the chat function, you can request specific information, so you are no longer limited to a rigid, predefined image description.

There are many areas of application for this:

  • In everyday life, for example, via social media or messaging services like WhatsApp, when you receive images and don't know what they show.
  • In a professional environment, when you receive infographics, presentations, or diagrams. While many colleagues are willing to describe content, they often don't provide the level of detail you need, and follow-up questions aren't always welcome. This is precisely where the new technology opens up a great opportunity to access information independently and flexibly.
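To give a concrete idea of what this looks like in practice, here is a minimal sketch of sending an image together with a follow-up question to a vision-capable model. It assumes the OpenAI Python SDK and an API key; the model name, file name, and prompts are purely illustrative and not part of the presentation itself.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes an API key in the environment (OPENAI_API_KEY)

# Read the received image and encode it for upload.
with open("whatsapp_photo.jpg", "rb") as f:  # illustrative file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

image_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Please describe this image in detail for a blind user."},
        {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
    ],
}

# First request: a general, detailed description of the image.
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative choice of a vision-capable model
    messages=[image_message],
)
description = response.choices[0].message.content
print(description)

# Follow-up question in the same chat: ask for exactly the detail you need,
# instead of relying on a rigid, predefined description.
follow_up = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        image_message,
        {"role": "assistant", "content": description},
        {"role": "user", "content": "Is there any text visible in the image? If so, read it out."},
    ],
)
print(follow_up.choices[0].message.content)
```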

Challenges

Let's look at some problems and challenges.

A central issue is data handling. In many AI applications, we don't know exactly what happens to the information we enter. For example, we know that chats are sometimes used to further train models. Images are probably also uploaded and stored – where exactly and for how long remains unclear. Is this data permanently stored? Is it analyzed in a personalized way? None of this is transparent.

This becomes particularly tricky when sensitive content is involved – such as private photos, pictures of children, or very personal information. For example: Do I really want an AI system to know whether I have a positive coronavirus test? Or which private pictures I am looking at? These questions show how far-reaching the problem of missing data sovereignty is.

A second problem concerns the reliability of the answers. AI systems tend to formulate results as if they were unequivocally correct – even if they are incorrect. This may be easier to verify for sighted users. For blind people, it's more difficult: I can't easily determine whether information is correct, completely wrong, or somewhere in between.

That's risky. If I knew that information was only 15% reliable, I wouldn't rely on it. But if answers sound like they're 80–90% accurate, I'm more likely to believe them—and that can lead to false conclusions.

Another example that's now working quite well is the description of surroundings. This has been around for several years: You take a photo, and a few seconds later, you receive a description. From what I've observed, it's usually been quite useful.

What's new, however, is that such descriptions are now possible in real time. A current example is the collaboration between Meta and Ray-Ban: Together, they have developed smart glasses that provide information about the surroundings while the user is wearing them. They are relatively affordable and are already being used by many blind people in the US – whether for walks or everyday activities.

The potential is huge: Those who travel alone a lot can benefit enormously. While sighted friends or companions can provide descriptions, this support is lacking when traveling alone. Glasses that do this would therefore be a valuable aid.

But here, too, the question remains: Is the information correct? For example, I read a report from a tester whose glasses told her she was walking in the road – even though she was actually on the sidewalk. Such errors are not only confusing in everyday life; in the worst case, they are dangerous.

Of course, we don't know exactly why errors occur in such cases. It's probably because the AI was primarily trained on US image data – sidewalks and streets there sometimes look different than they do here. This can lead to misunderstandings, which in the worst case can even be life-threatening if one relies too heavily on such information.

Another problem is the delay in processing. The glasses themselves have limited computing power so that they remain lightweight and portable. The image data is therefore first sent to a server on the internet, processed there, and then returned. Even if this often takes only a few seconds, it can be critical in everyday life.

Imagine: I'm walking down the street and inadvertently step onto the road – and the glasses only tell me three seconds later. In an emergency, that may already be too late. The same applies to obstacles: If something suddenly appears in my path, I need the information immediately, not after a delay.
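To illustrate why even a few seconds matter, here is a small back-of-the-envelope sketch of a cloud round trip. All numbers are purely illustrative assumptions, not measurements of any specific product.

```python
# Rough latency budget for a cloud-based scene description (illustrative numbers only).
latency_s = {
    "capture_and_encode": 0.3,   # taking the photo and compressing it on the glasses
    "upload": 1.0,               # sending the image over a mobile connection
    "cloud_inference": 1.5,      # running the vision model on the server
    "response_and_speech": 0.5,  # returning the text and speaking it aloud
}

total = sum(latency_s.values())
print(f"Total delay: {total:.1f} s")
# ~3.3 s – at a typical walking speed of ~1.4 m/s, that is roughly 4–5 m
# travelled before the warning arrives.
```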

Of course, this is still better than no support at all. With training, it might even be possible to deal with such inaccuracies. But it shows that we are still a long way from a truly reliable solution.

A second point is data protection. As mentioned several times, the large foundation models run in the cloud. This means that all the data I enter – be it text, photos, or even sensitive content like medical records – is sent to external servers. What exactly happens to this data there is unclear. For some, this may not be a major concern, but others absolutely do not want to share such information. This is a major hurdle, especially for people who work with sensitive documents or very private photos. It would therefore be extremely important to make transparent what happens to the data – and to develop alternatives that offer more control and data protection.

A third issue is data bias. The large language and image models were not developed with the needs of blind people in mind. They were trained on vast amounts of data that served entirely different purposes. The fact that this accidentally creates functions that are useful for blind people—such as image descriptions—is more of a byproduct.

The problem: These descriptions do not necessarily meet the needs of blind users. An online shop, for example, might describe a product image as "white sneaker with red sole" – wording aimed at customers who can already see that it is a shoe. A blind person, on the other hand, first needs the information that it is a shoe at all. The same applies to attributions of people: Terms like "young" or "old" may seem harmless at first, but they are subjective and potentially error-prone.

These differences show that we need to think about standards: What does a good, accessible image or video description actually look like? And how can we ensure that AI systems adhere to these standards instead of outputting random, distorted information?

Another difficult aspect is the question: Who actually decides what is true? If an AI describes a person as "young" or "old," for example, that's a very subjective attribution. You might not want to receive such categories or ratings from a machine at all – or you might at least want to decide for yourself whether you agree with that assessment.

The fundamental problem remains: Blind people cannot independently verify the accuracy of information. You can listen to a description and judge its plausibility, but you can't determine with certainty whether it is correct. This removes an important foundation for so-called supervised learning, in which humans train AI through feedback: if I don't even know whether an image description is correct, I can't give the AI reliable feedback.

This always carries the risk of relying on false or inaccurate information – and deriving actions from it that can be problematic.

I see another problem in the assistive technology industry itself. Marketing for new products is often exaggerated or simply inappropriate. A lot is promised that doesn't work at all in reality – or only works to a very limited extent. This leads to blind people developing false expectations and ultimately being disappointed. I think more honesty and transparency are needed here. If boundaries are clearly defined, users can decide for themselves whether they want to try out a tool.

What can we do?

This brings me to the last point: What can we do?

In my view, it's crucial that AI models are specifically trained for the needs of blind people. It's not enough to take large, general models and then "tack on" a few additional functions. The needs are too specific to be addressed casually.

Of course, this is a challenge: There probably isn't as much data available for this area as in other application areas. But if we want to develop truly useful systems in the long term, there's no way around taking these needs seriously and incorporating them specifically into development. Another important point is the question of where the models run.

In my view, AI systems should run locally on the end device – that is, on the user's smartphone or computer – as much as possible. This would have two advantages: First, latency is reduced because no internet connection is required. Second, the data stays on the device because processing happens locally, so users retain significantly more control over their own data.

One example of this is Apple: With "Apple Intelligence," they are taking precisely this approach and shifting many AI functions to the iPhone itself. Of course, the computing capacity there is limited, but smaller models can already be run locally—and this development should be further encouraged.
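As a rough idea of what on-device processing can already look like, here is a minimal sketch using a small open captioning model that runs entirely locally after the initial download. It assumes the Hugging Face transformers library and the BLIP captioning model; both are just one possible choice, and the file name is illustrative.

```python
from transformers import pipeline
from PIL import Image

# Load a small image-captioning model once; after the initial download,
# everything runs locally on the device – no image ever leaves it.
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

image = Image.open("surroundings.jpg")  # illustrative file name
result = captioner(image)

# The pipeline returns a list of dicts with a "generated_text" field.
print(result[0]["generated_text"])
```

The output of such a small model is, of course, far less detailed than that of a large cloud model – exactly the trade-off between latency and privacy on the one hand and quality on the other.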

Furthermore, it would be very helpful if AI systems could assess and communicate how certain they are of their own answers.

If an AI says, for example, "There's a 90% probability that this is green, and a 10% probability that it could also be blue," then I, as a blind person, can deal with it much better. I then know how high the uncertainty is and can decide whether to rely on the information, consult a second source, or ask a sighted person for help. This transparency would be a huge benefit.
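What that could look like technically: if the underlying classifier exposes its raw scores, they can be turned into probabilities and announced together with the answer instead of a bare claim. The following is a minimal sketch with made-up scores; real systems would first have to expose this information at all.

```python
import numpy as np

# Hypothetical raw scores (logits) from a colour classifier for one image region.
labels = ["green", "blue", "red"]
logits = np.array([2.8, 0.6, -1.5])  # made-up values for illustration

# Softmax turns the raw scores into probabilities that sum to 1.
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Announce the answer together with its uncertainty.
ranked = sorted(zip(labels, probs), key=lambda x: -x[1])
print(", ".join(f"{label}: {p:.0%}" for label, p in ranked))
# e.g. "green: 89%, blue: 10%, red: 1%"
```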

And finally, one point is particularly important to me: training and education for users.

AI doesn't work automatically for everyone. Many people – including blind people – are hesitant to engage with the technology. Sometimes they lack the knowledge, sometimes the experience, sometimes simply the confidence to use it. Therefore, targeted training courses should be offered that demonstrate how to use AI meaningfully and productively. This would help many people find out whether and how these technologies are useful for them.
