Screen readers and speech output - why is it so fast, and why does it sound artificial?
In my workshops, I run the screen reader in the background. On the one hand, I naturally want to know which slide I am currently on. On the other hand, it is also exciting for the participants to see how blind persons work on a computer at all. Participants almost always ask me why I set the speech output so fast. So here is the ultimate answer.
- What to read
- Hide known info
- Memory and pattern recognition
- Why do many speech outputs sound artificial?
- More on screen readers
What to read
Sighted readers usually read at a speed of 200 to 300 words per minute. There are significant outliers in both directions. Functionally illiterate people read much more slowly. Experienced readers such as journalists or professors read much faster.
In general, it is said that one must be able to read at least 150 words per minute in order to read with comprehension. 150 words per minute is also the average speed of speaking and reading aloud.
From this you can see that silent reading is much faster than speaking. Most of us would be bored if we read a text to ourselves at the speed at which we would read it aloud.
Blind persons are no different. As soon as they have some experience with computers, they turn up the default speed of their speech output. It doesn't have to be 100 per cent like mine, but I don't know of any experienced user who runs their speech output at normal human speaking speed.
In my tape days - cassettes are those plastic things with a funny tape spinning in them - I had a cassette recorder that let me control the playback speed. I had fun making my music sound like Mickey Mouse. It was also quite handy for audiobooks. Many speakers are relatively slow and monotone, so even the most exciting material can be soporific. I'd argue that you can play almost any audiobook 20-30 per cent faster without the listener being bothered or really noticing the difference. That turns a ten-hour audiobook into one of roughly eight hours. Many apps like Audible support faster playback. Probably so you can finish one audiobook faster and buy another one quickly.
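A quick check of the arithmetic behind faster playback, as a minimal sketch: playing something a given percentage faster divides the duration by one plus that percentage. The function name `listening_hours` is made up for this illustration.

```python
def listening_hours(original_hours: float, speedup_percent: float) -> float:
    """Return the listening time when playback is sped up by the given percentage."""
    # "30 per cent faster" means the playback rate is 1.3 times the original rate.
    return original_hours / (1 + speedup_percent / 100)


for pct in (20, 30):
    # Prints 8.3 hours for 20 per cent and 7.7 hours for 30 per cent.
    print(f"{pct} per cent faster: {listening_hours(10, pct):.1f} hours")
```

So a ten-hour audiobook shrinks to somewhere between about 7.7 and 8.3 hours in that range.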
Things look a little better with synthetic speech output. Even the best human speakers have imperfections when reading aloud. The changes in pitch and intonation are an annoying factor when the main thing is the information and I want to get through the text as quickly as possible. Moreover, human speakers themselves vary the speed at which they read aloud.
With most human speakers, the limit is an increase in speed of about 50 per cent. After that, you can still understand the speaker, but you have to concentrate enormously. And a lot is lost if you are distracted for a moment. It is also difficult when there are two speakers whose speaking speeds differ greatly.
For this reason, and also for the sake of comfort, I prefer to read non-fiction books with a screen reader on my PC. For experienced screen reader users it comes - according to my theory - quite close to visual reading, at least much closer than listening to audiobooks.
In my estimation, it is no problem to set the screen reader to 350 words per minute in your native language, i.e. somewhat faster than a good reader would read. It becomes more difficult if the content is very complex, contains a lot of new information or if you cannot concentrate. If the text is not in your native language, it is rather difficult to increase the reading speed much.
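To get a feel for what these rates mean in practice, here is a small sketch that converts words per minute into reading time. The book length of 60,000 words and the figure of 250 words per minute (the middle of the 200 to 300 range mentioned above) are assumptions chosen for illustration, and `minutes_to_read` is a made-up helper name.

```python
def minutes_to_read(word_count: int, words_per_minute: float) -> float:
    """Return how many minutes it takes to get through a text at a constant rate."""
    return word_count / words_per_minute


WORD_COUNT = 60_000  # assumed length of a non-fiction book, for illustration only

rates = {
    "reading aloud": 150,           # average speaking speed from the text
    "typical sighted reader": 250,  # assumed midpoint of 200 to 300
    "fast screen reader": 350,      # rate mentioned in the text
}

for label, wpm in rates.items():
    hours = minutes_to_read(WORD_COUNT, wpm) / 60
    print(f"{label}: {hours:.1f} hours")
```

Under these assumptions, the same book takes well under half as long at 350 words per minute as it does at speaking speed.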
Hide known info
One of the most annoying things for a blind computer user is having to listen to information they already have. For example, I am looking for a particular piece of information in a spreadsheet. A sighted person skims columns and rows without much trouble. A blind person, in the worst case, has to have all the values of a column or row read out to them. And the screen reader gets in the way because it reads out not only the value of each cell, but also the position of the cell and the heading of the area. Then "13 per cent" becomes "Column 13, row 9 Election result 2013 13 per cent". And until you find the cell, you have to listen to that a few times. That's not a barrier, but of course it's annoying. And if you have to listen to it, then please let it be over as soon as possible.
The same applies to many areas. Of course I want to know that something is a heading, a menu, a checkbox etc. But please not at everyday speaking speed. I would really go mad if I had to listen to all that in detail.
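The spreadsheet example can be sketched in a few lines to show how much the announcement grows. This is only an illustration: `announce_cell` is a hypothetical function and does not reflect the API or exact wording of any real screen reader.

```python
def announce_cell(value: str, column: int, row: int, header: str, verbose: bool = True) -> str:
    """Build the text spoken for one table cell (hypothetical, for illustration)."""
    if verbose:
        # Position and area heading are spoken before the actual value.
        return f"Column {column}, row {row} {header} {value}"
    return value


short = announce_cell("13 per cent", 13, 9, "Election result 2013", verbose=False)
full = announce_cell("13 per cent", 13, 9, "Election result 2013")

print(full)
print(len(short.split()), "words versus", len(full.split()), "words")
```

In this sketch the value alone is 3 words, while the full announcement is 10, and that overhead repeats for every cell you pass on the way to the one you want.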
Memory and pattern recognition
Last but not least, we often already know what the text says. I have given my presentations a thousand times. Since I know what the slide says, the first two words of the title are often enough for me to fill in the rest from memory.
Context helps us filter important information. If I hear "railway boss", the next word will probably be "Grube". That's why blind persons wish that officeholders and CEOs would never change. Otherwise they always have to get used to something new. A little joke on the side.
Pattern recognition also helps us, of course. Texts and websites are often structured according to a certain pattern. Typical, for example, is navigation, search field, content, third column and footer. Without this scheme, we would have to learn each website from scratch. The navigation items are also typically named in a similar way on many websites. I always get a big laugh during my screen reader demos when I come across some word I don't know and announce it to the group.
Why do many speech outputs sound artificial?
There is now a whole range of more natural-sounding voices. You hear them on the train or bus, on voice assistants or in telephone hold queues. And, of course, they also run on smartphones.
These speech outputs sound good if you speed them up moderately. At high speeds, however, they quickly become hard to follow. The overly correct intonation becomes a disadvantage here.
The "old" speech outputs, on the other hand, consist of artificially generated phonemes. They are also lightweight, which makes them relatively performant, i.e. they do not reach performance limits as quickly as the more natural-sounding speech outputs.
Don't get me wrong: Braille is certainly a good thing. And if you are skilled, you can certainly manage 100 words per minute. Especially for structured info like tables.
A Braille display with two dimensions would be a huge relief. But especially for frequent readers like me, 100 words per minute is not enough. Especially since I can't get close to that speed and probably never will. I'm at about 60 words per minute, and even that has taken a lot of practice.
As a blind person, you are already slower at many tasks at work. I don't need to take on this extra burden as well.