Optimizing Texts for Automatic Transfer into Plain Language or Easy Reading

The automatic transfer of texts into other formats will become increasingly important in the future. This includes both automatic translation into foreign languages and transfer into understandable formats such as simple or easy language. This text is about how texts can be designed from the outset so that they can be transferred automatically with better results. In the following, I speak of understandable language and mean both Easy Reading and Plain Language.

Why is automatic comprehensibility the future?

The automatic transfer into understandable formats is the future. As much as I value the work of my colleagues in comprehensibility and find their arguments correct, in my view there is no alternative. It is often argued that only people can understand texts - correct - and that they are therefore the only ones capable of producing good and appealing translations - wrong. Due to the way they work, AIs do not have to understand texts in the same way that people understand them. You can see that from the very good summaries that they can already create today.

I don't think the other frequently put forward argument that AIs would remove stylistic peculiarities is a bad thing. Very few information texts are read because of their style. We are usually talking about texts from administration, health or consumer protection, and with the best will in the world, no personal style is recognizable or relevant to the reader.

The experts do not have the ability to write texts in an understandable way. They lack both the time and the opportunity to gain experience in this area. That is why training courses for professions that do not involve professional writing are not useful in my view.

Translation by comprehensibility professionals is expensive and time-consuming when you consider that two or three rounds are often necessary before finalization. If, for example, there are specially trained models for the health sector, the experts can do what they do well: check the automatic transfers for content correctness. Few people know that there are specialized startups in individual areas: I got to know Health Tech in my current job, and I know Legal Tech from my work in the disability sector.

There is a fundamental contradiction in comprehensibility that cannot be solved: there is no such thing as the perfect form. Someone with a learning disability does not necessarily need easy language, nor does someone with a migration background. Illiterate people do not necessarily need the forms of understandable language that we would offer them. Ideally, every person should get exactly the format they need at that moment and exactly the information they need or want, and not something that we simply throw at them. However, it is impossible for providers to offer texts in multiple formats.

Another problem that the comprehensibility professionals remain silent about: while specialists have established themselves in foreign languages, this is not the case with understandable language, especially not in Germany. There are translators who specialize in texts from the medical or legal field. As far as I know, this specialization does not exist in understandable language. Here, people believe that anyone can make everything understandable. Of course, there are differences between translation into other languages and transfer into understandable formats. But in both cases, you have to 1. understand what the original says and 2. be able to translate the content correctly. In addition, you often have to add context to make it more understandable. For example, you sometimes have to explain what insulin does in the body. Just look at the nonsense that medical laypeople spread as influencers in their various formats to understand that we cannot do this without the appropriate expertise. AIs can do this better than any layperson because of the data they have been trained on. As I said above, I would use specially trained models for this, so that they do not invent nonsense or copy nonsense from laypeople.

A near future

There are writing assistants in software form that can already generate understandable texts. ChatGPT is already quite good, but special software for writing or specially trained models for translation can do much more. I think that there will soon be easy-to-use tools that automatically generate understandable texts or help with optimization.

The Microsoft Editor and the DeepL writing assistant are moving in this direction, but are still very superficial. Things like the length of sentences, words or paragraphs can already be optimized automatically. I believe that these tools will make suggestions for improvement, similar to today's spelling checks, which you can then accept or reject. They may then learn from these decisions, adapt to the organization or the person writing, and get better over time. This will enable non-experts to write texts that are easier to understand.

In many areas we are talking about highly standardized texts anyway. If someone has diabetes, for example, the doctor does not come up with a completely new strategy, but adapts existing treatment methods or medication to the patient. So they probably work with text blocks that could be made understandable beforehand. The blocks are then simply clicked together and handed to the patient.

Why should you optimize texts beforehand when the software does it for you?

Because you save work on proofreading afterwards. Chunking, i.e. breaking down information into small chunks, is particularly important for the software. Text structures such as lists and headings make it easier to recognize important information. Long, drawn-out sentences or long compounds make analysis more difficult for the software.
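Chunking can be illustrated with a few lines of Python. This is only a minimal sketch under my own assumptions: the function names `split_sentences` and `chunk_text` and the limit of three sentences per chunk are made up for illustration, and the sentence splitter is deliberately naive (it will stumble over abbreviations such as "i.e.").

```python
import re


def split_sentences(paragraph: str) -> list[str]:
    """Naively split a paragraph after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [part for part in parts if part]


def chunk_text(text: str, max_sentences: int = 3) -> list[str]:
    """Break a text into small chunks.

    Paragraphs (separated by blank lines) are kept apart, and any paragraph
    with more than `max_sentences` sentences is split into several chunks.
    """
    chunks = []
    for paragraph in re.split(r"\n\s*\n", text):
        sentences = split_sentences(paragraph)
        for start in range(0, len(sentences), max_sentences):
            chunks.append(" ".join(sentences[start:start + max_sentences]))
    return chunks


if __name__ == "__main__":
    sample = "A one. B two. C three. D four.\n\nE five."
    for chunk in chunk_text(sample, max_sentences=2):
        print(chunk)
```

A real tool would need a proper sentence splitter for the language in question, but the principle stays the same: small, clearly separated units are easier to process than one long block.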

I can imagine two parallel scenarios: We provide formats that are already understandable. Or the client's software automatically translates into understandable formats, just as you can now set the browser to translate all pages you call up into your native language. A good text structure is important for this.

Optimize texts for automatic transfers

The following rules seem useful to me for this purpose.

1. Use simple and clear language
Avoid complex sentence structures: Use short and concise sentences. Avoid nested sentences and unnecessary subordinate clauses.
Use clear terms: Choose clear and unambiguous words. Avoid synonyms and technical jargon if possible.

2. Consistency in vocabulary
Consistent terminology: Always use the same word for the same term to avoid confusion.
Provide definitions: If technical jargon is unavoidable, a clear definition should be provided.

3. Correct grammar and punctuation
Pay attention to grammar and spelling: Incorrect sentences can be misinterpreted by the translation tool.
Use standardized punctuation: Use common punctuation and avoid unusual punctuation that could distort the translation.

5. Clear structure of the text
Paragraphs and headings: Use paragraphs, headings and lists to structure the text logically and clearly.
Chronological order: Arrange information in a logical order to avoid misunderstandings.

6. Simple sentence constructions
Subject-verb-object structure: Use the simple sentence structure subject-verb-object to increase comprehensibility.
Active instead of passive: Use the active rather than the passive, as active sentences are more direct and easier to translate in many languages.

7. Avoid ambiguity
Clear pronoun use: Avoid unnecessary pronouns or make sure that the reference of the pronoun is clear.
Clear connections: Use clear connections between sentences, such as "because", "therefore", "so", to make the context clear.

8. Avoid slang and colloquial language
Formal language: Use formal and neutral language, as colloquial language is often difficult to understand and can be problematic when translating.
Avoid literary language and metaphors: Literary language, i.e. language that has to be interpreted by the reader, should be avoided. Metaphors often cannot be interpreted correctly by AIs.

9. Technical notes
Numbers and units of measurement: Write out numbers and units of measurement or use internationally understandable formats.
Avoid abbreviations: If abbreviations are necessary, explain them when they first appear.

10. Review and testing
Checking for content correctness by a suitable person will remain necessary. We all know the problem of hallucinating AIs. If they don't know something, they often make up something that sounds plausible. Only an expert can judge that.
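Some of these rules can be checked mechanically before a text is handed to a transfer tool. As an illustration, here is a rough sketch for the rule that abbreviations should be explained when they first appear. The heuristic is my own assumption, not an established method: an abbreviation is any word of two or more capital letters, and it counts as explained if its first occurrence either stands in parentheses after the long form, as in "World Health Organization (WHO)", or is directly followed by an explanation in parentheses, as in "AI (artificial intelligence)". The function name `unexplained_abbreviations` is hypothetical.

```python
import re

# Assumption: an abbreviation is a standalone run of two or more capital letters.
ABBREVIATION = re.compile(r"\b[A-Z]{2,}\b")


def unexplained_abbreviations(text: str) -> list[str]:
    """Return abbreviations whose first occurrence carries no explanation."""
    flagged = []
    seen = set()
    for match in ABBREVIATION.finditer(text):
        abbreviation = match.group()
        if abbreviation in seen:
            continue  # only the first occurrence matters
        seen.add(abbreviation)
        before = text[:match.start()]
        after = text[match.end():]
        # Case 1: "long form (ABC)"
        in_parentheses = before.endswith("(") and after.startswith(")")
        # Case 2: "ABC (long form)"
        followed_by_gloss = after.lstrip().startswith("(")
        if not (in_parentheses or followed_by_gloss):
            flagged.append(abbreviation)
    return flagged


if __name__ == "__main__":
    sample = (
        "The World Health Organization (WHO) publishes guidance. "
        "The WHO and the EU cooperate. AI (artificial intelligence) helps."
    )
    print(unexplained_abbreviations(sample))
```

Such a check only finds candidates. Whether an abbreviation really needs an explanation for the intended readers remains a decision for the person writing.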

What can be automated and what can't?

Two areas are particularly exciting. The first is applying automated rules: sentence length, word length and a few other rules can be checked fully automatically. The second is automated transfer.
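A check for sentence length and word length can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function name `check_text`, the default thresholds of 15 words per sentence and 12 letters per word, and the naive way sentences are split. Suitable limits depend on the language and the target group.

```python
import re


def check_text(text: str, max_words: int = 15, max_chars: int = 12) -> list[tuple[int, str, str]]:
    """Flag long sentences and long words.

    Returns tuples of (sentence number, kind of finding, detail).
    The thresholds are illustrative defaults, not established limits.
    """
    findings = []
    # Naive split: a sentence ends with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for number, sentence in enumerate(sentences, start=1):
        # Words are runs of letters; digits and punctuation are ignored.
        words = re.findall(r"[^\W\d_]+", sentence)
        if len(words) > max_words:
            findings.append((number, "long sentence", f"{len(words)} words"))
        for word in words:
            if len(word) > max_chars:
                findings.append((number, "long word", word))
    return findings


if __name__ == "__main__":
    sample = "This is short. This sentence contains an extraordinarily long word."
    for finding in check_text(sample, max_words=5, max_chars=10):
        print(finding)
```

The findings could be shown like spelling suggestions, which the person writing accepts or rejects.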

What hasn't worked so far - as far as I know - is the automatic restructuring of written texts. It is often the case that texts are poorly structured. Even if it sometimes seems that way: AIs don't understand our texts and won't do so in the foreseeable future. ChatGPT writes the following about how it summarizes texts:

1. Understanding meaning
Word meaning and context: An AI analyzes the meaning of individual words in the context in which they occur. For example, it can recognize that the word "bank" has different meanings in different contexts (e.g. as a financial institution or as a place to sit).
Semantic roles: The AI tries to understand who does what. If a sentence says "The cat is chasing the mouse," then the AI knows that the cat is performing the action and the mouse is the target.

2. Identifying main topics
Word frequency: Frequently used words or phrases in a text often indicate important topics.
Position-based cues: Information that is at the beginning or end of a text is often important. This is especially used in news articles or scientific texts.
Titles and headings: These often give a clear indication of the main topics.

3. Recognizing connections and priorities
Coherence and relationships: The AI analyzes how different parts of the text are related to each other to find out what is important for overall understanding.
Avoiding redundancy: The AI tries to summarize similar or repeated information and emphasize only the new or most important points.

As has been stated many times, AIs fail completely when it comes to literature and poetry. AIs also do not understand allusions, which means that passages containing them cannot be automatically translated into another language or made more understandable.

AIs fail when it comes to contexts because they break down texts. If person X is introduced in a halfway reasonable way on page 10 and then picked up again on page 23, most people will remember that person. The AI will not do that.

Let me be clear: AIs do not understand what we tell them, what they themselves say, or anything else. They are statistical models that have been trained on vast amounts of data. Personally, I don't care; what is important is what comes out at the end. Let's be honest: we didn't understand what our software does before, and we will understand it less and less. We don't need to care, just as we are not interested in the chemical and physical reactions when cooking, as long as it works and tastes good. In relation to our topic, this means that we must be aware of the possibilities and limitations of AI. It would be a mistake to leave everything to AI without checking it. But it would be an equally big mistake to do nothing at all.

Conclusion

Despite all these weaknesses, should we use writing assistants? My clear answer is yes. Think of assistants as tools. You can write texts by hand, on a typewriter or in a text editor, but you usually do it in a word processor. An assistant is also a tool, whether it says AI on it or not is irrelevant. You probably also use the spell checker, which is not perfect either.

The fact is that we will never get to the point where we can convert even a relevant part of our texts into understandable language using the manual methods of yesterday. We lack the people, the money and the time.

The future of automatic translation of language into understandable formats is uncertain. However, I am pretty sure that in the not too distant future it will be possible to set preferred levels of intelligibility for automatic translations into understandable language either in the operating system or in the browser, just as we can define German as our native language today and receive more or less successful automatic translations from foreign languages. Edge and Chrome already offer such translation settings free of charge and with only a small time delay. Firefox also offers automatic translation, but the quality is more reminiscent of the late 90s.

Why is it important that this happens on the client side? There is a user-oriented answer: people should be able to decide for themselves what they want to be understandable and not be dependent on the goodwill of the providers. And the legal argument: when we publish texts on the web, we are responsible for their content, even if they are automatically translated into something, as long as we provide this translation service. If, however, the user uses a tool, they are responsible for checking the correctness of the translation. It sounds as if I wanted to transfer responsibility. But in my view, there is no other way. I cannot provide several versions and rely on the reader finding the right version for themselves.
