Semantics and Machine Readability for Accessibility

Tim Berners-Lee described the idea of a semantic web very early on: information should be marked up in such a way that machines or programs can read it and reassemble it. That goes a little in the direction of artificial intelligence.

However, the semantic web is based on meta information, so programs don't "understand" what the content is about. The meta information enables them to draw their own conclusions, no more, but also no less. The keyword here is information retrieval.

Metadata and semantic markup
Hierarchy and relationships
Microdata and XFN
The Semantic Web
Schema.org
The opportunities for accessibility

Metadata and semantic markup

Metadata is information about information. The simplest form is the HTML tags themselves. For example, the h1 tag says that a piece of text is a level 1 heading. The markup <span class="h1"> only says that this is a span with the class h1; in other words, it carries no relevant meaning. Tags designed purely for presentation, such as b (bold) and i (italic), are still used but are frowned upon. Instead, strong is used, for example, and its appearance can be defined via CSS. So strong indicates that a text is particularly important, but does not say how it should be displayed.
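To illustrate why the real tag matters, here is a minimal Python sketch (standard library only) that builds a heading outline the way an assistive tool might. It is a simplified toy under stated assumptions, not how any particular screen reader works: it only looks at h1 to h6 elements, so the span styled as a heading never appears in the outline.

```python
from html.parser import HTMLParser

# Heading tags and the outline level each one conveys.
HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class OutlineParser(HTMLParser):
    """Builds a document outline from real heading elements only."""

    def __init__(self):
        super().__init__()
        self.outline = []    # list of (level, heading text)
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lower case.
        if tag in HEADING_LEVELS:
            self._level = HEADING_LEVELS[tag]
            self._text = []

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS and self._level is not None:
            self.outline.append((self._level, "".join(self._text).strip()))
            self._level = None

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)


def outline(html):
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline


page = """
<h1>Semantic markup</h1>
<span class="h1">Styled to look like a heading</span>
<h2>Metadata</h2>
<p>Some <strong>important</strong> text.</p>
"""
print(outline(page))  # [(1, 'Semantic markup'), (2, 'Metadata')]
```

The span may look identical on screen once CSS is applied, but to the program it is just text.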

There is also the metadata in the head of the HTML file: description, keywords and language, to name just a few of the better-known ones. In the past, these metadata proliferated so wildly, not least because of spamming, that they became hardly usable. Since websites today are mainly created with content management systems, hardly anyone cares about the information in the head. The Dublin Core standard is an attempt to bring order to the chaos. Most current document formats, such as DOC, ODF or PDF, also let authors enter information such as keywords, author and title. However, this has only really caught on with MP3, in the form of ID3 tags.

Search engines such as Google occasionally display the meta description as a snippet on their result pages. Apart from that, only the title tag plays an important role for almost all users, because it is displayed in the browser's title bar or tab bar. It is also worth mentioning that geodata can be stored in meta tags, which makes it easier to determine the site operator's geographical location. This might play a role in location-based services and the mobile Internet.
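A program can read this head metadata without understanding the page at all. The following sketch, using Python's built-in html.parser, collects the title and all name/content meta pairs. The sample values are hypothetical; DC.creator follows the common convention for embedding Dublin Core elements in HTML, and geo.position is one informal convention for geodata rather than a formal standard.

```python
from html.parser import HTMLParser


class HeadMetaParser(HTMLParser):
    """Collects the <title> text and all <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attributes.get("name"):
            # Meta names are usually treated case-insensitively, so normalise them.
            self.meta[attributes["name"].lower()] = attributes.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Hypothetical head section for illustration.
head = """
<head>
  <title>Semantics and Accessibility</title>
  <meta name="description" content="Why machine-readable markup helps">
  <meta name="keywords" content="semantic web, ARIA, accessibility">
  <meta name="DC.creator" content="Jane Doe">
  <meta name="geo.position" content="50.11;8.68">
</head>
"""
parser = HeadMetaParser()
parser.feed(head)
parser.close()
print(parser.title)                 # Semantics and Accessibility
print(parser.meta["dc.creator"])    # Jane Doe
print(parser.meta["geo.position"])  # 50.11;8.68
```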

While HTML 4.01 and XHTML 1.0 had rather weak semantics, HTML5 and WAI-ARIA are meant to change that. ARIA defines so-called landmarks such as navigation, main and so on, and HTML5 provides something similar with elements such as nav, header and footer. ARIA landmarks are already used on many sites, e.g. in WordPress installations, on Yahoo and Google, and even partially on Facebook.
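Landmarks can be thought of as a table of contents that assistive technology builds from the markup. The sketch below collects them from both explicit role attributes and the HTML5 elements that imply a role. The implicit-role mapping is deliberately simplified (see the comments), so the result only approximates what a screen reader's landmark list would show.

```python
from html.parser import HTMLParser

# ARIA landmark roles. In practice form and region only count as landmarks
# when they have an accessible name; that rule is not modelled here.
LANDMARK_ROLES = {"banner", "navigation", "main", "complementary",
                  "contentinfo", "search", "form", "region"}

# Simplified implicit roles of HTML5 elements. Strictly, header and footer
# only map to banner/contentinfo when they are not inside article, section etc.
IMPLICIT_ROLES = {"header": "banner", "nav": "navigation", "main": "main",
                  "aside": "complementary", "footer": "contentinfo"}


class LandmarkParser(HTMLParser):
    """Lists the landmarks a screen reader could offer for quick navigation."""

    def __init__(self):
        super().__init__()
        self.landmarks = []  # list of (role, aria-label or None)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An explicit role attribute wins; only its first token is used here.
        explicit = (attributes.get("role") or "").split()
        role = explicit[0] if explicit else IMPLICIT_ROLES.get(tag)
        if role in LANDMARK_ROLES:
            self.landmarks.append((role, attributes.get("aria-label")))


page = """
<div role="banner">Logo</div>
<nav aria-label="Main menu"><a href="/">Home</a></nav>
<div role="main"><h1>Article</h1></div>
<aside>Related links</aside>
<footer>Imprint</footer>
"""
parser = LandmarkParser()
parser.feed(page)
parser.close()
for role, label in parser.landmarks:
    print(role if label is None else f"{role} ({label})")
```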

Hierarchy and relationships

In visual design, hierarchies are communicated mainly through differences in size, the proximity of elements and their visual arrangement. This only works if the website is viewed exactly as the designer intended. If someone uses a high zoom level, their own stylesheet or a different form of presentation, such as a screen reader, these cues are lost.

A technical structure is therefore needed in addition to the visual one. Since HTML5, there has been a macro structure for the page as a whole and a micro structure for texts.

In principle, the macro structure means that the HTML elements alone already tell us which area of the page we are in: nav, main, section and footer provide the necessary information.

Within texts, there are paragraphs (p), headings (h1 to h6), lists and block quotations (blockquote). They make it easier to find your way around, especially in long texts such as Wikipedia articles.

In addition, nested unordered lists are used to convey the structure of nested navigation to blind users. Otherwise, blind users only hear the links themselves read out, but do not know whether they are in the main navigation or a sub-navigation, whether the navigation has 3 or 10 items, and so on.
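What nesting gives a screen reader user can be sketched in a few lines: from nested ul/li elements alone, a program can tell each link's level and its position within its list. The announcement wording below is a hypothetical format; real screen readers phrase this differently and vary between products.

```python
from html.parser import HTMLParser


class NavListParser(HTMLParser):
    """Records each navigation link with its nesting level and list position."""

    def __init__(self):
        super().__init__()
        self.entries = []   # one dict per <li>, in document order
        self._lists = []    # stack of open <ul>/<ol>; each holds its <li> entries
        self._items = []    # stack of open <li> entries
        self._link = None   # entry whose <a> text is currently being read

    def handle_starttag(self, tag, attrs):
        if tag in ("ul", "ol"):
            self._lists.append([])
        elif tag == "li" and self._lists:
            siblings = self._lists[-1]
            entry = {"text": "", "level": len(self._lists),
                     "position": len(siblings) + 1, "size": None}
            siblings.append(entry)
            self.entries.append(entry)
            self._items.append(entry)
        elif tag == "a" and self._items:
            self._link = self._items[-1]

    def handle_endtag(self, tag):
        if tag in ("ul", "ol") and self._lists:
            siblings = self._lists.pop()
            for entry in siblings:  # the list size is only known once it closes
                entry["size"] = len(siblings)
        elif tag == "li" and self._items:
            self._items.pop()
        elif tag == "a":
            self._link = None

    def handle_data(self, data):
        if self._link is not None:
            self._link["text"] += data.strip()


def announce(html):
    parser = NavListParser()
    parser.feed(html)
    parser.close()
    return [f'{e["text"]}: level {e["level"]}, item {e["position"]} of {e["size"]}'
            for e in parser.entries]


navigation = """
<nav><ul>
  <li><a href="/">Home</a></li>
  <li><a href="/topics">Topics</a>
    <ul>
      <li><a href="/topics/aria">ARIA</a></li>
      <li><a href="/topics/html5">HTML5</a></li>
    </ul>
  </li>
  <li><a href="/contact">Contact</a></li>
</ul></nav>
"""
for line in announce(navigation):
    print(line)  # e.g. "ARIA: level 2, item 1 of 2"
```

With flat, unstructured links the level and list size would simply not be available to the program.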

Microdata and XFN

XFN has more or less established itself because it has long been built into the self-hosted version of WordPress. If you create a blogroll or a list of links with it, you can also state your relationship to each link: for example, this is one of my own websites, this is a friend's website, and so on. FOAF goes further than XFN: it is an RDF vocabulary that can describe people and their social relationships in a machine-readable manner.
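Because XFN lives in the ordinary rel attribute of a link, extracting it requires nothing more than an HTML parser. This Python sketch maps each link in a hypothetical blogroll to its XFN relationships and ignores unrelated rel values such as nofollow.

```python
from html.parser import HTMLParser

# Relationship values defined by XFN 1.1.
XFN_VALUES = {"me", "contact", "acquaintance", "friend", "met",
              "co-worker", "colleague", "co-resident", "neighbor",
              "child", "parent", "sibling", "spouse", "kin",
              "muse", "crush", "date", "sweetheart"}


class XfnParser(HTMLParser):
    """Maps each link target to the XFN relationships in its rel attribute."""

    def __init__(self):
        super().__init__()
        self.relationships = {}  # href -> set of XFN values

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel is a space-separated list; non-XFN tokens such as nofollow are skipped.
        tokens = (attributes.get("rel") or "").lower().split()
        xfn = {token for token in tokens if token in XFN_VALUES}
        if href and xfn:
            self.relationships[href] = xfn


# Hypothetical blogroll for illustration.
blogroll = """
<ul>
  <li><a href="https://example.org/" rel="me">My other blog</a></li>
  <li><a href="https://example.net/" rel="friend met">A friend's blog</a></li>
  <li><a href="https://example.com/" rel="nofollow">A sponsor</a></li>
</ul>
"""
parser = XfnParser()
parser.feed(blogroll)
parser.close()
print(parser.relationships)
```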

Another standard is microdata, in which pieces of information are labelled with properties via HTML attributes. In practice, this kind of markup is mainly used for contact and calendar data, for example vCard data in the form of hCard, the corresponding microformat. This is a useful service because it makes it easier to add the data to an address book or calendar.
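As a sketch of how such markup can feed an address book, the code below reads a single microdata item and turns it into a minimal vCard. Note the assumptions: it uses the microdata attributes itemscope, itemtype and itemprop with schema.org's Person vocabulary rather than the class-based hCard microformat, it handles only one flat item, and the person and contact details are made up.

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Reads the properties of a single, flat microdata item.

    Simplifications: no nested items, and text-valued properties are
    assumed to contain plain text without child elements.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._open_prop = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "itemscope" in attributes and self.item_type is None:
            self.item_type = attributes.get("itemtype")
        name = attributes.get("itemprop")
        if not name:
            return
        if tag in ("a", "link"):      # URL-valued properties
            self.properties[name] = attributes.get("href") or ""
        elif tag == "meta":           # value given in the content attribute
            self.properties[name] = attributes.get("content") or ""
        else:                         # value is the element's text
            self.properties[name] = ""
            self._open_prop = name

    def handle_endtag(self, tag):
        self._open_prop = None

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data


def to_vcard(properties):
    """Turns schema.org Person properties into a minimal vCard 4.0 text."""
    lines = ["BEGIN:VCARD", "VERSION:4.0", "FN:" + properties.get("name", "")]
    if "telephone" in properties:
        lines.append("TEL:" + properties["telephone"])
    if "email" in properties:
        lines.append("EMAIL:" + properties["email"].removeprefix("mailto:"))
    lines.append("END:VCARD")
    return "\r\n".join(lines)  # vCard lines are separated by CRLF


card = """
<div itemscope itemtype="https://schema.org/Person">
  <span itemprop="name">Jane Doe</span>,
  phone <span itemprop="telephone">+49 30 1234567</span>,
  <a itemprop="email" href="mailto:jane.doe@example.org">write an e-mail</a>
</div>
"""
parser = MicrodataParser()
parser.feed(card)
parser.close()
print(parser.item_type)  # https://schema.org/Person
print(to_vcard(parser.properties))
```

The string.removeprefix call requires Python 3.9 or later.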

The Semantic Web

The idea of the Semantic Web is not new. Tim Berners-Lee described it back in 1999 in his book "Weaving the Web" (published in German as "Der Web-Report"). The aim is to make content on websites machine-readable, so that a program can extract information based on the metadata.

Take events as an example: each event has specific pieces of information, namely date and time, location, title, description and keywords. This information is usually scattered somewhere across the website. So far, search engines have worked only on the basis of keywords, so today it is practically impossible to search the Internet for all events on the subject of "digital democracy". Instead, you would have to search for the phrase "digital democracy" combined with terms such as trade fair, congress or barcamp, and try the phrase in several variants. Even then, we would almost certainly miss many events. A semantic search engine, by contrast, could simply be asked: "Find me all events on digital democracy in 2012", and it would return a corresponding list.
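The difference between the two kinds of search can be shown with a few hypothetical events in Python. Keyword search only sees the title text, while the structured query works on explicit topic and date fields, which is the kind of data semantic markup would provide. The events and field names are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    title: str
    start: date
    location: str
    topics: frozenset  # machine-readable subject tags taken from the markup


# Hypothetical sample data for illustration.
EVENTS = [
    Event("Barcamp on E-Participation", date(2012, 3, 10), "Berlin",
          frozenset({"digital democracy"})),
    Event("Congress: Digital Democracy Now", date(2012, 9, 5), "Vienna",
          frozenset({"digital democracy", "politics"})),
    Event("Web Accessibility Day", date(2012, 5, 16), "Hamburg",
          frozenset({"accessibility"})),
    Event("Digital Democracy Forum", date(2011, 11, 2), "Zurich",
          frozenset({"digital democracy"})),
]


def keyword_search(events, phrase):
    """Plain text matching: finds only titles that contain the exact phrase."""
    return [e.title for e in events if phrase.lower() in e.title.lower()]


def semantic_search(events, topic, year):
    """Structured query: topic tag and year are separate, explicit fields."""
    return [e.title for e in events if topic in e.topics and e.start.year == year]


print(keyword_search(EVENTS, "digital democracy"))
print(semantic_search(EVENTS, "digital democracy", 2012))
```

Here the keyword search misses the barcamp, whose title does not contain the phrase, and returns the 2011 forum because it cannot filter by year; the structured query returns exactly the two 2012 events on the topic.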

The problems that have not yet been solved are technical complexity and standardization. Someone would have to develop a standard; the W3C provides the technical basis for this with RDF(S), OWL and SPARQL. These standards would then have to be integrated into current content management systems so that editors can actually use them, and using them should be as simple as writing and formatting text. Apart from flagship projects such as THESEUS or the Semantic Wikipedia, the semantic web has not yet played a major role.
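RDF describes information as subject, predicate, object triples, and SPARQL queries them with patterns that contain variables. The toy matcher below is not an RDF library, only a sketch of the idea under simplified assumptions (plain strings instead of IRIs and literals, no prefix handling, no optional patterns): it joins triple patterns the way a SPARQL basic graph pattern does. Terms with the ex: prefix are hypothetical; rdf:type, schema:Event and schema:name are real terms.

```python
# Hypothetical example graph as a list of (subject, predicate, object) triples.
GRAPH = [
    ("ex:event1", "rdf:type", "schema:Event"),
    ("ex:event1", "schema:name", "Congress on Digital Democracy"),
    ("ex:event1", "ex:topic", "ex:DigitalDemocracy"),
    ("ex:event2", "rdf:type", "schema:Event"),
    ("ex:event2", "schema:name", "Web Accessibility Day"),
    ("ex:event2", "ex:topic", "ex:Accessibility"),
]


def match(pattern, triple, binding):
    """Extends binding so that pattern equals triple, or returns None.

    Terms starting with "?" are variables, as in SPARQL.
    """
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return result


def query(patterns, graph):
    """Joins several triple patterns, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Roughly: SELECT ?name WHERE { ?e rdf:type schema:Event ;
#                                  ex:topic ex:DigitalDemocracy ;
#                                  schema:name ?name . }
results = query([("?e", "rdf:type", "schema:Event"),
                 ("?e", "ex:topic", "ex:DigitalDemocracy"),
                 ("?e", "schema:name", "?name")], GRAPH)
print([r["?name"] for r in results])
```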

Schema.org

The same applies to schema.org, initiated by Google, Yahoo and Microsoft, which uses the microdata syntax. Oddly enough, SEOs haven't jumped on it en masse yet, and there is no plugin to integrate it into WordPress, which says something about schema.org's current relevance. However, the portal already offers a number of schemas, so schema.org's chances currently seem better than those of RDF(S).

The opportunities for accessibility

It is already foreseeable that, apart from HTML5 and ARIA, none of the systems mentioned above will prevail in the medium term, owing to a lack of interest from webmasters. Whether semantic markup is actually needed at this level is another question. Currently, schema.org and RDF(S) seem more interesting for specialized areas, especially where information is stored in databases anyway and websites are generated dynamically from it, so the information is already structured. Anyone who has studied at a university would certainly appreciate a library with a semantic catalogue.
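When the information already sits in a database, adding semantics can be a matter of the page template. The sketch below renders a hypothetical event record as HTML with schema.org microdata (the Event and Place types and the name, startDate and location properties). The record's keys are assumptions, and html.escape keeps the values safe inside the markup.

```python
from html import escape


def event_to_microdata(record):
    """Renders one event record (e.g. a database row) with schema.org microdata.

    The record keys are hypothetical; itemtype/itemprop names follow schema.org.
    """
    return (
        '<div itemscope itemtype="https://schema.org/Event">\n'
        f'  <h2 itemprop="name">{escape(record["name"])}</h2>\n'
        f'  <time itemprop="startDate" datetime="{escape(record["start"])}">'
        f'{escape(record["start_label"])}</time>\n'
        '  <div itemprop="location" itemscope itemtype="https://schema.org/Place">\n'
        f'    <span itemprop="name">{escape(record["venue"])}</span>\n'
        '  </div>\n'
        '</div>'
    )


# Hypothetical row, as it might come from an event database.
row = {"name": "Barcamp on Digital Democracy & Participation",
       "start": "2012-03-10T10:00", "start_label": "10 March 2012, 10:00",
       "venue": "Berlin"}
print(event_to_microdata(row))
```

Since the template is written once, every event page gets the same machine-readable structure without extra work for the editors.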

So far, the semantic information in HTML and ARIA has been used mainly by screen readers. Richer semantic markup would make it much easier for blind users to jump to specific areas of a website, such as the navigation, the main content or the sidebar. Skip links and headings have become established tools for this, but skip links do not work in all screen reader and browser combinations, and no real standard has emerged for how headings should be used correctly.

Another advantage of semantic content is that it is easier to translate. This would simplify the development of programs that translate texts from everyday language into plain language or sign language.