
Month: October 2018

WL4102 Presentation: Standards

As part of WL4102, students were asked to prepare presentations with a focus on concepts, technologies or tools of their choice. For my presentation, I chose the topic ‘standards’. Following research into the topic, I wrote my script, timing myself as I did so to ensure that I would stick to the allotted five minutes on the day of the presentation. The final script came to approximately 850 words. The script is based on the following information, which I have arranged in the form of a blog post. I have also compiled a list of the sources that informed my presentation, which can be found at the bottom of this post. While speaking, I displayed an outline of my presentation on my blog to help those in attendance follow along. I have attached a screenshot below.

Standards in computing are sets of specifications, or guidelines, for developing a certain computer technology, be that hardware or software. 

The first electronic digital computer was completed over 70 years ago and for about 10 years after that, each new computer developed was a wholly unique design. This was very costly. In the late 1950s, computers called mainframes emerged. These were cheaper to produce and were produced in greater quantity. Each manufacturer developed its own standards to build a complete system, for example, control and programming methods. When computers gained commercial popularity in the latter part of the 20th century, they became smaller and were produced in much greater numbers. This is largely due to standards. A huge advantage of standards stems from the fact that they help in creating programs that work on different systems, or that are compatible with different systems. The increasing number of computers produced was possible largely due to manufacturers’ ability to draw on standards to manufacture a range of products relying on the same technologies. (Sources: History and Impact of Computer Standards [Anniversary Feature] – Computer: page 3, BBC Bitesize – GCSE Computer Science – Standards – Revision 1)

According to Marvin Waschke, standards are established by custom, by general consent or by an authority. They are widely followed and are usually documented with great precision. The scope and applicability, or what the standard actually provides for, are usually widely understood. Examples include Unicode, the QWERTY keyboard layout and the file format MP3. Some of the major standards organisations include the International Organization for Standardization (ISO), which maintains standards of every kind, and the World Wide Web Consortium, which focuses on standards for web-based technologies. (Sources: Cloud Standards: Agreements That Hold Together Clouds – Marvin Waschke – Google Books: pages 26 and 27, BBC Bitesize – GCSE Computer Science – Standards – Revision 1)

There are two types of standards, which relate to the source of the standard itself: de facto and de jure. Both come from Latin, with de facto meaning ‘in practice’ and de jure meaning ‘in law’. In the case of a de facto standard, a person or a company builds a system in a certain way and, if this system is a success, it comes to be used more widely, either by the original creator or by others. Over time, the majority of systems are developed in this way and a de facto standard exists. No standards bodies are involved in the development of such a standard. An example of this is IBM’s PC design. (Source: Cloud Standards: Agreements That Hold Together Clouds – Marvin Waschke – Google Books: pages 27-32)

De jure standards, on the other hand, are sanctioned by a standards body, such as the World Wide Web Consortium, which maintains the XML Schema Definition (XSD) language. This standard defines the “legal building blocks of an XML document”. Checking a well-formed XML document against such a schema confirms that it is also valid. (Source: XML Schema Tutorial)
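Well-formedness and validity are two separate checks: a document is well-formed if it follows the syntax rules of XML itself, and valid if it also follows the rules laid out in a schema. The sketch below only illustrates the first check, using the parser in Python's standard library. The function name `is_well_formed` and the two sample documents are made up for illustration. Validating against an XSD schema would need a schema-aware tool, which is not shown here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise.

    This checks syntax only (matching tags, a single root element, etc.).
    It does not check the document against any schema.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents
good = "<note><to>Class</to><body>Standards</body></note>"
bad = "<note><to>Class</body></note>"  # closing tag does not match <to>

print(is_well_formed(good))
print(is_well_formed(bad))
```

A document can pass this check and still be invalid, for example if a schema requires an element that the document leaves out.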

De jure standards are sometimes also backed by government bodies, as is the case with pharmaceutical safety standards. De jure standards may start as de facto standards. One example of a de facto standard becoming a de jure standard is the C programming language. The language was originally detailed in a book by Brian Kernighan and Dennis Ritchie in 1978. It gained popularity and the American National Standards Institute (ANSI) published a standard for the C language in 1989. (Source: Cloud Standards: Agreements That Hold Together Clouds – Marvin Waschke – Google Books: pages 32-36)

Another distinction can be drawn between open and proprietary standards. Open standards are made openly available to the public, either through a Creative Commons License or by being unlicensed, along with any supporting material needed to fully understand the scope and applicability of the standard. An example of this is HTML. Proprietary standards are privately owned by an organisation or individual. The owner can control the use of the standard through the licensing terms. An example of this is DOC, the Microsoft Word document file format. Stacy Baird states that proprietary standards allow for more efficiency in the development of a new product, as they avoid the procedural issues involved in open standard-setting organisations, such as the need to reach a consensus among everybody involved. (Sources: BBC Bitesize – GCSE Computer Science – Standards – Revision 3, Opening Standards: The Global Politics of Interoperability – Google Books: page 19)

An example of a technical standard is ISO 639, which is the international standard for language codes. The purpose of ISO 639 is to maintain internationally recognised codes to represent languages or language families. This standard is a de jure standard, as it is maintained by a standards organisation: the International Organization for Standardization. It is an open standard, as the codes can be used without having to deal with any licensing terms. This example shows that standards are diverse in how they are used as these language codes can be used in coding and can be used in a library setting. (Source: ISO 639 Language codes)
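As a rough illustration of how these codes turn up in coding, the sketch below stores the two-letter (ISO 639-1) and three-letter (ISO 639-3) codes for three example languages and uses them to build an HTML `lang` attribute. The dictionary layout and the function name `html_lang_attribute` are assumptions made for this example, not part of the standard itself.

```python
# ISO 639-1 (two-letter) and ISO 639-3 (three-letter) codes
# for three example languages.
LANGUAGE_CODES = {
    "Irish": {"iso639_1": "ga", "iso639_3": "gle"},
    "Spanish": {"iso639_1": "es", "iso639_3": "spa"},
    "German": {"iso639_1": "de", "iso639_3": "deu"},
}


def html_lang_attribute(language):
    """Build an HTML lang attribute from the two-letter code.

    Hypothetical helper: HTML language tags draw on ISO 639 codes,
    so the two-letter code can be used directly for these languages.
    """
    code = LANGUAGE_CODES[language]["iso639_1"]
    return f'lang="{code}"'


for name in LANGUAGE_CODES:
    print(name, html_lang_attribute(name))
```

Because the codes are standardised, a web page, a library catalogue and a piece of software can all label the same language in the same way.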

In summary, standards are sets of specifications or guidelines for developing a certain aspect of computer technologies. They provide a means of creating programs and products compatible with different systems. They can be de jure or de facto, open or proprietary, and can be used in a variety of settings. 

List of sources:


Concordances and Voyant Tools

Concordance programs allow the user to search for instances of a given word or phrase within a text or a corpus. In the most common concordance format, each concordance line displays an occurrence of the word as it appears in the text or database, along with the words occurring on either side of it. This shows the context in which the word appears. This format is called a KWIC concordance, or a key-word-in-context concordance.
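The KWIC format can be sketched in a few lines of Python. This is only a minimal illustration of the idea, not how any particular concordance program works: the function name `kwic`, the `width` parameter and the sample text are all assumptions made for the example, and the tokenisation (lowercasing and splitting on non-word characters) is deliberately simple.

```python
import re


def kwic(text, keyword, width=3):
    """Return one concordance line per occurrence of keyword.

    Each line shows up to `width` words of context on either side
    of the keyword, with the keyword marked in square brackets.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the keywords line up
            lines.append(f"{left:>25} [{token}] {right}")
    return lines


# Hypothetical sample text
sample = (
    "Standards help programs work on different systems. "
    "Open standards are available to the public, while "
    "proprietary standards are privately owned."
)

for line in kwic(sample, "standards"):
    print(line)
```

Aligning the keyword in a single column is what makes patterns in the surrounding words, such as collocations, easy to spot by eye.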

Concordances provide a convenient format to analyse words in terms of their context and to examine the patterns in which they occur. This is useful for lexicographers as it is important to know how words occur in a certain language and their frequency when writing dictionaries. By studying concordances, one can also obtain data on collocations, which is very useful in discourse research (Sources: Using a Concordance for Discourse Research).

I used Voyant Tools to experiment with concordances using a plain text file version of Luther’s Bible. I downloaded the text from the Oxford Text Archive and uploaded it to Voyant Tools. This created the concordance that can be viewed here. In the bottom right-hand corner of the screen, one can see the concordance lines created.

From the most frequent word in the text, it is clear that I encountered the issue of German character representation. Instead of the German word ‘dass’, which means ‘that’ and which would have been written with a scharfes S (ß) at the time, ‘daãÿ’ is displayed. The appearance of this word also brings up the issue of stop words. Stop words are words that should be excluded from the results of a concordance and are typically function words. In Voyant Tools, you can choose to use pre-existing stopword lists or create your own. In this case, as some function words, such as ‘daß’, were displayed differently to how they should appear, it would be difficult to use this function.
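One plausible explanation for the garbled form, which I have not confirmed for this particular file, is that text saved as UTF-8 was read as Windows-1252 and then lowercased before counting. The sketch below reproduces that chain of events in Python. Both steps are assumptions, and the small stopword set is made up for the example.

```python
original = "daß"

# Assumption: the UTF-8 bytes of the file were read as Windows-1252,
# so the two bytes that encode 'ß' become two separate characters.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Assumption: the tool lowercases every token before counting it.
displayed = garbled.lower()
print(displayed)

# Undoing the wrong decoding (before lowercasing) restores the word.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)

# Hypothetical stopword list: only the repaired form matches it,
# which is why the garbled form slips past the filter.
stopwords = {"daß", "und", "der", "die", "das"}
print(repaired in stopwords, displayed in stopwords)
```

If this is indeed the cause, re-saving the file or loading it with the correct encoding before uploading would let a standard German stopword list work as intended.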

One can see in the cirrus, or the word cloud, that some predictable content words were frequent in the text, such as ‘gott’ (God) and ‘sohn’ (son). It is interesting to note that the word cloud does not show these with a capital G and S. As all German nouns start with capital letters, this is an important feature of the language that is left out of the word cloud.


WordNet and wordnets

Princeton’s WordNet is a lexical database showing semantic relationships between words in the English language. It focuses on nouns, verbs, adjectives and adverbs, as words within these word classes are all content words, meaning that they have meaning by themselves (as opposed to function words). Princeton’s WordNet takes these content words and groups them into ‘synsets’, which are groups of cognitive synonyms, or words with the same meaning or sense (Sources: PARTS OF SPEECH, WordNet | A Lexical Database for English).

Wordnets have emerged in other languages based on this concept, including in my languages of study – Irish, Spanish and German. Wordnets for each of these languages can be found by following these links:

  • EuroWordNet database: a multilingual database providing wordnets for several European languages, including Spanish and German. Free samples from each language can be downloaded here.
  • Líonra Séimeantach na Gaeilge (LSG), or the Irish Language Semantic Network: an Irish-language wordnet, providing a comprehensive database of Irish words and the semantic links between them. The PDF version can be downloaded here.

The PDF version of the LSG displays the wordnet in alphabetical order. As in the Princeton WordNet, content words are presented in synsets, showing relationships between words. The word ‘comhchiall’ denotes synonymous words, ‘aicmí’ denotes the classes to which the word belongs and ‘fo-aicmí’ the subclasses stemming from the word. ‘Gaolta’ shows related words that are not synonymous. In the screenshot below from the PDF, for example, one can see that the word ‘teangeolaíocht’ (linguistics) is shown to be in the class of ‘eolaíocht’ (science), with one subclass being ‘pragmataic’ (pragmatics). It is shown to be related to, but not synonymous with, ‘gramadach’ (grammar).

This shows that synsets in the LSG, like those in Princeton’s WordNet, have a hierarchical element. Using the relations expressed through hypernyms and hyponyms, the LSG shows where each word lies within the hierarchy of related words. In the example above, ‘eolaíocht’ is a hypernym of ‘teangeolaíocht’, and ‘pragmataic’ is a hyponym of ‘teangeolaíocht’. Unlike Princeton’s WordNet, the LSG PDF file does not show antonyms.
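The relations in the ‘teangeolaíocht’ example can be modelled with a small data structure. This is only a sketch of the idea and says nothing about how the LSG is actually stored: the `Entry` class, its field names and the `hypernym_chain` function are all made up for illustration, and the only data included is the single entry described above.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One wordnet entry, using the LSG's own labels as field names."""
    word: str
    comhchiall: list = field(default_factory=list)  # synonyms
    aicmi: list = field(default_factory=list)       # classes (hypernyms)
    fo_aicmi: list = field(default_factory=list)    # subclasses (hyponyms)
    gaolta: list = field(default_factory=list)      # related, not synonymous


# The single entry described in the text
entries = {
    "teangeolaíocht": Entry(
        word="teangeolaíocht",
        aicmi=["eolaíocht"],
        fo_aicmi=["pragmataic"],
        gaolta=["gramadach"],
    ),
}


def hypernym_chain(word, entries):
    """Follow the first listed class upwards until no entry is found."""
    chain = []
    seen = {word}
    current = entries.get(word)
    while current and current.aicmi:
        parent = current.aicmi[0]
        if parent in seen:  # guard against circular links
            break
        chain.append(parent)
        seen.add(parent)
        current = entries.get(parent)
    return chain


print(hypernym_chain("teangeolaíocht", entries))
```

With more entries loaded, the same function would keep climbing from each class to its own class, which is the hierarchical element that hypernyms and hyponyms provide.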

The LSG’s entries are linked to the synsets available in Princeton’s WordNet, which the LSG’s creator, Kevin Scannell, states is helpful for his work on English-Irish machine translation. The entries are not mapped directly, however, partly due to distinctions within Irish that do not exist within English (such as the difference between ‘rua’ and ‘dearg’, both of which can be translated as ‘red’) (Sources: LSG: Home, LSG: Details).


Text databases relating to my studied languages

In this post, I have compiled a list of some of the text databases available online that relate to the three languages that I study as part of the BA World Languages (Irish, Spanish and German). The list includes links to databases dedicated to various topics, from general literature of the language to specific authors, and could be of help to students studying these languages and/or studying the cultures of the countries in which these languages are spoken.

Databases relating to Irish as a subject:

Databases relating to Hispanic philology:

Databases relating to German studies or German as a foreign language:

