
Category: Characters and Character Encoding

Understanding Charis SIL

Charis SIL is a font family based on the Unicode standard, which I described in my last blog post. The Oxford Dictionary of the Internet defines a ‘font’ as “[a] typographical term for a collection of alphabetic and other characters which are displayed in a certain style”. The fonts within a font family share a common design but differ in style, for example bold or italic (Source: CSS fonts).

The Charis SIL font family currently includes four fonts (Charis SIL Regular, Charis SIL Bold, Charis SIL Italic and Charis SIL Bold Italic), and the family includes 3,600 glyphs. A glyph is the visual representation of one or more characters. For example, many fonts draw the two characters ‘f’ and ‘i’ as a single combined glyph, the ‘fi’ ligature, and the same character can be drawn with differently shaped glyphs in different fonts. The font family is designed with readability in mind: it is optimised for legibility at low resolutions and remains clear at high resolutions (Sources: Design – Charis SIL, Character Set Support – Charis SIL and Charis SIL).

Charis SIL was developed by SIL Language Technology, part of the faith-based non-profit organisation SIL International. SIL Language Technology develops software, fonts and keyboards to serve language communities and support sustainable language development (Source: About – Charis SIL). A central goal of Charis SIL is to provide a font family containing the glyphs needed for any Roman- or Cyrillic-based writing system (Source: Design – Charis SIL).

Because the Latin and Cyrillic scripts (i.e. writing systems) are used for thousands of languages, the Charis SIL font family makes it possible to write texts in many of the world’s languages. SIL states that the family also includes a number of symbols useful for linguistics and literacy work (Source: Charis SIL). This shows how closely the font family is tied to the fields of languages and linguistics.


Understanding Unicode

Unicode is a computer standard that acts as a character coding system. Its name comes from three goals of the standard: to be universal, uniform and unique (Source: Summary). In practice, this means that Unicode aims to give every character in every world language its own unique number, called a code point, conventionally written as ‘U+’ followed by a hexadecimal number (for example, U+0041 for ‘A’).
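To make the idea of a code point concrete, here is a small Python sketch (my own illustration, not taken from the sources). Python's built-in `ord()` returns a character's code point as an integer, and `chr()` does the reverse; the helper name `code_point_label` is hypothetical and simply formats the number in the usual U+XXXX style.

```python
def code_point_label(ch: str) -> str:
    """Format one character's code point as U+ followed by at least four hex digits."""
    return f"U+{ord(ch):04X}"

# A Latin letter, an accented Latin letter, a Cyrillic letter and a Tibetan letter.
for ch in ["A", "é", "ж", "ཀ"]:
    print(ch, code_point_label(ch))

# chr() turns a code point back into the character it identifies.
print(chr(0x0436))  # ж
```

Each character, whatever script it belongs to, gets exactly one number; how that number is stored as bytes is a separate question, which is what encodings answer.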

Code points are stored in computers as one or more bytes. A byte is a unit of storage equal to eight bits, and a bit is the smallest unit of data in a computer. A character encoding defines how characters are converted into bytes for storage, and how those bytes, stored in computer memory, are decoded back into the characters you want to display. This makes encoding an important part of ensuring that a text is readable: if the bytes are decoded with a different encoding from the one used to store them, the characters are not displayed correctly (Source: Character encodings for beginners).
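As a rough illustration (a Python sketch of my own, using only the built-in `str.encode` and `bytes.decode` methods), the example below stores the word ‘café’ as UTF-8 bytes, reads it back correctly, and then shows what happens when the same bytes are wrongly decoded as Latin-1.

```python
text = "café"

# Encoding: characters -> bytes (here using UTF-8).
data = text.encode("utf-8")
print(list(data))              # [99, 97, 102, 195, 169]  ('é' takes two bytes)

# Decoding with the same encoding restores the original characters.
print(data.decode("utf-8"))    # café

# Decoding with the wrong encoding turns the two bytes of 'é' into two other characters.
print(data.decode("latin-1"))  # cafÃ©
```

The garbled ‘cafÃ©’ is exactly the kind of broken text you see when the connection between characters and their bytes is lost.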

The characters in the Unicode Standard can be encoded using the encoding forms UTF-8, UTF-16 or UTF-32. The difference between them is the size of the code unit: UTF-8 uses 8-bit units, UTF-16 uses 16-bit units and UTF-32 uses 32-bit units. A single character takes one to four units in UTF-8, one or two units in UTF-16 and always exactly one unit in UTF-32. All three can encode every character in the Unicode Standard, but they tend to be used in different contexts: UTF-8 is most common on the web, UTF-16 is used by Java and Windows, and UTF-8 and UTF-32 are both used by Linux and Unix systems (Sources: FAQ – UTF-8, UTF-16, UTF-32 & BOM, The Unicode Standard, Version 11.0: 2.5 Encoding Forms).
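Here is a small Python sketch (my own, with a hypothetical helper called `encoded_sizes`) that counts how many bytes one character needs in each encoding form. I use Python's `utf-16-le` and `utf-32-le` codecs because the plain `utf-16` and `utf-32` codecs add a byte order mark (BOM) at the start, which would inflate the counts.

```python
def encoded_sizes(ch: str) -> dict:
    """Return the number of bytes one character needs in UTF-8, UTF-16 and UTF-32."""
    return {
        "UTF-8": len(ch.encode("utf-8")),
        # The "-le" variants fix the byte order, so no BOM is written.
        "UTF-16": len(ch.encode("utf-16-le")),
        "UTF-32": len(ch.encode("utf-32-le")),
    }

# Latin 'A', accented 'é', Tibetan 'ཀ' (U+0F40) and an emoji outside the first 65,536 code points.
for ch in ["A", "é", "ཀ", "😀"]:
    print(ch, encoded_sizes(ch))
```

Dividing each count by the unit size (1, 2 or 4 bytes) gives the number of code units: ‘A’ needs one unit in every form, while the emoji needs four UTF-8 units, two UTF-16 units and one UTF-32 unit.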

Unicode’s predecessor, ASCII, contained only 128 characters: the unaccented letters used in English, digits, punctuation and some control codes. This made it impossible to encode characters from languages written in other scripts (i.e. other writing systems), and even some characters needed in Western European languages, such as the French ‘é’, so ASCII was mainly useful for texts written only in English (Source: BBC Bitesize – GCSE Computer Science – Hexadecimal and character sets – Revision 5).
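The limitation is easy to see in Python (again my own sketch; `fits_in_ascii` is a hypothetical helper name). Trying to encode text as ASCII raises a `UnicodeEncodeError` as soon as it meets a character outside those 128. The last line also shows that the first 128 Unicode characters are encoded in UTF-8 as exactly the same single bytes as in ASCII.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character in text is one of the 128 ASCII characters."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        # Raised for characters such as 'é' that ASCII has no byte for.
        return False

print(fits_in_ascii("resume"))  # True
print(fits_in_ascii("résumé"))  # False

# For the first 128 characters, UTF-8 produces the same bytes as ASCII.
print(all(chr(i).encode("ascii") == chr(i).encode("utf-8") for i in range(128)))  # True
```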

Although Unicode strives to be universal, some scripts remain unsupported. The Script Encoding Initiative (SEI) at UC Berkeley’s Department of Linguistics aims to prepare formal proposals for encoding scripts and script elements that are not yet supported (Source: Script Encoding Initiative). This shows how closely Unicode, as a computer standard, is connected with the fields of languages and linguistics.

In the following video, Lobsang Monlam speaks about why and how he created a series of Unicode fonts for the Tibetan language:
