The politics & basics of Unicode

From Tim Bray’s “On the Goodness of Unicode” (6 April 2003):

Unicode proper is a consortium of technology vendors that, many years ago in a flash of intelligence and public-spiritedness, decided to unify their work with that going on at the ISO. Thus, while there are officially two standards you should care about, Unicode and ISO 10646, through some political/organizational magic they are exactly the same, and if you’re using one you’re also using the other. …

The basics of Unicode are actually pretty simple. It defines a large (and steadily growing) number of characters – just under 100,000 last time I checked. Each character gets a name and a number, for example LATIN CAPITAL LETTER A is 65 and TIBETAN SYLLABLE OM is 3840. Unicode includes a table of useful character properties such as “this is lower case” or “this is a number” or “this is a punctuation mark”.
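
To make the quoted passage concrete, here is a minimal sketch (not part of Bray's article) that looks up the number, name, and one property of a few characters using Python's standard unicodedata module. The helper name describe is hypothetical, and the general category is only one of the many properties Unicode defines.

```python
import unicodedata


def describe(ch: str) -> tuple[int, str, str]:
    """Return (code point, name, general category) for one character.

    `describe` is a hypothetical helper for illustration only.
    """
    return ord(ch), unicodedata.name(ch), unicodedata.category(ch)


# The two characters named in the quote, plus one example for each
# property it mentions: lower case, number, punctuation mark.
for ch in ("A", chr(3840), "a", "7", "!"):
    number, name, category = describe(ch)
    print(f"{number:>5}  U+{number:04X}  {category}  {name}")
```

The two-letter codes are Unicode general categories: Lu is an uppercase letter, Ll a lowercase letter, Nd a decimal digit, and Po a punctuation mark. The results depend on the version of the Unicode database bundled with the Python interpreter, though long-established characters like these are stable.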