A hexadecimal number that represents a UCS or Unicode value is commonly preceded by “U+”, as in U+0041 for the character “Latin capital letter A”.
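As a small illustration of this notation, the following Python sketch formats a character's code point in the "U+" style. The helper name `u_notation` is made up for this example and is not part of any standard library.

```python
def u_notation(ch: str) -> str:
    """Format one character's code point as U+XXXX (at least four hex digits)."""
    return f"U+{ord(ch):04X}"


print(u_notation("A"))  # U+0041
```

Code points above U+FFFF simply come out with five or six hex digits.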

The UCS characters U+0000 to U+007F are identical to those in US-ASCII (ISO 646 IRV) and the range U+0000 to U+00FF is identical to ISO 8859-1 (Latin-1).
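This identity can be checked directly. The sketch below assumes Python's built-in `latin-1` and `ascii` codecs and verifies that each byte value decodes to the code point with the same number; the variable names are only illustrative.

```python
# Every byte value 0x00..0xFF, decoded as ISO 8859-1, should map to the
# code point with the same numeric value.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("latin-1")
assert [ord(c) for c in latin1_text] == list(range(256))

# The lower half (0x00..0x7F) decodes identically as US-ASCII.
ascii_text = all_bytes[:128].decode("ascii")
assert ascii_text == latin1_text[:128]
```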

Combining characters are similar to the non-spacing accent keys on a typewriter.

A combining character is not a full character by itself.

The most important accented characters, like those used in the orthographies of common languages, have codes of their own in UCS. These precomposed characters are available for backwards compatibility with older encodings that have no combining characters, such as ISO 8859.
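To make the relationship between a precomposed character and its combining-character equivalent concrete, here is a minimal Python sketch. It relies on Unicode normalization (forms NFC and NFD) from the standard `unicodedata` module, a mechanism that the text above does not discuss; it is used here only to show that both spellings denote the same accented letter.

```python
import unicodedata

precomposed = "\u00E9"  # LATIN SMALL LETTER E WITH ACUTE, one code point
combined = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT, two code points

print(len(precomposed), len(combined))  # 1 2
print(precomposed == combined)          # False: different code point sequences

# NFC composes where a precomposed character exists; NFD decomposes.
print(unicodedata.normalize("NFC", combined) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == combined)  # True
```

A plain string comparison treats the two spellings as different, which is why software that compares text often normalizes it first.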

The combining-character mechanism allows one to add accents and other diacritical marks to any character.

by Markus Kuhn

This text is a very comprehensive one-stop information resource on how you can use Unicode/UTF-8 on POSIX systems (Linux, Unix).

You will find here both introductory information for every user and detailed references for the experienced developer.

Current plans are that there will never be characters assigned outside the 21-bit code space from 0x000000 to 0x10FFFF, which covers a bit over one million potential future characters.
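The size of this code space is easy to verify with a little arithmetic. The Python sketch below is only an illustration; the names `MAX_CODE_POINT` and `beyond_limit_ok` are chosen for this example.

```python
MAX_CODE_POINT = 0x10FFFF

print(MAX_CODE_POINT + 1)           # 1114112 code points, a bit over one million
print(MAX_CODE_POINT.bit_length())  # 21 bits are enough for the largest value

# Python's chr() accepts the full range and rejects anything beyond it.
try:
    chr(MAX_CODE_POINT + 1)
    beyond_limit_ok = True
except ValueError:
    beyond_limit_ok = False

print(beyond_limit_ok)  # False
```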

The ISO 10646-1 standard was first published in 1993 and defines the architecture of the character set and the content of the BMP.

A combining character is an accent or other diacritical mark that is added to the previous character.

This way, it is possible to place any accent on any character.
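As a rough sketch of how software can tell combining characters apart from the base characters they follow, the Python example below uses `unicodedata.combining()`, which returns the canonical combining class of a character (0 for ordinary base characters). The helper `count_base_characters` is hypothetical, and the approach is an approximation: a few combining marks have class 0 and would be counted as base characters here.

```python
import unicodedata


def count_base_characters(text: str) -> int:
    """Count code points whose canonical combining class is 0."""
    return sum(1 for ch in text if unicodedata.combining(ch) == 0)


text = "x\u0302"  # "x" followed by COMBINING CIRCUMFLEX ACCENT

print(len(text))                    # 2 code points
print(count_base_characters(text))  # 1 base character
print(unicodedata.name(text[1]))    # COMBINING CIRCUMFLEX ACCENT
```

The accent is stored after the letter it modifies, so the string has two code points but displays as a single accented letter.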

Unicode now replaces ASCII, ISO 8859 and EUC at all levels.

