utf8 (GNU Dico Manual)

D.10 UTF-8

This section describes functions for handling UTF-8 strings. A UTF-8 character can be represented either as a multi-byte character or a wide character.

Multibyte character is a char * pointing to one or more bytes representing the UTF-8 character.

Wide character is an unsigned value identifying the character.

In the discussion below, a sequence of one or more multi-byte characters is called a multi-byte string. Multibyte strings terminate with a single ‘nul’ (0) character.

A sequence of one or more wide characters is called a wide character string. Such strings terminate with a single 0 value.

Character sizes
Iterating over UTF-8 strings
Conversions
Comparing UTF-8 strings
Character lookups
Functions for converting UTF-8 characters
Additional functions