

D.10.3 Conversions

The following functions convert between the two string representations.

Function: int utf8_mbtowc_internal (void *data, int (*read) (void*), unsigned int *pwc)

Internal function for converting a single UTF-8 character to a corresponding wide character representation. The character to convert is obtained by calling the function pointed to by read with data as its only argument. If that call returns a non-positive value, the function sets errno to ‘ENODATA’ and returns -1.
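The callback arrangement described above can be illustrated with a small self-contained sketch. This is not the library's implementation: demo_mbtowc, struct mem_reader and mem_read are hypothetical names, the success return value (number of bytes consumed) is an assumption that the description does not state, and the sketch handles only sequences of up to four bytes without rejecting overlong forms.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical reader state: a byte buffer and a cursor into it. */
struct mem_reader {
    const unsigned char *buf;
    size_t len;
    size_t pos;
};

/* Reader callback: returns the next byte as a positive value, or 0 when
   the buffer is exhausted.  A NUL byte therefore also reads as "no data". */
static int mem_read(void *data)
{
    struct mem_reader *rd = data;
    if (rd->pos >= rd->len)
        return 0;
    return rd->buf[rd->pos++];
}

/* Sketch of a callback-driven decoder for one UTF-8 character.
   Assumption: returns the number of bytes consumed on success.
   Note: ENODATA is not defined on every platform. */
static int demo_mbtowc(void *data, int (*next_byte)(void *), unsigned int *pwc)
{
    unsigned int wc;
    int c, extra, i;

    assert(next_byte != NULL && pwc != NULL);

    c = next_byte(data);
    if (c <= 0) {
        errno = ENODATA;        /* nothing to convert */
        return -1;
    }
    if (c < 0x80)                { wc = c;        extra = 0; }
    else if ((c & 0xE0) == 0xC0) { wc = c & 0x1F; extra = 1; }
    else if ((c & 0xF0) == 0xE0) { wc = c & 0x0F; extra = 2; }
    else if ((c & 0xF8) == 0xF0) { wc = c & 0x07; extra = 3; }
    else {
        errno = EILSEQ;         /* not a valid lead byte */
        return -1;
    }
    for (i = 0; i < extra; i++) {
        c = next_byte(data);
        if (c <= 0) {
            errno = ENODATA;    /* input ended inside a sequence */
            return -1;
        }
        if ((c & 0xC0) != 0x80) {
            errno = EILSEQ;     /* not a continuation byte */
            return -1;
        }
        wc = (wc << 6) | (c & 0x3F);
    }
    *pwc = wc;
    return extra + 1;
}
```

Keeping the byte source behind a callback lets the same decoding loop serve memory buffers, files, or any other input, as long as the reader reports exhaustion with a non-positive value.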

Function: int utf8_mbtowc (unsigned int *pwc, const char *r, size_t len)

Converts the first len characters of the multi-byte string r to wide character representation. On success, returns 0 and stores the result in pwc. The result pointer is allocated using malloc(3).

On error (invalid multi-byte sequence encountered), returns -1 and sets errno to ‘EILSEQ’.

Function: int utf8_wctomb (unsigned char *r, unsigned int wc)

Stores the UTF-8 representation of the Unicode character wc in r[0..5]. Returns the number of bytes stored. If wc is out of range, returns -1 and sets errno to ‘EILSEQ’.
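As an illustration of what such an encoder does, here is a minimal sketch. It is not the library's code: demo_wctomb is a hypothetical name, and the accepted range (values up to 0x7FFFFFFF, encoded in at most six bytes as in the original UTF-8 definition) is an assumption inferred from the r[0..5] buffer mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical sketch: encode wc into r, which must have room for 6 bytes.
   Returns the number of bytes stored, or -1 with errno set to EILSEQ. */
static int demo_wctomb(unsigned char *r, unsigned int wc)
{
    int count, i;

    assert(r != NULL);

    if (wc < 0x80) {
        r[0] = (unsigned char) wc;      /* plain ASCII, one byte */
        return 1;
    }
    if (wc < 0x800)            count = 2;
    else if (wc < 0x10000)     count = 3;
    else if (wc < 0x200000)    count = 4;
    else if (wc < 0x4000000)   count = 5;
    else if (wc <= 0x7FFFFFFF) count = 6;
    else {
        errno = EILSEQ;                 /* assumed upper limit exceeded */
        return -1;
    }

    /* Continuation bytes carry 6 payload bits each, filled from the end. */
    for (i = count - 1; i > 0; i--) {
        r[i] = (unsigned char) (0x80 | (wc & 0x3F));
        wc >>= 6;
    }
    /* Lead byte: 'count' high bits set, followed by the remaining bits. */
    r[0] = (unsigned char) (((0xFF00 >> count) & 0xFF) | wc);
    return count;
}
```

With this sketch, the code point 0x20AC (the euro sign) is stored as the three bytes E2 82 AC.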

Function: int utf8_wc_to_mbstr (const unsigned *word, size_t wordlen, char **retptr)

Converts the first wordlen characters of the wide character string word to multi-byte representation. The result is returned in retptr. It is allocated using malloc(3).

Returns 0 on success. On error, returns -1 and sets errno to one of the following values:

ENOMEM

Not enough memory to allocate the return buffer.

EILSEQ

An invalid wide character is encountered.
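The calling convention described above (result through an output pointer, 0 on success, -1 with errno on failure) can be sketched as follows. This is an illustrative re-implementation under stated assumptions, not the library source: demo_wc_to_mbstr and demo_encode are hypothetical names, the six-byte upper limit per character is assumed, and so is the NUL terminator on the result.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: encode one character into r (up to 6 bytes).
   Returns the byte count, or -1 if wc does not fit in 31 bits. */
static int demo_encode(unsigned char *r, unsigned int wc)
{
    int count, i;

    if (wc < 0x80) {
        r[0] = (unsigned char) wc;
        return 1;
    }
    if (wc < 0x800)            count = 2;
    else if (wc < 0x10000)     count = 3;
    else if (wc < 0x200000)    count = 4;
    else if (wc < 0x4000000)   count = 5;
    else if (wc <= 0x7FFFFFFF) count = 6;
    else                       return -1;
    for (i = count - 1; i > 0; i--) {
        r[i] = (unsigned char) (0x80 | (wc & 0x3F));
        wc >>= 6;
    }
    r[0] = (unsigned char) (((0xFF00 >> count) & 0xFF) | wc);
    return count;
}

/* Sketch: convert word[0..wordlen-1] to a malloc'ed, NUL-terminated
   multi-byte string stored in *retptr.  Returns 0 or -1 (errno set). */
static int demo_wc_to_mbstr(const unsigned *word, size_t wordlen, char **retptr)
{
    unsigned char *buf, *p;
    size_t i;

    assert(retptr != NULL);

    /* Worst case is 6 bytes per character plus the terminator. */
    if (wordlen > (SIZE_MAX - 1) / 6 || !(buf = malloc(wordlen * 6 + 1))) {
        errno = ENOMEM;
        return -1;
    }
    p = buf;
    for (i = 0; i < wordlen; i++) {
        int n = demo_encode(p, word[i]);
        if (n < 0) {
            free(buf);          /* do not leak the partial result */
            errno = EILSEQ;
            return -1;
        }
        p += n;
    }
    *p = 0;
    *retptr = (char *) buf;
    return 0;
}
```

Because the buffer comes from malloc(3), a caller of a function with this convention would normally release the result with free(3) once it is no longer needed, and would leave the output pointer untouched when the call fails.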

Function: int utf8_mbstr_to_wc (const char *str, unsigned **wptr, size_t *plen)

Converts a multi-byte string from str to its wide character representation.

The result is returned in wptr and its length in plen. The result is allocated using malloc(3).

Returns 0 on success. On error, returns -1 and sets errno to one of the following values:

ENOMEM

Not enough memory to allocate the return buffer.

EILSEQ

An invalid multi-byte sequence is encountered.

Function: int utf8_mbstr_to_norm_wc (const char *str, unsigned **wptr, size_t *plen)

Converts a multi-byte string from str to its wide character representation, replacing each run of one or more whitespace characters with a single space character (ASCII 32).

The result is returned in wptr and its length in plen. The result is allocated using malloc(3).

Returns 0 on success. On error, returns -1 and sets errno to one of the following values:

ENOMEM

Not enough memory to allocate the return buffer.

EILSEQ

An invalid multi-byte sequence is encountered.
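The whitespace normalization step can be illustrated on an already decoded wide character array. The sketch below is hypothetical (demo_normalize_ws and demo_is_space are not library functions) and assumes that only ASCII whitespace counts as whitespace; the set of characters the library actually recognizes is not stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: only ASCII whitespace (space, \t, \n, \v, \f, \r) counts. */
static int demo_is_space(unsigned wc)
{
    return wc == ' ' || (wc >= '\t' && wc <= '\r');
}

/* Collapse each run of whitespace in w[0..len-1] into a single space
   (ASCII 32), in place.  Returns the new length. */
static size_t demo_normalize_ws(unsigned *w, size_t len)
{
    size_t in, out = 0;
    int in_run = 0;             /* nonzero while inside a whitespace run */

    assert(w != NULL || len == 0);

    for (in = 0; in < len; in++) {
        if (demo_is_space(w[in])) {
            if (!in_run) {
                w[out++] = ' ';  /* first character of the run */
                in_run = 1;
            }
            /* later characters of the same run are dropped */
        } else {
            w[out++] = w[in];
            in_run = 0;
        }
    }
    return out;
}
```

Following the description literally, a leading or trailing run is reduced to one space rather than removed. Normalizing in place is safe because the output index never overtakes the input index.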

