

D.10.7 Additional functions

Function: unsigned * utf8_wc_strdup (const unsigned *s)

Returns a pointer to a new wide character string which is a duplicate of the string s. Memory for the new string is obtained with malloc(3), and can be freed with free(3).
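As an illustration of the ownership rule, the following minimal sketch duplicates a wide string. It assumes that a wide character string is a zero-terminated array of unsigned values. The names wc_len and wc_dup_sketch are hypothetical stand-ins, not part of the library, and the code is not the library's implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: number of characters before the terminating zero. */
static size_t wc_len(const unsigned *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical stand-in for utf8_wc_strdup.  Returns a malloc'ed copy of s,
   or NULL if the allocation fails.  The caller frees the result. */
unsigned *wc_dup_sketch(const unsigned *s)
{
    size_t n;
    unsigned *copy;

    assert(s != NULL);
    n = wc_len(s) + 1;                  /* include the terminating zero */
    copy = malloc(n * sizeof copy[0]);
    if (copy != NULL)
        memcpy(copy, s, n * sizeof copy[0]);
    return copy;
}
```

A caller owns the returned buffer and releases it with free(3) once it is no longer needed.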

Function: unsigned * utf8_wc_quote (const unsigned *s)

Quotes occurrences of backslash and double-quote in s by prefixing each of them with a backslash. The return value is allocated using malloc(3).
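The quoting rule can be sketched in two passes: count the characters that need a backslash, then copy. The function name wc_quote_sketch is hypothetical, and the sketch assumes a zero-terminated array of unsigned values. It shows the technique only and makes no claim about how the library implements it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for utf8_wc_quote.  Returns a malloc'ed copy of s
   in which every backslash and double-quote is preceded by a backslash,
   or NULL if the allocation fails. */
unsigned *wc_quote_sketch(const unsigned *s)
{
    size_t len, extra = 0, i, j;
    unsigned *out;

    assert(s != NULL);
    /* First pass: measure the input and count characters to be quoted. */
    for (len = 0; s[len] != 0; len++)
        if (s[len] == '\\' || s[len] == '"')
            extra++;

    out = malloc((len + extra + 1) * sizeof out[0]);
    if (out == NULL)
        return NULL;

    /* Second pass: copy, inserting a backslash where needed. */
    for (i = j = 0; i < len; i++) {
        if (s[i] == '\\' || s[i] == '"')
            out[j++] = '\\';
        out[j++] = s[i];
    }
    out[j] = 0;
    return out;
}
```

With this sketch, the four-character input a"b\ becomes the six-character string a\"b\\.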

Function: int utf8_quote (const char *str, char **sptr)

Quotes occurrences of backslash and double-quote in str by prefixing each of them with a backslash. On success, stores the result (allocated with malloc(3)) in the location pointed to by sptr and returns 0. On error, returns -1 and sets errno to one of the following:

ENOMEM

Not enough memory to allocate the return buffer.

EILSEQ

An invalid wide character is encountered.
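The calling convention described above (0 on success with the result stored through sptr, -1 with errno set on failure) can be sketched as follows. The name quote_sketch is hypothetical. The validity check is deliberately simplified and is an assumption of this sketch: it rejects only bytes that can never occur in well-formed UTF-8, which is weaker than whatever validation the library performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in that mirrors the calling convention of utf8_quote. */
int quote_sketch(const char *str, char **sptr)
{
    const unsigned char *p;
    size_t len = 0, extra = 0;
    char *out, *q;

    assert(str != NULL && sptr != NULL);
    for (p = (const unsigned char *) str; *p; p++, len++) {
        /* Simplified check: bytes 0xC0, 0xC1 and 0xF5..0xFF never occur
           in well-formed UTF-8. */
        if (*p == 0xC0 || *p == 0xC1 || *p >= 0xF5) {
            errno = EILSEQ;
            return -1;
        }
        if (*p == '\\' || *p == '"')
            extra++;
    }

    out = malloc(len + extra + 1);
    if (out == NULL) {
        errno = ENOMEM;
        return -1;
    }

    /* Backslash and double-quote are ASCII, so they never appear inside a
       multibyte sequence and byte-wise copying is safe. */
    for (p = (const unsigned char *) str, q = out; *p; p++) {
        if (*p == '\\' || *p == '"')
            *q++ = '\\';
        *q++ = (char) *p;
    }
    *q = '\0';
    *sptr = out;
    return 0;
}
```

A caller checks the return value first, inspects errno only when it is -1, and frees the stored string with free(3) on success.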

Function: size_t utf8_wc_hash_string (const unsigned *ws, size_t n)

Computes a hash code of ws for a symbol table with n buckets.
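The manual does not say which hash function is used. Purely as an illustration of hashing a wide string into n buckets, the sketch below swaps in 32-bit FNV-1a and reduces the result modulo n. The name wc_hash_sketch is hypothetical, and the values it produces will not in general match those of utf8_wc_hash_string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in: FNV-1a over the code points of ws, reduced to a
   bucket index in the range 0..n-1.  Not the library's algorithm. */
size_t wc_hash_sketch(const unsigned *ws, size_t n)
{
    uint32_t h = 2166136261u;           /* FNV-1a 32-bit offset basis */

    assert(ws != NULL && n > 0);
    for (; *ws != 0; ws++) {
        uint32_t c = *ws;
        int i;
        /* Feed each code point to the hash one byte at a time. */
        for (i = 0; i < 4; i++) {
            h ^= (c >> (8 * i)) & 0xFF;
            h *= 16777619u;             /* FNV-1a 32-bit prime */
        }
    }
    return h % n;
}
```

The result is always less than n, so it can be used directly as an index into an array of n buckets.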

Function: int dico_levenshtein_distance (const char *a, const char *b, int flags)

Computes the Levenshtein distance between UTF-8 strings a and b. The flags argument is either 0 or a bitwise OR of one or more of the flags listed below:

0

Default - compute Levenshtein distance, treating both arguments literally.

DICO_LEV_NORM

Treat runs of one or more whitespace characters as a single space character (ASCII 32).

DICO_LEV_DAMERAU

Compute Damerau-Levenshtein distance. This distance also counts a transposition of two adjacent characters as a single edit.
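To show how the transposition flag changes the result, here is a minimal dynamic-programming sketch. Several things in it are assumptions: it works on already decoded, zero-terminated arrays of code points rather than on UTF-8 strings, the flag SK_DAMERAU and the function lev_sketch are hypothetical names, and the transposition case uses the restricted variant known as optimal string alignment. Whether the library uses that variant is not stated in this manual.

```c
#include <assert.h>
#include <stdlib.h>

#define SK_DAMERAU 0x1  /* hypothetical flag, not the value of DICO_LEV_DAMERAU */

static size_t wlen(const unsigned *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

static int min3(int a, int b, int c)
{
    int m = a < b ? a : b;
    return m < c ? m : c;
}

/* Returns the edit distance between a and b, or -1 if memory runs out. */
int lev_sketch(const unsigned *a, const unsigned *b, int flags)
{
    size_t m, n, w, i, j;
    int *d, result;

    assert(a != NULL && b != NULL);
    m = wlen(a);
    n = wlen(b);
    w = n + 1;                          /* row width of the table */
    d = malloc((m + 1) * w * sizeof d[0]);
    if (d == NULL)
        return -1;

    for (i = 0; i <= m; i++)
        d[i * w] = (int) i;             /* delete i characters */
    for (j = 0; j <= n; j++)
        d[j] = (int) j;                 /* insert j characters */

    for (i = 1; i <= m; i++) {
        for (j = 1; j <= n; j++) {
            int cost = a[i - 1] != b[j - 1];
            int v = min3(d[(i - 1) * w + j] + 1,         /* deletion */
                         d[i * w + j - 1] + 1,           /* insertion */
                         d[(i - 1) * w + j - 1] + cost); /* substitution */
            if ((flags & SK_DAMERAU) && i > 1 && j > 1
                && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
                int t = d[(i - 2) * w + j - 2] + 1;      /* transposition */
                if (t < v)
                    v = t;
            }
            d[i * w + j] = v;
        }
    }
    result = d[m * w + n];
    free(d);
    return result;
}
```

For the pair "ca" and "ac", this sketch returns 2 with flags set to 0 and 1 with SK_DAMERAU, because the swap of two adjacent characters is then counted as one edit.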

Function: int dico_soundex (const char *word, char code[DICO_SOUNDEX_SIZE])

Computes the Soundex code for word and stores it in code. Returns 0 on success, or -1 if word is not a valid UTF-8 string.

Define: DICO_SOUNDEX_SIZE

This macro expands to the size of the Soundex code buffer, including the terminating null character.

Note that dico_soundex silently ignores all characters except Latin letters.
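For readers unfamiliar with Soundex, the sketch below implements the classic American Soundex rules (one letter followed by three digits) and skips every byte that is not an ASCII Latin letter. The names soundex_sketch and SK_SOUNDEX_SIZE are hypothetical, and the buffer size of 5 is an assumption about the classic code length, not a statement about the value of DICO_SOUNDEX_SIZE. Unlike dico_soundex, the sketch does not validate UTF-8 and returns nothing.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define SK_SOUNDEX_SIZE 5   /* hypothetical: letter + 3 digits + terminator */

/* Soundex digit for an uppercase ASCII letter; '0' means "not coded". */
static char sx_digit(int c)
{
    /*                         ABCDEFGHIJKLMNOPQRSTUVWXYZ */
    static const char map[] = "01230120022455012623010202";
    return map[c - 'A'];
}

/* Hypothetical stand-in for dico_soundex (classic American Soundex). */
void soundex_sketch(const char *word, char code[SK_SOUNDEX_SIZE])
{
    int n = 0;
    char prev = '0';

    assert(word != NULL && code != NULL);
    for (; *word && n < SK_SOUNDEX_SIZE - 1; word++) {
        unsigned char ch = (unsigned char) *word;
        int up;
        char d;

        /* Skip everything except ASCII Latin letters. */
        if (!((ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z')))
            continue;
        up = toupper(ch);
        d = sx_digit(up);
        if (n == 0)
            code[n++] = (char) up;      /* first letter is kept as is */
        else if (d != '0' && d != prev)
            code[n++] = d;              /* new consonant group */
        if (up != 'H' && up != 'W')
            prev = d;                   /* H and W do not separate groups */
    }
    /* Pad with zeros and terminate. */
    memset(code + n, '0', (size_t) (SK_SOUNDEX_SIZE - 1 - n));
    code[SK_SOUNDEX_SIZE - 1] = '\0';
}
```

With this sketch, "Robert" and "Rupert" both map to R163, and "O'Neil" maps to the same code as "ONeil" because the apostrophe is ignored.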

