@@ -20,7 +20,7 @@ developers became involved because they wanted to add Unicode 7 support and othe
2020
2121(The original utf8proc package also includes Ruby and PostgreSQL plug-ins.
2222We removed those from utf8proc in order to focus exclusively on the C
23- library for the time being, but plan to add them back in or release them as separate packages.)
23+ library.)
2424
2525The utf8proc package is licensed under the
2626free/open-source [MIT "expat"
@@ -69,7 +69,7 @@ The C library is found in this directory after successful compilation
6969and is named `libutf8proc.a` (for the static library) and
7070`libutf8proc.so` (for the dynamic library).
7171
72- The Unicode version supported is 15.1.0.
72+ The Unicode version supported is 16.0.0.
7373
7474For Unicode normalizations, the following options are used:
7575
@@ -96,3 +96,50 @@ the [utf8proc issues page on Github](https://github.com/JuliaLang/utf8proc/issue
9696## See also
9797
9898An independent Lua translation of this library, [lua-mojibake](https://github.com/differentprogramming/lua-mojibake), is also available.
99+
100+ ## Examples
101+
102+ ### Convert codepoint to string
103+ ```c
104+ // Encode codepoint `a` as the UTF-8 string `str`
105+ utf8proc_int32_t a = 223;
106+ utf8proc_uint8_t str[16] = { 0 };
107+ utf8proc_encode_char(a, str);
108+ printf("%s\n", str);
109+ // ß
110+ ```
111+
112+ ### Convert string to codepoint
113+ ```c
114+ // Decode the first codepoint of the UTF-8 string `str` into `a`
115+ utf8proc_uint8_t str[] = "ß";
116+ utf8proc_int32_t a;
117+ utf8proc_iterate(str, -1, &a);
118+ printf("%d\n", a);
119+ // 223
120+ ```
121+
122+ ### Casefold
123+
124+ ```c
125+ // Convert "ß" (U+00DF) to its casefold variant "ss"
126+ utf8proc_uint8_t str[] = "ß";
127+ utf8proc_uint8_t *fold_str;
128+ utf8proc_map(str, 0, &fold_str, UTF8PROC_NULLTERM | UTF8PROC_CASEFOLD);
129+ printf("%s\n", fold_str);
130+ // ss
131+ free(fold_str);
132+ ```
133+
134+ ### Normalization Form C/D (NFC/NFD)
135+ ```c
136+ // Decompose "\u00e4\u00f6\u00fc" = "äöü" into "a\u0308o\u0308u\u0308" (= "äöü" via combining char U+0308)
137+ utf8proc_uint8_t input[] = {0xc3, 0xa4, 0xc3, 0xb6, 0xc3, 0xbc, 0x00}; // "\u00e4\u00f6\u00fc" = "äöü" in UTF-8, NUL-terminated as utf8proc_NFD expects
138+ utf8proc_uint8_t *nfd= utf8proc_NFD(input); // = {0x61, 0xcc, 0x88, 0x6f, 0xcc, 0x88, 0x75, 0xcc, 0x88, 0x00}
139+
140+ // Compose "a\u0308o\u0308u\u0308" into "\u00e4\u00f6\u00fc" (= "äöü" via precomposed characters)
141+ utf8proc_uint8_t *nfc= utf8proc_NFC(nfd);
142+
143+ free(nfd);
144+ free(nfc);
145+ ```
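
To check the "Convert codepoint to string" example by hand, it can help to see how a codepoint maps to UTF-8 bytes. The following is a minimal, library-free sketch of standard UTF-8 encoding. `sketch_encode_utf8` is a hypothetical helper written for illustration; it is not part of utf8proc and is not claimed to match how `utf8proc_encode_char` is implemented.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT a utf8proc function): encode one codepoint as UTF-8.
 * Writes up to 4 bytes to dst and returns the number of bytes written.
 * Returns 0 for surrogates and for values outside U+0000..U+10FFFF.
 * The output is not NUL-terminated. */
size_t sketch_encode_utf8(int32_t cp, uint8_t *dst)
{
    if (cp < 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        dst[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xC0 | (cp >> 6));
        dst[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        dst[0] = (uint8_t)(0xE0 | (cp >> 12));
        dst[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        dst[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    dst[0] = (uint8_t)(0xF0 | (cp >> 18));
    dst[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    dst[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    dst[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}
```

For codepoint 223 (U+00DF) this takes the two-byte branch and produces `0xC3 0x9F`, the UTF-8 encoding of "ß".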