| 1 | +# tutf8e |
| 2 | + |
| 3 | + *Tute Feighty* |
| 4 | + |
| 5 | + A tiny UTF-8 encoder for C. |
| 6 | + |
| 7 | +## Goals |
| 8 | + |
| 9 | + * As small and fast as possible |
| 10 | + * Narrowly scoped to one-step UTF-8 encoding in C |
| 11 | + * Link only the encodings you actually use |
| 12 | + * MIT licence |
| 13 | + |
| 14 | +## Supported Encodings |
| 15 | + |
| 16 | + * [iso-8859-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1) Latin-1 Western European |
| 17 | + * [iso-8859-2](https://en.wikipedia.org/wiki/ISO/IEC_8859-2) Latin-2 East European |
| 18 | + * [iso-8859-3](https://en.wikipedia.org/wiki/ISO/IEC_8859-3) Latin-3 South European |
| 19 | + * [iso-8859-4](https://en.wikipedia.org/wiki/ISO/IEC_8859-4) Latin-4 North European |
| 20 | + * [iso-8859-5](https://en.wikipedia.org/wiki/ISO/IEC_8859-5) Part 5: Latin/Cyrillic |
| 21 | + * [iso-8859-6](https://en.wikipedia.org/wiki/ISO/IEC_8859-6) Part 6: Latin/Arabic |
| 22 | + * [iso-8859-7](https://en.wikipedia.org/wiki/ISO/IEC_8859-7) Part 7: Latin/Greek |
| 23 | + * [iso-8859-8](https://en.wikipedia.org/wiki/ISO/IEC_8859-8) Part 8: Latin/Hebrew |
| 24 | + * [iso-8859-9](https://en.wikipedia.org/wiki/ISO/IEC_8859-9) Latin-5 Turkish |
| 25 | + * [iso-8859-10](https://en.wikipedia.org/wiki/ISO/IEC_8859-10) Latin-6 Nordic |
| 26 | + * [iso-8859-11](https://en.wikipedia.org/wiki/ISO/IEC_8859-11) Part 11: Latin/Thai |
| 27 | + * [iso-8859-13](https://en.wikipedia.org/wiki/ISO/IEC_8859-13) Latin-7 Baltic Rim |
| 28 | + * [iso-8859-14](https://en.wikipedia.org/wiki/ISO/IEC_8859-14) Latin-8 Celtic |
| 29 | + * [iso-8859-15](https://en.wikipedia.org/wiki/ISO/IEC_8859-15) Latin-9 Western European |
| 30 | + * [iso-8859-16](https://en.wikipedia.org/wiki/ISO/IEC_8859-16) Latin-10 South-Eastern European |
| 31 | + * [windows-1250](https://en.wikipedia.org/wiki/Windows-1250) Central European and Eastern European |
| 32 | + * [windows-1251](https://en.wikipedia.org/wiki/Windows-1251) Cyrillic |
| 33 | + * [windows-1252](https://en.wikipedia.org/wiki/Windows-1252) English |
| 34 | + * [windows-1253](https://en.wikipedia.org/wiki/Windows-1253) Greek |
| 35 | + * [windows-1254](https://en.wikipedia.org/wiki/Windows-1254) Turkish |
| 36 | + * [windows-1255](https://en.wikipedia.org/wiki/Windows-1255) Hebrew |
| 37 | + * [windows-1256](https://en.wikipedia.org/wiki/Windows-1256) Arabic |
| 38 | + * [windows-1257](https://en.wikipedia.org/wiki/Windows-1257) Baltic |
| 39 | + * [windows-1258](https://en.wikipedia.org/wiki/Windows-1258) Vietnamese |
| 40 | + |
| 41 | +## Test Procedure |
| 42 | + |
| 43 | +``` |
| 44 | +$ ./codegen.py |
| 45 | + |
| 46 | +$ gcc src/* test/test.c -Iinclude |
| 47 | + |
| 48 | +$ ./a.out |
| 49 | +A quick brown fox jumps over the lazy dog |
| 50 | +Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu, tanga a quickstepu. |
| 51 | +Pijamalı hasta yağız şoföre çabucak güvendi. |
| 52 | +Põdur Zagrebi tšellomängija-följetonist Ciqo külmetas kehvas garaažis |
| 53 | +В чащах юга жил бы цитрус? Да, но фальшивый экземпляр! |
| 54 | +διαφυλάξτε γενικά τη ζωή σας από βαθειά ψυχικά τραύματα |
| 55 | +עטלף אבק נס דרך מזגן שהתפוצץ כי חם |
| 56 | +Pijamalı hasta yağız şoföre çabucak güvendi. |
| 57 | +Flygande bäckasiner söka hwila på mjuka tuvor. |
| 58 | +เป็นมนุษย์สุดประเสริฐเลิศคุณค่า กว่าบรรดาฝูงสัตว์เดรัจฉาน จงฝ่าฟันพัฒนาวิชาการ อย่าล้างผลาญฤๅเข่นฆ่าบีฑาใคร ไม่ถือโทษโกรธแช่งซัดฮึดฮัดด่า หัดอภัยเหมือนกีฬาอัชฌาสัย ปฏิบัติประพฤติกฎกำหนดใจ พูดจาให้จ๊ะๆ จ๋าๆ น่าฟังเอยฯ |
| 59 | +Jeżu klątw, spłódź Finom część gry hańb! |
| 60 | +11 passed, 0 failed tests |
| 61 | +``` |
| 62 | + |
| 63 | +## How small is it? |
| 64 | + |
| 65 | +512 bytes + overhead per encoding. |
| 66 | + |
| 67 | +``` |
| 68 | +$ for i in src/*; do gcc -c $i -O1; done |
| 69 | +$ du -bhc *.o | grep total |
| 70 | +32K total |
| 71 | + |
| 72 | +$ for i in src/*; do gcc -c $i -O3; done |
| 73 | +$ du -bhc *.o | grep total |
| 74 | +32K total |
| 75 | +
|
| 76 | +$ for i in src/*; do gcc -c $i -Os; done |
| 77 | +$ du -bhc *.o | grep total |
| 78 | +28K total |
| 79 | +``` |
| 80 | + |
| 81 | +## Related |
| 82 | + |
| 83 | + * [iconv](https://www.gnu.org/software/libiconv/) |
| 84 | + * [icu](http://site.icu-project.org/) |
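
The "512 bytes + overhead per encoding" figure in the README would be consistent with a 256-entry table of 16-bit code points per encoding (256 × 2 bytes = 512 bytes), though the diff does not show the generated sources, so that layout is an assumption. The sketch below illustrates one-step, table-driven UTF-8 encoding under that assumption. The names `demo_build_iso_8859_15` and `demo_encode` are hypothetical and are not tutf8e's API; ISO-8859-15 is used because it matches Latin-1 except at eight positions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of tutf8e): fill a 256-entry table mapping
 * each ISO-8859-15 byte to its Unicode code point.
 * Assumed layout: 256 entries * 2 bytes = 512 bytes per encoding. */
void demo_build_iso_8859_15(uint16_t table[256])
{
    for (int i = 0; i < 256; i++)
        table[i] = (uint16_t)i;   /* identical to Latin-1 by default */

    /* The eight positions where ISO-8859-15 differs from ISO-8859-1. */
    table[0xA4] = 0x20AC;  /* EURO SIGN */
    table[0xA6] = 0x0160;  /* S WITH CARON */
    table[0xA8] = 0x0161;  /* s with caron */
    table[0xB4] = 0x017D;  /* Z WITH CARON */
    table[0xB8] = 0x017E;  /* z with caron */
    table[0xBC] = 0x0152;  /* LIGATURE OE */
    table[0xBD] = 0x0153;  /* ligature oe */
    table[0xBE] = 0x0178;  /* Y WITH DIAERESIS */
}

/* Hypothetical encoder (not tutf8e's API): look up each input byte in the
 * table and emit the code point as 1, 2 or 3 UTF-8 bytes.
 * Returns the number of bytes written, or (size_t)-1 if `out` is too small.
 * The output is not NUL-terminated. Table entries are assumed to be valid
 * Basic Multilingual Plane code points (no surrogates). */
size_t demo_encode(const uint16_t table[256],
                   const unsigned char *in, size_t in_len,
                   char *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint16_t cp = table[in[i]];
        if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
            if (out_cap - n < 1) return (size_t)-1;
            out[n++] = (char)cp;
        } else if (cp < 0x800) {               /* 2 bytes: 110xxxxx 10xxxxxx */
            if (out_cap - n < 2) return (size_t)-1;
            out[n++] = (char)(0xC0 | (cp >> 6));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        } else {                               /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
            if (out_cap - n < 3) return (size_t)-1;
            out[n++] = (char)(0xE0 | (cp >> 12));
            out[n++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    return n;
}
```

Because every entry fits in 16 bits, no input byte expands to more than three output bytes, so an output buffer of `3 * in_len` bytes is always large enough for this sketch.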