ericprud edited this page Mar 15, 2013
Lexing UTF-8
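The Turtle production PN_CHARS_BASE (http://www.w3.org/TR/turtle/#grammar-production-PN_CHARS_BASE) is defined in terms of Unicode characters. It is straightforward to convert each of its code-point ranges into a byte-level regular expression for a lexer that reads raw UTF-8, e.g. with http://www.w3.org/2005/03/23-lex-U: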
| Characters | Turtle range | UTF-8 byte range | Byte-level regex |
|---|---|---|---|
| A-Z | `[A-Z]` | `41-5a` | `[A-Z]` |
| a-z | `[a-z]` | `61-7a` | `[a-z]` |
| À-Ö | `[#x00C0-#x00D6]` | `c380-c396` | `\xC3[\x80-\x96]` |
| Ø-ö | `[#x00D8-#x00F6]` | `c398-c3b6` | `\xC3[\x98-\xB6]` |
| ø | `[#x00F8-#x02FF]` | `c3b8-cbbf` | `\xC3[\xB8-\xBF]\|[\xC4-\xCB][\x80-\xBF]` |
| | `[#x0370-#x037D]` | `cdb0-cdbd` | `\xCD[\xB0-\xBD]` |
| | `[#x037F-#x1FFF]` | `cdbf-e1bfbf` | `\xCD\xBF\|[\xCE-\xDF][\x80-\xBF]\|\xE0[\xA0-\xBF][\x80-\xBF]\|\xE1[\x80-\xBF][\x80-\xBF]` |
| | `[#x200C-#x200D]` | `e2808c-e2808d` | `\xE2\x80[\x8C-\x8D]` |
| | `[#x2070-#x218F]` | `e281b0-e2868f` | `\xE2(\x81[\xB0-\xBF]\|[\x82-\x85][\x80-\xBF]\|\x86[\x80-\x8F])` |
| | `[#x2C00-#x2FEF]` | `e2b080-e2bfaf` | `\xE2([\xB0-\xBE][\x80-\xBF]\|\xBF[\x80-\xAF])` |
| | `[#x3001-#xD7FF]` | `e38081-ed9fbf` | `\xE3(\x80[\x81-\xBF]\|[\x81-\xBF][\x80-\xBF])\|[\xE4-\xEC][\x80-\xBF][\x80-\xBF]\|\xED[\x80-\x9F][\x80-\xBF]` |
| | `[#xF900-#xFDCF]` | `efa480-efb78f` | `\xEF([\xA4-\xB6][\x80-\xBF]\|\xB7[\x80-\x8F])` |
| | `[#xFDF0-#xFFFD]` | `efb7b0-efbfbd` | `\xEF(\xB7[\xB0-\xBF]\|[\xB8-\xBE][\x80-\xBF]\|\xBF[\x80-\xBD])` |
| | `[#x10000-#xEFFFF]` | `f0908080-f3afbfbf` | `\xF0[\x90-\xBF][\x80-\xBF][\x80-\xBF]\|[\xF1-\xF2][\x80-\xBF][\x80-\xBF][\x80-\xBF]\|\xF3[\x80-\xAF][\x80-\xBF][\x80-\xBF]` |
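The range-to-regex conversion can be sketched in a few lines of Python. This is not the code behind the lex-U tool; it is a minimal illustration of one common technique, assuming each range is given as an inclusive pair of code points. It splits a range until the UTF-8 encodings of every piece form a simple product of per-byte ranges, then emits one byte-class sequence per piece. The names `utf8_range_regex`, `_split`, `_byte_class` and `PN_CHARS_BASE` are hypothetical, and the surrogate block U+D800-U+DFFF is skipped because it has no UTF-8 encoding.

```python
import re

# Hypothetical helpers: a sketch of range splitting, not the lex-U tool's code.

# Largest code point with a 1-, 2- and 3-byte UTF-8 encoding.
LEN_LIMITS = (0x7F, 0x7FF, 0xFFFF)


def _split(lo, hi):
    """Split code points lo..hi into sub-ranges whose UTF-8 encodings are
    exactly the product of per-byte ranges. Returns (lo, hi) pairs in order."""
    if lo > hi:
        return []
    # Surrogates (D800-DFFF) have no UTF-8 encoding, so skip them.
    if lo <= 0xDFFF and hi >= 0xD800:
        return _split(lo, 0xD7FF) + _split(0xE000, hi)
    # Keep each piece within a single encoded length.
    for limit in LEN_LIMITS:
        if lo <= limit < hi:
            return _split(lo, limit) + _split(limit + 1, hi)
    n = len(chr(lo).encode("utf-8"))
    # For the low 6*i bits (one group per continuation byte), lo and hi must
    # either share all higher bits or cover the whole group (0..0 to 1..1).
    for i in range(1, n):
        m = (1 << (6 * i)) - 1
        if (lo & ~m) != (hi & ~m):
            if (lo & m) != 0:
                return _split(lo, lo | m) + _split((lo | m) + 1, hi)
            if (hi & m) != m:
                return _split(lo, (hi & ~m) - 1) + _split(hi & ~m, hi)
    return [(lo, hi)]


def _byte_class(a, b):
    """One byte position: a single byte or a byte range, as regex text."""
    return "\\x%02X" % a if a == b else "[\\x%02X-\\x%02X]" % (a, b)


def utf8_range_regex(lo, hi):
    """Byte-level regex (as text) matching exactly the UTF-8 encodings of
    the code points lo..hi, surrogates excluded."""
    alternatives = []
    for a, b in _split(lo, hi):
        ea, eb = chr(a).encode("utf-8"), chr(b).encode("utf-8")
        alternatives.append("".join(_byte_class(x, y) for x, y in zip(ea, eb)))
    return "|".join(alternatives)


# Turtle's PN_CHARS_BASE production as (first, last) code points.
PN_CHARS_BASE = [
    (0x41, 0x5A), (0x61, 0x7A), (0xC0, 0xD6), (0xD8, 0xF6), (0xF8, 0x2FF),
    (0x370, 0x37D), (0x37F, 0x1FFF), (0x200C, 0x200D), (0x2070, 0x218F),
    (0x2C00, 0x2FEF), (0x3001, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFFD),
    (0x10000, 0xEFFFF),
]

if __name__ == "__main__":
    for lo, hi in PN_CHARS_BASE:
        print("[#x%04X-#x%04X]  %s" % (lo, hi, utf8_range_regex(lo, hi)))
    # One bytes pattern for the whole production, usable on raw UTF-8 input.
    pn_chars_base = re.compile(
        "|".join(utf8_range_regex(lo, hi) for lo, hi in PN_CHARS_BASE).encode("ascii"))
    print(bool(pn_chars_base.fullmatch("\u00e9".encode("utf-8"))))  # True
    print(bool(pn_chars_base.fullmatch("\u00d7".encode("utf-8"))))  # False: U+00D7 is excluded
```

For `[#x3001-#xD7FF]` the sketch yields `\xE3\x80[\x81-\xBF]|\xE3[\x81-\xBF][\x80-\xBF]|[\xE4-\xEC][\x80-\xBF][\x80-\xBF]|\xED[\x80-\x9F][\x80-\xBF]`. Shared lead bytes are repeated rather than factored into groups, which is more verbose than a hand-factored expression but matches the same byte strings.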