# Providing 'mode' gives the underlying encoder a hint about how it should encode the string. Only UTF-7 and UTF-8 use 'mode'.
#
# By default, UTF-7 does not encode alphanumeric characters and a few other characters. Specifying the mode "all" encodes every character, not just the non-alphanumeric set.
# to_unicode(str, 'utf-7', 'all')
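To illustrate what "encode every character" means in UTF-7 (RFC 2152): a character outside the directly encoded set is written as a shifted sequence, i.e. '+', then the modified base64 of its UTF-16BE code units with padding dropped, then '-'. The sketch below pushes every character through that path. It is a minimal sketch under stated assumptions, not the `to_unicode` implementation: the helper name `utf7_encode_all` is hypothetical, and emitting one shifted sequence per character is a choice (a single shifted run for the whole string would also be valid UTF-7).

```ruby
# Hypothetical helper (not the to_unicode implementation): UTF-7 "encode
# everything" mode. Each character becomes its own shifted sequence:
# '+' + base64 of its UTF-16BE bytes (padding stripped) + '-'.
def utf7_encode_all(str)
  str.each_char.map do |ch|
    utf16 = ch.encode('UTF-16BE').b         # raw UTF-16BE bytes of this character
    b64 = [utf16].pack('m0').delete('=')    # strict base64 (no newlines), no padding
    "+#{b64}-"
  end.join
end

puts utf7_encode_all('a')    # prints +AGE-
puts utf7_encode_all('<a>')  # prints +ADw-+AGE-+AD4-
```

Even plain "a" comes out as "+AGE-", which is why this mode defeats filters that only look for literal alphanumerics or literal "<".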
#
# UTF-8 encodes alphanumeric (and all other ASCII) characters directly as single bytes, e.g. "a" is just "a" (0x61). However, there exist 6 different overlong encodings of "a" that are technically not valid but parse just fine in many UTF-8 parsers (0xC1A1, 0xE081A1, 0xF08081A1, 0xF8808081A1, 0xFC80808081A1, 0xFE8080808081A1). The number of bytes used for the overlong encoding is specified by providing 'size'.
# to_unicode(str, 'utf-8', 'overlong', 2)
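The six overlong byte sequences listed above follow one rule: spread the character's bits over the 6-bit payloads of the continuation bytes (10xxxxxx), pad the unused high bits with zeros, and prefix a lead byte with `size` leading one-bits. The sketch below reproduces that rule for a single ASCII byte; the helper names `overlong_utf8` and `overlong_string` are hypothetical, and sizes 5 to 7 use the historical (pre-RFC 3629) lead-byte pattern, with 0xFE as the 7-byte lead byte exactly as listed above.

```ruby
# Hypothetical helper (not the to_unicode implementation): builds the
# `size`-byte overlong UTF-8 form of one ASCII byte, for size 2..7.
def overlong_utf8(byte, size)
  raise ArgumentError, 'byte must be ASCII (0x00..0x7F)' unless (0..0x7F).cover?(byte)
  raise ArgumentError, 'size must be 2..7' unless (2..7).cover?(size)

  # Continuation bytes are 10xxxxxx; fill their 6-bit payloads from the right.
  cont = Array.new(size - 1, 0x80)
  rest = byte
  (size - 2).downto(0) do |i|
    cont[i] |= rest & 0x3F
    rest >>= 6
  end

  # Lead byte: `size` one-bits, then a zero bit, then any leftover high bits
  # (only the 2-byte form can have any left over for ASCII input).
  lead = (0xFF << (8 - size)) & 0xFF
  [lead | rest] + cont
end

# Overlong-encode every byte of an ASCII string, returning a binary String.
def overlong_string(str, size)
  str.bytes.flat_map { |b| overlong_utf8(b, size) }.pack('C*')
end

(2..7).each do |n|
  puts overlong_utf8('a'.ord, n).map { |b| format('%02X', b) }.join
end
# prints C1A1, E081A1, F08081A1, F8808081A1, FC80808081A1, FE8080808081A1 (one per line)
```

A strict decoder must reject all of these because a shorter encoding exists; the lenient parsers described above decode each of them back to "a".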
#
# Many UTF-8 parsers also accept invalid overlong encodings, in which bits that are unused when encoding a single byte are modified. Because many parsers ignore these bits, simple string matching is ineffective for dealing with UTF-8 strings. There are many more invalid overlong encodings possible for "a". For example, three invalid 2-byte encodings of "a" are available (0xC1E1, 0xC161, 0xC121), obtained by replacing the fixed "10" prefix of the continuation byte in 0xC1A1 with "11", "01" or "00". By specifying "invalid", a random invalid encoding is chosen for the given byte size.
# to_unicode(str, 'utf-8', 'invalid', 2)
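The three invalid 2-byte forms of "a" come from the valid overlong form 0xC1A1 by swapping the continuation byte's required "10" prefix for each of the other three bit patterns. The sketch below generates those variants and pairs them with a deliberately lenient decoder that ignores the top two bits of the continuation byte, which is the parser behaviour the comment describes. Both helper names are hypothetical, and only the 2-byte case from the example is covered (longer sizes have more continuation bytes whose prefixes can be altered).

```ruby
# Hypothetical helpers (not the to_unicode implementation).
# Every invalid 2-byte overlong form of an ASCII byte: the lead byte stays
# 110xxxxx, but the continuation byte's required "10" prefix is replaced by
# one of the other three patterns ("11", "01", "00").
def invalid_overlong_2byte(byte)
  raise ArgumentError, 'byte must be ASCII (0x00..0x7F)' unless (0..0x7F).cover?(byte)

  lead = 0xC0 | (byte >> 6)
  low  = byte & 0x3F
  [0xC0, 0x40, 0x00].map { |prefix| [lead, prefix | low] }
end

# A lenient decoder that ignores the top two bits of the continuation byte
# instead of rejecting the sequence.
def lenient_decode_2byte(lead, cont)
  ((lead & 0x1F) << 6) | (cont & 0x3F)
end

variants = invalid_overlong_2byte('a'.ord)
puts variants.map { |pair| pair.map { |b| format('%02X', b) }.join }.inspect
# prints ["C1E1", "C161", "C121"]
puts variants.all? { |l, c| lenient_decode_2byte(l, c) == 'a'.ord }  # prints true

# Picking one variant at random, as the "invalid" mode is described as doing.
random_pick = variants.sample.pack('C*')
```

A byte-for-byte search for "a" (or for 0xC1A1) matches none of these sequences, yet the lenient decoder maps all three back to "a".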