-# Providing 'mode' provides hints to the actual encoder as to how it should encode the string. Only UTF-7 and UTF-8 use "mode".
+# Providing 'mode' provides hints to the actual encoder as to how it
+# should encode the string.
 #
-# utf-7 by default does not encode alphanumeric and a few other characters. By specifying the mode of "all", then all of the characters are encoded, not just the non-alphanumeric set.
-# to_unicode(str, 'utf-7', 'all')
+# Only UTF-7 and UTF-8 use "mode".
 #
-# utf-8 specifies that alphanumeric characters are used directly, eg "a" is just "a". However, there exist 6 different overlong encodings of "a" that are technically not valid, but parse just fine in most utf-8 parsers. (0xC1A1, 0xE081A1, 0xF08081A1, 0xF8808081A1, 0xFC80808081A1, 0xFE8080808081A1). How many bytes to use for the overlong enocding is specified providing 'size'.
-# to_unicode(str, 'utf-8', 'overlong', 2)
+# utf-7 by default does not encode alphanumeric and a few other
+# characters. By specifying the mode "all", all of the characters are
+# encoded, not just the non-alphanumeric set.
+# to_unicode(str, 'utf-7', 'all')
 #
-# Many utf-8 parsers also allow invalid overlong encodings, where bits that are unused when encoding a single byte are modified. Many parsers will ignore these bits, rendering simple string matching to be ineffective for dealing with UTF-8 strings. There are many more invalid overlong encodings possible for "a". For example, three encodings are available for an invalid 2 byte encoding of "a". (0xC1E1 0xC161 0xC121). By specifying "invalid", a random invalid encoding is chosen for the given byte size.
-# to_unicode(str, 'utf-8', 'invalid', 2)
+# utf-8 specifies that alphanumeric characters are used directly, e.g.
+# "a" is just "a". However, there exist 6 different overlong
+# encodings of "a" that are technically not valid, but parse just fine
+# in most utf-8 parsers (0xC1A1, 0xE081A1, 0xF08081A1, 0xF8808081A1,
+# 0xFC80808081A1, 0xFE8080808081A1). How many bytes to use for the
+# overlong encoding is specified by providing 'size'.
+# to_unicode(str, 'utf-8', 'overlong', 2)
 #
-# utf-7 defaults to 'normal' utf-7 encoding
-# utf-8 defaults to 2 byte 'normal' encoding
+# Many utf-8 parsers also allow invalid overlong encodings, where bits
+# that are unused when encoding a single byte are modified. Many
+# parsers will ignore these bits, rendering simple string matching
+# ineffective for dealing with UTF-8 strings. There are many more
+# invalid overlong encodings possible for "a". For example, three
+# encodings are available for an invalid 2 byte encoding of "a"
+# (0xC1E1, 0xC161, 0xC121).
 #
+# By specifying "invalid", a random invalid encoding is chosen for the
+# given byte size.
+# to_unicode(str, 'utf-8', 'invalid', 2)
+#
+# utf-7 defaults to 'normal' utf-7 encoding
+# utf-8 defaults to 2 byte
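To make the utf-7 "all" mode more concrete, here is a minimal sketch of what encoding every character looks like under UTF-7 (RFC 2152): the whole string becomes a single modified-base64 run of UTF-16 big-endian code units between '+' and '-'. The helpers `utf7_encode_all` and `utf7_decode_run` are hypothetical illustrations, not the actual `to_unicode` implementation, and the sketch assumes the input can be transcoded to UTF-16BE.

```ruby
# Hypothetical helper, not the real to_unicode: UTF-7 encode *every*
# character, mimicking the idea behind to_unicode(str, 'utf-7', 'all').
def utf7_encode_all(str)
  return '' if str.empty? # "+-" would decode to a literal "+"
  # UTF-7 base64 runs carry UTF-16 big-endian code units (RFC 2152).
  utf16 = str.encode('UTF-16BE').b
  # pack('m0') is strict base64 without line breaks; UTF-7 omits '=' padding.
  b64 = [utf16].pack('m0').delete('=')
  "+#{b64}-"
end

# Decoder for a single "+...-" run, used here only to check round trips.
def utf7_decode_run(encoded)
  b64 = encoded.delete_prefix('+').delete_suffix('-')
  b64 += '=' * ((4 - b64.length % 4) % 4) # restore padding for unpack
  b64.unpack1('m').force_encoding('UTF-16BE').encode('UTF-8')
end

puts utf7_encode_all('a')                    # prints +AGE-
puts utf7_encode_all('abc')                  # prints +AGEAYgBj-
puts utf7_decode_run(utf7_encode_all('abc')) # prints abc
```

Even a purely alphanumeric input such as "abc" comes out with no literal "abc" in it, which is what distinguishes this mode from default UTF-7, where those characters pass through unchanged.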
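The two utf-8 modes can be sketched for a single code point as below. The sketch assumes the classic UTF-8 bit layout (an n-byte sequence starts with n one bits and a zero, and each continuation byte is `10xxxxxx`), extended up to the 7-byte 0xFE form listed in the comment, which no UTF-8 standard actually defines. `utf8_overlong`, `utf8_invalid` and `lenient_decode` are hypothetical names. It is also an assumption that "invalid" replaces the `10` marker bits of every continuation byte with a random choice of `00`, `01` or `11`; for a 2 byte "a" this yields exactly the three variants 0xC121, 0xC161 and 0xC1E1 from the comment, but the real `to_unicode` may pick its variants differently.

```ruby
# Hypothetical helpers, not the real to_unicode, sketching the 'overlong'
# and 'invalid' utf-8 modes for a single code point.

# Encode `codepoint` in exactly `size` bytes (2..7), even though it may need fewer.
def utf8_overlong(codepoint, size)
  raise ArgumentError, 'size must be between 2 and 7' unless (2..7).cover?(size)
  # Payload bits: (7 - size) in the lead byte plus 6 per continuation byte.
  raise ArgumentError, 'code point does not fit' if codepoint >= (1 << (5 * size + 1))

  bytes = []
  cp = codepoint
  (size - 1).times do
    bytes.unshift(0x80 | (cp & 0x3F)) # continuation byte: 10xxxxxx
    cp >>= 6
  end
  lead_marker = (0xFF << (8 - size)) & 0xFF # `size` leading one bits, then a zero
  bytes.unshift(lead_marker | cp)
  bytes.pack('C*')
end

# Like utf8_overlong, but the "10" marker bits of every continuation byte are
# replaced with a random choice of "00", "01" or "11".
def utf8_invalid(codepoint, size, rng: Random.new)
  lead, *rest = utf8_overlong(codepoint, size).bytes
  mangled = rest.map { |b| [0x00, 0x40, 0xC0].sample(random: rng) | (b & 0x3F) }
  ([lead] + mangled).pack('C*')
end

# A deliberately lenient decoder for one sequence: it takes the length from
# the lead byte and keeps only the low six bits of each following byte, so it
# accepts both the overlong and the "invalid" forms.
def lenient_decode(seq)
  lead, *rest = seq.bytes
  size = 8 - (~lead & 0xFF).bit_length # number of leading one bits
  return lead if size.zero?            # plain single-byte character
  payload = lead & (0xFF >> (size + 1))
  rest.first(size - 1).reduce(payload) { |acc, b| (acc << 6) | (b & 0x3F) }
end

# Prints c1a1, e081a1, f08081a1, f8808081a1, fc80808081a1, fe8080808081a1 (one per line).
(2..7).each { |n| puts utf8_overlong(0x61, n).unpack1('H*') }

bad = utf8_invalid(0x61, 3)
puts bad.unpack1('H*')            # random, e.g. e00121
puts lenient_decode(bad) == 0x61  # prints true
```

The lenient decoder shows why byte-level string matching fails against such parsers: none of these sequences contains the byte 0x61, yet every one of them decodes back to "a".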