+# Providing 'mode' provides hints to the actual encoder as to how it
+# should encode the string.
 #
-# Providing 'mode' provides hints to the actual encoder as to how it should encode the string. Only UTF-7 and UTF-8 use "mode".
+# Only UTF-7 and UTF-8 use "mode".
 #
-# utf-7 by default does not encode alphanumeric and a few other characters. By specifying the mode of "all", then all of the characters are encoded, not just the non-alphanumeric set.
-# to_unicode(str, 'utf-7', 'all')
+# utf-7 by default does not encode alphanumeric and a few other
+# characters. By specifying the mode of "all", then all of the
+# characters are encoded, not just the non-alphanumeric set.
+# to_unicode(str, 'utf-7', 'all')
 #
-# utf-8 specifies that alphanumeric characters are used directly, eg "a" is just "a". However, there exist 6 different overlong encodings of "a" that are technically not valid, but parse just fine in most utf-8 parsers. (0xC1A1, 0xE081A1, 0xF08081A1, 0xF8808081A1, 0xFC80808081A1, 0xFE8080808081A1). How many bytes to use for the overlong encoding is specified providing 'size'.
-# to_unicode(str, 'utf-8', 'overlong', 2)
+# utf-8 specifies that alphanumeric characters are used directly, eg
+# "a" is just "a". However, there exist 6 different overlong
+# encodings of "a" that are technically not valid, but parse just fine
+# in most utf-8 parsers. (0xC1A1, 0xE081A1, 0xF08081A1, 0xF8808081A1,
+# 0xFC80808081A1, 0xFE8080808081A1). How many bytes to use for the
+# overlong encoding is specified providing 'size'. to_unicode(str,
+# 'utf-8', 'overlong', 2)
 #
-# Many utf-8 parsers also allow invalid overlong encodings, where bits that are unused when encoding a single byte are modified. Many parsers will ignore these bits, rendering simple string matching to be ineffective for dealing with UTF-8 strings. There are many more invalid overlong encodings possible for "a". For example, three encodings are available for an invalid 2 byte encoding of "a". (0xC1E1 0xC161 0xC121). By specifying "invalid", a random invalid encoding is chosen for the given byte size.
-# to_unicode(str, 'utf-8', 'invalid', 2)
+# Many utf-8 parsers also allow invalid overlong encodings, where bits
+# that are unused when encoding a single byte are modified. Many
+# parsers will ignore these bits, rendering simple string matching to
+# be ineffective for dealing with UTF-8 strings. There are many more
+# invalid overlong encodings possible for "a". For example, three
+# encodings are available for an invalid 2 byte encoding of "a".
+# (0xC1E1 0xC161 0xC121).
 #
-# utf-7 defaults to 'normal' utf-7 encoding
-# utf-8 defaults to 2 byte 'normal' encoding
+# By specifying "invalid", a random invalid encoding is chosen for the
+# given byte size. to_unicode(str, 'utf-8', 'invalid', 2)
 #
+# utf-7 defaults to 'normal' utf-7 encoding utf-8 defaults to 2 byte
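
The byte sequences quoted in the comment can be reproduced with a short Ruby sketch. This is not the `to_unicode` implementation the diff documents: the helper names `overlong_utf8`, `invalid_two_byte_variants`, and `utf7_encode_all` are hypothetical, and the UTF-7 helper assumes that "all" mode wraps each character in its own base64 run, which the comment does not spell out.

```ruby
# Hypothetical helpers for illustration only; not the to_unicode code itself.

# Overlong UTF-8 encoding of a 7-bit character using `size` bytes (2..7).
def overlong_utf8(char, size)
  raise ArgumentError, 'size must be between 2 and 7' unless (2..7).cover?(size)

  code = char.ord
  raise ArgumentError, 'expected a 7-bit character' if code > 0x7F

  bytes = []
  # Each continuation byte is 10xxxxxx and carries six payload bits,
  # filled from the least significant end of the code point.
  (size - 1).times do
    bytes.unshift(0x80 | (code & 0x3F))
    code >>= 6
  end
  # The lead byte starts with `size` one-bits followed by a zero bit,
  # then whatever payload bits are left (mostly zero padding here).
  lead = (0xFF << (8 - size)) & 0xFF
  bytes.unshift(lead | code)
  bytes.pack('C*')
end

# Invalid 2-byte variants: keep the six payload bits of the continuation
# byte but replace its required "10" marker with 11, 01, and 00.
def invalid_two_byte_variants(char)
  lead, cont = overlong_utf8(char, 2).bytes
  [0xC0, 0x40, 0x00].map { |marker| [lead, (cont & 0x3F) | marker].pack('C*') }
end

# UTF-7 with every character encoded (assumed behaviour of "all" mode):
# "+" followed by unpadded base64 of the UTF-16BE bytes, then "-".
def utf7_encode_all(str)
  str.each_char.map do |ch|
    "+#{[ch.encode('UTF-16BE')].pack('m0').delete('=')}-"
  end.join
end

puts overlong_utf8('a', 2).unpack1('H*')   # => c1a1
puts overlong_utf8('a', 7).unpack1('H*')   # => fe8080808081a1
puts invalid_two_byte_variants('a').map { |s| s.unpack1('H*') }.join(' ')
# => c1e1 c161 c121
puts utf7_encode_all('a')                  # => +AGE-
```

A strict decoder rejects every one of these UTF-8 sequences, so the sketch is only useful for testing how lenient a given parser is.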