  *
  */
 
+/* Modified UTF-7 used for 'international mailbox names' in the IMAP protocol
+ * Also known as mUTF-7
+ * Defined in RFC 3501, section 5.1.3 (https://tools.ietf.org/html/rfc3501)
+ *
+ * Quoting from the RFC:
+ *
+ ***********************************************************************
+ * In modified UTF-7, printable US-ASCII characters, except for "&",
+ * represent themselves; that is, characters with octet values 0x20-0x25
+ * and 0x27-0x7e. The character "&" (0x26) is represented by the
+ * two-octet sequence "&-".
+ *
+ * All other characters (octet values 0x00-0x1f and 0x7f-0xff) are
+ * represented in modified BASE64, with a further modification from
+ * UTF-7 that "," is used instead of "/". Modified BASE64 MUST NOT be
+ * used to represent any printing US-ASCII character which can represent
+ * itself.
+ *
+ * "&" is used to shift to modified BASE64 and "-" to shift back to
+ * US-ASCII. There is no implicit shift from BASE64 to US-ASCII, and
+ * null shifts ("-&" while in BASE64; note that "&-" while in US-ASCII
+ * means "&") are not permitted. However, all names start in US-ASCII,
+ * and MUST end in US-ASCII; that is, a name that ends with a non-ASCII
+ * ISO-10646 character MUST end with a "-").
+ ***********************************************************************
+ *
+ * The purpose of all this is: 1) to keep all parts of IMAP messages 7-bit clean,
+ * 2) to avoid giving special treatment to +, /, \, and ~, since these are
+ * commonly used in mailbox names, and 3) to ensure there is only one
+ * representation of any mailbox name (vanilla UTF-7 does allow multiple
+ * representations of the same string, by Base64-encoding characters which
+ * could have been included as ASCII literals).
+ *
+ * RFC 2152 also applies, since it defines vanilla UTF-7 (minus IMAP modifications).
+ * The following paragraph is notable:
+ *
+ ***********************************************************************
+ * Unicode is encoded using Modified Base64 by first converting Unicode
+ * 16-bit quantities to an octet stream (with the most significant octet first).
+ * Surrogate pairs (UTF-16) are converted by treating each half of the pair as
+ * a separate 16 bit quantity (i.e., no special treatment). Text with an odd
+ * number of octets is ill-formed. ISO 10646 characters outside the range
+ * addressable via surrogate pairs cannot be encoded.
+ ***********************************************************************
+ *
+ * So after reversing the modified Base64 encoding on an encoded section,
+ * the contents are interpreted as UTF-16BE. */
+
 #include "mbfilter.h"
 #include "mbfilter_utf7imap.h"
 
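The decoding rule the comment ends on (reverse the modified Base64 of each "&...-" section, then read the result as UTF-16BE) can be illustrated with a small standalone sketch. This is not the filter code from this change: the names `mb64_value` and `mutf7_to_utf16` are hypothetical, the output is a plain array of UTF-16 code units, and the sketch does not check surrogate pairing or reject Base64 sections that encode printable ASCII, both of which a strict decoder would need to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a modified-Base64 character to its 6-bit value, or -1 if it is not
 * in the alphabet. The alphabet ends in "+," (a "," replaces the usual "/"). */
static int mb64_value(unsigned char c)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
    const char *p = (c != '\0') ? strchr(alphabet, c) : NULL;
    return p ? (int)(p - alphabet) : -1;
}

/* Hypothetical helper: decode the mUTF-7 string `in` into UTF-16 code units.
 * Returns the number of code units written to `out`, or -1 if the input is
 * malformed or `out` (capacity `cap`) is too small. */
static long mutf7_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    size_t n = 0;
    assert(in != NULL && out != NULL);

    while (*in) {
        unsigned char c = (unsigned char)*in++;
        if (c < 0x20 || c > 0x7e)
            return -1;                 /* only printable ASCII may appear raw */
        if (c != '&' || *in == '-') {
            if (n == cap)
                return -1;
            out[n++] = c;              /* literal character, or "&" for "&-" */
            if (c == '&')
                in++;                  /* skip the "-" of "&-" */
            continue;
        }

        /* Base64 section: collect 6 bits per character and emit one
         * big-endian 16-bit unit whenever 16 bits are available. */
        uint32_t bits = 0;
        int nbits = 0, v;
        size_t start = n;
        while ((v = mb64_value((unsigned char)*in)) >= 0) {
            bits = (bits << 6) | (uint32_t)v;
            nbits += 6;
            in++;
            if (nbits >= 16) {
                if (n == cap)
                    return -1;
                nbits -= 16;
                out[n++] = (uint16_t)(bits >> nbits);
                bits &= (1u << nbits) - 1;
            }
        }
        if (*in != '-')
            return -1;                 /* no implicit shift back to US-ASCII */
        in++;
        if (n == start || bits != 0 || nbits >= 6)
            return -1;                 /* empty section or bad padding bits */
    }
    return (long)n;
}
```

For instance, the section "&U,BTFw-" from the mailbox name example in RFC 3501 decodes to the two code units 0x53F0 and 0x5317, while "&-" decodes to a single "&".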