Allow falling back from PtypString (utf-16) to PtypString8 with specified encoding #10
Open
aghster wants to merge 1 commit into molotochok:main
…fied encoding for string properties
Hi,
First of all, thank you for the great library! It's particularly cool that it has no dependencies.
However, when I tried to parse a set of test .msg files, the library failed on several of them. In these emails, several string properties such as subject, senderName and senderEmail had the type PtypString8 (0x001E) rather than PtypString (0x001F), which is the type defined in ROOT_PROPERTIES. Apparently, both types are valid for these properties (see for example https://learn.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagsubject-canonical-property).
This is why I adapted the code so that a property can have a list of valid types, which are tried one after another until a result is obtained. This way, the valid types for the subject property can be [PtypString, PtypString8]. Since PtypString8 can have different encodings, the code also tries to determine the code page from the codepage property and decodes the string using the corresponding encoding (falling back to UTF-8 if the code page cannot be determined).
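To illustrate the idea independently of the actual diff, here is a minimal sketch in Python. It is not the code from this PR or the library's API: the function name `decode_string_property`, the `streams` dictionary (property type mapped to raw bytes), and the small `CODEPAGE_TO_CODEC` table are all assumptions made for the example, and the table covers only a handful of code pages.

```python
# Property type tags as given in the description above.
PTYP_STRING = 0x001F   # UTF-16LE string
PTYP_STRING8 = 0x001E  # 8-bit string in some code page

# Illustrative subset only: Windows code page number -> Python codec name.
CODEPAGE_TO_CODEC = {
    1250: "cp1250",
    1251: "cp1251",
    1252: "cp1252",
    932: "cp932",
    28591: "iso-8859-1",
    65001: "utf-8",
}


def decode_string_property(streams, valid_types, codepage=None):
    """Try each valid type in order and decode the first one that is present.

    streams     -- hypothetical dict mapping a property type to its raw bytes
    valid_types -- ordered list of acceptable types, e.g. [PTYP_STRING, PTYP_STRING8]
    codepage    -- code page number read from the message, or None if unknown
    """
    for ptype in valid_types:
        raw = streams.get(ptype)
        if raw is None:
            continue  # this type is not stored for the property; try the next one
        if ptype == PTYP_STRING:
            return raw.decode("utf-16-le")
        if ptype == PTYP_STRING8:
            # Fall back to UTF-8 when the code page is unknown or not in the table.
            codec = CODEPAGE_TO_CODEC.get(codepage, "utf-8")
            return raw.decode(codec, errors="replace")
    return None  # none of the valid types was found


# Usage: the same subject stored once as PtypString and once as PtypString8.
valid = [PTYP_STRING, PTYP_STRING8]
as_utf16 = decode_string_property({PTYP_STRING: "Grüße".encode("utf-16-le")}, valid)
as_cp1252 = decode_string_property({PTYP_STRING8: "Grüße".encode("cp1252")}, valid, codepage=1252)
```

Keeping the valid types in an ordered list means PtypString is still preferred whenever it is present, so messages that already parse correctly should behave as before.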
I'm just an amateur programmer, so maybe there's a better way to implement this than I did. I'm curious to know what you think.