Allow falling back from PtypString (utf-16) to PtypString8 with specified encoding #10

Open
aghster wants to merge 1 commit into molotochok:main from aghster:main

aghster commented Nov 4, 2025

Hi,

First of all, thank you for the great library! It's particularly cool that it has no dependencies.

However, when I tried to parse a set of different test msg files, the library failed on several of them. In these emails, several string properties such as subject, senderName and senderEmail were stored not as PtypString (0x001F), the type defined in ROOT_PROPERTIES, but as PtypString8 (0x001E). Apparently, both types are valid for these properties (see for example https://learn.microsoft.com/en-us/office/client-developer/outlook/mapi/pidtagsubject-canonical-property).

I therefore adapted the code so that a property can have a list of valid types, which are tried one after another until a result is obtained. For example, the valid types for the subject property can be [PtypString, PtypString8]. Since PtypString8 can use different encodings, the code also tries to determine the code page from the codepage property and decodes the string with the corresponding encoding (falling back to utf-8 if the code page cannot be determined).
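
To make the idea concrete, here is a minimal sketch of the fallback logic, written in Python purely for illustration. It is not the code in this PR: `decode_string_property`, the `streams` mapping and the `CODEPAGE_TO_CODEC` table are hypothetical names, the table covers only a handful of code pages, and the assumption is that the raw bytes of each property are available keyed by (property id, property type).

```python
PTYP_STRING = 0x001F   # Unicode string, stored as UTF-16LE
PTYP_STRING8 = 0x001E  # 8-bit string, encoding depends on the code page

# Illustrative subset of Windows code page -> Python codec names (assumption:
# a real implementation would need a much larger table).
CODEPAGE_TO_CODEC = {
    1250: "cp1250",
    1251: "cp1251",
    1252: "cp1252",
    65001: "utf-8",
}


def decode_string_property(streams, prop_id,
                           valid_types=(PTYP_STRING, PTYP_STRING8),
                           codepage=None):
    """Try each valid type in order and return the first decoded value.

    `streams` is a hypothetical dict mapping (prop_id, prop_type) to raw bytes.
    Returns None if the property is not present under any of the valid types.
    """
    for prop_type in valid_types:
        raw = streams.get((prop_id, prop_type))
        if raw is None:
            continue  # property not stored with this type, try the next one
        if prop_type == PTYP_STRING:
            return raw.decode("utf-16-le")
        # PtypString8: use the message's code page, falling back to utf-8
        codec = CODEPAGE_TO_CODEC.get(codepage, "utf-8")
        return raw.decode(codec, errors="replace")
    return None


# Usage: a subject (property id 0x0037) stored as PtypString8 in Windows-1252
streams = {(0x0037, PTYP_STRING8): "Grüße".encode("cp1252")}
subject = decode_string_property(streams, 0x0037, codepage=1252)
```

Decoding with `errors="replace"` is one possible choice for the 8-bit path: a wrong code page guess then yields replacement characters instead of aborting the whole parse.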

I'm just an amateur programmer, so maybe there's a better way to implement this than I did. I'm curious to know what you think.
