v0.54.1
- [TeamMsgExtractor #462] Fix potential issue where child MSG might have incompatible encoding to parent MSG when trying to grab a stream from the parent.
- Added code to attempt to significantly improve RTF deencapsulation times. This tries to strip away unneeded data before passing it to `RTFDE`. This shows improvements on all files that take more than one second. Currently, this actually fixes some files that previously produced wrong output from `RTFDE` when deencapsulating the HTML body, specifically around non-breaking spaces sometimes not transferring over.
v0.54.0
- [TeamMsgExtractor #456] Changed the prepared html output to use plainly encoded HTML instead of prettified, since the prettification options currently used mangle the output and sometimes make it very large.
v0.53.2
- [TeamMsgExtractor #452] Adjusted code to allow html encoding to be cached to try to speed up `bs4` operations.
- [TeamMsgExtractor #453] Fixed handler for too large filetimes so that some filetimes being too large doesn't break the handler.
- Fixed a bug that would cause an error in task objects due to a lack of `enumerate`.
- Fix `TOCEntry` not initializing `DVTargetDevice` correctly.
- Add temporary properties for `ContentID` to `SignedAttachment`. AFAIK these can't ever be set, but this prevents errors in some places.
v0.53.1
- Expanded allowable range for `red-black-tree-mod`.
- Fix issue with `MessageBase.asEmailMessage()` that prevented embedded MSG files from being attached.
- Expand allowable versions of `BeautifulSoup4`.
v0.53.0
- Added tests for many functions in `extract_msg.utils`.
- Fix an issue in `extract_msg.utils.msgPathToString()` that prevented backslashes from being replaced with forward slashes.
- Change the behavior of `extract_msg.utils.minutesToDurationStr()` to properly use plurals.
- Fixed issue in `extract_msg.utils.unwrapMsg()` that would prevent it from working on signed messages due to an API change.
- Added new exception `MimetypeFailureError`.
- Modified the logic of `MessageBase.asEmailMessage()` to use `AttachmentBase/SignedAttachment.name` instead of `getFilename()`, which only exists on `AttachmentBase`.
- Modified the logic of `MessageBase.htmlBodyPrepared()` to properly put the mimetype in image tags to ensure rendering. Logic was also modified to use `encode` instead of `prettify` to reduce computation and output size.
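The two utility fixes in this release (slash replacement and plural handling) can be illustrated with a minimal sketch. The function names, signatures, and output wording below are hypothetical stand-ins written for this example; they are not the library's actual implementations of `msgPathToString()` or `minutesToDurationStr()`.

```python
from typing import List, Union


def msg_path_to_string(path: Union[str, List[str]]) -> str:
    """Hypothetical helper: join path parts and normalize to forward slashes."""
    if isinstance(path, (list, tuple)):
        path = "/".join(path)
    # The fix described above amounts to making sure this replacement happens.
    return path.replace("\\", "/")


def minutes_to_duration_str(minutes: int) -> str:
    """Hypothetical helper: format a duration, pluralizing units only when needed."""
    if minutes == 0:
        return "0 hours"
    hours, mins = divmod(minutes, 60)
    parts = []
    if hours:
        parts.append(f"{hours} hour" + ("" if hours == 1 else "s"))
    if mins:
        parts.append(f"{mins} minute" + ("" if mins == 1 else "s"))
    return " ".join(parts)
```

For instance, under these assumptions `minutes_to_duration_str(61)` yields `"1 hour 1 minute"` while `minutes_to_duration_str(150)` yields `"2 hours 30 minutes"`.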
v0.52.0
- [TeamMsgExtractor #444] Fix typo in string that prevented HTML body from generating from the plain text body properly.
- Adjusted the behavior of `MSGFile.areStringsUnicode` to prioritize the property specified by the parent MSG files for MSG files that are embedded. Additionally, added a fallback to rely on whether or not there is a stream using the `001F` type to determine the property value if it is entirely missing.
- Adjusted `OleWriter.fromMsg()` and `MSGFile.export()` to add the argument `allowBadEmbed` which helps to correct a few different issues that may appear in embedded MSG files. These corrections allow the embedded file to still be extracted and to open properly in Outlook.
- In addition to the above, the errors that some of those corrections will suppress are now significantly more informative about what went wrong.
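The `001F` fallback for `MSGFile.areStringsUnicode` can be pictured roughly as follows. This is only a sketch: the function name and arguments are hypothetical, and it assumes stream names end with the four hex digits of the property type (for example `__substg1.0_0037001F`), which is how string streams are typically named in MSG files.

```python
from typing import Iterable, Optional


def guess_strings_unicode(stream_names: Iterable[str],
                          declared: Optional[bool] = None) -> bool:
    """Hypothetical helper: decide whether strings should be read as Unicode.

    `declared` stands in for a value already known from the properties (or
    from a parent MSG file); it wins when present. Otherwise, fall back to
    checking whether any stream uses the 001F (Unicode string) type.
    """
    if declared is not None:
        return declared
    return any(name.upper().endswith("001F") for name in stream_names)
```

The design choice here is simply ordering: an explicit value is trusted first, and the stream scan is used only when nothing was declared.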
v0.51.1
- Add class type added in last version to known class types.
v0.51.0
- [TeamMsgExtractor #401] Add basic support for MSG class type `IPM.SkypeTeams.Message`.
v0.50.1
- [TeamMsgExtractor #434] Fix bug introduced in previous version.
v0.50.0
- [TeamMsgExtractor #432] Adjust html header code to replace non-ascii characters with escaped versions. Also adjusted plain text to html conversion to ensure non-ascii characters from the body are encoded to escaped values to be safe.
- Made some corrections to `NullDate`.
v0.49.0
- [TeamMsgExtractor #427] Adjusted code for converting time stamps to create null dates for any time stamp beyond a certain point. The point was determined to be close to the existing null dates.
- [TeamMsgExtractor #425] Added basic support for custom attachments that are Windows Metafiles.
- Changed tolerance of bitmap custom attachment handler to allow for attachments with only a CONTENT stream. This change was made after seeing an example of a file that only had a CONTENT stream and no other streams for the custom data. The code now also tries to create default values for things previously determined from those other streams.
- Fixed an issue in `tryGetMimetype` where the code didn't properly check if the data type was bytes (it only checked if it had a type).
- Corrected some exports.
- Added new `ErrorBehavior` value `CUSTOM_ATTACH_TOLERANT` to allow skipping checks for unused data that is normally validated.
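The time stamp change in #427 can be sketched as a conversion that gives up once the value passes a cutoff. Everything below is an assumption made for illustration: the function name, the use of `None` as a stand-in for a null date, and especially the cutoff value, since the entry above only says the point is close to the existing null dates.

```python
import datetime
from typing import Optional

# FILETIME values count 100-nanosecond intervals since 1601-01-01.
FILETIME_EPOCH = datetime.datetime(1601, 1, 1)
# Hypothetical cutoff chosen for this sketch only.
NULL_CUTOFF = datetime.datetime(4500, 1, 1)


def filetime_to_datetime(filetime: int) -> Optional[datetime.datetime]:
    """Convert a FILETIME integer, returning None in place of a null date."""
    try:
        value = FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)
    except OverflowError:
        # Too large for datetime to represent at all.
        return None
    return None if value >= NULL_CUTOFF else value
```

Treating both "past the cutoff" and "not representable" the same way keeps one oversized time stamp from breaking the whole conversion.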
v0.48.7
- [TeamMsgExtractor #420] Fixed typo introduced in last version.
v0.48.6
- [TeamMsgExtractor #417] Fixed issues with `openMsg` where some corrupted MSG files could end up throwing an uncaught exception and leaving the file handle open.
v0.48.5
- [TeamMsgExtractor #414] Fixed typo in `message_signed_base.py`.
v0.48.4
- [TeamMsgExtractor #411] Fix console script throwing error due to changed console args not defaulting.
v0.48.3
- [TeamMsgExtractor #409] Added missing private method to `SignedAttachment`.
- Fixed some missing typing information.
v0.48.2
- Fixed bugs with `MessageBase.asEmailMessage()`. Numerous improvements to how it handles the data.
v0.48.1
- Added an option (`-s`, `--stdin`) to the command line to take an MSG file from stdin. This allows the user to pipe the MSG data from another program directly instead of having to write a middleman that uses the `extract-msg` library directly or having to write the file to the disk first.
- Changed main function to allow for manual argument list to be passed to it.
- Added attributes to `AttachmentBase` for creation and modification time. These can be accessed through `createdAt` or `creationTime` and `lastModificationTime` or `modifiedAt`.
- Changed `OleWriter` tests to output the name of the test file being done if an error occurs.
- Added tests for some command line stuff.
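For context on what reading an MSG file from stdin involves, here is a small sketch of the kind of "middleman" the new option makes unnecessary. The function name is hypothetical and this is not the command line's actual code; the only outside fact relied on is that MSG files are OLE compound files, which begin with a fixed 8-byte signature.

```python
from typing import BinaryIO

# Signature found at the start of every OLE compound file.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def read_msg_from_stream(stream: BinaryIO) -> bytes:
    """Hypothetical helper: read a whole MSG file from a binary stream."""
    data = stream.read()
    if not data.startswith(OLE_MAGIC):
        raise ValueError("Input does not look like an OLE/MSG file.")
    return data

# Typical use would pass the binary stdin handle:
#     data = read_msg_from_stream(sys.stdin.buffer)
```

Note the use of a binary stream; reading MSG data through a text-mode stdin would corrupt it.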
v0.48.0
- Adjusted error handling for named properties to handle critical streams being missing and to allow suppression of those errors.
- Adjusted error handling for named properties to allow silencing of errors caused by invalid references to the name stream. If `ErrorBehavior.NAMED_NAME_STREAM` is provided to the `MSGFile` instance, a warning will be logged and that entry will simply be dropped.
- Adjusted error handling for signed messages to better check for issues with the signed attachment. This should make errors from violating the standard much easier to understand. These errors can be ignored, but the attachment will not be parsed as a signed attachment.
- Minor docstring updates.
- Minor adjustments to `OleWriter` to prepare the code for being able to write version 4 files. Version 3 files are currently the only ones supported, but much of the code had hard-coded values that could be replaced with variables and small conditionals. This will have very little performance impact, and should not be noticeable.
- Improved comments on `OleWriter` to make private sections more understandable.
- Changed `MessageSignedBase._rawAttachments` to `MessageSignedBase.rawAttachments` to provide non-private access in a reliable way.
v0.47.0
- Changed the public API for `PropertiesStore` to improve the quality of its code. The properties type is now mandatory, and the intelligence field (and the related enum) has been removed.
  - Additionally, the `toBytes` and `__bytes__` methods will both generate based on the contents of this class, allowing for new properties to be created and for existing properties to be modified or removed if the class is set to writable on creation.
- Added new method `Named.getPropNameByStreamID`. This method takes the ID of a stream (or property stream entry) and returns the property name (as a tuple of the property name/ID and the property set) that is stored there. Returns `None` if the stream is not used to store a named property. This name can be directly used (if it is not `None`) to get the `NamedPropertyBase` instance associated. This method is most useful for people looking at the raw data of a stream and trying to figure out what named property it refers to.
- Fixed mistake in struct definitions that caused a float to require 8 bytes to unpack.
- Added tests for `extract_msg.properties.props`.
- Added basic tests for `extract_msg.attachments`.
- Added validation tests for the `enums` and `constants` submodule.
- Removed unneeded structs.
- Fixed an issue where `PtypGuid` was being parsed by the wrong property type. Despite having a fixed size, it is still a variable length property.
- Fixed all of the setters not working. I didn't know they needed to use the same name as the getter, and I swear they were working at some point with the current setup. Some sources online suggested the original form should work, which is even stranger.
- Unified all line endings to LF instead of a mix of CRLF and LF.
- Changed enums `BCTextFormat` and `BCLabelFormat` to `IntFlag` enums. The values that exist are for the individual flags and not the groups of flags.
- Made `FieldInfo` writable, however it can no longer be directly converted to bytes since it requires additional information outside of itself to convert to bytes. It still retains a `toBytes` method, however it requires an argument for the additional data.
- Fixed `UnsupportedAttachment` inverting the `skipNotImplemented` keyword argument.
- Fixed `DMPaperSize` not being a subclass of `Enum`.
- Extended values for `DMPaperSize`.
- Fixed exports for `extract_msg.constants.st`.
- Updated various parts of the documentation to improve it and make it more consistent.
- Fixed `Recipient.existsTypedProperty` and `AttachmentBase.existsTypedProperty` having the wrong return type.
- Removed "TODO" markers on `OleStreamStruct` and finalized it to only handle the OLEStream for embedded objects.
- Fixed type annotations for `extract_msg.utils.fromTimeStamp`.
- Added new function `extract_msg.properties.prop.createNewProp`. This function allows a new property to be created with default data based on the name. The name MUST be an 8 character hex string, the first 4 characters being the value for the property ID and the second 4 being the value for the type.
- Fixed `VariableLengthProp.reservedFlags` returning the wrong type.
- Adjusted many of the property structs to make it easier to use them for packing.
- Renamed some of the property structs to be more informative.
- Fixed `ServerID` using the wrong struct (would have thrown an exception for not having enough data).
- Unified signing for integer properties. All integer properties are considered unsigned by default when read directly from the MSG file, however specific property accessors may convert to signed if the property is specifically intended to be so. This does not necessarily include properties that are stored as integers but have a value that is not an integer, like `PtypCurrency`.
- Changed `extract_msg.constants.NULL_DATE` to be a subclass of `datetime.datetime` instead of just a fixed value. Functions that return null dates may end up returning distinct versions of `NullDate` (the newly created subclass), however all instances of `NullDate` will register as being equal to each other. Existing equality comparisons to the `NULL_DATE` constant will all function as intended, however `is` checks may fail for some null dates.
- Changed currency type to return a `Decimal` instance for higher precision.
- Made `FixedLengthProp` and `VariableLengthProp` writable. Only the property `flags` from `PropBase` is writable. This also includes the ability to convert them to bytes based on their value.
- Fixed many issues in `VariableLengthProp` regarding its calculation of sizes.
- Removed `hasLen` function.
- Changed style to remove space between variable name and colon that separates the type.
- Corrected `InvaildPropertyIdError` to `InvalidPropertyIdError`.
- Updated to `olefile` version 0.47.
- Updated `RTFDE` minimum version to 0.1.1.
- Changed dependency name for `compressed-rtf` to remove minor typo (it should forward to the correct place regardless, but just to be safe).
- Fixed issues with implementation of `OleWriter` when it comes to large sets of data. Most issues would fail silently and only be noticeable when trying to open the file. If your file is less than 2 GB, you would likely not have noticed any issues at all. This includes adding a new exception that is thrown from the `write` method, `TooManySectorsError`.
- Fixed some issues with `OleWriter` that could cause entries to end up partially added if the data was an issue.
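The `NullDate` behavior described in this release (every instance compares equal, while `is` checks can fail) can be modeled with a short sketch. This is not the library's implementation; in particular, the simplification that a `NullDate` equals only other `NullDate` instances, and the example dates, are assumptions made here.

```python
import datetime


class NullDate(datetime.datetime):
    """Sketch of a datetime subclass whose instances all compare equal."""

    def __eq__(self, other):
        # Assumption for this sketch: only other NullDate instances match.
        return isinstance(other, NullDate)

    def __ne__(self, other):
        # datetime defines its own __ne__, so it must be overridden too.
        return not self.__eq__(other)

    def __hash__(self):
        # Equal objects must hash equally, so use one hash for the class.
        return hash(NullDate)


a = NullDate(4500, 8, 31, 23, 59)
b = NullDate(4501, 1, 1)
print(a == b)   # True: every NullDate compares equal
print(a is b)   # False: identity checks can still fail
```

This is why code that previously wrote `value is NULL_DATE` would be safer written as `value == NULL_DATE`.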
v0.46.2
- Adjusted typing information on regular expressions. They were using a subscript that was added in Python 3.9 (apparently that is something the type checker doesn't check for), which made the module incompatible with Python 3.8. If you are using Python 3.9 or higher a version check will switch to the more specific typing.
v0.46.1
- [TeamMsgExtractor #394] Fix typo in props that caused the wrong number of bytes to be given to a struct.
v0.46.0
- [TeamMsgExtractor #95] Adjusted the `overrideEncoding` property of `MSGFile` to allow automatic encoding detection. Simply set the property to the string `"chardet"` and, assuming the `chardet` module is installed, it will analyze a number of the strings to try and form a consensus about the encoding. This will ignore the specified encoding only if it successfully detects one. Otherwise it will log a warning and fall back to the default behavior.
- [TeamMsgExtractor #387] Changed `extract_msg.utils.decodeRfc2047` to not throw decoding errors if the content given is not ASCII.
- [TeamMsgExtractor #387] Changed header parsing policy to `email.policy.compat32` to prevent partial parsing of quoted header fields.
- [TeamMsgExtractor #388] Updated documentation of `MSGFile.export` to specify that updated fields on an `MSGFile` instance (and its subclasses) will not be reflected in the result of the function. Many of the functions do use the newest version of a `cached_property`, but this is not one of them.
- Removed methods deprecated in `v0.45.0`.
- Changed the base class of `EntryID` from no base class to `abc.ABC`.
- Added `position` property to `EntryID` to tell how many bytes were used to create the `EntryID`.
- Added a number of properties to `MSGFile` from [MS-OXCMSG].
- Moved some properties down to `MessageBase` from its subclasses.
- Added support for Journal objects.
- Changed internal code of `PermanentEntryID` to correctly parse the data. Previously the distinguished name did not actually end at the null character, instead ending at the end of the bytes provided. If there was trailing data, it would be captured inadvertently.
- Finished definition for `StoreObjectEntryID`.
- Added new keyword arguments for MSG files: `dateFormat` and `datetimeFormat`. These allow the user to easily override the strings used to format dates and dates that include a time component, respectively.
  - In unifying all the formats into 2 options, you may notice that some will look a bit different starting from this version, as there was an unfortunately large amount of variation.
- Fixed code for `MessageBase.parsedDate` which could have incorrect values.
- Fixed issues with `MessageBase.date` and related things either being incorrectly documented or doing things that are not specified by the documentation. It was supposed to have been changed to use `datetime` objects, but it was still using strings.
- Removed unused function `extract_msg.utils.isEmptyString`.
- Removed unused function `extract_msg.utils.properHex`.
- Added a `getStream` helper function to `CustomAttachmentHandler`.
- Added a `getStreamAs` helper function to `CustomAttachmentHandler`.
- Added new custom attachment handler for journal-associated attachments.
- Changed `EntryID.autoCreate` to return `None` if given `None` or empty bytes.
- Changed `EntryID.autoCreate` to raise a `FeatureNotImplemented` exception if no valid entry ID class is found.
- Fix typing annotations for `CustomAttachmentHandler`.
- Removed unneeded `imapclient` dependency.
- Changed `getJson` to have values be null if they aren't found rather than an empty string.
- Implemented the `getJson` method correctly for a number of classes.
- Changed `Task.percentComplete` to always return a float.
- Changed the `NotImplementedError` for custom attachment handler not being found to `FeatureNotImplemented`. Additionally, changed the error message to specify the CLSID found on the attachment to better enable people to report issues.
- Changed code for `Recipient` and `MessageBase` that makes it rely on `MessageBase.recipientTypeClass` to determine the class to use for the `recipientType` property. Adjusted the typing of `Recipient` to have it reflect the type that will be used.
- Correctly changed the returned value for `ResponseStatus.fromIter` to actually return a List instead of a set.
- Filled out typing information for a significant portion of the module where variables or functions were missing it. This includes the entirety of the constants submodule.
- Corrected a number of minor issues.
- Extended values for `DVAspect` enum.
- Added new enums to go with parsing for `OLEPresentationStream`.
- Changed `NNTPNewsgroupFolderEntryID.newsgroupName` to bytes instead of string since it is ANSI.
- Fixed an issue that would cause headers to fail to parse properly if the header text starts with "Microsoft Mail Internet Headers Version 2.0" which is common on some MSG files. This is fixed by stripping that from the beginning before actually parsing the text. This is to circumvent CPython issue #85329, confirmed to still exist in at least some of the supported Python versions.
- Added `listDir` and `slistDir` as methods to `AttachmentBase`, `Recipient`, and `Named`. These always exclude the prefix, returning as if their directory is the root of the object. This allows the names to be directly used for accessing those files.
- Numerous spelling fixes in docstrings, comments, and exceptions.
- Reduced the amount of initialization performed by `MessageBase`. Much of this initialization was there from before a lot of stuff changed to `cached_property` and a number of internal variables were being used. Now all of the relevant variables will be initialized by the way they are accessed.
- Added new exception `DependencyError`.
- Changed the errors for missing optional dependencies from `ImportError` to `DependencyError`.
- Removed all instances of the `rawData` property in favor of the `toBytes` method. For now, many of these will simply return the raw data used, specifically those that are still unmodifiable. Any whose properties have the ability to be modified will have properly implemented versions. These classes also allow `None` to be passed as the value for their data, which will be the default if no arguments have been passed to the constructor. If no arguments or `None` is given as the data, it will create a new instance with default values. This is all in an effort to move towards the ability to create new MSG files and the `MSGWriter` class. All `toBytes` methods will either exclusively return `bytes` or will return `None` to specify that the structure isn't valid to convert to bytes. Structures that may be invalid will be annotated as `Optional[bytes]` for the return type.
  - Additionally, these objects will also support the `__bytes__` method. If the object returned is not bytes, the method will throw a `TypeError`.
- Removed the individual `PropBase` flag properties and changed the main `flags` property to return an enum containing the flags.
- Changed various data structs to allow modification and creation of new instances for writing to an MSG file.
- Changed `TZRule` to use unsigned values where applicable.
- Changed `TZRule` to require the 14 null bytes (I commented it out completely on accident instead of swapping it to a plain read). It now logs a warning about the bytes not being null.
- Removed unneeded function `windowsUnicode`.
- Moved `FixedLengthProperty.parseType` to the private API. This was not intended for external use anyways, so leaving it as public API didn't make sense.
- Fixed check for type in `ContactAddressEntryID` being the wrong value.
- Modified `inputToBytes` to support objects with the `__bytes__` method. If the method exists and works then it will be used as a last resort.
- Modified `OleWriter` to accept objects with a `__bytes__` method for the data to use for an entry.
- Added `__bytes__` method to `MSGFile`. This is equivalent to calling `MSGFile.exportBytes`.
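The header fix in this release (stripping the "Microsoft Mail Internet Headers Version 2.0" line before parsing, combined with the `email.policy.compat32` policy) can be sketched with the standard library alone. The function name below is hypothetical and the code is an illustration of the approach described, not the module's actual implementation.

```python
import email.parser
import email.policy
from email.message import Message

PREFIX = "Microsoft Mail Internet Headers Version 2.0"


def parse_header_text(header_text: str) -> Message:
    """Hypothetical helper: drop the Microsoft banner line, then parse."""
    text = header_text
    if text.startswith(PREFIX):
        # Remove the banner and the line break(s) that follow it.
        text = text[len(PREFIX):].lstrip("\r\n")
    parser = email.parser.HeaderParser(policy=email.policy.compat32)
    return parser.parsestr(text)
```

With the banner left in place, the first line is not a valid header field, which is what triggers the parsing problem the entry refers to.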
v0.45.0
- BREAKING: Changed parsing of string multiple properties to remove the trailing null byte. This will cause the output of parsing them to differ.
- Updated typing information for some functions and classes.
- Fixed a bug with `MessageSignedBase.attachments` that would cause it to return None instead of an empty list if the number of normal attachments was 0 and the error behavior was set to ignore violations of the standard.
- Updated `MessageSignedBase.attachments` to use `functools.cached_property` instead of `property`.
- Fixed spelling errors in some exception strings.
- Made `NamedPropertyBase` a subclass of `abc.ABC`.
- Cleaned up some of the code for named properties to remove unused variables and remove inefficient code.
- Changed `PropBase` to be a subclass of `abc.ABC`.
- Added detailed versioning info to the README.
- Deprecated many private functions, including methods on many of the classes. Of primary note are `_getStream` and `_getStringStream`, which have been moved to the public API as `getStream` and `getStringStream`. Any deprecated functions still exist and will forward to a public API function if they are not being removed. Additionally, all internal usage of them has been removed. This change is one of the big preparations that is needed for the `1.0.0` release.
  - As mentioned, a number of these deprecated functions have been moved to the public API. It is recommended that you run tests with your code after enabling deprecation warnings to see what should be changed.
- Removed items deprecated in or before `0.42.0`.
- Changed the API for the private method `_genRecipient`. This is not intended for use outside of the module except for subclasses. The change removed the allowance of ints for the second argument, requiring that it be a valid enum type.
- Convert many enum types to `IntEnum`.
- Extended functionality of `PropertiesStore` to allow for integer property names and getting a property based on just the ID. You can also get a list of all properties that use a given ID.
- Added new function `PropertiesStore.getProperties` which gets a list of all properties matching the property ID. Return type is a list of `PropBase` instances.
- Added new function `PropertiesStore.getValue` which looks for the first matching `FixedLengthProp` and returns the value from it.
- Improved internal code related to getting a property with a potentially unknown type.
- Added a number of entirely new functions to the public API on `MSGFile`, `AttachmentBase`, `PropertiesStore`, and `Recipient` objects:
  - `getMultipleBinary`: Gets a multiple binary property as a list of `bytes` objects.
  - `getSingleOrMultipleBinary`: A combination of `getStream` and `getMultipleBinary` which prefers a single binary stream. Returns a single `bytes` object or a list of `bytes` objects.
  - `getMultipleString`: Gets a multiple string property as a list of `str` objects.
  - `getSingleOrMultipleString`: A combination of `getStringStream` and `getMultipleString` which prefers a single string stream. Returns a single `str` object or a list of `str` objects.
  - `getPropertyVal`: Shortcut for `instance.props.getValue` that allows new behavior to be added by overriding it.
  - `getNamedProp`: Shortcut for `instance.namedProperties.get((propertyName, guid), default)` that allows new behavior to be added by overriding it.
- Removed `Named._getStringStream` and `Named.sExists`. The named properties storage will always use regular streams and not string streams.
- Changed all `Named` methods to no longer have a prefix argument. The prefix should always be false since the named property mapping will only exist in the top level directory.
- Adjusted `tryGetMimeType` to allow any attachments whose `data` property would return a `bytes` instance.
- Changed internal code to use public API functions wherever possible. This includes making many private API functions use calls to the public API for getting bits of data.
- Fixed potential issue with `AttachmentBase.clsid` which had the potential to cause some attachments to fail to generate a CLSID.
- Outright removed or changed a significant portion of the private API. I have rarely, if ever, seen references to these parts, so this should cause you no issues. Some of these have also been moved to the public API, either identically or with changes, and the mapping is as such:
  - `_getNamedAs` -> `getNamedAs`: Changed to always require a conversion argument. If you were previously using it to plainly get a named property or to handle the property being None or a real value, you should use the return value of `getNamedProp` instead.
  - `_getPropertyAs` -> `getPropertyAs`: Same as above, use `getPropertyVal` instead for None or plain access.
  - `_getStreamAs` -> `getStreamAs`, `getStringStreamAs`: Once again, see above. Use `getStream` and `getStringStream`, respectively.
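The deprecation pattern described in this release, where an old private method keeps working but forwards to its public replacement, generally looks like the sketch below. The class here is a hypothetical stand-in with a dictionary in place of real streams; only the method names `getStream` and `_getStream` come from the entries above.

```python
import warnings
from typing import Dict, Optional


class StreamHolder:
    """Hypothetical stand-in for an object that exposes named streams."""

    def __init__(self, streams: Dict[str, bytes]):
        self._streams = dict(streams)

    def getStream(self, name: str) -> Optional[bytes]:
        """Public API: return the stream data, or None if it is missing."""
        return self._streams.get(name)

    def _getStream(self, name: str) -> Optional[bytes]:
        """Deprecated private name that forwards to the public method."""
        warnings.warn("_getStream is deprecated, use getStream instead.",
                      DeprecationWarning, stacklevel=2)
        return self.getStream(name)
```

To find remaining uses of deprecated names in your own code, running your tests with `python -W error::DeprecationWarning` turns each such warning into an exception.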
v0.44.0
- Fixed a bug that caused `MessageBase.headerInit` to always return `False` after the 0.42.0 update.
- Changed `MessageBase.headerInit` to a property.
- Fixed `extract_msg.utils.__all__`.
- Minor reorganization within `extract_msg/utils.py`.
- Minor changes to docstrings.
- Minor README updates.
- Fix issue with folded header fields decoding incorrectly when given to `extract_msg.utils.decodeRfc2047`.
v0.43.0
- [TeamMsgExtractor #56] [TeamMsgExtractor #248] Added new function `MessageBase.asEmailMessage` which will convert the `MessageBase` instance, if possible, to an `email.message.EmailMessage` object. If an embedded MSG file on a `MessageBase` object is of a class that does not have this function, it will simply be attached to the instance as bytes.
- Changed imports in `message_base.py` to help with type checkers.
- Changed from using `email.parser.EmailParser` to `email.parser.HeaderParser` in `MessageBase.header`.
- Changed some of the internal code for `MessageBase.header`. This should improve usage of it, and should not have any noticeable negative changes. You may notice some of the values parse slightly differently, but this effect should be mostly suppressed.
v0.42.2
- Fix bug in `AttachmentBase.mimetype` that would cause it to throw an error when accessed. This bug was introduced in `v0.42.0`.
v0.42.1
- Fixed some constants being accessed with the wrong name (names were changed in reorganization).
- Removed unused regular expression.
v0.42.0
- [TeamMsgExtractor #372] Changed the way that the save functions return a value. This makes the return value from all save functions much more informative, allowing a user to separate if a file or folder (or if more than one) was saved from the function. It also guarantees that all classes from this module will return the relevant path(s) if data is actually saved.
- [TeamMsgExtractor #288] Added feature to allow attachment save functions to simply overwrite existing files of the same name. This can be done with the `overwriteExisting` keyword argument from code or the `--overwrite-existing` option from the command line.
- [TeamMsgExtractor #40] Added new submodule `custom_attachments`. This submodule provides an extendable way to handle custom attachment types, attachment types whose structure and formatting are not defined in the Microsoft documentation for MSG files. This includes a handler to at least partially cover support for Outlook images.
- [TeamMsgExtractor #373] Added the `encoding` submodule for encoding tasks, including proper support for Microsoft's implementation of CP950. This gets added to the codecs list as "windows-950".
  - Added infrastructure to make it easy to add variable-byte (up to two bytes) encodings and single-byte encodings.
  - Added the following encodings:
    - windows-874
    - x-mac-ce
    - x-mac-cyrillic
    - x-mac-greek
    - x-mac-icelandic
    - x-mac-turkish
- Fixed an issue in the save functions that left the possibility for the zip files to not end up closing if the save function created it and then had an exception.
- Added new property `AttachmentBase.clsid` which returns the listed CLSID value of the data stream/storage of the attachment.
- Changed internal behavior of `MSGFile.attachments`. This should not cause any noticeable changes to the output.
- Refactored code significantly to make it more organized.
- Changed the exports from the main module to only include an important subset of the module. For other items, you'll have to import the submodule that it falls under to access it. Submodules export all important pieces, so it will be easier to find.
- This includes having many modules be under entirely new paths. Some of these changes have been done with no deprecation, something I generally try to avoid. This is happening at the same time as the public API is significantly changing, which makes it more acceptable.
- Fixed `__main__` using the wrong enum for error behavior.
- Fixed `Named.get` being severely out of date (it's not used anywhere by the module which is why it wasn't noticed).
- Fixed `Named.__getitem__` being entirely case-sensitive.
- Switched much of the internal code (and the `treePath` property of all classes that have it) to using `weakref.ReferenceType` to avoid hard cyclic references.
- Fixed `Recipient._getTypedStream` never returning a value.
- Added additional type hints in various places.
- Modified tests.py to only run if it is run as a file instead of imported.
- Changed `knownMsgClass` to a private function since it is explicitly not being exported by any part of the module.
- Removed unused function `getFullClassName`.
- Fixes to the HTML body when saving as HTML will no longer require the `preparedHtml`/`--prepared-html` option.
- Removed unused exceptions.
- Entirely reorganized the way attachments are initialized, including the class that will be used in various circumstances. Embedded MSG files, custom attachments, and web attachments will all use dedicated classes that are subclasses of `AttachmentBase`.
  - With this change, the way to specify a new `Attachment` class is to override the function used when creating attachments. This can be done by passing `attachmentInit = myFunction` as an option to `openMsg`. This function MUST return an instance of `AttachmentBase`.
- Added first implementation of web attachments. Saving is not currently possible, but basic relevant property access is now possible. Saving will not be stopped by this attachment if `skipNotImplemented = True` is passed to the save function.
- Changed the option to suppress `RTFDE` errors to fall under the `ErrorBehavior` enum. Usage of the original option will be allowable, but is being marked as deprecated. However, it is still a dedicated option from the command line.
  - Also fixed the option not properly ignoring some `RTFDE` errors, specifically the ones that it is normal for the module to throw.
- Removed some constants that are not used by the module.
- Updated to support `RTFDE` version `0.1.0`. Users encountering random errors from that module should find that those errors have disappeared. If you get errors from it still, bring up the issue on their GitHub.
- Fixed bug that would cause weird behavior if you gave an empty string as the path for an MSG file.
- Added support for
IPM.StickyNote. - Fixed an issue that would cause MSG file to never close if an error happened during any of the
__init__functions for MSG classes. - Removed unneeded
chardetdependency. - Removed
Contact.__init__as it didn't provide any unique behavior. - Changed the documentation of
openMsgto specify that it accepts all options recognized byMSGFilesubclasses, allowing the doc string to not be modified every time one of them is changed.- Changed the documentation of various
__init__methods to do the same thing.
- Changed the documentation of various
- Added
dataTypeproperty toAttachmentBaseandSignedAttachmentfor checking the class that the data will be, if accessible. ReturnsNoneif the data is inaccessible, including because accessing it would throw an exception. - Added new enum
InsecureFeaturesand optioninsecureFeatures. This option will allow certain features with security implications to be used for files that you trust. Currently the only feature it supports is the usage ofPIL/Pillowto open and modify images. All features like this will be opt-in to reduce possible vulnerabilities. - Modified all custom exceptions the module uses to derive from a single base class for better organization.
- Added new exceptions to handle some of the situations previously handled by base Python exceptions.
- Changed internal handling of the prefix option for MSGFile.__init__ (and therefore openMsg). If you are not setting this manually, you should notice little difference.
- Made enums less strict and converted all using fromBits to be IntFlag enums.
- Fixed CalendarBase.keywords being blatantly incorrect (it was so bad I don't know how it slipped through).
- Fixed Contact.gender being blatantly incorrect.
- Fixed sender not being properly decoded in some circumstances.
- Changed behavior of MSGFile to have olefile raise defects of type DEFECT_INCORRECT and above instead of just DEFECT_FATAL. Uncaught issues of DEFECT_INCORRECT can often cause the module to have parsing issues that may be misleading; this just ensures the issue is clarified. This behavior can be reverted back to the previous with ErrorBehavior.OLE_DEFECT_INCORRECT.
- Fixed potential issues that may have made it possible for certain attachments to ignore filename conflict resolution code.
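As a rough illustration of why IntFlag suits an option like ErrorBehavior, here is a minimal standalone sketch. The enum name, member names, and values below are hypothetical and are not the module's actual definitions.

```python
from enum import IntFlag


class ExampleErrorBehavior(IntFlag):
    # Hypothetical members and values, chosen only for illustration.
    THROW = 0
    SKIP_BROKEN_ATTACHMENTS = 1
    SKIP_RTF_ERRORS = 2
    ALLOW_OLE_DEFECT_INCORRECT = 4


def should_raise_ole_defect(behavior):
    """Return True unless the caller opted out of the stricter check."""
    return ExampleErrorBehavior.ALLOW_OLE_DEFECT_INCORRECT not in behavior


# Flags combine with | and are tested with `in`.
combined = (ExampleErrorBehavior.SKIP_RTF_ERRORS
            | ExampleErrorBehavior.ALLOW_OLE_DEFECT_INCORRECT)
```

Because IntFlag members are also integers, a combined value can be stored or passed around as a plain int and turned back into flags later, which is the "less strict" behavior in this sketch.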
v0.41.5
- Fixed an issue from version 0.41.3 where the header being present but missing the From field would cause an exception.
v0.41.4
- Fixed an issue in the last version that would break the decoding function if the contents were not encoded.
- Updated tzlocal and allow future updates for compressed_rtf and ebcdic.
v0.41.3
- [TeamMsgExtractor #365] Fixed an issue that would cause certain values retrieved from the header to not be decoded properly. It does this when retrieving the values, so nothing about the header has been changed.
- Added new property MessageBase.headerText which is the text content of the header stream. Adjusted other things to use this instead of trying to retrieve the stream directly in multiple places.
- Added typing to MessageBase.header.
v0.41.2
- Updated annotations on MessageBase.save.
- Added new enum BodyTypes.
- Added property MessageBase.detectedBodies for detecting what bodies have been stored (not generated by the module) in the .msg file.
v0.41.1
- [TeamMsgExtractor #362] Fixed an issue with the removal of the --dev option missing one of the checks (I swear I actually tested it).
v0.41.0
- [TeamMsgExtractor #357] Fixed an issue where the properties stream being absent would raise an error message that was not clear.
- [TeamMsgExtractor #357] Added a way to suppress StandardViolationError for poorly created files. This may cause issues you might not expect since this exception is meant to stop the processing for a reason.
- [TeamMsgExtractor #223] Finally got around to dealing with signed attachments that are embedded MSG files. SignedAttachment.data now returns either bytes or MSGFile. SignedAttachment also now has an asBytes property that will return the bytes that created the signed attachment, regardless of if it is an MSG or not, making it unnecessary to call MSGFile.exportBytes to get the bytes of the embedded MSG file, which can add a small delay to your code. As Attachment is a much more complex class, it does not (at least yet) have this property. Also unlike Attachment, SignedAttachment will not throw an exception if the data is an MSG file but is not supported. Instead, it will simply be logged as an exception, but the code will continue. If the data is successfully read as an embedded MSG file, the AttachmentType will be AttachmentType.SIGNED_EMBEDDED.
- Deprecated AttachErrorBehavior in favor of the new ErrorBehavior enum which controls the error behavior for the various parts of the MSG file. Uses of the former will work until the next major version.
- Deprecated the attachmentErrorBehavior parameter of MSGFile in favor of errorBehavior. Uses of the previous will work until the next major version.
- Added treePath property to SignedAttachment to bring it more in line with AttachmentBase.
- Fixed bug in rare logging message caused by incorrect type name.
- Removed the dev and dev_classes submodules, as most of their features are possible using the base code. Additionally, the classes involved became significantly outdated over time.
- Removed the --dev argument from the command line.
- Changed the message for the standards violation error when an attachment has no specified attachment type and it could not be determined automatically. The error now specifies the path of the attachment that had an issue for easier debugging. Additionally, the log message for it simply not being present has also been changed to show this information.
- Added a dev level log that will output the entire properties mapping if the attachment type is not set. Dev level is 5.
- Fixed a critical error in Properties that caused the __contains__ method to always be False. This occurred because it was missing a return statement. Fortunately, it looks like only one part of the module was affected due to other parts using a properly written function.
- Removed some debug prints that slipped through.
- Changed some parts to use the in operator for checking that a property exists as opposed to has_key. The function was there to act more like Python 2.
- Deprecated Properties.has_key.
- Removed the validation submodule and all related references. It was pretty outdated and has minimal usage at this point in time. It may come back at some later point.
- Changed behavior of MessageBase.save so it doesn't save raw when an exception occurs. This behavior may have ended up creating unexpected output which is why it was removed. It was mainly there for debugging in the first place, but is no longer necessary.
- Added __contains__ function to Named class.
- Fixed missing import in message_base.py that would only cause problems if something was wrong with the HTML.
- Fixed a bad error message appearing when trying to add or use an entry in OleWriter using an empty path.
- Changed the InvalidFileFormatError for a missing property stream to a StandardViolationError.
- Changed the exception messages for a few exceptions to fix typos and clarify.
- Fixed issues in _rtf.tokenize_rtf which would cause an exception to be handled incorrectly and throw an unclear error.
- Fixed various small bugs caused by typos.
- Clean up unneeded imports.
- Improved existing __all__ entries and added some where they should be.
- Removed default export of the UnrecognizedMSGTypeError exception in favor of exporting the exceptions module.
- Removed default export of properHex.
- Improved type checking in many places.
- Fixed issues in RecurrencePattern.
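The missing return in __contains__ mentioned above is an easy mistake to reproduce. The sketch below uses hypothetical stand-in classes (not the module's Properties class) to show why a containment check without a return always reports False.

```python
class BrokenProps:
    """Hypothetical stand-in for a mapping-like class with the bug."""

    def __init__(self, data):
        self._data = dict(data)

    def __contains__(self, key):
        # Bug: the comparison is evaluated but never returned, so the
        # method returns None and `key in obj` is always False.
        key in self._data


class FixedProps(BrokenProps):
    def __contains__(self, key):
        # Returning the result makes `in` behave as expected.
        return key in self._data
```

Python converts the return value of __contains__ to a bool, and None converts to False, so the broken version fails silently instead of raising.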
v0.40.0
- [TeamMsgExtractor #338] Added new code to handle injection of text into the RTF body. For many cases, this will be much more effective as it relies on ensuring that it is in the main group and past the header before injection. It is not currently the first choice as it doesn't have proper respect for encapsulated HTML, however it will replace some of the old methods entirely. Solving this issue was done through the use of a few functions and the internal _rtf module. This module in its entirety is considered to be implementation details, and I give no guarantee that it will remain in its current state even across patch versions. As such, it is not recommended to use it outside of the module.
- Changed MessageBase.rtfEncapInjectableHeader and MessageBase.rtfPlainInjectableHeader from str to bytes. They always get encoded anyways, so I don't know why I had them returning as str.
- Updated minimum Python version to 3.8 as 3.6 has reached end of support and 3.7 will reach end of support within the year.
- Updated information in README.
v0.39.2
- Fixed issues with AttachmentBase.name that could cause it to generate incorrectly.
- Added convenience function MSGFile.exportBytes which returns the exported version from MSGFile.export as bytes instead of writing it to a file or file-like object.
v0.39.1
- [TeamMsgExtractor #333] Fixed typo in a warning.
- [TeamMsgExtractor #334] Removed __del__ method from MSGFile. It was there for cleanup, but wasn't planned well enough to stop it from causing issues. It may be reintroduced in the future if I can manage to remove the issues.
- [TeamMsgExtractor #335] Fixed some parts of extract_msg.utils.getCommandArgs having invalid logic after a previous (rather old) update that caused exceptions when using certain options.
- Added new property treePath to AttachmentBase and MSGFile (which adds it to nearly every class). This property is the path to the current instance, represented as a tuple of instances that would be used to get to the current instance.
- Added sphinx documentation.
- Fixed an issue in OleWriter that would produce corrupted OLE files if the DIFAT needed more than the header.
v0.39.0
- [TeamMsgExtractor #318] Added code to handle a standards violation (from what I can tell, anyways) caused by the attachment not having an AttachMethod property. The code will log a warning, attempt to detect the method, and throw a StandardViolationError if it fails.
- [TeamMsgExtractor #320] Changed the way string named properties are handled to allow for the string stream to have some errors and still be parsed. Warnings about these errors will be logged.
- [TeamMsgExtractor #324] Fixed an issue with contact saving when a list property returns None.
- [TeamMsgExtractor #326] Fixed a bug that could cause some files to error when exporting.
- Fixed an issue where creation and modification times were not being copied to the new OLE file created by OleWriter.
- Fixed up a few docstrings.
- Fixed a few issues in MSGFile regarding the filename keyword argument.
- Added new argument rootPath to OleWriter.fromOleFile for saving a specific directory from an OLE file instead of just copying the entire file. That directory will become the root of the new one.
- Adjusted code for OleWriter to generate certain values only at save time to make them more dynamic. This allows for existing streams to be properly edited (although has issues with allowing storages to be edited).
- Added new function OleWriter.deleteEntry to remove an entry that was already added. If the entry is a storage, all children will be removed too.
- Added new function OleWriter.editEntry to edit an entry that was already added.
- Added new function OleWriter.addEntry to add a new entry to the writer without an OleFileIO instance. Properties of the entry are instead set using the same keyword arguments as described in OleWriter.editEntry.
- Changed _DirectoryEntry to DirectoryEntry to make the more finalized version public. Access to the originals that the OleWriter class creates should never happen, instead copies should be returned to ensure the behavior is as expected.
- Added new function OleWriter.getEntry which returns a copy of the DirectoryEntry instance for that stream or storage in the writer. Use this function to see the current internal state of an entry.
- Added new function OleWriter.renameEntry which allows the user to rename a stream or storage (in place). This only changes its direct name and not its location in the new OLE file.
- Added new function OleWriter.walk which is similar to os.walk but for walking the structure of the new OLE file.
- Added new function OleWriter.listItems which is functionally equivalent to olefile.OleFileIO.listdir which returns a list of paths to every item. Optionally a user can get the paths just for streams, just for storages, or both. Requesting neither will simply return an empty list. Default is to just return streams.
- Added a small amount of path validation to inputToMsgPath which is used in a lot of places where user input for a path is accepted. It ensures illegal characters don't exist and that the path segments (each name for a storage or stream) are less than 32 characters. This will be most helpful for OleWriter.
- Added many internal helper functions to OleWriter to make extensions easier and consolidate common code. Many of these involve direct access to internal data which is why they are private.
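The path validation described for inputToMsgPath could look roughly like the sketch below. This is not the module's implementation: the function name, the set of illegal characters, and the choice to silently drop empty segments are all assumptions made for illustration.

```python
import re

# Assumption: characters treated as illegal inside a segment. The path
# separators themselves are consumed by the split below.
ILLEGAL_CHARS = set(':!')
# "Less than 32 characters" means at most 31 per storage or stream name.
MAX_SEGMENT_LENGTH = 31


def validate_ole_path(path):
    """Split a user-supplied path into segments and validate each one."""
    # Accept either slash style; empty segments are dropped (an assumption).
    segments = [part for part in re.split(r'[\\/]', path) if part]
    if not segments:
        raise ValueError('Path must contain at least one segment.')
    for segment in segments:
        if len(segment) > MAX_SEGMENT_LENGTH:
            raise ValueError(f'Segment {segment!r} is longer than 31 characters.')
        bad = ILLEGAL_CHARS.intersection(segment)
        if bad:
            raise ValueError(f'Segment {segment!r} contains illegal characters: {sorted(bad)}')
    return segments
```

Validating at the point where user input is accepted means a writer never has to deal with names that cannot be stored in the file.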
v0.38.4
- Fix line in OleWriter that was causing exporting to fail.
- Fixed some issues with the README.
v0.38.3
- Fixed issues in HTML generation that caused line breaks to be omitted from large sections of the text.
- Fixed issues with requirements file.
v0.38.2
- Fixed new NameError accidentally introduced in the previous version.
v0.38.1
- Added a __del__ method to MSGFile to ensure a bit of proper cleanup should all references to an MSGFile instance be removed before the file is closed. OleFileIO doesn't appear to have one, so it's up to us to ensure it is properly closed. Note that the del keyword does not guarantee the immediate deletion of the object, and you should take care to close the file yourself. If this is not possible, importing the gc module and using its collect method will free the files if your code has no references to them.
- Fixed an import issue with signed messages.
v0.38.0
- [TeamMsgExtractor #117] Added class OleWriter to allow the writing of OLE files, which allows for embedded MSG files to be extracted.
- Added function MSGFile.export which copies all streams and storages from an MSG file into a new file. This can "clone" an MSG file or be used for extracting an MSG file that is embedded inside of another.
- Added hidden function to MSGFile for getting the OleDirectoryEntry for a storage or stream. This is mainly for use by the OleWriter class.
- Added option extractEmbedded to Attachment.save (--extract-embedded on the command line) which causes embedded MSG files to be extracted instead of running their save methods.
- Fixed minor issues with utils.inputToMsgPath (renamed from utils.inputToMsgPath).
- Renamed utils.msgpathToString to utils.msgPathToString.
- Made some of the module requirements a little more strict to better version control. I'll be trying to make periodic checks for updates to the dependency packages and make sure that new versions are compatible before changing the allowed versions, while also trying to keep the requirements a bit flexible.
v0.37.1
- Added option to save function (including MSGFile.saveAttachments) to skip any attachments marked as hidden. This can also be done from the command line with --skip-hidden.
- Fixed an issue that could cause an MSGFile to be included in the attachment names instead of just strings.
- Fixed up several comments.
- Fixed bug that would cause spaces at the end of the subject or filename to break the module.
v0.37.0
- [TeamMsgExtractor #303] Renamed internal variable that wasn't changed when the other instances of it were renamed.
- [TeamMsgExtractor #302] Fixed properties missing (a fatal MSG error) raising an unclear exception. It now uses InvalidFileFormatError.
- Updated README to contain documentation on command line option added in previous version.
- Removed MSGFile.mainProperties after deprecating it in v0.36.0.
- Added new PropertiesIntelligence type: ERROR. This type is used when a properties instance is created but has something wrong with it that is not necessarily fatal. Currently the only thing that will cause it is the properties stream being 0 bytes.
v0.36.5
- Documented and exposed to the command line the ability to save the headers to their own file when saving the msg file. Thanks to martin-mueller-cemas on GitHub for fixing this (and telling me I can switch the branch a pull request points to). I don't know when I added the code to allow the user to do it, but somehow it was never documented and never exposed to the command line.
v0.36.4
- [TeamMsgExtractor #291] Fixed typo in MSGFile.saveRaw that may have existed for a significant amount of time. It was using the wrong function (same name, but with different capitalization) but was hidden until MSGFile stopped being derived from OleFileIO.
- Added logging code to MessageBase.getSavePdfBody to log the list that is going to be used to run wkhtmltopdf. This is mainly for debugging purposes, to allow users to potentially see why their arguments may be failing.
- Updated funding information on GitHub and the README with more ways to support the module's development.
- Fixed one of the exceptions in MessageBase.getSavePdfBody not using an f-string which caused it to omit information.
- Changed the way wkhtmltopdf is called to patch a possible security vulnerability. This also seems to have fixed [TeamMsgExtractor #291].
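The entry above does not say what the change was. As a general illustration only, one common way to harden an external call like this is to build an argument list and avoid passing a single string through a shell. The helper name below is hypothetical and this is not taken from the module's code.

```python
import shlex


def build_wkhtmltopdf_command(executable, extra_args=''):
    """Build an argv list for wkhtmltopdf without involving a shell.

    The two trailing "-" arguments ask wkhtmltopdf to read the HTML from
    stdin and write the PDF to stdout.
    """
    # shlex.split keeps quoted values together as single arguments, so
    # characters such as ";" are never interpreted by a shell.
    return [executable, *shlex.split(extra_args), '-', '-']


# The resulting list would then be run without shell=True, for example:
#   subprocess.run(cmd, input=html_bytes, capture_output=True)
```

Logging the list before running it, as the entry on getSavePdfBody describes, also shows exactly how each user-supplied argument was split.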
v0.36.3
- Added an option to skip the body if it could not be found, rather than throwing an error. This will cause no file to be made for it in the event no valid body exists. For the save functions, this option is skipBodyNotFound and from the command line the option is --skip-body-not-found.
- Fixed a bug that caused contacts to save the business phone with two colons instead of one.
v0.36.2
- [TeamMsgExtractor #286] Fix missing import.
v0.36.1
- [TeamMsgExtractor #283] Added file for typing recognition.
- Added new option to allow messages with NotImplementedError attachments to at least save everything not related to them. From the command line this would be --skip-not-implemented or --skip-ni. From the save function, this would be the skipNotImplemented option.
v0.36.0
- Improved type hints to tell when a function may not return anything.
- Added support for the reportTag property to MessageBase. I noticed this was one of the properties for IPM.Outlook.Recall so I decided to implement it. I'll work on ensuring all of [MS-OXOMSG] and [MS-OXCMSG] are implemented at a later date, including splitting off REPORT into its own class, because it is its own class.
- Fixed a few minor issues in some properties. Mostly some bool returning properties should have been returning False when they were not found instead of None. Ones that may return None are specifically typed as optional in the source code. Unfortunately using the help command doesn't seem to show the return type for properties for me at least on Python 3.9 and below.
- Fixed Attachment.save returning a pathlib.Path object instead of a str after the conversion to pathlib.
- Added code to allow pathlib objects in utils.openMsgBulk. It uses glob.glob which cannot take a pathlib object.
- Removed PermanentEntryID from EntryID.autoCreate. It shares a provider ID with AddressBookEntryID, and as such would never generate from it anyways. If you specifically need a PermanentEntryID you will have to instantiate it manually. Additionally, the entry has been removed from enums.EntryIDType.
- Fixed the GUIDs for some of the EntryID structures being returned as bytes instead of a standard GUID string. This is considered a breaking change and is the reason for the full version increase.
- Improved consistency of properties (the Python kind, not the MSG kind) so that properties for the raw data used to create an instance will all use the same name. Not all classes have this, but the ones that do will now all use rawData as the property name.
- Fixed the properties header being read in a way that actually swapped the values in it.
- Modified MSGFile to use the same name for Properties instances as all other classes. MSGFile.mainProperties will currently raise a DeprecationWarning instead of outright failing to help ease the transition. Use MSGFile.props instead.
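For readers wondering what "bytes instead of a standard GUID string" means in practice, here is a small sketch of the conversion using only the standard library. The helper name, the little-endian field layout, and the braces-plus-uppercase formatting are assumptions for illustration and may not match what the module returns.

```python
import uuid


def guid_bytes_to_str(raw):
    """Convert 16 raw GUID bytes into a braced, uppercase GUID string."""
    if len(raw) != 16:
        raise ValueError('A GUID is exactly 16 bytes.')
    # bytes_le treats the first three fields as little-endian, which is how
    # GUIDs are commonly laid out in Microsoft binary formats.
    return '{' + str(uuid.UUID(bytes_le=raw)).upper() + '}'
```

Code that previously compared these values against bytes needs to compare against strings after a change like this, which is why it counts as breaking.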
v0.35.3
- [TeamMsgExtractor #280] Fix typing issue in message_base.py.
v0.35.2
- Made a change to the argument handling for --no-folders. Since it requires --attachmentsOnly to work, I simply made it error when it's not given to avoid confusion.
- Updated README.
- Added -v to be used instead of --verbose. This allows you to do -vvv for verbosity level 3.
v0.35.1
- Fixed a few property conflicts that I missed in the last release (forgot to run the helper script before releasing).
v0.35.0
- [TeamMsgExtractor #206] Implemented full support for Post objects, including the ability to save them.
- [TeamMsgExtractor #212] Implemented full support for Task objects, including the ability to save them. This also includes TaskRequest objects.
- [TeamMsgExtractor #110] Implemented full support for Contact objects, including the ability to save them.
- [TeamMsgExtractor #143] Rewrote the system used for Appointment objects to include all of the objects specified in [MS-OXOCAL]. Name changed to AppointmentMeeting. Completed support for Appointment objects, including the ability to save them.
- [TeamMsgExtractor #243] Added optional dependency mimetype-magic (installable using the mime extra) which helps to identify attachments that do not give a mime-type.
- [TeamMsgExtractor #274] Apparently the properties stream can have random garbage at the end of it (Outlook generated the file that showed this) so code was added to ensure it wouldn't break everything.
- [TeamMsgExtractor #274] Made sure that the save function would report if it failed to find or generate a valid body. Specifically, if you were just trying to use the plain text body but it didn't exist (the stream didn't exist, not that the stream was empty) it would silently pass, which was bad behavior. Additionally, allowFallback will change the message to specify that current options were not usable for getting a valid body.
- [TeamMsgExtractor #207] Changed behavior for max date. Apparently it looks like it is supposed to be August 31, 4500 at 11:59 PM. However, in case this needs to change, we have created a constant called extract_msg.constants.NULL_DATE to represent this that you can use in your code to not have to worry about changing your code if we change it.
- Moved a few more minor constants to enums.
- Added support for many internal data structures, specifically Entry ID structures.
- Refactored classes from extract_msg.data to submodule extract_msg.structures.
- Added python_requires to setup.py as I noticed that it was missing.
- Due to new saving requirements, adjusted the way header injection worked all around. Functions are now built-in to MessageBase. getSaveXBody functions have also been moved down to be defined in MessageBase. If the extension class needs to specify custom behaviors for creating the save bodies, these functions will need to be overridden.
- For saving, MessageBase (being the lowest one to currently contain bodies) has a few new properties. These properties represent the injection strings that will be injected into the bodies for the header, with an additional property to specify what properties map to what part of the format string. See MessageBase.headerFormatProperties for more information and an example of how to implement this in your own class.
- Injection strings in constants have been removed in favor of dynamic generation, which only creates what is needed. You will no longer see an empty Bcc field in your messages when you save them.
- Plain text bodies now also use this injection, making it easy to change the header in all bodies by overwriting a single property that tells the program what data to put where.
- Fixed issue in encapsulated RTF header that caused the "To" field to not be present. I had to write them by hand, so it was bound to happen. They are now dynamically generated by each instance, so these fields should always appear.
- All save code has been moved down from Message into MessageBase for convenience. Message exists now for specific checking and for future specializations. This also means that anything that is a MessageBase now has the entire framework for saving built-in, with an easy way to change details.
- Fixed bad property in Contact.
- Created save function for Contact. Saving, though it exists, is rather minimal and is limited to plain text and HTML.
- Significantly extended the Contact class's properties.
- Adjusted the naming of a few Contact properties to better match the Microsoft names: firstName -> givenName, lastName -> surname, businessPhone -> businessTelephoneNumber, etc.
- Changed existing fax properties to give a dictionary of the properties they actually contain rather than just the number. This makes them behave like the newly added email properties.
- Fixed issues with Task properties being incorrect.
- Added implementation for PtypErrorCode.
- Changed behavior of Properties.date to only return the submit time. This is to ensure messages that were never sent do not have a sent date.
- Changed behavior of MessageBase.date to only return a send date if the message has been sent. For messages with no flags, it assumes True.
- Generally brought saving behavior closer to the way Outlook handles it.
- Made SignedAttachment and BaseAttachment more similar by adding properties to each that are shared. BaseAttachment now has a name property and SignedAttachment now has longFilename and shortFilename.
- Fixed issue in HTML saving that would cause some characters to be dropped when rendering them due to how the header injection worked.
- Removed __init__ methods from MSG classes that don't change it. This ensures notes are easily passed down.
- Correction to last comment, one max date was supposed to be at that date, but another max date is at a different date of the same year.
- Changed the way that PtypTime is handled, making it a single function in utils.
- Upgraded dependencies to newer versions (some really need to be newer, like tzlocal, for best results). Included dependencies are beautifulsoup4 and tzlocal.
- Fixed an issue where zip file naming conflicts always failed in zip files for attachments. Both the embedded msg and the plain attachments would fail.
- Actually fixed the issue that would break the main loop.
- MSGFile no longer inherits directly from OleFileIO. While I would prefer to do that, the __init__ method for it is rather expensive, and allowing embedded msg files to directly share each other's instances of OleFileIO would improve speed immensely.
- Fixed attachments not being preemptively loaded when delayAttachments was False.
- utils.openMsg now delays attachments while loading the file to get the class type. This means all time for attachments is cut in half as they are only ever loaded once. It also means that files that won't open due to attachments will error a little later, but this shouldn't be a problem.
- Changed named properties Guid back to constants. This has to do with the next entry.
- Fixed a major issue in named properties. Apparently the ID is not enough and you must have the GUID as well, as multiple properties can share the exact same ID.
- Added option --no-folders to the command line allowing you to save all attachments from a set of MSG files into a single folder.
- Added option --skip-embedded to the command line to skip saving embedded MSG files.
- Added option skipEmbedded to Attachment.save (and all other related save methods that call it) to skip saving an embedded MSG file.
- Changed __main__ so that it opens the zip file there instead of relying on everything it calls to do it again and again.
- Changed the behavior of --verbose to allow it to be stacked for more verbosity. Specifying it once turns on warnings, twice for info, and three times for debug. Not specifying it only turns on error logging.
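The stacked --verbose behavior maps naturally onto argparse's count action. The sketch below is a standalone illustration of that mapping and is not copied from the module's command line code. The parser, function names, and program name are hypothetical.

```python
import argparse
import logging

# Index is the number of times the flag was given: none means errors only,
# once warnings, twice info, three times (or more) debug.
LEVELS = [logging.ERROR, logging.WARNING, logging.INFO, logging.DEBUG]


def verbosity_to_level(count):
    """Map a repeat count to a logging level, capping at debug."""
    return LEVELS[min(count, len(LEVELS) - 1)]


def build_parser():
    parser = argparse.ArgumentParser(prog='example')
    # action='count' increments the value every time the flag appears,
    # so "-vvv" and "--verbose --verbose --verbose" both yield 3.
    parser.add_argument('-v', '--verbose', action='count', default=0,
                        help='Increase verbosity; may be repeated.')
    return parser
```

A caller would then pass verbosity_to_level(args.verbose) to logging.basicConfig or to a logger's setLevel.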
v0.34.3
- Fixed issue that may have caused other olefile types to raise the wrong type of error when passed to openMsg.
- Fixed issues with changelog format.
- Fixed issue that caused progress to sometimes break the main loop when a file had Unicode characters if the console it was writing to didn't support them.
- Added option to MessageBase (and subsequently openMsg) that allows you to override the code being used for deencapsulation. See MessageBase.__init__ for details on how to create an override function.
- Added additional msg class types that I found to the list of known class types.
v0.34.2
- [TeamMsgExtractor #267] Fixed issue that caused signed messages that were .eml files to have their data field not be a bytes instance. This field will now always be bytes. If a problem making it bytes occurs, an exception will be raised to give you brief details.
- Added function utils.unwrapMultipart that takes a multipart message and acquires a plain text body, an HTML body, and a list of attachments from it. These attachments are returned as dicts that can easily be converted to SignedAttachments. It replaces the logic mailbits was being used for, and as such mailbits is no longer required. The module may be reintroduced in the future.
- Added new property emailMessage to SignedAttachment, which returns the email Message instance used to get data for the attachment.
v0.34.1
- Added convenience function utils.openMsgBulk (imported to extract_msg namespace) for opening message paths with wildcards. Allows you to open several messages with one function, returning a list of the opened messages.
- Added convenience function utils.unwrapMsg which recurses through an MSGFile and its attachments, creating a series of linear structures stored in a dictionary. Useful for analyzing, saving, etc. all messages and attachments down through the structure without having to recurse yourself.
- Fixed an issue that would cause signed attachments to not properly generate.
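The wildcard expansion behind a helper like openMsgBulk can be sketched with the standard library alone. The function below is hypothetical and only covers path expansion. It does not open any files and is not the module's implementation.

```python
import glob
import os


def expand_message_paths(pattern):
    """Expand a wildcard pattern into a sorted list of matching paths.

    Accepts either a str or a pathlib.Path. os.fspath converts a Path to a
    plain string before it is handed to glob.glob.
    """
    return sorted(glob.glob(os.fspath(pattern)))
```

Each returned path could then be passed to the usual single-file opening function in a loop or list comprehension.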
v0.34.0
- [TeamMsgExtractor #102] Added the option to directly save body to pdf. This requires that you either have wkhtmltopdf on your path or that you provide a path directly to it in order to work. Simply pass pdf = True to save to turn it on. More details in the doc for the save function. You can also do this from the command line using the --pdf option, incompatible with other body types.
- Added --glob option for allowing you to provide an msg path that will evaluate wildcards.
- Removed per-file output names as they weren't actually functional and currently add too much complexity. If anyone knows a way to handle it directly with argparse let me know.
- Added chardet as a requirement to help work around an error in RTFDE.
chardetas a requirement to help work around an error inRTFDE. - Looking into some of the unsupported encodings revealed at least one to be supported, but was missed due to the name not being registered as an alias.
- Fixed a bug that prevented an HTML body from being generated from the plain text body when it couldn't be generated any other way. This happened when the RTF body existed and the plain text body existed, but the RTF body was not encapsulated HTML.
- Due to several of the issues with RTFDE preventing saving due to exceptions, the option ignoreRtfDeErrors has been added to the MessageBase class and openMsg. It can also be enabled from the command line with --ignore-rtfde. This option will make it so all exceptions will be silenced from the module.
v0.33.0
- [TeamMsgExtractor #212] Added support for Task objects.
- Added additional parameter overrideClass to all _ensureSetX type functions. This parameter is a class to initialize using the data, if provided. The data will be provided as the first argument to the __init__ function of the class if the data is not None. If the data is None, the value set and returned from _ensureSetX will be None. This can be overridden using the second added parameter, which defaults to this behavior. Simply set preserveNone to False to force the data to be passed to the class. Keep in mind that this means the class will receive None as its first parameter.
- Updated Named class to make it more like the dictionary it is built on top of. Specifically, added keys, values, and items as functions.
- Fixed issue in Named where old code wasn't deleted, causing some named properties to be completely omitted.
- Improved efficiency of Named by making it so get no longer copied the entire dictionary repeatedly while looking for the property.
- Fixed typo in Attachment that caused embedded msg files to still regenerate the Named instance even though they should have been using the parent's.
- Updated fields in MSGFile to use enums.
- Added additional properties to MSGFile.
- Significant cleanup.
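The overrideClass and preserveNone behavior described above can be summarized in a few lines. This is a simplified sketch of the described logic only: the function and the Wrapper class are hypothetical, and the real _ensureSetX functions also read and cache the data, which is left out here.

```python
class Wrapper:
    """Hypothetical class used to wrap raw data."""

    def __init__(self, data):
        self.data = data


def wrap_value(data, overrideClass=None, preserveNone=True):
    """Optionally wrap data in overrideClass, keeping None as None by default."""
    if overrideClass is None:
        # No class given: hand the data back untouched.
        return data
    if data is None and preserveNone:
        # Default behavior: missing data stays None instead of being wrapped.
        return None
    # Otherwise the data (possibly None) becomes the first __init__ argument.
    return overrideClass(data)
```

With preserveNone set to False, wrap_value(None, Wrapper) returns a Wrapper whose data attribute is None, which is the case the entry warns about.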
v0.32.0
- [TeamMsgExtractor #217] Redesigned named properties to be much more efficient. All in all, load times for MSG files are significantly improved all around.
- Named properties load dynamically.
- Named properties use a single shared instance for embedded msg files to contain all of the entries, and tiny individual instances for actually accessing based on what is accessing.
- Named has been updated so its methods are more standardized.
- Named is no longer usable to directly access property values. Instead, access to them is done through NamedProperties.
- _registerNamedProperty has been removed due to no longer being part of the named properties framework.
- Actually removed the exception I meant to remove in 0.31.0.
- Changed Attachment.type to an enum instead of a string. This makes it easier to see all of the possible values.
- Added option to allow attachment saving to be skipped when calling Message.save.
- Added type property to all attachment types. AttachmentBase uses AttachmentType.UNKNOWN, BrokenAttachment uses AttachmentType.BROKEN, and UnsupportedAttachment uses AttachmentType.UNSUPPORTED.
v0.31.1
- Updated signed attachment mimetype property from mime to mimetype to match with the regular attachment property.
v0.31.0
- [TeamMsgExtractor #223] First implementation of support for signed messages.
- [TeamMsgExtractor #256] Extended the properties of the attachment class to allow for access to more data.
- Fixed some issues with the contact class (some of the code was left unfinished).
- Converted many arguments for msg files to keyword arguments (**kwargs). This is a breaking change if your code provides anything except the path to the msg file. This will, however, make it easier to add arguments without the function signature being polluted with numerous arguments.
- Updated exception blocks to ensure they would only catch exceptions that are instances of Exception or a child of Exception. A few were missing this check and so might have caught unwanted exceptions.
- Shifted some minor parts down towards the AttachmentBase class where possible to allow more usability with broken and unsupported attachments. Nothing about these properties required the attachment to be fully processed, which is why they were moved down. From the perspective of using the Attachment class nothing has changed. The following properties have been moved: attachmentEncoding, additionalInformation, cid/contentId, longFilename, renderingPosition, and shortFilename.
- Added link to readme for supporting development.
- Finally found the behavior to use for missing encoding. As such, MissingEncodingError is being removed from the module entirely, as it is no longer a thing.
- Moved many constant sets to enums. This makes things a lot more organized.
v0.30.14
- Fixed major bug in MSGFile.listDir that would cause it to give the wrong data due to caching. I forgot to have it cache a different result for each set of options, so it would always give the same result regardless of options after the first access.
- Minor updates to setup.py.
- Fixed URLs in README.
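The caching fix described for MSGFile.listDir comes down to keying the cache on the options used. The sketch below shows that pattern only; the DirectoryLister class, its streams and storages options, and the scans counter are hypothetical and are not taken from the module.

```python
from typing import Dict, List, Tuple


class DirectoryLister:
    """Hypothetical class; only the per-options caching pattern matters."""

    def __init__(self, entries: List[Tuple[str, bool]]) -> None:
        # Each entry is (path, is_stream).
        self._entries = entries
        # One cached result per distinct combination of options.
        self._cache: Dict[Tuple[bool, bool], List[str]] = {}
        self.scans = 0  # counts how often the listing is actually computed

    def list_dir(self, streams: bool = True, storages: bool = False) -> List[str]:
        key = (streams, storages)
        if key not in self._cache:
            self.scans += 1
            self._cache[key] = [
                path for path, is_stream in self._entries
                if (is_stream and streams) or (not is_stream and storages)
            ]
        # Return a copy so callers cannot mutate the cached list.
        return list(self._cache[key])
```

A single cached value shared by every call would reproduce the bug: whichever options were used first would decide the result for all later calls.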
v0.30.13
- [TeamMsgExtractor #257] Fixed missing documentation for customPath keyword argument to Attachment.save.
v0.30.12
- [TeamMsgExtractor #253] Fixed various docstring issues.
v0.30.11
- [TeamMsgExtractor #249] Fixed an error with opening the body causing the file to throw an uncaught exception in the __init__ function of MessageBase. The error may still come up later, but you will still have general access to the instance.
- Fixed an issue with part of the command line documentation.
v0.30.10
- [TeamMsgExtractor #249] Fixed exception catching not properly accessing the exception (forgot to go through one of the submodules to access exception).
- Updated docstring for MessageBase.deencapsulatedRtf.
v0.30.9
- Fixed the behavior of Properties.get so it actually behaves like a dict (that was the intent of it, but I did it the wrong way for some reason).
- Fixed a typo that caused an exception when no HTML body could be found nor generated.
v0.30.8
- Update imapclient requirement to >=2.1.0 instead of ==2.1.0. Currently there are no changes that would prevent current or future versions from working.
v0.30.7
- [TeamMsgExtractor #239] Fixed msg.py not having import pathlib.
- After going through the details of the example MSG files provided with the module, specifically unicode.msg, I now am glad I decided to put in some fail-safes in the HTML body processing. One of them does not have an <html>, <head>, nor <body> tag, and so would have had an error. This will actually prevent the header from injecting properly as well, so a bit of validation beforehand was made necessary to ensure the HTML saving would still work.
- Added new exception BadHtmlError.
- Added new function utils.validateHtml.
- Updated README credits.
- Changed header logic to generate manually if the header data has been stripped (filled with null bytes) and not just if the stream does not exist.
v0.30.6
- Small adjustments to internal code to make it a bit better.
- Added Message.getSaveBody, Message.getSaveHtmlBody, and Message.getSaveRtfBody. These three functions generate their respective bodies that will be used when saving the file, allowing you to retrieve the final products without having to write them to the disk first. All arguments that are passed to Message.save that would influence the respective bodies are passed to their respective functions.
- I thought I added the documentation for attachmentsOnly to Message.save but apparently it was missing. Not sure what happened there but I made sure it was added this time.
- Added new option to Message.save called charset. This is used in the preparation of the HTML body when using preparedHtml. This is also usable with --charset CHARSET from the command line.
v0.30.5
- [TeamMsgExtractor #225] Added the ability to generate the HTML body from the RTF body if it is encapsulated HTML. If there is no RTF body, then it will do a very basic generation from the plain text body. This process is automatically performed if the HTML body is missing.
- Added the ability for the plain text body to sometimes generate from the RTF body if the plain text body does not exist.
- Added documentation to Message.save for the attachmentsOnly option.
v0.30.4
- [TeamMsgExtractor #233] Added option to Message.save to only save the attachments (attachmentsOnly). This can also be used from the command line with the option --attachments-only.
- Corrected error string for incompatible options in Message.save.
- Fixed if blocks being nested weirdly in Message.save instead of being chained with elif. Probably an artifact left from adding support for HTML and RTF.
v0.30.3
- [TeamMsgExtractor #232] Updated list of known MSG class types so the module would correctly give UnsupportedMSGTypeError instead of UnrecognizedMSGTypeError.
v0.30.2
- Fixed typo in utils.knownMsgClass.
- Updated contributing guidelines and pull request template. Pull requests were updated to match the new structure of the module while the guidelines were updated for clarity.
v0.30.1
- [TeamMsgExtractor #102] Added property MessageBase.htmlBodyPrepared which provides the HTML body that has been prepared for actual use or conversion to PDF. All attachments that can be injected into the body directly will be injected.
- Corrected a mistake in the documentation of Message.save.
- Fixed issue where the command line parser was not checking for raw in conjunction with other saving options.
- Changed utils.setupLogging to use pathlib.
- Improved reliability of the logic in utils.setupLogging.
- Removed function utils.getContFileDir.
- Cleaned up plain Exception being raised at the end of utils.injectRtfHeader if the injection failed. This now raises RuntimeError, an error that should be reported so the injection system can be improved.
- Added parsing for PtypServerId to utils.parseType.
- Added FolderID, MessageID, and ServerID to data.
- Added zip output to the command line.
- Added support for PtypMultipleFloatingTime to utils.parseType.
- Improved documentation of many functions with the exceptions they may raise.
- Changed the way zip files are handled so that files written to them actually have a modification date now.
v0.30.0
- Removed all support for Python 2. This caused a lot of things to be moved around and changed from indirect references to direct references, so it's possible something fell through the cracks. I'm doing my best to test it, but let me know if you have an issue.
- Changed classes to now prefer super() over direct superclass initialization.
- Removed explicit object subclassing (it's implicit in Python 3 so we don't need it anymore).
- Converted most .formats into f strings.
- Improved consistency of docstrings. It's not perfect, but it should at least be better.
- Started the addition of type hints to functions and methods.
- Updated utils.bytesToGuid to make it faster and more efficient.
- Renamed utils.msgEpoch to utils.filetimeToUtc to be more descriptive.
- Updated internal variable names to be more consistent.
- Improvements to the way __main__ works. This does not affect the output it will generate, only the efficiency and readability.
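For context on what a FILETIME-to-UTC conversion involves, here is a minimal sketch. It assumes the input is a Windows FILETIME (100-nanosecond intervals since 1601-01-01 UTC); the function names below are hypothetical and the body is not claimed to match utils.filetimeToUtc.

```python
import datetime

# Number of 100-nanosecond intervals between 1601-01-01 and 1970-01-01 (UTC).
FILETIME_UNIX_EPOCH = 116444736000000000


def filetime_to_unix(filetime: int) -> float:
    """Convert a FILETIME value to seconds since the Unix epoch."""
    return (filetime - FILETIME_UNIX_EPOCH) / 10_000_000


def filetime_to_datetime(filetime: int) -> datetime.datetime:
    """Convert a FILETIME value to a timezone-aware UTC datetime."""
    return datetime.datetime.fromtimestamp(
        filetime_to_unix(filetime), tz=datetime.timezone.utc
    )
```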
v0.29.3
- [TeamMsgExtractor #226] Fix typo in command parsing that prevented the usage of allowFallback.
- Fixed main still manually navigating to a new directory with os.chdir instead of using customPath.
- Fixed issue in main where the --html option was being used for both html and rtf. This meant if you wanted rtf it would not have used it, and if you wanted html it would have thrown an error.
- Fixed --out-name having no effect.
- Fixed --out having no effect.
v0.29.2
- Fixed issue where the RTF injection was accidentally doing HTML escapes for non-encapsulated streams and not doing escapes for encapsulated streams.
- Fixed name error in Message.save causing bad logic. For context, the internal variable zip was renamed to _zip to avoid a name conflict with the built-in function. Some instances of it were missed.
v0.29.1
- [TeamMsgExtractor #198] Added a feature to save the header in its own file (prefers the full raw header if it can find it, otherwise puts in a generated one) that was actually supposed to be in v0.29.0 but I forgot, lol.
v0.29.0
- [TeamMsgExtractor #207] Made it so that unspecified dates are handled properly. For clarification, an unspecified date is a custom value in MSG files for dates that means that the date is unspecified. It is distinctly different from a property not existing, which will still return None. For unspecified dates, datetime.datetime.max is returned. While perhaps not the best solution, it will have to do for now.
- Fixed an issue where utils.parseType was returning a string for the date when it makes more sense to return an actual datetime instance.
- [TeamMsgExtractor #165] [TeamMsgExtractor #191] Completely redesigned all existing save functions. You can now properly save to custom locations under custom file names. This change may break existing code for several reasons. First, all arguments have been changed to keyword arguments. Second, a few keyword arguments have been renamed to better fit the naming conventions.
- [TeamMsgExtractor #200] Changed imports to use relative imports instead of hard imports where applicable.
- Updated the save functions to no longer rely on the current working directory to save things. The module now does what it can to use hard pathing so that if you spontaneously change working directory it will not cause problems. This should also allow for saving to be threaded, if I am correct.
- [TeamMsgExtractor #197] Added new property Message.defaultFolderName. This property returns the default name to be used for a Message if none of the options change the name.
- [TeamMsgExtractor #201] Fixed an issue where if the class type was all caps it would not be recognized. According to the documentation the comparisons should have been case insensitive, but I must have misread it at some point.
- [TeamMsgExtractor #202] Module will now handle path lengths in a semi-intelligent way to determine how best to save the MSG files. Default path length max is 255.
- [TeamMsgExtractor #203] Fixed an issue where having multiple "." characters in your file name would cause the directories to be incorrectly named when using the useFileName (now useMsgFilename) argument in the save function.
- [TeamMsgExtractor #204] Fixed an issue where the failsafe name used by attachments wasn't being encoded beforehand, causing encoding errors.
- MSG files with a type of simply IPM will now be returned as MSGFile by openMsg, as this type means that no format has been specified.
- [TeamMsgExtractor #214] Attachments that error because the MSG class type wasn't recognized or isn't supported will now correctly be UnsupportedAttachment instead of BrokenAttachment.
- Improved internal code in many functions to make them faster and more efficient.
- openMsg will now tell you if a class type is simply unsupported rather than unrecognized. If it is found in the list, the function will raise UnsupportedMSGTypeError.
- Added caching to MSGFile.listDir. I found that if you have larger files this single function might be taking up over half of the processing time because of how many times it is used in the module.
- Fully implemented raw saving.
- Extended the Contact class to have more properties.
- Added new function MSGFile._ensureSetTyped which acts like the other ensure set functions but doesn't require you to know the type. Prefer to use the other ensure set functions when you know exactly what type it will be.
- Changed Message.saveRaw to MSGFile.saveRaw.
- Changed MSGFile.saveRaw to take a path and save the contents to a zip file.
- Corrected the help doc to reflect the current repository (was still on mattgwwalker).
- Fixed a bug that would cause an exception on trying to access the RTF body on a file that didn't have one. This is now correctly returning None.
- The raw keyword of Message.save now actually works.
- Added property Attachment.randomFilename which allows you to get the randomly generated name for attachments that don't have a usable one otherwise.
- Added function Attachment.regenerateRandomName for creating a new random name if necessary.
- Added function Attachment.getFilename. This function is used to get the name an attachment will be saved with given the specified arguments. Arguments are identical to Attachment.save.
- Changed pull requests to reflect new style.
- Added additional properties for defined MSG file fields.
- Added zip file support for Attachment.save and Message.save. Simply pass a path for the zip keyword argument and it will create a new ZipFile instance and save all of its data inside there. Alternatively, you can pass an instance of a class that is either a ZipFile or ZipFile-like and it will simply use that. When this argument is defined, the customPath argument refers to the path inside the zip file.
- Added the html and rtf keywords to Message.save. These will attempt to save the body in the html or rtf format, respectively. If the program cannot save in those formats, it will raise an exception unless the allowFallback keyword argument is True.
- Changed utils.hasLen to use hasattr instead of the try-except method it was using.
- Added new option recipientSeparator to MessageBase allowing you to specify a custom recipient separator (default is ";" to match Microsoft Outlook).
- Changed the openMsg function in Attachment to not be strict. This allows you to actually open the MSG file even if we don't recognize the type of embedded MSG that is being used.
- Attempted to normalize encoding names throughout the module so that a certain encoding will only show up using one name and not multiple.
- Finally figured out what CRC32 algorithm is used in named properties after directly asking in a Microsoft forum (see the thread here). Fortunately the algorithm is already defined in the compressed-rtf module so we can take advantage of that.
- Reworked MessageBase._genRecipient to improve it (because what on earth was that code it was using before?). Variables in the function are now more descriptive. Added comments in several places.
- Many renames to better fit naming convention:
  - dev.setup_dev_logger to dev.setupDevLogger.
  - MSGFile.fix_path to MSGFile.fixPath.
  - MessageBase.save_attachments to MessageBase.saveAttachments.
  - *.Exists to *.exists.
  - *.ExistsTypedProperty to *.existsTypedProperty.
  - prop.create_prop to prop.createProp.
  - Properties.attachment_count to Properties.attachmentCount.
  - Properties.next_attachment_id to Properties.nextAttachmentId.
  - Properties.next_recipient_id to Properties.nextRecipientId.
  - Properties.recipient_count to Properties.recipientCount.
  - utils.get_command_args to utils.getCommandArgs.
  - utils.get_full_class_name to utils.getFullClassName.
  - utils.get_input to utils.getInput.
  - utils.has_len to utils.hasLen.
  - utils.setup_logging to utils.setupLogging.
  - constants.int_to_data_type to constants.intToDataType.
  - constants.int_to_intelligence to constants.intToIntelligence.
  - constants.int_to_recipient_type to constants.intToRecipientType.
  - Misc internal function variables.
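The zip keyword behavior in this release (accept either a path or an already open archive, with customPath interpreted inside the archive) can be sketched with the standard zipfile module. The helper save_text and its parameter names are hypothetical, and the isinstance check is a simplification: it does not cover duck-typed "ZipFile-like" objects.

```python
import io
import zipfile
from typing import Union


def save_text(name: str, text: str,
              zip_target: Union[str, zipfile.ZipFile],
              custom_path: str = "") -> str:
    """Hypothetical helper: write text into a zip given a path or a ZipFile."""
    # Only close the archive afterwards if this function opened it.
    owns_zip = not isinstance(zip_target, zipfile.ZipFile)
    archive = zipfile.ZipFile(zip_target, "a") if owns_zip else zip_target
    try:
        # custom_path is a location inside the archive, not on disk.
        arcname = f"{custom_path.strip('/')}/{name}" if custom_path else name
        archive.writestr(arcname, text)
    finally:
        if owns_zip:
            archive.close()
    return arcname


# Usage: several saves can share one open archive.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as shared:
    save_text("message.txt", "Hello", shared, custom_path="mail")
    save_text("header.txt", "Subject: Hi", shared, custom_path="mail")
```

Leaving a caller-supplied archive open is the design point: the caller controls when it is finalized, so multiple messages can be written to the same file.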
v0.28.7
- Added hex versions of the MULTIPLE_X_BYTES constants.
- Added 1048 to constants.MULTIPLE_16_BYTES.
- [TeamMsgExtractor #173] Rewrote the parsing of the length in VariableLengthProp.__init__ to use constants rather than values coded directly into the function. This should fix this issue.
v0.28.6
- [TeamMsgExtractor #191] This feature was never properly implemented, so it's not officially supported. However, this specific issue should be fixed. This is a temporary patch until I can get around to rewriting the way the module saves files in general.
- Added venv to the .gitignore list.
- Added information to the readme.
v0.28.5
- [TeamMsgExtractor #189] Forgot to import prepareFilename in attachment.py.
- Fixed bad link in the changelog.
v0.28.4
- [TeamMsgExtractor #184] Added code to Message to ensure subjects with null characters get stripped of them.
- Moved code for stripping subjects of bad characters to prepareFilename in utils.
v0.28.3
- Fixed minor typo in an exception description.
- Updated the README this time. Forgot to do it for at least 1 update.
v0.28.2
- Started preparing more of the code for when HTML and RTF saving are fully implemented. Please note that they do not work at all right now. Commented out the code for this because it wasn't meant to be uncommented.
- [TeamMsgExtractor #184] Added code to ensure file names don't have null characters when saving an attachment.
- Minor improvement to the section of the save code that checks if you have provided incompatible options.
- [TeamMsgExtractor #185] Added the IncompatibleOptionsError. It was supposed to be added a few updates ago, but was accidentally left out.
- Modified Message.save to return the current Message instance to allow for chained commands. This allows you to do something like extract_msg.openMsg("path/to/message.msg").save().close() where you could not before.
v0.28.1
- [TeamMsgExtractor #181] Fixed issue in Attachment that arose when moving some of the code to a base class.
- Fixed small error in utils.parse_type that caused it to incorrectly compare expected and actual length. Fortunately, this had no actual effect aside from a warning.
- Added the ebcdic module to the requirements to add more supported encodings.
v0.28.0
- [TeamMsgExtractor #87] Added a new system to handle NotImplementedError and other exceptions. All msg classes now have an option called attachmentErrorBehavior that tells the class what to do if it has an error. The value should be one of three constants: ATTACHMENT_ERROR_THROW, ATTACHMENT_ERROR_NOT_IMPLEMENTED, or ATTACHMENT_ERROR_BROKEN. ATTACHMENT_ERROR_THROW tells the class to not catch any exceptions and just let the user handle them. ATTACHMENT_ERROR_NOT_IMPLEMENTED tells the class to catch NotImplementedError exceptions and put an instance of UnsupportedAttachment in place of a regular attachment. ATTACHMENT_ERROR_BROKEN tells the class to catch all exceptions and replace the attachment with UnsupportedAttachment if it is a NotImplementedError or with BrokenAttachment for all other exceptions. With both of those options, caught exceptions will be logged.
- In making the previous point work, much code from Attachment has been moved to a new class called AttachmentBase. Both BrokenAttachment and UnsupportedAttachment are subclasses of AttachmentBase, meaning data can be extracted from their streams in the same way as a functioning attachment.
- [TeamMsgExtractor #162] Pretty sure I actually got it this time. The execution flag should be applied by pip now.
- Fixed typos in some exceptions.
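The three error behaviors described in this release can be summarized with a small self-contained sketch. Everything named here (ErrorBehavior, the two placeholder classes, load_attachment, and the factory callable) is hypothetical and only mirrors the rules stated above; it is not the module's implementation.

```python
import enum
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


class ErrorBehavior(enum.Enum):
    """Hypothetical equivalents of the three ATTACHMENT_ERROR_* constants."""
    THROW = 0
    NOT_IMPLEMENTED = 1
    BROKEN = 2


class UnsupportedPlaceholder:
    """Stands in for an attachment whose type is not implemented."""


class BrokenPlaceholder:
    """Stands in for an attachment that failed for any other reason."""


def load_attachment(factory: Callable[[], Any], behavior: ErrorBehavior) -> Any:
    """Build an attachment, substituting a placeholder according to behavior."""
    try:
        return factory()
    except NotImplementedError:
        # Both non-THROW behaviors turn this into an "unsupported" placeholder.
        if behavior is ErrorBehavior.THROW:
            raise
        logger.exception("Attachment type not implemented.")
        return UnsupportedPlaceholder()
    except Exception:
        # Only BROKEN swallows every other exception.
        if behavior is not ErrorBehavior.BROKEN:
            raise
        logger.exception("Attachment could not be processed.")
        return BrokenPlaceholder()
```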
v0.27.16
- [TeamMsgExtractor #177] Fixed incorrect struct being used. It should be the correct one now, but further testing will be required to confirm this.
- Fixed log error message in extract_msg.prop to actually format a value into the message.
v0.27.15
- [TeamMsgExtractor #177] Fixed missing import.
v0.27.14
- [TeamMsgExtractor #173] Fixed typo that I made in the last version that broke things. I didn't have the resources to test this one myself, unfortunately.
- Fixed a typo in an exception message.
v0.27.13
- [TeamMsgExtractor #173] Moved some data used in checks into constants so that I can make sure they get changed everywhere that they are used. Hopefully I can close this issue.
v0.27.12
- [TeamMsgExtractor #173] Made an assumption about where an exception was thrown from and was wrong. While that location would have thrown an exception, the function that called that code was the one to actually throw the exception in question. This issue should be fixed...
- [TeamMsgExtractor #162] Made another attempt to fix the execution flag on the wrapper script.
v0.27.11
- [TeamMsgExtractor #173] Tentatively implemented type 0x1014 (PtypMultipleInteger64). Apparently I forgot to do it earlier.
v0.27.10
- [TeamMsgExtractor #162] Fixed line endings in the wrapper script to be UNIX line endings rather than Windows line endings. Attempted to add the execution flag to the runnable script.
v0.27.9
- [TeamMsgExtractor #161] Added commands to the command line that will allow the user to specify that they want the message data to be output to stdout rather than to a file.
- [TeamMsgExtractor #162] Added a wrapper for extract_msg that will be installed.
- Fixed some of the encoding names to allow them to actually be used in Python. The names they previously held were not aliases that currently exist.
- Added more documentation to constants.CODE_PAGES to give more information about what it is. As it is a list of the possible encodings an msg file can use, I also specified which ones were supported by Python 3.
- Moved the main code into a function so it is now callable from outside of the file.
v0.27.8
- [TeamMsgExtractor #158] Fixed a spelling error in a function name that was causing it to not be seen. The function was called ceilDiv but was accidentally called as cielDiv.
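For readers unfamiliar with the helper's purpose, ceiling division rounds an integer quotient up instead of down. The one-liner below is a common way to write it in Python; the name ceil_div is hypothetical and the body is not claimed to match the module's ceilDiv.

```python
def ceil_div(numerator: int, denominator: int) -> int:
    """Integer division that rounds up (toward positive infinity)."""
    # Floor division of the negated numerator rounds down, so negating
    # the result again yields the ceiling of the original quotient.
    return -(-numerator // denominator)
```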
v0.27.7
- Fixed an issue in the new bitwise adjustment functions. One of the variable names was incorrect.
v0.27.6
- Fixed a few lines in data.py.
v0.27.5
- Fixed an error in utils.divide that would cause it to drop the extra data if there was not enough to create a full division. For example, if you had a string that was 10 characters, and divided by 3, you would only receive a total of 9 characters back.
- Added some useful functions that will be used in the future.
- [TeamMsgExtractor #155] Updated to use new version of tzlocal.
- Updated changelog to fit new repository.
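The utils.divide fix above concerns keeping the leftover data when the input does not split evenly. A minimal sketch of chunking that keeps the remainder, assuming "divided by 3" means pieces of length 3 (the function below is a hypothetical stand-in, not the module's code):

```python
from typing import List, Sequence


def divide(data: Sequence, length: int) -> List[Sequence]:
    """Split data into pieces of the given length, keeping any short tail."""
    # Stepping by `length` and slicing past the end is safe in Python,
    # so the final piece simply holds whatever is left over.
    return [data[start:start + length] for start in range(0, len(data), length)]
```

For a 10-character string and a length of 3, the last piece holds the single leftover character, so joining the pieces reproduces the input.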
v0.27.4
- [TeamMsgExtractor #152] Fixed an issue where the name of an exception was put as the wrong thing.
v0.27.3
- [TeamMsgExtractor #105] Added code to fix an internal msg issue that had recipient lists being split up in the header after a certain amount of characters. This was not a bug on our part, but an issue with the generation of the msg file itself.
- Exposed the MessageBase class directly from extract_msg. I forgot to do this when I created it.
- Added MessageBase.bcc. I think it used to exist but got erased somehow on accident. Either way, it exists now.
v0.27.2
- After much debate, I have finally decided to allow an option to override the string encoding in message files. This was something I reserved solely for dev_classes.Message because it felt like it didn't fit with how msg files were supposed to work. I also didn't want messages from people about them running into errors after they overrode the encoding. You can now do this by providing the overrideEncoding option on any MSGFile class as well as the openMsg function.
- [TeamMsgExtractor #103] Implemented correct detection of encodings. If you have any more issues with "'X' codec can't decode bytes" it is likely because the encoding specified inside the msg file is wrong.
v0.27.1
- [TeamMsgExtractor #147] Fixed an issue in Message.save caused by it attempting to directly access a private variable that was moved to the base class MessageBase.
v0.27.0
- [TeamMsgExtractor #143] Added new class Appointment that can handle Outlook appointments or meetings.
- [TeamMsgExtractor #143] Added new class MessageBase for classes that end up being mostly like the Message class. Currently subclassed by Message and Appointment. This should not have a direct effect on any code that uses this module.
- Added support for PtypFloatingTime in utils.parseType.
- Added proper support for PtypTime in FixedLengthProp.parseType.
- Added pretty print functions to the Properties class and the Named class.
- Added new function MSGFile._ensureSetProperty that acts like MSGFile._ensureSet except that it works with properties from the Properties instance.
- Added equivalent of the previously mentioned function to the Attachment class and the Recipient class.
v0.26.4
- Added new function MSGFile._ensureSetNamed which acts like MSGFile._ensureSet except that it works with named properties.
- Added a version of the previous function to the Attachment and Recipient classes.
- Added new file data.py that contains various data structures that are used for specific properties.
- Expanded the functionality of the Recipient class by adding more properties.
- Added new functions Named.getNamed and Named.getNamedValue which retrieve a named property or the value of a named property, respectively, based on its name.
v0.26.3
- Added new function MSGFile.save that causes it and subclasses to raise a NotImplementedError if they do not override it.
- Fixed some issues in the changelog.
- Added some additional constants for future use.
v0.26.2
- Fixed error in Message._registerNamedProperty where I put the exception KeyError instead of AttributeError.
v0.26.1
- Fixed an issue in openMsg that would leave the basic MSGFile instance open after the function returned, even if the function was returning a specific msg instance.
v0.26.0
- [TheElementalOfDestruction #3] Implementation of Named properties has finally been added. This allows us to access certain data that was not available to us before through regular methods.
- Added new function MSGFile.slistDir that acts like MSGFile.listDir, except that it returns a list of strings rather than a list of lists.
- [TheElementalOfDestruction #6] Added new function MSGFile._getTypedStream which, based on a path formatted in the same way as you would give to MSGFile._getStringStream, will return the data in the specified stream without the user needing to know the type beforehand. However, if you DO know the type beforehand, you can provide this function with one of the values in constants.FIXED_LENGTH_PROPS_STRING or constants.VARIABLE_LENGTH_PROPS_STRING.
- [TheElementalOfDestruction #6] Added new function MSGFile._getTypedProperty which, based on a 4 digit hexadecimal string, will return the property in the properties file that matches that string without the type needing to be specified. However, if you DO know the type beforehand, you can provide this function with one of the values in constants.FIXED_LENGTH_PROPS_STRING or constants.VARIABLE_LENGTH_PROPS_STRING.
- [TheElementalOfDestruction #6] Added new function MSGFile._getTypedData which is a combination of the two previously stated functions.
- Added new function MSGFile.ExistsTypedProperty which determines if a property with the specified id exists in the specified location. If you are looking for a property that may be in the properties file of an attachment or a recipient, please use the corresponding function from that class.
- Added an equivalent of the previous 4 functions for the Recipient and Attachment classes.
- [TheElementalOfDestruction #2] Finished partial implementation of utils.parseType which was necessary for the proper implementation of named properties. This function is not fully implemented because there are some types we do not fully understand.
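The difference between the two listing styles mentioned for this release (a list of lists versus a list of strings) can be illustrated as follows. This sketch assumes each inner list holds the components of one stream path and that "/" is the separator; both the helper name and the separator are assumptions, not details confirmed by the changelog.

```python
from typing import List


def to_string_paths(paths: List[List[str]]) -> List[str]:
    """Join each list of path components into a single string path."""
    return ["/".join(components) for components in paths]


# Example input in the list-of-lists style.
listing = [
    ["__properties_version1.0"],
    ["__attach_version1.0_#00000000", "__substg1.0_37010102"],
]
flat = to_string_paths(listing)
```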
v0.25.3
- [TeamMsgExtractor #138] Fixed missing import in extract_msg/utils.py.
v0.25.2
- [TeamMsgExtractor #134] Fixed a typo that caused Message.headerDict to raise an exception.
- Upgraded code for Message.headerDict to avoid accidentally raising a key error if the header is ever missing the "Received" property.
- Fixed an error in the changelog that caused some issue links to link to the wrong place.
v0.25.1
- [TeamMsgExtractor #132] Fixed an issue caused by unfinished code being left in the __main__ file.
- Cleaned up the imports to only be what is needed.
v0.25.0
- Added new class MSGFile. The Message class now inherits from this. This class is the base for all MSG files, not just Messages. It somewhat recently came to our attention that MSG files are used for a variety of things, including the storage of contacts, leading us to the next part of the changelog.
- [TeamMsgExtractor #110] Added new class Contact for extracting the data from MSG files storing contacts.
- Added new function openMsg to the module to be used to open MSG files in which it is not certain what type of MSG is being opened.
- Modified the Attachment class to use the openMsg function to open embedded MSG files.
- Added option delayAttachments to the Message class that will stop it from initializing attachments until the user is ready. This allows users to open Messages that have unimplemented attachment types without having to worry about the exception stopping them. This is also an option in the new openMsg function.
v0.24.4
- Added new property Message.isRead to show whether the email has been marked as read.
- Renamed Message.header_dict to Message.headerDict to better match naming conventions.
- Renamed Message.message_id to Message.messageId to better match naming conventions.
v0.24.3
- Added new close function to the Message class to ensure that all embedded Message instances get closed as well. Not having this was causing issues with trying to modify the msg file after the user thought that it had been closed.
v0.24.2
- Fixed bug that somehow escaped detection that caused certain properties to not work.
- Fixed bug with embedded msg files introduced in v0.24.0
v0.24.0
- [TeamMsgExtractor #107] Rewrote the Message.save function to fix many errors arising from it and to extend its functionality.
- Added new function isEmptyString to check if a string passed to it is None or is empty.
v0.23.4
- [TeamMsgExtractor #112] Changed method used to get the message from an exception to make it compatible with Python 2 and 3.
- [TheElementalOfDestruction #23] General cleanup and all around improvements of the code.
v0.23.3
- Fixed issues in readme.
- [TheElementalOfDestruction #22] Updated dev_classes.Message to better match the current Message class.
- Fixed bad links in changelog.
- [TeamMsgExtractor #95] Added fallback encoding as well as manual encoding change to dev_classes.Message.
v0.23.1
- Fixed issue with embedded msg files caused by the changes in v0.23.0.
v0.23.0
- [TeamMsgExtractor #75] & [TheElementalOfDestruction #19] Completely rewrote the function Message._getStringStream. This was done for two reasons. The first was to make it actually work with msg files that have their strings encoded in a non-Unicode encoding. The second reason was to make it so that it better reflected the msg specification, which says that ALL strings in a file will be either Unicode or non-Unicode, but not both. Because of the second part, the prefer option has been removed.
- As part of fixing the two issues in the previous change, we have added two new properties:
  - A boolean Message.areStringsUnicode which tells if the strings are Unicode encoded.
  - A string Message.stringEncoding which tells what the encoding is. This is used by Message._getStringStream to determine how to decode the data into a string.
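The all-Unicode-or-all-non-Unicode rule mentioned above can be sketched like this. In the MSG format, string property streams end in 001F when they hold UTF-16LE text and in 001E when they hold 8-bit text in some code page. The two helper functions and the cp1252 default are hypothetical illustrations, not the module's implementation.

```python
def string_stream_name(property_id: str, strings_are_unicode: bool) -> str:
    """Build the stream name for a string property with a 4 hex digit ID."""
    # 001F marks a Unicode (UTF-16LE) string, 001E an 8-bit string.
    type_suffix = "001F" if strings_are_unicode else "001E"
    return f"__substg1.0_{property_id}{type_suffix}"


def decode_string_stream(data: bytes, strings_are_unicode: bool,
                         encoding: str = "cp1252") -> str:
    """Decode stream bytes using one file-wide choice of encoding."""
    # The cp1252 default is an assumption for this sketch only.
    codec = "utf-16-le" if strings_are_unicode else encoding
    return data.decode(codec)
```

Because the choice is made once per file, a single boolean plus one encoding name is enough to decode every string stream in it.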
v0.22.1
- [TeamMsgExtractor #69] Fixed date format not being up to standard.
- Fixed a minor spelling error in the code.
v0.22.0
- [TheElementalOfDestruction #18] Added `--validate` option.
- [TheElementalOfDestruction #16] Moved all dev code into its own scripts. Use `--dev` to use from the command line.
- [TeamMsgExtractor #67] Added compatibility module to enforce Unicode `os` functions.
- Added new function to `Message` class: `Message.sExists`. This function checks if a string stream exists. Its input should be formatted identically to that of `Message._getStringStream`.
- Added new function to `Message` class: `Message.fix_path`. This function will add the proper prefix to the path (if the `prefix` parameter is true) and adjust the path to be a string rather than a list or tuple.
- Added new function to `utils.py`: `get_full_class_name`. This function returns a string containing the module name and the class name of any instance of any class. It is returned in the format of `{module}.{class}`.
- Added a sort of alias of `Message._getStream`, `Message._getStringStream`, `Message.Exists`, and `Message.sExists` to `Attachment` and `Recipient`. These functions run inside the associated attachment directory or recipient directory, respectively.
- Added a fix to an issue introduced in an earlier version caused by accidentally deleting a letter in the code.
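
The `{module}.{class}` format returned by `get_full_class_name` can be illustrated with a short sketch. The name `full_class_name_sketch` is hypothetical, and the body is an assumption about how such a helper could be written, not the code in `utils.py`.

```python
from fractions import Fraction


def full_class_name_sketch(instance):
    """Return '{module}.{class}' for any instance (illustrative only)."""
    cls = instance.__class__
    return '{}.{}'.format(cls.__module__, cls.__name__)


# Fraction is defined in the standard library's fractions module.
example = full_class_name_sketch(Fraction(1, 2))  # 'fractions.Fraction'
```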
v0.21.0
- [TheElementalOfDestruction #12] Changed debug code to use logging module.
- [TheElementalOfDestruction #17] Fixed Attachment class using wrong properties file location in embedded msg files.
- [TheElementalOfDestruction #11] Improved handling of command line arguments using argparse module.
- [TheElementalOfDestruction #16] Started work on moving developer code into its own script.
- [TeamMsgExtractor #63] Fixed JSON saving not applying to embedded msg files.
- [TeamMsgExtractor #55] Added fix for recipient sometimes missing email address.
- [TeamMsgExtractor #65] Added fix for special characters in recipient names.
- Module now raises a custom exception (instead of just `IOError`) if the input is not a valid OLE file.
- Added `header_dict` property to the `Message` class.
- General minor bug fixes.
- Fixed a section in the `Recipient` class that I have no idea why I did it that way. If errors start randomly occurring with it, this fix is why.
v0.20.8
- Fixed a tab issue and parameter type in `message.py`.
v0.20.7
- Separated classes into their own files to make things more manageable.
- Placed `__doc__` back inside of `__init__.py`.
- Rewrote the `Prop` class to be two different classes that extend from a base class.
- Made decent progress on completing the `parse_type` function of the `FixedLengthProp` class (formerly a function of the `Prop` class).
- Improved exception handling code throughout most of the module.
- Updated the `.gitignore`.
- Updated README.
- Added `# DEBUG` comments before debugging lines to make them easier to find in the future.
- Added function `create_prop` in `prop.py` which should be used for creating what used to be an instance of the `Prop` class.
- Added more constants to reflect some of the changes made.
- Fixed a major bug that was causing the header to generate after things like "to" and "cc" which would force those fields to not use the header.
- Fixed the debug variable.
- Fixed many small bugs in many of the classes.
- [TheElementalOfDestruction #13] Various loose ends to enhance the workflow in the repo.