Commit afc3c2b

docs: Add example how to extract HTML from .msg programmatically
1 parent b6615a2 commit afc3c2b

2 files changed: 35 additions, 1 deletion

README.md

Lines changed: 34 additions & 1 deletion
@@ -56,7 +56,9 @@ In the current version the option `--embed-img` does nothing.
 
 # Programmatic usage in a Python module
 
-```
+## Decapsulate HTML from an uncompressed RTF file
+
+```py
 from pathlib import Path
 from rtfparse.parser import Rtf_Parser
 from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
@@ -75,6 +77,37 @@ with open(target_path, mode="w", encoding="utf-8") as html_file:
     renderer.render(parsed, html_file)
 ```
 
+## Decapsulate HTML from an MS Outlook msg file
+
+```py
+from pathlib import Path
+from extract_msg import openMsg
+from compressed_rtf import decompress
+from io import BytesIO
+from rtfparse.parser import Rtf_Parser
+from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
+
+
+source_file = Path("path/to/your/source.msg")
+target_file = Path(r"path/to/your/target.html")
+# Create parent directory of `target_file` if it does not already exist:
+target_file.parent.mkdir(parents=True, exist_ok=True)
+
+# Get a decompressed RTF bytes buffer from the MS Outlook message
+msg = openMsg(source_file)
+decompressed_rtf = decompress(msg.compressedRtf)
+rtf_buffer = BytesIO(decompressed_rtf)
+
+# Parse the rtf buffer
+parser = Rtf_Parser(rtf_file=rtf_buffer)
+parsed = parser.parse_file()
+
+# Decapsulate the HTML from the parsed RTF
+decapsulator = HTML_Decapsulator()
+with open(target_file, mode="w", encoding="utf-8") as html_file:
+    decapsulator.render(parsed, html_file)
+```
+
 # RTF Specification Links
 
 If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.
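
Review note: the `.msg` example added above passes `msg.compressedRtf` straight to `decompress` and wraps the result in `BytesIO`. The sketch below isolates that step and adds a guard for messages that carry no RTF body. It is only an illustration: `rtf_buffer_from_compressed` and `fake_decompress` are hypothetical names that are not part of rtfparse, extract_msg or compressed_rtf, and the assumption that a missing body shows up as `None` or empty bytes is not confirmed by this commit. The decompressor is passed in as a parameter so the snippet runs with the standard library alone.

```python
from io import BytesIO
from typing import Callable, Optional


def rtf_buffer_from_compressed(
    compressed: Optional[bytes],
    decompress: Callable[[bytes], bytes],
) -> BytesIO:
    """Return a seekable binary buffer holding the decompressed RTF body.

    `decompress` is injected so the helper can run without third-party
    packages; in the README example it would be `compressed_rtf.decompress`.
    """
    if not compressed:
        # Assumption: a message without an RTF body yields None or b"".
        raise ValueError("message has no compressed RTF body")
    return BytesIO(decompress(compressed))


def fake_decompress(data: bytes) -> bytes:
    """Stand-in for demonstration only: reverses the bytes.

    This is NOT the real compressed-RTF algorithm.
    """
    return data[::-1]


# The reversed payload decodes to the RTF opener "{\rtf1}".
buffer = rtf_buffer_from_compressed(b"}1ftr\\{", fake_decompress)
first_bytes = buffer.read(5)  # peek at the RTF signature
buffer.seek(0)                # rewind so a parser would start at byte 0
```

With the real packages installed, the call would presumably look like `rtf_buffer_from_compressed(msg.compressedRtf, decompress)`, and the returned buffer would be handed to `Rtf_Parser(rtf_file=...)` as in the README example.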

changelog.d/25.doc.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Add example how to programmatically extract HTML from MS Outlook message
