Commit 89b35b8

Merge pull request #21 from fleetingbytes/develop
Develop closes #18
2 parents 2369527 + bdefa2f commit 89b35b8

15 files changed

+369
-364
lines changed

.gitattributes

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
* text=auto

CHANGELOG.md

Lines changed: 12 additions & 0 deletions
@@ -2,6 +2,18 @@
22

33
<!-- towncrier release notes start -->
44

5+
## 0.9.0 (2024-03-11)
6+
7+
8+
### Bugfixes
9+
10+
- Recognize control words with where the parameter's digital sequence is delimited by any character other than an ASCII digit [#18](https://github.com/fleetingbytes/rtfparse/issues/18)
11+
12+
13+
### Development Details
14+
15+
- Renamed a few things, improved readme [#17](https://github.com/fleetingbytes/rtfparse/issues/17)
16+
517
## 0.8.2 (2024-03-05)
618

719
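
The 0.9.0 bugfix entry in this hunk concerns where a control word's numeric parameter ends: the digit sequence is terminated by any character that is not an ASCII digit. The following is a minimal sketch of that rule, not rtfparse's actual implementation; the names `CONTROL_WORD` and `read_control_word` are hypothetical and exist only for illustration.

```python
import re

# Hypothetical pattern: a backslash, ASCII letters, an optional signed run of
# ASCII digits, and an optional single space that belongs to the control word.
# Any other character after the digits simply ends the parameter.
CONTROL_WORD = re.compile(
    rb"\\(?P<name>[a-zA-Z]+)(?P<param>-?[0-9]+)?(?P<space> )?"
)


def read_control_word(data: bytes, pos: int = 0):
    """Return (name, parameter, end_position), or None if no control word starts at pos."""
    match = CONTROL_WORD.match(data, pos)
    if match is None:
        return None
    param = match.group("param")
    name = match.group("name").decode("ascii")
    return name, (int(param) if param is not None else None), match.end()
```

With this sketch, `read_control_word(b"\\u8212x")` stops the parameter at `x`, and `read_control_word(b"\\li-720\\fi0")` stops it at the next backslash, so neither delimiter is consumed as part of the control word.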

LICENSE.txt

Lines changed: 19 additions & 19 deletions
@@ -1,19 +1,19 @@
1-
Copyright (c) 2023 Sven Siegmud
2-
3-
Permission is hereby granted, free of charge, to any person obtaining a copy
4-
of this software and associated documentation files (the "Software"), to deal
5-
in the Software without restriction, including without limitation the rights
6-
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7-
copies of the Software, and to permit persons to whom the Software is
8-
furnished to do so, subject to the following conditions:
9-
10-
The above copyright notice and this permission notice shall be included in all
11-
copies or substantial portions of the Software.
12-
13-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16-
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18-
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19-
SOFTWARE.
1+
Copyright (c) 2023 Sven Siegmud
2+
3+
Permission is hereby granted, free of charge, to any person obtaining a copy
4+
of this software and associated documentation files (the "Software"), to deal
5+
in the Software without restriction, including without limitation the rights
6+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7+
copies of the Software, and to permit persons to whom the Software is
8+
furnished to do so, subject to the following conditions:
9+
10+
The above copyright notice and this permission notice shall be included in all
11+
copies or substantial portions of the Software.
12+
13+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19+
SOFTWARE.
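
The removed and added lines in this hunk are textually identical, which is how a line-ending-only change appears in this view. Together with the `* text=auto` rule added in `.gitattributes`, line-ending normalization is a plausible explanation, although the page itself does not state it. The sketch below shows one way to check that assumption on two versions of a file; `differs_only_in_line_endings` is a hypothetical helper, not part of this repository.

```python
def differs_only_in_line_endings(old: bytes, new: bytes) -> bool:
    """True if the two byte strings differ, but only in CRLF/CR versus LF line endings."""

    def normalize(data: bytes) -> bytes:
        # Convert CRLF first, then any remaining bare CR, to LF.
        return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

    return old != new and normalize(old) == normalize(new)
```

To try it on real data, the two blobs could be read with `git show 2369527:LICENSE.txt` and `git show 89b35b8:LICENSE.txt` and passed in as bytes.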

README.md

Lines changed: 18 additions & 16 deletions
@@ -1,16 +1,20 @@
11
# rtfparse
22

3-
RTF Parser. So far it can only de-encapsulate HTML content from an RTF, but it properly parses the RTF structure and allows you to write your own custom RTF renderers. The HTML de-encapsulator provided with `rtfparse` is just one such custom renderer which liberates the HTML content from its RTF encapsulation and saves it in a given html file.
3+
Parses Microsofts Rich Text Format (RTF) documents. It creates an in-memory object which represents the tree structure of the RTF document. This object can in turn be rendered by using one of the renderers.
4+
So far, rtfparse provides only one renderer (`Decapsulate_HTML`) which liberates the HTML code encapsulated in RTF. This will come handy, for examle, if you ever need to extract the HTML from a HTML-formatted email message saved by Microsoft Outlook.
5+
6+
MS Outlook also tends to use RTF compression, so the CLI of rtfparse can optionally do that, too.
7+
8+
You can of course write your own renderers of parsed RTF documents and consider contributing them to this project.
49

5-
rtfparse can also decompressed RTF from MS Outlook `.msg` files and parse that.
610

711
# Installation
812

913
Install rtfparse from your local repository with pip:
1014

1115
pip install rtfparse
1216

13-
Installation creates an executable file `rtfparse` in your python scripts folder which should be in your `$PATH`.
17+
Installation creates an executable file `rtfparse` in your python scripts folder which should be in your `$PATH`.
1418

1519
# Usage From Command Line
1620

@@ -24,49 +28,48 @@ rtfparse.info.log
2428
rtfparse.errors.log
2529
```
2630

27-
## Example: De-encapsulate HTML from an uncompressed RTF file
31+
## Example: Decapsulate HTML from an uncompressed RTF file
2832

29-
rtfparse --rtf-file "path/to/rtf_file.rtf" --de-encapsulate-html --output-file "path/to/extracted.html"
33+
rtfparse --rtf-file "path/to/rtf_file.rtf" --decapsulate-html --output-file "path/to/extracted.html"
3034

31-
## Example: De-encapsulate HTML from MS Outlook email file
35+
## Example: Decapsulate HTML from MS Outlook email file
3236

33-
Thanks to [extract_msg](https://github.com/TeamMsgExtractor/msg-extractor) and [compressed_rtf](https://github.com/delimitry/compressed_rtf), rtfparse internally uses them:
37+
For this, the CLI of rtfparse uses [extract_msg](https://github.com/TeamMsgExtractor/msg-extractor) and [compressed_rtf](https://github.com/delimitry/compressed_rtf).
3438

35-
rtfparse --msg-file "path/to/email.msg" --de-encapsulate-html --output-file "path/to/extracted.html"
39+
rtfparse --msg-file "path/to/email.msg" --decapsulate-html --output-file "path/to/extracted.html"
3640

3741
## Example: Only decompress the RTF from MS Outlook email file
3842

3943
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf"
4044

41-
## Example: De-encapsulate HTML from MS Outlook email file and save (and later embed) the attachments
45+
## Example: Decapsulate HTML from MS Outlook email file and save (and later embed) the attachments
4246

4347
When extracting the RTF from the `.msg` file, you can save the attachments (which includes images embedded in the email text) in a directory:
4448

4549
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir"
4650

47-
In `rtfparse` version 1.x you will be able to embed these images in the de-encapsulated HTML. This functionality will be provided by the package [embedimg](https://github.com/fleetingbytes/embedimg).
51+
In `rtfparse` version 1.x you will be able to embed these images in the decapsulated HTML. This functionality will be provided by the package [embedimg](https://github.com/fleetingbytes/embedimg).
4852

4953
rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir" --embed-img
5054

5155
In the current version the option `--embed-img` does nothing.
5256

53-
# Programatic usage in python module
57+
# Programatic usage in a Python module
5458

5559
```
5660
from pathlib import Path
5761
from rtfparse.parser import Rtf_Parser
58-
from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML
62+
from rtfparse.renderers.html_decapsulator import HTML_Decapsulator
5963
6064
source_path = Path(r"path/to/your/rtf/document.rtf")
61-
target_path = Path(r"path/to/your/html/de_encapsulated.html")
65+
target_path = Path(r"path/to/your/html/decapsulated.html")
6266
# Create parent directory of `target_path` if it does not already exist:
6367
target_path.parent.mkdir(parents=True, exist_ok=True)
6468
65-
6669
parser = Rtf_Parser(rtf_path=source_path)
6770
parsed = parser.parse_file()
6871
69-
renderer = De_encapsulate_HTML()
72+
renderer = HTML_Decapsulator()
7073
7174
with open(target_path, mode="w", encoding="utf-8") as html_file:
7275
    renderer.render(parsed, html_file)
@@ -76,6 +79,5 @@ with open(target_path, mode="w", encoding="utf-8") as html_file:
7679

7780
If you find a working official Microsoft link to the RTF specification and add it here, you'll be remembered fondly.
7881

79-
* [Swissmains Link to RTF Spec 1.9.1](https://manuals.swissmains.com/pages/viewpage.action?pageId=1376332&preview=%2F1376332%2F10620104%2FWord2007RTFSpec9.pdf)
8082
* [Webarchive Link to RTF Spec 1.9.1](https://web.archive.org/web/20190708132914/http://www.kleinlercher.at/tools/Windows_Protocols/Word2007RTFSpec9.pdf)
8183
* [RTF Extensions, MS-OXRTFEX](https://docs.microsoft.com/en-us/openspecs/exchange_server_protocols/ms-oxrtfex/411d0d58-49f7-496c-b8c3-5859b045f6cf)
Lines changed: 15 additions & 15 deletions
@@ -1,15 +1,15 @@
1-
{% if sections[""] %}
2-
{% for category, val in definitions.items() if category in sections[""] %}
3-
4-
### {{ definitions[category]['name'] }}
5-
6-
{% for text, values in sections[""][category].items() %}
7-
- {{ text }} {{ values|join(', ') }}
8-
{% endfor %}
9-
10-
{% endfor %}
11-
{% else %}
12-
No significant changes.
13-
14-
15-
{% endif %}
1+
{% if sections[""] %}
2+
{% for category, val in definitions.items() if category in sections[""] %}
3+
4+
### {{ definitions[category]['name'] }}
5+
6+
{% for text, values in sections[""][category].items() %}
7+
- {{ text }} {{ values|join(', ') }}
8+
{% endfor %}
9+
10+
{% endfor %}
11+
{% else %}
12+
No significant changes.
13+
14+
15+
{% endif %}
