Commit b71d9b2

Sven Siegmund committed
ready
1 parent 1764517 commit b71d9b2

12 files changed, +55 -75 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+# Changelog
+
+<!-- towncrier release notes start -->

LICENSE

Lines changed: 0 additions & 21 deletions
This file was deleted.

README.md

Lines changed: 28 additions & 39 deletions
@@ -2,13 +2,7 @@
 
 RTF Parser. So far it can only de-encapsulate HTML content from an RTF, but it properly parses the RTF structure and allows you to write your own custom RTF renderers. The HTML de-encapsulator provided with `rtfparse` is just one such custom renderer which liberates the HTML content from its RTF encapsulation and saves it in a given html file.
 
-# Dependencies
-
-```
-argcomplete
-extract-msg
-compressed_rtf
-```
+rtfparse can also decompress RTF from MS Outlook `.msg` files and parse it.
 
 # Installation
 
@@ -18,65 +12,60 @@ Install rtfparse from your local repository with pip:
 
 Installation creates an executable file `rtfparse` in your python scripts folder which should be in your `$PATH`.
 
-# First Run
+# Usage From Command Line
 
-When you run `rtfparse` for the first time it will start a configuration wizard which will guide you through the process of creating a default configuration file and specifying the location of its folders. (These folders serve as locations for saving extracted rtf or html files.)
+Use the `rtfparse` executable from the command line. Read `rtfparse --help`.
 
-In the configuration wizard you can press `A` for care-free automatic configuration, which would look something like this:
+rtfparse writes its logs to these files in `~/rtfparse/`:
 
 ```
-$ rtfparse
-Config file missing, creating new default config file
+rtfparse.debug.log
+rtfparse.info.log
+rtfparse.errors.log
+```
 
-____ ____ __ _ ____ _ ____ _ _ ____ ____ ___ _ ____ __ _
-|___ [__] | \| |--- | |__, |__| |--< |--| | | [__] | \|
-_ _ _ ___ ____ ____ ___
-|/\| | /__ |--| |--< |__>
+## Example: De-encapsulate HTML from an uncompressed RTF file
 
+rtfparse --rtf-file "path/to/rtf_file.rtf" --de-encapsulate-html --output-file "path/to/extracted.html"
 
-◊ email_rtf (C:\Users\nagidal\rtfparse\email_rtf) does not exist!
+## Example: De-encapsulate HTML from MS Outlook email file
 
-(A) Automatically configure this and all remaining rtfparse settings
-(C) Create this path automatically
-(M) Manually input correct path to use or to create
-(Q) Quit and edit `email_rtf` in rtfparse_configuration.ini
+rtfparse internally uses [extract_msg](https://github.com/TeamMsgExtractor/msg-extractor) and [compressed_rtf](https://github.com/delimitry/compressed_rtf) to read `.msg` files:
 
-Created directory C:\Users\nagidal\rtfparse
-Created directory C:\Users\nagidal\rtfparse\email_rtf
-Created directory C:\Users\nagidal\rtfparse\html
-```
+rtfparse --msg-file "path/to/email.msg" --de-encapsulate-html --output-file "path/to/extracted.html"
 
-`rtfparse` also creates the folder `.rtfparse` (beginning with a dot) in your home directory where it saves its default configuration and its log files.
+## Example: Only decompress the RTF from MS Outlook email file
 
-# Usage From Command Line
+rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf"
 
-Use the `rtfparse` executable from the command line. For example if you want to de-encapsulate the HTML from an RTF file, do it like this:
+## Example: De-encapsulate HTML from MS Outlook email file and save (and later embed) the attachments
 
-rtfparse -f "path/to/rtf_file.rtf" -d
+When extracting the RTF from the `.msg` file, you can save the attachments (which include images embedded in the email text) in a directory:
 
-Or you can de-encapsulate the HTML from an MS Outlook message, thanks to [extract_msg](https://github.com/TeamMsgExtractor/msg-extractor) and [compressed_rtf](https://github.com/delimitry/compressed_rtf):
+rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir"
 
-rtfparse -m "path/to/email.msg" -d
+In `rtfparse` version 1.x you will be able to embed these images in the de-encapsulated HTML. This functionality will be provided by the package [embedimg](https://github.com/fleetingbytes/embedimg).
 
-The resulting html file will be saved to the `html` folder you set in the `rtfparse_configuration.ini`. Command reference is in `rtfparse --help`.
+rtfparse --msg-file "path/to/email.msg" --output-file "path/to/extracted.rtf" --attachments-dir "path/to/dir" --embed-img
 
-# Usage in python module
+In the current version the option `--embed-img` does nothing.
+
+# Programmatic usage in python module
 
 ```
-import pathlib
+from pathlib import Path
 from rtfparse.parser import Rtf_Parser
-from rtfparse.renderers import de_encapsulate_html
-
+from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML
 
-source_path = pathlib.Path(r"path/to/your/rtf/document.rtf")
-target_path = pathlib.Path(r"path/to/your/html/de_encapsulated.html")
+source_path = Path(r"path/to/your/rtf/document.rtf")
+target_path = Path(r"path/to/your/html/de_encapsulated.html")
 
 
 parser = Rtf_Parser(rtf_path=source_path)
 parsed = parser.parse_file()
 
+renderer = De_encapsulate_HTML()
 
-renderer = de_encapsulate_html.De_encapsulate_HTML()
 with open(target_path, mode="w", encoding="utf-8") as html_file:
     renderer.render(parsed, html_file)
 ```
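
The command-line examples added to the README in this commit can also be assembled from Python before being handed to `subprocess.run`. The following is only a sketch: `build_rtfparse_command` is a hypothetical helper, not part of rtfparse, and its rule that exactly one input file must be given is an assumption. It uses only the long options that appear in the README diff above.

```python
from pathlib import Path
from typing import List, Optional


def build_rtfparse_command(
    output_file: Path,
    rtf_file: Optional[Path] = None,
    msg_file: Optional[Path] = None,
    de_encapsulate_html: bool = False,
    attachments_dir: Optional[Path] = None,
) -> List[str]:
    """Assemble an rtfparse argument list from the options shown in the README.

    Hypothetical helper for illustration; it is not shipped with rtfparse.
    """
    # Assumption: exactly one of the two input options is expected.
    if (rtf_file is None) == (msg_file is None):
        raise ValueError("pass exactly one of rtf_file or msg_file")

    command = ["rtfparse"]
    if rtf_file is not None:
        command += ["--rtf-file", str(rtf_file)]
    else:
        command += ["--msg-file", str(msg_file)]
    if de_encapsulate_html:
        command.append("--de-encapsulate-html")
    if attachments_dir is not None:
        command += ["--attachments-dir", str(attachments_dir)]
    command += ["--output-file", str(output_file)]
    return command
```

Passing the returned list to `subprocess.run(command, check=True)` would run the tool without any shell quoting concerns, provided the `rtfparse` executable is on `$PATH`.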

changelog.d/1.fixed.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Using `pyproject.toml` for installation with current pip versions

changelog.d/3.unimportant.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Fixed reference before assignment error

changelog.d/5.unimportant.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Removed convoluted configurator

src/rtfparse/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 #!/usr/bin/env python
 
 
-__version__ = "0.8.0-rc1"
+__version__ = "0.8.0"

src/rtfparse/cli.py

Lines changed: 2 additions & 1 deletion
@@ -25,7 +25,7 @@ def setup_logger(directory: Path) -> logging.Logger:
     try:
         provide_dir(directory)
         logger_config = logging_conf.create_dict_config(
-            directory, "debug.log", "info.log", "errors.log"
+            directory, "rtfparse.debug.log", "rtfparse.info.log", "rtfparse.errors.log"
         )
     except FileExistsError:
         logger.error(
@@ -99,6 +99,7 @@ def run(cli_args: Namespace) -> None:
     elif cli_args.msg_file:
         msg = em.openMsg(f"{cli_args.msg_file}")
         if cli_args.attachments_dir:
+            provide_dir(cli_args.attachments_dir)
             for attachment in msg.attachments:
                 with open(
                     cli_args.attachments_dir / f"{attachment.longFilename}", mode="wb"
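
The second hunk makes sure the attachments directory exists before any attachment is written into it. The body of `provide_dir` is not part of this diff, so the sketch below uses a hypothetical stand-in built on `Path.mkdir`, and it models attachments as plain `(file name, bytes)` pairs instead of `extract_msg` objects. It shows the ordering the commit introduces: create the directory first, then open the target files.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def provide_dir(directory: Path) -> Path:
    """Hypothetical stand-in for the helper called in cli.py (real body not shown in the diff)."""
    # Creates missing parent directories as well. Path.mkdir raises
    # FileExistsError if the path already exists but is not a directory.
    directory.mkdir(parents=True, exist_ok=True)
    return directory


def save_attachments(
    attachments: Iterable[Tuple[str, bytes]], directory: Path
) -> List[Path]:
    """Write (file name, payload) pairs into `directory`, creating it first."""
    provide_dir(directory)
    written = []
    for name, data in attachments:
        target = directory / name
        with open(target, mode="wb") as attachment_file:
            attachment_file.write(data)
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        saved = save_attachments(
            [("logo.png", b"\x89PNG")], Path(tmp) / "mail" / "attachments"
        )
        print(saved[0].name)
```

Without the directory-creation step, `open(..., mode="wb")` would raise `FileNotFoundError` whenever the target directory does not exist yet, which is presumably the failure the added line guards against.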

src/rtfparse/entities.py

Lines changed: 1 addition & 2 deletions
@@ -6,7 +6,7 @@
 import re
 
 # Own modules
-from rtfparse import errors, re_patterns, utils
+from rtfparse import re_patterns, utils
 from rtfparse.enums import Bytestring_Type
 
 # Setup logging
@@ -56,7 +56,6 @@ def probe(cls, pattern: re_patterns.Bytes_Regex, file: io.BufferedReader) -> Byt
                     logger.debug(f"Reached unexpected end of file.")
                     result = Bytestring_Type.GROUP_END
                     break
-                    # raise errors.UnexpectedEndOfFileError(f"at position {file.tell()}")
                 continue
             break
         logger.debug(f"Probe {result = }")

src/rtfparse/minimal.py

Lines changed: 9 additions & 5 deletions
@@ -1,18 +1,22 @@
 #!/usr/bin/env python
 
 
-import pathlib
+"""
+A minimal example of programmatic use of the RTF parser and renderer
+"""
 
+from pathlib import Path
 from rtfparse.parser import Rtf_Parser
-from rtfparse.renderers import de_encapsulate_html
+from rtfparse.renderers.de_encapsulate_html import De_encapsulate_HTML
 
-source_path = pathlib.Path(r"path/to/your/rtf/document.rtf")
-target_path = pathlib.Path(r"path/to/your/html/de_encapsulated.html")
+source_path = Path(r"path/to/your/rtf/document.rtf")
+target_path = Path(r"path/to/your/html/de_encapsulated.html")
 
 
 parser = Rtf_Parser(rtf_path=source_path)
 parsed = parser.parse_file()
 
-renderer = de_encapsulate_html.De_encapsulate_HTML()
+renderer = De_encapsulate_HTML()
+
 with open(target_path, mode="w", encoding="utf-8") as html_file:
     renderer.render(parsed, html_file)
