Commit af3c89a

feat: In FileTypeRouter add .msg to "application/vnd.ms-outlook" mapping (#8910)
* Add .msg mimetype support in file type router
* Add reno
* Update tests
1 parent 99a998f commit af3c89a


3 files changed: +15 -4 lines changed


haystack/components/routers/file_type_router.py

Lines changed: 8 additions & 3 deletions
@@ -15,9 +15,14 @@
 logger = logging.getLogger(__name__)
 
 
-# we add markdown because it is not added by the mimetypes module
-# see https://github.com/python/cpython/pull/17995
-CUSTOM_MIMETYPES = {".md": "text/markdown", ".markdown": "text/markdown"}
+CUSTOM_MIMETYPES = {
+    # we add markdown because it is not added by the mimetypes module
+    # see https://github.com/python/cpython/pull/17995
+    ".md": "text/markdown",
+    ".markdown": "text/markdown",
+    # we add msg because it is not added by the mimetypes module
+    ".msg": "application/vnd.ms-outlook",
+}
 
 
 @component
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+---
+enhancements:
+  - |
+    In the FileTypeRouter add explicit support for classifying .msg files with mimetype "application/vnd.ms-outlook" since the mimetypes module returns None for .msg files by default.
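
The release note describes an extension-to-MIME-type override for cases the standard `mimetypes` module does not cover. A minimal sketch of that idea follows. The helper name `guess_mime_type` is hypothetical and is not taken from the Haystack source; the mapping is copied from the diff above, and the lookup order (custom mapping first, then `mimetypes.guess_type`) is an assumption made for illustration.

```python
import mimetypes
from pathlib import Path
from typing import Optional

# Copy of the mapping shown in the diff above, for illustration only.
CUSTOM_MIMETYPES = {
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".msg": "application/vnd.ms-outlook",
}


def guess_mime_type(path: Path) -> Optional[str]:
    """Hypothetical helper: consult the custom mapping, then the stdlib."""
    # Lower-case the suffix so "README.MD" and "readme.md" behave the same.
    extension = path.suffix.lower()
    if extension in CUSTOM_MIMETYPES:
        return CUSTOM_MIMETYPES[extension]
    # guess_type returns a (type, encoding) tuple; only the type is needed.
    return mimetypes.guess_type(path.as_posix())[0]
```

With this sketch, `guess_mime_type(Path("msg/sample.msg"))` resolves through the custom mapping regardless of what the platform's MIME database knows about `.msg`, which is the behavior the new test asserts for the router.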

test/components/routers/test_file_router.py

Lines changed: 3 additions & 1 deletion
@@ -87,14 +87,16 @@ def test_run(self, test_files_path):
             test_files_path / "txt" / "doc_2.txt",
             test_files_path / "audio" / "the context for this answer is here.wav",
             test_files_path / "images" / "apple.jpg",
+            test_files_path / "msg" / "sample.msg",
         ]
 
-        router = FileTypeRouter(mime_types=[r"text/plain", r"audio/x-wav", r"image/jpeg"])
+        router = FileTypeRouter(mime_types=[r"text/plain", r"audio/x-wav", r"image/jpeg", "application/vnd.ms-outlook"])
         output = router.run(sources=file_paths)
         assert output
         assert len(output[r"text/plain"]) == 2
         assert len(output[r"audio/x-wav"]) == 1
         assert len(output[r"image/jpeg"]) == 1
+        assert len(output["application/vnd.ms-outlook"]) == 1
         assert not output.get("unclassified")
 
     def test_run_with_single_meta(self, test_files_path):
