
Commit 66c6235

fix: improve markdown file type detection for unstructured parser
Co-Authored-By: Aaron <AJ> Steers <[email protected]>
1 parent f7ef188 commit 66c6235

1 file changed (+3 -3 lines changed)

airbyte_cdk/sources/file_based/file_types/unstructured_parser.py

Lines changed: 3 additions & 3 deletions
@@ -442,9 +442,9 @@ def _get_filetype(self, file: IOBase, remote_file: RemoteFile) -> Optional[FileT
         file.seek(0)
         if file_content and isinstance(file_content, bytes):
             content_str = file_content.decode("utf-8", errors="ignore")
-            if content_str.lstrip().startswith("#"):
-                type_based_on_content = FileType.MD
-            elif remote_file.mime_type == "text/markdown":
+            if (content_str.lstrip().startswith("#") or
+                remote_file.mime_type == "text/markdown" or
+                remote_file.uri.endswith(".md")):
                 type_based_on_content = FileType.MD
             else:
                 type_based_on_content = FileType.UNK
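
The combined condition in this hunk can be read as "any one of three signals marks the file as Markdown". Below is a minimal standalone sketch of that check, for illustration only. The function name `looks_like_markdown` and its parameters are hypothetical and not part of `airbyte_cdk`; the sketch also leaves out the surrounding `file_content and isinstance(file_content, bytes)` guard and returns a plain boolean rather than a `FileType`.

```python
from typing import Optional


def looks_like_markdown(content: bytes, mime_type: Optional[str], uri: str) -> bool:
    """Return True if any of the three signals used in the diff is present.

    Hypothetical helper for illustration; it is not the parser's actual API.
    """
    # Same decoding as in the diff: undecodable bytes are dropped, not raised.
    content_str = content.decode("utf-8", errors="ignore")
    return (
        # Signal 1: the first non-whitespace character is "#".
        content_str.lstrip().startswith("#")
        # Signal 2: the reported MIME type is text/markdown.
        or mime_type == "text/markdown"
        # Signal 3 (added by this commit): the URI ends with ".md".
        or uri.endswith(".md")
    )


if __name__ == "__main__":
    print(looks_like_markdown(b"  # Title\n", None, "notes.txt"))
    print(looks_like_markdown(b"plain text", None, "docs/readme.md"))
    print(looks_like_markdown(b"plain text", None, "notes.txt"))
```

Note that `str.endswith` is case-sensitive, so under this sketch a URI such as `README.MD` would match only through the content or MIME type signals.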
