Commit c344d0f

maxmnemonic and Maksym Lysak authored
fix: Escaping underscore characters in md export (#57)
* Escaping underscore characters in md export

Signed-off-by: Maksym Lysak <[email protected]>

* Run pre-commits

Signed-off-by: Maksym Lysak <[email protected]>

---------

Signed-off-by: Maksym Lysak <[email protected]>
Co-authored-by: Maksym Lysak <[email protected]>
1 parent b9b3c60 commit c344d0f

1 file changed, +12 -0 lines changed

docling_core/types/doc/document.py

Lines changed: 12 additions & 0 deletions
@@ -1291,6 +1291,18 @@ def export_to_markdown(  # noqa: C901
         mdtext = re.sub(
             r"\n\n\n+", "\n\n", mdtext
         )  # remove cases of double or more empty lines.
+
+        # Our export markdown doesn't contain any emphasis styling:
+        # Bold, Italic, or Bold-Italic
+        # Hence, any underscore that we print into Markdown is coming from document text
+        # That means we need to escape it, to properly reflect content in the markdown
+        def escape_underscores(text):
+            # Replace "_" with "\_" only if it's not already escaped
+            escaped_text = re.sub(r"(?<!\\)_", r"\_", text)
+            return escaped_text
+
+        mdtext = escape_underscores(mdtext)
+
         return mdtext
 
     def export_to_text(  # noqa: C901
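
The behavior of the added helper can be checked in isolation. The sketch below copies the regex from the diff into a standalone function; the sample strings are made up for illustration and do not come from the commit or its tests.

```python
import re


def escape_underscores(text: str) -> str:
    # Same pattern as in the diff: match "_" unless the character
    # immediately before it is a backslash (negative lookbehind).
    return re.sub(r"(?<!\\)_", r"\_", text)


# Illustrative inputs (assumptions, not taken from the repository).
print(escape_underscores("snake_case_name"))    # underscores gain a backslash
print(escape_underscores(r"already\_escaped"))  # left unchanged
# Applying the function twice gives the same result as applying it once.
once = escape_underscores("a_b")
print(escape_underscores(once) == once)
```

Because the lookbehind inspects only the single preceding character, an underscore that follows an escaped backslash (the text `\\_`) is also skipped, even though Markdown would treat that underscore as unescaped. The helper is also applied to the whole exported string, so underscores inside code spans or URLs would be escaped as well; whether that matters depends on what the exporter emits.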

0 commit comments