Commit 90d6dd4
fix(docx): split multiple OMML equations into separate formula items (#3123)
* fix(msword): split multiple OMML equations into separate formula items
When a DOCX paragraph contains multiple sibling <m:oMath> elements
(e.g. separate equations on one line), the converter previously
concatenated them into a single LaTeX string because element.iter()
walks all descendants depth-first.
Fix: iterate direct children of the paragraph element first to
correctly identify sibling <m:oMath> elements, converting each
independently. Falls back to deep iteration only when oMath
elements are nested inside wrapper elements.
Also splits standalone multi-equation paragraphs into individual
FORMULA document items instead of merging them into one.
Closes #3121
Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* test(msword): add multi-equation paragraph test document
Add a minimal DOCX file containing two separate oMath elements
in one paragraph with a text separator, along with groundtruth
output files for markdown, json, and plain text export.
Requested-by: @dolfim-ibm
Signed-off-by: Giulio Leone <giulioleone10@gmail.com>
Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* test(msword): regenerate multi-equation indented-text snapshot
Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* test: replace test doc with issue #3121 attachment
Use the real Word document from the issue reporter (smroels)
instead of the minimal programmatic fixture. The new document
contains three sibling <m:oMath> elements in one paragraph,
matching the exact failing shape described in #3121.
Regenerate groundtruth to match the richer document structure.
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* test: regenerate groundtruth for omml_multi_equation_paragraph
Re-run document conversion with current code to update .itxt and .json
groundtruth files. The .itxt had stale structure from the previous
programmatic fixture; the new real-document conversion produces the
correct output with three separate formula items.
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* style(docx): rerun ruff formatter for msword backend
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
* refactor(docx): drop unused tag_name binding
Remove the unused local in the direct oMath iteration path so the code
reads clearly and the outstanding review comment is fully addressed
without changing equation-handling behavior.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* DCO Remediation Commit for giulio-leone <giulio97.leone@gmail.com>
I, giulio-leone <giulio97.leone@gmail.com>, hereby add my Signed-off-by to this commit: 84cc70b
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* test(docx): cover equation paragraph branches
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
* test(docx): reuse backend fixture in msword tests
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
---------
Signed-off-by: giulio-leone <giulio.leone@users.noreply.github.com>
Signed-off-by: giulio-leone <giulio97.leone@gmail.com>
Signed-off-by: Giulio Leone <giulioleone10@gmail.com>
Co-authored-by: giulio-leone <giulio.leone@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>1 parent fdf5e20 commit 90d6dd4
File tree
6 files changed
+352
-23
lines changed- docling/backend
- tests
- data
- docx
- groundtruth/docling_v2
6 files changed
+352
-23
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
906 | 906 | | |
907 | 907 | | |
908 | 908 | | |
909 | | - | |
910 | | - | |
911 | | - | |
912 | | - | |
913 | | - | |
914 | | - | |
915 | | - | |
916 | | - | |
917 | | - | |
918 | | - | |
919 | | - | |
920 | | - | |
921 | | - | |
922 | | - | |
923 | | - | |
| 909 | + | |
| 910 | + | |
| 911 | + | |
| 912 | + | |
| 913 | + | |
| 914 | + | |
| 915 | + | |
| 916 | + | |
| 917 | + | |
| 918 | + | |
| 919 | + | |
| 920 | + | |
| 921 | + | |
| 922 | + | |
| 923 | + | |
| 924 | + | |
| 925 | + | |
| 926 | + | |
| 927 | + | |
| 928 | + | |
| 929 | + | |
| 930 | + | |
| 931 | + | |
| 932 | + | |
| 933 | + | |
| 934 | + | |
| 935 | + | |
| 936 | + | |
| 937 | + | |
| 938 | + | |
| 939 | + | |
| 940 | + | |
| 941 | + | |
| 942 | + | |
| 943 | + | |
| 944 | + | |
| 945 | + | |
| 946 | + | |
| 947 | + | |
| 948 | + | |
| 949 | + | |
| 950 | + | |
| 951 | + | |
| 952 | + | |
| 953 | + | |
| 954 | + | |
| 955 | + | |
| 956 | + | |
| 957 | + | |
924 | 958 | | |
925 | 959 | | |
926 | 960 | | |
| |||
1055 | 1089 | | |
1056 | 1090 | | |
1057 | 1091 | | |
1058 | | - | |
| 1092 | + | |
1059 | 1093 | | |
1060 | | - | |
1061 | | - | |
1062 | | - | |
1063 | | - | |
1064 | | - | |
1065 | | - | |
1066 | | - | |
| 1094 | + | |
| 1095 | + | |
| 1096 | + | |
| 1097 | + | |
| 1098 | + | |
| 1099 | + | |
| 1100 | + | |
| 1101 | + | |
| 1102 | + | |
| 1103 | + | |
| 1104 | + | |
| 1105 | + | |
| 1106 | + | |
| 1107 | + | |
| 1108 | + | |
| 1109 | + | |
| 1110 | + | |
| 1111 | + | |
| 1112 | + | |
| 1113 | + | |
1067 | 1114 | | |
1068 | 1115 | | |
1069 | 1116 | | |
| |||
Binary file not shown.
Lines changed: 7 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
Lines changed: 132 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
| 121 | + | |
| 122 | + | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
Lines changed: 13 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
0 commit comments