docs/pymupdf4llm/api.rst
Lines changed: 2 additions & 4 deletions
@@ -224,8 +224,7 @@ The |PyMuPDF4LLM| API
 Return appropriate markdown header prefix. This is either "" or a string of "#" characters followed by a space.

-Given a text span from a "dict"" extraction, determine the
-markdown header prefix string of 0 to n concatenated '#' characters.
+Given a text span from a "dict" extraction, determine the markdown header prefix string of 0 to n concatenated '#' characters.

 :arg dict span: a dictionary containing the text span information. This is the same dictionary as returned by `page.get_text("dict")`.
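
As a rough illustration of the behavior this docstring describes, the sketch below maps a span's font size to a header prefix. It is not the library's implementation: `SIZE_TO_LEVEL` and `header_prefix` are hypothetical names, and the only detail assumed from PyMuPDF is that a span dictionary from `page.get_text("dict")` carries a `"size"` key.

```python
# Hypothetical mapping from rounded font size to header level (an assumption
# for illustration; a real mapping would be derived from the document).
SIZE_TO_LEVEL = {24: 1, 18: 2, 14: 3}


def header_prefix(span: dict, size_to_level: dict = SIZE_TO_LEVEL) -> str:
    """Return "" for body text, or "#" * level followed by a space."""
    level = size_to_level.get(round(span["size"]), 0)
    return "#" * level + " " if level else ""


# Minimal stand-ins for spans; real spans carry more keys (font, bbox, ...).
title_span = {"size": 24.0, "text": "Introduction"}
body_span = {"size": 10.0, "text": "Some paragraph text."}

print(repr(header_prefix(title_span)))  # '# '
print(repr(header_prefix(body_span)))   # ''
```
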
@@ -332,8 +331,7 @@ This user function uses the document's Table of Contents -- under the assumption
 Create an object which uses the document's Table of Contents (TOC) to determine header levels. Upon object creation, the table of contents is read via the `Document.get_toc()` method. The TOC data is then used to determine header levels in the `to_markdown()` method.

-This is an alternative to :class:`IdentifyHeaders`. Instead of running through the full document to identify font sizes, it uses the document's Table Of
-Contents (TOC) to identify headers on pages. Like :class:`IdentifyHeaders`, this also is no guarantee to find headers, but for well-built Table of Contents, there is a good chance for more correctly identifying header lines on document pages than the font-size-based approach.
+This is an alternative to :class:`IdentifyHeaders`. Instead of running through the full document to identify font sizes, it uses the document's Table Of Contents (TOC) to identify headers on pages. Like :class:`IdentifyHeaders`, this also is no guarantee to find headers, but for well-built Table of Contents, there is a good chance for more correctly identifying header lines on document pages than the font-size-based approach.

 It also has the advantage of being much faster than the font-size-based approach, as it does not execute a full document scan or even access any of the document pages.
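
The TOC-based lookup described in this hunk can be sketched in plain Python. This is an illustration under stated assumptions, not the actual `TocHeaders` code: `TocLookup` and its matching rule (a span counts as a header when its text and a TOC title on the same page prefix-match) are hypothetical. The entry layout `[level, title, page]` with 1-based page numbers follows what `Document.get_toc()` returns.

```python
class TocLookup:
    """Hypothetical TOC-driven header detector (illustration only)."""

    def __init__(self, toc):
        # Group (level, title) pairs by 1-based page number.
        self.by_page = {}
        for entry in toc:
            level, title, page = entry[:3]
            title = title.strip()
            if title:  # skip empty titles, which would match any text
                self.by_page.setdefault(page, []).append((level, title))

    def get_header_id(self, span, page_number):
        """Return "" or "#" * level + " " for a span on a 1-based page."""
        text = span["text"].strip()
        if not text:
            return ""
        for level, title in self.by_page.get(page_number, []):
            if text.startswith(title) or title.startswith(text):
                return "#" * level + " "
        return ""


# Made-up TOC data in the [level, title, page] layout.
toc = [[1, "Introduction", 1], [2, "Background", 2]]
lookup = TocLookup(toc)
print(repr(lookup.get_header_id({"text": "Background"}, 2)))  # '## '
print(repr(lookup.get_header_id({"text": "Background"}, 1)))  # ''
```

Because the lookup only consults TOC entries for the current page, nothing is scanned up front, which matches the speed advantage the documentation text claims for the TOC-based approach.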