The page number "pno" is a 0-based integer `-∞ < pno < page_count`.
2339
+
.. note::
2340
+
2341
+
Most document methods (left column) exist for convenience reasons, and are just wrappers for: *Document[pno].<page method>*. So they **load and discard the page** on each execution.
2342
+
2343
+
However, the first two methods work differently. They only need a page's object definition statement - the page itself will **not** be loaded. So e.g. :meth:`Page.get_fonts` is a wrapper the other way round and defined as follows: `page.get_fonts` == `page.parent.get_page_fonts(page.number)`.
2344
+
2345
+
2346
+
When calling the equivalent :ref:`Document` methods, the page number is passed as a parameter, e.g.:
2347
+
2348
+
`Document.get_page_images(pno)` or `Document.get_page_text(pno)`
2349
+
2350
+
.. tip::
2351
+
2352
+
The page number parameter, ``pno``, is a 0-based integer `-∞ < pno < page_count`.
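The range `-∞ < pno < page_count` can be illustrated with a minimal sketch. It assumes that negative values are counted from the end of the document and keep wrapping around, so that `-1` refers to the last page. The helper name `normalize_pno` is hypothetical and not part of the library.

```python
def normalize_pno(pno: int, page_count: int) -> int:
    """Map any accepted page number to its 0-based index (hypothetical helper)."""
    if page_count < 1:
        raise ValueError("document has no pages")
    if pno >= page_count:
        raise ValueError("page not in document")
    # Assumption: negative numbers count backwards from the end and wrap
    # around as often as needed, which the modulo operation reproduces.
    return pno % page_count


print(normalize_pno(-1, 10))   # last page of a 10-page document
print(normalize_pno(-11, 10))  # wraps around once, then counts from the end
```

Under this assumption only values at or above `page_count` are rejected; every negative integer resolves to some valid page.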
2353
+
2354
+
2355
+
2356
+
2357
+
2358
+
Tables and Related Classes
2359
+
------------------------------------
2360
+
2361
+
The `TableFinder` class is returned by :meth:`Page.find_tables` and has related classes as follows:
2340
2362
2341
2363
.. note::
2342
2364
@@ -2351,7 +2373,7 @@ The page number "pno" is a 0-based integer `-∞ < pno < page_count`.
2351
2373
2352
2374
.. attribute:: tables
2353
2375
2354
-
A list of `Table` objects, each of which represents a table found on the page. Empty list if no table found.
2376
+
A list of :class:`Table` objects, each of which represents a table found on the page. An empty list if no tables are found.
2355
2377
2356
2378
.. attribute:: page
2357
2379
@@ -2361,93 +2383,124 @@ The page number "pno" is a 0-based integer `-∞ < pno < page_count`.
2361
2383
2362
2384
A list of tuples `(x0, y0, x1, y1)` representing the bounding boxes of all table cells (in any tables) found on the page. Note that cells may also be ``None`` objects, which are created to enforce a complete rows x columns structure for the affected table.
2363
2385
2386
+
:type: :ref:`Page`
2387
+
2364
2388
2365
2389
.. class:: Table
2366
2390
2367
-
An object representing a table found on the page. Attributes of interest:
2391
+
An object representing a table found on the page.
2368
2392
2369
-
.. attribute:: bbox
2370
2393
2371
-
The bounding box of the table given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the table.
2394
+
.. attribute:: page
2395
+
2396
+
A back-reference to the owning page.
2397
+
2398
+
:type: :ref:`Page`
2372
2399
2373
2400
.. attribute:: cells
2374
2401
2375
-
A list of tuples `(x0, y0, x1, y1)` representing the bounding boxes of the cells in the table. Note that cells may also be ``None`` objects, which will happen to prevent gaps in a rows x columns structure.
2402
+
An array of `Rect` objects for each cell in the table.
2376
2403
2377
-
.. attribute:: rows
2404
+
:type: list
2378
2405
2379
-
A list of :ref:`TableRow` objects, each of which represents a row in the table. The order of rows is the same as in the original table. If the table has no rows, this will be an empty list.
2380
2406
2381
-
.. attribute:: col_count
2407
+
.. attribute:: header
2408
+
2409
+
A `TableHeader` object.
2410
+
2411
+
:type: `TableHeader`
2412
+
2413
+
2414
+
.. attribute:: bbox
2415
+
2416
+
The bounding box of all cells of the table header.
2417
+
2418
+
2419
+
:type: :ref:`Rect`
2420
+
2382
2421
2383
-
The number of columns in the table (integer).
2384
2422
2385
2423
.. attribute:: row_count
2386
2424
2387
-
The number of rows in the table (integer).
2425
+
Number of rows in the table.
2388
2426
2389
-
.. method:: extract
2427
+
:type: int
2390
2428
2391
-
Returns a (row-major) list of lists representing the plain text of the table cells. Each sublist contains the text of one row, and each item in that sublist is the text of one cell in that row. So, `Table.extract()[i][j]` will return the text of the cell in row ``i`` and column ``j``. If a cell is empty, the corresponding item will be an empty string. If the corresponding boundary box is ``None``, the item will also be ``None``.
Returns a string in `GitHub Markdown format <https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/organizing-information-with-tables>`_ representing the table. The string will contain a header line with column names, followed by a separator line, and then the rows of the table. The text of each cell will be enclosed in pipe characters `|`, and each row will be separated by a newline character `\n`. **Line breaks inside a cell** are replaced by the HTML `<br>` tag. Bold, italic, mono-spaced and strikethrough text will be styled according to the corresponding Markdown syntax.
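The layout described above (header line, separator line, pipe-delimited rows, `<br>` for line breaks inside a cell) can be sketched in plain Python. This is an illustration only: the helper `rows_to_markdown` is hypothetical, it is not the library's implementation, and it ignores text styling.

```python
def rows_to_markdown(names, rows):
    """Build a GitHub-style Markdown table (hypothetical helper)."""

    def cell(value):
        # Assumption: a None cell is rendered as empty text.
        text = "" if value is None else str(value)
        # Line breaks inside a cell become HTML <br> tags.
        return text.replace("\n", "<br>")

    lines = ["|" + "|".join(cell(n) for n in names) + "|"]     # header line
    lines.append("|" + "|".join("---" for _ in names) + "|")   # separator line
    for row in rows:
        lines.append("|" + "|".join(cell(c) for c in row) + "|")
    return "\n".join(lines) + "\n"


print(rows_to_markdown(["A", "B"], [["1", "x\ny"], [None, "2"]]))
```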
2432
+
Number of columns in the table.
2396
2433
2397
-
- Bold text will be enclosed in double asterisks ``"**"``.
2434
+
:type: int
2398
2435
2399
-
- Italic text will be enclosed in single underscore ``"_"``.
2400
2436
2401
-
- Mono-spaced text will be enclosed in backticks ``"`"``.
2437
+
.. attribute:: rows
2402
2438
2403
-
- Strikethrough text will be enclosed in double tildes ``"~~"``.
2404
-
2405
-
:arg bool clean: if ``True``, any hyphen "-" in the text is replaced by a ``"-"`` character.
2406
-
2407
-
:arg bool fill_empty: if ``True``, empty cells will be filled with a copy of neighboring cells in an effort to indicate potential column and row spans.
2439
+
An array of `TableRow` objects for each row in the table.
2408
2440
2409
-
* For each row and starting with index 1, the cell content will be replaced with the content of its left neighbor if it is ``None``.
2441
+
:type: list
2410
2442
2411
-
* For each column and starting with index 1, the cell content will be replaced with the content of its upper neighbor if it is ``None``.
2412
2443
2444
+
.. method:: extract()
2413
2445
2414
-
.. method:: to_pandas()
2446
+
Extracts table cell text data into a list.
2415
2447
2416
-
Returns a `pandas.DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_ representing the table. The DataFrame offers a plethora of functions, among them conversion to 20+ file formats (CSV, markdown, JSON, Excel, HD5 etc.). Where necessary, the table can be refined in multiple ways (e.g. deleting empty rows or columns) and mutliple DataFrames can be joined.
A list of strings representing the column names of the `Table`. This is usually the text content of the top row cells, but may instead be content identified above the detected table. The respective situation is encoded in the following attribute.
2455
+
:arg bool clean: If ``True`` then markdown syntax is removed from cell content.
2456
+
:arg bool fill_empty: If ``True`` then cell content ``None`` is replaced by the values above (columns) or left (rows) in an effort to approximate row and column spans.
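The effect of `fill_empty` can be approximated on a plain list of lists. The sketch below assumes a rectangular table and assumes that the left-neighbor pass over rows runs before the upper-neighbor pass over columns. The function name `fill_empty_cells` is hypothetical and the code is not taken from the library.

```python
def fill_empty_cells(rows):
    """Replace None cells by their left, then upper, neighbor (hypothetical helper)."""
    filled = [list(row) for row in rows]  # work on a copy, keep the input intact

    # Pass 1: within each row, starting at index 1, copy the left neighbor.
    for row in filled:
        for j in range(1, len(row)):
            if row[j] is None:
                row[j] = row[j - 1]

    # Pass 2: within each column, starting at row 1, copy the upper neighbor.
    for i in range(1, len(filled)):
        for j in range(len(filled[i])):
            if filled[i][j] is None:
                filled[i][j] = filled[i - 1][j]
    return filled


table = [["A", None, "C"], [None, "x", None], ["y", None, None]]
print(fill_empty_cells(table))
```

A cell in the top-left corner has neither a left nor an upper neighbor, so it stays ``None`` in this sketch.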
2424
2457
2425
-
.. attribute:: is_external
2426
2458
2427
-
Whether the header is part of the originally detected table (``False``) or was identified above the table (``True``). If ``True``, the header is not part of the table, but is used to identify the columns in the table. In this case, the header text will be used as column names in the extracted data.
2459
+
:type: string
2428
2460
2429
-
.. attribute:: bbox
2430
2461
2431
-
The bounding box of the header given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the header. If the header is not part of the table, this will be the rectangle that contains all cells of the header text, otherwise it is equal of the top row's boundary box.
2462
+
.. method:: to_pandas()
2432
2463
2433
-
.. attribute:: cells
2464
+
Return a `pandas <https://pypi.org/project/pandas/>`_ `DataFrame <https://pandas.pydata.org/docs/reference/frame.html>`_ version of the table.
2434
2465
2435
-
A list of tuples of boundary boxes `(x0, y0, x1, y1)` of the cells in the header. Note that cells may also be ``None``, which will happen to prevent any gaps in a rows x columns structure. If the header is not part of the table, this will be the bounding boxes of the header text.
2466
+
:type: pandas DataFrame
2436
2467
2437
2468
2438
-
.. class:: TableRow
2469
+
.. class:: TableHeader
2439
2470
2440
-
An object defining a row in a `Table` found on the page. Attributes of interest:
2471
+
Dedicated class for table headers.
2441
2472
2442
2473
.. attribute:: bbox
2443
2474
2444
-
The bounding box of the row given as a tuple `(x0, y0, x1, y1)`. This is the rectangle that contains all cells of the row.
2475
+
The bounding box of the union of cells belonging to the table header, given as a tuple (x0, y0, x1, y1). This rectangle contains all table header cells.
2476
+
2477
+
:type: :ref:`Rect`
2445
2478
2446
2479
.. attribute:: cells
2447
2480
2448
-
A list of tuples of boundary boxes `(x0, y0, x1, y1)` of the cells in this row. Note that cells may also be ``None`` objects, which will happen to prevent gaps in a rows x columns structure.
2481
+
A list of tuples for each bbox of a column header.
2482
+
2483
+
:type: list
2484
+
2485
+
.. attribute:: names
2486
+
2487
+
A list of strings with column header text.
2488
+
2489
+
:type: list
2490
+
2491
+
.. attribute:: external
2492
+
2493
+
A boolean indicating whether the header is outside the table cells.
docs/vars.rst
3 additions & 5 deletions
@@ -77,10 +77,8 @@ Constants
77
77
78
78
.. py:data:: pymupdf_date
79
79
80
-
ISO timestamp *YYYY-MM-DD HH:MM:SS* when these bindings were built.
81
-
82
-
:type: string
83
-
80
+
Disabled (set to ``None``) in 1.26.1.
81
+
84
82
.. py:data:: version
85
83
86
84
(pymupdf_version, mupdf_version, timestamp) -- combined version information where `timestamp` is the generation point in time formatted as "YYYYMMDDhhmmss".
docs/version.rst
1 addition & 1 deletion
@@ -1,6 +1,6 @@
1
1
----
2
2
3
-
This documentation covers **PyMuPDF v1.26.0** features as of **2025-05-22 00:00:01**.
3
+
This documentation covers **PyMuPDF v1.26.1**.
4
4
5
5
The major and minor versions of |PyMuPDF| and |MuPDF| will always be the same. Only the third qualifier (patch level) may deviate from that of |MuPDF|.