Fix/images in table cells issue 21 by anup00900 · Pull Request #340 · pymupdf/pymupdf4llm

anup00900 · 2025-11-26T20:30:54Z

Fixes #21

Problem

Images in table cells were appearing below tables instead of inside the cells.

Solution

Implemented image detection within table cells for both legacy and layout modes.

Before Fix

Product	Preview
Widget	(empty; the image was rendered below the table)

After Fix

Product	Preview
Widget	[image]

Technical Changes

Legacy Mode (pymupdf_rag.py) - All users:

Added add_images_to_table_markdown() function
Detects images with >50% bbox overlap with cells
Generates unique filenames for table cell images
Inserts ![image](path) markdown inline
Updated 3 locations calling table.to_markdown()
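
The >50% overlap rule can be illustrated with a small, self-contained sketch. This is not the code from the PR: the helper names (`overlap_ratio`, `find_cell_for_image`) are hypothetical, bounding boxes are plain `(x0, y0, x1, y1)` tuples, and the ratio is assumed to be measured against the image's own area, which the description above does not specify.

```python
from typing import Optional, Sequence, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlap_ratio(image: BBox, cell: BBox) -> float:
    """Fraction of the image's area that lies inside the cell (assumed metric)."""
    ix0, iy0, ix1, iy1 = image
    cx0, cy0, cx1, cy1 = cell
    width = min(ix1, cx1) - max(ix0, cx0)
    height = min(iy1, cy1) - max(iy0, cy0)
    if width <= 0 or height <= 0:
        return 0.0  # the boxes do not intersect
    image_area = (ix1 - ix0) * (iy1 - iy0)
    return (width * height) / image_area if image_area > 0 else 0.0


def find_cell_for_image(
    image: BBox, cells: Sequence[BBox], threshold: float = 0.5
) -> Optional[int]:
    """Return the index of the cell with the largest overlap above the threshold."""
    best_index, best_ratio = None, threshold
    for index, cell in enumerate(cells):
        ratio = overlap_ratio(image, cell)
        if ratio > best_ratio:  # strictly greater, matching ">50%"
            best_index, best_ratio = index, ratio
    return best_index


# Two side-by-side cells; the image straddles the border but sits mostly in the second.
cells = [(0, 0, 100, 50), (100, 0, 200, 50)]
print(find_cell_for_image((80, 5, 130, 45), cells))
```

An image that overlaps two cells by exactly half each is assigned to neither under this strict comparison, so it would presumably still be emitted outside the table.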

Layout Mode (document_layout.py) - pymupdf_layout users:

Include image blocks (type==1) in table_blocks
Enhanced extract_cells() for image handling
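
A rough sketch of the layout-mode idea follows. It models blocks as plain dicts shaped like the output of PyMuPDF's `page.get_text("dict")`, where `type` is 0 for text and 1 for images; whether document_layout.py uses exactly this structure is an assumption. The function names, the center-point containment test, and the `image_ref` callback are hypothetical and are not the PR's `extract_cells()` implementation.

```python
TEXT_BLOCK, IMAGE_BLOCK = 0, 1


def center_inside(bbox, area):
    """True if the center of bbox lies within area (simplified containment test)."""
    cx = (bbox[0] + bbox[2]) / 2
    cy = (bbox[1] + bbox[3]) / 2
    return area[0] <= cx <= area[2] and area[1] <= cy <= area[3]


def select_table_blocks(blocks, table_bbox):
    """Keep text AND image blocks that fall inside the table, not only text blocks."""
    return [
        b for b in blocks
        if b["type"] in (TEXT_BLOCK, IMAGE_BLOCK) and center_inside(b["bbox"], table_bbox)
    ]


def block_text(block):
    """Join the span texts of one text block."""
    return " ".join(
        span["text"] for line in block["lines"] for span in line["spans"]
    ).strip()


def cell_markdown(blocks, cell_bbox, image_ref):
    """Build one cell's markdown; image_ref maps an image block to a path or URI."""
    parts = []
    for block in blocks:
        if not center_inside(block["bbox"], cell_bbox):
            continue
        if block["type"] == TEXT_BLOCK:
            parts.append(block_text(block))
        elif block["type"] == IMAGE_BLOCK:
            parts.append(f"![image]({image_ref(block)})")
    return " ".join(p for p in parts if p)
```

Passing the reference builder in as a callback keeps the cell logic independent of whether images are written to disk or embedded.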

Testing

Tested with realistic product catalogs (5 products)
All images landed in the correct cells (100% success rate on this test set)
Works with write_images and embed_images modes
Backward compatible
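
To make the difference between the two image modes concrete, here is a minimal sketch of how a table-cell image reference might be produced in each. The function names, the filename pattern, and the choice to let `embed_images` take precedence are all assumptions for illustration; they are not taken from pymupdf_rag.py.

```python
import base64
from pathlib import Path


def cell_image_filename(doc_stem: str, page_number: int, table_index: int,
                        cell_index: int, ext: str = "png") -> str:
    """Build a name that is unique per document, page, table, and cell (hypothetical pattern)."""
    return f"{doc_stem}-p{page_number}-t{table_index}-c{cell_index}.{ext}"


def cell_image_markdown(image_bytes: bytes, filename: str, *,
                        write_images: bool = False, embed_images: bool = False,
                        image_dir: str = ".") -> str:
    """Return inline markdown for one cell image, or "" when both modes are off."""
    ext = Path(filename).suffix.lstrip(".") or "png"
    if embed_images:
        # Embed the bytes directly as a base64 data URI.
        payload = base64.b64encode(image_bytes).decode("ascii")
        return f"![image](data:image/{ext};base64,{payload})"
    if write_images:
        # Save the image to disk and reference it by path.
        target = Path(image_dir) / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(image_bytes)
        return f"![image]({target.as_posix()})"
    return ""
```

Returning an empty string when neither mode is enabled leaves the cell text untouched, which is one way to keep existing output unchanged.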

Benefits

Solves the exact use case from issue #21 (Images in table)
Works for all users (not just commercial)
No breaking changes

Modified extract_cells() to detect and extract image blocks (type==1) within table cells, not just text blocks (type==0).

Changes:

- Updated extract_cells() to accept page and document parameters
- Added logic to detect image blocks within cell bounding boxes
- Implemented image extraction and saving for cells with images
- Images are now embedded in cell markdown as ![image](path) syntax
- Updated table_to_markdown() and table_extract() signatures
- Updated calls in document_layout.py to pass page/document context
- Added test script to demonstrate the fix

When write_images=True or embed_images=True, images found in table cells are now properly extracted and referenced inline within the cell markdown, resolving the issue where images appeared below tables.

This fix enables images to appear inside their corresponding table cells instead of being extracted separately below the table.

Changes for LEGACY MODE (pymupdf_rag.py):

- Added add_images_to_table_markdown() function to detect images within table cell boundaries
- Images with >50% overlap with a cell are assigned to that cell
- Generates unique filenames for table cell images
- Supports both write_images and embed_images modes
- Inserts ![image](path) markdown syntax inline with cell text
- Updated all 3 locations where table.to_markdown() is called

Changes for LAYOUT MODE (document_layout.py):

- Updated table_blocks to include image blocks (type==1)
- Modified extract_cells() to detect and extract images in cells
- Added page/document parameters to table extraction functions
- Images are extracted and referenced inline in cells

TESTING: Fully tested with embedded images in PDFs. All images correctly appear inside their table cells in the markdown output.

Before fix:

| Col1 | Col2 | Image |
|---|---|---|
| Text | Text | |
![image1](image1.png)

After fix:

| Col1 | Col2 | Image |
|---|---|---|
| Text | Text | ![image1](image1.png) |

Resolves the requested behavior from Issue pymupdf#21.
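
The before/after output can be reproduced with a small post-processing sketch that places an image reference into a specific cell of an already generated markdown table. The function name `insert_cell_images` and its `(row, column)` mapping are hypothetical; the PR's add_images_to_table_markdown() may work quite differently. The sketch assumes Python 3.9+ and does not handle escaped pipe characters inside cells.

```python
def insert_cell_images(table_md: str, images: dict[tuple[int, int], str]) -> str:
    """Append image markdown to cells; keys are (row, col), row 0 being the header."""
    out = []
    row_index = 0
    for line in table_md.splitlines():
        stripped = line.strip()
        if not stripped.startswith("|"):
            out.append(line)  # not a table row, leave untouched
            continue
        inner = stripped.removeprefix("|").removesuffix("|")
        cells = [c.strip() for c in inner.split("|")]
        # The |---|---| separator line is copied as-is and not counted as a row.
        if all(c and set(c) <= set("-:") for c in cells):
            out.append(line)
            continue
        for col in range(len(cells)):
            ref = images.get((row_index, col))
            if ref:
                cells[col] = f"{cells[col]} {ref}".strip()
        out.append("| " + " | ".join(cells) + " |")
        row_index += 1
    return "\n".join(out)


table = "| Col1 | Col2 | Image |\n|---|---|---|\n| Text | Text | |"
print(insert_cell_images(table, {(1, 2): "![image1](image1.png)"}))
```

Rewriting the finished markdown keeps the table generator itself unchanged, at the cost of re-parsing the row text.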

anup00900 · 2025-11-27T05:14:02Z

I have read the CLA Document and I hereby sign the CLA

anup.roy and others added 2 commits November 26, 2025 15:52

anup00900 wants to merge 2 commits into pymupdf:main from anup00900:fix/images-in-table-cells-issue-21
