Commit 9287221

feat: add debug module with PageImage for visualization (#18)
* feat: add debug module with PageImage for visualization
* feat: support string color names and RGBA tuples in debug methods
* fix: fix lint errors in debug module
1 parent 698b772 commit 9287221

11 files changed

+1686
-281
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Add `Page.page_idx` property: zero-based index of the page within its document
- Add `Page.rotation_degrees` property: clockwise rotation of the page in degrees
- Add `Page.clear_cache()` method as the canonical name for clearing cached objects
- Add `tablers.debug` module with `PageImage` class for visualizing detected tables and edges on a rendered page image; requires the optional `debug` extra (`pip install tablers[debug]`)

### Changed

docs/getting_started/installation.md

Lines changed: 12 additions & 0 deletions
@@ -18,6 +18,18 @@ The recommended way to install Tablers is via pip:
pip install tablers
```

## Optional Dependencies

### Debug / Visualization

The `tablers.debug` module provides tools for visualizing detected tables, edges, and intersection points on a rendered page image. It requires two additional packages:

```bash
pip install tablers[debug]
```

This installs `pillow` and `pypdfium2` alongside Tablers. If these packages are not present, importing `tablers.debug` will raise an `ImportError`.
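
If your own code should keep working when the extra is missing, the `ImportError` can be caught at import time. This is a minimal sketch of a common Python pattern, not something Tablers ships; `DEBUG_AVAILABLE` is a hypothetical flag name:

```python
# Sketch: treat tablers.debug as optional. DEBUG_AVAILABLE is a hypothetical
# flag name chosen for this example; Tablers does not define it.
try:
    from tablers.debug import PageImage
except ImportError:
    # Raised when pillow/pypdfium2 are missing (or tablers itself is not installed).
    PageImage = None

DEBUG_AVAILABLE = PageImage is not None

if not DEBUG_AVAILABLE:
    print("tablers.debug unavailable; install it with: pip install tablers[debug]")
```

Code that draws annotations can then check `DEBUG_AVAILABLE` before constructing a `PageImage`.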

## Building from Source

If you need to build Tablers from source, follow these steps:

docs/reference/api.md

Lines changed: 134 additions & 0 deletions
@@ -548,3 +548,137 @@ v_edge = Edge("v", 50.0, 0.0, 50.0, 100.0, width=2.0, color=(255, 0, 0, 255))
| `Point` | `tuple[float, float]` | A 2D point (x, y) |
| `BBox` | `tuple[float, float, float, float]` | Bounding box (x1, y1, x2, y2) |
| `Color` | `tuple[int, int, int, int]` | RGBA color (0-255 each) |

---

## Debug Module (`tablers.debug`)

!!! note "Optional dependency"
    The debug module requires the `debug` extra. Install it with:
    ```bash
    pip install tablers[debug]
    ```

### PageImage

Renders a PDF page to a PIL image and provides drawing primitives for annotating detected tables, edges, and intersection points.

```python
from tablers.debug import PageImage


class PageImage:
    def __init__(
        self,
        page: Page,
        original: PIL.Image.Image | None = None,
        resolution: int | float = 72,
        antialias: bool = False,
    ) -> None: ...
```

**Parameters:**

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `page` | `Page` | - | The page to render |
| `original` | `Optional[PIL.Image.Image]` | `None` | Pre-rendered image. If `None`, the page is rendered at the given resolution |
| `resolution` | `Union[int, float]` | `72` | Rendering resolution in DPI |
| `antialias` | `bool` | `False` | Enable anti-aliasing during rendering |

**Raises:** `RuntimeError` if `original` is `None` and the document has already been closed.

!!! note "Password-protected PDFs"
    `PageImage` rendering supports only documents **without a password**. For password-protected PDFs, use `Document.save_to_bytes()` to obtain a decrypted copy, then open it with `Document(bytes=...)` and pass the resulting page to `PageImage`.
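
The decrypt-then-reopen workflow in the note can be wrapped in a small helper. This is a sketch that relies only on the `save_to_bytes()`, `Document(bytes=...)`, and `get_page()` calls named in this reference; `reopen_decrypted` is a hypothetical name, and the `Document` class is passed in as a parameter so the helper can be tried without a real PDF. How the encrypted document is opened in the first place (which argument carries the password) is not covered here, so the sketch starts from an already-open `Document`.

```python
from typing import Any, Callable, Tuple


def reopen_decrypted(doc: Any, document_cls: Callable[..., Any], page_idx: int = 0) -> Tuple[Any, Any]:
    """Hypothetical helper: return (plain_doc, page) ready for PageImage.

    `doc` is an already-open tablers Document (opened with its password);
    `document_cls` is normally `tablers.Document`, injected so the helper can
    be exercised without a real PDF. Keep `plain_doc` open while rendering.
    """
    data = doc.save_to_bytes()            # decrypted copy, per the note above
    plain_doc = document_cls(bytes=data)  # reopen from bytes, no password needed
    return plain_doc, plain_doc.get_page(page_idx)
```

With Tablers installed, a call would look like `plain_doc, page = reopen_decrypted(enc_doc, Document)`, followed by `PageImage(page)` while `plain_doc` is still open.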

**Attributes:**

| Attribute | Type | Description |
|-----------|------|-------------|
| `original` | `PIL.Image.Image` | The unmodified rendered page image |
| `annotated` | `PIL.Image.Image` | The working copy with all annotations applied |
| `scale` | `float` | Ratio of image pixels to page points (`image_width / page_width`) |
| `bbox` | `BBox` | Page coordinate space: `(0, 0, page.width, page.height)` |
| `resolution` | `Union[int, float]` | The DPI used for rendering |
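
Because PDF coordinates are measured in points (1/72 inch), rendering at `resolution` DPI gives a `scale` of roughly `resolution / 72`; at 150 DPI a 612 pt wide US Letter page becomes about 1275 px wide. The sketch below shows that conversion for a bbox and for a full-height vertical line like the one `draw_vline` draws. It illustrates the arithmetic only and is not Tablers' internal code: the helper names are hypothetical, it assumes page coordinates share the image's top-left origin with y increasing downward, and the real `scale` is computed from the rendered image width, so it can differ slightly from `resolution / 72` if that width is rounded to whole pixels.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]
Point = Tuple[float, float]
POINTS_PER_INCH = 72  # PDF user-space unit: 1 point = 1/72 inch


def scale_for(resolution: float) -> float:
    """Approximate image-pixels-per-page-point ratio at a given DPI."""
    return resolution / POINTS_PER_INCH


def bbox_to_pixels(bbox: BBox, scale: float) -> BBox:
    """Map a page-space bbox (x1, y1, x2, y2) to image pixel coordinates.

    Assumes a top-left origin in both spaces, so only scaling is needed.
    """
    x1, y1, x2, y2 = bbox
    return (x1 * scale, y1 * scale, x2 * scale, y2 * scale)


def vline_to_pixels(x: float, page_height: float, scale: float) -> Tuple[Point, Point]:
    """Pixel endpoints of a vertical line at page x spanning the full page height."""
    return ((x * scale, 0.0), (x * scale, page_height * scale))
```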

**Methods:**

| Method | Returns | Description |
|--------|---------|-------------|
| `reset()` | `PageImage` | Discard all annotations and restore `annotated` to `original` |
| `copy()` | `PageImage` | Return a new `PageImage` sharing the same `original` but with an independent `annotated` copy |
| `save(dest, format, quantize, colors, bits, **kwargs)` | `None` | Save the annotated image to a file path or `BytesIO` |
| `show()` | `None` | Display the annotated image (calls `PIL.Image.Image.show`) |
| `_repr_png_()` | `bytes` | Return PNG bytes for Jupyter notebook inline display |

**Drawing methods** (all return `self` for chaining):

| Method | Description |
|--------|-------------|
| `draw_line(points, stroke, stroke_width)` | Draw a polyline. Accepts a tuple or list of two `(x, y)` points |
| `draw_lines(list_of_lines, stroke, stroke_width)` | Draw multiple lines |
| `draw_vline(location, stroke, stroke_width)` | Draw a vertical line spanning the full page height at `x = location` |
| `draw_vlines(locations, stroke, stroke_width)` | Draw multiple vertical lines |
| `draw_hline(location, stroke, stroke_width)` | Draw a horizontal line spanning the full page width at `y = location` |
| `draw_hlines(locations, stroke, stroke_width)` | Draw multiple horizontal lines |
| `draw_rect(bbox, fill, stroke, stroke_width)` | Draw a filled rectangle. Accepts a 4-tuple bbox `(x1, y1, x2, y2)` |
| `draw_rects(list_of_rects, fill, stroke, stroke_width)` | Draw multiple rectangles |
| `draw_circle(center, radius, fill, stroke)` | Draw a circle. Accepts a `(cx, cy)` center tuple |
| `draw_circles(list_of_circles, radius, fill, stroke)` | Draw multiple circles |
| `debug_table(table, fill, stroke, stroke_width)` | Draw a filled rectangle over every cell in a `Table` |
| `debug_tablefinder(tf_settings, **kwargs)` | Draw all detected tables (cell outlines) and detected edges |

**Color arguments** (`fill`, `stroke` in the methods above): accept either an RGBA tuple `(r, g, b, a)` or a string. String colors are resolved via PIL's [`ImageColor.getrgb`](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html). For the list of supported string formats, see the [ImageColor reference](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html). Alpha is set to 255 (opaque) for string colors; for transparency use an RGBA tuple.
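
The color rule above can be sketched as a small normalizer: tuples pass through unchanged, and strings are parsed to RGB and given an alpha of 255. Tablers delegates string parsing to PIL's `ImageColor.getrgb`, which understands far more formats than this; to stay dependency-free, the hypothetical `to_rgba` below accepts only `#rrggbb` hex strings and is not the library's implementation.

```python
from typing import Tuple, Union

RGBA = Tuple[int, int, int, int]


def to_rgba(color: Union[RGBA, str]) -> RGBA:
    """Hypothetical normalizer mirroring the documented color rule.

    RGBA tuples are returned unchanged; strings become opaque (alpha 255).
    Only "#rrggbb" strings are handled here; Tablers itself uses
    PIL.ImageColor.getrgb, which supports many more formats.
    """
    if isinstance(color, tuple):
        if len(color) != 4:
            raise ValueError(f"expected an (r, g, b, a) tuple, got {color!r}")
        return color
    if isinstance(color, str) and len(color) == 7 and color.startswith("#"):
        # Parse the three two-digit hex channels; invalid digits raise ValueError.
        r, g, b = (int(color[i:i + 2], 16) for i in (1, 3, 5))
        return (r, g, b, 255)  # string colors are always fully opaque
    raise ValueError(f"unsupported color: {color!r}")
```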

**Default color constants** (importable from `tablers.debug`):

| Constant | Value | Description |
|----------|-------|-------------|
| `DEFAULT_FILL` | `(0, 0, 255, 50)` | Semi-transparent blue fill |
| `DEFAULT_STROKE` | `(255, 0, 0, 200)` | Near-opaque red stroke |
| `DEFAULT_STROKE_WIDTH` | `1` | Stroke width in pixels |
| `DEFAULT_RESOLUTION` | `72` | Default rendering DPI |
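
To see what the semi-transparent default fill does to the page underneath, consider standard source-over alpha blending of an RGBA color onto an opaque pixel. This worked illustration assumes annotations are alpha-composited that way; `blend_over` is a hypothetical helper, not part of `tablers.debug`. Under that assumption, `DEFAULT_FILL` over a white pixel gives `(205, 205, 255)`: only 50/255 (about 20%) of the blue is applied, so text inside a highlighted cell stays readable, while `DEFAULT_STROKE` over white gives a strong `(255, 55, 55)`.

```python
from typing import Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def blend_over(src: RGBA, dst: RGB) -> RGB:
    """Source-over blend of an RGBA color onto an opaque RGB pixel.

    out = (src * a + dst * (255 - a)) / 255 per channel, rounded to an
    integer with integer arithmetic to avoid floating-point drift.
    """
    *rgb, a = src
    return tuple((s * a + d * (255 - a) + 127) // 255 for s, d in zip(rgb, dst))
```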

**Example — visualize table detection in Jupyter:**

```python
from tablers import Document
from tablers.debug import PageImage

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    img = PageImage(page, resolution=150)

    # Draw tables, edges, and intersection points in one call
    img.debug_tablefinder()

# Display inline: Jupyter calls _repr_png_ on the cell's last top-level expression
img
```

**Example — annotate and save:**

```python
from tablers import Document, find_tables
from tablers.debug import PageImage

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    tables = find_tables(page, extract_text=False)

    img = PageImage(page)
    for table in tables:
        img.debug_table(table)
    img.save("annotated.png", quantize=False)
```

**Example — method chaining:**

```python
# `page` obtained as in the examples above (its document must still be open)
img = (
    PageImage(page)
    .draw_hline(200.0)
    .draw_vline(300.0)
    .debug_tablefinder()
)
img.save("debug.png", quantize=False)
```

docs/usage/advanced.md

Lines changed: 86 additions & 0 deletions
@@ -401,6 +401,92 @@ except RuntimeError as e:
    print(f"Runtime error: {e}")
```

## Visualizing Table Detection

The optional `tablers.debug` module lets you render a page to an image and annotate it with detected tables, edges, and intersection points. Install the extra dependencies first:

```bash
pip install tablers[debug]
```

Rendering supports only **documents without a password**. For password-protected PDFs, use `Document.save_to_bytes()` to get a decrypted copy, then open it with `Document(bytes=...)` and pass the resulting page to `PageImage`.

### Quick Visual Debug

`debug_tablefinder()` renders all detection results in one call: cell outlines (blue fill, red border) and detected edges (red lines). You can pass custom colors to `debug_table()` and the drawing methods; `fill` and `stroke` accept either RGBA tuples or strings. For supported string color formats, see the [PIL ImageColor reference](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html).

```python
from tablers import Document
from tablers.debug import PageImage

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    img = PageImage(page, resolution=150)
    img.debug_tablefinder()

    # Save to file
    img.save("debug.png", quantize=False)

# Or display inline in Jupyter (auto-detected via _repr_png_)
img
```

Pass `TfSettings` or keyword arguments to use non-default detection settings:

```python
img.debug_tablefinder(vertical_strategy="lines", horizontal_strategy="text")
```

### Annotating Individual Tables

Use `debug_table()` to annotate specific tables, or combine it with other drawing methods. Color arguments (`fill`, `stroke`) accept RGBA tuples or strings; for supported string formats see the [PIL ImageColor reference](https://pillow.readthedocs.io/en/stable/reference/ImageColor.html).

```python
from tablers import Document, find_tables
from tablers.debug import PageImage

with Document("example.pdf") as doc:
    page = doc.get_page(0)
    tables = find_tables(page, extract_text=False)

    img = PageImage(page)

    # Annotate each table individually with custom colors. String colors are
    # fully opaque (alpha 255), so use an RGBA tuple to keep the fill translucent.
    for table in tables:
        img.debug_table(table, fill=(0, 128, 0, 60), stroke="green")

    img.save("tables.png", quantize=False)
```

### Drawing Primitives

`PageImage` provides low-level drawing helpers that all return `self` for chaining:

```python
# `page` obtained as in the examples above (its document must still be open)
img = (
    PageImage(page)
    .draw_hline(200.0)                      # horizontal guide line
    .draw_vline(300.0)                      # vertical guide line
    .draw_rect((50, 100, 250, 400))         # arbitrary bbox
    .draw_circle((150.0, 250.0), radius=5)  # point of interest
)
img.save("annotated.png", quantize=False)
```
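
Chaining works because each drawing method updates the shared annotated image and then returns the same object. The toy class below reproduces only that pattern: `RecordingCanvas` is a hypothetical stand-in that records operations instead of drawing pixels, and its method names merely echo those of `PageImage`.

```python
from typing import List, Tuple


class RecordingCanvas:
    """Toy stand-in for PageImage: records drawing calls instead of drawing."""

    def __init__(self) -> None:
        self.ops: List[Tuple] = []

    def draw_hline(self, y: float) -> "RecordingCanvas":
        self.ops.append(("hline", y))
        return self  # returning self is what makes chaining possible

    def draw_vline(self, x: float) -> "RecordingCanvas":
        self.ops.append(("vline", x))
        return self

    def draw_rect(self, bbox: Tuple[float, float, float, float]) -> "RecordingCanvas":
        self.ops.append(("rect", bbox))
        return self


canvas = RecordingCanvas().draw_hline(200.0).draw_vline(300.0).draw_rect((50, 100, 250, 400))
```

A consequence of this design is that a chained expression and a series of separate calls on one variable annotate the same image.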

### Resetting and Copying

`reset()` discards every annotation by restoring `annotated` from `original`; `copy()` returns a new `PageImage` that shares `original` but has its own `annotated` image, so drawing on one does not affect the other.

```python
img = PageImage(page)
img.debug_tablefinder()

# Remove all annotations and start fresh
img.reset()

# Create an independent copy to try different annotations
img2 = img.copy()
img2.debug_tablefinder(vertical_strategy="text")
```
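
The reference table describes `copy()` as sharing `original` while giving the copy its own `annotated` image, and `reset()` as restoring `annotated` from `original`. The toy model below sketches those semantics with a `bytearray` standing in for pixel data; `ToyPageImage` and `paint` are hypothetical, and the real class works with PIL images. Whether the real `copy()` starts from the current annotations or from `original` is not pinned down by the table; this toy copies the current annotations, and the example above sidesteps the question by calling `reset()` first.

```python
class ToyPageImage:
    """Toy model of PageImage's original/annotated split (not the real class)."""

    def __init__(self, original: bytes) -> None:
        self.original = original              # immutable render, shared by copies
        self.annotated = bytearray(original)  # private, mutable working copy

    def paint(self, index: int, value: int) -> "ToyPageImage":
        self.annotated[index] = value         # "draw" by changing one pixel
        return self

    def reset(self) -> "ToyPageImage":
        self.annotated = bytearray(self.original)
        return self

    def copy(self) -> "ToyPageImage":
        clone = ToyPageImage(self.original)           # same original object
        clone.annotated = bytearray(self.annotated)   # independent working copy
        return clone
```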

## Next Steps

- See [Settings Reference](../reference/settings.md) for all configuration options