page.find_tables: how to extract individual words in a table cell  #3755

@wangqiangJN

Is your feature request related to a problem? Please describe.
PyMuPDF version 1.24.9

1. I want to parse a PDF that contains tables as well as other text.
2. page.find_tables() extracts the text of each cell, but when a cell contains several words they come back as one string. For example, for a cell containing
price:520 people:bob
the result I expect, word by word, is:
price
520
people
bob
but the result I currently get is:
price:520 people:bob
3. So I want to split the table cell content into words and get the bbox of each word. Is there a function or some other way to do this?
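
One possible approach, offered as a sketch rather than a confirmed answer from the maintainers: take each cell rectangle from `table.cells` and pass it as `clip` to `page.get_text("words")`, which returns whitespace-delimited words together with their bounding boxes. Because `price:520` contains no whitespace, it comes back as one word, so the sketch splits it further on punctuation. The helper names `split_tokens` and `words_in_table_cells` are made up for illustration, and each sub-token simply reuses the bbox of the word it came from, which is an approximation.

```python
import re


def split_tokens(text):
    """Split one whitespace-delimited word on runs of non-word characters.

    Example: "price:520" becomes ["price", "520"].
    """
    return [t for t in re.split(r"[^\w]+", text) if t]


def words_in_table_cells(page):
    """Collect tokens and bounding boxes for every table cell on a page.

    `page` is expected to be a PyMuPDF Page. Only page.find_tables() and
    page.get_text("words", clip=...) are used.
    """
    results = []
    for table in page.find_tables().tables:
        for cell in table.cells:
            if cell is None:  # defensive: skip missing cells
                continue
            # Each "words" item starts with x0, y0, x1, y1, text.
            for x0, y0, x1, y1, text, *_ in page.get_text("words", clip=cell):
                for token in split_tokens(text):
                    results.append(
                        {
                            "cell_bbox": tuple(cell),
                            "token": token,
                            # Assumption: reuse the parent word's bbox for
                            # each sub-token instead of computing a tighter one.
                            "word_bbox": (x0, y0, x1, y1),
                        }
                    )
    return results
```

With PyMuPDF installed, something like `page = pymupdf.open("input.pdf")[0]` (a hypothetical file name) followed by `words_in_table_cells(page)` should produce one entry per token. If an exact bbox per sub-token is needed, the per-character boxes from `page.get_text("rawdict", clip=cell)` could be merged instead, at the cost of more code.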
