-
Above example:
In [1]: import fitz
In [2]: doc=fitz.open("Table1.pdf")
In [3]: page=doc[0]
In [4]: tabs = page.find_tables() # detect the tables
In [5]: len(tabs.tables)
Out[5]: 1
In [6]: tab = tabs[0]
In [7]: for e in tab.extract():
   ...:     print(e)
   ...:
['大撒大撒 1', 'we are1', '大丈夫です 1', '큰 스프레드 1', '特色 1']
['大撒大撒 2', 'we are2', '大丈夫です 2', None, '特色 2']
['大撒大撒 3', 'we are3', '大丈夫です 3', '큰 스프레드 3', '特色 3']
['大撒大撒 4', 'we are4', '大丈夫です 4', '큰 스프레드 4', '特色 4']
['大撒大撒 5', 'we are5', '大丈夫です 5', '큰 스프레드 5', None]
['大撒大撒 6', 'we are6', '大丈夫です 6', '큰 스프레드 6', '特色 6']
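Since the question asks for HTML-like output, here is a minimal sketch of how the rows printed above could be turned into an HTML table. `rows_to_html` and its `header` parameter are hypothetical names for this illustration and are not part of PyMuPDF; the sketch only assumes that each row is a list of strings, with `None` for an empty cell, as in the output above.

```python
from html import escape


def rows_to_html(rows, header=False):
    """Render rows (lists of str or None) as an HTML table string.

    Hypothetical helper, not a PyMuPDF API. Set header=True to emit
    the first row as <th> cells instead of <td> cells.
    """
    lines = ["<table>"]
    for i, row in enumerate(rows):
        tag = "th" if header and i == 0 else "td"
        # None marks an empty cell; everything else is HTML-escaped.
        cells = "".join(
            f"<{tag}>{'' if cell is None else escape(str(cell))}</{tag}>"
            for cell in row
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Two rows copied from the output above; with a real document
# they would come from tab.extract().
rows = [
    ["大撒大撒 1", "we are1", "大丈夫です 1", "큰 스프레드 1", "特色 1"],
    ["大撒大撒 2", "we are2", "大丈夫です 2", None, "特色 2"],
]
html_table = rows_to_html(rows)
print(html_table)
```

This does not handle merged cells or line breaks inside a cell. If a different target format is acceptable, the table object also offers `to_pandas()` and `to_markdown()` in recent PyMuPDF releases.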
-
Closing this because, on Discord, the problem was found to be caused by an old release of PyMuPDF.
-
When extracting text line by line, how can I also extract table structures and their content?
Table1.pdf
For example, I would like to extract the table structure as HTML code.