Adding a words_dir (word tokens) lowers the amount of rows present in the tables_structure and skews the result #182

dsoft-jvo · 2024-06-21T14:56:50Z


I use the table-transformer code to extract tables and their structure from invoices. Without the --words_dir argument, the results are very satisfactory. As I understand it, --words_dir is needed to add the text contents of the detected structures to the output, so I tried adding it. After adding it, however, the results are strange: the detected table shrinks to a small corner of the image, and the detected structure elements all overlap one another. At first this looked like a scaling problem, but the problem persists after fixing the scaling.
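
A scaling mismatch like this usually means the word boxes and the image are in different coordinate systems. For example, words extracted from a PDF are often in PDF points (1/72 inch), while the image given to the model was rendered at some DPI. The sketch below rescales token boxes into image pixels. It assumes each token is a dict with a `bbox` of `[xmin, ymin, xmax, ymax]` in PDF points; `scale_tokens` is a hypothetical helper, not part of table-transformer.

```python
import copy

PDF_POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def scale_tokens(tokens, dpi):
    """Return a copy of `tokens` with each bbox converted from PDF points
    to pixels of an image rendered at `dpi`.

    Hypothetical helper: assumes each token is a dict with a 'bbox' of
    [xmin, ymin, xmax, ymax] expressed in PDF points.
    """
    factor = dpi / PDF_POINTS_PER_INCH
    scaled = copy.deepcopy(tokens)  # leave the caller's tokens untouched
    for token in scaled:
        token["bbox"] = [coord * factor for coord in token["bbox"]]
    return scaled


if __name__ == "__main__":
    words = [{"bbox": [72, 144, 108, 156], "text": "Total"}]
    # At 144 DPI every coordinate is doubled.
    print(scale_tokens(words, dpi=144))
```

Returning a deep copy instead of editing the boxes in place avoids scaling the same tokens twice if they are reused later in the pipeline.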

Aside from the visual result, the 'tables_structure' output is also strange when --words_dir is added. Without --words_dir, the number of rows and columns seems to stay consistent. With --words_dir, however, the number of rows and columns varies: sometimes there are more, sometimes fewer. The tokens are formatted as described in the docs/INFERENCE.MD document.
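
Because the symptoms appear only when tokens are supplied, it can help to check the words file itself. The sketch below checks each token for the fields and coordinate system it assumes: a `text` string and a `bbox` of `[xmin, ymin, xmax, ymax]` in pixels of the same image passed to the model. Those field names are my reading of the format in docs/INFERENCE.MD and should be verified against that document; `validate_tokens` is a hypothetical helper, and the "normalized coordinates" test is only a heuristic.

```python
import json


def validate_tokens(tokens, image_width, image_height):
    """Return a list of problems found in a list of word tokens.

    Assumes (hypothetically) that each token is a dict with a 'text' string
    and a 'bbox' of [xmin, ymin, xmax, ymax] in pixel coordinates of the
    image being processed. An empty list means no problems were found.
    """
    problems = []
    for i, token in enumerate(tokens):
        if "bbox" not in token or "text" not in token:
            problems.append(f"token {i}: missing 'bbox' or 'text'")
            continue
        bbox = token["bbox"]
        if len(bbox) != 4:
            problems.append(f"token {i}: bbox must have 4 values")
            continue
        xmin, ymin, xmax, ymax = bbox
        if xmin > xmax or ymin > ymax:
            problems.append(f"token {i}: bbox corners are out of order")
        # Heuristic: boxes entirely within [0, 1] were probably normalized.
        if xmax <= 1 and ymax <= 1:
            problems.append(f"token {i}: bbox looks normalized to [0, 1], not in pixels")
        elif xmin < 0 or ymin < 0 or xmax > image_width or ymax > image_height:
            problems.append(f"token {i}: bbox lies outside the image")
    return problems


if __name__ == "__main__":
    # Example words-file content; a real file would be read with json.load.
    raw = ('[{"bbox": [10, 20, 60, 35], "text": "Invoice"},'
           ' {"bbox": [0.1, 0.2, 0.3, 0.25], "text": "Date"}]')
    print(validate_tokens(json.loads(raw), image_width=800, image_height=1000))
```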

I cannot show any actual data or images, as the data is sensitive, but this is what I found during debugging:

Without --words_dir, i.e. tokens=[]:

With a --words_dir, i.e. tokens=[...data...]:

I suspect the problem comes from a misunderstanding on my part about what the --words_dir data is used for. I have read the papers, but I feel I am missing something about that aspect.

Could someone explain in more detail what --words_dir is used for and how it works? Are the results I am seeing expected? Why or why not? And if not, how do I go about fixing them?
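
One way tokens can change the row count: if post-processing keeps only rows that contain at least one word, then rows without matching tokens disappear, and a coordinate mismatch between words and image can remove rows that actually hold text. The repository's post-processing appears to do something along these lines when tokens are non-empty (a `refine_rows` step in `postprocess.py`), but that should be checked against the source. The sketch below is a simplified, hypothetical illustration, not that code; the 0.5 threshold and the intersection-over-token-area measure are assumptions.

```python
def iob(token_bbox, row_bbox):
    """Intersection area divided by the token's own area ("intersection over box")."""
    x0 = max(token_bbox[0], row_bbox[0])
    y0 = max(token_bbox[1], row_bbox[1])
    x1 = min(token_bbox[2], row_bbox[2])
    y1 = min(token_bbox[3], row_bbox[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area = (token_bbox[2] - token_bbox[0]) * (token_bbox[3] - token_bbox[1])
    return inter / area if area > 0 else 0.0


def drop_rows_without_tokens(rows, tokens, threshold=0.5):
    """Keep only rows that contain at least one token.

    A token counts as inside a row when at least `threshold` of the token's
    area overlaps the row. Rows and tokens are dicts with a 'bbox' of
    [xmin, ymin, xmax, ymax] in the same coordinate system.
    """
    if not tokens:
        # No words supplied: skip content-based filtering entirely.
        return list(rows)
    return [
        row for row in rows
        if any(iob(tok["bbox"], row["bbox"]) >= threshold for tok in tokens)
    ]


if __name__ == "__main__":
    rows = [{"bbox": [0, 0, 500, 20]}, {"bbox": [0, 20, 500, 40]}, {"bbox": [0, 40, 500, 60]}]
    # Tokens in the image's coordinate system: one word per row.
    good = [{"bbox": [10, 5, 60, 15]}, {"bbox": [10, 25, 60, 35]}, {"bbox": [10, 45, 60, 55]}]
    # The same tokens shrunk by half, as with a DPI mismatch.
    bad = [{"bbox": [c / 2 for c in t["bbox"]]} for t in good]
    print(len(drop_rows_without_tokens(rows, good)))
    print(len(drop_rows_without_tokens(rows, bad)))
```

With correctly scaled tokens all three rows survive; with the shrunken tokens the bottom row has no words and is dropped, which is one way the row count can depend on the words file.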

dsoft-tba · 2024-06-24T13:47:02Z


Fixed by this pull request:
#184
