Catch parsing mistakes #26

manuelrech · 2024-03-20T15:09:37Z

With scanned pages we sometimes get '[NO_BLOCKS] PDF parsing resulted in empty content', and with GROBID parsing errors we get '[GENERAL] An exception occurred while running Grobid.'

To catch these errors we need some additional logic.
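As a rough illustration of the kind of check meant here, the sketch below scans a parser response for the two messages quoted above. The names `ERROR_MARKERS`, `find_parsing_error` and `is_parsing_error` are hypothetical and not taken from this PR's diff. Treating an empty response as an error is also an assumption.

```python
# Error messages quoted in this PR description.
ERROR_MARKERS = (
    "[NO_BLOCKS] PDF parsing resulted in empty content",
    "[GENERAL] An exception occurred while running Grobid.",
)


def find_parsing_error(response_text):
    """Return the first known error marker found in the response, or None."""
    for marker in ERROR_MARKERS:
        if marker in response_text:
            return marker
    return None


def is_parsing_error(response_text):
    """True if the response is blank or contains a known error marker.

    Treating a blank response as a failure is an assumption of this sketch.
    """
    if not response_text.strip():
        return True
    return find_parsing_error(response_text) is not None
```

A caller could skip or log any article for which `is_parsing_error` returns `True` instead of passing it further down the pipeline.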

I have removed the XML warning by setting features = 'xml', along with some small adjustments.

With the new XML parser we need a different checking system.
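One possible shape for such a check, sketched with the standard library's `xml.etree.ElementTree` as a stand-in for whatever parser the PR actually uses: a response is treated as wrongly parsed when it is not well-formed XML (which covers the plain-text error messages) or when it has no non-empty body. The function name `looks_wrongly_parsed`, the reliance on a TEI `<body>` element, and the TEI namespace are assumptions of this sketch, not details confirmed by the PR.

```python
import xml.etree.ElementTree as ET

# Assumption: the parser output is TEI XML in the standard TEI namespace.
TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def looks_wrongly_parsed(xml_text):
    """Heuristic check for a failed parse (hypothetical helper)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Plain-text error messages and empty responses are not valid XML.
        return True

    body = root.find(f".//{TEI_NS}body")
    if body is None:
        return True

    # A body element with no text at all suggests an empty parse.
    return not "".join(body.itertext()).strip()
```

Checking structure rather than matching error strings keeps the test valid even if the wording of the error messages changes.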

manuelrech added 3 commits March 20, 2024 16:09

Catch parsing mistakes

9d34e2b

Sometimes with scanned pages we get '[NO_BLOCKS] PDF parsing resulted in empty content' and with GROBID parsing errors we get '[GENERAL] An exception occurred while running Grobid.' to catch these errors we need some additional logic

Remove xml - html warning

5a67ba8

I have removed the xml warning by setting features = 'xml' and with some small adjustments

update checks on wrongly parsed articles

0d8252d

with new xml parser we need a different checking system
