Apologies in advance if this is a duplicate issue.
I need to parse a PDF document where some cells contain two separate numbers. See the image below:
I used the following example code:
using (PdfDocument document = PdfDocument.Open("./document2.pdf", options))
{
    // Extract page 1 of the document
    var page = ObjectExtractor.Extract(document, 1);

    // Detect tables from ruling lines (lattice mode)
    var ea = new SpreadsheetExtractionAlgorithm();
    IReadOnlyList<Table> tables = ea.Extract(page);
    var table = tables[0];
    var rows = table.Rows;

    // Serialize the first table to JSON
    using var streamWriter = new StreamWriter("./myjson.json");
    new JSONWriter().Write(streamWriter, table);
}
This produces the following (incorrect) result:
When I use Camelot (Python) I get the following (correct) result:
Is this a bug or am I doing something wrong?
A working solution in .NET would be ideal. I appreciate any help.