Implement Bytescout SDK's PreserveFormattingOnTextExtraction feature

In ByteScout PDF Extractor SDK, text extraction defaults to `PreserveFormattingOnTextExtraction = true`.
This preserves the approximate layout of the PDF in the extracted text.

The ExtractText feature of MuPDF.NET does not have this option.

As an example, take this PDF:
[statement_sample1.pdf](https://github.com/user-attachments/files/19091725/statement_sample1.pdf)

Running it through the default ExtractText in MuPDF produces:
[statement_sample1.mupdf.txt](https://github.com/user-attachments/files/19091733/statement_sample1.mupdf.txt)

Rows are broken up and the table layout is lost entirely (see the Checks Paid section).

Running it through the default ByteScout conversion produces this:
[statement_sample1.bytescout.txt](https://github.com/user-attachments/files/19091740/statement_sample1.bytescout.txt)

ByteScout code:
```
using TextExtractor te = new()
{
...
};
te.LoadDocumentFromFile(pdfPath);
te.SaveTextToFile(outputPath);
```

The layout of the PDF is preserved: the Account Summary headings and values remain on the same rows, and the Checks Paid section remains in a table layout.
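
To make the request concrete, here is a minimal sketch of one way layout-preserving output could be produced from word positions. It is not ByteScout's algorithm and does not use any MuPDF.NET API: the `Word` record, `LayoutText.Render`, and the `charWidth` and `lineTolerance` defaults are all hypothetical names and guesses chosen for illustration. The idea is to group words into rows by their vertical position, then pad each word with spaces so that it starts at a column proportional to its x coordinate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical word record: the text plus its bounding box in PDF points.
public record Word(string Text, double X0, double Y0, double X1, double Y1);

public static class LayoutText
{
    // charWidth and lineTolerance are illustrative guesses, not values
    // taken from ByteScout or MuPDF.NET.
    public static string Render(IEnumerable<Word> words,
                                double charWidth = 5.0,
                                double lineTolerance = 3.0)
    {
        // Group words into rows: a word starts a new row when its top edge
        // is further than lineTolerance from the first word of the current row.
        var rows = new List<List<Word>>();
        foreach (var w in words.OrderBy(w => w.Y0).ThenBy(w => w.X0))
        {
            if (rows.Count == 0 || Math.Abs(w.Y0 - rows[^1][0].Y0) > lineTolerance)
                rows.Add(new List<Word>());
            rows[^1].Add(w);
        }

        var lines = new List<string>();
        foreach (var row in rows)
        {
            var line = new StringBuilder();
            foreach (var w in row.OrderBy(w => w.X0))
            {
                // Map the x coordinate to a character column.
                int col = Math.Max(0, (int)Math.Round(w.X0 / charWidth));
                // Never overlap the previous word; keep at least one space.
                if (line.Length > 0) col = Math.Max(col, line.Length + 1);
                line.Append(' ', col - line.Length);
                line.Append(w.Text);
            }
            lines.Add(line.ToString());
        }
        return string.Join("\n", lines);
    }
}
```

With this approach, a heading and its value that share a baseline end up on the same output line, and table cells line up in columns as long as `charWidth` roughly matches the average glyph width. The input would have to come from a word-level extraction that reports bounding boxes; how to obtain that from MuPDF.NET is not shown here.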
