@ChangHyukSoo

In the PDF file, some adjacent non-space characters are visibly separated by a gap, but the gap is lost when I extract the text with fz_new_buffer_from_stext_page(): the characters come out joined, with no space between them.
I resolved this case by modifying do_flatten() in fitz/util.c.
I have attached a sample PDF file that reproduces the problem.

2023_insert_18.pdf

I have also attached my test source.
zzget_text.cpp.txt

Before modification: "2023년중세계경제는고금리・고물가지속,"
After modification: "2023년 중 세계경제는 고금리・고물가 지속," <-- This is the correct result.
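
To illustrate the idea, here is a minimal, self-contained sketch of gap-based space insertion. It is not the actual change to do_flatten(): the `glyph` struct, the `flatten_with_gaps()` function, and the `GAP_RATIO` threshold of 20% of the font size are all hypothetical names and values chosen for this example, and MuPDF's own structured-text types differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical glyph record, for illustration only (not a MuPDF type). */
typedef struct {
    const char *utf8; /* one character, UTF-8 encoded, NUL-terminated */
    float x0, x1;     /* left and right edges along the baseline */
    float size;       /* font size */
} glyph;

/* Assumed threshold: a gap wider than 20% of the font size is a word break. */
#define GAP_RATIO 0.2f

static int is_space(const glyph *g)
{
    return strcmp(g->utf8, " ") == 0;
}

/*
 * Copy the glyphs into out, inserting a single ' ' between two non-space
 * glyphs whose horizontal gap exceeds GAP_RATIO * size.
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if out is too small.
 */
size_t flatten_with_gaps(const glyph *g, size_t n, char *out, size_t cap)
{
    size_t len = 0;

    assert(out != NULL && cap > 0);

    for (size_t i = 0; i < n; i++) {
        if (i > 0 && !is_space(&g[i - 1]) && !is_space(&g[i])) {
            float gap = g[i].x0 - g[i - 1].x1;
            if (gap > GAP_RATIO * g[i].size) {
                if (len + 1 >= cap)
                    return (size_t)-1;
                out[len++] = ' ';
            }
        }

        size_t k = strlen(g[i].utf8);
        if (len + k >= cap)
            return (size_t)-1;
        memcpy(out + len, g[i].utf8, k);
        len += k;
    }

    out[len] = '\0';
    return len;
}
```

Because the check skips pairs where either glyph is already a space, text that carries explicit spaces is not given a second one. The right threshold depends on the font and layout, so the 20% figure should be treated as a starting point to tune against real documents.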

Insert a space between separated non-space characters when extracting texts from a PDF file.
@ChangHyukSoo
Author

I have read the CLA Document and I hereby sign the CLA