This repository was archived by the owner on Dec 9, 2018. It is now read-only.

How to get hidden elements using the --correct-text-visibility option? #779

@alx-dev


Hello.
I have a PDF: https://drive.google.com/open?id=1I4VU4bY2J2XWHryxRjN7eA_ZlI9uGIyT

After converting it with `--correct-text-visibility`, pdf2htmlEX adds the `fc6 sc0` classes to the hidden elements:
```html
<div class="t m0 xd he y37 ff2 fs3 fc2 sc0 ls0 ws0">
  <span class="fc6 sc0">MEDICAL </span>
  <span class="fc1">
    <span class="fc6 sc0">PLANS</span>
    <span class="fc5">
      <span class="fc6 sc0"></span>
    </span>
  </span>
</div>
<div class="t m0 xd h1 y38 ff1 fs0 fc0 sc0 ls0 ws0">
  <span class="fc6 sc0"> </span>
</div>
<div class="t m0 xd he y39 ff2 fs3 fc5 sc0 ls0 ws0">
  <span class="fc6 sc0">MEDICAL</span>
  <span class="fc0">
    <span class="fc6 sc0"> </span>
    <span class="fc1"><span class="fc6 sc0">PLANS</span></span>
    <span class="fc6 sc0"> </span>
  </span>
</div>
```
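The `fc*` numbers in pdf2htmlEX output appear to be assigned per document as colours are encountered, so `fc6` probably should not be assumed to mean "hidden" in every file. A minimal sketch, assuming the class used for hidden text is declared in the generated stylesheet as `color:transparent` (worth checking against your own output), that lists such classes from the CSS text. `transparent_fill_classes` is a hypothetical helper name, and the stylesheet in the usage line is illustrative, not taken from this PDF's output:

```python
import re

# Matches simple class rules such as ".fc6{color:transparent;}".
# Assumption: each fill-colour class is written as a single ".fcN{...}" rule.
FILL_CLASS_RULE = re.compile(r"\.(fc\d+)\s*\{([^}]*)\}")


def transparent_fill_classes(css: str) -> set[str]:
    """Return the fc* class names whose rule sets color to transparent."""
    found = set()
    for name, body in FILL_CLASS_RULE.findall(css):
        for decl in body.split(";"):
            prop, sep, value = decl.partition(":")
            # Compare the property name exactly so background-color is not matched.
            if sep and prop.strip().lower() == "color" and value.strip().lower() == "transparent":
                found.add(name)
    return found


# Illustrative stylesheet only.
css = ".fc0{color:rgb(0,0,0);}\n.fc6{color:transparent;}"
print(sorted(transparent_fill_classes(css)))  # ['fc6']
```

Reading the class from the stylesheet of each converted file avoids hard-coding `fc6`, at the cost of depending on how the colour rule is spelled.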

Does this class change depending on the document? If so, how can we identify the hidden elements in order to remove them from the document?
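Once the class that marks hidden text is known, one way to remove those elements is to filter the HTML and skip every element carrying that class together with its contents. A minimal sketch using only Python's standard `html.parser`; `HiddenTextStripper` and `strip_hidden` are hypothetical names, and the code assumes the markup is properly nested, as in the excerpt above:

```python
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class HiddenTextStripper(HTMLParser):
    """Re-emit HTML, dropping every element whose class list contains hidden_class."""

    def __init__(self, hidden_class):
        # Keep entity and character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.hidden_class = hidden_class
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _is_hidden(self, attrs):
        classes = (dict(attrs).get("class") or "").split()
        return self.hidden_class in classes

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1  # track nesting inside the dropped element
            return
        if self._is_hidden(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # start dropping this element and its children
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/>: keep unless hidden or inside a hidden element.
        if not self.skip_depth and not self._is_hidden(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_hidden(html, hidden_class):
    parser = HiddenTextStripper(hidden_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = ('<div class="t fc2 sc0"><span class="fc6 sc0">MEDICAL </span>'
          '<span class="fc1">Visible<span class="fc6 sc0">PLANS</span></span></div>')
print(strip_hidden(sample, "fc6"))
# <div class="t fc2 sc0"><span class="fc1">Visible</span></div>
```

Removing a span also removes any whitespace it contains, so separator spans such as `<span class="fc6 sc0"> </span>` in the excerpt disappear as well.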

Thank you.
