@OhashiReon
Contributor

Fix: Close annotation span at end of HTML generation

This PR fixes an edge case in EncodingVisualizer.__make_html where an annotation <span> is left unclosed when the text ends inside an annotation, that is, when the annotation extends to the last character of the input.
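The shape of the bug can be illustrated with a much smaller loop. The sketch below is not the tokenizers source: `render`, `annotation_at`, the tuple-based annotations, and the `close_at_end` flag are hypothetical names made up for illustration, and it skips tokens, styling, and HTML escaping. It only shows why a span that is closed "when the annotation changes" needs one extra check after the loop.

```python
from typing import List, Optional, Tuple

# Hypothetical, simplified stand-in for an annotation: (start, end, label).
Span = Tuple[int, int, str]


def annotation_at(index: int, annotations: List[Span]) -> Optional[Span]:
    """Return the first annotation covering character `index`, if any."""
    for ann in annotations:
        if ann[0] <= index < ann[1]:
            return ann
    return None


def render(text: str, annotations: List[Span], close_at_end: bool = True) -> str:
    """Wrap annotated character runs in <span class="annotation"> tags."""
    parts: List[str] = []
    current: Optional[Span] = None
    for i, ch in enumerate(text):
        ann = annotation_at(i, annotations)
        if ann != current:
            # Leaving the previous annotation (if any) closes its span.
            if current is not None:
                parts.append("</span>")
            # Entering a new annotation opens a span.
            if ann is not None:
                parts.append(f'<span class="annotation" data-label="{ann[2]}">')
            current = ann
        parts.append(ch)
    # Edge case: the loop only closes a span when the annotation changes,
    # so a span still open after the last character must be closed here.
    if close_at_end and current is not None:
        parts.append("</span>")
    return "".join(parts)


fixed = render("Hello world", [(6, 11, "NOUN")])
buggy = render("Hello world", [(6, 11, "NOUN")], close_at_end=False)
print(fixed)
print(buggy)
```

With `close_at_end=False` the sketch reproduces the symptom described here: the opening annotation tag is emitted but no closing tag follows, because the text ends before the annotation does.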

Example

Input

from tokenizers import Tokenizer
from tokenizers.tools import EncodingVisualizer, Annotation

tokenizer = Tokenizer.from_pretrained("bert-base-cased")
visualizer = EncodingVisualizer(tokenizer, default_to_notebook=False)

text = "Hello world"
annotations = [Annotation(6, 11, "NOUN")]

html_output = visualizer(text, annotations)
print(html_output)

Before

<html>
    <style>
    </style>
    <body>
        <div class="tokenized-text" dir=auto>
            <span class="token odd-token"  >
                Hello
            </span>
            <span class="non-token"  >            
            </span>
            <span class="annotation" style="color:hsl(10,32%,64%)" data-label="NOUN">
                <span class="token even-token"  >
                    world
                </span>
            <!-- annotation span is not closed -->
        </div>
    </body>
</html>

After

<html>
    <style>
    </style>
    <body>
        <div class="tokenized-text" dir=auto>
            <span class="token odd-token"  >
                Hello
            </span>
            <span class="non-token"  >            
            </span>
            <span class="annotation" style="color:hsl(10,32%,64%)" data-label="NOUN">
                <span class="token even-token"  >
                    world
                </span>
            </span> <!-- annotation span is now properly closed -->
        </div>
    </body>
</html>
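A quick way to tell the two outputs apart without reading the markup by eye is to count unbalanced <span> tags. The following is a minimal sketch using only the standard library `html.parser` module; `SpanCounter` and `unclosed_spans` are hypothetical helper names, and the `before` and `after` strings are trimmed-down stand-ins for the outputs above rather than the exact visualizer output.

```python
from html.parser import HTMLParser


class SpanCounter(HTMLParser):
    """Tracks how many <span> elements are still open after parsing."""

    def __init__(self) -> None:
        super().__init__()
        self.open_spans = 0

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.open_spans += 1

    def handle_endtag(self, tag):
        if tag == "span":
            self.open_spans -= 1


def unclosed_spans(html: str) -> int:
    """Return the number of <span> tags opened but never closed."""
    counter = SpanCounter()
    counter.feed(html)
    counter.close()
    return counter.open_spans


# Trimmed-down stand-ins for the "Before" and "After" outputs.
before = (
    '<div class="tokenized-text" dir=auto>'
    '<span class="annotation" data-label="NOUN">'
    '<span class="token even-token">world</span>'
    '</div>'
)
after = before.replace("</div>", "</span></div>")

print(unclosed_spans(before))
print(unclosed_spans(after))
```

In a real regression test, the string returned by `visualizer(text, annotations)` could be passed to `unclosed_spans` in place of these stand-ins, assuming the visualizer is constructed with `default_to_notebook=False` as in the example so that it returns the HTML string.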

Collaborator

@ArthurZucker ArthurZucker left a comment

Yeah ty

@ArthurZucker ArthurZucker merged commit ecad3f1 into huggingface:main Dec 16, 2025
26 of 27 checks passed
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
