Discrepancy between how NER Annotator and spaCy are handling certain Unicode characters #119

@elifbeyzatok00

I want to display a labeled JSON file with spaCy displacy, but the entities do not show up where I annotated them, and the problem persists.

I label the text carefully in the tool:
image

When I view it with spaCy displacy, unrelated spans are highlighted, while the spans I actually labeled are not:
image

The code I used to view the labeled text with spaCy displacy:

import json
import spacy
from spacy import displacy

# Load the spaCy model
nlp = spacy.load("en_core_web_sm")

# Path to the JSON file
file_path = "/content/annotations.json"

# Open the JSON file and load the data
with open(file_path, 'r', encoding='utf-8') as file:
    data = json.load(file)

    if 'annotations' in data:
        for annotation in data['annotations']:
            if annotation is not None:
                text = annotation[0]  # Text
                entities = [(ent[0], ent[1], ent[2]) for ent in annotation[1]['entities']]  # Entities

                # Prepare the data in the format displacy expects
                spacy_displacy_data = {
                    "text": text,
                    "ents": [{"start": start, "end": end, "label": label} for start, end, label in entities],
                    "title": None
                }

                # Render with displacy
                displacy.render(spacy_displacy_data, style="ent", manual=True, jupyter=True)
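
One possible explanation, which is an assumption and not something confirmed in this issue, is that the annotation tool counts offsets in UTF-16 code units (as JavaScript strings do), while Python indexes strings by code point. Under that assumption, any character outside the Basic Multilingual Plane (an emoji, for example) would shift every later offset by one. The sketch below shows how the stored offsets could be checked and converted; the helper names `utf16_to_codepoint` and `convert_entities` and the sample text are hypothetical and not part of either tool.

```python
def utf16_to_codepoint(text: str, utf16_offset: int) -> int:
    """Map an offset counted in UTF-16 code units to a Python str index."""
    units = 0
    for index, char in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters outside the Basic Multilingual Plane take two UTF-16 units.
        units += 2 if ord(char) > 0xFFFF else 1
    return len(text)


def convert_entities(text, entities):
    """Convert (start, end, label) triples from UTF-16 offsets to str indices."""
    return [
        (utf16_to_codepoint(text, start), utf16_to_codepoint(text, end), label)
        for start, end, label in entities
    ]


# Hypothetical sample: the emoji is two UTF-16 code units but one code point.
sample_text = "😀 Paris is nice"
sample_entities = [(3, 8, "GPE")]  # offsets as a UTF-16 based tool would count them

for start, end, label in sample_entities:
    # Slicing with the raw offsets shows whether they line up in Python.
    print("raw:      ", label, repr(sample_text[start:end]))

for start, end, label in convert_entities(sample_text, sample_entities):
    print("converted:", label, repr(sample_text[start:end]))
```

If printing `text[start:end]` for the real annotations already shows the wrong substrings, the mismatch is in the stored offsets rather than in displacy, and the converted triples could be passed to the `"ents"` list instead of the raw ones.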


Labels: bug (Something isn't working), help wanted (Extra attention is needed)
