Hi All,

I have written a function that helps you check whether a data point is misaligned, i.e. whether its entity offsets fail to line up with token boundaries.

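from collections import Counter

import pandas as pd
import spacy
from spacy.gold import biluo_tags_from_offsets  # spaCy v2.x location

# Assumption: a blank English pipeline; use the pipeline whose tokenizer you train with.
nlp = spacy.blank("en")

def misaligned_data(data):
    """Flag each (text, {'entities': [...]}) example: 1 = aligned, 0 = misaligned."""
    texts = []
    entities = []
    labels = []
    for text, annotations in data:
        ents = annotations['entities']
        texts.append(text)
        entities.append(ents)
        tags = biluo_tags_from_offsets(nlp.make_doc(text), ents)
        # A '-' tag marks a token whose entity offsets do not match token boundaries.
        labels.append(0 if '-' in tags else 1)
    df = pd.DataFrame({'Text': texts, 'Entities': entities, 'Label': labels})
    print('Distribution', Counter(labels))
    return df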

This function will return a data frame with "Text, Entities, and Label" columns. The Labe…

Answer selected by ambuje
Labels
feat / training Feature: Training utils, Example, Corpus and converters v2 spaCy v2.x
This discussion was converted from issue #11497 on September 14, 2022 09:09.