You are reusing the same dictionary for every instance:

    data_label = {}

    # ...

    nlp = spacy.blank(lang)
    db = DocBin()
    for text, label in tqdm(data, total=len(data)):
        data_label[label] = 1.0 # <- HERE

As a result, by the end of the loop every label in the shared dict has been set to 1.0, and every document ends up with that same label dict, so each doc has a probability of 1.0 for all labels. You can fix this by making a copy of the label dict for each doc and setting the label only in that copy.
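
A minimal sketch of that fix, using plain dicts so it runs without spaCy. The sample `data`, the `labels` list, and the `make_cats` helper are hypothetical names introduced for illustration; in your loop you would presumably assign the returned dict to `doc.cats` before adding the doc to the `DocBin`.

```python
# Hypothetical sample data standing in for the (text, label) pairs in `data`.
data = [("great movie", "POSITIVE"), ("terrible plot", "NEGATIVE")]

# Collect every label once so each doc can get an explicit score for all of them.
labels = sorted({label for _, label in data})


def make_cats(label, labels):
    """Return a fresh dict: 1.0 for `label`, 0.0 for every other label."""
    cats = {name: 0.0 for name in labels}  # new dict object on every call
    cats[label] = 1.0
    return cats


# One independent dict per document; nothing is shared between iterations.
all_cats = [make_cats(label, labels) for _, label in data]

for (text, _), cats in zip(data, all_cats):
    print(text, cats)
```

Because `make_cats` builds a new dict on each call, setting a label for one document can no longer leak into the dicts of the others.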

Answer selected by danieldk
Labels
feat / textcat Feature: Text Classifier