(Categorical) Evaluation not what I expect #10085
-
Hi all, apologies if this ends up being a simple mistake, but I've been looking into it for a few days now and losing my mind. spaCy's evaluation is different from my own evaluation, specifically w.r.t. micro precision. If I use `Language.evaluate(examples)` I get ~70% micro precision.
If I load everything into a dataset and check whether the label is the same as the prediction, I get ~32%.
Have I failed to understand micro precision? Do I need more arguments on `evaluate` to make sure it only takes credit for the top categorical prediction? I was not surprised to see 70%; I am surprised to see ~32%. If anything I expected this to overfit, since I'm evaluating on the training data here. Should I have specified a different evaluation metric for training somehow? Am I losing touch with reality? I know this isn't fully reproducible without the data. I can hear my grad school advisor in my head telling me to start over with a dummy set... but I'm asking the internet instead. If nothing else, I would love to hear about a better way to get to evaluation from DocBins. Thanks for your help.
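For anyone landing here with the same confusion: the "check if the label is the same as the prediction" comparison amounts to top-1 accuracy over the highest-scoring category. A minimal sketch of that hand-rolled check, using plain dicts that stand in for spaCy's `doc.cats` (label → score) — the data and function names here are made up for illustration:

```python
def top_label(cats):
    """Return the label with the highest predicted score (the argmax)."""
    return max(cats, key=cats.get)

def top1_precision(predictions, golds):
    """Fraction of docs whose highest-scoring label matches the gold label."""
    correct = sum(top_label(pred) == gold for pred, gold in zip(predictions, golds))
    return correct / len(golds)

# Dicts shaped like doc.cats for a two-label, mutually exclusive textcat
preds = [
    {"POS": 0.9, "NEG": 0.1},
    {"POS": 0.4, "NEG": 0.6},
    {"POS": 0.7, "NEG": 0.3},
]
golds = ["POS", "NEG", "NEG"]

print(top1_precision(preds, golds))  # 2 of 3 correct
```

If this number disagrees wildly with what `Language.evaluate` reports, a likely culprit (as it turned out below) is that the two code paths are not actually scoring the same documents.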
-
Gosh dang it... at least asking for help publicly solved my problem, like those people who smoke cigarettes to get a bus to show up. As you may have guessed, the data in the doc bin was not what I remembered it being. The 'text' in the data I was evaluating was much longer than the examples in the docbins that were used for training. I would still love feedback on the mangled way I'm manipulating these datasets. Perhaps there is an easy-ish way to compare the text/examples in one dataset to those in a pandas DataFrame? Or an especially good way to move data from pandas into a DocBin? I would love to be better at this intersection...
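One way to do both at once — a sketch assuming spaCy and pandas are installed, with hypothetical `text`/`label` column names: build the DocBin straight from the DataFrame, then round-trip it and diff the texts as sets, which would have surfaced the mismatch above immediately.

```python
import pandas as pd
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("en")  # tokenizer only; no trained pipeline needed

# Toy stand-in for the real DataFrame
df = pd.DataFrame({
    "text": ["great product", "terrible service"],
    "label": ["POS", "NEG"],
})

labels = df["label"].unique()
doc_bin = DocBin()
for text, label in zip(df["text"], df["label"]):
    doc = nlp.make_doc(text)
    # One-hot cats dict, assuming mutually exclusive labels
    doc.cats = {lab: float(lab == label) for lab in labels}
    doc_bin.add(doc)

# Round-trip and compare texts against the DataFrame
bin_texts = {doc.text for doc in doc_bin.get_docs(nlp.vocab)}
df_texts = set(df["text"])
print(sorted(df_texts - bin_texts))  # texts in the frame but missing from the bin
print(sorted(bin_texts - df_texts))  # texts in the bin but not in the frame
```

Set differences on the raw text are a cheap sanity check before training or evaluating; `doc_bin.to_disk("train.spacy")` then gives you a file usable directly with `spacy train`.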