Spancat Model optimizing performance and accuracy #11642
Replies: 1 comment 3 replies
-
Hey @monWork, thanks for the question! This sounds like a cool applied use of SpanCat. Let me start out by saying that 62 to 68 F1 in span categorization is actually pretty good, especially considering the characteristics of your data. There are two metrics I'm using to make this assessment. The first is your span length, which hurts your scores because they are based on exact matches: if your model predicts 49 of the 50 tokens of the longest span correctly (missing 1 token), that prediction doesn't count as correct. The second is your span distinctiveness value, which is less than 1. This means the spans you've categorized use much of the same vocabulary as the surrounding text, so there's little surface-level signal for the model to latch onto.

The biggest step towards improvement, I think, is going to be refining your task and data. Do you have examples of your data you can share? That would help provide some specific suggestions on how to improve here. A general piece of advice would be to more specifically define what constitutes "negative language", in a way that leads you to annotate spans more consistently and, ideally, to annotate shorter spans (this way you mitigate the long-span problem mentioned above).

Keep in mind that with shorter spans, you can still take a post-processing step to combine categorized spans in a way that makes sense for your problem - for example, by saying that if two spans occur within some distance of each other, they should be merged into a single span by a downstream component (see the sketch at the end of this reply). Basically, don't try to solve your entire problem with one component - if you can decompose it into smaller problems that are solved more successfully individually, that will always help the overall performance of your system.

As for hyperparameter searches and WandB configurations, we've got a good FAQ on that: #10625. Generally spaCy's defaults are pretty good; I personally haven't found the need to do hyperparameter searches if I've defined the problem well. In this sense, you can interpret low scores as an opportunity to refine your task and data, and recognize that those low scores are likely not because you don't have the optimal hyperparameters - which will only get you gains of a few points after you've got quality data.
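To make the post-processing idea concrete, here is a minimal sketch of a downstream component that merges predicted spans of the same label when they sit within a few tokens of each other. It assumes your predictions live under the default `"sc"` key in `doc.spans`; the component name `merge_nearby_spans` and the `max_gap` of 3 tokens are illustrative placeholders to adjust for your own pipeline.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span


class MergeNearbySpans:
    def __init__(self, spans_key: str, max_gap: int):
        self.spans_key = spans_key
        self.max_gap = max_gap

    def __call__(self, doc):
        # Sort predicted spans by position so neighbours are adjacent.
        spans = sorted(doc.spans.get(self.spans_key, []), key=lambda s: (s.start, s.end))
        merged = []
        for span in spans:
            if (
                merged
                and span.label_ == merged[-1].label_
                and span.start - merged[-1].end <= self.max_gap
            ):
                # Extend the previous span to also cover this one.
                prev = merged.pop()
                merged.append(Span(doc, prev.start, max(prev.end, span.end), label=prev.label_))
            else:
                merged.append(span)
        doc.spans[self.spans_key] = merged
        return doc


@Language.factory("merge_nearby_spans", default_config={"spans_key": "sc", "max_gap": 3})
def create_merge_nearby_spans(nlp, name, spans_key: str, max_gap: int):
    return MergeNearbySpans(spans_key, max_gap)


# Usage: add it right after the spancat component in your trained pipeline.
# nlp = spacy.load("path/to/your/spancat_model")
# nlp.add_pipe("merge_nearby_spans", after="spancat")
```

Whether a 3-token gap is the right threshold depends entirely on your data, so treat it as something to tune against your annotations rather than a fixed rule.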
-
Hi
I am trying to create a model to detect negative language (especially related to crime or fraud).
I have data annotated in Prodigy, and my span lengths vary from 1 to 50 tokens. I tried to follow the spancat experimental project.
I need to understand a few details in order to improve my performance and accuracy (currently my F1 score ranges from 62 to 68).
I tried looking into wandb for this. Would that be helpful for finding the right hyperparameter values, including the n-gram suggester sizes?
Lastly, how many training examples would be sufficient for the model to start learning? Currently I have around 1000 examples to train on, and I have been gradually increasing the training size and retraining the model for every new batch; going from 600 to 1000 examples hasn't produced much of a significant difference.
Can you also share an example of how to use the n-gram suggester range in wandb?
I need to get an F1 score of around 85, so it would be great if you could point me in the right direction.