Spancat Model optimizing performance and accuracy #11642
Replies: 1 comment 3 replies
-
Hey @monWork, thanks for the question! This sounds like a cool applied use of SpanCat. Let me start out by saying that 62 to 68 F1 in span categorization is actually pretty good, especially considering the characteristics of your data. There are two metrics I'm using to make this assessment. The first is your span length, which hurts your scores because they are based on exact matches: if your model predicts 49 of the 50 tokens of the longest span correctly (missing 1 token), that prediction doesn't count as correct. The second is your span distinctiveness value, which is less than 1. This means the spans you've categorized use much of the same vocabulary as the surrounding text, so there's little surface-level signal for the model to latch onto.

The biggest step towards improvement, I think, is going to be refining your task and data. Do you have examples of your data you can share? That would help provide some specific suggestions on how to improve here. A general piece of advice would be to more specifically define what constitutes "negative language", in a way that leads you to annotate spans more consistently and, ideally, to annotate shorter spans (this way you mitigate the long-span problem mentioned above).

Keep in mind that with shorter spans, you can still take a post-processing step to combine categorized spans in a way that makes sense for your problem - for example, by saying that if two spans occur within some distance of each other, they should be merged into a single span by a downstream component (see the sketch at the end of this reply). Basically, don't try to solve your entire problem with one component - if you can decompose it into smaller problems that are solved more successfully individually, that will always help the overall performance of your system.

As for hyperparameter searches and WandB configurations, we've got a good FAQ on that: #10625. Generally spaCy's defaults are pretty good; I personally haven't found the need to do hyperparameter searches if I've defined the problem well. In this sense, you can interpret low scores as an opportunity to refine your task and data, and recognize that those low scores are likely not because you don't have the optimal hyperparameters - which will only get you gains of a few points after you've got quality data.
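To make the post-processing idea concrete, here is a minimal sketch of a downstream component that merges predicted spans of the same label when they sit within a few tokens of each other. It assumes your predictions live under the default `"sc"` key in `doc.spans`; the component name `merge_nearby_spans` and the `max_gap` of 3 tokens are illustrative placeholders to adjust for your own pipeline.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Span


class MergeNearbySpans:
    def __init__(self, spans_key: str, max_gap: int):
        self.spans_key = spans_key
        self.max_gap = max_gap

    def __call__(self, doc):
        # Sort predicted spans by position so neighbours are adjacent.
        spans = sorted(doc.spans.get(self.spans_key, []), key=lambda s: (s.start, s.end))
        merged = []
        for span in spans:
            if (
                merged
                and span.label_ == merged[-1].label_
                and span.start - merged[-1].end <= self.max_gap
            ):
                # Extend the previous span to also cover this one.
                prev = merged.pop()
                merged.append(Span(doc, prev.start, max(prev.end, span.end), label=prev.label_))
            else:
                merged.append(span)
        doc.spans[self.spans_key] = merged
        return doc


@Language.factory("merge_nearby_spans", default_config={"spans_key": "sc", "max_gap": 3})
def create_merge_nearby_spans(nlp, name, spans_key: str, max_gap: int):
    return MergeNearbySpans(spans_key, max_gap)


# Usage: add it right after the spancat component in your trained pipeline.
# nlp = spacy.load("path/to/your/spancat_model")
# nlp.add_pipe("merge_nearby_spans", after="spancat")
```

Whether a 3-token gap is the right threshold depends entirely on your data, so treat it as something to tune against your annotations rather than a fixed rule.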
-
Hi
I am trying to create a model to detect negative language (especially related to crime or fraud).
I have data annotated in Prodigy, and my span lengths vary from 1 to 50 tokens. I tried to follow the spancat experimental project.
I need to understand a few details in order to improve my performance and accuracy (currently my F1 score ranges from 62 to 68).
I tried looking into wandb for this. Would that be helpful for finding the right hyperparameter values, including the n-gram suggester sizes?
Lastly, how many training examples would be sufficient for the model to start learning? Currently I have around 1000 examples to train on, and I have been gradually increasing the training size and retraining the model for every new batch; going from 600 to 1000 examples hasn't produced much of a significant difference.
Can you also share an example of how to use the n-gram suggester range in wandb?
I need to get an F1 score of around 85, so it would be great if you could point me in the right direction.