Relation extraction results identical while varying dropout #8891
-
Hi everyone, I have had success building on the great spaCy project implementation of relation extraction (https://github.com/explosion/projects/tree/v3/tutorials/rel_component) to create a working version for my specific project needs. However, while doing some tuning, I noticed that my model's results are exactly the same regardless of the dropout rate (with the random seed fixed for reproducibility), even though they are sensitive to changes in other parameters in the .cfg file. Does anyone know why this may be, from either a technical or theoretical standpoint? The relevant section of my config file is as follows:
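(For context only, not the actual settings from this post: in a spaCy v3 training config, the dropout rate normally lives in the `[training]` block alongside the seed and other training parameters. A minimal illustrative sketch with placeholder values:)

```ini
[training]
seed = 0
dropout = 0.1
accumulate_gradient = 1
max_steps = 20000
eval_frequency = 200
```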
Please let me know if I can provide any other information, and thanks in advance!
-
I would also be interested in the answer to this. I have found that dropout does affect my rel model training, but not on every dataset. From a theoretical standpoint, my understanding is that dropout randomly zeroes out a proportion of the network's activations on each training update, so some weights are effectively left out of that step's adjustment. It is meant to help prevent overfitting, which often comes from over-representation of certain kinds of examples that all carry roughly the same information and so aren't individually very useful; randomly dropping part of the signal makes it easier for less-represented portions of the data, with more surprising information, to shine. To better understand your specific case, it would help to have more details about your data: how large the dataset is, how uniform or variable it is, and so on. It would also be useful to know which dropout rates you've tried (I notice more of a difference at 0.5 than at 0.1) and how you have varied the other parameters.
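Concretely, the masking idea looks roughly like this; a plain NumPy sketch of inverted dropout, not spaCy's or Thinc's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate):
    """Inverted dropout: zero a random fraction of activations and
    rescale the survivors so the expected value is unchanged."""
    if rate <= 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

acts = np.ones((4, 5))
print(dropout(acts, 0.1))  # most values kept, survivors scaled to ~1.11
print(dropout(acts, 0.5))  # roughly half zeroed, survivors scaled to 2.0
```

At 0.5 half of the signal is being thrown away on every update, which fits my observation that it makes a more visible difference than 0.1.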