Relation extraction results identical while varying dropout #8891
-
Hi everyone, I have had success building on the great spaCy project implementation of relation extraction (https://github.com/explosion/projects/tree/v3/tutorials/rel_component) to create a working version for my specific project needs. However, while doing some tuning, I noticed that my model's results are exactly the same regardless of the dropout rate (with the random seed fixed for reproducibility), even though they are sensitive to changes in other parameters in the .cfg file. Does anyone know why this may be, from either a technical or theoretical standpoint? The relevant section of my config file is as follows:
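(For context only, not the actual settings from this post: in a spaCy v3 training config, the dropout rate normally lives in the `[training]` block alongside the seed and other training parameters. A minimal illustrative sketch with placeholder values:)

```ini
[training]
seed = 0
dropout = 0.1
accumulate_gradient = 1
max_steps = 20000
eval_frequency = 200
```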
Please let me know if I can provide any other information, and thanks in advance!
-
I would also be interested in the answer to this. I have found that dropout does affect my rel model training, but not on every dataset. From a theoretical standpoint, my understanding is that dropout randomly zeroes out a proportion of the network's activations on each training update, so some weights are effectively left out of that step's adjustment. It is meant to help prevent overfitting, which often comes from over-representation of certain kinds of examples that all carry roughly the same information and so aren't individually very useful; randomly dropping part of the signal makes it easier for less-represented portions of the data, with more surprising information, to shine. To better understand your specific case, it would help to have more details about your data: how large the dataset is, how uniform or variable it is, and so on. It would also be useful to know which dropout rates you've tried (I notice more of a difference at 0.5 than at 0.1) and how you have varied the other parameters.
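Concretely, the masking idea looks roughly like this; a plain NumPy sketch of inverted dropout, not spaCy's or Thinc's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, rate):
    """Inverted dropout: zero a random fraction of activations and
    rescale the survivors so the expected value is unchanged."""
    if rate <= 0.0:
        return activations
    mask = rng.random(activations.shape) >= rate
    return activations * mask / (1.0 - rate)

acts = np.ones((4, 5))
print(dropout(acts, 0.1))  # most values kept, survivors scaled to ~1.11
print(dropout(acts, 0.5))  # roughly half zeroed, survivors scaled to 2.0
```

At 0.5 half of the signal is being thrown away on every update, which fits my observation that it makes a more visible difference than 0.1.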