There is no difference between 1 example and 150 examples when training on a pre-trained model. #8556
-
I have asked a lot of questions about this, but I still haven't solved the problem. After training on a pre-trained model with just 1 example, one of the dependencies in the tested sentence changed from "ccomp" to "dep", which was not what I expected. So I provided more than 150 examples, but the result is still the same. It seems the problem is not related to the number of examples. If that number of examples is not enough, why is just 1 example able to affect such a large pre-trained model at all? Could the "catastrophic forgetting" problem be causing this? Thank you very much!
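For reference, here is roughly what my update loop looks like (the sentence and the head/label annotations below are simplified placeholders, not my real training data):

```python
import random

import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

# Placeholder example: one head index and one dependency label per token.
TRAIN_DATA = [
    (
        "I know the place where she lives.",
        {
            "heads": [1, 1, 3, 1, 6, 6, 3, 1],
            "deps": ["nsubj", "ROOT", "det", "dobj",
                     "advmod", "nsubj", "relcl", "punct"],
        },
    ),
]

# The parser in en_core_web_sm listens to the shared tok2vec component,
# so both stay enabled; everything else is left untouched.
with nlp.select_pipes(enable=["tok2vec", "parser"]):
    optimizer = nlp.resume_training()
    for _ in range(20):
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            example = Example.from_dict(nlp.make_doc(text), annotations)
            nlp.update([example], sgd=optimizer)
```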
-
To keep things connected, here are your other issues on this topic: #8295, #8434, #8536, #8075, #7781, #7942.

As you note, you have asked this repeatedly, and I'm sorry, but I don't think we can give you more help than we already have. To train the parser you need a lot of examples, and there is no specific number that is "enough". The models are also not completely predictable, so while we can say that more data should be better, we can't explain why something happens with 1 example but not with 150.

If you want results and you aren't getting them, you should probably try getting more training data. You could also try engaging an experienced consultant. As you've found, it's difficult for us to give advice about specific models on the discussion forum, and I think you've reached the limit of what we can do here. Also keep in mind that we're doing our best to help everyone with their questions, and it's really not helpful to open so many different threads about the same topic.
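If you do keep experimenting on your own, one technique that can help with this kind of drift is "pseudo-rehearsal": annotate some raw text with the original model and mix those examples in with your new ones, so the parser keeps getting a signal for what it already knew. A minimal sketch (the raw texts and the empty `new_examples` list are placeholders for your own data):

```python
import random

import spacy
from spacy.training import Example

nlp = spacy.load("en_core_web_sm")

# Annotate raw text with the *original* model's predictions and use
# those docs as extra "revision" examples during the update.
raw_texts = [
    "The company reported strong earnings last quarter.",
    "She asked where the meeting would be held.",
]
revision_data = [Example(nlp.make_doc(doc.text), doc) for doc in nlp.pipe(raw_texts)]

new_examples = []  # your hand-labelled Example objects go here

# The parser in en_core_web_sm listens to the shared tok2vec, so keep
# both enabled while updating.
with nlp.select_pipes(enable=["tok2vec", "parser"]):
    optimizer = nlp.resume_training()
    for _ in range(20):
        batch = revision_data + new_examples
        random.shuffle(batch)
        nlp.update(batch, sgd=optimizer)
```

spaCy also has an experimental `Language.rehearse` method aimed at the same problem, but none of this changes the basic point: with this little data the behaviour will stay hard to predict.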
-
There are two sentences that are almost identical in structure. The only difference between them is that in sentence 1, 'where' is an adverb, while in sentence 2, 'who' is a pronoun. However, analyzing the two sentences with the pre-trained model 'en_core_web_sm' didn't produce the dependencies I expected, so I tried to correct them by resume-training 'en_core_web_sm'. For sentence 1, I provided just 1 training example, and the result was exactly what I expected. For sentence 2, I provided a lot of examples, but nothing changed. Even as I keep adding more (there are now 161), the result stays the same, and it seems that adding still more probably won't help. Why are the two situations so different? Why are so many examples not helpful? After all, the model 'en_core_web_sm' is very small.
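For what it's worth, this is how I compare the parses before and after training (the two sentences here are simplified stand-ins for my actual pair):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Two near-identical sentences: a relative adverb vs. a relative pronoun.
for text in [
    "This is the town where I was born.",  # 'where' is an adverb
    "This is the man who helped me.",      # 'who' is a pronoun
]:
    doc = nlp(text)
    for token in doc:
        print(f"{token.text:8} {token.dep_:10} <- {token.head.text}")
    print()
```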