Some questions on best practices for NER #7949
-
Hey, you have a lot of questions here, so I'm going to address just a few points. I also suggest you read these slides.
We need more information about the kinds of variations in format you expect to see to give advice on this. In general I do not expect that tagging the number would allow you to get meaningful labels. Things like "number of shares sold" and "number of shares purchased" also aren't really good named entities. If your documents are always formatted like
I would encourage you to consider Prodigy.
Get a working system before you worry about hyperparameters. Tuning a model is one of the last things you do to get a little more improvement out of a running system.
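As a rough sketch of what a first working system could look like when statements really do follow a fixed "label: number" layout: the layout itself is an assumption, and the regular expression and the `extract_statements` helper below are hypothetical names for illustration, not part of spaCy or Prodigy.

```python
import re

# Hypothetical pattern: a text label, a colon, then an integer (commas allowed).
# Assumes each statement sits on one line, as in the examples from the question.
STATEMENT = re.compile(
    r"(?P<label>[A-Za-z][A-Za-z ]*?)\s*:\s*(?P<value>\d[\d,]*)"
)


def extract_statements(text):
    """Return (label, value) pairs for every 'label: number' statement found."""
    pairs = []
    for match in STATEMENT.finditer(text):
        label = match.group("label").strip()
        # Drop thousands separators before converting to an integer.
        value = int(match.group("value").replace(",", ""))
        pairs.append((label, value))
    return pairs


sample = "Number of shares sold: 4500\nNumber of shares purchased: 300"
print(extract_statements(sample))
```

Something like this would likely break as soon as the layout varies, which is why the kinds of format variation matter before choosing between rules and a trained model.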
-
Hello Gurus,
I am just starting off with NLP and have been getting my feet wet over the last few weeks with spaCy and transformers. Prior to this, I completed the Deep Learning specialization on Coursera by Andrew Ng, so I do have some background. Apart from this, I have done other ML work, but not in NLP.
Our goal: To do information extraction from financial documents
Questions:
The financial documents have statements like "Number of shares sold: 4500" or "Number of shares purchased: 300". Here we are interested in extracting two pieces of information: that the document contains "Number of shares sold", and that the actual number sold is 4500. For this, we are considering one of two options:
Any advice on which approach is better or am I getting it completely wrong?
If I train with multiple documents and then see a totally new document with a different heading, say "Stocks sold" or "Stocks bought", ideally the NER should still identify these correctly, right?
When I refer to the best practices that Andrew Ng talked about in the "Improving Deep NN and Hyperparameter tuning" course, I recall that the following hyperparameters can be tuned
I trust this is the 200 that controls this? And E is epochs? I am not sure how there can be 21 epochs in 200 mini-batches. Is this because I do not have more than 200 sentences in each training example?
I did read this, but could not figure out the above.
Any help would be greatly appreciated!
Thanks,
Satya