Ah, I see what you mean now. If you print the sentence boundaries that the trained model detects, you will see that it treats every token as a sentence of its own:

>>> doc = nlp("The language will be in english")
>>> list(doc.sents)
[The, language, will, be, in, english]

The reason is that you are calling

optimizer = nlp.begin_training()

which will also reinitialize all models. As a result, the parser (which performs the sentence splitting) will predict the sentence boundaries using a zeroed-out softmax layer and will start detecting a boundary after every token. So you should remove the line that calls begin_training. Then, when you later update the pipe, you can drop the sgd parameter and the pipe will create an optimizer internally:
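As a rough sketch, a spaCy v2-style fine-tuning loop along these lines should work. The TRAIN_DATA example, the LANGUAGE label, and the entity offsets below are made up for illustration; only the structure matters:

import random
import spacy

# Hypothetical training example in spaCy v2's (text, annotations) format.
TRAIN_DATA = [
    ("The language will be in english", {"entities": [(24, 31, "LANGUAGE")]}),
]

nlp = spacy.load("en_core_web_sm")

# No nlp.begin_training() here: that would reinitialize every model in
# the pipeline, including the parser that predicts sentence boundaries.

# Disable the other pipes during training so only the NER weights are
# updated and the parser is left untouched.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != "ner"]
with nlp.disable_pipes(*other_pipes):
    losses = {}
    for _ in range(10):
        random.shuffle(TRAIN_DATA)
        for text, annotations in TRAIN_DATA:
            # No sgd argument: update creates and reuses a default
            # optimizer internally.
            nlp.update([text], [annotations], losses=losses)

Note that in v2 the optimizer is created lazily on the first update call without sgd and then reused, so optimizer state is kept between batches.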
