`debug data` counts the labels on the projectivized, aligned trees. If there are many misaligned tokens or non-projective trees (`-V` also reports counts for these), the counts may differ from what you expect. If you have a clear case where you think the counts are wrong, attach it here and we can double-check.
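
For reference, a dependency arc is projective when its head dominates every token lying between the head and the dependent. A tree is non-projective if any arc breaks this rule. The minimal Python sketch below flags such arcs. It assumes a hypothetical `heads` list where `heads[i]` is the index of token `i`'s head and the root token points to itself, and it assumes a well-formed tree with no cycles. It only illustrates the concept and is not how `debug data` computes its counts.

```python
from typing import List, Tuple


def is_ancestor(heads: List[int], anc: int, node: int) -> bool:
    """Return True if `anc` dominates `node` (or is `node` itself)."""
    while True:
        if node == anc:
            return True
        parent = heads[node]
        if parent == node:  # reached the root without meeting `anc`
            return False
        node = parent


def nonprojective_arcs(heads: List[int]) -> List[Tuple[int, int]]:
    """Return (head, dependent) pairs whose head does not dominate
    every token strictly between the head and the dependent."""
    arcs = []
    for dep, head in enumerate(heads):
        if head == dep:  # skip the root
            continue
        lo, hi = sorted((head, dep))
        for k in range(lo + 1, hi):
            if not is_ancestor(heads, head, k):
                arcs.append((head, dep))
                break
    return arcs


def is_projective(heads: List[int]) -> bool:
    return not nonprojective_arcs(heads)


# Token 0 attaches to token 2, but token 1 (the root) sits in between
# and is not dominated by token 2, so the arc 2 -> 0 is non-projective.
print(nonprojective_arcs([2, 1, 1, 1]))
```

During training, a parser that only produces projective trees has to rewrite arcs like this one, which is one reason the label counts can differ from the raw annotation.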

If each training doc contains only one sentence, the parser does not learn to split sentences. Since many training corpora provide annotation as individual sentences rather than longer documents, we recommend grouping them into paragraph-sized chunks for training. If you have the details for your training corpus, it's probably even better if you can create real paragraphs rather th…
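
The grouping step can be sketched in plain Python. This is a hypothetical helper, not part of spaCy: it assumes each sentence is a list of token strings and that your training format wants one list of words plus one boolean sentence-start flag per token. `group_sentences` and `n_sents` are illustrative names only.

```python
from typing import Dict, List


def group_sentences(
    sentences: List[List[str]], n_sents: int = 10
) -> List[Dict[str, list]]:
    """Merge consecutive sentences into chunks of up to `n_sents`,
    marking the first token of each sentence as a sentence start."""
    if n_sents < 1:
        raise ValueError("n_sents must be at least 1")
    docs = []
    for start in range(0, len(sentences), n_sents):
        words: List[str] = []
        sent_starts: List[bool] = []
        for sent in sentences[start:start + n_sents]:
            for i, word in enumerate(sent):
                words.append(word)
                sent_starts.append(i == 0)
        docs.append({"words": words, "sent_starts": sent_starts})
    return docs


sents = [["Hello", "world", "."], ["Bye", "."], ["Hi", "."]]
for doc in group_sentences(sents, n_sents=2):
    print(doc)
```

Fixed-size chunks are only an approximation. As noted above, real paragraph boundaries from the original corpus are preferable when they are available.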
