The parser also learns where to split sentences. I think what's going on is that by removing the . characters, you're removing a really strong clue about where the sentence boundaries go, so you end up with a lot of parses that are too long or too short, and more errors overall.

Instead of removing ., I'd recommend splitting . into a separate token and attaching it with punct to the previous word.
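
As a rough illustration of that preprocessing step, here is a minimal plain-Python sketch. It does not use spaCy. The token layout (dicts with "form", "head" and "dep", 0-based heads, root pointing at itself), the helper names and the nsubj/obl labels in the example are all assumptions made for illustration, so adapt them to whatever your treebank conversion actually produces.

```python
from typing import Dict, List

# Hypothetical token layout: "head" is a 0-based index into the same list,
# and the root token points at itself.
Token = Dict[str, object]


def has_fused_period(form: str) -> bool:
    """True if a single '.' is glued to the end of a longer token."""
    return len(form) > 1 and form.endswith(".") and not form[:-1].endswith(".")


def split_fused_periods(tokens: List[Token]) -> List[Token]:
    """Split a trailing '.' into its own token attached as punct to the previous word."""
    # First pass: work out where each original token lands after splitting,
    # so that existing head indices can be remapped.
    new_index = []
    pos = 0
    for tok in tokens:
        new_index.append(pos)
        pos += 2 if has_fused_period(tok["form"]) else 1

    # Second pass: emit the word, then (if needed) the separate '.' token.
    out: List[Token] = []
    for tok in tokens:
        form = tok["form"]
        fused = has_fused_period(form)
        word_pos = len(out)
        out.append({
            "form": form[:-1] if fused else form,
            "head": new_index[tok["head"]],
            "dep": tok["dep"],
        })
        if fused:
            # The new '.' token depends on the word it was split from.
            out.append({"form": ".", "head": word_pos, "dep": "punct"})
    return out


# Illustrative sentence; the labels here are assumptions, not from your data.
sentence = [
    {"form": "나는", "head": 2, "dep": "nsubj"},
    {"form": "학교에", "head": 2, "dep": "obl"},
    {"form": "갔다.", "head": 2, "dep": "ROOT"},
]
for tok in split_fused_periods(sentence):
    print(tok["form"], tok["head"], tok["dep"])
```

The sketch leaves tokens ending in more than one . alone. Abbreviations and numbers that end in a single . would still be split, so they would need extra handling in real data.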

punct and p relations are ignored by the scorer by default, but you can configure that with a custom scorer if you like.
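
To make the effect of ignoring those labels concrete, here is a small plain-Python sketch of attachment scores computed with and without the punct tokens. This is not spaCy's scorer (which computes precision/recall over sets of arcs); the function name, the (head, dep) tuple layout and the example labels are assumptions for illustration only. In spaCy v3 the corresponding setting is, as far as I know, the `ignore_labels` argument of `Scorer.score_deps`, but please check the API docs before relying on that.

```python
from typing import List, Sequence, Tuple

Arc = Tuple[int, str]  # (head index, dependency label) for one token


def attachment_scores(
    gold: List[Arc],
    pred: List[Arc],
    ignore_labels: Sequence[str] = ("p", "punct"),
) -> Tuple[float, float]:
    """Return (UAS, LAS), skipping tokens whose gold label is in ignore_labels."""
    total = uas_hits = las_hits = 0
    for (gold_head, gold_dep), (pred_head, pred_dep) in zip(gold, pred):
        if gold_dep in ignore_labels:
            continue  # this token does not count towards the score at all
        total += 1
        if gold_head == pred_head:
            uas_hits += 1
            if gold_dep == pred_dep:
                las_hits += 1
    if total == 0:
        return 0.0, 0.0
    return uas_hits / total, las_hits / total


# Four tokens: the parser mislabels token 1 and misattaches the final '.'.
gold = [(2, "nsubj"), (2, "obl"), (2, "ROOT"), (2, "punct")]
pred = [(2, "nsubj"), (2, "nsubj"), (2, "ROOT"), (1, "punct")]

default_uas, default_las = attachment_scores(gold, pred)
strict_uas, strict_las = attachment_scores(gold, pred, ignore_labels=())
print(default_uas, default_las)
print(strict_uas, strict_las)
```

With the default setting the misattached . is invisible to the score, so adding . tokens to your data should not by itself change the reported numbers. Passing an empty ignore_labels shows what a stricter custom scorer would count.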

Answer selected by kanayer
Labels: lang / ko (Korean language data and models), feat / parser (Feature: Dependency Parser), feat / scorer (Feature: Scorer)