It sounds like you should be able to do most of this with the rule-based matchers.
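As a rough illustration of what the rule-based matchers look like, here is a minimal sketch using spaCy's token `Matcher` (v3 API). The original task isn't shown in this thread, so the pattern and the `HELLO_WORLD` rule name are placeholders, not a solution to it. A blank English pipeline is assumed, so only lexical attributes such as `LOWER` and `IS_PUNCT` are available; patterns on part-of-speech tags or dependencies would need a trained pipeline.

```python
import spacy
from spacy.matcher import Matcher

# A blank pipeline only tokenizes; no trained model download is needed.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Hypothetical pattern: "hello", an optional punctuation token, then "world".
pattern = [
    {"LOWER": "hello"},
    {"IS_PUNCT": True, "OP": "?"},
    {"LOWER": "world"},
]
matcher.add("HELLO_WORLD", [pattern])

doc = nlp("Hello, world! Hello world!")

# Each match is a (match_id, start, end) tuple of token offsets into the doc.
spans = [doc[start:end].text for _, start, end in matcher(doc)]
print(spans)
```

If you need to match against long lists of exact phrases, the `PhraseMatcher` is probably a better fit, and the `DependencyMatcher` covers patterns over the parse tree.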

> Does Spacy offer methods that interface directly with really high quality, built-in corpora, like for the application above?

No. Large, high-quality corpora tend to require licensing arrangements that prevent us from redistributing them. If you want to test assertions about language on a lot of unannotated text, you have more options, such as Wikipedia.
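To make "testing an assertion on unannotated text" a bit more concrete, here is a small sketch that counts how often two competing patterns occur across a stream of raw texts. The assertion being checked ("data is" versus "data are"), the rule names, and the three in-memory sentences are all made up for illustration; in practice `texts` would be an iterator over whatever corpus you have access to, such as a Wikipedia dump you have already extracted to plain text.

```python
from collections import Counter

import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Two hypothetical rules, one per usage we want to compare.
matcher.add("DATA_SINGULAR", [[{"LOWER": "data"}, {"LOWER": "is"}]])
matcher.add("DATA_PLURAL", [[{"LOWER": "data"}, {"LOWER": "are"}]])

# Stand-in for a large unannotated corpus (illustrative sentences only).
texts = [
    "The data is stored on disk.",
    "These data are noisy.",
    "Our data is public.",
]

counts = Counter()
# nlp.pipe streams the texts through the pipeline in batches.
for doc in nlp.pipe(texts):
    for match_id, start, end in matcher(doc):
        # Map the hashed match ID back to the rule name.
        counts[nlp.vocab.strings[match_id]] += 1

print(counts)
```

Because `nlp.pipe` accepts any iterable, the same loop should work with a generator that reads the corpus from disk, so the texts never have to be held in memory at once.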

You might want to take a look at the SPIKE project from AllenAI, where the web demo integrates the query language with datasets like Wikipedia, US patents, etc.

Answer selected by svlandeg
Labels
feat / matcher Feature: Token, phrase and dependency matcher
3 participants