Hey,
Thanks for your question. Here are my thoughts on your three use cases:

  1. Unfortunately, spaCy can't read PDFs directly. You'd need to extract the text with a separate tool first and then pass that text to spaCy.
  2. Sure, you can use the pretrained models to extract the DATE entities and check their formatting. But if the goal is to catch every date format, a set of regular expressions (one per format) is probably more reliable.
  3. Finding and extracting company names is a great use case for spaCy. You can use the pretrained NER models to extract company names (the ORG entities) and compare them to see where the names differ. You can also fine-tune a model on your own data, or even train one from scratch. You can find more information in our docs.
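
To make point 2 a bit more concrete, here is a minimal sketch of the regex approach using only the Python standard library. The three formats, the `DATE_PATTERNS` table, and the `find_dates` helper are all assumptions made up for illustration (including reading `15/03/2022` as day-first), so adjust them to whatever formats actually appear in your documents.

```python
import re
from datetime import datetime

# Hypothetical formats to look for; each regex is paired with the
# strptime format used to check that the match is a real calendar date.
# Assumption: slash and dotted dates are written day-first.
DATE_PATTERNS = {
    "iso": (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
    "slash": (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),
    "dotted": (re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b"), "%d.%m.%Y"),
}


def find_dates(text):
    """Return (matched_text, format_name, is_valid) tuples in order of appearance."""
    hits = []
    for name, (pattern, fmt) in DATE_PATTERNS.items():
        for match in pattern.finditer(text):
            try:
                datetime.strptime(match.group(), fmt)
                valid = True
            except ValueError:
                # Looks like a date but isn't one, e.g. 31 February.
                valid = False
            hits.append((match.start(), match.group(), name, valid))
    hits.sort()  # order by position in the text
    return [(found, name, valid) for _, found, name, valid in hits]


if __name__ == "__main__":
    sample = "Signed 2021-03-15, renewed 15/03/2022, expires 31.02.2023."
    for found, name, valid in find_dates(sample):
        print(found, name, "ok" if valid else "invalid date")
```

Keeping the format name next to each match makes it easy to flag documents that mix formats, and the `strptime` check catches strings that match a pattern but aren't valid dates.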

I hope this was helpful!

Answer selected by thomashacker