Finding the similarity for web scraped data #9867
-
I am trying to find the similarity of documents in my application. The documents I am working on are data scraped from relevant webpages. I have a few doubts about applying this in my application. Valid and invalid data are attached for a quick view.
Sample data is attached as a file.
Replies: 1 comment
-
This is pretty hard. Technically you can train a classifier, but since there's an infinite number of things that are "not music" it's not really guaranteed to work. Maybe you can filter your incoming data using keywords like "concert"?
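A keyword filter like the one suggested above could look roughly like this. It is only a sketch: the keyword list and the `looks_like_music_event` helper are made up for illustration, and the list would need tuning against your own valid and invalid samples.

```python
import re

# Hypothetical keyword list (an assumption, not taken from the thread's data).
MUSIC_KEYWORDS = ("concert", "gig", "tour", "live music", "festival")

# One case-insensitive pattern with word boundaries, so "concert" matches
# but "concerted" does not.
_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in MUSIC_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def looks_like_music_event(text: str) -> bool:
    """Return True if the scraped text mentions at least one keyword."""
    return _PATTERN.search(text) is not None


docs = [
    "Live concert at the town hall this Friday",
    "Plumbing services available 24/7",
]
# Keep only the documents that pass the keyword check.
kept = [d for d in docs if looks_like_music_event(d)]
```

A filter like this is cheap to run before any classifier, so it is easy to compare the two on the same data.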
It would make sense for your model to be better, but you should try both and see since it shouldn't be difficult to do so.
You can just check if the data at the URL changed, right? That's the first thing I would check. In spaCy you could make an NER entity for change/cancel events I guess, but I would first see how effective a simple keyword check is.
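As a rough sketch of that first check, you could hash the scraped text on each run and compare it with the previous hash, then look for change/cancel words in the new text. The `fingerprint` and `check_page` helpers and the keyword list are assumptions for illustration, and this uses only the standard library, not spaCy.

```python
import hashlib

# Hypothetical list of words that signal a change or cancellation.
CHANGE_KEYWORDS = ("cancelled", "canceled", "postponed", "rescheduled")


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and lowercasing,
    so trivial formatting differences do not count as a change."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def check_page(previous_text: str, current_text: str):
    """Return (changed, hits): whether the content differs from the last
    scrape, and which change keywords appear in the current text."""
    changed = fingerprint(previous_text) != fingerprint(current_text)
    lowered = current_text.lower()
    hits = [k for k in CHANGE_KEYWORDS if k in lowered]
    return changed, hits


old = "Concert on May 3 at the town hall"
new = "Concert on May 3 at the town hall has been CANCELLED"
changed, hits = check_page(old, new)
```

In practice you would store the previous fingerprint per URL instead of the full text. If the substring check turns out to be too noisy, that would be the point to try something heavier such as a spaCy component.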