What context do you want to predict malware names in?

spaCy's NER is designed to detect things like the names of people, organizations, or places in newspaper articles. So the main features are token features (properties of the actual tokens to label, like being upper case, or being words like "John" or "LLC") and context features ("Today [XXX] said...", "Recently [XXX] was acquired by [YYY] ..."). This general model of entities is effective in many contexts, but not for the samples you've given here.
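To make the distinction between token features and context features concrete, here is a rough sketch in plain Python. It is not how spaCy actually computes its features (spaCy uses a neural model), and the function names, the window size, and the sample names "Acme", "Globex", and "SomeMalwareName" are all made up for illustration. It only shows that a token inside a sentence has neighbours to draw on, while a lone token has none.

```python
def token_features(token):
    """Features taken from the token itself (illustrative, not spaCy's real feature set)."""
    return {
        "lower": token.lower(),
        "is_title": token.istitle(),  # e.g. "John"
        "is_upper": token.isupper(),  # e.g. "LLC"
        "suffix3": token[-3:].lower(),
    }


def context_features(tokens, i, window=2):
    """Features taken from neighbouring tokens; None marks a missing neighbour."""
    feats = {}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue  # skip the token being labeled
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else None
    return feats


def count_context(tokens, i, window=2):
    """Number of neighbouring tokens actually available around position i."""
    return sum(v is not None for v in context_features(tokens, i, window).values())


# Hypothetical inputs: a name inside a sentence vs. a name on its own.
sentence = "Recently Acme was acquired by Globex".split()
lone_name = ["SomeMalwareName"]

print(token_features(sentence[1]))
print(context_features(sentence, 1))
print(count_context(sentence, 1), count_context(lone_name, 0))
```

With a bare name or URL as input, every context slot comes back as `None`, so only the token's own properties are left to work with.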

If you have just a list of names and want to classify malware, or if you have just a URL, those are going to be single tokens, and spaCy doesn't really have enough information to make a reasonable…

Answer selected by adrianeboyd
Labels
feat / ner Feature: Named Entity Recognizer perf / accuracy Performance: accuracy