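Clustering data is basically its own research field, and there are many, many approaches. A simple one you can get started with is k-means. (KNN, k-nearest neighbors, is a supervised classifier that needs labeled examples, so it isn't a clustering method on its own.)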
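As a rough illustration of what a simple clustering approach looks like, here is a minimal k-means sketch in plain Python. It is not spaCy functionality: the `kmeans` helper and the toy 2-D `vectors` are made up for the example, and in practice the inputs would be whatever vectors you already have (for instance `doc.vector` or `token.vector` from a pipeline that ships vectors). A library implementation would usually be a better choice for real data.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(points, k, iterations=20, seed=0):
    """Hypothetical helper: plain k-means with random initial centroids."""
    rng = random.Random(seed)
    # Start from k distinct input points as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean(members)
    return assignments, centroids


# Toy data (an assumption for the example): two well-separated groups.
vectors = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
labels, centers = kmeans(vectors, k=2)
print(labels)
```

Note that you have to pick `k` yourself, and the cluster ids are arbitrary integers, not meaningful names.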

There are techniques to generate labels for clusters, but that kind of natural language generation (NLG) is out of scope for spaCy.

About your approach more broadly, it's not clear to me what you're trying to do or why it would be useful. Can you give a more concrete example? More practically speaking, while I can see how a human could do this, how would you decide whether one solution is better or worse than another? If you can't tell the computer whether it's doing a good job, there's no real way to train a model for this.
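To make the "better or worse" question concrete, one common intrinsic measure for a clustering is the silhouette coefficient, which compares how close each point is to its own cluster versus the nearest other cluster. The sketch below is only an illustration: `silhouette_score` here is a hand-written helper, and the toy points and both labelings are invented. A high score says the clusters are compact and well separated; it does not say they are useful for your actual task, which still needs a task-specific definition of "good".

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Hypothetical helper: mean silhouette coefficient over all points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # By convention, a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so divide by len - 1).
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for other_label, other in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (an assumption for the example): two obvious groups.
points = [
    [0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the obvious grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good = silhouette_score(points, good_labels)
bad = silhouette_score(points, bad_labels)
print(good, bad)
```

If you have even a small set of examples grouped by hand, comparing predicted clusters against those is usually more informative than an intrinsic score alone.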

Answer selected by svlandeg
Labels
feat / vectors Feature: Word vectors and similarity
2 participants