RFE: vectors in different languages should be aligned #5888

asterbini · 2020-08-06T12:43:39Z

asterbini
Aug 6, 2020

Feature description

When working with several languages, it would be very useful to compare word vectors across them as a way of comparing the words' "meaning".
However, the different models seem to use vector spaces that are not aligned, so vectors for words with the same meaning are normally not similar.

It would be nice if:

all models released by spaCy are aligned, so that lemmas with similar meanings have similar vectors,
or else every model ships with a transformation matrix that maps its vector space into a common one (e.g. that of 'en').
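
To make the second option concrete, here is a minimal toy sketch of what such a matrix would do. Everything in it is an assumption for illustration: the 2-D vectors, the word pair, and the name `W_it_to_en` are hypothetical and do not come from any released spaCy model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy 2-D "English" vector for one word (made-up numbers).
en_dog = np.array([1.0, 0.0])

# Toy "Italian" space: the same geometry, but rotated by 90 degrees,
# which mimics two models trained independently of each other.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
it_cane = R @ en_dog

# Without alignment the two vectors look unrelated.
raw_similarity = cosine(en_dog, it_cane)

# Hypothetical per-model matrix mapping the 'it' space into the 'en' space.
# In this toy case it is simply the inverse rotation.
W_it_to_en = R.T
aligned_similarity = cosine(en_dog, W_it_to_en @ it_cane)

print(f"raw: {raw_similarity:.3f}  aligned: {aligned_similarity:.3f}")
```

In this toy setup the raw similarity is essentially zero and the aligned one is essentially one; real models would of course only be approximately related by such a map.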

honnibal · 2020-08-07T14:01:25Z

honnibal
Aug 7, 2020
Maintainer

Agree that this would be nice but it would mean training the various models jointly, and it may impact the accuracy of specific models (especially models for smaller languages). It's also a big task to set up evaluation of the alignment.

If someone wants to look at this as a research project and they can get convincing results, we'd consider adopting it. But I think it seems like quite an ambitious project that won't yield results that are that compelling in practice.

Instead, if your specific project requires this, you'll want to train your own vector models.

0 replies

asterbini · 2020-08-07T15:35:35Z

asterbini
Aug 7, 2020
Author

Right, I'll look for alignment algorithms or for how to produce the transformation matrices to align two models.
(https://arxiv.org/abs/1804.07745)
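
As a rough idea of how such a transformation matrix could be produced, below is a sketch of orthogonal Procrustes alignment, a common baseline for this problem. It is not necessarily the method of the paper linked above. It assumes a seed dictionary of translation pairs is available and that row i of `src` and row i of `tgt` hold the vectors of the i-th pair; here both matrices are synthetic, and the function name `learn_orthogonal_map` is made up for the example.

```python
import numpy as np

def learn_orthogonal_map(src, tgt):
    """Return the orthogonal matrix W minimising ||src @ W - tgt||_F.

    src, tgt: arrays of shape (n_pairs, dim); row i of each is the vector
    of the i-th translation pair in its own language's space.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt

# Synthetic stand-in for two unaligned models: the "target" space is a
# random rotation of the "source" space plus a little noise.
rng = np.random.default_rng(0)
dim, n_pairs = 50, 200
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = rng.normal(size=(n_pairs, dim))
tgt = src @ true_rotation + 0.01 * rng.normal(size=(n_pairs, dim))

W = learn_orthogonal_map(src, tgt)
mapped = src @ W  # source vectors expressed in the target space

# Mean cosine similarity between mapped source vectors and their targets.
cosines = np.sum(mapped * tgt, axis=1) / (
    np.linalg.norm(mapped, axis=1) * np.linalg.norm(tgt, axis=1)
)
mean_cosine = float(cosines.mean())
print(f"mean cosine after alignment: {mean_cosine:.4f}")
```

Restricting W to be orthogonal keeps distances and similarities within the source language unchanged, which is one reason this baseline is popular. With real models the rows would be looked up from each model's vectors for the dictionary pairs, and the fit would be far less clean than on this synthetic data.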

0 replies

asterbini · 2020-08-07T18:28:38Z

asterbini
Aug 7, 2020
Author

I have found the MUSE project from Facebook Research: https://github.com/facebookresearch/MUSE

0 replies
