Cosine similarity: Top PCA component vs she-he #7

Hello Debiaswe Research Team,

Thank you for making the code related to your paper available. This is very helpful! I am writing to seek clarification on analyzing gender bias in word vectors associated with professions.

In your paper, you suggest using the cosine similarity between a given profession vector and the top PCA component. I am trying to replicate this in the wiki context. Unfortunately, I am getting results opposite to what I expected.

For example, when I compute the cosine similarity between the _waitress_ vector (or the _nurse_ vector) and the top gender principal component, I get a negative score. However, when I compute the cosine similarity between the same profession vector and the _she_ - _he_ vector (as you show in the example [here](https://github.com/tolga-b/debiaswe/blob/master/tutorial_example1.ipynb)), I get a positive score.
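To make the comparison concrete, here is a minimal sketch on synthetic vectors. It does not use the actual embeddings or the debiaswe code. The helper names `cosine`, `top_component`, and `orient` are hypothetical, and it assumes the top component comes from an SVD of definitional pairs centered at each pair's midpoint. One relevant fact is that a principal component is only defined up to sign, so the sketch also shows one possible way to orient it against _she_ - _he_ before comparing scores.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def top_component(pairs):
    """First principal direction of pair members centered at each pair's midpoint."""
    rows = []
    for a, b in pairs:
        mid = (a + b) / 2.0
        rows.append(a - mid)
        rows.append(b - mid)
    # Right-singular vectors of the stacked rows are the principal directions.
    _, _, vt = np.linalg.svd(np.array(rows), full_matrices=False)
    return vt[0]

def orient(component, reference):
    """Flip the component if it points away from the reference direction."""
    return component if np.dot(component, reference) >= 0 else -component

# Synthetic stand-ins for real word vectors (assumption: not real embeddings).
rng = np.random.default_rng(0)
dim = 50
gender = rng.normal(size=dim)
gender /= np.linalg.norm(gender)

def make_pair():
    """Return a (female, male) pair that differs mainly along `gender`."""
    base = rng.normal(size=dim)
    female = base + gender + 0.05 * rng.normal(size=dim)
    male = base - gender + 0.05 * rng.normal(size=dim)
    return female, male

pairs = [make_pair() for _ in range(10)]
she, he = pairs[0]
# A synthetic "female-leaning" profession vector.
waitress = 0.3 * rng.normal(size=dim) + gender

she_he = she - he
pc = top_component(pairs)          # sign is arbitrary
pc_oriented = orient(pc, she_he)   # sign fixed to agree with she - he

print("cos(waitress, she - he):     ", cosine(waitress, she_he))
print("cos(waitress, pc):           ", cosine(waitress, pc))
print("cos(waitress, pc_oriented):  ", cosine(waitress, pc_oriented))
```

In this sketch the raw and oriented scores have the same magnitude and differ at most in sign, so which one matches _she_ - _he_ depends only on the orientation the decomposition happened to return.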

I am confused about why the sign flips between the top PCA component and the plain _she_ - _he_ gender vector. I would appreciate your help.

Thank you!
sbs
