This repository was archived by the owner on Nov 8, 2022. It is now read-only.

Commit ee12307

Crosslingual Embeddings Refactored (#246)
* Refactored code
* Changed as requested in refactor
* Changed MD to RST, added function documentation, fixed paths
* Style fixes
* Added citations
* Corrected Citations
* Fixed citations again
* Style fix
* Refactored code
* Changed as requested in refactor
* Changed MD to RST, added function documentation, fixed paths
* Style fixes
* Added citations
* Corrected Citations
* Fixed citations again
* Style fix
* Updated crosslingual_emb.rst
1 parent 117c2dd commit ee12307

8 files changed

+1147
-0
lines changed

doc/source/assets/w2w.png

36.5 KB

doc/source/crosslingual_emb.rst

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
Unsupervised Crosslingual Embeddings
2+
####################################
3+
4+
Overview
5+
========
6+
This model uses a GAN to learn a mapping between the word embedding spaces of two languages without supervision, as demonstrated in *Word Translation Without Parallel Data* [1]_.
7+
8+
.. image:: assets/w2w.png
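
To make "a mapping between two embedding spaces" concrete, here is a minimal NumPy sketch. It does not reproduce the GAN in ``crossling_emb.py``. As a supervised stand-in it uses the closed-form orthogonal Procrustes solution on synthetic vectors that are already paired; the function name ``procrustes_mapping`` and all data are hypothetical.

```python
import numpy as np


def procrustes_mapping(src, tgt):
    """Return the orthogonal W minimizing ||src @ W.T - tgt||_F.

    Rows of src and tgt are assumed to be aligned word pairs.
    """
    # SVD of the cross-covariance; W = U V^T is the closest orthogonal map.
    u, _, vt = np.linalg.svd(tgt.T @ src)
    return u @ vt


# Synthetic data: the "target language" is a rotated copy of the source space.
rng = np.random.default_rng(0)
dim = 5
true_map, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
src = rng.normal(size=(200, dim))
tgt = src @ true_map.T

w = procrustes_mapping(src, tgt)
mapped = src @ w.T  # source vectors expressed in the target space
```

In the unsupervised setting no aligned pairs are available, which is why the model instead trains the mapping adversarially.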
9+
10+
11+
Files
12+
=====
13+
- **nlp_architect/data/fasttext_emb.py**: Defines the FastText object for loading fastText embeddings [2]_
14+
- **nlp_architect/models/crossling_emb.py**: Defines GAN for learning crosslingual embeddings
15+
- **examples/crosslingembs/train.py**: Trains the model and writes final crosslingual embeddings to weight_dir directory.
16+
- **examples/crosslingembs/evaluate.py**: Defines graph for evaluating the quality of crosslingual embeddings
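
For orientation, fastText distributes its text embeddings as ``.vec`` files: a header line with the vocabulary size and dimension, followed by one word and its vector per line. The sketch below parses that format with the standard library only. The helper ``read_vec`` is hypothetical and is not the loader defined in ``fasttext_emb.py``.

```python
import io


def read_vec(handle, max_words=None):
    """Parse fastText's text (.vec) format into (dim, {word: vector})."""
    header = handle.readline().split()
    dim = int(header[1])  # header is "<vocab size> <dimension>"
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        word, values = parts[0], parts[1:]
        if len(values) != dim:
            continue  # skip malformed rows
        vectors[word] = [float(v) for v in values]
        if max_words is not None and len(vectors) >= max_words:
            break
    return dim, vectors


# Tiny in-memory example standing in for a downloaded .vec file.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nbonjour 0.4 0.5 0.6\n")
dim, vectors = read_vec(sample)
```

Passing ``max_words`` keeps only the first rows of the file, which is useful because ``.vec`` files list words by decreasing frequency.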
17+
18+
Usage
19+
=====
20+
The main arguments to pass to ``train.py`` are:
21+
22+
- **emb_dir**: Directory where fasttext embeddings are present or need to be downloaded
23+
- **eval_dir**: Directory where evaluation dictionary is downloaded
24+
- **weight_dir**: Directory where the mapping weights and final crosslingual embeddings are written
25+
26+
Use the following command to run training and generate the crosslingual embeddings file:
27+
28+
.. code:: bash
29+
30+
python train.py --data_dir <embedding dir> --eval_dir <evaluation data> \
31+
--weight_dir <save_data> --epochs 1
32+
33+
Example Usage
34+
---------------
35+
36+
Create directories for storing the downloaded embeddings and the multi-language evaluation dictionaries:
37+
38+
.. code:: bash
39+
40+
mkdir data
41+
mkdir -p ./data/crosslingual/dictionaries
42+
43+
Run training, pointing it to the embedding directory and the multi-language evaluation dictionaries. After training, the mapping weights and the new crosslingual embeddings are stored in ``weight_dir``:
44+
45+
.. code:: bash
46+
47+
python train.py --data_dir ./data --eval_dir ./data/crosslingual/dictionaries --weight_dir ./
48+
49+
Results
50+
=======
51+
52+
When trained on English and French embeddings, word-to-word translation accuracy (precision at K, in percent) with nearest-neighbor (NN) and CSLS retrieval is as follows:
53+
54+
.. csv-table::
55+
:header: "Eval Method", "K=1", "K=10"
56+
:widths: 25, 20, 20
57+
:escape: ~
58+
59+
NN,53.0,74.13
60+
CSLS,81.0,93.0
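
The two retrieval methods in the table can be sketched as follows. NN ranks target words by cosine similarity. CSLS, as defined in [1]_, subtracts each word's mean similarity to its k nearest neighbors in the other language, which penalizes "hub" words that are close to everything. This is an illustrative NumPy sketch on synthetic data. The function names are hypothetical and it is not the graph defined in ``evaluate.py``.

```python
import numpy as np


def cosine_scores(src, tgt):
    """Cosine similarity between every source row and every target row."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    return src @ tgt.T


def csls_scores(src, tgt, k=10):
    """CSLS(x, y) = 2 cos(x, y) - r_T(x) - r_S(y)."""
    cos = cosine_scores(src, tgt)
    k_tgt = min(k, cos.shape[1])
    k_src = min(k, cos.shape[0])
    # r_T(x): mean similarity of each source word to its k nearest targets.
    r_t = np.sort(cos, axis=1)[:, -k_tgt:].mean(axis=1, keepdims=True)
    # r_S(y): mean similarity of each target word to its k nearest sources.
    r_s = np.sort(cos, axis=0)[-k_src:, :].mean(axis=0, keepdims=True)
    return 2 * cos - r_t - r_s


def precision_at_k(scores, gold, k):
    """Percentage of source words whose gold target is in the top k."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = [gold[i] in top_k[i] for i in range(len(gold))]
    return 100.0 * float(np.mean(hits))


# Synthetic check: mapped source vectors are noisy copies of their translations.
rng = np.random.default_rng(0)
tgt = rng.normal(size=(50, 8))
src = tgt + 0.01 * rng.normal(size=tgt.shape)
gold = np.arange(50)  # source word i translates to target word i

nn = cosine_scores(src, tgt)
csls = csls_scores(src, tgt, k=10)
nn_p1 = precision_at_k(nn, gold, 1)
csls_p1 = precision_at_k(csls, gold, 1)
csls_p10 = precision_at_k(csls, gold, 10)
```

The synthetic numbers say nothing about real language pairs. The sketch only shows how the K=1 and K=10 columns would be computed from a score matrix.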
61+
62+
63+
References
64+
==========
65+
.. [1] Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou. Word Translation Without Parallel Data. https://arxiv.org/pdf/1710.04087.pdf
66+
.. [2] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov. Enriching Word Vectors with Subword Information. https://arxiv.org/abs/1607.04606

doc/source/index.rst

100644 → 100755
Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,7 @@ The library contains state-of-art and novel NLP and NLU models in a varity of to
5757
- NER and NE expansion
5858
- Text chunking
5959
- Reading comprehension
60+
- Crosslingual Embeddings
6061
- Supervised sentiment analysis
6162

6263

@@ -118,6 +119,7 @@ on this project, please see the :doc:`developer guide <developer_guide>`.
118119
np2vec.rst
119120
supervised_sentiment.rst
120121
tcn.rst
122+
Unsupervised Crosslingual Embeddings <crosslingual_emb.rst>
121123

122124
.. toctree::
123125
:hidden:

examples/crosslingembs/README.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
# UNSUPERVISED CROSSLINGUAL EMBEDDINGS
2+
This model learns crosslingual embeddings in an unsupervised manner using GANs, as demonstrated in
3+
Word Translation Without Parallel Data by Alexis Conneau et al.
4+
5+
6+
Use the following command to run training and generate the crosslingual embeddings file:
7+
8+
```python train.py --data_dir <embedding dir> --eval_dir <evaluation data> --weight_dir <save_data> --epochs 1```
9+
10+
Example Usage
11+
```mkdir data```
12+
```mkdir -p ./data/crosslingual/dictionaries```
13+
```python train.py --data_dir ./data --eval_dir ./data/crosslingual/dictionaries --weight_dir ./```
14+
15+
Citations
16+
---------
17+
1. Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou. Word Translation Without Parallel Data. https://arxiv.org/pdf/1710.04087.pdf
18+
2. P. Bojanowski, E. Grave, A. Joulin, T. Mikolov. Enriching Word Vectors with Subword Information. https://arxiv.org/abs/1607.04606
19+
