You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* Upgrade libraries and use KeyedVectors to load word vectors
* Use gensim native saved vectors instead
* Added tests and CI
* Updated with links to new word embeddings and some code cleaning
Neural ParsCit is a citation string parser which parses reference strings into its component tags such as Author, Journal, Location, Date, etc. Neural ParsCit uses Long Short Term Memory (LSTM), a deep learning model to parse the reference strings. This deep learning algorithm is chosen as it is designed to perform sequence-to-sequence labeling tasks such as ours. Input to the model are word embeddings which are vector representation of words. We provide word embeddings as well as character embeddings as input to the network.
4
6
5
7
@@ -15,14 +17,20 @@ source .venv/bin/activate
15
17
pip install -r requirements.txt
16
18
```
17
19
20
+
### Word Embeddings
21
+
22
+
The word embeddings does not come with this repository. You can obtain the [word embeddings](http://wing.comp.nus.edu.sg/~wing.nus/resources/NParsCit/vectors.tar.gz) and the [word frequency](http://wing.comp.nus.edu.sg/~wing.nus/resources/NParsCit/freq) from WING website.
23
+
24
+
You will need to extract the content of the word embedding archive (`vectors.tar.gz`) to the root directory for this repository by running `tar xfz vectors.tar.gz`.
@@ -50,10 +58,7 @@ There are many parameters you can tune (CRF, dropout rate, embedding dimension,
50
58
51
59
Input files for the training script have to follow the following format: each word of the citation string and its corresponding tag has to be on a separate line. All citation strings must be separated by a blank line.
52
60
53
-
54
-
If you want to use the word embeddings trained on ACM refrences, and the freq., please download from WING homepage: http://wing.comp.nus.edu.sg/?page_id=158 (currently not avaible due to space issue, mail animesh@comp.nus.edu.sg, animeshprasad3@gmail.com for a copy)
55
-
56
-
Details about the training data, experiments can be found in the following article. Traning data and CRF baseline can be downloaded from https://github.com/knmnyn/ParsCit. Please consider citing following piblication(s) if you use Neural ParsCit:
61
+
Details about the training data, experiments can be found in the following article. Training data and CRF baseline can be downloaded from https://github.com/knmnyn/ParsCit. Please consider citing following publication(s) if you use Neural ParsCit:
57
62
```
58
63
@article{animesh2018neuralparscit,
59
64
title={Neural ParsCit: A Deep Learning Based Reference String Parser},
0 commit comments