Commit 72b8433

Russell Stewart authored and committed
Added unk vectors to wordvec downloads
1 parent f33bf82 commit 72b8433

1 file changed (+4, -4 lines)

README.md

Lines changed: 4 additions & 4 deletions
@@ -15,10 +15,10 @@ Pre-trained word vectors are made available under the <a href="http://opendataco
 and License</a>
 <div class="entry">
 <ul style="padding-left:0px; margin-top:0px; margin-bottom:0px">
-<li> <a href="http://dumps.wikimedia.org/enwiki/20140102/">Wikipedia 2014</a> + <a href="https://catalog.ldc.upenn.edu/LDC2011T07">Gigaword 5</a> (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, &amp; 300d vectors, 822 MB download): <a href="http://nlp.stanford.edu/data/glove.6B.zip">glove.6B.zip</a> </li>
-<li> Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): <a href="http://nlp.stanford.edu/data/glove.42B.300d.zip">glove.42B.300d.zip</a> </li>
-<li> Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): <a href="http://nlp.stanford.edu/data/glove.840B.300d.zip">glove.840B.300d.zip</a> </li>
-<li> Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, &amp; 200d vectors, 1.42 GB download): <a href="http://nlp.stanford.edu/data/glove.twitter.27B.zip">glove.twitter.27B.zip</a> Ruby <a href="preprocess-twitter.rb">script</a> for preprocessing Twitter data </li>
+<li> <a href="http://dumps.wikimedia.org/enwiki/20140102/">Wikipedia 2014</a> + <a href="https://catalog.ldc.upenn.edu/LDC2011T07">Gigaword 5</a> (6B tokens, 400K vocab, uncased, 50d, 100d, 200d, &amp; 300d vectors, 822 MB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.6B.zip">glove.6B.zip</a> </li>
+<li> Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.42B.300d.zip">glove.42B.300d.zip</a> </li>
+<li> Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.840B.300d.zip">glove.840B.300d.zip</a> </li>
+<li> Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 25d, 50d, 100d, &amp; 200d vectors, 1.42 GB download): <a href="http://nlp.stanford.edu/data/wordvecs/glove.twitter.27B.zip">glove.twitter.27B.zip</a> Ruby <a href="preprocess-twitter.rb">script</a> for preprocessing Twitter data </li>
 </ul>
 </div>
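
This hunk moves every GloVe archive from `http://nlp.stanford.edu/data/<file>` to `http://nlp.stanford.edu/data/wordvecs/<file>`. The sketch below, assuming Python 3.9+, shows how a download script that hardcoded the old links might be updated. Both helpers are hypothetical and are not shipped with GloVe: `migrate_url` rewrites an old-style link to the `/data/wordvecs/` path, and `parse_glove_line` reads one line of an extracted vectors file under the assumption that each line holds a token followed by `dim` space-separated floats.

```python
from urllib.parse import urlsplit, urlunsplit

# Path prefixes before and after this commit (taken from the diff above).
OLD_PREFIX = "/data/"
NEW_PREFIX = "/data/wordvecs/"


def migrate_url(url: str) -> str:
    """Hypothetical helper: point a pre-commit GloVe link at /data/wordvecs/.

    URLs already under /data/wordvecs/, or outside /data/, are returned unchanged.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.startswith(OLD_PREFIX) and not path.startswith(NEW_PREFIX):
        path = NEW_PREFIX + path[len(OLD_PREFIX):]
    return urlunsplit(parts._replace(path=path))


def parse_glove_line(line: str, dim: int) -> tuple[str, list[float]]:
    """Hypothetical helper: split one vectors-file line into (token, vector).

    Assumes a token followed by `dim` space-separated floats. The vector is
    taken from the right-hand end, so a token that itself contains spaces
    is rejoined rather than misparsed.
    """
    fields = line.rstrip().split(" ")
    if len(fields) < dim + 1:
        raise ValueError(f"expected at least {dim + 1} fields, got {len(fields)}")
    token = " ".join(fields[:-dim])
    vector = [float(x) for x in fields[-dim:]]
    return token, vector


if __name__ == "__main__":
    # Prints http://nlp.stanford.edu/data/wordvecs/glove.6B.zip
    print(migrate_url("http://nlp.stanford.edu/data/glove.6B.zip"))
    # Prints ('the', [0.1, -0.2, 0.3])
    print(parse_glove_line("the 0.1 -0.2 0.3", 3))
```

Splitting from the right keeps the parser independent of what characters appear in the token; the only thing it relies on is knowing `dim` (for example 300 for `glove.42B.300d.zip`).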