README.md: 22 additions & 4 deletions
@@ -1,8 +1,7 @@
# Discovering spelling variants on Urban Dictionary
+Source code of the paper [How to Evaluate Word Representations of Informal Domain?](https://arxiv.org/abs/1911.04669)

-
-Scraping Urban Dict through website and API :bamboo:
--------------
+## Scraping data from [Urban Dictionary](https://www.urbandictionary.com/):bamboo:

* Scraping data from webpage:
```diff
@@ -13,5 +12,24 @@ Scraping Urban Dict through website and API :bamboo:
```diff
+ scrapy crawl UD_API
```
+## Bootstrapping algorithms
+`UD_Extractor/`
+
+## Self-training-based CRF tagging
+`SeqLabeling/`
+
+## Embedding pretraining with Tweets
+Train Word2Vec, FastText, and GloVe embeddings on tweet data.
+`trainEmbedding/`
+
+## Twitter hashtag prediction task using pretrained embedding
+Use the Twitter hashtag prediction downstream task, with the pretrained informal word vectors above, as the extrinsic evaluation.
+`HashtagPrediction/`
+
+## Analysis
+Use Mean Average Precision (MAP) as the intrinsic evaluation metric on the word analogy task. Compare the correlations between the intrinsic and extrinsic tasks.
+`calcSim`
+
+## Web interface
+Informal word pair search tool, written in Flask: `demo/`

-Source code of the paper [How to Evaluate Word Representations of Informal Domain?](https://arxiv.org/abs/1911.04669)
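
The Analysis section added in this diff names Mean Average Precision (MAP) as the intrinsic metric. As a rough illustration of how MAP is commonly computed, here is a minimal Python sketch. It is not the repository's `calcSim` code: the function names, the `rankings`/`gold` layout, and the example word pairs are all hypothetical, and each ranked list is assumed to contain no duplicates.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k over the ranks k that hold a relevant item.

    Assumes `ranked` has no duplicate candidates (an assumption of this sketch).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by the number of gold items penalizes relevant items never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(rankings, gold):
    """MAP: average AP over every query in `gold`; a query with no ranking scores 0."""
    if not gold:
        return 0.0
    total = sum(average_precision(rankings.get(query, []), rel)
                for query, rel in gold.items())
    return total / len(gold)


# Hypothetical example: ranked candidate spelling variants per query word.
rankings = {
    "tomorrow": ["2morrow", "today", "tmrw"],
    "you": ["u", "ya"],
}
# Hypothetical gold variants for each query word.
gold = {
    "tomorrow": {"2morrow", "tmrw"},
    "you": {"u"},
}

print(f"MAP = {mean_average_precision(rankings, gold):.4f}")  # MAP = 0.9167
```

For "tomorrow" the relevant items sit at ranks 1 and 3, so AP is (1/1 + 2/3) / 2 = 5/6. For "you" AP is 1, so MAP is 11/12.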