-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathFrom scratch.txt
More file actions
45 lines (23 loc) · 1.25 KB
/
From scratch.txt
File metadata and controls
45 lines (23 loc) · 1.25 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
1. Download the dataset with the command:
curl -O http://dblp.uni-trier.de/xml/dblp.xml.gz
2. Put the downloaded dataset into dblpdataset folder
mv dblp.xml.gz dbldataset/
3. Uncompress dataset
gzip -d dblp.xml.gz
5. Installing libraries to use
sudo pip install -r data-crawler/requirements.txt
sudo pip install -r preprocessing/requirements.txt
4. Generate symbolic links in dblpdataset and preprocessing
cd dblpdataset
mkdir v<N> #Donde N es el número de version
ln -s v<N> current_version #Donde N es el número de verison
mkdir current_version/minimal
cd ../preprocessing/Pablo:PlotTokenFrecuencies
ln -s ../../dblpdataset/current_version/ data
4. Generate the csv dataset with the command inside of data-crawler/example.sh
5. Minizming dataset
(head -10001 dblpdataset/current_version/DBLP.csv) > dblpdataset/current_version/minimal/DBLP.min10000.csv
python data-crawler/generateMinimalMapping.py -d dblpdataset/current_version/minimal/DBLP.min10000.csv -m dblpdataset/current_version/MapIdsToDBLP.csv -o
dblpdataset/current_version/minimal/MapIdsToDBLP.min10000.csv --verbose
5. Generate the csv tokens frequency with the command inside of preprocessing/example.sh
6. Visualize term-frequency charts in: http://khrno.cl/Plot