Commit 631f9dd

Update readme
1 parent b64ba1d commit 631f9dd

1 file changed (+21 -12 lines changed)

README.md

Lines changed: 21 additions & 12 deletions
@@ -1,26 +1,35 @@
-Scripts for data preparation.
+# Scripts for extracting tables
 
 Dependencies:
 * [jq](https://stedolan.github.io/jq/) (`sudo apt install jq`)
- * docker
+ * docker (run without `sudo`)
+ * [conda](https://www.anaconda.com/distribution/)
 
 Directory structure:
 ```
 .
-├── data
-│   ├── annotations
-│   │   └── evaluation-tables.json.gz # current annotations
-│   └── arxiv
-│       ├── sources # gzip archives with e-prints
-│       ├── unpacked_sources # automatically extracted latex sources
-│       └── htmls # automatically generated htmls
-└── prepare-data
+└── data
+    ├── annotations
+    │   └── evaluation-tables.json.gz # current annotations
+    └── arxiv
+        ├── sources # gzip archives with e-prints
+        ├── unpacked_sources # automatically extracted latex sources
+        ├── htmls # automatically generated htmls
+        ├── htmls-clean # htmls fixed by chromium
+        └── tables # extracted tables
 ```
 
 
 To preprocess data and extract tables, run:
-```cd prepare-data
+```
 conda env create -f environment.yml
 source activate xtables
-make -j 8 -i extract_all```
+make -j 8 -i extract_all > stdout.log 2> stderr.log
+```
 where `8` is number of jobs to run simultaneously.
+
+## Test
+To test the whole extraction on a single file run
+```
+make test
+```
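
A note on the new `make -j 8 -i extract_all > stdout.log 2> stderr.log` line: `-i` tells make to keep going when a recipe fails, so with both streams redirected, failures show up only in `stderr.log`. The sketch below illustrates that redirection pattern and a quick way to summarize the two logs. `run_extraction`, its messages, and the temporary directory are hypothetical stand-ins for the real make invocation and its output, which this commit does not show.

```shell
# Hypothetical stand-in for `make -j 8 -i extract_all`:
# progress goes to stdout, failures go to stderr, and the run continues.
run_extraction() {
    echo "extracted paper-1"
    echo "error: paper-2 failed" >&2
    echo "extracted paper-3"
}

# Assumption: a scratch directory is used here so the sketch has no side effects.
workdir=$(mktemp -d)

# Same redirection as in the README: stdout and stderr go to separate files.
run_extraction > "$workdir/stdout.log" 2> "$workdir/stderr.log"

# Summarize both logs after the run.
n_ok=$(grep -c '^extracted' "$workdir/stdout.log")
n_err=$(grep -c '^error' "$workdir/stderr.log")
echo "ok=$n_ok errors=$n_err"
```

With the real pipeline, the grep patterns would need to match whatever the Makefile's recipes actually print.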
