1 file changed, +21 -12 lines changed
- Scripts for data preparation.
+ # Scripts for extracting tables

Dependencies:
* [jq](https://stedolan.github.io/jq/) (`sudo apt install jq`)
- * docker
+ * docker (run without `sudo`)
+ * [conda](https://www.anaconda.com/distribution/)
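
A quick way to confirm that the three dependencies listed above are installed is to look each one up on `PATH`. The sketch below is not part of this change; `missing_tools` is a hypothetical helper name, and it only checks that the commands exist, not that `docker` actually runs without `sudo`.

```shell
# Print the name of every given tool that cannot be found on PATH.
# (missing_tools is a hypothetical helper, not part of the repository.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools jq docker conda)"
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing dependencies:" $missing
else
    echo "All dependencies found."
fi
```

Running this before `make` surfaces a missing tool up front instead of partway through a long extraction.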

Directory structure:
```
.
- ├── data
- │   ├── annotations
- │   │   └── evaluation-tables.json.gz     # current annotations
- │   └── arxiv
- │       ├── sources                       # gzip archives with e-prints
- │       ├── unpacked_sources              # automatically extracted latex sources
- │       └── htmls                         # automatically generated htmls
- └── prepare-data
+ └── data
+     ├── annotations
+     │   └── evaluation-tables.json.gz     # current annotations
+     └── arxiv
+         ├── sources                       # gzip archives with e-prints
+         ├── unpacked_sources              # automatically extracted latex sources
+         ├── htmls                         # automatically generated htmls
+         ├── htmls-clean                   # htmls fixed by chromium
+         └── tables                        # extracted tables
```


To preprocess data and extract tables, run:
- ``` cd prepare-data
+ ```
conda env create -f environment.yml
source activate xtables
- make -j 8 -i extract_all```
+ make -j 8 -i extract_all > stdout.log 2> stderr.log
+ ```
where `8` is number of jobs to run simultaneously.
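
Because `-i` tells `make` to keep going after a failing recipe, and the new command sends errors to `stderr.log`, failures will not show up on the terminal. The sketch below, which is not part of this change, counts them afterwards. It assumes GNU make, which marks such failures with `(ignored)` in its messages; `count_ignored_errors` is a hypothetical helper name.

```shell
# Count lines in a log file that GNU make flagged as ignored errors.
# (count_ignored_errors is a hypothetical helper, not part of the repository.)
count_ignored_errors() {
    if [ -f "$1" ]; then
        # grep -c prints the match count but exits non-zero on zero matches,
        # so "|| true" keeps the function from reporting failure.
        grep -cF '(ignored)' "$1" || true
    else
        echo 0
    fi
}

echo "ignored errors in stderr.log: $(count_ignored_errors stderr.log)"
```

A non-zero count suggests that some files failed to process even though `make` itself finished.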
+
+ ## Test
+ To test the whole extraction on a single file run
+ ```
+ make test
+ ```