Commit e32454a

Update README
Author: Marcin Kardas
Parent: 3407f0d

3 files changed: +69 -33 lines changed

README.md

Lines changed: 67 additions & 31 deletions
@@ -1,42 +1,78 @@
-# Scripts for extracting tables
+# AxCell: Automatic Extraction of Results from Machine Learning Papers
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/axcell-automatic-extraction-of-results-from/scientific-results-extraction-on-pwc)](https://paperswithcode.com/sota/scientific-results-extraction-on-pwc?p=axcell-automatic-extraction-of-results-from)
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/axcell-automatic-extraction-of-results-from/scientific-results-extraction-on-nlp-tdms-exp)](https://paperswithcode.com/sota/scientific-results-extraction-on-nlp-tdms-exp?p=axcell-automatic-extraction-of-results-from)
 
-Dependencies:
-* [jq](https://stedolan.github.io/jq/) (`sudo apt install jq`)
-* docker (run without `sudo`)
-* [conda](https://www.anaconda.com/distribution/)
+This repository is the official implementation of [AxCell: Automatic Extraction of Results from Machine Learning Papers](https://arxiv.org/abs/2004.14356).
 
-Directory structure:
-```
-.
-└── data
-   ├── annotations
-   │   └── evaluation-tables.json.gz # current annotations
-   └── arxiv
-   ├── sources # gzip archives with e-prints
-   ├── unpacked_sources # automatically extracted latex sources
-   ├── htmls # automatically generated htmls
-   ├── htmls-clean # htmls fixed by chromium
-   └── tables # extracted tables
+![pipeline](https://user-images.githubusercontent.com/13535078/81287158-33e01000-905a-11ea-8573-d716373efbdd.png)
+
+```bibtex
+@inproceedings{axcell,
+title={AxCell: Automatic Extraction of Results from Machine Learning Papers},
+author={Marcin Kardas, Piotr Czapla, Pontus Stenetorp,
+Sebastian Ruder, Sebastian Riedel, Ross Taylor, Robert Stojnic},
+year={2020},
+booktitle={2004.14356}
+}
 ```
 
+## Requirements
 
-To preprocess data and extract tables and texts, run:
-```
-make pull_images
+To create a [conda](https://www.anaconda.com/distribution/) environment named `axcell` and install requirements run:
+
+```setup
 conda env create -f environment.yml
-source activate xtables
-make -j 8 -i extract_all > stdout.log 2> stderr.log
 ```
-where `8` is number of jobs to run simultaneously. Optionally one can specify path to data directory, f.e., `make DATA_DIR=mydata ...`.
 
-## Test
-To test the whole extraction on a single file run
-```
-make test
-```
+Additionally, `axcell` requires `docker` (that can be run without `sudo`). Run `scripts/pull_docker_images.sh` to download necessary images.
 
-### Unit Tests
+## Datasets
+We publish the following datasets:
+* [ArxivPapers](https://github.com/paperswithcode/axcell/releases/download/v1.0/arxiv-papers.csv.xz)
+* [SegmentedTables & LinkedResults](https://github.com/paperswithcode/axcell/releases/download/v1.0/segmented-tables.json.xz)
+* [PWCLeaderboards](https://github.com/paperswithcode/axcell/releases/download/v1.0/pwc-leaderboards.json.xz)
 
-```
-PYTHONPATH=. py.test
+See [datasets](notebooks/datasets.ipynb) notebook for an example of how to load the datasets provided below. The [extraction](notebooks/extraction.ipynb) notebook shows how to use `axcell` to extract text and tables from papers.
+## Training
+
+
+
+## Evaluation
+
+See the [evaluation](notebooks/evaluation.ipynb) notebook for the full example on how to evaluate AxCell on the PWCLeaderboards dataset.
+
+## Pre-trained Models
+
+You can download pretrained models here:
+
+- [axcell](https://github.com/paperswithcode/axcell/releases/download/v1.0/models.tar.xz) — an archive containing the taxonomy, abbreviations, table type classifier and table segmentation model. See the [results-extraction](notebooks/results-extraction.ipynb) notebook for an example of how to load and run the models
+- [language model](https://github.com/paperswithcode/axcell/releases/download/v1.0/lm.xz) — [ULMFiT](https://arxiv.org/abs/1801.06146) language model pretrained on the ArxivPapers dataset
+
+## Results
+
+AxCell achieves the following performance:
+
+###
+
+
+| Dataset | Macro F1 | Micro F1 |
+| ---------- |---------------- | -------------- |
+| [PWC Leaderboards](https://beta.paperswithcode.com/sota/scientific-results-extraction-on-pwc) | 21.1 | 28.7 |
+| [NLP-TDMS](https://beta.paperswithcode.com/sota/scientific-results-extraction-on-nlp-tdms-exp) | 19.7 | 25.8 |
+
+
+
+## License
+
+AxCell is released under the [Apache 2.0 license](LICENSE).
+
+### Citation
+The pipeline is described in the following paper:
+```bibtex
+@inproceedings{axcell,
+title={AxCell: Automatic Extraction of Results from Machine Learning Papers},
+author={Marcin Kardas and Piotr Czapla and Pontus Stenetorp and Sebastian Ruder and Sebastian Riedel and Ross Taylor and Robert Stojnic},
+year={2020},
+booktitle={2004.14356}
+}
 ```

notebooks/evaluation.ipynb

Lines changed: 1 addition & 1 deletion
@@ -96,7 +96,7 @@
 }
 ],
 "source": [
-"MODELS_URL = 'http://10.0.1.145:8001/static/axcell/models.tar.xz'\n",
+"MODELS_URL = V1_URL + 'models.tar.xz'\n",
 "MODELS_ARCHIVE = 'models.tar.xz'\n",
 "MODELS_PATH = Path('models')\n",
 "\n",

notebooks/results-extraction.ipynb

Lines changed: 1 addition & 1 deletion
@@ -43,7 +43,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"MODELS_URL = 'http://localhost:8001/static/axcell/models.tar.xz'\n",
+"MODELS_URL = 'https://github.com/paperswithcode/axcell/releases/download/v1.0/models.tar.xz'\n",
 "MODELS_ARCHIVE = 'models.tar.xz'\n",
 "MODELS_PATH = Path('models')\n",
 "\n",
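
Note on the two notebook hunks: both replace a private host with the public v1.0 release asset. The sketch below shows one way the three constants could be used to fetch and unpack the archive. It is not the notebooks' actual code. The value of `V1_URL` is an assumption (its definition is outside the hunk shown), inferred from the full URL in the results-extraction hunk, and the helpers `download_if_missing` and `extract_archive` are hypothetical names introduced here for illustration.

```python
import tarfile
import urllib.request
from pathlib import Path

# Assumed base URL of the v1.0 release assets. The evaluation notebook uses
# V1_URL, but its definition is not part of the hunk shown in this commit.
V1_URL = "https://github.com/paperswithcode/axcell/releases/download/v1.0/"
MODELS_URL = V1_URL + "models.tar.xz"
MODELS_ARCHIVE = Path("models.tar.xz")
MODELS_PATH = Path("models")


def download_if_missing(url: str, archive: Path) -> Path:
    """Download `url` to `archive` unless the file already exists (hypothetical helper)."""
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))
    return archive


def extract_archive(archive: Path, dest: Path) -> Path:
    """Extract an xz-compressed tar archive into `dest` and return `dest` (hypothetical helper)."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, mode="r:xz") as tar:
        tar.extractall(dest)
    return dest
```

Under these assumptions, `extract_archive(download_if_missing(MODELS_URL, MODELS_ARCHIVE), MODELS_PATH)` would fetch the archive once and unpack it into `models/`. The layout of the unpacked files is not described in this commit.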
