Commit ec2881d: Create README.md (initial commit, 0 parents)

BOBrownPrinceVictor authored and committed

10 files changed: +190 −0 lines changed

.gitignore (7 additions, 0 deletions)

```
dist/
build/
**.egg-info/
**__pycache__/
ckpts/
**version.py
```
README.md (77 additions, 0 deletions)

<div align="center">
<h1>StructEqTable-Deploy: A High-efficiency Open-source Toolkit for Table-to-LaTeX Transformation</h1>

[[ Related Paper ]](https://arxiv.org/abs/2406.11633) [[ Website ]](https://unimodal4reasoning.github.io/DocGenome_page/) [[ Dataset (Google Drive) ]](https://drive.google.com/drive/folders/1OIhnuQdIjuSSDc_QL2nP4NwugVDgtItD) [[ Dataset (Hugging Face) ]](https://huggingface.co/datasets/U4R/DocGenome/tree/main)

[[ Models 🤗 (Hugging Face) ]](https://huggingface.co/U4R/StructTable-base/tree/main)
</div>

Welcome to the official repository of StructEqTable-Deploy, a solution that converts table images into LaTeX, powered by scalable data from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/).

## Abstract

Tables are an effective way to represent structured data in scientific publications, financial statements, invoices, web pages, and many other scenarios. Extracting tabular data from a visual table image and performing downstream reasoning tasks on the extracted data is challenging, mainly because tables often have complicated column and row headers with spanning cells. To address these challenges, we present TableX, a large-scale multi-modal table benchmark extracted from the [DocGenome benchmark](https://unimodal4reasoning.github.io/DocGenome_page/) for table pre-training, comprising more than 2 million high-quality image-LaTeX pairs covering 156 disciplinary classes. Benefiting from such large-scale data, we train an end-to-end model, StructEqTable, which precisely generates the corresponding LaTeX description from a visual table image and performs multiple table-related reasoning tasks, including structural extraction and question answering, broadening its application scope and potential.

## TODO

- [x] Release inference code and checkpoints of StructEqTable.
- [ ] Support a Chinese version of StructEqTable.
- [ ] Improve the inference speed of StructEqTable.

### Installation

``` bash
conda create -n structeqtable python=3.9
conda activate structeqtable

pip install "git+https://github.com/UniModal4Reasoning/StructEqTable-Deploy.git"
```
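
As a quick sanity check that the install succeeded, the package's factory function should be importable (just a sketch, no checkpoint needed yet):

```python
# If the install worked, this import resolves without error
from struct_eqtable import build_model
print(build_model)
```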

## Demo

- Run the demo script:

```shell
cd demo
python demo.py \
    --image_path demo.png \
    --ckpt_path ${CKPT_PATH}
```
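
The same pipeline can also be driven from Python directly; a minimal sketch (the checkpoint directory below is a placeholder for wherever you downloaded the Hugging Face weights):

```python
import torch
from PIL import Image

from struct_eqtable import build_model

# 'ckpts/StructTable-base' is a placeholder; point it at your checkpoint directory
model = build_model('ckpts/StructTable-base', max_new_tokens=4096, max_time=120)

image = Image.open('demo/demo.png')
with torch.no_grad():
    latex_codes = model(image)  # a list of LaTeX strings, one per decoded table

print(latex_codes[0])
```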

- Visualization Results
  - The input tables are sampled from the SciHub domain.

![](demo/demo_1.png)

![](demo/demo_2.png)

## Acknowledgements

- [UniMERNet](https://github.com/opendatalab/UniMERNet). A Universal Network for Real-World Mathematical Expression Recognition.
- [DocGenome](https://github.com/UniModal4Reasoning/DocGenome). An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models.
- [ChartVLM](https://github.com/UniModal4Reasoning/ChartVLM). A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning.
- [Donut](https://huggingface.co/naver-clova-ix/donut-base). UniMERNet's Transformer encoder-decoder is adapted from Donut.
- [Nougat](https://github.com/facebookresearch/nougat). The tokenizer is from Nougat.

## License

[Apache License 2.0](LICENSE)

## Citation

If you find our models / code / papers useful in your research, please consider giving us a ⭐ and a citation 📝, thx :)

```bibtex
@article{xia2024docgenome,
  title={DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models},
  author={Xia, Renqiu and Mao, Song and Yan, Xiangchao and Zhou, Hongbin and Zhang, Bo and Peng, Haoyang and Pi, Jiahao and Fu, Daocheng and Wu, Wenjie and Ye, Hancheng and others},
  journal={arXiv preprint arXiv:2406.11633},
  year={2024}
}
```

## Contact Us

If you encounter any issues or have questions, please feel free to contact us via [email protected].

demo/demo.png (binary file added, 33.6 KB)

demo/demo.py (31 additions, 0 deletions)

```python
import torch
import argparse

from PIL import Image
from struct_eqtable import build_model


def parse_config():
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--image_path', type=str, default='demo.png', help='data path for table image')
    parser.add_argument('--ckpt_path', type=str, default='', help='ckpt path for table model')
    args = parser.parse_args()
    return args


def main():
    args = parse_config()

    # build model
    model = build_model(args.ckpt_path, max_new_tokens=4096, max_time=120)

    # model inference
    raw_image = Image.open(args.image_path)
    with torch.no_grad():
        output = model(raw_image)

    # show output latex code of table
    for i, latex_code in enumerate(output):
        print(f"Table {i}:\n{latex_code}")


if __name__ == '__main__':
    main()
```
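
If the generated LaTeX should be kept rather than printed, `main()` could be extended along these lines (the output filenames here are hypothetical):

```python
# Hypothetical extension: write each decoded table to its own .tex file
for i, latex_code in enumerate(output):
    with open(f'table_{i}.tex', 'w') as f:
        f.write(latex_code)
```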

demo/demo_1.png (binary file added, 1.44 MB)

demo/demo_2.png (binary file added, 824 KB)

requirements.txt (2 additions, 0 deletions)

```
torch
transformers
```

setup.py (25 additions, 0 deletions)

```python
from setuptools import find_packages, setup


def write_version_to_file(version, target_file):
    with open(target_file, 'w') as f:
        print('__version__ = "%s"' % version, file=f)


if __name__ == '__main__':
    version = '0.1.0'
    write_version_to_file(version, 'struct_eqtable/version.py')

    setup(
        name='struct_eqtable',
        version=version,
        description='A High-efficiency Open-source Toolkit for Table-to-LaTeX Transformation',
        install_requires=[
            'torch',
            'transformers',
        ],
        author='Hongbin Zhou, Xiangchao Yan, Bo Zhang',
        author_email='[email protected]',
        license='Apache License 2.0',
        packages=find_packages(exclude=['demo']),
    )
```

struct_eqtable/__init__.py (7 additions, 0 deletions)

```python
from .model import StructTable


def build_model(model_ckpt, **kwargs):
    model = StructTable(model_ckpt, **kwargs)
    return model
```

struct_eqtable/model.py (41 additions, 0 deletions)

```python
import torch

from torch import nn
from transformers import AutoModelForVision2Seq, AutoProcessor


class StructTable(nn.Module):
    def __init__(self, model_path, max_new_tokens=2048, max_time=60):
        super().__init__()
        self.model_path = model_path
        self.max_new_tokens = max_new_tokens
        self.max_generate_time = max_time

        # init model and image processor from ckpt path
        self.init_image_processor(model_path)
        self.init_model(model_path)

    def init_model(self, model_path):
        self.model = AutoModelForVision2Seq.from_pretrained(model_path)
        self.model.eval()

    def init_image_processor(self, image_processor_path):
        self.data_processor = AutoProcessor.from_pretrained(image_processor_path)

    def forward(self, image):
        # process image to tokens
        image_tokens = self.data_processor.image_processor(
            images=image,
            return_tensors='pt',
        )

        # generate text from image tokens
        model_output = self.model.generate(
            flattened_patches=image_tokens['flattened_patches'],
            attention_mask=image_tokens['attention_mask'],
            max_new_tokens=self.max_new_tokens,
            max_time=self.max_generate_time,
        )
        latex_codes = self.data_processor.batch_decode(model_output, skip_special_tokens=True)

        return latex_codes
```
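
Since `forward` hands its input straight to the Hugging Face image processor, which accepts either a single image or a list of images, batched inference should also work; a sketch under that assumption (the paths are placeholders):

```python
import torch
from PIL import Image

from struct_eqtable import build_model

# placeholder paths; substitute real table images and a downloaded checkpoint
paths = ['table_a.png', 'table_b.png']
model = build_model('ckpts/StructTable-base', max_new_tokens=4096, max_time=120)

images = [Image.open(p) for p in paths]
with torch.no_grad():
    latex_codes = model(images)  # one LaTeX string per input image

for path, code in zip(paths, latex_codes):
    print(f'{path}:\n{code}\n')
```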
