11<div align="center">
22
3- <img src="https://raw.githubusercontent.com/fairdataihub/poster2json/main/logo.svg" alt="logo" width="200" height="auto" />
3+ <img src="https://cdn.posters.science/logos/poster-fairy.png" alt="logo" width="200" height="auto" />
44
55<br />
66
@@ -58,6 +58,7 @@ Convert scientific posters (PDF/images) to structured JSON metadata using Large
5858**poster2json** extracts structured metadata from scientific conference posters (PDF or image format) into machine-actionable JSON conforming to the [poster-json-schema](https://github.com/fairdataihub/poster-json-schema).
5959
6060The pipeline uses:
61+
6162- [**Llama-3.1-8B-Poster-Extraction**](https://huggingface.co/jimnoneill/Llama-3.1-8B-Poster-Extraction) for JSON structuring
6263- **Qwen2-VL-7B** for vision-based OCR of image posters
6364- **pdfalto** for layout-aware PDF text extraction
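
The split between the two text-extraction tools listed above can be pictured as a dispatch on file type. The following is only a sketch under stated assumptions: the function name `choose_text_extractor` and the set of image suffixes are hypothetical and are not taken from the poster2json code base.

```python
from pathlib import Path

# Assumed set of image suffixes; the formats actually supported may differ.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def choose_text_extractor(poster_path: str) -> str:
    """Hypothetical helper: name the extraction tool that fits the input file type."""
    suffix = Path(poster_path).suffix.lower()
    if suffix == ".pdf":
        # PDFs go to layout-aware text extraction.
        return "pdfalto"
    if suffix in IMAGE_SUFFIXES:
        # Image posters go to vision-based OCR.
        return "Qwen2-VL-7B"
    raise ValueError(f"Unsupported poster format: {suffix!r}")


print(choose_text_extractor("poster.pdf"))
print(choose_text_extractor("poster.PNG"))
```

In either case the extracted text would then be handed to the Llama-based model for JSON structuring, as the list above describes.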
@@ -104,49 +105,56 @@ Output conforms to the [poster-json-schema](https://github.com/fairdataihub/post
104105{
105106 "$schema": "https://posters.science/schema/v0.1/poster_schema.json",
106107 "creators": [
107- {"name": "Garcia, Sofia", "givenName": "Sofia", "familyName": "Garcia", "affiliation": ["University"]}
108+ {
109+ "name": "Garcia, Sofia",
110+ "givenName": "Sofia",
111+ "familyName": "Garcia",
112+ "affiliation": ["University"]
113+ }
114+ ],
115+ "titles": [
116+ { "title": "Machine Learning Approaches to Diabetic Retinopathy Detection" }
108117 ],
109- "titles": [{"title": "Machine Learning Approaches to Diabetic Retinopathy Detection"}],
110118 "posterContent": {
111119 "sections": [
112- {"sectionTitle": "Abstract", "sectionContent": "..."},
113- {"sectionTitle": "Methods", "sectionContent": "..."},
114- {"sectionTitle": "Results", "sectionContent": "..."}
120+ { "sectionTitle": "Abstract", "sectionContent": "..." },
121+ { "sectionTitle": "Methods", "sectionContent": "..." },
122+ { "sectionTitle": "Results", "sectionContent": "..." }
115123 ]
116124 },
117- "imageCaptions": [{"captions": ["Figure 1.", "ROC curves showing..."]}],
118- "tableCaptions": [{"captions": ["Table 1.", "Performance metrics"]}]
125+ "imageCaptions": [{ "captions": ["Figure 1.", "ROC curves showing..."] }],
126+ "tableCaptions": [{ "captions": ["Table 1.", "Performance metrics"] }]
119127}
120128```
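
As a rough illustration of how a downstream script might consume a record shaped like the example above, the sketch below pulls out the title, creator names, and section titles using only the standard library. The helper name `summarize_poster` and the inline sample are illustrative assumptions, not part of the poster2json API.

```python
import json

# Inline sample mirroring the structure of the example output; values are illustrative.
SAMPLE = """
{
  "creators": [
    {"name": "Garcia, Sofia", "givenName": "Sofia", "familyName": "Garcia", "affiliation": ["University"]}
  ],
  "titles": [{"title": "Machine Learning Approaches to Diabetic Retinopathy Detection"}],
  "posterContent": {
    "sections": [
      {"sectionTitle": "Abstract", "sectionContent": "..."},
      {"sectionTitle": "Methods", "sectionContent": "..."},
      {"sectionTitle": "Results", "sectionContent": "..."}
    ]
  }
}
"""


def summarize_poster(record: dict) -> dict:
    """Hypothetical helper: collect the first title, creator names, and section titles."""
    titles = record.get("titles", [])
    sections = record.get("posterContent", {}).get("sections", [])
    return {
        "title": titles[0]["title"] if titles else None,
        "creators": [creator.get("name") for creator in record.get("creators", [])],
        "sections": [section.get("sectionTitle") for section in sections],
    }


summary = summarize_poster(json.loads(SAMPLE))
print(summary["title"])
print(summary["sections"])
```

Using `.get` with defaults keeps the helper tolerant of records where a field is absent, which may or may not match what the schema allows.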
121129
122130## System Requirements
123131
124- | Requirement | Specification |
125- | ------------- | ---------------|
126- | GPU | NVIDIA CUDA-capable, ≥16GB VRAM |
127- | RAM | ≥32GB recommended |
128- | Python | 3.10+ |
129- | OS | Linux, macOS, Windows (via WSL2) |
132+ | Requirement | Specification |
133+ | ----------- | -------------------------------- |
134+ | GPU | NVIDIA CUDA-capable, ≥16GB VRAM |
135+ | RAM | ≥32GB recommended |
136+ | Python | 3.10+ |
137+ | OS | Linux, macOS, Windows (via WSL2) |
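
The Python requirement in the table above can be checked before installation with a few lines of standard-library code. This is a sketch only; the function name `meets_python_requirement` is hypothetical, and the GPU and RAM requirements are not checked here.

```python
import sys

MIN_PYTHON = (3, 10)  # from the "Python | 3.10+" row above


def meets_python_requirement(version=None) -> bool:
    """Hypothetical helper: True if the (major, minor, ...) version is at least 3.10."""
    version = tuple(sys.version_info) if version is None else tuple(version)
    return version[:2] >= MIN_PYTHON


if not meets_python_requirement():
    print(f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, found {sys.version.split()[0]}")
```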
130138
131139## Performance
132140
133141Validated on 10 manually annotated scientific posters:
134142
135- | Metric | Score | Threshold |
136- | --------| -------| -----------|
137- | Word Capture | 0.96 | ≥0.75 |
138- | ROUGE-L | 0.89 | ≥0.75 |
139- | Number Capture | 0.93 | ≥0.75 |
140- | Field Proportion | 0.99 | 0.30–2.50 |
143+ | Metric | Score | Threshold |
144+ | ---------------- | ----- | --------- |
145+ | Word Capture | 0.96 | ≥0.75 |
146+ | ROUGE-L | 0.89 | ≥0.75 |
147+ | Number Capture | 0.93 | ≥0.75 |
148+ | Field Proportion | 0.99 | 0.30–2.50 |
141149
142150**Pass Rate**: 10/10 (100%)
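
One way to read the table is as a per-poster check against each threshold. The sketch below assumes that a poster passes only when all four metrics fall within their thresholds; that rule, the dictionary keys, and the function name `passes_thresholds` are assumptions for illustration, and the actual evaluation logic is described in docs/evaluation.md.

```python
# Thresholds copied from the table above as (minimum, maximum); None means no upper bound.
THRESHOLDS = {
    "word_capture": (0.75, None),
    "rouge_l": (0.75, None),
    "number_capture": (0.75, None),
    "field_proportion": (0.30, 2.50),
}


def passes_thresholds(scores: dict) -> bool:
    """Hypothetical check: True only if every metric lies within its threshold range."""
    for metric, (low, high) in THRESHOLDS.items():
        value = scores[metric]
        if value < low:
            return False
        if high is not None and value > high:
            return False
    return True


# Scores reported in the table above.
reported = {
    "word_capture": 0.96,
    "rouge_l": 0.89,
    "number_capture": 0.93,
    "field_proportion": 0.99,
}
print(passes_thresholds(reported))
```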
143151
144152## Documentation
145153
146- | Document | Description |
147- | ----------| -------------|
154+ | Document | Description |
155+ | ------------------------------------ | ------------------------------- |
148156| [Architecture](docs/architecture.md) | Technical details & methodology |
149- | [Evaluation](docs/evaluation.md) | Validation metrics & results |
157+ | [Evaluation](docs/evaluation.md) | Validation metrics & results |
150158
151159## Development Setup
152160
@@ -155,15 +163,32 @@ Validated on 10 manually annotated scientific posters:
155163git clone https://github.com/fairdataihub/poster2json.git
156164cd poster2json
157165
158- # Install with Poetry
166+ # Create a virtual environment
167+ python -m venv .venv
168+
169+ # Activate the virtual environment
170+ source .venv/bin/activate # On Linux/macOS
171+ .venv\Scripts\activate # On Windows
172+
173+ # Install poetry
159174pip install poetry
175+
176+ # Install dependencies
160177poetry install
161178
162179# Run tests
163- poetry run pytest
180+ poe test
164181
165182# Format code
166- poetry run poe format
183+ poe format
184+ ```
185+
186+ If you are on Windows and have multiple Python versions installed, you can list them and create the virtual environment with a specific version:
187+
188+ ``` bash
189+ py -0p # list all python versions
190+
191+ py -3.12 -m venv .venv
167192```
168193
169194## License
@@ -175,7 +200,7 @@ MIT License - see [LICENSE](LICENSE.md) for details.
175200``` bibtex
176201@software{poster2json2026,
177202 title = {poster2json: Scientific Poster to JSON Metadata Extraction},
178- author = {O'Neill, James and Soundarajan, Sanjay and Patel, Bhavesh},
203+ author = {O'Neill, James and Soundarajan, Sanjay and Portillo, Dorian and Patel, Bhavesh},
179204 year = {2026},
180205 url = {https://github.com/fairdataihub/poster2json},
181206 doi = {10.5281/zenodo.18320010}