Commit 0d2ebf7

Merge pull request #92 from AdemBoukhris457/release/v0.9.7
release: v0.9.7 - Add PaddleOCR VL parser installation documentation
2 parents 4549600 + c7321c3 commit 0d2ebf7

3 files changed: +42 -1 lines changed

README.md

Lines changed: 18 additions & 0 deletions
@@ -376,6 +376,24 @@ parser = ChartTablePDFParser(
 
 The `PaddleOCRVLPDFParser` uses PaddleOCRVL (Vision-Language Model) for end-to-end document parsing. It combines PaddleOCRVL's advanced document understanding capabilities with DocRes image restoration and split table merging, providing a comprehensive solution for complex document processing.
 
+#### Installation Requirements
+
+Before using `PaddleOCRVLPDFParser`, install the required dependencies:
+
+```bash
+pip install -U "paddleocr[doc-parser]"
+```
+
+**For Linux systems:**
+```bash
+python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
+```
+
+**For Windows systems:**
+```bash
+python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl
+```
+
 #### Key Features:
 - **End-to-End Parsing**: Uses PaddleOCRVL for complete document understanding in a single pass
 - **Chart Recognition**: Automatically extracts and converts charts to structured table format
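
The per-platform install steps added to the README in this hunk could be scripted roughly as follows. This is a minimal sketch, not part of the commit: the wheel URLs are copied verbatim from the diff, while `SAFETENSORS_WHEELS` and `build_install_commands` are hypothetical names introduced only for illustration.

```python
import sys

# Wheel URLs copied verbatim from the install instructions in this commit.
# The dictionary keys match the values returned by platform.system().
SAFETENSORS_WHEELS = {
    "Linux": (
        "https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/"
        "safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl"
    ),
    "Windows": (
        "https://xly-devops.cdn.bcebos.com/safetensors-nightly/"
        "safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl"
    ),
}


def build_install_commands(system):
    """Return the two pip commands for the given OS name.

    Hypothetical helper: it only builds argument lists and runs nothing.
    """
    if system not in SAFETENSORS_WHEELS:
        # The commit documents wheels for Linux and Windows only.
        raise ValueError(f"No safetensors wheel is documented for {system!r}")
    pip = [sys.executable, "-m", "pip", "install"]
    return [
        pip + ["-U", "paddleocr[doc-parser]"],
        pip + [SAFETENSORS_WHEELS[system]],
    ]
```

Passing `platform.system()` as the argument selects the wheel for the current machine, and each returned list can be handed to `subprocess.run(command, check=True)`. Raising on any other platform, rather than guessing a wheel, reflects that the commit lists only these two URLs.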

docs/user-guide/parsers/paddleocr-vl-parser.md

Lines changed: 23 additions & 0 deletions
@@ -2,6 +2,29 @@
 
 Guide to using the `PaddleOCRVLPDFParser` for end-to-end document parsing with Vision-Language Model capabilities.
 
+## Installation Requirements
+
+Before using the `PaddleOCRVLPDFParser`, you need to install the required dependencies:
+
+```bash
+pip install -U "paddleocr[doc-parser]"
+```
+
+Additionally, you need to install platform-specific safetensors wheels:
+
+**For Linux systems:**
+```bash
+python -m pip install https://paddle-whl.bj.bcebos.com/nightly/cu126/safetensors/safetensors-0.6.2.dev0-cp38-abi3-linux_x86_64.whl
+```
+
+**For Windows systems:**
+```bash
+python -m pip install https://xly-devops.cdn.bcebos.com/safetensors-nightly/safetensors-0.6.2.dev0-cp38-abi3-win_amd64.whl
+```
+
+!!! warning "Required Before Use"
+    These installation steps are **required** before using `PaddleOCRVLPDFParser`. Without them, you may encounter import errors.
+
 ## Overview
 
 The `PaddleOCRVLPDFParser` uses PaddleOCRVL (Vision-Language Model) for comprehensive document understanding. It combines PaddleOCRVL's advanced document parsing capabilities with DocRes image restoration and split table merging, providing a complete solution for complex document processing tasks.
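
The warning added in this hunk says that skipping the install steps may cause import errors. One way to surface that early is to check that the modules can be located before the parser is constructed. The sketch below is not part of the commit: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the list assumes that `paddleocr` and `safetensors` are the only top-level modules worth checking.

```python
from importlib.util import find_spec

# Assumption: these are the top-level import names provided by the packages
# installed in the steps above.
REQUIRED_MODULES = ("paddleocr", "safetensors")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be located.

    find_spec() only looks the module up; it does not import it, so a
    module that is present but broken will still pass this check.
    """
    return [name for name in names if find_spec(name) is None]
```

A caller could run `missing_modules()` at startup and print the install commands when the returned list is not empty, instead of failing later with a bare `ImportError`.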

doctra/version.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 """Version information for Doctra."""
-__version__ = '0.9.6'
+__version__ = '0.9.7'
