|
44 | 44 | - no word transformation, e.g. rotation |
45 | 45 | |
46 | 46 | |
47 | | -## Installation |
48 | | - |
49 | | -### From Pypi |
50 | | - |
51 | | -``` |
52 | | -$ pip install pdf2docx |
53 | | -``` |
54 | | - |
55 | | -### From source code |
56 | | - |
57 | | -Clone or download this project, and navigate to the root directory: |
58 | | - |
59 | | -``` |
60 | | -$ python setup.py install |
61 | | -``` |
62 | | - |
63 | | -Or install it in developing mode: |
64 | | - |
65 | | -``` |
66 | | -$ python setup.py develop |
67 | | -``` |
68 | | - |
69 | | -### Uninstall |
70 | | - |
71 | | -``` |
72 | | -$ pip uninstall pdf2docx |
73 | | -``` |
74 | | - |
75 | | -## Usage |
76 | | - |
77 | | -`pdf2docx` can be used as either CLI or a library. |
78 | | - |
79 | | -### Command Line Interface |
80 | | - |
81 | | -``` |
82 | | -$ pdf2docx --help |
83 | | - |
84 | | -NAME |
85 | | - pdf2docx - Command line interface for pdf2docx. |
86 | | - |
87 | | -SYNOPSIS |
88 | | - pdf2docx COMMAND | - |
89 | | - |
90 | | -DESCRIPTION |
91 | | - Command line interface for pdf2docx. |
92 | | - |
93 | | -COMMANDS |
94 | | - COMMAND is one of the following: |
95 | | - |
96 | | - convert |
97 | | - Convert pdf file to docx file. |
98 | | - |
99 | | - debug |
100 | | - Convert one PDF page and plot layout information for debugging. |
101 | | - |
102 | | - table |
103 | | - Extract table content from pdf pages. |
104 | | -``` |
105 | | - |
106 | | -- By range of pages |
107 | | - |
108 | | -Specify pages range by `--start` (from the first page if omitted) and `--end` (to the last page if omitted). Note the page index is zero-based by default, but can turn it off by `--zero_based_index=False`, i.e. the first page index starts from 1. |
109 | | - |
110 | | - |
111 | | -```bash |
112 | | -$ pdf2docx convert test.pdf test.docx # all pages |
113 | | - |
114 | | -$ pdf2docx convert test.pdf test.docx --start=1 # from the second page to the end |
115 | | - |
116 | | -$ pdf2docx convert test.pdf test.docx --end=3 # from the first page to the third (index=2) |
117 | | - |
118 | | -$ pdf2docx convert test.pdf test.docx --start=1 --end=3 # the second and third pages |
119 | | - |
120 | | -$ pdf2docx convert test.pdf test.docx --start=1 --end=3 --zero_based_index=False # the first and second pages |
121 | | - |
122 | | -``` |
123 | | - |
124 | | -- By page numbers |
125 | | - |
126 | | -```bash |
127 | | -$ pdf2docx convert test.pdf test.docx --pages=0,2,4 # the first, third and 5th pages |
128 | | -``` |
129 | | - |
130 | | -- Multi-Processing |
131 | | - |
132 | | -```bash |
133 | | -$ pdf2docx convert test.pdf test.docx --multi_processing=True # default count of CPU |
134 | | - |
135 | | -$ pdf2docx convert test.pdf test.docx --multi_processing=True --cpu_count=4 |
136 | | -``` |
137 | | - |
138 | | - |
139 | | -### Python Library |
140 | | - |
141 | | -We can use either the `Converter` class or a wrapped method `parse()`. |
142 | | - |
143 | | -- `Converter` |
144 | | - |
145 | | -```python |
146 | | -from pdf2docx import Converter |
147 | | - |
148 | | -pdf_file = '/path/to/sample.pdf' |
149 | | -docx_file = 'path/to/sample.docx' |
150 | | - |
151 | | -# convert pdf to docx |
152 | | -cv = Converter(pdf_file) |
153 | | -cv.convert(docx_file, start=0, end=None) |
154 | | -cv.close() |
155 | | -``` |
156 | | - |
157 | | - |
158 | | -- Wrapped method `parse()` |
159 | | - |
160 | | -```python |
161 | | -from pdf2docx import parse |
162 | | - |
163 | | -pdf_file = '/path/to/sample.pdf' |
164 | | -docx_file = 'path/to/sample.docx' |
165 | | - |
166 | | -# convert pdf to docx |
167 | | -parse(pdf_file, docx_file, start=0, end=None) |
168 | | -``` |
169 | | - |
170 | | -Or just to extract tables, |
171 | | - |
172 | | -```python |
173 | | -from pdf2docx import Converter |
174 | | - |
175 | | -pdf_file = '/path/to/sample.pdf' |
176 | | - |
177 | | -cv = Converter(pdf_file) |
178 | | -tables = cv.extract_tables(start=0, end=1) |
179 | | -cv.close() |
180 | | - |
181 | | -for table in tables: |
182 | | - print(table) |
183 | | - |
184 | | -# outputs |
185 | | -... |
186 | | -[['Input ', None, None, None, None, None], |
187 | | -['Description A ', 'mm ', '30.34 ', '35.30 ', '19.30 ', '80.21 '], |
188 | | -['Description B ', '1.00 ', '5.95 ', '6.16 ', '16.48 ', '48.81 '], |
189 | | -['Description C ', '1.00 ', '0.98 ', '0.94 ', '1.03 ', '0.32 '], |
190 | | -['Description D ', 'kg ', '0.84 ', '0.53 ', '0.52 ', '0.33 '], |
191 | | -['Description E ', '1.00 ', '0.15 ', None, None, None], |
192 | | -['Description F ', '1.00 ', '0.86 ', '0.37 ', '0.78 ', '0.01 ']] |
193 | | -``` |
| 47 | +## Documentation |
| 48 | + |
| 49 | +- [Installation](https://dothinking.github.io/pdf2docx/installation.html) |
| 50 | +- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html) |
| 51 | + - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html) |
| 52 | + - [Extract table content](https://dothinking.github.io/pdf2docx/quickstart.table.html) |
| 53 | + - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html) |
| 54 | +- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html) |
194 | 55 | |
195 | 56 | ## Sample |
196 | 57 | |