ArtifexSoftware
diff --git a/.github/workflows/doc.yml b/.github/workflows/doc.yml
Lines changed: 40 additions & 0 deletions
diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 1 deletion
diff --git a/MANIFEST.in b/MANIFEST.in
Lines changed: 3 additions & 1 deletion
diff --git a/Makefile b/Makefile
Lines changed: 36 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 8 additions & 147 deletions
diff --git a/doc/Makefile b/doc/Makefile
Lines changed: 18 additions & 0 deletions
diff --git a/doc/conf.py b/doc/conf.py
Lines changed: 70 additions & 0 deletions
diff --git a/doc/index.rst b/doc/index.rst
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,40 @@
+name: pdf2docx-doc
+
+on:
+  push:
+    tags:
+      - 'v[0-9]+.[0-9]+.[0-9]+'
+
+jobs:
+  publish_doc:
+    runs-on: ubuntu-18.04
+    steps:
+      - name: Check out code
+        uses: actions/checkout@v2
+
+      - name: Set up Python 3.x
+        uses: actions/setup-python@v1
+        with:
+          python-version: '3.x'
+
+      - name: Display Python version
+        run: python -c "import sys; print(sys.version)"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install sphinx sphinx_rtd_theme
+          pip install -r requirements.txt
+          python setup.py develop
+
+      # build docs for tags; the version (e.g. 3.2.1) is extracted from 'refs/tags/v3.2.1'
+      - name: Create html doc
+        run: |
+          echo ${GITHUB_REF#refs/tags/v} > version.txt
+          make doc
+
+      - name: Deploy
+        uses: peaceiris/actions-gh-pages@v3
+        with:
+          github_token: ${{ secrets.GITHUB_TOKEN }}
+          publish_dir: ./build/html
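
The "Create html doc" step above relies on the shell expansion `${GITHUB_REF#refs/tags/v}` to turn a tag ref into a bare version string. As a rough illustration only, the same prefix stripping can be sketched in Python; the helper name `version_from_ref` is hypothetical and is not part of this commit.

```python
def version_from_ref(ref: str, prefix: str = "refs/tags/v") -> str:
    """Mimic the shell expansion ${GITHUB_REF#refs/tags/v} (hypothetical helper)."""
    # The shell `#` operator strips the prefix only when the value starts with it;
    # otherwise the value is left unchanged. This helper does the same.
    if ref.startswith(prefix):
        return ref[len(prefix):]
    return ref


if __name__ == "__main__":
    print(version_from_ref("refs/tags/v3.2.1"))  # prints 3.2.1
```

Because the workflow only triggers on tags matching `v[0-9]+.[0-9]+.[0-9]+`, the unchanged-value branch should not normally be reached in CI; it is shown here only to match the shell semantics.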
@@ -4,6 +4,7 @@
 *.txt
 *.docx
 layout.json
+.vscode/
 
 # pdf testing files
 *.pdf
@@ -16,4 +17,5 @@ feature-*/
 # building dir
 build/
 dist/
-*egg-info/
+*egg-info/
+pdf2docx*.rst
@@ -1,4 +1,6 @@
 include *.md
 include LICENSE*
 include requirements.txt
-recursive-include test *.py *.pdf
+prune test
+include test/*.py
+include test/samples/*.pdf
@@ -0,0 +1,36 @@
+# Project makefile
+
+# working directories and files
+#
+TOPDIR		:=$(shell pwd)
+SRC			:=$(TOPDIR)/pdf2docx
+BUILD		:=$(TOPDIR)/build
+DOCSRC		:=$(TOPDIR)/doc
+TEST		:=$(TOPDIR)/test
+CLEANDIRS	:=.pytest_cache pdf2docx.egg-info dist
+
+# the doc target requires: pip install sphinx sphinx_rtd_theme
+
+.PHONY: src doc test clean
+
+src:
+	@python setup.py sdist --formats=gztar,zip && \
+	python setup.py bdist_wheel
+
+doc:
+	@if [ -f "$(DOCSRC)/Makefile" ] ; then \
+	    ( cd "$(DOCSRC)" && make html MODULEDIR="$(SRC)" BUILDDIR="$(BUILD)" ) || exit 1 ; \
+	fi
+
+test:
+	@pytest -v "$(TEST)/test.py" --cov="$(SRC)" --cov-report=xml
+
+clean:
+	@if [ -e "$(DOCSRC)/Makefile" ] ; then \
+	    ( cd "$(DOCSRC)" && make $@ BUILDDIR="$(BUILD)" ) || exit 1 ; \
+	fi
+	@for p in $(CLEANDIRS) ; do \
+	    if [ -d "$(TOPDIR)/$$p" ];  then rm -rf "$(TOPDIR)/$$p" ; fi ; \
+	done
+	@if [ -d "$(BUILD)" ];  then rm -rf "$(BUILD)" ; fi
+	@if [ -d "$(DOCTARGET)" ];  then rm -rf "$(DOCTARGET)" ; fi
@@ -44,153 +44,14 @@
     - no word transformation, e.g. rotation
 
 
-## Installation
-
-### From Pypi
-
-```
-$ pip install pdf2docx
-```
-
-### From source code
-
-Clone or download this project, and navigate to the root directory:
-
-```
-$ python setup.py install
-```
-
-Or install it in developing mode:
-
-```
-$ python setup.py develop
-```
-
-### Uninstall
-
-```
-$ pip uninstall pdf2docx
-```
-
-## Usage
-
-`pdf2docx` can be used as either CLI or a library.
-
-### Command Line Interface
-
-```
-$ pdf2docx --help
-
-NAME
-    pdf2docx - Command line interface for pdf2docx.
-
-SYNOPSIS
-    pdf2docx COMMAND | -
-
-DESCRIPTION
-    Command line interface for pdf2docx.
-
-COMMANDS
-    COMMAND is one of the following:
-
-     convert
-       Convert pdf file to docx file.
-
-     debug
-       Convert one PDF page and plot layout information for debugging.
-
-     table
-       Extract table content from pdf pages.
-```
-
-- By range of pages
-
-Specify pages range by `--start` (from the first page if omitted) and `--end` (to the last page if omitted). Note the page index is zero-based by default, but can turn it off by `--zero_based_index=False`, i.e. the first page index starts from 1.
-
-
-```bash
-$ pdf2docx convert test.pdf test.docx # all pages
-
-$ pdf2docx convert test.pdf test.docx --start=1 # from the second page to the end
-
-$ pdf2docx convert test.pdf test.docx --end=3 # from the first page to the third (index=2)
-
-$ pdf2docx convert test.pdf test.docx --start=1 --end=3 # the second and third pages
-
-$ pdf2docx convert test.pdf test.docx --start=1 --end=3 --zero_based_index=False # the first and second pages
-
-```
-
-- By page numbers
-
-```bash
-$ pdf2docx convert test.pdf test.docx --pages=0,2,4 # the first, third and 5th pages
-```
-
-- Multi-Processing
-
-```bash
-$ pdf2docx convert test.pdf test.docx --multi_processing=True # default count of CPU
-
-$ pdf2docx convert test.pdf test.docx --multi_processing=True --cpu_count=4
-```
-
-
-### Python Library
-
-We can use either the `Converter` class or a wrapped method `parse()`.
-
-- `Converter`
-
-```python
-from pdf2docx import Converter
-
-pdf_file = '/path/to/sample.pdf'
-docx_file = 'path/to/sample.docx'
-
-# convert pdf to docx
-cv = Converter(pdf_file)
-cv.convert(docx_file, start=0, end=None)
-cv.close()
-```
-
-
-- Wrapped method `parse()`
-
-```python
-from pdf2docx import parse
-
-pdf_file = '/path/to/sample.pdf'
-docx_file = 'path/to/sample.docx'
-
-# convert pdf to docx
-parse(pdf_file, docx_file, start=0, end=None)
-```
-
-Or just to extract tables,
-
-```python
-from pdf2docx import Converter
-
-pdf_file = '/path/to/sample.pdf'
-
-cv = Converter(pdf_file)
-tables = cv.extract_tables(start=0, end=1)
-cv.close()
-
-for table in tables:
-    print(table)
-
-# outputs
-...
-[['Input ', None, None, None, None, None], 
-['Description A ', 'mm ', '30.34 ', '35.30 ', '19.30 ', '80.21 '],
-['Description B ', '1.00 ', '5.95 ', '6.16 ', '16.48 ', '48.81 '],
-['Description C ', '1.00 ', '0.98 ', '0.94 ', '1.03 ', '0.32 '],
-['Description D ', 'kg ', '0.84 ', '0.53 ', '0.52 ', '0.33 '],
-['Description E ', '1.00 ', '0.15 ', None, None, None],
-['Description F ', '1.00 ', '0.86 ', '0.37 ', '0.78 ', '0.01 ']]
-```
+## Documentation
+
+- [Installation](https://dothinking.github.io/pdf2docx/installation.html)
+- [Quickstart](https://dothinking.github.io/pdf2docx/quickstart.html)
+    - [Convert PDF](https://dothinking.github.io/pdf2docx/quickstart.convert.html)
+    - [Extract table content](https://dothinking.github.io/pdf2docx/quickstart.table.html)
+    - [Command Line Interface](https://dothinking.github.io/pdf2docx/quickstart.cli.html)
+- [API Documentation](https://dothinking.github.io/pdf2docx/modules.html)
 
 ## Sample
 
 
@@ -0,0 +1,18 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# MODULEDIR and BUILDDIR are set in top makefile
+SOURCEDIR  = .
+TARGETDIR  = doctrees html
+
+.PHONY: html clean
+
+html: Makefile
+	@sphinx-apidoc --separate -o "$(SOURCEDIR)" "$(MODULEDIR)" && \
+	sphinx-build -M html "$(SOURCEDIR)" "$(BUILDDIR)"
+
+clean:
+	@for p in $(TARGETDIR) ; do \
+	    if [ -d "$(BUILDDIR)/$$p" ];  then rm -rf "$(BUILDDIR)/$$p" ; fi ; \
+	done
+	@if [ -e modules.rst ];  then rm pdf2docx*.rst ; fi
@@ -0,0 +1,70 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+import os
+import sys
+sys.path.insert(0, os.path.abspath("../pdf2docx/"))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'pdf2docx'
+copyright = '2021, dothinking'
+author = 'dothinking'
+
+# The full version, including alpha/beta/rc tags
+# Read the version number from version.txt; fall back to 'alpha' if the file is missing.
+# GitHub CI can create version.txt dynamically.
+def get_version(fname):
+    if os.path.exists(fname):
+        with open(fname, 'r') as f:
+            version = f.readline().strip()
+    else:
+        version = 'alpha'
+
+    return version
+release = get_version('../version.txt')
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    'sphinx.ext.autodoc',
+    'sphinx.ext.napoleon'
+]
+
+# Add any paths that contain templates here, relative to this directory.
+# templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = [    
+]
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+# html_theme = 'alabaster'
+html_theme = 'sphinx_rtd_theme'
+
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+# html_static_path = ['_static']
+
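
The `get_version` fallback in `doc/conf.py` can be exercised outside Sphinx. The sketch below copies the function's logic and checks both branches against a temporary directory; the wrapper `demo_get_version` and the sample version `3.2.1` are assumptions made for illustration and do not appear in the commit.

```python
import os
import tempfile


def get_version(fname):
    # Same logic as doc/conf.py: first line of the file, or 'alpha' if it is missing.
    if os.path.exists(fname):
        with open(fname, 'r') as f:
            version = f.readline().strip()
    else:
        version = 'alpha'
    return version


def demo_get_version():
    """Hypothetical helper: return (version without file, version with file)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'version.txt')
        missing = get_version(path)       # file does not exist yet
        with open(path, 'w') as f:
            f.write('3.2.1\n')            # what the CI step would write for tag v3.2.1
        found = get_version(path)
    return missing, found


if __name__ == "__main__":
    print(demo_get_version())  # prints ('alpha', '3.2.1')
```

Note that `conf.py` passes the relative path `'../version.txt'`, so the lookup depends on the working directory at the time the configuration is evaluated.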
@@ -0,0 +1,21 @@
+Welcome to pdf2docx's documentation!
+====================================
+
+``pdf2docx`` is a Python library that parses PDF files with ``PyMuPDF`` and generates docx files with ``python-docx``.
+
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   installation
+   quickstart
+   modules
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`