This repository was archived by the owner on Apr 11, 2025. It is now read-only.

Commit e13a460

Merge branch 'main' into fix_version_for_cli
2 parents 2729b1f + 284c134 commit e13a460

27 files changed: 580 lines added, 526 lines removed.

.github/workflows/release.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -56,7 +56,7 @@ jobs:
 
 - name: Publish package on PyPI
   if: steps.check-version.outputs.tag
-  uses: pypa/gh-action-pypi-publish@v1.11.0
+  uses: pypa/gh-action-pypi-publish@v1.12.2
   with:
     user: __token__
     password: ${{ secrets.FLIT_PASSWORD }}
```

HISTORY.md

Lines changed: 0 additions & 301 deletions
This file was deleted.

README.md

Lines changed: 7 additions & 5 deletions
````diff
@@ -19,8 +19,8 @@ Or follow the example below.
 You can check out the PDF used in this example [here](https://github.com/py-pdf/pypdf_table_extraction/blob/main/docs/_static/pdf/foo.pdf).
 
 ```python3
->>> import camelot
->>> tables = camelot.read_pdf('foo.pdf')
+>>> import pypdf_table_extraction
+>>> tables = pypdf_table_extraction.read_pdf('foo.pdf')
 >>> tables
 <TableList n=1>
 >>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, markdown, sqlite
````
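
The README call `tables.export('foo.csv', f='csv', compress=True)` exports the parsed tables. The diff does not show how export is implemented. Assuming `compress=True` writes one CSV per table and bundles the files into a zip archive, the idea can be sketched with the standard library alone. The function name `export_csv_zip`, the input shape (each table as a list of rows of strings), and the `<root>-table-<n>.csv` naming are hypothetical and are not the library's API.

```python
import csv
import io
import zipfile
from pathlib import Path


def export_csv_zip(tables, filepath):
    """Write each table to its own CSV entry inside one zip archive.

    Hypothetical helper (not the library's API): `tables` is assumed to be
    a list of tables, each a list of rows, each row a list of strings.
    Returns the path of the created archive.
    """
    path = Path(filepath)
    zip_path = path.with_suffix(".zip")  # e.g. foo.csv -> foo.zip
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for n, rows in enumerate(tables, start=1):
            # Build the CSV text in memory; no temporary files are needed.
            buf = io.StringIO()
            csv.writer(buf).writerows(rows)
            # Hypothetical naming scheme: <root>-table-<n>.csv
            zf.writestr(f"{path.stem}-table-{n}.csv", buf.getvalue())
    return zip_path
```

Because each CSV is written to an `io.StringIO` buffer, the archive can be filled directly. A temporary directory is not needed before zipping.
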
```diff
@@ -50,6 +50,8 @@ pypdf_table_extraction also comes packaged with a [command-line interface](https
 
 Refer to the [QuickStart Guide](https://github.com/py-pdf/pypdf_table_extraction/blob/main/docs/user/quickstart.rst#quickstart) to quickly get started with pypdf_table_extraction, extract tables from PDFs and explore some basic options.
 
+**Tip:** Visit the `parser-comparison-notebook` to get an overview of all the packed parsers and their features. [![image](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/py-pdf/pypdf_table_extraction/blob/main/examples/parser-comparison-notebook.ipynb)
+
 **Note:** pypdf_table_extraction only works with text-based PDFs and not scanned documents. (As Tabula [explains](https://github.com/tabulapdf/tabula#why-tabula), "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
 
 You can check out some frequently asked questions [here](https://pypdf-table-extraction.readthedocs.io/en/latest/user/faq.html).
```
````diff
@@ -77,7 +79,7 @@ conda install -c conda-forge pypdf-table-extraction
 After [installing the dependencies](https://pypdf-table-extraction.readthedocs.io/en/latest/user/install-deps.html) ([tk](https://packages.ubuntu.com/bionic/python/python-tk) and [ghostscript](https://www.ghostscript.com/)), you can also just use pip to install pypdf_table_extraction:
 
 ```bash
-pip install pypdf-table-extraction[base]
+pip install pypdf-table-extraction
 ```
 
 ### From the source code
````
````diff
@@ -91,8 +93,8 @@ git clone https://github.com/py-pdf/pypdf_table_extraction.git
 and install using pip:
 
 ```
-cd camelot
-pip install ".[base]"
+cd pypdf_table_extraction
+pip install "."
 ```
 
 ## Documentation
````

camelot/cli.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -38,7 +38,9 @@ def set_config(self, key, value):
 
 @click.group(name="camelot")
 @click.version_option(version=__version__)
-@click.option("-q", "--quiet", is_flag=False, help="Suppress logs and warnings.")
+@click.option(
+    "-q", "--quiet", is_flag=False, default=False, help="Suppress logs and warnings."
+)
 @click.option(
     "-p",
     "--pages",
```
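
The `-q/--quiet` option is described as suppressing logs and warnings. This hunk does not show how the CLI acts on the value. The sketch below assumes quiet mode silences Python's standard `logging` and `warnings` machinery. The helper name `apply_quiet` and the default logger name `"camelot"` are assumptions for illustration.

```python
import logging
import warnings


def apply_quiet(quiet, logger_name="camelot"):
    """Silence logging and warnings when `quiet` is truthy.

    Hypothetical helper; `logger_name` is an assumed logger name,
    not taken from the project's code.
    """
    logger = logging.getLogger(logger_name)
    if quiet:
        # Only CRITICAL records from this logger pass through.
        logger.setLevel(logging.CRITICAL)
        # Ignore all Python warnings for the rest of the process.
        warnings.simplefilter("ignore")
    else:
        logger.setLevel(logging.INFO)
    return logger
```

The logger level and the warnings filter are both process-wide. Calling `apply_quiet(...)` once at startup, before any parsing runs, is therefore enough.
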

camelot/core.py

Lines changed: 0 additions & 15 deletions
```diff
@@ -25,7 +25,6 @@
 
 from .backends import ImageConversionBackend
 from .utils import build_file_path_in_temp_dir
-from .utils import compute_whitespace
 from .utils import get_index_closest_point
 from .utils import get_textline_coords
 
```
```diff
@@ -611,20 +610,6 @@ def parsing_report(self):
         }
         return report
 
-    def record_metadata(self, parser):
-        """Record data about the origin of the table."""
-        self.flavor = parser.id
-        self.filename = parser.filename
-        self.debug_info = parser.debug_info
-        if parser.copy_text is not None:
-            self.copy_spanning_text(parser.copy_text)
-        data = self.data
-        self.df = pd.DataFrame(data)
-        self.shape = self.df.shape
-
-        self.whitespace = compute_whitespace(data)
-        self.pdf_size = (parser.pdf_width, parser.pdf_height)
-
     def get_pdf_image(self):
         """Compute pdf image and cache it."""
         if self._image is None:
```
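
The removed `record_metadata` method stored a `whitespace` value that `compute_whitespace(data)` computed from the table's cell data. That function's definition is not part of this diff. The minimal sketch below assumes the metric is the percentage of empty or whitespace-only cells in a list of rows. The name `whitespace_percentage` is hypothetical.

```python
def whitespace_percentage(data):
    """Return the share of blank cells in `data` as a percentage.

    Hypothetical stand-in for a whitespace metric; `data` is assumed
    to be a list of rows, each row a list of strings.
    """
    total = sum(len(row) for row in data)
    if total == 0:
        # An empty table has no cells, so report 0% rather than divide by zero.
        return 0.0
    # A cell counts as whitespace if it is empty after stripping.
    blank = sum(1 for row in data for cell in row if not cell.strip())
    return 100.0 * blank / total
```

The sketch divides by the actual number of cells instead of `len(data) * len(data[0])`. This keeps ragged rows from skewing the percentage.
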
