Commit f30d7f7

ARROW-189 Update to match PyMongo configuration settings (#177)
* ARROW-189 Update to match PyMongo configuration settings
* fixups
* fix config
* try again
* try again
* try again
1 parent ac2c5e7 commit f30d7f7

35 files changed: +355 −214 lines

.pre-commit-config.yaml

Lines changed: 55 additions & 29 deletions
```diff
@@ -1,12 +1,13 @@
 repos:
 - repo: https://github.com/pre-commit/pre-commit-hooks
-  rev: v4.1.0
+  rev: v4.5.0
   hooks:
     - id: check-added-large-files
     - id: check-case-conflict
     - id: check-toml
     - id: check-yaml
+      exclude: template.yaml
     - id: debug-statements
     - id: end-of-file-fixer
       exclude: WHEEL
@@ -16,55 +17,80 @@ repos:
       exclude: .patch
       exclude_types: [json]
 
-- repo: https://github.com/psf/black
-  rev: 22.3.0
+- repo: https://github.com/astral-sh/ruff-pre-commit
+  # Ruff version.
+  rev: v0.1.3
   hooks:
-    - id: black
-      files: \.py$
-      args: [--line-length=100]
+    - id: ruff
+      args: ["--fix", "--show-fixes"]
+    - id: ruff-format
 
-- repo: https://github.com/PyCQA/isort
-  rev: 5.12.0
+- repo: https://github.com/adamchainz/blacken-docs
+  rev: "1.16.0"
   hooks:
-    - id: isort
-      files: \.py$
-      args: [--profile=black]
+    - id: blacken-docs
+      additional_dependencies:
+        - black==22.3.0
 
-- repo: https://github.com/PyCQA/flake8
-  rev: 3.9.2
+- repo: https://github.com/pre-commit/pygrep-hooks
+  rev: "v1.10.0"
   hooks:
-    - id: flake8
-      args: [--config=bindings/python/.flake8]
-      types: [file]
-      files: \.py$
-      additional_dependencies: [
-        'flake8-bugbear==20.1.4',
-        'flake8-logging-format==0.6.0',
-        'flake8-implicit-str-concat==0.2.0',
-      ]
+    - id: rst-backticks
+    - id: rst-directive-colons
+    - id: rst-inline-touching-normal
 
+- repo: https://github.com/rstcheck/rstcheck
+  rev: v6.2.0
+  hooks:
+    - id: rstcheck
+      additional_dependencies: [sphinx]
+      args: ["--ignore-directives=doctest,testsetup,todo,automodule","--ignore-substitutions=release", "--report-level=error"]
 
 # We use the Python version instead of the original version which seems to require Docker
 # https://github.com/koalaman/shellcheck-precommit
 - repo: https://github.com/shellcheck-py/shellcheck-py
-  rev: v0.8.0.4
+  rev: v0.9.0.6
   hooks:
     - id: shellcheck
       name: shellcheck
       args: ["--severity=warning"]
+      stages: [manual]
+
+- repo: https://github.com/PyCQA/doc8
+  rev: v1.1.1
+  hooks:
+    - id: doc8
+      args: ["--ignore=D001"]  # ignore line length
+      stages: [manual]
 
 - repo: https://github.com/sirosen/check-jsonschema
-  rev: 0.14.1
+  rev: 0.27.0
   hooks:
     - id: check-jsonschema
       name: "Check GitHub Workflows"
       files: ^\.github/workflows/
       types: [yaml]
       args: ["--schemafile", "https://json.schemastore.org/github-workflow"]
+      stages: [manual]
 
-- repo: https://github.com/adamchainz/blacken-docs
-  rev: "1.13.0"
+- repo: https://github.com/ariebovenberg/slotscheck
+  rev: v0.17.0
   hooks:
-    - id: blacken-docs
-      additional_dependencies:
-        - black==22.3.0
+    - id: slotscheck
+      files: \.py$
+      exclude: "^(bindings/python/test|bindings/python)/"
+      stages: [manual]
+      args: ["--no-strict-imports"]
+
+- repo: https://github.com/codespell-project/codespell
+  rev: "v2.2.6"
+  hooks:
+    - id: codespell
+      # Examples of errors or updates to justify the exceptions:
+      # - test/test_on_demand_csfle.py:44: FLE ==> FILE
+      # - test/test_bson.py:1043: fo ==> of, for, to, do, go
+      # - test/bson_corpus/decimal128-4.json:98: Infinit ==> Infinite
+      # - test/test_bson.py:267: isnt ==> isn't
+      # - test/versioned-api/crud-api-version-1-strict.json:514: nin ==> inn, min, bin, nine
+      # - test/test_client.py:188: te ==> the, be, we, to
+      args: ["-L", "fle,fo,infinit,isnt,nin,te"]
```
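The net effect of this stanza is replacing black, isort, and flake8 with ruff. One visible consequence in this commit's Python files is modernizing printf-style `%` formatting to f-strings, in line with ruff's pyupgrade-derived (`UP`) rules. A minimal sketch of the equivalence; the names and values here are illustrative, not taken from the commit:

```python
# Hypothetical values for illustration only.
name = "pymongoarrow"
version = "1.0.0"

# printf-style formatting, as the pre-ruff code read:
old_style = "package %s, version %s" % (name, version)

# the f-string form the updated code uses:
new_style = f"package {name}, version {version}"

print(old_style == new_style)  # -> True
```

Both forms render identical strings; the f-string is simply the more idiomatic modern spelling.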

README.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -31,7 +31,7 @@ df = production.invoices.find_pandas_all({'amount': {'$gt': 100.00}}, schema=inv
 ```
 
 Since PyMongoArrow can automatically infer the schema from the first batch of data, this can be
-further simplifed to:
+further simplified to:
 
 ```
 df = production.invoices.find_pandas_all({'amount': {'$gt': 100.00}})
````

bindings/python/.flake8

Lines changed: 0 additions & 15 deletions
This file was deleted.

bindings/python/RELEASE.rst

Lines changed: 4 additions & 4 deletions
```diff
@@ -25,13 +25,13 @@ Release Process
 
 #. Check JIRA to ensure all the tickets in this version have been completed.
 
-#. Add release notes to `doc/source/changelog.rst`. Generally just summarize/clarify
+#. Add release notes to ``doc/source/changelog.rst``. Generally just summarize/clarify
    the git log, but you might add some more long form notes for big changes.
 
-#. Replace the `devN` version number w/ the new version number (see
+#. Replace the ``devN`` version number w/ the new version number (see
    note above in `Versioning`_). Make sure version number is updated in
-   `pymongoarrow/version.py`. Commit the change and tag the release.
-   Immediately bump the version number to `dev0` in a new commit::
+   ``pymongoarrow/version.py``. Commit the change and tag the release.
+   Immediately bump the version number to ``dev0`` in a new commit::
 
       $ # Bump to release version number
      $ git commit -a -m "BUMP <release version number>"
```
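The backtick changes above are not cosmetic. Unlike Markdown, reStructuredText treats a single-backtick span as "interpreted text" (subject to the default role, and not rendered as code); an inline code literal requires double backticks, which is what the `rst-backticks` hook added in this commit enforces:

```rst
`devN`   -- interpreted text, not rendered as code
``devN`` -- inline literal, rendered as code
```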

bindings/python/addtags.py

Lines changed: 10 additions & 6 deletions
```diff
@@ -1,4 +1,3 @@
-#!/usr/bin/env python3
 # Dependencies:
 # - auditwheel>=5,<6
 # Requires AUDITWHEEL_PLAT to be set (e.g. manylinux2014_x86_64)
@@ -24,7 +23,8 @@ def repair_wheel(wheel_path, abi, wheel_dir):
 
 def main(wheel_path, abi, wheel_dir):
     if not isfile(wheel_path):
-        raise FileNotFoundError("cannot access wheel file %s" % (wheel_path,))
+        msg = f"cannot access wheel file {wheel_path}"
+        raise FileNotFoundError(msg)
 
     if not exists(wheel_dir):
         os.makedirs(wheel_dir)
@@ -37,12 +37,12 @@ def main(wheel_path, abi, wheel_dir):
     if reqd_tag < get_priority_by_name(analyzed_tag):
         print(
             "Wheel is eligible for a higher priority tag. "
-            "You requested %s but I have found this wheel is "
-            "eligible for %s." % (abi, analyzed_tag)
+            f"You requested {abi} but I have found this wheel is "
+            f"eligible for {analyzed_tag}."
         )
         out_wheel = repair_wheel(wheel_path, analyzed_tag, wheel_dir)
 
-    print("Fixed-up wheel written to %s" % (out_wheel,))
+    print(f"Fixed-up wheel written to {out_wheel}")
 
 
 if __name__ == "__main__":
@@ -51,4 +51,8 @@ def main(wheel_path, abi, wheel_dir):
     print(f"wheel path: {WHEEL_PATH}")
     print(f"target platform: {TARGET_PLATFORM}")
     print(f"wheel dir: {WHEEL_DIR}")
-    main(wheel_path=abspath(WHEEL_PATH), abi=TARGET_PLATFORM, wheel_dir=abspath(WHEEL_DIR))
+    main(
+        wheel_path=abspath(WHEEL_PATH),
+        abi=TARGET_PLATFORM,
+        wheel_dir=abspath(WHEEL_DIR),
+    )
```
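The `msg = ...` / `raise FileNotFoundError(msg)` rewrite follows the pattern suggested by ruff's flake8-errmsg (`EM`) rules: bind the message to a variable before raising rather than constructing it inline, so tracebacks don't repeat the f-string expression. The exception text is unchanged; a minimal sketch using a hypothetical path:

```python
wheel_path = "/tmp/example.whl"  # hypothetical path for illustration

# Bind the message first (the EM-rule-friendly form), then raise.
msg = f"cannot access wheel file {wheel_path}"
try:
    raise FileNotFoundError(msg)
except FileNotFoundError as exc:
    captured = str(exc)

print(captured)  # -> cannot access wheel file /tmp/example.whl
```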

bindings/python/benchmarks/benchmarks.py

Lines changed: 37 additions & 21 deletions
```diff
@@ -18,9 +18,10 @@
 
 import numpy as np
 import pandas as pd
-import pyarrow
+import pyarrow as pa
 import pymongo
 from bson import BSON, Binary, Decimal128
+
 from pymongoarrow.api import (
     Schema,
     find_arrow_all,
@@ -31,7 +32,7 @@
 from pymongoarrow.types import BinaryType, Decimal128Type
 
 N_DOCS = int(os.environ.get("N_DOCS"))
-assert pymongo.has_c()
+assert pymongo.has_c()  # noqa: S101
 db = pymongo.MongoClient().pymongoarrow_test
 
 LARGE_DOC_SIZE = 20
@@ -49,7 +50,11 @@ class Insert(ABC):
 
     timeout = 100000  # The setup sometimes times out.
     number = 1
-    repeat = (1, 10, 30.0)  # Min repeat, max repeat, time limit (will stop sampling after this)
+    repeat = (
+        1,
+        10,
+        30.0,
+    )  # Min repeat, max repeat, time limit (will stop sampling after this)
     rounds = 1
 
     @abc.abstractmethod
@@ -90,15 +95,19 @@ class Read(ABC):
 
     timeout = 100000  # The setup sometimes times out.
     number = 3
-    repeat = (1, 10, 30.0)  # Min repeat, max repeat, time limit (will stop sampling after this)
+    repeat = (
+        1,
+        10,
+        30.0,
+    )  # Min repeat, max repeat, time limit (will stop sampling after this)
     rounds = 1
 
     @abc.abstractmethod
     def setup(self):
         raise NotImplementedError
 
     # We need this because the naive methods don't always convert nested objects.
-    @staticmethod
+    @staticmethod  # noqa: B027
     def exercise_table(table):
         pass
 
@@ -107,7 +116,10 @@ def time_conventional_ndarray(self):
         cursor = collection.find(projection={"_id": 0})
         dtype = self.dtypes
         if "Large" in type(self).__name__:
-            np.array([tuple(doc[k] for k in self.large_doc_keys) for doc in cursor], dtype=dtype)
+            np.array(
+                [tuple(doc[k] for k in self.large_doc_keys) for doc in cursor],
+                dtype=dtype,
+            )
         else:
             np.array([(doc["x"], doc["y"]) for doc in cursor], dtype=dtype)
 
@@ -132,7 +144,7 @@ def time_to_arrow(self):
     def time_conventional_arrow(self):
         c = db.benchmark
         f = list(c.find({}, projection={"_id": 0}))
-        table = pyarrow.Table.from_pylist(f)
+        table = pa.Table.from_pylist(f)
         self.exercise_table(table)
 
     def peakmem_to_numpy(self):
@@ -154,17 +166,21 @@ def peakmem_conventional_arrow(self):
 class ProfileReadArray(Read):
     schema = Schema(
         {
-            "x": pyarrow.int64(),
-            "y": pyarrow.float64(),
-            "emb": pyarrow.list_(pyarrow.float64()),
+            "x": pa.int64(),
+            "y": pa.float64(),
+            "emb": pa.list_(pa.float64()),
         }
     )
 
     def setup(self):
         coll = db.benchmark
         coll.drop()
         base_dict = dict(
-            [("x", 1), ("y", math.pi), ("emb", [math.pi for _ in range(EMBEDDED_OBJECT_SIZE)])]
+            [
+                ("x", 1),
+                ("y", math.pi),
+                ("emb", [math.pi for _ in range(EMBEDDED_OBJECT_SIZE)]),
+            ]
         )
         coll.insert_many([base_dict.copy() for _ in range(N_DOCS)])
         print(
@@ -176,7 +192,7 @@ def setup(self):
     @staticmethod
     def exercise_table(table):
         [
-            [[n for n in i.values] if isinstance(i, pyarrow.ListScalar) else i for i in column]
+            [[n for n in i.values] if isinstance(i, pa.ListScalar) else i for i in column]
             for column in table.columns
         ]
 
@@ -197,10 +213,10 @@ def time_conventional_pandas(self):
 class ProfileReadDocument(Read):
     schema = Schema(
         {
-            "x": pyarrow.int64(),
-            "y": pyarrow.float64(),
-            "emb": pyarrow.struct(
-                [pyarrow.field(f"a{i}", pyarrow.float64()) for i in range(EMBEDDED_OBJECT_SIZE)]
+            "x": pa.int64(),
+            "y": pa.float64(),
+            "emb": pa.struct(
+                [pa.field(f"a{i}", pa.float64()) for i in range(EMBEDDED_OBJECT_SIZE)]
             ),
         }
     )
@@ -225,7 +241,7 @@ def setup(self):
     @staticmethod
     def exercise_table(table):
         [
-            [[n for n in i.values()] if isinstance(i, pyarrow.StructScalar) else i for i in column]
+            [[n for n in i.values()] if isinstance(i, pa.StructScalar) else i for i in column]
             for column in table.columns
         ]
 
@@ -244,7 +260,7 @@ def time_conventional_pandas(self):
 
 
 class ProfileReadSmall(Read):
-    schema = Schema({"x": pyarrow.int64(), "y": pyarrow.float64()})
+    schema = Schema({"x": pa.int64(), "y": pa.float64()})
     dtypes = np.dtype(np.dtype([("x", np.int64), ("y", np.float64)]))
 
     def setup(self):
@@ -265,7 +281,7 @@ def setup(self):
 
 class ProfileReadLarge(Read):
     large_doc_keys = [f"a{i}" for i in range(LARGE_DOC_SIZE)]
-    schema = Schema({k: pyarrow.float64() for k in large_doc_keys})
+    schema = Schema({k: pa.float64() for k in large_doc_keys})
     dtypes = np.dtype([(k, np.float64) for k in large_doc_keys])
 
     def setup(self):
@@ -333,7 +349,7 @@ def time_insert_conventional(self):
 
 class ProfileInsertSmall(Insert):
     large_doc_keys = [f"a{i}" for i in range(LARGE_DOC_SIZE)]
-    schema = Schema({"x": pyarrow.int64(), "y": pyarrow.float64()})
+    schema = Schema({"x": pa.int64(), "y": pa.float64()})
     dtypes = np.dtype([("x", np.int64), ("y", np.float64)])
 
     def setup(self):
@@ -352,7 +368,7 @@ def setup(self):
 
 class ProfileInsertLarge(Insert):
     large_doc_keys = [f"a{i}" for i in range(LARGE_DOC_SIZE)]
-    schema = Schema({k: pyarrow.float64() for k in large_doc_keys})
+    schema = Schema({k: pa.float64() for k in large_doc_keys})
     dtypes = np.dtype([(k, np.float64) for k in large_doc_keys])
 
     def setup(self):
```
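The `# noqa: S101` added above silences ruff's S101 rule (from flake8-bandit), which flags `assert` statements. The rule exists because `assert` is stripped entirely when Python runs with `-O`, so it must never guard required behavior; the benchmark keeps the assert deliberately and opts out. A small sketch demonstrating the underlying behavior:

```python
import subprocess
import sys

# A failing assert normally exits non-zero (uncaught AssertionError)...
normal = subprocess.run([sys.executable, "-c", "assert False"], capture_output=True)

# ...but under -O, Python removes assert statements entirely, so the
# same program exits cleanly.
optimized = subprocess.run([sys.executable, "-O", "-c", "assert False"], capture_output=True)

print(normal.returncode, optimized.returncode)  # -> 1 0
```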

bindings/python/docs/source/changelog.rst

Lines changed: 2 additions & 2 deletions
```diff
@@ -67,13 +67,13 @@ Changes in Version 0.4.0
 
 Changes in Version 0.3.0
 ------------------------
-- Support for `PyArrow` 7.0.
+- Support for ``PyArrow`` 7.0.
 - Support for :class:`~bson.objectid.ObjectId` type.
 - Improve error message when schema contains an unsupported type.
 - Add support for BSON string type.
 - Add support for BSON boolean type.
 - Upgraded to bundle `libbson <http://mongoc.org/libbson/current/index.html>`_ 1.21.1. If installing from source, the minimum supported ``libbson`` version is now 1.21.0.
-- Dropped Python 3.6 support (it was dropped in `PyArrow` 7.0).
+- Dropped Python 3.6 support (it was dropped in ``PyArrow`` 7.0).
 
 Changes in Version 0.2.0
 ------------------------
```
