Commit 6e1e3a4

Merge pull request #443 from aperture-data/release-0.4.28
Release 0.4.28
2 parents: 69550db + aca1a86

File tree: 7 files changed (+231 −74 lines)


README.md

Lines changed: 10 additions & 10 deletions

@@ -1,14 +1,14 @@
 # ApertureDB Client Python Module
 
-This is the python sdk for building applications with [ApertureDB](https://docs.aperturedata.io/Introduction/WhatIsAperture).
+This is the Python SDK for building applications with [ApertureDB](https://docs.aperturedata.io/Introduction/WhatIsAperture).
 
-This comprises of utilities to get Data in and out of ApertureDB in an optimal manner.
-A quick [getting started guide](https://docs.aperturedata.io/HowToGuides/start/Setup) is useful to start building with this sdk.
+This comprises of utilities to get data in and out of ApertureDB in an optimal manner.
+A quick [getting started guide](https://docs.aperturedata.io/HowToGuides/start/Setup) is useful to start building with this SDK.
 For more concrete examples, please refer to:
 * [Simple examples and concepts](https://docs.aperturedata.io/category/simple-usage-examples)
 * [Advanced usage examples](https://docs.aperturedata.io/category/advanced-usage-examples)
 
-# Installing in a custom virtual enviroment.
+# Installing in a custom virtual enviroment
 ```bash
 pip install aperturedb[complete]
 ```
@@ -21,7 +21,7 @@ pip install aperturedb
 A complete [reference](https://docs.aperturedata.io/category/aperturedb-python-sdk) of this SDK is available on the offical [ApertureDB Documentation](https://docs.aperturedata.io)
 
 
-# Dvelopment setup.
+# Development setup
 The recommended way is to clone this repo, and do an editable install as follows:
 ```bash
 git clone https://github.com/aperture-data/aperturedb-python.git
@@ -30,8 +30,8 @@ pip install -e .[dev]
 ```
 
 
-# Running tests.
-The tests are inside the test dir.
+# Running tests
+The tests are inside the `test` dir.
 
 All the tests can be run with:
 
@@ -45,15 +45,15 @@ Running specefic tests can be accomplished by invoking it with pytest as follows
 cd test && docker compose up -d && PROJECT=aperturedata KAGGLE_username=ci KAGGLE_key=dummy coverage run -m python -m pytest test_Session.py -v --log-cli-level=DEBUG
 ```
 
-# Reporting bugs.
+# Reporting bugs
 Any error in the functionality / documentation / tests maybe reported by creating a
 [github issue](https://github.com/aperture-data/aperturedb-python/issues).
 
-# Development guidelines.
+# Development guidelines
 For inclusion of any features, a PR may be created with a patch,
 and a brief description of the problem and the fix.
 The CI enforces a coding style guideline with autopep8 and
 a script to detect trailing white spaces.
 
-In case a PR encounters failures, the log would describe the location of
+If a PR encounters failures, the log will describe the location of
 the offending line with a description of the problem.

aperturedb/ParallelQuery.py

Lines changed: 40 additions & 37 deletions

@@ -48,43 +48,13 @@ def execute_batch(q: Commands, blobs: Blobs, db: Connector,
 
     if db.last_query_ok():
         if response_handler is not None:
-            # We could potentially always call this handler function
-            # and let the user deal with the error cases.
-            blobs_returned = 0
-            for i in range(math.ceil(len(q) / commands_per_query)):
-                start = i * commands_per_query
-                end = start + commands_per_query
-                blobs_start = i * blobs_per_query
-                blobs_end = blobs_start + blobs_per_query
-
-                b_count = 0
-                if issubclass(type(r), list):
-                    for req, resp in zip(q[start:end], r[start:end]):
-                        for k in req:
-                            # Ref to https://docs.aperturedata.io/query_language/Reference/shared_command_parameters/blobs
-                            blobs_where_default_true = \
-                                k in ["FindImage", "FindBlob", "FindVideo"] and (
-                                    "blobs" not in req[k] or req[k]["blobs"])
-                            blobs_where_default_false = \
-                                k in [
-                                    "FindDescriptor", "FindBoundingBox"] and "blobs" in req[k] and req[k]["blobs"]
-                            if blobs_where_default_true or blobs_where_default_false:
-                                count = resp[k]["returned"]
-                                b_count += count
-
-                try:
-                    # The returned blobs need to be sliced to match the
-                    # returned entities per command in query.
-                    response_handler(
-                        q[start:end],
-                        blobs[blobs_start:blobs_end],
-                        r[start:end] if issubclass(type(r), list) else r,
-                        b[blobs_returned:blobs_returned + b_count] if len(b) >= blobs_returned + b_count else None)
-                except BaseException as e:
-                    logger.exception(e)
-                    if strict_response_validation:
-                        raise e
-                blobs_returned += b_count
+            try:
+                ParallelQuery.map_response_to_handler(response_handler,
+                                                      q, blobs, r, b, commands_per_query, blobs_per_query)
+            except BaseException as e:
+                logger.exception(e)
+                if strict_response_validation:
+                    raise e
     else:
         # Transaction failed entirely.
         logger.error(f"Failed query = {q} with response = {r}")
@@ -140,6 +110,39 @@ def setSuccessStatus(cls, statuses: list[int]):
     def getSuccessStatus(cls):
         return cls.success_statuses
 
+    @classmethod
+    def map_response_to_handler(cls, handler, query, query_blobs, response, response_blobs,
+                                commands_per_query, blobs_per_query):
+        # We could potentially always call this handler function
+        # and let the user deal with the error cases.
+        blobs_returned = 0
+        for i in range(math.ceil(len(query) / commands_per_query)):
+            start = i * commands_per_query
+            end = start + commands_per_query
+            blobs_start = i * blobs_per_query
+            blobs_end = blobs_start + blobs_per_query
+
+            b_count = 0
+            if issubclass(type(response), list):
+                for req, resp in zip(query[start:end], response[start:end]):
+                    for k in req:
+                        blob_returning_commands = ["FindImage", "FindBlob", "FindVideo",
+                                                   "FindDescriptor", "FindBoundingBox"]
+                        if k in blob_returning_commands and "blobs" in req[k] and req[k]["blobs"]:
+                            count = resp[k]["returned"]
+                            b_count += count
+
+            # The returned blobs need to be sliced to match the
+            # returned entities per command in query.
+            handler(
+                query[start:end],
+                query_blobs[blobs_start:blobs_end],
+                response[start:end] if issubclass(
+                    type(response), list) else response,
+                response_blobs[blobs_returned:blobs_returned + b_count] if
+                len(response_blobs) >= blobs_returned + b_count else None)
+            blobs_returned += b_count
+
     def __init__(self, db: Connector, dry_run: bool = False):
 
         super().__init__()
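The new `map_response_to_handler` classmethod walks the flat command list in `commands_per_query`-sized windows and hands each window, together with its blob slice, to the handler. The windowing arithmetic can be sketched in isolation; `batch_slices` below is a hypothetical helper written only to illustrate the slicing, not part of the SDK:

```python
import math

def batch_slices(total_commands, commands_per_query, blobs_per_query):
    # Mirror of the slicing arithmetic in map_response_to_handler: each
    # handler call receives one query's worth of commands and input blobs.
    slices = []
    for i in range(math.ceil(total_commands / commands_per_query)):
        start = i * commands_per_query
        end = start + commands_per_query
        blobs_start = i * blobs_per_query
        blobs_end = blobs_start + blobs_per_query
        slices.append(((start, end), (blobs_start, blobs_end)))
    return slices

# 6 commands issued as 2-command queries, each carrying 1 blob:
print(batch_slices(6, 2, 1))
# [((0, 2), (0, 1)), ((2, 4), (1, 2)), ((4, 6), (2, 3))]
```

Returned blobs are handled separately in the real method: they are counted per command via the `"returned"` field and consumed cumulatively through `blobs_returned`.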

aperturedb/ParallelQuerySet.py

Lines changed: 36 additions & 18 deletions

@@ -2,6 +2,7 @@
 from typing import Any, Callable, List, Tuple
 import itertools
 import logging
+import math
 
 import numpy as np
 
@@ -28,7 +29,7 @@ def remove_blobs(item: Any) -> Any:
     return item
 
 
-def gen_execute_batch_sets(base_executor, per_batch_response_handler: Callable = None):
+def gen_execute_batch_sets(base_executor):
 
     #
     # execute_batch_sets - executes multiple sets of queries with optional constraints on follow on sets
@@ -47,7 +48,7 @@ def gen_execute_batch_sets(base_executor, per_batch_response_handler: Callable =
     # execution
     #
     def execute_batch_sets(query_set, blob_set, db, success_statuses: list[int] = [0],
-                           response_handler: Callable = None, commands_per_query: list[int] = -1,
+                           response_handler: Optional[Callable] = None, commands_per_query: list[int] = -1,
                            blobs_per_query: list[int] = -1, strict_response_validation: bool = False):
 
         logger.info("Execute Batch Sets = Batch Size {0} Comands Per Query {1} Blobs Per Query {2}".format(
@@ -69,13 +70,21 @@ def execute_batch_sets(query_set, blob_set, db, success_statuses: list[int] = [0
         # verify layout if a complex set
         if per_set_blobs:
             first_element_blobs = blob_set[0]
+
+            if len(first_element_blobs) == 0 or len(first_element_blobs) != set_total:
+                # user has confused blob format for sure.
+                logger.error("Malformed blobs for first element. Blob return from your loader "
+                             "should be [query_blobs] where query_blobs = [ first_cmd_list, second_cmd_list, ... ] ")
+                raise Exception(
+                    "Malformed blobs input. Expected First element to have a list of blobs for each set.")
+
             first_query_blobs = first_element_blobs[0]
             # If someone is looking for info logging from PQS, it is likely that blobs are not being set properly.
             # The wrapping of blobs in general can be confusing. Best suggestion is looking at a loader.
             logger.info("Blobs for first set = " +
-                        str(remove_blobs(blob_set[0])))
+                        str(remove_blobs(first_element_blobs)))
             logger.info("First Blob for first set = " +
-                        str(remove_blobs(blob_set[0][0])))
+                        str(remove_blobs(first_query_blobs)))
             if not isinstance(first_query_blobs, list):
                 logger.error(
                     "Expected a list of lists for the first element's blob sets")
@@ -111,7 +120,7 @@ def set_blob_filter(all_blobs, strike_list, set_nm):
             # the list comprehension pulls out the blob set for the requested set
             # the blob set is then flattened as the query expects a flat array using blobs_per_query as the iterator
             # the flat list is them zipped with the strike list, which determines which blobs are unused
-            # the filter checks if the blob is to be struc
+            # the filter checks if the blob is to be struck
             # the map pulls the remaining blobs out
 
             return list(map(lambda pair: pair[0],
@@ -155,9 +164,9 @@ def first_only_blobs(all_blobs, strike_list, set_nm):
 
         # allowed layouts for commands other than the seed command
         # { "cmd" : {} } -> standard single command
-        # [{ "cmd1": {}, "cmd2} : {}] -> standard multiple command
-        # [{ "constraint" : {} , { "cmd" : {} }] -> constraint with a single command
-        # [{ "constraints: {} , [{"cmd1" : {} }, {"cmd2": {} }]] -> constraint with multiple command
+        # [{ "cmd1": {} },{ "cmd2" : {} }] -> standard multiple command
+        # [{ constraints } , { "cmd" : {} }] -> constraint with a single command
+        # [{ constraints } , [{"cmd1" : {} }, {"cmd2": {} }]] -> constraint with multiple command
 
         known_constraint_keys = ["results", "apply"]
         constraints = None
@@ -202,6 +211,10 @@ def constraint_filter(single_line, single_results):
             passed_all_constraints = True
             for result_number in result_constraints:
 
+                if not isinstance(result_number, int):
+                    raise Exception("Keys for result constraints must be numbers: "
+                                    f"{result_number} is {type(result_number)}")
+
                 if len(single_results) < result_number or single_results[result_number] is None:
                     # in theory here we have two possibilities: a user can have a correctly formed constraint which didn't execute by design
                     # ( which is what process here )
@@ -278,17 +291,24 @@ def constraint_filter(single_line, single_results):
             blob_strike_list = list(map(lambda q: q is None, queries))
 
             # filter out struck blobs
-            used_blobs = filter(lambda b: b is not None,
-                                blob_filter(blob_set, blob_strike_list, i))
+            used_blobs = list(filter(lambda b: b is not None,
+                                     blob_filter(blob_set, blob_strike_list, i)))
 
-            # TODO: add wrapped response_handler.
-            if response_handler != None:
-                logger.warning(
-                    "ParallelQuerySet does not yet support a response_handler which will identify which set is being worked on")
             if len(executable_queries) > 0:
                 result_code, db_results, db_blobs = base_executor(executable_queries, used_blobs,
                                                                   db, local_success_statuses,
                                                                   None, commands_per_query[i], blobs_per_query[i], strict_response_validation=strict_response_validation)
+                if response_handler != None and db.last_query_ok():
+                    def map_to_set(query, query_blobs, resp, resp_blobs):
+                        response_handler(
+                            i, query, query_blobs, resp, resp_blobs)
+                    try:
+                        ParallelQuery.map_response_to_handler(map_to_set,
+                                                              executable_queries, used_blobs, db_results, db_blobs, commands_per_query[i], blobs_per_query[i])
+                    except BaseException as e:
+                        logger.exception(e)
+                        if strict_response_validation:
+                            raise e
             else:
                 logger.info(
                     f"Skipped executing set {i}, no executable queries")
@@ -364,10 +384,8 @@ def do_batch(self, db: Connector, data: List[Tuple[Commands, Blobs]]) -> None:
         self.commands_per_query = self.generator.commands_per_query
         self.blobs_per_query = self.generator.blobs_per_query
         set_response_handler = None
-        if hasattr(self.generator, "set_response_handler") and callable(self.generator.set_response_handler):
-            set_response_handler = self.generator.set_response_handler
         self.batch_command = gen_execute_batch_sets(
-            self.base_batch_command, set_response_handler)
+            self.base_batch_command)
 
         ParallelQuery.do_batch(self, db, data)
 
@@ -388,7 +406,7 @@ def print_stats(self) -> None:
         else:
             mean = np.mean(times)
             std = np.std(times)
-            tp = 1 / mean * self.numthreads
+            tp = 0 if mean == 0 else 1 / mean * self.numthreads
 
             print(f"Avg Query time (s): {mean}")
             print(f"Query time std: {std}")
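The blob filtering this file builds on pairs each flat blob with a "strike" flag (a query that was struck out by a constraint) and keeps only the unstruck blobs. The zip/filter/map chain can be sketched standalone; `drop_struck_blobs` is an illustrative name, not the SDK's:

```python
def drop_struck_blobs(flat_blobs, strike_list):
    # Pair each blob with its strike flag, keep only unstruck pairs,
    # then project the blobs back out -- the same zip/filter/map shape
    # used by set_blob_filter in the diff above.
    return list(map(lambda pair: pair[0],
                    filter(lambda pair: not pair[1],
                           zip(flat_blobs, strike_list))))

print(drop_struck_blobs(["b0", "b1", "b2", "b3"], [False, True, True, False]))
# ['b0', 'b3']
```

Wrapping the result in `list(...)` mirrors the commit's `used_blobs` fix: a bare `filter` object is a one-shot iterator, so materializing it lets it be consumed both by the executor and by the new per-set response handler.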

aperturedb/Utils.py

Lines changed: 7 additions & 2 deletions

@@ -6,7 +6,7 @@
 import os
 import importlib
 import sys
-from typing import List
+from typing import List, Optional, Dict
 
 from graphviz import Source, Digraph
 
@@ -522,7 +522,8 @@ def count_connections(self, connections_class, constraints=None) -> int:
 
         return total_connections
 
-    def add_descriptorset(self, name: str, dim: int, metric="L2", engine="FaissFlat") -> bool:
+    def add_descriptorset(self, name: str, dim: int, metric="L2", engine="FaissFlat",
+                          properties: Optional[Dict] = None) -> bool:
         """
         Add a descriptor set to the database.
 
@@ -531,6 +532,7 @@ def add_descriptorset(self, name: str, dim: int, metric="L2", engine="FaissFlat"
             dim (int): The dimension of the descriptors.
             metric (str, optional): The metric to use for the descriptors.
            engine (str, optional): The engine to use for the descriptors.
+            properties (dict, optional): The properties of the descriptor set.
 
         Returns:
             success (bool): True if the operation was successful, False otherwise.
@@ -544,6 +546,9 @@ def add_descriptorset(self, name: str, dim: int, metric="L2", engine="FaissFlat"
             }
         }]
 
+        if properties is not None:
+            query[0]["AddDescriptorSet"]["properties"] = properties
+
        try:
             self.execute(query)
         except:
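The new `properties` argument is spliced into the `AddDescriptorSet` command only when the caller supplies it. The query-building step can be sketched on its own; the diff does not show the full command body, so the field names other than `properties` below are assumptions based on the method signature, not confirmed by this patch:

```python
def build_add_descriptorset_query(name, dim, metric="L2", engine="FaissFlat",
                                  properties=None):
    # Sketch of the command construction in Utils.add_descriptorset: the
    # optional properties dict is attached only when the caller provides it,
    # matching the "if properties is not None" guard added by this commit.
    query = [{
        "AddDescriptorSet": {
            "name": name,
            "dimensions": dim,   # assumed field name
            "metric": metric,
            "engine": engine,
        }
    }]
    if properties is not None:
        query[0]["AddDescriptorSet"]["properties"] = properties
    return query
```

Omitting the key entirely (rather than sending `"properties": None`) keeps the command valid for callers that do not use the new argument.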

aperturedb/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -7,7 +7,7 @@
 
 logger = logging.getLogger(__name__)
 
-__version__ = "0.4.27"
+__version__ = "0.4.28"
 
 # set log level
 logger.setLevel(logging.DEBUG)

docker/notebook/Dockerfile

Lines changed: 13 additions & 1 deletion

@@ -13,7 +13,19 @@ RUN chmod 755 /start.sh
 # ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /usr/bin/tini
 # RUN chmod +x /usr/bin/tini
 # ENTRYPOINT ["/usr/bin/tini", "--"]
-RUN cd /aperturedata && pip install -e ".[notebook]"
+RUN cd /aperturedata && pip install -e ".[dev]"
+RUN echo "adb --install-completion" | bash
+
+# Install useful JupyterLab extensions
+RUN pip install jupyter-resource-usage
+
+# Suppress the annoying announcements popup
+RUN jupyter labextension disable "@jupyterlab/apputils-extension:announcements"
+
+# Install CLIP (for running transformers)
+RUN pip install git+https://github.com/openai/CLIP.git
+
+RUN apt update && apt install -y curl && apt clean
 
 EXPOSE 8888
 CMD ["/start.sh"]
