
Commit 6aa0836

luisremis, drewaogle, and gsaluja9 authored
Release 0.2.0
* autopep changes
* Add REST API Example
* Add error handling on Utils for descriptors
* More explicit, and up-to-date docstrings for loaders
* Fix Connector
* Update DataLoaders; add Examples
* Example 1: interaction with PyTorch
* Example 2: classification using AlexNet
* Added a README with some instructions to run example scripts
* Better data frame rendition in Sphinx
* Examples 1 and 2 after refactoring common code
* Example 3: Similarity Search
* Introduces changes for ingesting images from Kaggle
* Clean up the requirements file to not include transitive deps
* Refactored DataHelper to KaggleDataset
* Refactoring of the manner in which data is ingested into aperturedb
* Reintroduced the Generator + Loader mechanism with deprecation warning
* Addressed some typos in docs
* Bugfix based on default args being a mutable object in Utils
* Add method for counting descriptors on Utils
* Add test case for count_descriptors_in_set on Utils
* Better logging and post-mortem for data integrity
* Upgrade protobuf generated file using protoc 3.20.0 (#91)
* Bump up version to 0.2.0 (protobuf change)
* Update requirements.txt

Co-authored-by: Drew Ogle <[email protected]>
Co-authored-by: Gautam Saluja <[email protected]>
1 parent d1d446a commit 6aa0836


74 files changed: +5207 −365 lines

.github/workflows/checks.yml

Lines changed: 0 additions & 14 deletions

```diff
@@ -14,19 +14,5 @@ jobs:
     with:
       python-version: '3.8.10'
   - uses: pre-commit/[email protected]
-
   - uses: actions/checkout@v2
   - uses: luisremis/find-trailing-whitespace@master
-
-  trailing-spaces:
-
-    # The type of runner that the job will run on Ubuntu 18.04 (latest)
-    runs-on: ubuntu-latest
-
-    # Steps represent a sequence of tasks that will be
-    # executed as part of the job
-    steps:
-    # Checks-out your repository under $GITHUB_WORKSPACE,
-    # so your job can access it
-    - uses: actions/checkout@v2
-    - uses: luisremis/find-trailing-whitespace@master
```

.github/workflows/main.yaml

Lines changed: 0 additions & 1 deletion

```diff
@@ -38,7 +38,6 @@ jobs:
         AWS_DEFAULT_REGION: ${{ secrets.AWS_DEFAULT_REGION }}
         AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
       run: |
-        pip3 install ipython torch torchvision boto3
         cd test
         bash run_test.sh
 
```

.gitignore

Lines changed: 7 additions & 1 deletion

```diff
@@ -132,6 +132,12 @@ dmypy.json
 *.adb.csv
 *.jpg
 *.npy
-env3/
 test/aperturedb/db/
 test/input/blobs/
+docs/examples/
+examples/*/coco
+examples/*/classification.txt
+kaggleds/
+examples/*/kaggleds/
+docs/*/*.svg
+test/aperturedb/logs
```

README.md

Lines changed: 26 additions & 2 deletions

```diff
@@ -8,8 +8,32 @@ the open source connector for [VDMS](https://github.com/IntelLabs/vdms).
 It also implements an Object-Mapper API to interact with
 elements in ApertureDB at the object level.
 
-* Status.py provides helper methods to retrieve information about the db.
+* Utils.py provides helper methods to retrieve information about the db.
 * Images.py provides the Object-Mapper for image related objetcs (images, bounding boxes, etc)
 * NotebookHelpers.py provides helpers to show images/bounding boxes on Jupyter Notebooks
 
-For more information, visit https://aperturedata.io
+For more information, visit https://python.docs.aperturedata.io
+
+# Running tests.
+The tests are inside the test dir.
+
+All the tests can be run with:
+
+``bash run_test.sh``
+
+Running specefic tests can be accomplished by invoking it with pytest as follows:
+
+``python -m pytest test_Session.py -v --log-cli-level=DEBUG``
+
+# Reporting bugs.
+Any error in the functionality / documentation / tests maybe reported by creating a
+[github issue](https://github.com/aperture-data/aperturedb-python/issues).
+
+# Development guidelines.
+For inclusion of any features, a PR may be created with a patch,
+and a brief description of the problem and the fix.
+The CI enforces a coding style guideline with autopep8 and
+a script to detect trailing white spaces.
+
+In case a PR encounters failures, the log would describe the location of
+the offending line with a description of the problem.
```

aperturedb/BBoxDataCSV.py

Lines changed: 131 additions & 0 deletions (new file)

```python
from matplotlib.transforms import Bbox
from aperturedb import ParallelLoader
from aperturedb import CSVParser

HEADER_X_POS = "x_pos"
HEADER_Y_POS = "y_pos"
HEADER_WIDTH = "width"
HEADER_HEIGHT = "height"
IMG_KEY_PROP = "img_key_prop"
IMG_KEY_VAL = "img_key_value"


class BBoxDataCSV(CSVParser.CSVParser):
    """
    **ApertureDB BBox Data.**

    This class loads the Bounding Box Data which is present in a csv file,
    and converts it into a series of aperturedb queries.

    .. note::
        Is backed by a csv file with the following columns:

        ``IMG_KEY``, ``x_pos``, ``y_pos``, ``width``, ``height``, ``BBOX_PROP_NAME_1``, ... ``BBOX_PROP_NAME_N``, ``constraint_BBOX_PROP_NAME_1``

    **IMG_KEY**: column has the property name of the image property that
    the bounding box will be connected to, and each row has the value
    that will be used for finding the image.

    **x_pos, y_pos**: Specify the coordinates of top left of the bounding box.

    **width, height**: Specify the dimensions of the bounding box, as integers (unit is in pixels).

    **BBOX_PROP_NAME_N**: is an arbitrary name of the property of the bounding
    box, and each row has the value for that property.

    **constraint_BBOX_PROP_NAME_1**: Constraints against specific property, used for conditionally adding a Bounding Box.

    Example csv file::

        img_unique_id,x_pos,y_pos,width,height,type,dataset_id,constraint_dataset_id
        d5b25253-9c1e,257,154,84,125,manual,12345,12345
        d5b25253-9c1e,7,537,522,282,manual,12346,12346
        ...

    Example usage:

    .. code-block:: python

        data = BBoxDataCSV("/path/to/BoundingBoxesData.csv")
        loader = ParallelLoader(db)
        loader.ingest(data)

    .. important::
        In the above example, the constraint_dataset_id ensures that a bounding box with the specified
        dataset_id would be only inserted if it does not already exist in the database.

    """

    def __init__(self, filename):

        super().__init__(filename)

        self.props_keys = [x for x in self.header[5:]
                           if not x.startswith(CSVParser.CONTRAINTS_PREFIX)]
        self.constraints_keys = [x for x in self.header[5:]
                                 if x.startswith(CSVParser.CONTRAINTS_PREFIX)]

        self.img_key = self.header[0]
        self.command = "AddBoundingBox"

    def getitem(self, idx):

        val = self.df.loc[idx, self.img_key]
        box_data_headers = [HEADER_X_POS,
                            HEADER_Y_POS, HEADER_WIDTH, HEADER_HEIGHT]
        box_data = [int(self.df.loc[idx, h]) for h in box_data_headers]

        q = []

        ref_counter = idx + 1
        # TODO we could reuse image references within the batch
        # instead of creating a new find for every image.
        img_ref = ref_counter
        fi = {
            "FindImage": {
                "_ref": img_ref,
            }
        }

        key = self.img_key
        constraints = {}
        constraints[key] = ["==", val]
        fi["FindImage"]["constraints"] = constraints
        q.append(fi)

        rect_attrs = ["x", "y", "width", "height"]
        custom_fields = {
            "image": img_ref,
            "rectangle": {
                attr: val for attr, val in zip(rect_attrs, box_data)
            },
        }
        abb = self._basic_command(idx, custom_fields)

        properties = self.parse_properties(self.df, idx)
        if properties:
            props = properties
            if "_label" in props:
                abb[self.command]["label"] = props["_label"]
                props.pop("_label")
            # Check if props is not empty after removing "_label"
            if props:
                abb[self.command]["properties"] = props
        q.append(abb)

        return q, []

    def validate(self):

        self.header = list(self.df.columns.values)

        if self.header[1] != HEADER_X_POS:
            raise Exception("Error with CSV file field: " + HEADER_X_POS)
        if self.header[2] != HEADER_Y_POS:
            raise Exception("Error with CSV file field: " + HEADER_Y_POS)
        if self.header[3] != HEADER_WIDTH:
            raise Exception("Error with CSV file field: " + HEADER_WIDTH)
        if self.header[4] != HEADER_HEIGHT:
            raise Exception("Error with CSV file field: " + HEADER_HEIGHT)
```
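The getitem method above assembles, for each csv row, a FindImage command (constrained on the image key column) followed by an AddBoundingBox command that references it. Below is a minimal, stdlib-only sketch of that per-row translation, using the docstring's example csv; the helper name `row_to_queries` is hypothetical, and constraint/property handling is omitted:

```python
import csv
import io

# Hypothetical sample mirroring the docstring's example csv file.
CSV_TEXT = """img_unique_id,x_pos,y_pos,width,height,type,dataset_id,constraint_dataset_id
d5b25253-9c1e,257,154,84,125,manual,12345,12345
"""


def row_to_queries(row, img_key, idx):
    """Build the FindImage + AddBoundingBox pair for one csv row,
    simplified from BBoxDataCSV.getitem (no constraints or properties)."""
    img_ref = idx + 1  # one fresh reference per row, as in the TODO note
    find = {
        "FindImage": {
            "_ref": img_ref,
            "constraints": {img_key: ["==", row[img_key]]},
        }
    }
    add = {
        "AddBoundingBox": {
            "image": img_ref,  # links the box to the found image
            "rectangle": {
                "x": int(row["x_pos"]),
                "y": int(row["y_pos"]),
                "width": int(row["width"]),
                "height": int(row["height"]),
            },
        }
    }
    return [find, add]


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
queries = row_to_queries(rows[0], img_key="img_unique_id", idx=0)
```

Note that each row gets its own `_ref` (idx + 1); the TODO in the source points out that references could be reused across rows of the same image within a batch.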

aperturedb/BBoxLoader.py

Lines changed: 43 additions & 11 deletions

```diff
@@ -10,28 +10,34 @@
 
 
 class BBoxGeneratorCSV(CSVParser.CSVParser):
-    """**ApertureDB BBox Data loader.**
+    """**ApertureDB Bounding Box Data generator.**
+
+    .. warning::
+        Deprecated. Use :class:`~aperturedb.BBoxDataCSV.BBoxDataCSV` instead.
 
     .. note::
-        Expects a csv file with the following columns:
+        Is backed by a csv file with the following columns:
 
-        ``IMG_KEY``, ``x_pos``, ``y_pos``, ``width``, ``height``, ``BBOX_PROP_NAME_1``, ... ``BBOX_PROP_NAME_N``
+        ``IMG_KEY``, ``x_pos``, ``y_pos``, ``width``, ``height``, ``BBOX_PROP_NAME_1``, ... ``BBOX_PROP_NAME_N``, ``constraint_BBOX_PROP_NAME_1``
 
-    IMG_KEY column has the property name of the image property that
+    **IMG_KEY**: column has the property name of the image property that
     the bounding box will be connected to, and each row has the value
     that will be used for finding the image.
 
-    x_pos,y_pos,width,height are the coordinates of the bounding boxes,
-    as integers (unit is in pixels)
+    **x_pos, y_pos**: Specify the coordinates of top left of the bounding box.
+
+    **width, height**: Specify the dimensions of the bounding box, as integers (unit is in pixels).
 
-    BBOX_PROP_NAME_N is an arbitrary name of the property of the bounding
+    **BBOX_PROP_NAME_N**: is an arbitrary name of the property of the bounding
     box, and each row has the value for that property.
 
+    **constraint_BBOX_PROP_NAME_1**: Constraints against specific property, used for conditionally adding a Bounding Box.
+
     Example csv file::
 
-        img_unique_id,x_pos,y_pos,width,height,type
-        d5b25253-9c1e,257,154,84,125,manual
-        d5b25253-9c1e,7,537,522,282,manual
+        img_unique_id,x_pos,y_pos,width,height,type,dataset_id,constraint_dataset_id
+        d5b25253-9c1e,257,154,84,125,manual,12345,12345
+        d5b25253-9c1e,7,537,522,282,manual,12346,12346
         ...
     """
 
@@ -46,7 +52,7 @@ def __init__(self, filename):
 
         self.img_key = self.header[0]
 
-    def __getitem__(self, idx):
+    def getitem(self, idx):
 
         data = {
             "x": int(self.df.loc[idx, HEADER_X_POS]),
@@ -87,6 +93,32 @@ def validate(self):
 
 
 class BBoxLoader(ParallelLoader.ParallelLoader):
+    """
+    **A loader that helps to ingest Bounding box information against existing images**
+
+    This class facilitates the insertions of Bounding box information connecting them with
+    the images already inserted in the database.
+
+    This executes in conjunction with a **generator** object for example :class:`~aperturedb.BBoxLoader.BBoxGeneratorCSV`,
+    which is a class that implements iterable inteface and generates "bounding box" elements.
+
+    Example::
+
+        bbox_data = {
+            "x": ,
+            "y": ,
+            "width": ,
+            "height":,
+            "properties: {
+                "BBOX_PROP_NAME_1": "BBOX_PROP_VALUE_1",
+                .
+                .
+                .
+                "BBOX_PROP_NAME_N": "BBOX_PROP_VALUE_N"
+            }
+        }
+
+    """
 
     def __init__(self, db, dry_run=False):
 
```

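The `Example` block in the docstring above leaves the coordinate values empty and is missing the closing quote on `"properties"`. A filled-in sketch of the element shape a generator such as BBoxGeneratorCSV yields to BBoxLoader, with hypothetical values borrowed from the csv example earlier in this diff:

```python
# Filled-in version of the docstring's Example element. The concrete
# values are hypothetical; the shape is what BBoxLoader consumes:
# integer pixel coordinates plus a flat dict of arbitrary properties.
bbox_data = {
    "x": 257,
    "y": 154,
    "width": 84,
    "height": 125,
    "properties": {
        "type": "manual",
        "dataset_id": "12345",
    },
}

# All four geometry fields are integers (units are pixels).
geometry_ok = all(isinstance(bbox_data[k], int)
                  for k in ("x", "y", "width", "height"))
```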
aperturedb/BlobDataCSV.py

Lines changed: 91 additions & 0 deletions (new file)

```python
import logging
from aperturedb import CSVParser

PROPERTIES = "properties"
CONSTRAINTS = "constraints"
BLOB_PATH = "filename"

logger = logging.getLogger(__name__)


class BlobDataCSV(CSVParser.CSVParser):
    """**ApertureDB Blob Data.**

    This class loads the Blob Data which is present in a csv file,
    and converts it into a series of aperturedb queries.

    .. note::
        Is backed by a csv file with the following columns:

        ``FILENAME``, ``PROP_NAME_1``, ... ``PROP_NAME_N``, ``constraint_PROP_NAME_1``

    **FILENAME**: The path of the blob object on the file system.

    **PROP_NAME_1 ... PROP_NAME_N**: Arbitrary property names associated with this blob.

    **constraint_PROP_NAME_1**: Constraints against specific property, used for conditionally adding a Blob.

    Example csv file::

        filename,name,lastname,age,id,constraint_id
        /mnt/blob1,John,Salchi,69,321423532,321423532
        /mnt/blob2,Johna,Salchi,63,42342522,42342522
        ...

    Example usage:

    .. code-block:: python

        data = BlobDataCSV("/path/to/BlobData.csv")
        loader = ParallelLoader(db)
        loader.ingest(data)

    .. important::
        In the above example, the constraint_id ensures that a blob with the specified
        id would be only inserted if it does not already exist in the database.
    """

    def __init__(self, filename):

        super().__init__(filename)

        self.props_keys = [x for x in self.header[1:]
                           if not x.startswith(CSVParser.CONTRAINTS_PREFIX) and x != BLOB_PATH]
        self.constraints_keys = [x for x in self.header[1:]
                                 if x.startswith(CSVParser.CONTRAINTS_PREFIX)]
        self.command = "AddBlob"

    def getitem(self, idx):
        filename = self.df.loc[idx, BLOB_PATH]

        blob_ok, blob = self.load_blob(filename)
        if not blob_ok:
            logger.error("Error loading blob: " + filename)
            raise Exception("Error loading blob: " + filename)

        q = []
        ab = self._basic_command(idx)
        q.append(ab)

        return q, [blob]

    def load_blob(self, filename):

        try:
            fd = open(filename, "rb")
            buff = fd.read()
            fd.close()
            return True, buff
        except Exception as e:
            logger.exception(e)

        return False, None

    def validate(self):

        self.header = list(self.df.columns.values)

        if self.header[0] != BLOB_PATH:
            raise Exception("Error with CSV file field: " + BLOB_PATH)
```

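BlobDataCSV.load_blob returns an `(ok, bytes)` pair instead of raising, so getitem can log the offending filename before failing. A self-contained sketch of that pattern, using a context manager in place of the explicit open/close pair and a temporary file standing in for a row's `filename` column:

```python
import logging
import os
import tempfile

logger = logging.getLogger(__name__)


def load_blob(filename):
    """Read a file as raw bytes, returning (ok, data) instead of raising,
    mirroring BlobDataCSV.load_blob."""
    try:
        with open(filename, "rb") as fd:
            return True, fd.read()
    except OSError as e:
        # Log the full traceback but let the caller decide how to fail.
        logger.exception(e)
        return False, None


# Demo: write a temporary blob, then load it back.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"blob-bytes")
    path = tmp.name

ok, blob = load_blob(path)
missing_ok, missing = load_blob(path + ".does-not-exist")

os.remove(path)  # clean up the temporary blob
```

The `(ok, data)` convention keeps error handling at the ingestion layer, where the csv row context (the filename) is available for the error message.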