Commit 15a628e

2 parents a2e338d + f39d655 commit 15a628e

20 files changed, +249 -65 lines changed

.github/workflows/development.yaml

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ jobs:
       PY_VER: ${{matrix.py_ver}}
       MYSQL_VER: ${{matrix.mysql_ver}}
       ALPINE_VER: "3.10"
-      MINIO_VER: RELEASE.2019-09-26T19-42-35Z
+      MINIO_VER: RELEASE.2021-09-03T03-56-13Z
       COMPOSE_HTTP_TIMEOUT: "120"
       COVERALLS_SERVICE_NAME: travis-ci
       COVERALLS_REPO_TOKEN: fd0BoXG46TPReEem0uMy7BJO5j0w1MQiY

CHANGELOG.md

Lines changed: 19 additions & 11 deletions
@@ -1,7 +1,15 @@
 ## Release notes
 
-### 0.13.3 -- May 28, 2021
+### 0.13.3 -- TBD
+* Add - Expose proxy feature for S3 external stores (#961) PR #962
 * Bugfix - Dependencies not properly loaded on populate. (#902) PR #919
+* Bugfix - Replace use of numpy aliases of built-in types with built-in type. (#938) PR #939
+* Bugfix - `ExternalTable.delete` should not remove row on error (#953) PR #956
+* Bugfix - Fix error handling of remove_object function in `s3.py` (#952) PR #955
+* Bugfix - Fix regression issue with `DISTINCT` clause and `GROUP_BY` (#914) PR #963
+* Bugfix - Fix sql code generation to comply with sql mode `ONLY_FULL_GROUP_BY` (#916) PR #965
+* Bugfix - Fix count for left-joined `QueryExpressions` (#951) PR #966
+* Bugfix - Fix assertion error when performing a union into a join (#930) PR #967
 
 ### 0.13.2 -- May 7, 2021
 * Update `setuptools_certificate` dependency to new name `otumat`
@@ -44,13 +52,13 @@
 * Fix display of part tables in `schema.save`. (#821) PR #833
 * Add `schema.list_tables`. (#838) PR #844
 * Fix minio new version regression. PR #847
-* Add more S3 logging for debugging. (#831) PR #832
+* Add more S3 logging for debugging. (#831) PR #832
 * Convert testing framework from TravisCI to GitHub Actions (#841) PR #840
-
+
 ### 0.12.7 -- Oct 27, 2020
 * Fix case sensitivity issues to adapt to MySQL 8+. PR #819
 * Fix pymysql regression bug (#814) PR #816
-* Adapted attribute types now have dtype=object in all recarray results. PR #811
+* Adapted attribute types now have dtype=object in all recarray results. PR #811
 
 ### 0.12.6 -- May 15, 2020
 * Add `order_by` to `dj.kill` (#668, #779) PR #775, #783
@@ -142,9 +150,9 @@
 * Bugfix in restriction of the form (A & B) * B (#463)
 * Improved error messages (#466)
 
-### 0.10.0 -- Jan 10, 2018
+### 0.10.0 -- Jan 10, 2018
 * Deletes are more efficient (#424)
-* ERD shows table definition on tooltip hover in Jupyter (#422)
+* ERD shows table definition on tooltip hover in Jupyter (#422)
 * S3 external storage
 * Garbage collection for external sorage
 * Most operators and methods of tables can be invoked as class methods rather than instance methods (#407)
@@ -158,7 +166,7 @@
 * Implement union operator +
 * Implement file-based external storage
 
-### 0.8.0 -- Jul 26, 2017
+### 0.8.0 -- Jul 26, 2017
 Documentation and tutorials available at https://docs.datajoint.io and https://tutorials.datajoint.io
 * improved the ERD graphics and features using the graphviz libraries (#207, #333)
 * improved password handling logic (#322, #321)
@@ -177,11 +185,11 @@ Documentation and tutorials available at https://docs.datajoint.io and https://t
 * Added `dj.create_virtual_module`
 
 ### 0.4.10 (#286) -- Feb 6, 2017
-* Removed Vagrant and Readthedocs support
+* Removed Vagrant and Readthedocs support
 * Explicit saving of configuration (issue #284)
 
 ### 0.4.9 (#285) -- Feb 2, 2017
-* Fixed setup.py for pip install
+* Fixed setup.py for pip install
 
 ### 0.4.7 (#281) -- Jan 24, 2017
 * Fixed issues related to order of attributes in projection.
@@ -210,10 +218,10 @@ Documentation and tutorials available at https://docs.datajoint.io and https://t
 
 ### 0.3.8 -- Aug 2, 2016
 * added the `_update` method in `base_relation`. It allows updating values in existing tuples.
-* bugfix in reading values of type double. Previously it was cast as float32.
+* bugfix in reading values of type double. Previously it was cast as float32.
 
 ### 0.3.7 -- Jul 31, 2016
-* added parameter `ignore_extra_fields` in `insert`
+* added parameter `ignore_extra_fields` in `insert`
 * `insert(..., skip_duplicates=True)` now relies on `SELECT IGNORE`. Previously it explicitly checked if tuple already exists.
 * table previews now include blob attributes displaying the string <BLOB>

LNX-docker-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -32,7 +32,7 @@ services:
       interval: 1s
   fakeservices.datajoint.io:
     <<: *net
-    image: datajoint/nginx:v0.0.16
+    image: datajoint/nginx:v0.0.18
     environment:
       - ADD_db_TYPE=DATABASE
       - ADD_db_ENDPOINT=db:3306

README.md

Lines changed: 3 additions & 1 deletion
@@ -108,7 +108,7 @@ A number of labs are currently adopting DataJoint and we are quickly getting the
 PY_VER=3.7
 ALPINE_VER=3.10
 MYSQL_VER=5.7
-MINIO_VER=RELEASE.2019-09-26T19-42-35Z
+MINIO_VER=RELEASE.2021-09-03T03-56-13Z
 UID=1000
 GID=1000
 ```
@@ -136,6 +136,8 @@ GID=1000
 * Add entry in `/etc/hosts` for `127.0.0.1 fakeservices.datajoint.io`
 
 
+
+
 ### Launch Jupyter Notebook for Interactive Use
 * Navigate to `localhost:8888`
 * Input Jupyter password

datajoint/blob.py

Lines changed: 2 additions & 2 deletions
@@ -164,7 +164,7 @@ def pack_blob(self, obj):
             return self.pack_recarray(np.array(obj))
         if isinstance(obj, np.number):
             return self.pack_array(np.array(obj))
-        if isinstance(obj, (np.bool, np.bool_)):
+        if isinstance(obj, (bool, np.bool_)):
             return self.pack_array(np.array(obj))
         if isinstance(obj, (datetime.datetime, datetime.date, datetime.time)):
             return self.pack_datetime(obj)
@@ -365,7 +365,7 @@ def read_struct(self):
         raw_data = [
             tuple(self.read_blob(n_bytes=int(self.read_value('uint64'))) for _ in range(n_fields))
             for __ in range(n_elem)]
-        data = np.array(raw_data, dtype=list(zip(field_names, repeat(np.object))))
+        data = np.array(raw_data, dtype=list(zip(field_names, repeat(object))))
         return self.squeeze(data.reshape(shape, order="F"), convert_to_scalar=False).view(MatStruct)
 
     def pack_struct(self, array):
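
The two `blob.py` changes swap the NumPy aliases `np.bool` and `np.object` for the built-in types they pointed to; those aliases were deprecated in NumPy 1.20 and removed in 1.24. A minimal sketch of why the replacement preserves behavior, assuming NumPy is installed. The helper `is_boolean_scalar` and the sample field names are hypothetical and not part of DataJoint:

```python
from itertools import repeat

import numpy as np


def is_boolean_scalar(obj):
    """Hypothetical helper mirroring the updated isinstance check."""
    # np.bool was only an alias of the built-in bool, so (bool, np.bool_)
    # matches the same objects as the old (np.bool, np.bool_).
    return isinstance(obj, (bool, np.bool_))


# Build a structured dtype whose fields all hold arbitrary Python objects,
# using the built-in object in place of the np.object alias.
field_names = ["name", "values"]  # illustrative field names only
dtype = list(zip(field_names, repeat(object)))
data = np.array([("a", [1, 2, 3])], dtype=dtype)
```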

datajoint/diagram.py

Lines changed: 1 addition & 1 deletion
@@ -296,7 +296,7 @@ def make_dot(self):
             node.set_style('filled')
 
         for edge in dot.get_edges():
-            # see http://www.graphviz.org/content/attrs
+            # see https://graphviz.org/doc/info/attrs.html
             src = edge.get_source().strip('"')
             dest = edge.get_destination().strip('"')
             props = graph.get_edge_data(src, dest)

datajoint/expression.py

Lines changed: 14 additions & 7 deletions
@@ -44,6 +44,9 @@ class QueryExpression:
     _heading = None
     _support = None
 
+    # If the query will be using distinct
+    _distinct = False
+
     @property
     def connection(self):
         """ a dj.Connection object """
@@ -106,9 +109,8 @@ def make_sql(self, fields=None):
         Make the SQL SELECT statement.
         :param fields: used to explicitly set the select attributes
         """
-        distinct = self.heading.names == self.primary_key
         return 'SELECT {distinct}{fields} FROM {from_}{where}'.format(
-            distinct="DISTINCT " if distinct else "",
+            distinct="DISTINCT " if self._distinct else "",
             fields=self.heading.as_sql(fields or self.heading.names),
             from_=self.from_clause(), where=self.where_clause())
 
@@ -266,9 +268,11 @@ def join(self, other, semantic_check=True, left=False):
                           - join_attributes)
         # need subquery if any of the join attributes are derived
         need_subquery1 = (need_subquery1 or isinstance(self, Aggregation) or
-                          any(n in self.heading.new_attributes for n in join_attributes))
+                          any(n in self.heading.new_attributes for n in join_attributes)
+                          or isinstance(self, Union))
         need_subquery2 = (need_subquery2 or isinstance(other, Aggregation) or
-                          any(n in other.heading.new_attributes for n in join_attributes))
+                          any(n in other.heading.new_attributes for n in join_attributes)
+                          or isinstance(self, Union))
         if need_subquery1:
             self = self.make_subquery()
         if need_subquery2:
@@ -440,8 +444,10 @@ def tail(self, limit=25, **fetch_kwargs):
     def __len__(self):
         """:return: number of elements in the result set e.g. ``len(q1)``."""
         return self.connection.query(
-            'SELECT count(DISTINCT {fields}) FROM {from_}{where}'.format(
-                fields=self.heading.as_sql(self.primary_key, include_aliases=False),
+            'SELECT {select_} FROM {from_}{where}'.format(
+                select_=('count(*)' if any(self._left)
+                         else 'count(DISTINCT {fields})'.format(fields=self.heading.as_sql(
+                             self.primary_key, include_aliases=False))),
                 from_=self.from_clause(),
                 where=self.where_clause())).fetchone()[0]
 
@@ -554,7 +560,7 @@ def create(cls, arg, group, keep_all_rows=False):
         if inspect.isclass(group) and issubclass(group, QueryExpression):
             group = group()  # instantiate if a class
         assert isinstance(group, QueryExpression)
-        if keep_all_rows and len(group.support) > 1:
+        if keep_all_rows and len(group.support) > 1 or group.heading.new_attributes:
             group = group.make_subquery()  # subquery if left joining a join
         join = arg.join(group, left=keep_all_rows)  # reuse the join logic
         result = cls()
@@ -718,6 +724,7 @@ def __and__(self, other):
         if not isinstance(other, QueryExpression):
             raise DataJointError('Set U can only be restricted with a QueryExpression.')
         result = copy.copy(other)
+        result._distinct = True
         result._heading = result.heading.set_primary_key(self.primary_key)
         result = result.proj()
         return result
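
The `make_sql` and `__len__` changes can be sketched as two small string builders. This is a simplified illustration, not DataJoint's implementation: `make_select` and `make_count` are hypothetical names, and the heading, FROM clause, and WHERE clause are passed in as plain values. `DISTINCT` is now driven by an explicit flag rather than inferred from the heading, and the count falls back to `count(*)` for left joins, presumably because MySQL's `count(DISTINCT ...)` skips rows in which any listed expression is NULL.

```python
def make_select(fields, from_clause, where_clause="", distinct=False):
    """Build a SELECT statement; DISTINCT is emitted only when requested."""
    return "SELECT {distinct}{fields} FROM {from_}{where}".format(
        distinct="DISTINCT " if distinct else "",
        fields=", ".join("`%s`" % f for f in fields),
        from_=from_clause,
        where=where_clause,
    )


def make_count(primary_key, from_clause, where_clause="", left_joined=False):
    """Build a counting query in the spirit of the updated __len__."""
    # With a left join, primary-key attributes from the right side may be
    # NULL, so count(*) is used instead of count(DISTINCT ...).
    if left_joined:
        select = "count(*)"
    else:
        select = "count(DISTINCT {})".format(
            ", ".join("`%s`" % f for f in primary_key))
    return "SELECT {select} FROM {from_}{where}".format(
        select=select, from_=from_clause, where=where_clause)
```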

datajoint/external.py

Lines changed: 19 additions & 11 deletions
@@ -127,7 +127,10 @@ def _remove_external_file(self, external_path):
         if self.spec['protocol'] == 's3':
             self.s3.remove_object(external_path)
         elif self.spec['protocol'] == 'file':
-            Path(external_path).unlink()
+            try:
+                Path(external_path).unlink()
+            except FileNotFoundError:
+                pass
 
     def exists(self, external_filepath):
         """
@@ -314,11 +317,12 @@ def used(self):
         return self & [FreeTable(self.connection, ref['referencing_table']).proj(hash=ref['column_name'])
                        for ref in self.references]
 
-    def delete(self, *, delete_external_files=None, limit=None, display_progress=True):
+    def delete(self, *, delete_external_files=None, limit=None, display_progress=True, errors_as_string=True):
         """
         :param delete_external_files: True or False. If False, only the tracking info is removed from the
             external store table but the external files remain intact. If True, then the external files
             themselves are deleted too.
+        :param errors_as_string: If True any errors returned when deleting from external files will be strings
         :param limit: (integer) limit the number of items to delete
         :param display_progress: if True, display progress as files are cleaned up
         :return: if deleting external files, returns errors
@@ -337,16 +341,20 @@ def delete(self, *, delete_external_files=None, limit=None, display_progress=Tru
         # delete items one by one, close to transaction-safe
         error_list = []
         for uuid, external_path in items:
-            try:
-                count = (self & {'hash': uuid}).delete_quick(get_count=True)  # optimize
-            except Exception:
-                pass  # if delete failed, do not remove the external file
-            else:
-                assert count in (0, 1)
+            row = (self & {'hash': uuid}).fetch()
+            if row.size:
                 try:
-                    self._remove_external_file(external_path)
-                except Exception as error:
-                    error_list.append((uuid, external_path, str(error)))
+                    (self & {'hash': uuid}).delete_quick()
+                except Exception:
+                    pass  # if delete failed, do not remove the external file
+                else:
+                    try:
+                        self._remove_external_file(external_path)
+                    except Exception as error:
+                        # adding row back into table after failed delete
+                        self.insert1(row[0], skip_duplicates=True)
+                        error_list.append((uuid, external_path,
+                                           str(error) if errors_as_string else error))
         return error_list
 
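
The reworked `delete` loop follows a delete-then-restore pattern: the tracking row is fetched first, removed, and inserted again if removing the external file fails. A minimal sketch of that pattern, with plain dictionaries standing in for the tracking table and the external store. The function `delete_with_restore` and its arguments are hypothetical, not DataJoint API:

```python
def delete_with_restore(table, files, items, errors_as_string=True):
    """
    table: dict mapping hash -> row (stands in for the tracking table)
    files: dict mapping path -> content (stands in for the external store)
    items: iterable of (hash, path) pairs to delete
    Returns a list of (hash, path, error) for files that could not be removed.
    """
    error_list = []
    for key, path in items:
        row = table.get(key)
        if row is None:
            continue  # nothing tracked under this hash
        del table[key]  # remove the tracking row first
        try:
            del files[path]  # then remove the external file
        except Exception as error:
            table[key] = row  # restore the row so the file stays tracked
            error_list.append(
                (key, path, str(error) if errors_as_string else error))
    return error_list
```

With `errors_as_string=False` the caller receives the original exception objects and can inspect their types instead of parsing messages.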

datajoint/s3.py

Lines changed: 22 additions & 7 deletions
@@ -3,6 +3,7 @@
 """
 from io import BytesIO
 import minio   # https://docs.minio.io/docs/python-client-api-reference
+import urllib3
 import warnings
 import uuid
 import logging
@@ -16,9 +17,24 @@ class Folder:
     """
     A Folder instance manipulates a flat folder of objects within an S3-compatible object store
     """
-    def __init__(self, endpoint, bucket, access_key, secret_key, *, secure=False, **_):
-        self.client = minio.Minio(endpoint, access_key=access_key, secret_key=secret_key,
-                                  secure=secure)
+    def __init__(self, endpoint, bucket, access_key, secret_key, *, secure=False,
+                 proxy_server=None, **_):
+        # from https://docs.min.io/docs/python-client-api-reference
+        self.client = minio.Minio(
+            endpoint,
+            access_key=access_key,
+            secret_key=secret_key,
+            secure=secure,
+            http_client=(
+                urllib3.ProxyManager(proxy_server,
+                                     timeout=urllib3.Timeout.DEFAULT_TIMEOUT,
+                                     cert_reqs="CERT_REQUIRED",
+                                     retries=urllib3.Retry(total=5,
+                                                           backoff_factor=0.2,
+                                                           status_forcelist=[500, 502, 503,
+                                                                             504]))
+                if proxy_server else None),
+        )
         self.bucket = bucket
         if not self.client.bucket_exists(bucket):
             raise errors.BucketInaccessible('Inaccessible s3 bucket %s' % bucket)
@@ -76,12 +92,11 @@ def get_size(self, name):
         except minio.error.S3Error as e:
             if e.code == 'NoSuchKey':
                 raise errors.MissingExternalFile
-            else:
-                raise e
+            raise e
 
     def remove_object(self, name):
         logger.debug('remove_object: {}:{}'.format(self.bucket, name))
         try:
             self.client.remove_object(self.bucket, str(name))
-        except minio.ResponseError:
-            return errors.DataJointError('Failed to delete %s from s3 storage' % name)
+        except minio.error.MinioException:
+            raise errors.DataJointError('Failed to delete %s from s3 storage' % name)
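
The `remove_object` fix replaces `return` with `raise`. A minimal sketch of the difference, using hypothetical stand-ins (`StorageError` for `errors.DataJointError`, `failing_backend` for the minio client call) so that it runs without an S3 service:

```python
class StorageError(Exception):
    """Hypothetical stand-in for errors.DataJointError."""


def failing_backend(name):
    """Hypothetical backend call that always fails."""
    raise OSError("backend unavailable")


def remove_returning(name):
    try:
        failing_backend(name)
    except OSError:
        # Old pattern: the exception object is returned, not raised, so a
        # caller that ignores the return value never sees the failure.
        return StorageError("Failed to delete %s from s3 storage" % name)


def remove_raising(name):
    try:
        failing_backend(name)
    except OSError:
        # New pattern: the failure propagates to the caller.
        raise StorageError("Failed to delete %s from s3 storage" % name)
```

Only the raising variant lets a surrounding `try`/`except`, such as the one in `ExternalTable.delete`, detect that the file was not removed.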

datajoint/settings.py

Lines changed: 2 additions & 1 deletion
@@ -137,7 +137,8 @@ def get_store_spec(self, store):
         spec['subfolding'] = spec.get('subfolding', DEFAULT_SUBFOLDING)
         spec_keys = {  # REQUIRED in uppercase and allowed in lowercase
             'file': ('PROTOCOL', 'LOCATION', 'subfolding', 'stage'),
-            's3': ('PROTOCOL', 'ENDPOINT', 'BUCKET', 'ACCESS_KEY', 'SECRET_KEY', 'LOCATION', 'secure', 'subfolding', 'stage')}
+            's3': ('PROTOCOL', 'ENDPOINT', 'BUCKET', 'ACCESS_KEY', 'SECRET_KEY', 'LOCATION',
+                   'secure', 'subfolding', 'stage', 'proxy_server')}
 
         try:
             spec_keys = spec_keys[spec.get('protocol', '').lower()]
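
The `spec_keys` convention (required keys in uppercase, optional keys in lowercase) can be sketched as a small validator. This is an illustrative approximation of the check, not the code in `get_store_spec`: the function `check_store_spec` is hypothetical, and the endpoint, bucket, credentials, and proxy URL below are placeholder values.

```python
SPEC_KEYS = {  # REQUIRED in uppercase and allowed in lowercase
    "file": ("PROTOCOL", "LOCATION", "subfolding", "stage"),
    "s3": ("PROTOCOL", "ENDPOINT", "BUCKET", "ACCESS_KEY", "SECRET_KEY", "LOCATION",
           "secure", "subfolding", "stage", "proxy_server"),
}


def check_store_spec(spec, spec_keys=SPEC_KEYS):
    """Return (missing required keys, keys not allowed for the protocol)."""
    keys = spec_keys[spec.get("protocol", "").lower()]
    missing = [k.lower() for k in keys if k.isupper() and k.lower() not in spec]
    allowed = {k.lower() for k in keys}
    unknown = [k for k in spec if k not in allowed]
    return missing, unknown


# Placeholder S3 store spec that routes traffic through a proxy.
s3_store = {
    "protocol": "s3",
    "endpoint": "s3.example.com:9000",
    "bucket": "example-bucket",
    "access_key": "EXAMPLE_ACCESS_KEY",
    "secret_key": "EXAMPLE_SECRET_KEY",
    "location": "dj/store",
    "proxy_server": "https://proxy.example.com:3128",
}
```

Because `proxy_server` is listed only for `s3`, the same key in a `file` store spec would be reported as not allowed.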
