
Commit e2f2b18

Merge pull request #707 from datajoint/master
Align dev with master
2 parents 90ed2b0 + 236e46e commit e2f2b18

36 files changed: +1220 -129 lines

.gitignore

Lines changed: 2 additions & 1 deletion
```diff
@@ -21,4 +21,5 @@ build/
 *.env
 local-docker-compose.yml
 notebooks/*
-__main__.py
+__main__.py
+jupyter_custom.js
```

.travis.yml

Lines changed: 4 additions & 3 deletions
```diff
@@ -13,14 +13,15 @@ services:
 main: &main
   stage: Alpine
   os: linux
+  dist: xenial # precise, trusty, xenial, bionic
   language: shell
   script:
     - docker-compose -f LNX-docker-compose.yml up --build --exit-code-from dj
 jobs:
   include:
     - <<: *main
       env:
-        - PY_VER: "3.8-rc"
+        - PY_VER: "3.8"
         - MYSQL_VER: "5.7"
     - <<: *main
       env:
@@ -36,7 +37,7 @@ jobs:
         - MYSQL_VER: "5.7"
     - <<: *main
       env:
-        - PY_VER: "3.8-rc"
+        - PY_VER: "3.8"
         - MYSQL_VER: "8.0"
     - <<: *main
       env:
@@ -52,7 +53,7 @@ jobs:
         - MYSQL_VER: "8.0"
     - <<: *main
       env:
-        - PY_VER: "3.8-rc"
+        - PY_VER: "3.8"
         - MYSQL_VER: "5.6"
     - <<: *main
       env:
```

CHANGELOG.md

Lines changed: 9 additions & 2 deletions
```diff
@@ -1,6 +1,13 @@
 ## Release notes
 
-### 0.12.0 -- October 1, 2019
+### 0.12.2 -- Nov 11, 2019
+* Bugfix - Convoluted error thrown if there is a reference to a non-existent table attribute (#691)
+* Bugfix - Insert into external does not trim leading slash if defined in `dj.config['stores']['<store>']['location']` (#692)
+
+### 0.12.1 -- Nov 2, 2019
+* Bugfix - AttributeAdapter converts into a string (#684)
+
+### 0.12.0 -- Oct 31, 2019
 * Dropped support for Python 3.4
 * Support secure connections with TLS (aka SSL) PR #620
 * Convert numpy array from python object to appropriate data type if all elements are of the same type (#587) PR #608
@@ -31,7 +38,7 @@
 ### 0.11.3 -- Jul 26, 2019
 * Fix incompatibility with pyparsing 2.4.1 (#629) PR #631
 
-### 0.11.2 -- July 25, 2019
+### 0.11.2 -- Jul 25, 2019
 * Fix #628 - incompatibility with pyparsing 2.4.1
 
 ### 0.11.1 -- Nov 15, 2018
```

README.md

Lines changed: 37 additions & 28 deletions
````diff
@@ -24,58 +24,67 @@ pip3 install --upgrade datajoint
 ```
 ## Python Native Blobs
 
-For the v0.12 release, the variable `enable_python_native_blobs` can be
-safely enabled for improved blob support of python datatypes if the following
-are true:
+DataJoint 0.12 adds full support for all native python data types in blobs: tuples, lists, sets, dicts, strings, bytes, `None`, and all their recursive combinations.
+The new blobs are a superset of the old functionality and are fully backward compatible.
+In previous versions, only MATLAB-style numerical arrays were fully supported.
+Some Python datatypes such as dicts were coerced into numpy recarrays and then fetched as such.
 
-* This is a new DataJoint installation / pipeline(s)
-* You have not used DataJoint prior to v0.12 with your pipeline(s)
-* You do not share blob data between Python and Matlab
+However, since some Python types were coerced into MATLAB types, old blobs and new blobs may now be fetched as different types of objects even if they were inserted the same way.
+For example, new `dict` objects will be returned as `dict` while the same types of objects inserted with `datajoint 0.11` will be recarrays.
 
-Otherwise, please read the following carefully:
+Since this is a big change, we chose to disable full blob support by default as a temporary precaution, which will be removed in version 0.13.
+
+You may enable it by setting the `enable_python_native_blobs` flag in `dj.config`.
+
+```python
+import datajoint as dj
+dj.config["enable_python_native_blobs"] = True
+```
+
+You can safely enable this setting if both of the following are true:
+
+* The only kinds of blobs your pipeline have inserted previously were numerical arrays.
+* You do not need to share blob data between Python and MATLAB.
+
+Otherwise, read the following explanation.
 
 DataJoint v0.12 expands DataJoint's blob serialization mechanism with
 improved support for complex native python datatypes, such as dictionaries
 and lists of strings.
 
 Prior to DataJoint v0.12, certain python native datatypes such as
 dictionaries were 'squashed' into numpy structured arrays when saved into
-blob attributes. This facilitated easier data sharing between Matlab
+blob attributes. This facilitated easier data sharing between MATLAB
 and Python for certain record types. However, this created a discrepancy
 between insert and fetch datatypes which could cause problems in other
 portions of users pipelines.
 
-For v0.12, it was decided to remove the type squashing behavior, instead
-creating a separate storage encoding which improves support for storing
-native python datatypes in blobs without squashing them into numpy
-structured arrays. However, this change creates a compatibility problem
-for pipelines which previously relied on the type squashing behavior
-since records saved via the old squashing format will continue to fetch
+DataJoint v0.12, removes the squashing behavior, instead encoding native python datatypes in blobs directly.
+However, this change creates a compatibility problem for pipelines
+which previously relied on the type squashing behavior since records
+saved via the old squashing format will continue to fetch
 as structured arrays, whereas new record inserted in DataJoint 0.12 with
 `enable_python_native_blobs` would result in records returned as the
-appropriate native python type (dict, etc). Read support for python
-native blobs also not yet implemented in DataJoint for Matlab.
+appropriate native python type (dict, etc).
+Furthermore, DataJoint for MATLAB does not yet support unpacking native Python datatypes.
 
-To prevent data from being stored in mixed format within a table across
-upgrades from previous versions of DataJoint, the
-`enable_python_native_blobs` flag was added as a temporary guard measure
-for the 0.12 release. This flag will trigger an exception if any of the
-ambiguous cases are encountered during inserts in order to allow testing
-and migration of pre-0.12 pipelines to 0.11 in a safe manner.
+With `dj.config["enable_python_native_blobs"]` set to `False` (default),
+any attempt to insert any datatype other than a numpy array will result in an exception.
+This is meant to get users to read this message in order to allow proper testing
+and migration of pre-0.12 pipelines to 0.12 in a safe manner.
 
 The exact process to update a specific pipeline will vary depending on
 the situation, but generally the following strategies may apply:
 
 * Altering code to directly store numpy structured arrays or plain
   multidimensional arrays. This strategy is likely best one for those
-  tables requiring compatibility with Matlab.
-* Adjust code to deal with both structured array and native fetched data.
+  tables requiring compatibility with MATLAB.
+* Adjust code to deal with both structured array and native fetched data
+  for those tables that are populated with `dict`s in blobs in pre-0.12 version.
   In this case, insert logic is not adjusted, but downstream consumers
   are adjusted to handle records saved under the old and new schemes.
-* Manually convert data using fetch/insert into a fresh schema.
-  In this approach, DataJoint's create_virtual_module functionality would
-  be used in conjunction with a a fetch/convert/insert loop to update
-  the data to the new native_blob functionality.
+* Migrate data into a fresh schema, fetching the old data, converting blobs to
+  a uniform data type and re-inserting.
 * Drop/Recompute imported/computed tables to ensure they are in the new
   format.
 
````
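
For the migration route described in the README, a minimal sketch of the fetch/convert/re-insert loop might look like the following. The schema names (`old_db`, `new_db`), the table `Rec`, and the `to_native` converter are all hypothetical; the actual conversion depends on how your pre-0.12 blobs were squashed:

```python
import datajoint as dj

dj.config['enable_python_native_blobs'] = True

# read the pre-0.12 data in place without re-declaring its classes
old = dj.create_virtual_module('old', 'old_db')

schema = dj.schema('new_db')


@schema
class Rec(dj.Manual):
    definition = """
    rec_id : int
    ---
    data : longblob
    """


def to_native(blob):
    # pipeline-specific conversion of an old squashed recarray back to a
    # dict goes here; identity is shown only as a placeholder
    return blob


for row in old.Rec.fetch(as_dict=True):
    row['data'] = to_native(row['data'])
    Rec.insert1(row)
```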

datajoint/blob.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -73,7 +73,7 @@ def __init__(self, squeeze=False):
 
     def set_dj0(self):
        if not config.get('enable_python_native_blobs'):
-            raise DataJointError('v0.12+ python native blobs disabled. see also: https://github.com/datajoint/datajoint-python/blob/master/README.md')
+            raise DataJointError('v0.12+ python native blobs disabled. see also: https://github.com/datajoint/datajoint-python#python-native-blobs')
 
        self.protocol = b"dj0\0"  # when using new blob features
 
```

datajoint/connection.py

Lines changed: 6 additions & 3 deletions
```diff
@@ -39,14 +39,18 @@ def translate_query_error(client_error, query):
         return errors.DuplicateError(*client_error.args[1:])
     if isinstance(client_error, client.err.IntegrityError) and client_error.args[0] == 1452:
         return errors.IntegrityError(*client_error.args[1:])
-    # Syntax Errors
+    # Syntax errors
     if isinstance(client_error, client.err.ProgrammingError) and client_error.args[0] == 1064:
         return errors.QuerySyntaxError(client_error.args[1], query)
-    # Existence Errors
+    # Existence errors
     if isinstance(client_error, client.err.ProgrammingError) and client_error.args[0] == 1146:
         return errors.MissingTableError(client_error.args[1], query)
     if isinstance(client_error, client.err.InternalError) and client_error.args[0] == 1364:
         return errors.MissingAttributeError(*client_error.args[1:])
+    if isinstance(client_error, client.err.InternalError) and client_error.args[0] == 1054:
+        return errors.UnknownAttributeError(*client_error.args[1:])
+    # all the other errors are re-raised in original form
+    return client_error
 
 
 logger = logging.getLogger(__name__)
@@ -282,4 +286,3 @@ def transaction(self):
             raise
         else:
             self.commit_transaction()
-
```
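
With this mapping, a typo in an attribute name now surfaces as a specific `UnknownAttributeError` instead of a raw pymysql error (see #691). A small hypothetical illustration, assuming a declared table `Session` that has no attribute `sess_dat`:

```python
import datajoint as dj

try:
    # the string restriction is passed through to SQL, so MySQL reports
    # error 1054 (unknown column), which is translated for us
    (Session() & 'sess_dat > "2019-01-01"').fetch()
except dj.errors.UnknownAttributeError as e:
    print('unknown attribute:', e)
```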

datajoint/diagram.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -236,8 +236,8 @@ def _make_graph(self):
         for name in self.nodes_to_show:
             foreign_attributes = set(
                 attr for p in self.in_edges(name, data=True) for attr in p[2]['attr_map'] if p[2]['primary'])
-            self.node[name]['distinguished'] = (
-                'primary_key' in self.node[name] and foreign_attributes < self.node[name]['primary_key'])
+            self.nodes[name]['distinguished'] = (
+                'primary_key' in self.nodes[name] and foreign_attributes < self.nodes[name]['primary_key'])
         # include aliased nodes that are sandwiched between two displayed nodes
         gaps = set(nx.algorithms.boundary.node_boundary(self, self.nodes_to_show)).intersection(
             nx.algorithms.boundary.node_boundary(nx.DiGraph(self).reverse(), self.nodes_to_show))
@@ -307,7 +307,7 @@ def make_dot(self):
             props = graph.get_edge_data(src, dest)
             edge.set_color('#00000040')
             edge.set_style('solid' if props['primary'] else 'dashed')
-            master_part = graph.node[dest]['node_type'] is Part and dest.startswith(src+'.')
+            master_part = graph.nodes[dest]['node_type'] is Part and dest.startswith(src+'.')
             edge.set_weight(3 if master_part else 1)
             edge.set_arrowhead('none')
             edge.set_penwidth(.75 if props['multi'] else 2)
```
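
These `graph.node` to `graph.nodes` changes track networkx, which deprecated the `Graph.node` view in the 2.x line and removed it in networkx 2.4. A quick standalone check of the surviving accessor (node name and attributes hypothetical):

```python
import networkx as nx

g = nx.DiGraph()
g.add_node('schema.Table', node_type='manual')

# on networkx >= 2.4 only the `nodes` view remains; `g.node` raises AttributeError
assert g.nodes['schema.Table']['node_type'] == 'manual'
```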

datajoint/errors.py

Lines changed: 6 additions & 6 deletions
```diff
@@ -45,12 +45,6 @@ class AccessError(QueryError):
     """
 
 
-class UnknownAttributeError(DataJointError):
-    """
-    Error caused by referencing to a non-existing attributes
-    """
-
-
 class MissingTableError(DataJointError):
     """
     Query on a table that has not been declared
@@ -69,6 +63,12 @@ class IntegrityError(QueryError):
     """
 
 
+class UnknownAttributeError(QueryError):
+    """
+    User requests an attribute name not found in query heading
+    """
+
+
 class MissingAttributeError(QueryError):
     """
     An error arising when a required attribute value is not provided in INSERT
```

datajoint/external.py

Lines changed: 16 additions & 3 deletions
```diff
@@ -1,4 +1,4 @@
-from pathlib import Path, PurePosixPath
+from pathlib import Path, PurePosixPath, PureWindowsPath
 from collections import Mapping
 from tqdm import tqdm
 from .settings import config
@@ -74,7 +74,20 @@ def s3(self):
 
     def _make_external_filepath(self, relative_filepath):
         """resolve the complete external path based on the relative path"""
-        return PurePosixPath(Path(self.spec['location']), relative_filepath)
+        # Strip root
+        if self.spec['protocol'] == 's3':
+            posix_path = PurePosixPath(PureWindowsPath(self.spec['location']))
+            location_path = Path(
+                *posix_path.parts[1:]) if len(
+                self.spec['location']) > 0 and any(
+                case in posix_path.parts[0] for case in (
+                    '\\', ':')) else Path(posix_path)
+            return PurePosixPath(location_path, relative_filepath)
+        # Preserve root
+        elif self.spec['protocol'] == 'file':
+            return PurePosixPath(Path(self.spec['location']), relative_filepath)
+        else:
+            assert False
 
     def _make_uuid_path(self, uuid, suffix=''):
         """create external path based on the uuid hash"""
@@ -251,7 +264,7 @@ def download_filepath(self, filepath_hash):
         checksum = uuid_from_file(local_filepath)
         if checksum != contents_hash:  # this should never happen without outside interference
             raise DataJointError("'{file}' downloaded but did not pass checksum'".format(file=local_filepath))
-        return local_filepath, contents_hash
+        return str(local_filepath), contents_hash
 
 # --- UTILITIES ---
 
```
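
The intent of the new `s3` branch is that a store `location` loses its filesystem root (a Windows drive prefix or a leading slash), making it relative to the bucket, while `file` locations keep their root (see #692). A standalone sketch of the same root-stripping logic, run on a POSIX host with made-up locations:

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

def strip_root(location):
    # mirrors the s3 branch of _make_external_filepath above
    posix_path = PurePosixPath(PureWindowsPath(location))
    if location and any(c in posix_path.parts[0] for c in ('\\', ':')):
        return Path(*posix_path.parts[1:])
    return Path(posix_path)

print(strip_root(r'C:\data\store'))  # data/store (drive prefix stripped)
print(strip_root('/data/store'))     # data/store (leading slash stripped)
print(strip_root('data/store'))      # data/store (already relative)
```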

datajoint/fetch.py

Lines changed: 4 additions & 4 deletions
```diff
@@ -50,7 +50,7 @@ def _get(connection, attr, data, squeeze, download_path):
     adapt = attr.adapter.get if attr.adapter else lambda x: x
 
     if attr.is_filepath:
-        return str(adapt(extern.download_filepath(uuid.UUID(bytes=data))[0]))
+        return adapt(extern.download_filepath(uuid.UUID(bytes=data))[0])
 
     if attr.is_attachment:
         # Steps:
@@ -65,22 +65,22 @@ def _get(connection, attr, data, squeeze, download_path):
         if local_filepath.is_file():
             attachment_checksum = _uuid if attr.is_external else hash.uuid_from_buffer(data)
             if attachment_checksum == hash.uuid_from_file(local_filepath, init_string=attachment_name + '\0'):
-                return str(adapt(local_filepath))  # checksum passed, no need to download again
+                return adapt(str(local_filepath))  # checksum passed, no need to download again
             # generate the next available alias filename
             for n in itertools.count():
                 f = local_filepath.parent / (local_filepath.stem + '_%04x' % n + local_filepath.suffix)
                 if not f.is_file():
                     local_filepath = f
                     break
                 if attachment_checksum == hash.uuid_from_file(f, init_string=attachment_name + '\0'):
-                    return str(adapt(f))  # checksum passed, no need to download again
+                    return adapt(str(f))  # checksum passed, no need to download again
         # Save attachment
         if attr.is_external:
             extern.download_attachment(_uuid, attachment_name, local_filepath)
         else:
             # write from buffer
             safe_write(local_filepath, data.split(b"\0", 1)[1])
-        return str(adapt(local_filepath))  # download file from remote store
+        return adapt(str(local_filepath))  # download file from remote store
 
     return adapt(uuid.UUID(bytes=data) if attr.uuid else (
         blob.unpack(extern.get(uuid.UUID(bytes=data)) if attr.is_external else data, squeeze=squeeze)
```
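
The reorderings from `str(adapt(...))` to `adapt(str(...))` mean a custom attribute adapter now receives the downloaded path as a plain `str`, and its return value is passed through unchanged instead of being coerced back into a string (see #684). A hypothetical adapter that relies on this, assuming a configured store named `mystore`:

```python
import numpy as np
import datajoint as dj


class LoadedArray(dj.AttributeAdapter):
    """Hypothetical adapter: store a .npy file path, fetch the loaded array."""

    attribute_type = 'filepath@mystore'

    def put(self, filepath):
        return filepath  # insert the local file path as-is

    def get(self, filepath):
        # `filepath` now arrives as str, and the ndarray returned here is
        # no longer flattened by an outer str() on fetch
        return np.load(filepath)
```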
