-
Notifications
You must be signed in to change notification settings - Fork 93
Closed
Labels
bugIndicates an unexpected problem or unintended behaviorIndicates an unexpected problem or unintended behavior
Description
Bug Report
Description
Reading an empty cell array inserted with mym MATLAB fails to be read in Datajoint python
Reproducibility
- OS (WIN (MATLAB) & MACOS (Python)
- Python Version (3.7) & MATLAB Version (2019b)
- MySQL Version (10.2.33-MariaDB)
- DataJoint Version (0.13.7)
I have a corner case for reading some special. blobs in Datajoint Python when these are stored with mym Matlab:
Here is the type of blob stored in the DB and read on Matlab:
la = bdata('select protocol_data from bdata.sessions where sessid=889527');
la{1}.crash_comments
ans =
3×1 cell array
{0×0 double}
{0×0 double}
{0×0 double}
As you can see, what is stored in a part of the blob is a 3x1 cell array composed of empty items:
When trying to read this data in Python, I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/var/folders/sg/5bw1t8p11nx09k7kmytnmfb40000gp/T/ipykernel_31219/4072205635.py in <module>
3 session_key = {'sessid': 889527}
4 # session_key = {'sessid': 889664}
----> 5 session_data = (bdata.Sessions & session_key).fetch('protocol_data', as_dict=True)
6 parsed_events = (bdata.ParsedEvents & session_key).fetch(as_dict=True)
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/fetch.py in __call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
234 squeeze=squeeze,
235 download_path=download_path,
--> 236 format="array",
237 )
238 if attrs_as_dict:
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/fetch.py in __call__(self, offset, limit, order_by, format, as_dict, squeeze, download_path, *attrs)
287 for name in heading:
288 # unpack blobs and externals
--> 289 ret[name] = list(map(partial(get, heading[name]), ret[name]))
290 if format == "frame":
291 ret = pandas.DataFrame(ret).set_index(heading.primary_key)
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/fetch.py in _get(connection, attr, data, squeeze, download_path)
112 squeeze=squeeze,
113 )
--> 114 if attr.is_blob
115 else data
116 )
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in unpack(blob, squeeze)
619 return blob
620 if blob is not None:
--> 621 return Blob(squeeze=squeeze).unpack(blob)
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in unpack(self, blob)
127 blob_format = self.read_zero_terminated_string()
128 if blob_format in ("mYm", "dj0"):
--> 129 return self.read_blob(n_bytes=len(self._blob) - self._pos)
130
131 def read_blob(self, n_bytes=None):
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in read_blob(self, n_bytes)
161 % data_structure_code
162 )
--> 163 v = call()
164 if n_bytes is not None and self._pos - start != n_bytes:
165 raise DataJointError("Blob length check failed! Invalid blob")
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in read_struct(self)
463 self.read_blob(n_bytes=int(self.read_value())) for _ in range(n_fields)
464 )
--> 465 for __ in range(n_elem)
466 ]
467
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in <listcomp>(.0)
463 self.read_blob(n_bytes=int(self.read_value())) for _ in range(n_fields)
464 )
--> 465 for __ in range(n_elem)
466 ]
467
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in <genexpr>(.0)
461 raw_data = [
462 tuple(
--> 463 self.read_blob(n_bytes=int(self.read_value())) for _ in range(n_fields)
464 )
465 for __ in range(n_elem)
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in read_blob(self, n_bytes)
161 % data_structure_code
162 )
--> 163 v = call()
164 if n_bytes is not None and self._pos - start != n_bytes:
165 raise DataJointError("Blob length check failed! Invalid blob")
~/opt/anaconda3/envs/bl_pipeline_python_env/lib/python3.7/site-packages/datajoint/blob.py in read_cell_array(self)
508 return (
509 self.squeeze(
--> 510 np.array(result).reshape(shape, order="F"), convert_to_scalar=False
511 )
512 ).view(MatCell)
ValueError: cannot reshape array of size 0 into shape (3,1)
I have “patched” the blob.py code read_cell_array function with:
if result.size == 0:
return (
self.squeeze(
np.array(np.empty(shape, dtype=type(result[0]))), convert_to_scalar=False
)
).view(MatCell)
else:
return (
self.squeeze(
np.array(result).reshape(shape, order="F"), convert_to_scalar=False
)
).view(MatCell)
Just to add the case that the size of the array is zero (numpy array size is 0 if it’s filled with empty arrays)
Probably not the cleanest way to do it.
Expected Behavior
To get something similar to this when reading this kind of blobs:
session_data['crash_comments']
MatCell([[None],
[None],
[None]], dtype=object)
Metadata
Metadata
Assignees
Labels
bugIndicates an unexpected problem or unintended behaviorIndicates an unexpected problem or unintended behavior