mat73/core.py (83 additions, 0 deletions)
@@ -11,6 +11,7 @@
import h5py
import logging
from typing import Iterable
from datetime import datetime, timezone, timedelta # Added for datetime handling
from mat73.version import __version__

logger = logging.getLogger('mat73')
@@ -303,6 +304,88 @@ def convert_mat(self, dataset, depth, MATLAB_class=None):
        elif mtype=='canonical empty':
            return None

        elif mtype == 'datetime':  # Added datetime handling
Review comment (Contributor, severity: medium):
This new elif mtype == 'datetime': block introduces significant logic for handling MATLAB datetime objects. To improve maintainability and understanding, could you add a more detailed comment at the beginning of this block (or update the convert_mat method's docstring)?

This comment could explain:

  • The expected HDF5 structure for MATLAB datetime objects this code is designed to parse.
  • The meaning of dataset.attrs.get('MATLAB_object_decode') == 3.
  • The expected 6 components in the last dimension of the dataset (e.g., original_shape[-1] == 6 on line 314).
  • How ref_entry[4] (line 345) is used to find the reference index.
  • The scheme for target_ref_char (e.g., chr(ord('c') + data_ref_idx - 1) on line 352) to look up timestamps in self.refs.
  • The format of the timestamp data found in self.refs (e.g., milliseconds since epoch as a double, as implied by line 364).

Providing this context directly in the code would be very helpful.
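One way to act on this suggestion is a module-level note (or an addition to the `convert_mat` docstring). The sketch below is hedged: it only restates the layout details the reviewer lists, and those details are assumptions about MATLAB's v7.3 format rather than independently verified facts.

```python
# A hedged sketch of the explanatory note the reviewer asks for; the details
# (decode flag, 6-component tuples, '#refs#' key scheme) restate the review
# text and are assumptions about MATLAB's v7.3 layout, not verified facts.
DATETIME_LAYOUT_NOTE = """
MATLAB datetime objects saved with -v7.3 appear as HDF5 datasets with
MATLAB_class == 'datetime' and MATLAB_object_decode == 3. Each element is a
6-component uint32 reference tuple; component [4] is a 1-based index into
the '#refs#' group, where the key chr(ord('c') + idx - 1) holds a (1,1)
double of milliseconds since the Unix epoch.
"""
```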

            if dataset.attrs.get('MATLAB_object_decode') != 3:
                if self.verbose:
                    logger.warning(f"Dataset {dataset.name} is 'datetime' but lacks MATLAB_object_decode=3.")
                return None

            original_shape = dataset.shape
            if not (dataset.ndim >= 1 and original_shape[-1] == 6):
                if self.verbose:
                    logger.warning(f"Datetime dataset {dataset.name} has unexpected shape {original_shape}.")
                return None

            try:
                ref_data_np = np.asarray(dataset, dtype=np.uint32)
            except Exception as e:
Review comment (Contributor, severity: medium):
Catching a generic Exception (here and on line 365) can sometimes make debugging harder by obscuring the specific type of error that occurred.

Would it be feasible to catch more specific exceptions that you anticipate might be raised by np.asarray(dataset, ...) or by accessing timestamp_dataset (e.g., TypeError, ValueError from NumPy/HDF5, or IndexError, KeyError)?

If a general Exception is indeed necessary due to the wide variety of potential underlying library errors, a brief comment explaining this choice would be beneficial for future maintainers.
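A minimal sketch of the narrower handler the reviewer suggests; `to_uint32_or_none` is a hypothetical helper (not part of the patch), and the caught types are the ones `np.asarray` is known to raise on incompatible input:

```python
import numpy as np

def to_uint32_or_none(dataset):
    # Hypothetical helper: catch only the failures np.asarray is known to
    # raise for incompatible input, instead of a blanket `except Exception`.
    try:
        return np.asarray(dataset, dtype=np.uint32)
    except (TypeError, ValueError):
        return None
```

With this shape, an unexpected error type (say, a corrupt HDF5 file raising an `OSError`) still surfaces with a useful traceback instead of being silently swallowed.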

                if self.verbose:
                    logger.error(f"Could not convert HDF5 dataset {dataset.name} to NumPy array for datetime: {e}")
                return None

            if len(original_shape) == 1:
                dt_array_shape = ()
                num_datetimes = 1
                ref_data_np_flat = ref_data_np.reshape(1, 6)
            else:
                dt_array_shape = original_shape[:-1]
                num_datetimes = np.prod(dt_array_shape).astype(int)
                ref_data_np_flat = ref_data_np.reshape(num_datetimes, 6)

            if num_datetimes == 0:
                return np.array([], dtype=object)

            py_datetimes_flat = []
            epoch_utc = datetime(1970, 1, 1, tzinfo=timezone.utc)
            ref_start_char_code = ord('c')

            for i in range(num_datetimes):
                ref_entry = ref_data_np_flat[i]
                try:
                    data_ref_idx = int(ref_entry[4])
                except (IndexError, ValueError):
                    if self.verbose:
                        logger.warning(f"Invalid data_ref_idx for datetime {dataset.name} element {i}.")
                    py_datetimes_flat.append(None)
                    continue

                target_ref_char = chr(ref_start_char_code + data_ref_idx - 1)

                if self.refs is None or target_ref_char not in self.refs:
                    if self.verbose:
                        logger.warning(f"Missing #refs#/{target_ref_char} for datetime {dataset.name} element {i}.")
                    py_datetimes_flat.append(None)
                    continue

                timestamp_dataset = self.refs[target_ref_char]

                try:
                    # MATLAB stores these as (1,1) double arrays in #refs#
                    milliseconds_since_epoch = float(timestamp_dataset[0,0])
                except Exception as e:
                    if self.verbose:
                        logger.error(f"Error reading timestamp from #refs#/{target_ref_char} for {dataset.name} element {i}: {e}")
                    py_datetimes_flat.append(None)
                    continue

                seconds_since_epoch = milliseconds_since_epoch / 1000.0
                try:
                    dt_object_utc = epoch_utc + timedelta(seconds=seconds_since_epoch)
                    py_datetimes_flat.append(dt_object_utc)
                except OverflowError:
                    if self.verbose:
                        logger.warning(f"Overflow converting timestamp for {dataset.name} element {i}.")
                    py_datetimes_flat.append(None)

            if not py_datetimes_flat and num_datetimes > 0:
                return None
Review comment (Contributor, severity: high):

This condition if not py_datetimes_flat and num_datetimes > 0: appears to be unreachable.

The loop starting on line 342 iterates num_datetimes times. In each iteration that doesn't continue, py_datetimes_flat.append(...) is called (either with a datetime object or None). Therefore, if num_datetimes > 0, py_datetimes_flat should always contain num_datetimes elements, and not py_datetimes_flat would be False.

This would make the entire if condition on line 380 always false, rendering these lines (380-381) effectively dead code. Could you verify this and remove these lines if they are indeed unreachable?
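The invariant behind this observation can be modeled in isolation. The sketch below uses a hypothetical `convert_all` that mirrors the loop's append-or-append-`None`-then-`continue` shape:

```python
def convert_all(n, fail_at=()):
    # Every iteration appends exactly once: a value on success, or None
    # just before `continue` on failure, so len(out) always equals n.
    out = []
    for i in range(n):
        if i in fail_at:
            out.append(None)
            continue
        out.append(i)
    return out
```

So for any `n > 0` the result list is non-empty even when every element fails, which is why a guard of the form `if not out and n > 0` can never fire.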


            if not dt_array_shape:  # Scalar case
                return py_datetimes_flat[0] if py_datetimes_flat else None

            dt_numpy_array = np.array(py_datetimes_flat, dtype=object).reshape(dt_array_shape)
            return squeeze(dt_numpy_array)
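The epoch arithmetic in this block can be exercised on its own. `ms_epoch_to_datetime` below is a hypothetical extraction, not a function in the patch; it repeats the same 1970-UTC-epoch-plus-seconds conversion:

```python
from datetime import datetime, timezone, timedelta

def ms_epoch_to_datetime(milliseconds_since_epoch):
    # Same arithmetic as the patch: Unix epoch (UTC) plus the
    # timestamp scaled from milliseconds to seconds.
    epoch_utc = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch_utc + timedelta(seconds=milliseconds_since_epoch / 1000.0)
```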

        # complex numbers need to be filtered out separately
        elif 'imag' in str(dataset.dtype):
            if dataset.attrs['MATLAB_class']==b'single':