Conversation
* small change to correctly load in 2D char arrays * python 3.7 has been removed from runners see actions/runner-images#10893 * Update mat73/core.py Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> * Update test_mat73.py * Update create_mat.m * Update core.py * fix string array reading * remove dependency on StringDType numpy>=2 --------- Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Simon Kern <14980558+skjerns@users.noreply.github.com>
There was a problem hiding this comment.
Hello @skjerns, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request. The primary goal of this PR is to address an issue with correctly loading multi-dimensional MATLAB character arrays (char type) when reading .mat files using the mat73 library. Previously, the loading logic for char arrays would flatten the data and join it into a single string, which is incorrect for 2D or higher-dimensional character arrays (which represent arrays of strings in MATLAB). This PR updates the loading mechanism to handle these multi-dimensional char arrays correctly, returning them as Python lists of strings. To support this, new test data (testfile16.mat) containing 1D, 2D, and 3D char arrays has been added, along with corresponding tests. Additionally, the PR includes updates to the CI configuration, removing Python 3.7 support due to upstream runner image changes and removing a dependency on a specific NumPy feature (StringDType) which was causing issues.
Highlights
- Correct 2D/3D Char Array Loading: The core logic for loading MATLAB
chararrays has been significantly revised to correctly interpret multi-dimensional char arrays as lists of strings, rather than flattening them into a single string. - New Test Data and Cases: A new MATLAB script (
create_mat.m) and test file (testfile16.mat) have been added to specifically test the loading of 1D, 2D, and 3D character arrays. A new Python test case (test_file16_2d_char_array) validates the updated loading logic. - CI and Dependency Updates: Python 3.7 has been removed from the CI runners due to changes in GitHub Actions runner images. A dependency on NumPy's
StringDType(potentially related to NumPy >= 2) has also been removed to improve compatibility.
Changelog
Click here to see the changelog
- mat73/core.py
- Completely refactored the
mtype=='char'handling inconvert_matto correctly process multi-dimensional character arrays into lists of strings. - Uses
np.asarraywithuint16,np.vectorizewithchrandobjectdtype, andnp.apply_along_axisto join characters into strings. - Includes logic to handle 1D, 2D, and >2D arrays, returning a list of strings.
- Adds a warning for char arrays with
ndim > 2as their structure might be ambiguous. - Updated the
if __name__ == '__main__':block to testtestfile16.mat.
- Completely refactored the
- tests/create_mat.m
- Added a new section to create
testfile16.mat. - Defines and saves
char_arr_1d(1D),char_arr_2d(2D), andchar_arr_3d(3D) character arrays using the-v7.3format.
- Added a new section to create
- tests/test_mat73.py
- Removed a trailing space in the import section.
- Updated the
setUpmethod loop range from16to17to includetestfile16.matwhen checking for test files. - Renamed the test method
test_file10_nullcharstotest_file8_nullchars. - Added a new test method
test_file16_2d_char_arrayto specifically verify the correct loading of the 1D, 2D, and 3D char arrays fromtestfile16.mat.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Char arrays, flat no more,
Dimensions now they store.
Strings in a list,
Correctly kissed,
Mat data, we adore.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request significantly improves the loading of MATLAB character arrays, especially for 2D and N-D cases, by correctly interpreting their structure as stored in HDF5 format. The changes are well-reasoned, and the new test cases (testfile16.mat and its associated Python test) provide good coverage for 1D, 2D, and 3D character arrays, including non-ASCII characters.
The core logic in mat73/core.py for mtype=='char' now correctly determines the axis along which characters form strings, considering MATLAB's column-major order and HDF5's row-major storage (which involves dimension permutation). The use of np.vectorize(chr, otypes=[object]) is a good choice for robust character conversion, especially for handling null characters.
The warning added for character arrays with more than two dimensions is a prudent measure, acknowledging potential complexities with very high-dimensional char arrays.
Overall, these changes enhance the correctness and robustness of mat73 for a tricky data type. Great work on this!
Summary of Findings
- Logging URL Format: In
mat73/core.py, the URL in the logging warning for N-D char arrays (github.com/skjerns/mat7.3) could be more complete by including the scheme (e.g.,https://github.com/skjerns/mat7.3). This is a minor stylistic point and was not commented on directly due to review settings focusing on medium or higher severity issues. - Test Naming Correction: A test method in
tests/test_mat73.pywas renamed fromtest_file10_nullcharstotest_file8_nullchars, which aligns the test name with the file it actually uses (self.testfile8). This is a good cleanup. - Python 3.7 Runner Removal: The PR description mentions Python 3.7 removal from runners. This change is likely in CI configuration files not included in the provided patch and thus not reviewed here.
Merge Readiness
The code changes appear to be of high quality and address the problem of loading multi-dimensional character arrays effectively. The provided tests cover the main scenarios well. Based on the review of the provided code patches, I believe this pull request is in good shape for merging. As always, ensure all CI checks pass and consider if any further end-user documentation or examples might be beneficial for this enhanced char array handling. I am not authorized to approve pull requests, so please ensure it undergoes any further required review processes.
small change to correctly load in 2D char arrays
python 3.7 has been removed from runners
see actions/runner-images#10893
Update mat73/core.py
Update test_mat73.py
Update create_mat.m
Update core.py
fix string array reading
remove dependency on StringDType numpy>=2