Skip to content

correctly load 2D char arrays (#62)#64

Merged
skjerns merged 1 commit intomainfrom
master
May 26, 2025
Merged

correctly load 2D char arrays (#62)#64
skjerns merged 1 commit intomainfrom
master

Conversation

@skjerns
Copy link
Owner

@skjerns skjerns commented May 26, 2025

  • small change to correctly load in 2D char arrays

  • python 3.7 has been removed from runners

see actions/runner-images#10893

  • Update mat73/core.py

  • Update test_mat73.py

  • Update create_mat.m

  • Update core.py

  • fix string array reading

  • remove dependency on StringDType numpy>=2


* small change to correctly load in 2D char arrays

* python 3.7 has been removed from runners

see actions/runner-images#10893

* Update mat73/core.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update test_mat73.py

* Update create_mat.m

* Update core.py

* fix string array reading

* remove dependency on StringDType numpy>=2

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Simon Kern <14980558+skjerns@users.noreply.github.com>
@skjerns skjerns merged commit f425119 into main May 26, 2025
12 checks passed
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @skjerns, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini here, providing a summary of this pull request. The primary goal of this PR is to address an issue with correctly loading multi-dimensional MATLAB character arrays (char type) when reading .mat files using the mat73 library. Previously, the loading logic for char arrays would flatten the data and join it into a single string, which is incorrect for 2D or higher-dimensional character arrays (which represent arrays of strings in MATLAB). This PR updates the loading mechanism to handle these multi-dimensional char arrays correctly, returning them as Python lists of strings. To support this, new test data (testfile16.mat) containing 1D, 2D, and 3D char arrays has been added, along with corresponding tests. Additionally, the PR includes updates to the CI configuration, removing Python 3.7 support due to upstream runner image changes and removing a dependency on a specific NumPy feature (StringDType) which was causing issues.

Highlights

  • Correct 2D/3D Char Array Loading: The core logic for loading MATLAB char arrays has been significantly revised to correctly interpret multi-dimensional char arrays as lists of strings, rather than flattening them into a single string.
  • New Test Data and Cases: A new MATLAB script (create_mat.m) and test file (testfile16.mat) have been added to specifically test the loading of 1D, 2D, and 3D character arrays. A new Python test case (test_file16_2d_char_array) validates the updated loading logic.
  • CI and Dependency Updates: Python 3.7 has been removed from the CI runners due to changes in GitHub Actions runner images. A dependency on NumPy's StringDType (potentially related to NumPy >= 2) has also been removed to improve compatibility.

Changelog

Click here to see the changelog
  • mat73/core.py
    • Completely refactored the mtype=='char' handling in convert_mat to correctly process multi-dimensional character arrays into lists of strings.
    • Uses np.asarray with uint16, np.vectorize with chr and object dtype, and np.apply_along_axis to join characters into strings.
    • Includes logic to handle 1D, 2D, and >2D arrays, returning a list of strings.
    • Adds a warning for char arrays with ndim > 2 as their structure might be ambiguous.
    • Updated the if __name__ == '__main__': block to test testfile16.mat.
  • tests/create_mat.m
    • Added a new section to create testfile16.mat.
    • Defines and saves char_arr_1d (1D), char_arr_2d (2D), and char_arr_3d (3D) character arrays using the -v7.3 format.
  • tests/test_mat73.py
    • Removed a trailing space in the import section.
    • Updated the setUp method loop range from 16 to 17 to include testfile16.mat when checking for test files.
    • Renamed the test method test_file10_nullchars to test_file8_nullchars.
    • Added a new test method test_file16_2d_char_array to specifically verify the correct loading of the 1D, 2D, and 3D char arrays from testfile16.mat.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


Char arrays, flat no more,
Dimensions now they store.
Strings in a list,
Correctly kissed,
Mat data, we adore.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@skjerns skjerns deleted the master branch May 26, 2025 13:45
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly improves the loading of MATLAB character arrays, especially for 2D and N-D cases, by correctly interpreting their structure as stored in HDF5 format. The changes are well-reasoned, and the new test cases (testfile16.mat and its associated Python test) provide good coverage for 1D, 2D, and 3D character arrays, including non-ASCII characters.

The core logic in mat73/core.py for mtype=='char' now correctly determines the axis along which characters form strings, considering MATLAB's column-major order and HDF5's row-major storage (which involves dimension permutation). The use of np.vectorize(chr, otypes=[object]) is a good choice for robust character conversion, especially for handling null characters.

The warning added for character arrays with more than two dimensions is a prudent measure, acknowledging potential complexities with very high-dimensional char arrays.

Overall, these changes enhance the correctness and robustness of mat73 for a tricky data type. Great work on this!

Summary of Findings

  • Logging URL Format: In mat73/core.py, the URL in the logging warning for N-D char arrays (github.com/skjerns/mat7.3) could be more complete by including the scheme (e.g., https://github.com/skjerns/mat7.3). This is a minor stylistic point and was not commented on directly due to review settings focusing on medium or higher severity issues.
  • Test Naming Correction: A test method in tests/test_mat73.py was renamed from test_file10_nullchars to test_file8_nullchars, which aligns the test name with the file it actually uses (self.testfile8). This is a good cleanup.
  • Python 3.7 Runner Removal: The PR description mentions Python 3.7 removal from runners. This change is likely in CI configuration files not included in the provided patch and thus not reviewed here.

Merge Readiness

The code changes appear to be of high quality and address the problem of loading multi-dimensional character arrays effectively. The provided tests cover the main scenarios well. Based on the review of the provided code patches, I believe this pull request is in good shape for merging. As always, ensure all CI checks pass and consider if any further end-user documentation or examples might be beneficial for this enhanced char array handling. I am not authorized to approve pull requests, so please ensure it undergoes any further required review processes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments