correctly load in 2D char arrays#62
Conversation
There was a problem hiding this comment.
Hello @sjvenditto, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini or gemini-code-assist here, providing a summary of this pull request. This PR addresses Issue #61 by implementing a fix for loading 2D character arrays from MATLAB v7.3 .mat files. Previously, these arrays were incorrectly flattened into a single string. The changes modify the mat73.core.convert_mat function to detect 2D character arrays and load them as a list of strings, preserving the row structure as seen in MATLAB. A new MATLAB script is added to generate a test file containing a 2D character array, and a corresponding Python test case is included to verify the correct loading behavior.
Highlights
- Bug Fix: Correctly loads 2D character arrays from MATLAB .mat files, resolving Issue #61.
- Data Structure: 2D character arrays are now loaded as a list of strings, where each string represents a row from the original MATLAB array.
- Testing: Adds a new MATLAB script (
create_mat.m) to generate a test file with a 2D character array and a new Python test case (test_mat73.py) to validate the fix.
Changelog
- mat73/core.py
- Modified the
convert_matfunction to check for 2D character arrays (dataset.shape[1] > 1). - Implemented logic to extract rows from the flattened array based on the original shape and MATLAB's column-major order.
- Returns a list of strings for 2D character arrays, falling back to a single string for 1D arrays.
- Modified the
- tests/create_mat.m
- Added a new section to create
testfile16.matcontaining a sample 2D character array (char_arr_2d).
- Added a new section to create
- tests/test_mat73.py
- Updated the
setUpmethod to includetestfile16.matin the list of test files. - Added a new test method
test_file16_2d_char_arrayto loadtestfile16.matand assert the correct loading of the 2D character array as a list of strings.
- Updated the
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
MATLAB data flows,
Python reads the rows it knows,
Strings in a list.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Code Review
This pull request introduces a fix for loading 2D char arrays from MATLAB files, ensuring they are represented as a list of strings in Python. The addition of a dedicated test case (testfile16.mat and corresponding Python test) is excellent for verifying this functionality.
However, there's a potential IndexError in mat73/core.py where accessing dataset.shape[1] can lead to an IndexError if the dataset is a 1D char array. This needs to be fixed to ensure robust handling of all char array types.
Once the critical issue is addressed, this PR will be a good improvement to the library.
Summary of Findings
- Potential IndexError for 1D char arrays: The code in
mat73/core.pyaccessesdataset.shape[1]without first checking ifdatasetis 1D. This will cause anIndexErrorif a 1D char array is processed, as 1D arrays do not have ashape[1].
Merge Readiness
The pull request addresses an important issue with loading 2D char arrays and includes a good test case. However, due to the potential IndexError bug identified in mat73/core.py when handling 1D char arrays, I recommend that these changes not be merged until this issue is resolved. My code suggestion aims to fix this. I am unable to approve pull requests myself; please ensure the fix is reviewed and approved by a maintainer before merging.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
mat73/core.py
Outdated
| elif mtype=='char': | ||
| string_array = np.ravel(dataset) | ||
| string_array = ''.join([chr(x) for x in string_array]) | ||
| if dataset.ndim == 2 and dataset.shape[1] > 1: |
There was a problem hiding this comment.
Not too familiar with MATLAB, but can a MATLAB char array every be ndim>2?
There was a problem hiding this comment.
Yes, they're rare since all strings have to be the same length, but there's a dataset I've been working with where someone saved their data in this format by padding the lines with spaces (it was the char array that I put in the issue). I loaded it in Matlab to verify how it was supposed to be formatted.
Also, from the same file, I noticed that even 1D char arrays were parsed with 2 dimensions with shape (n, 1). Don't know how true this is in general, which is why I accepted the change for checking ndim first
|
I've tried the MATLAB file you provided, but with my MATLAB R2024b the array is somehow not 2d, but saved as a 1D with edit: changing |
|
@sjvenditto I've pushed my changes to the repo - could you run the Check if the code runs on your machine thanks for the PR and raising this issue! |
|
Oh no, StringDType is a) using different methods for string conversion edit: fixed it! |
|
thanks for the addition! I've merged it to main |
* small change to correctly load in 2D char arrays * python 3.7 has been removed from runners see actions/runner-images#10893 * Update mat73/core.py * Update test_mat73.py * Update create_mat.m * Update core.py * fix string array reading * remove dependency on StringDType numpy>=2 --------- Co-authored-by: Sarah Jo Venditto <sjvenditto@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Sorry about this! I actually didn't run the tests on the file generated in |




This is addressing Issue #61. I've made a small fix when loading in char arrays to handle 2D shapes. Currently, I have it return a list of strings that matches the original format in Matlab. I also added a short test for this, which loads in and checks the char array that I noted in the issue.
Let me know if there's anything else that should be fixed or changed with how I implemented this!