Skip to content

correctly load in 2D char arrays#62

Merged
skjerns merged 8 commits intoskjerns:masterfrom
sjvenditto:master
May 26, 2025
Merged

correctly load in 2D char arrays#62
skjerns merged 8 commits intoskjerns:masterfrom
sjvenditto:master

Conversation

@sjvenditto
Copy link
Contributor

This is addressing Issue #61. I've made a small fix when loading in char arrays to handle 2D shapes. Currently, I have it return a list of strings that matches the original format in Matlab. I also added a short test for this, which loads in and checks the char array that I noted in the issue.

Let me know if there's anything else that should be fixed or changed with how I implemented this!

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @sjvenditto, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello! Gemini or gemini-code-assist here, providing a summary of this pull request. This PR addresses Issue #61 by implementing a fix for loading 2D character arrays from MATLAB v7.3 .mat files. Previously, these arrays were incorrectly flattened into a single string. The changes modify the mat73.core.convert_mat function to detect 2D character arrays and load them as a list of strings, preserving the row structure as seen in MATLAB. A new MATLAB script is added to generate a test file containing a 2D character array, and a corresponding Python test case is included to verify the correct loading behavior.

Highlights

  • Bug Fix: Correctly loads 2D character arrays from MATLAB .mat files, resolving Issue #61.
  • Data Structure: 2D character arrays are now loaded as a list of strings, where each string represents a row from the original MATLAB array.
  • Testing: Adds a new MATLAB script (create_mat.m) to generate a test file with a 2D character array and a new Python test case (test_mat73.py) to validate the fix.

Changelog

  • mat73/core.py
    • Modified the convert_mat function to check for 2D character arrays (dataset.shape[1] > 1).
    • Implemented logic to extract rows from the flattened array based on the original shape and MATLAB's column-major order.
    • Returns a list of strings for 2D character arrays, falling back to a single string for 1D arrays.
  • tests/create_mat.m
    • Added a new section to create testfile16.mat containing a sample 2D character array (char_arr_2d).
  • tests/test_mat73.py
    • Updated the setUp method to include testfile16.mat in the list of test files.
    • Added a new test method test_file16_2d_char_array to load testfile16.mat and assert the correct loading of the 2D character array as a list of strings.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


MATLAB data flows,
Python reads the rows it knows,
Strings in a list.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a fix for loading 2D char arrays from MATLAB files, ensuring they are represented as a list of strings in Python. The addition of a dedicated test case (testfile16.mat and corresponding Python test) is excellent for verifying this functionality.

However, there's a potential IndexError in mat73/core.py where accessing dataset.shape[1] can lead to an IndexError if the dataset is a 1D char array. This needs to be fixed to ensure robust handling of all char array types.

Once the critical issue is addressed, this PR will be a good improvement to the library.

Summary of Findings

  • Potential IndexError for 1D char arrays: The code in mat73/core.py accesses dataset.shape[1] without first checking if dataset is 1D. This will cause an IndexError if a 1D char array is processed, as 1D arrays do not have a shape[1].

Merge Readiness

The pull request addresses an important issue with loading 2D char arrays and includes a good test case. However, due to the potential IndexError bug identified in mat73/core.py when handling 1D char arrays, I recommend that these changes not be merged until this issue is resolved. My code suggestion aims to fix this. I am unable to approve pull requests myself; please ensure the fix is reviewed and approved by a maintainer before merging.

sjvenditto and others added 2 commits May 21, 2025 14:14
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
mat73/core.py Outdated
elif mtype=='char':
string_array = np.ravel(dataset)
string_array = ''.join([chr(x) for x in string_array])
if dataset.ndim == 2 and dataset.shape[1] > 1:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too familiar with MATLAB, but can a MATLAB char array every be ndim>2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they're rare since all strings have to be the same length, but there's a dataset I've been working with where someone saved their data in this format by padding the lines with spaces (it was the char array that I put in the issue). I loaded it in Matlab to verify how it was supposed to be formatted.

Also, from the same file, I noticed that even 1D char arrays were parsed with 2 dimensions with shape (n, 1). Don't know how true this is in general, which is why I accepted the change for checking ndim first

@skjerns
Copy link
Owner

skjerns commented May 23, 2025

I've tried the MATLAB file you provided, but with my MATLAB R2024b the array is somehow not 2d, but saved as a 1D with shape=(342, 1), very strange! However, creating a 3d char array was possible for me. Any idea what might the the problem here?

image
image

edit: changing , to ; fixed it :) I'll push to your branch soon

@skjerns
Copy link
Owner

skjerns commented May 23, 2025

@sjvenditto I've pushed my changes to the repo - could you run the create_mat.m on your repo and create the new testfile16.mat? I somehow are not allowed to push it to your repo.

Check if the code runs on your machine

thanks for the PR and raising this issue!

@skjerns
Copy link
Owner

skjerns commented May 25, 2025

Oh no, StringDType is numpy>=2.0 only - need to think how to solve this by

a) using different methods for string conversion
b) using a workaround for \x00 and continue use of np.str_
c) making numpy>=2 a requirement

edit: fixed it!

@skjerns skjerns merged commit d686b80 into skjerns:master May 26, 2025
12 checks passed
@skjerns
Copy link
Owner

skjerns commented May 26, 2025

thanks for the addition! I've merged it to main

skjerns added a commit that referenced this pull request May 26, 2025
* small change to correctly load in 2D char arrays

* python 3.7 has been removed from runners

see actions/runner-images#10893

* Update mat73/core.py



* Update test_mat73.py

* Update create_mat.m

* Update core.py

* fix string array reading

* remove dependency on StringDType numpy>=2

---------

Co-authored-by: Sarah Jo Venditto <sjvenditto@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@sjvenditto
Copy link
Contributor Author

I've tried the MATLAB file you provided, but with my MATLAB R2024b the array is somehow not 2d, but saved as a 1D with shape=(342, 1), very strange! However, creating a 3d char array was possible for me. Any idea what might the the problem here?

image image

edit: changing , to ; fixed it :) I'll push to your branch soon

Sorry about this! I actually didn't run the tests on the file generated in create_mat.m, I ran it on a file I saved directly in MATLAB and didn't check whether create_mat.m made the same file

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments