Skip to content

Make sure string column with vec size 1 work#427

Open
esheldon wants to merge 13 commits intomasterfrom
stringarr1
Open

Make sure string column with vec size 1 work#427
esheldon wants to merge 13 commits intomasterfrom
stringarr1

Conversation

@esheldon
Copy link
Copy Markdown
Owner

closes #426

This fixes a bug where the shape of the column was not correct.

Adds tests that string and number vec1 column data get returned, even though read as scalar

@esheldon esheldon requested review from beckermr and erykoff May 14, 2025 18:13
Copy link
Copy Markdown
Collaborator

@erykoff erykoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Copy Markdown
Collaborator

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks fine but I think I am missing some details on what the special case is exactly. I've asked for a few comments in places to help future us understand this edge case.

Copy link
Copy Markdown
Collaborator

@beckermr beckermr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed that this is a breaking change wrt the behavior of returning dims of (1,) versus a scalar. We should figure that bit out.

See also #426 (comment)

@esheldon
Copy link
Copy Markdown
Owner Author

esheldon commented May 14, 2025

It is a breaking change, in that those columns will now be properly round tripped whereas they were not before.

On the other hand, I consider this a bug in numpy that I couldn't get this to work in 1.X. Seen that way it is breaking in the sense of correcting behavior.

@beckermr
Copy link
Copy Markdown
Collaborator

Is there a way we can make it work for non-strings?

@esheldon
Copy link
Copy Markdown
Owner Author

There is no way to set the TDIM for length-1 vector columns, they must be length 2 or more.

I can see hacks working. We could add a header keyword that tells us the situation for a given column so fitsio can reshape the data.

@beckermr
Copy link
Copy Markdown
Collaborator

My feeling on this specific item is that this specific behavior of FITSIO is so ingrained in people's codes + minds that if we break it, we're going to cause silent or at least very weird bugs. I get that it can be considered a bug, but long-standing bugs that people code around effectively become part of the API.

@beckermr
Copy link
Copy Markdown
Collaborator

beckermr commented May 14, 2025

I can see hacks working. We could add a header keyword that tells us the situation for a given column so fitsio can reshape the data.

Right. My feeling is that we

  1. Keep the current behavior before this PR where columns of shape (1,) are cast to scalars for all types.

  2. If we do want to fix this, we fix it for all types AND we bump fitsio to major version 2.

This seems to me to be the only way to do this without causing lots of bad side effects.

@beckermr
Copy link
Copy Markdown
Collaborator

We could include some of the other big ideas we've discussed in version 2 as well, including the new I/O modes+features.

@esheldon
Copy link
Copy Markdown
Owner Author

I don't actually know how to workaround the new numpy behavior, will have to figure it out.

@beckermr
Copy link
Copy Markdown
Collaborator

Can't we just change the output dtype for strings if we find TDIM has 2 elements?

@esheldon
Copy link
Copy Markdown
Owner Author

There is code reuse for read and write. It seems like different behavior might be needed for read vs write in this case, but need to look into it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Crash when trying to write a table with a string array of length 1

3 participants