Skip to content

Conversation

Alvaro-Kothe
Copy link
Contributor

@Alvaro-Kothe Alvaro-Kothe commented Aug 29, 2025

@swt2c
Copy link
Contributor

swt2c commented Aug 29, 2025

Thanks @Alvaro-Kothe, I can confirm this resolves the memory leak that I am seeing. Also, your solution makes sense.

👍

@mroeschke mroeschke requested a review from WillAyd August 29, 2025 19:02
@Alvaro-Kothe Alvaro-Kothe marked this pull request as ready for review August 29, 2025 19:04
Copy link
Contributor

@swt2c swt2c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might also need to add a GET_TC(tc)->freeCStr = 1; in Object_getBigNumStringValue

@Alvaro-Kothe
Copy link
Contributor Author

Thanks for pointing that out. Just fixed it.

@@ -107,6 +107,7 @@ typedef struct __TypeContext {
JSINT64 longValue;

const char *cStr;
int freeCStr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally the problem with the JSON code is that over time we have incrementally added many different state management operations, and it is not always clear how they should work in tandem. So I am a little hesitant to add more state to solve the current issue. Is it possible to solve the issue without changing this struct?

Copy link
Contributor

@swt2c swt2c Sep 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions on how to do that? Revert the changes in fb6c4e3 that make some of the uses of cStr stack allocated? It doesn't seem possible to make all uses of cStr stack allocated, at least at first glance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure that might be a reasonable approach

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the freeCStr flag and stored all the strings assigned in *IterNext in dynamic memory.

Revert "fix: flag to free `cStr` in `Object_getBigNumStringValue`"

Revert "fix: memory leak in JSON datetime serialization"

fix: fix memory leak in `PyDateTimeToIsoCallback`

fix: ensure successful malloc.
@@ -1880,6 +1905,7 @@ static void Object_endTypeContext(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
GET_TC(tc)->rowLabels = NULL;
NpyArr_freeLabels(GET_TC(tc)->columnLabels, GET_TC(tc)->columnLabelsLen);
GET_TC(tc)->columnLabels = NULL;
PyObject_Free((void *)GET_TC(tc)->cStr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too sure about this approach - generally the CPython API only wants you to call this function when you've allocated with the corresponding function(s)

https://docs.python.org/3/c-api/memory.html#c.PyObject_Free

Frees the memory block pointed to by p, which must have been returned by a previous call to PyObject_Malloc(), PyObject_Realloc() or PyObject_Calloc(). Otherwise, or if PyObject_Free(p) has been called before, undefined behavior occurs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I misread your PR. I see you allocate 20 bytes to cStr, so this is clean in that regards. However, I think you want to use the PyMem_ functions since we are dealing with raw C constructs and not Python objects, so maybe replace this with PyMem_Free, any PyObject_Malloc calls with PyMem_Malloc

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something, PyObject_Free() is being called on memory that's been allocated with PyObject_Malloc()...

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any object types where it is possible to go through the destructor without initializing the cStr member? Looking at say the dict iteration code it seems like that may be happening?

Its a bit confusing here that the type context begin functions have an implicit requirement to allocate the cStr member. I wonder if it wouldn't be cleaner to allocate that up front, and leave it to individual functions to free or replace memory as required

@@ -1007,16 +1008,24 @@ static const char *List_iterGetName(JSOBJ Py_UNUSED(obj),
//=============================================================================
static void Index_iterBegin(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
GET_TC(tc)->index = 0;
GET_TC(tc)->cStr = PyObject_Malloc(20);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its probably better to create a static const variable for a size within the module, rather than repeating the hard-coded 20 multiple times

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created the variable CSTR_SIZE

if (index == 0) {
GET_TC(tc)->cStr = "name";
strcpy((char *)GET_TC(tc)->cStr, "name");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if you want to use strcpy here you will need to drop the const qualifier on the struct member; casting away const-ness like this is going to trigger undefined behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the const qualifier.

@@ -1880,6 +1905,7 @@ static void Object_endTypeContext(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
GET_TC(tc)->rowLabels = NULL;
NpyArr_freeLabels(GET_TC(tc)->columnLabels, GET_TC(tc)->columnLabelsLen);
GET_TC(tc)->columnLabels = NULL;
PyObject_Free((void *)GET_TC(tc)->cStr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cast to a void pointer here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was because of the const qualifier in cStr, but since I removed this qualifier, I am no longer casting.

@@ -49,7 +49,7 @@ char *int64ToIso(int64_t value, NPY_DATETIMEUNIT valueUnit,
pandas_datetime_to_datetimestruct(value, valueUnit, &dts);

*len = (size_t)get_datetime_iso_8601_strlen(0, base);
char *result = PyObject_Malloc(*len);
char *result = PyMem_Malloc(*len);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the focus of the PR is to fix the JSON modules, let's leave the others untouched for now. These may be viable fixes, but they can always be done in a follow up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nevermind - I guess you are doing this because the JSON module makes use of this function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exacly, I made this change because of the assignment of cStr in NpyDateTimeToIsoCallback.

I also made changes in other functions to allocate with PyMem_Malloc in int64ToIsoDuration and PyDateTimeToIso, but I think that the CI errors are because of this changes. Is it correct to use PyMem_Malloc in them or should I change back to PyObject_Malloc?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange - I don't think PyObject_Malloc is correct. Upstream, it looks like they have made a similar change (see python/cpython#114569)

However, if its a blocker for now sure lets stick with the PyObject_ functions; can figure out the rest later

@WillAyd
Copy link
Member

WillAyd commented Sep 2, 2025

@Alvaro-Kothe the address and undefined behavior violations can be pretty tricky to debug. If you aren't aware, you can turn on sanitizers locally to better understand what is happening (assuming your compiler supports them):

python -m pip install -e . -Csetup-args="-Db_sanitize=address,undefined" -Csetupargs="-Dbuildtype=debug"

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool nice work

@@ -1880,6 +1907,7 @@ static void Object_endTypeContext(JSOBJ Py_UNUSED(obj), JSONTypeContext *tc) {
GET_TC(tc)->rowLabels = NULL;
NpyArr_freeLabels(GET_TC(tc)->columnLabels, GET_TC(tc)->columnLabelsLen);
GET_TC(tc)->columnLabels = NULL;
PyObject_Free(GET_TC(tc)->cStr);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you double check the paths that actually hit this? Right now it seems like we either implicitly have some kind of relation with the types allocate storage for the struct member and this function getting called, or we have undefined behavior when always calling free here, if the type doesn't allocate this in the first place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked these paths where cStr is assigned to a dynamic memory allocated with PyObject_Malloc:

  • NpyDateTimeToIsoCallback -> int64ToIso
  • NpyTimeDeltaToIsoCallback -> int64ToIsoDuration
  • PyDateTimeToIsoCallback -> PyDateTimeToIso
  • Index_iterBegin
  • Series_iterBegin
  • DataFrame_iterBegin
  • Object_getBigNumStringValue

There is no *_Free call for cStr outside of the Object_endTypeContext function to cause an undefined behaviour.

Other cases seems like cStr isn't assigned and end up being NULL, making the PyObject_Free a no-op.

@WillAyd WillAyd merged commit b9da662 into pandas-dev:main Sep 3, 2025
38 checks passed
@WillAyd
Copy link
Member

WillAyd commented Sep 3, 2025

Great thanks @Alvaro-Kothe ! Definitely open to any other improvements you can find here

@Alvaro-Kothe Alvaro-Kothe deleted the fix/memleak-json-iso branch September 3, 2025 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: memory leak in to_json when converting DateTime values
3 participants