-
-
Notifications
You must be signed in to change notification settings - Fork 33.1k
gh-116946: fully implement GC protocol for bz2
objects
#138266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Modules/_bz2module.c
Outdated
} | ||
|
||
// explicit fields initialization as PyObject_GC_New() does not change them | ||
self->flushed = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need this to be explicitly initialized otherwise tests fail because flushed
would be arbitrary.
Modules/_bz2module.c
Outdated
if (self->unused_data == NULL) | ||
goto error; | ||
|
||
self->bzs = (bz_stream){.opaque = NULL, .bzalloc = NULL, .bzfree = NULL}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this being set, we have a segmentation fault.
Modules/_bz2module.c
Outdated
} | ||
|
||
// explicit fields initialization as PyObject_GC_New() does not change them | ||
self->eof = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this being set by default, tests fail as eof is most likely "true" by default (that's because of the call to PyObject_GC_New which initializes the memory but doesn't zero it).
🤖 New build scheduled with the buildbot fleet by @picnixz for commit a8e8501 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F138266%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-refleak-buildbots label again. |
Modules/_bz2module.c
Outdated
|
||
assert(type != NULL && type->tp_alloc != NULL); | ||
self = (BZ2Compressor *)type->tp_alloc(type, 0); | ||
self = PyObject_GC_New(BZ2Compressor, type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this seems like introducing additional maintenance for no reason. If a type has Py_TPFLAGS_HAVE_GC
set, then tp_alloc
will act like PyObject_GC_New
anyway, except the data will also be zero-ed (so we wouldn't need the additional change below).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like those kind of "will act like" [...]. I prefer using the public API correctly rather than relying on implementation details. I know it's an additional maintenance burden but I think it's worth it because we would follow our own rules.
Also, I think we should perhaps add a function for zero data with PyObject_GC_New to prevent these additional changes while using public functions (and not just accessing tp_alloc directly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get what you're saying, but this really isn't an implementation detail. The docs explicitly tell users to prefer tp_alloc
instead of manually allocating objects. The only place where I'd say it's ok to use PyObject_GC_New
is when using the limited API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we're not reading the same docs: https://docs.python.org/3/c-api/gcsupport.html#supporting-cycle-detection
Constructors for container types must conform to two rules:
The memory for the object must be allocated using PyObject_GC_New or PyObject_GC_NewVar.
And https://docs.python.org/3/c-api/typeobj.html#c.Py_TPFLAGS_HEAPTYPE:
Heap types should also support garbage collection as they can form a reference cycle with their own module object
and in https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_traverse:
More information about Python’s garbage collection scheme can be found in section Supporting Cyclic Garbage Collection.
All those documented area point to the page where we explicitly ask users to use PyObject_GC_New[Var]
. I haven't seen anywhere something recommending tp_alloc
. And https://docs.python.org/3/c-api/type.html#c.PyType_GenericAlloc doesn't document that this can be used instead of GC_New + GC_Track.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this was part of the big object lifecycle documentation update in 3.14: #125962
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to block like this, but I feel strongly -1 on the switch from tp_alloc
to PyObject_GC_New
. In 3.14 and 3.15, all of the C API documentation asks that you choose tp_alloc
over manual object allocation functions.
Modules/_bz2module.c
Outdated
|
||
assert(type != NULL && type->tp_alloc != NULL); | ||
self = (BZ2Compressor *)type->tp_alloc(type, 0); | ||
self = PyObject_GC_New(BZ2Compressor, type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get what you're saying, but this really isn't an implementation detail. The docs explicitly tell users to prefer tp_alloc
instead of manually allocating objects. The only place where I'd say it's ok to use PyObject_GC_New
is when using the limited API.
When you're done making the requested changes, leave the comment: |
Ah that's why. The stable docs being 3.13, this wasn't indicated! I can switch to do this then, though I really hate invoking directly the slot function. |
I'll update the other PRs for 3.14+, that I can do. However, does the behavior also hold in 3.13? or is a 3.14+ feature? |
The behavior holds for 3.13, we just changed the preference in the docs in 3.14.
How come? I think it's just nicer to use. If you change the type from GC to non-GC, for example, you don't have to update all your |
Modules/_bz2module.c
Outdated
|
||
PyTypeObject *tp = Py_TYPE(self); | ||
tp->tp_free((PyObject *)self); | ||
PyObject_GC_Del(self); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I just swa that as well, so I'm going to add another commit (I'm modifying the other 3 PRs at the same time)
I would have preferred invoking a consumer function that is, |
I'd be okay with a |
Hum, now it feels wrong but is it ok that the GC is tracking the object when I'm not done yet with setting its fields? because it appears that now alloc functions automatically track the object after the call and this violates the recommendations for https://docs.python.org/3.14/c-api/gcsupport.html#c.PyObject_GC_Track. |
Yeah, that's fine. In practice, most object won't call into Python code before they're fully initialized, and even if one does, that's why there's a big red note at |
Yeah, but this is still wrong. What if someone is calling some method on the argument? that would call into Python code. I think we should also have a function for doing the alloc but not tracking the object yet. For now, I'll live with this but I really think this should be documented at the level of |
Unless they're playing with
Sure, do you want to make the PR? |
I have made the requested changes; please review again.
I hope you're right. OTOH, users could first construct the fields separately and set them afterwards (though they won't be able to rely on
I don't mind but I think it's also an easy contribution for newcomers. |
I think we should still backport the recommendations to 3.13 if nothing changed on our side (if this was always the behavior of tp_alloc & co, then it's probably worth doing this). However, we should still update https://docs.python.org/3.14/c-api/gcsupport.html#supporting-cycle-detection because it says "Use GC_New" and under "GC_New" it says "use tp_alloc" and this would be confusing. |
I'll create a doc issue for that because there are two tasks:
|
Sounds good, but up to you on whether to put in the effort, because 3.14's documentation will be the default in a month anyway. |
Mmmh. Right. No need for the docs backports. |
Off-topic, but I generally default to reading https://docs.python.org/dev/ -- this is the version that reflects HEAD, whereas A |
Thanks @picnixz for the PR 🌮🎉.. I'm working now to backport this PR to: 3.13. |
Thanks @picnixz for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14. |
Sorry, @picnixz, I could not cleanly backport this to
|
…nGH-138266) (cherry picked from commit 9be91f6) Co-authored-by: Bénédikt Tran <[email protected]>
GH-138320 is a backport of this pull request to the 3.14 branch. |
…pythonGH-138266) (cherry picked from commit 9be91f6) Co-authored-by: Bénédikt Tran <[email protected]>
GH-138322 is a backport of this pull request to the 3.13 branch. |
… objects (pythonGH-138266) (python#138322)" This reverts commit 90036f5.
python#138266)" This reverts commit 9be91f6.
…H-138322, GH-138323, GH-138326) (#138337) * Revert "[3.13] gh-116946: fully implement GC protocol for `bz2` objects (GH-138266) (#138322)" This reverts commit 90036f5. * Revert "[3.13] gh-116946: fully implement GC protocol for `lzma` objects (GH-138288) (#138323)" This reverts commit 828682d. * Revert "[3.13] gh-116946: fully implement GC protocol for `_hashlib` objects (GH-138289) (#138326)" This reverts commit 21b5932.
…, GH-138288, GH-138289) (#138338) * Revert "gh-116946: fully implement GC protocol for `bz2` objects (#138266)" This reverts commit 9be91f6. * Revert "gh-116946: fully implement GC protocol for `lzma` objects (#138288)" This reverts commit 3ea16f9. * Revert "gh-116946: fully implement GC protocol for `_hashlib` objects (#138289)" This reverts commit 6f1dd95.
…ythonGH-138266, pythonGH-138288, pythonGH-138289) (python#138338) * Revert "pythongh-116946: fully implement GC protocol for `bz2` objects (python#138266)" This reverts commit 9be91f6. * Revert "pythongh-116946: fully implement GC protocol for `lzma` objects (python#138288)" This reverts commit 3ea16f9. * Revert "pythongh-116946: fully implement GC protocol for `_hashlib` objects (python#138289)" This reverts commit 6f1dd95.
AFAIU,
tp_traverse
will not be called ifPy_TPFLAGS_HAVE_GC
is not specified. Quoting the specs:As such, I believe that this is the correct fix (that is, we need
tp_traverse
to be called).cc @vstinner