- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
gh-126554: ctypes: Correctly handle NULL dlsym values #126555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the  | 
b436c5f    to
    bc3ce76      
    Compare
  
    For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In our case, we also subjectively treat a NULL return value from dlsym() as an error, so we format a special string for that. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]>
bc3ce76    to
    3c3c29d      
    Compare
  
    | 
 This is direct quotation from the man page you linked. 
 | 
| Indeed, I failed to mention that at some point the man page contradicts itself, and I believe that the part you quoted is something the maintainer forgot to remove. 
 I say this because further up, in the DESCRIPTION, the man page states: 
 Additionally, further down, in the NOTES segment, the man page states:  | 
| Understood. IMO, it would be better if we can make an actual reproducible test case, and put this case into regular testing procedure. | 
| See reproducer here: #126554 (comment) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's good to see new contributors :)
Unfortunately, this change does need a test. If it's really, really difficult to add one, then I guess we'll figure things out, but in general, it's difficult for us to accept even tiny bugfixes without proper tests.
For something like this, as long as the test works, it's probably OK. You could do something as hacky as hard-coding a byte string into a test file and unpacking that into a .so at runtime, and just disable all the systems that don't work. But it would be preferable if you could find a library that actually triggers this problem in the wild.
        
          
                Misc/NEWS.d/next/C_API/2024-11-07-20-24-58.gh-issue-126554.ri12eb.rst
              
                Outdated
          
            Show resolved
            Hide resolved
        
      …2eb.rst Co-authored-by: Peter Bierma <[email protected]>
Co-authored-by: Peter Bierma <[email protected]>
Co-authored-by: Peter Bierma <[email protected]>
| 
 | 
| (2) is done, pushing  a commit. I also added a comment that explicitly states why we call  From a project maintaner's POV, if we have 3 instances of this phenomenon, do we add a comment disclaimer at every instance? (This is the approach I took, but I'm unsure what the best / most common practice is.) | 
| Proceeding with the test case (3), will let you know when ready. | 
Signed-off-by: Georgios Alexopoulos <[email protected]>
| 
 Yup. In fact, that's the preferred workflow for us. Force pushing onto one commit is very much discouraged. 
 That looks like a problem too, yeah. Thanks! I'll create a separate issue if I can come up with a repro for the locale issue. (I'll CC you on it, if you want to author that PR as well.) | 
Signed-off-by: Georgios Alexopoulos <[email protected]>
| Added a commit with the test case under  Running on the PR branch, it passes :) On HEAD of main, it explodes (segfault). Do we have any way of handling this gracefully (maybe running a subprocess with the interpreter?), or is it the correct way to go? | 
| 
 Could you please explain, what exactly are you speaking about? | 
| If we want to be pedantic, we must check all 3 code paths that invoke  In this case, we only check one ( On one hand, since our amendment is the same across all 3 funcs, we should theoretically be OK. | 
| 
 If I run the test on the main branch, the Python interpreter gets a segmentation fault. What I'm saying is that in a scenario where someone runs all tests, when the  Maybe I have misunderstood something regarding the philosophy of testing. | 
| Run the test in a subprocess and check the return code. | 
Co-authored-by: Petr Viktorin <[email protected]>
Signed-off-by: Georgios Alexopoulos <[email protected]>
| I'm not sure if the ASan failures are related. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One PEP-7 nitpick and two de-chop suggestions. I didn't write suggestions for the PyErr_SetString usages since I wasn't sure it fits on 80 chars.
Co-authored-by: Bénédikt Tran <[email protected]>
Signed-off-by: Georgios Alexopoulos <[email protected]>
| 🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit ab22349 🤖 If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. | 
| Let's try the build bots again and see if it holds up now. | 
| Buildbots are all good, the one failure is a problem with the DRC API that got merged the other day, which is unrelated. I'll submit a (separate) patch to fix that. I think we're good to merge! | 
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Co-authored-by: George Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
| Sorry, @grgalex and @encukou, I could not cleanly backport this to   | 
| GH-126861 is a backport of this pull request to the 3.13 branch. | 
| @grgalex, you've finished it! Congrats with your first contribution to CPython! | 
| Thanks for the guidance @efimov-mikhail, @ZeroIntensity, @encukou. Looking forward to collaborating again in the future! | 
…) (#126861) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html (cherry picked from commit 8717f79) Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: George Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
| 3.13 backports were done but not 3.12; @encukou Do you want the 3.12 backport in the end or can we close the issue with only 3.13 and 3.14 being affected? | 
| As I said, I'm fine not backporting. If anyone disagrees, open the PR :) | 
| GH-127764 is a backport of this pull request to the 3.12 branch. | 
…) (GH-127764) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: George Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
…-126555) For dlsym(), a return value of NULL does not necessarily indicate an error [1]. Therefore, to avoid using stale (or NULL) dlerror() values, we must: 1. clear the previous error state by calling dlerror() 2. call dlsym() 3. call dlerror() If the return value of dlerror() is not NULL, an error occured. In ctypes we choose to treat a NULL return value from dlsym() as a "not found" error. This is the same as the fallback message we use on Windows, Cygwin or when getting/formatting the error reason fails. [1]: https://man7.org/linux/man-pages/man3/dlsym.3.html Signed-off-by: Georgios Alexopoulos <[email protected]> Signed-off-by: Georgios Alexopoulos <[email protected]> Co-authored-by: Peter Bierma <[email protected]> Co-authored-by: Bénédikt Tran <[email protected]> Co-authored-by: Petr Viktorin <[email protected]>
For dlsym(), a return value of NULL does not necessarily indicate an error [1].
Therefore, to avoid using stale (or NULL) dlerror() values, we must:
If the return value of dlerror() is not NULL, an error occured.
In our case of ctypes, I understand that we subjectively treat a NULL return value from dlsym() as an error, so we must format a special string for that.
[1] https://man7.org/linux/man-pages/man3/dlsym.3.html