- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 33.2k
          gh-119109: improve functools.partial vectorcall with keywords
          #124584
        
          New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
  
    gh-119109: improve functools.partial vectorcall with keywords
  
  #124584
              Conversation
| Perhaps @vstinner has the time and interest in looking at this. | 
| I think it is a good compromise between simplicity and performance now. One micro-optimization that I couldn't figure out how to do simply is pre-storing  Not sure how much sense it makes yet, but I posted faster-cpython/ideas#699 in relation to this. Ready for review now. | 
| Was wandering if it might be worth factoring out macros for private use. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. 👍
| No refleaks: ❯ ./python.exe -m test -R 3:3 test_functools
Using random seed: 458260461
0:00:00 load avg: 1.72 Run 1 test sequentially in a single process
0:00:00 load avg: 1.72 [1/1] test_functools
beginning 6 repetitions. Showing number of leaks (. for 0 or less, X for 10 or more)
123:456
XX. ...
0:00:02 load avg: 1.66 [1/1] test_functools passed
== Tests result: SUCCESS ==
1 test OK.
Total duration: 2.1 sec
Total tests: run=321 skipped=1
Total test files: run=1/1
Result: SUCCESS | 
| 
 I get the same on  And also the same if I remove all tests except  Maybe not related to this PR? EDIT: I assumed this shows that there are refleaks? Or am I failing to interpret something? Never used  | 
| 
 The output I posted means that there are no leaks, -R is used to check for reference leaks and 3:3 indicates the numbers of runs to checks for leaks. | 
| I think the resizing logic is unnecessary and adds extra complexity, would you simplify it to what Serhiy suggested or alternatively remove it altogether @dg-pb? | 
| 
 I don't think it was simplification. He did propose an alternative and I incorporated it into the logic. Why I kept the old one  # only: (noveralloc > init_stack_size / 2)
p = partial(a=1)
f(a=2)
# initial stack size: 3
# used stack size: 1Thus, would be kicking in too often. 10% is I think a good number, given we are talking about the pool of 0.01% of cases. I feel that what is currently there provides necessary protection at minimal cost and is unlikely to cause any issues. But if you have better rationale in mind, I am open to further amendments. 
 Things like recursion of  "Anything that can go wrong will go wrong." Thus, given enough time it is very likely that more than 1 instance of issues with this will happen. Given it is pretty much for free, and the block can be easily removed if there is a need for it in the future, I am in favour of being "better safe than sorry" here. | 
| 
 I don't have better resizing strategy in mind atm so I'll not block this PR on it now, I'll merge this now thanks. | 
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
| Is a 3.14 backport possible for this PR, given it fixes a free threading race also? | 
| 
 This is performance improvement so it cannot be backported, I haven't seen any real crashes on this so adding supression should be fine for 3.14. | 
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
…python#124584) Co-authored-by: Kumar Aditya <[email protected]> Co-authored-by: Serhiy Storchaka <[email protected]>
(Potentially closes #128050)
This IMO is the best approach to resolve fallback "issue". It:
a) Eliminates the need for the fallback or any need to switch between implementation after initial construction
b) Delivers performance benefits for vectorcall when
partialhas keywordsBenchmark:
No penalty for calls without
pto_kwds.Non negligible speed improvement for calls with
pto_kwds: 27 - 55%