Open
Labels
performance (Performance or resource usage), topic-C-API, type-bug (An unexpected behavior, bug, or error)
Description
Bug report
Bug description:
Some callables implemented in C use the `METH_METHOD|METH_FASTCALL` calling convention, which passes `defining_class`. They can be 5%-15% slower than an equivalent function that uses only `METH_FASTCALL` (or `METH_O`) and calls `PyType_GetModuleByDef()` to get the module.
For example, I measured the difference on Windows PGO builds by duplicating functions and switching each copy to the other convention:
- `CDataType_from_buffer_copy()` in `_ctypes.c`, which is not called when profiling:

  ```python
  from timeit import timeit
  setup = """if 1:
      import ctypes
      buf = bytearray(16)
      cls = ctypes.c_char * len(buf)
  """
  # with a warmup
  for _ in range(2):
      # METH_METHOD|METH_FASTCALL (as-is)
      r0 = timeit(s0 := f'cls.from_buffer_copy (buf)', setup)
      # METH_FASTCALL (no `defining_class`) + PyType_GetModuleByDef
      # (from_buffer_copy1 is the duplicated function in the patched build)
      r1 = timeit(s1 := f'cls.from_buffer_copy1(buf)', setup)
      print(s0, r0, 1 + (1 - r0 / r0))
      print(s1, r1, 1 + (1 - r1 / r0))
  ```

  ```
  cls.from_buffer_copy (buf) 0.15552800190635024 1.0
  cls.from_buffer_copy1(buf) 0.13187471489945893 1.1520837837364741
  ```
- `dec_mpd_qquantize()` in `_decimal.c`, profiled with 6800 calls (unfair?):

  ```
  # legacy (as-is)
  d1.quantize (d2) 0.1694609627971658 1.0
  # METH_METHOD|METH_FASTCALL (`defining_class`) + _PyType_GetModuleState
  d1.quantize1(d2) 0.1408861404022900 1.168621857938327
  # METH_FASTCALL (no `defining_class`) + PyType_GetModuleByDef
  d1.quantize2(d2) 0.1258157708973158 1.257553074049807
  ```
  Script:

  ```python
  from timeit import timeit
  setup = """if 1:
      from _decimal import Decimal
      d1, d2 = Decimal(1.414), Decimal('0.01')
  """
  for _ in range(2):
      r0 = timeit(s0 := f'd1.quantize (d2)', setup)
      r1 = timeit(s1 := f'd1.quantize1(d2)', setup)
      r2 = timeit(s2 := f'd1.quantize2(d2)', setup)
      print(s0, r0, 1 + (1 - r0 / r0))
      print(s1, r1, 1 + (1 - r1 / r0))
      print(s2, r2, 1 + (1 - r2 / r0))
  ```
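In both result listings, the third column is `1 + (1 - r / r0)`, that is `2 - r / r0`, where `r0` is the baseline time. This is a first-order approximation of the speedup `r0 / r` and never overstates it. The sketch below uses a hypothetical helper, `relative_speed`, which is not part of the original scripts, to compare the two figures on the reported timings:

```python
def relative_speed(r, r0):
    """Return (reported metric, exact speedup) for time r against baseline r0.

    relative_speed is a hypothetical helper for illustration only.
    """
    reported = 1 + (1 - r / r0)  # expression used in the benchmark scripts
    exact = r0 / r               # how many times faster r is than r0
    return reported, exact


# Timings copied from the results above, as (variant time, baseline time).
cases = {
    "from_buffer_copy1": (0.13187471489945893, 0.15552800190635024),
    "quantize1": (0.1408861404022900, 0.1694609627971658),
    "quantize2": (0.1258157708973158, 0.1694609627971658),
}
for name, (r, r0) in cases.items():
    reported, exact = relative_speed(r, r0)
    print(f"{name}: reported {reported:.3f}, exact {exact:.3f}")
```

For example, the exact speedup for `quantize2` works out to about 1.35 rather than 1.26.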
Observations:
- The number of arguments had little effect on the difference.
- The gaps seem consistent as long as the compared functions are equally exercised (or equally unexercised) during PGO profiling.
- The same applies to non-PGO builds and to built-in modules (e.g. `_sre`), where the impact may be less significant.
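To repeat the comparison on non-PGO builds or for other modules, the two scripts above could be folded into one helper. This is a sketch: `compare` and `report` are hypothetical names, not part of CPython, and the `number` default of 1,000,000 matches `timeit`'s own default. Earlier rounds serve only as a warmup, like the `for _ in range(2)` loop in the original scripts, and only the last round is returned.

```python
from timeit import timeit


def compare(stmts, setup="pass", rounds=2, number=1_000_000):
    """Time each statement in stmts against the first one (the baseline).

    compare() is a hypothetical helper. Every round re-times all
    statements; earlier rounds only warm up, and the last round is returned
    as (stmt, seconds, 1 + (1 - seconds / baseline_seconds)) tuples.
    """
    results = []
    for _ in range(rounds):
        timings = [timeit(stmt, setup, number=number) for stmt in stmts]
        r0 = timings[0]
        results = [(s, r, 1 + (1 - r / r0)) for s, r in zip(stmts, timings)]
    return results


def report(results):
    """Print results in the same layout as the original scripts."""
    for stmt, seconds, rel in results:
        print(stmt, seconds, rel)
```

On a build that includes the duplicated functions, the `_ctypes` case would be `report(compare(['cls.from_buffer_copy (buf)', 'cls.from_buffer_copy1(buf)'], setup))` with the same `setup` string as before; `from_buffer_copy1` does not exist in unpatched builds.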
CPython versions tested on:
CPython main branch
Operating systems tested on:
Windows