METH_METHOD calling convention is now not so efficient #123500

@neonene

Bug report

Bug description:

Some callables are implemented in C with the METH_METHOD|METH_FASTCALL signature. They can be 5%-15% slower than equivalents that use only METH_FASTCALL (or METH_O) and look up the module with a PyType_GetModuleByDef function call.

For example, I measured the difference on Windows PGO builds by duplicating functions:

  • CDataType_from_buffer_copy() in _ctypes.c, which is not called when profiling:

    from timeit import timeit
    setup = """if 1:
        import ctypes
        buf = bytearray(16)
        cls = ctypes.c_char * len(buf)
    """
    # with a warmup
    for _ in range(2):
        # METH_METHOD|METH_FASTCALL (as-is)
        r0 = timeit(s0 := f'cls.from_buffer_copy (buf)', setup)
    
        # METH_FASTCALL (no `defining_class`) + PyType_GetModuleByDef
        r1 = timeit(s1 := f'cls.from_buffer_copy1(buf)', setup)
    
    print(s0, r0, 1 + (1 - r0 / r0))
    print(s1, r1, 1 + (1 - r1 / r0))

    Output:

    cls.from_buffer_copy (buf) 0.15552800190635024 1.0
    cls.from_buffer_copy1(buf) 0.13187471489945893 1.1520837837364741
  • dec_mpd_qquantize() in _decimal.c, which is exercised by 6800 calls during profiling (possibly an unfair comparison):

    # legacy (as-is)
    d1.quantize (d2) 0.1694609627971658 1.0
    
    # METH_METHOD|METH_FASTCALL (`defining_class`) + _PyType_GetModuleState
    d1.quantize1(d2) 0.1408861404022900 1.168621857938327
    
    # METH_FASTCALL (no `defining_class`) + PyType_GetModuleByDef
    d1.quantize2(d2) 0.1258157708973158 1.257553074049807
    Script:

    from timeit import timeit
    setup = """if 1:
        from _decimal import Decimal
        d1,d2 = Decimal(1.414), Decimal('0.01')
    """
    for _ in range(2):
        r0 = timeit(s0 := f'd1.quantize (d2)', setup)
        r1 = timeit(s1 := f'd1.quantize1(d2)', setup)
        r2 = timeit(s2 := f'd1.quantize2(d2)', setup)
    
    print(s0, r0, 1 + (1 - r0 / r0))
    print(s1, r1, 1 + (1 - r1 / r0))
    print(s2, r2, 1 + (1 - r2 / r0))
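
The third column printed by the scripts above is `1 + (1 - r / r0)`, that is, one plus the fraction of the baseline time saved. This is not the same as the plain speedup ratio `r0 / r`. A minimal sketch that recomputes both from the timings reported above; the helper names `score`, `speedup`, and `report` are hypothetical and only for illustration:

```python
def score(r, baseline):
    """Score used in the scripts above: 1 plus the fraction of baseline time saved."""
    return 1 + (1 - r / baseline)


def speedup(r, baseline):
    """Conventional speedup ratio: how many times faster than the baseline."""
    return baseline / r


# Timings in seconds for timeit's default of 1,000,000 calls,
# copied from the outputs reported above. The first entry is the baseline.
ctypes_runs = {
    "cls.from_buffer_copy (buf)": 0.15552800190635024,
    "cls.from_buffer_copy1(buf)": 0.13187471489945893,
}
decimal_runs = {
    "d1.quantize (d2)": 0.1694609627971658,
    "d1.quantize1(d2)": 0.1408861404022900,
    "d1.quantize2(d2)": 0.1258157708973158,
}


def report(runs):
    """Return [(stmt, score, speedup)], using the first entry as the baseline."""
    baseline = next(iter(runs.values()))
    return [(s, score(r, baseline), speedup(r, baseline)) for s, r in runs.items()]


for runs in (ctypes_runs, decimal_runs):
    for stmt, sc, sp in report(runs):
        print(f"{stmt}  score={sc:.4f}  speedup={sp:.4f}")
```

For the ctypes case, the score of about 1.152 corresponds to a speedup ratio of about 1.18, so the two figures should not be read interchangeably.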

Observations:

  • The number of arguments had little effect on the gap.
  • The gaps appear consistent as long as the functions being compared are equally exercised (or equally unexercised) during profiling.
  • The same holds for non-PGO builds and builtin modules (e.g. _sre), though the impact there may be smaller.
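
One way to check how consistent such a gap is would be to repeat each measurement and compare the best time against the run-to-run spread. A minimal sketch using `timeit.repeat`: the duplicated functions from this report (`from_buffer_copy1`, `quantize1`, `quantize2`) do not exist in a stock build, so `cls.from_buffer(buf)` is used here purely as a stand-in second statement, and the helper name `measure` is hypothetical:

```python
from timeit import repeat

SETUP = """if 1:
    import ctypes
    buf = bytearray(16)
    cls = ctypes.c_char * len(buf)
"""


def measure(stmt, setup=SETUP, repeats=5, number=100_000):
    """Return (best, spread): the minimum time and (max - min) / min over the repeats."""
    times = repeat(stmt, setup, repeat=repeats, number=number)
    best = min(times)
    return best, (max(times) - best) / best


baseline_stmt = "cls.from_buffer_copy(buf)"
other_stmt = "cls.from_buffer(buf)"  # stand-in only, not a duplicate of the baseline

base_best, base_spread = measure(baseline_stmt)
other_best, other_spread = measure(other_stmt)

# Same score as in the scripts of this report: 1 + (1 - r / r0).
gap = 1 + (1 - other_best / base_best)

print(f"{baseline_stmt}  best={base_best:.6f}  spread={base_spread:.1%}")
print(f"{other_stmt}  best={other_best:.6f}  spread={other_spread:.1%}")
print(f"score vs baseline: {gap:.4f}")
```

A difference between two statements that is smaller than either spread is hard to distinguish from noise, so the 5%-15% range discussed here is only meaningful when the spreads are well below that.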

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

Labels: performance, topic-C-API, type-bug