Skip to content

Move type checks from switch cases to beginning of _Py_Specialize_BinaryOp #128310

@WolframAlph

Description

@WolframAlph

Feature or enhancement

Proposal:

_Py_Specialize_BinaryOp:

cpython/Python/specialize.c

Lines 2380 to 2443 in aeb9b65

_Py_Specialize_BinaryOp(_PyStackRef lhs_st, _PyStackRef rhs_st, _Py_CODEUNIT *instr,
int oparg, _PyStackRef *locals)
{
PyObject *lhs = PyStackRef_AsPyObjectBorrow(lhs_st);
PyObject *rhs = PyStackRef_AsPyObjectBorrow(rhs_st);
assert(ENABLE_SPECIALIZATION_FT);
assert(_PyOpcode_Caches[BINARY_OP] == INLINE_CACHE_ENTRIES_BINARY_OP);
switch (oparg) {
case NB_ADD:
case NB_INPLACE_ADD:
if (!Py_IS_TYPE(lhs, Py_TYPE(rhs))) {
break;
}
if (PyUnicode_CheckExact(lhs)) {
_Py_CODEUNIT next = instr[INLINE_CACHE_ENTRIES_BINARY_OP + 1];
bool to_store = (next.op.code == STORE_FAST);
if (to_store && PyStackRef_AsPyObjectBorrow(locals[next.op.arg]) == lhs) {
specialize(instr, BINARY_OP_INPLACE_ADD_UNICODE);
return;
}
specialize(instr, BINARY_OP_ADD_UNICODE);
return;
}
if (PyLong_CheckExact(lhs)) {
specialize(instr, BINARY_OP_ADD_INT);
return;
}
if (PyFloat_CheckExact(lhs)) {
specialize(instr, BINARY_OP_ADD_FLOAT);
return;
}
break;
case NB_MULTIPLY:
case NB_INPLACE_MULTIPLY:
if (!Py_IS_TYPE(lhs, Py_TYPE(rhs))) {
break;
}
if (PyLong_CheckExact(lhs)) {
specialize(instr, BINARY_OP_MULTIPLY_INT);
return;
}
if (PyFloat_CheckExact(lhs)) {
specialize(instr, BINARY_OP_MULTIPLY_FLOAT);
return;
}
break;
case NB_SUBTRACT:
case NB_INPLACE_SUBTRACT:
if (!Py_IS_TYPE(lhs, Py_TYPE(rhs))) {
break;
}
if (PyLong_CheckExact(lhs)) {
specialize(instr, BINARY_OP_SUBTRACT_INT);
return;
}
if (PyFloat_CheckExact(lhs)) {
specialize(instr, BINARY_OP_SUBTRACT_FLOAT);
return;
}
break;
}
SPECIALIZATION_FAIL(BINARY_OP, binary_op_fail_kind(oparg, lhs, rhs));
unspecialize(instr);
}

uses

if (!Py_IS_TYPE(lhs, Py_TYPE(rhs))) { 
    break; 
} 

check inside every switch case which is redundant. Placing this check before switch results in same performance but less generated code. Compiler with -O3 option seems to not recognize it. Attaching generated machine code (clang, apple silicon) for old and new versions
new.txt
old.txt

Feel free to close this if not worth it.

Has this already been discussed elsewhere?

No response given

Links to previous discussion of this feature:

No response

Linked PRs

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)type-featureA feature request or enhancement

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions