Improve Parser/token.c to be more branch predictor friendly and easier to read. #126049

@DefinitelyNotAnOrca

Feature or enhancement

Proposal:

I was digging through the Python parser and noticed that Parser/token.c is hard to follow, because it consists of a series of nested switch statements like this:

int
_PyToken_TwoChars(int c1, int c2)
{
    switch (c1) {
    case '!':
        switch (c2) {
        case '=': return NOTEQUAL;
        }
        break;
    case '%':
        switch (c2) {
        case '=': return PERCENTEQUAL;
        }
        break;

This kind of nested switching makes it harder for branch predictors to guess the branch path, and it is also harder to read.

I propose changing _PyToken_TwoChars and _PyToken_ThreeChars so that each uses a single switch statement, with a C macro that combines the characters into one value so that every operator becomes a single case label.

Before:

int
_PyToken_TwoChars(int c1, int c2)
{
    switch (c1) {
    case '!':
        switch (c2) {
        case '=': return NOTEQUAL;
        }
        break;
    case '%':
        switch (c2) {
        case '=': return PERCENTEQUAL;
        }
        break;

After:

int
_PyToken_TwoChars(int c1, int c2)
{
    switch (GENERATE_2CHAR_CODE(c1, c2)) {
        case GENERATE_2CHAR_CODE('!', '='): return NOTEQUAL;
        case GENERATE_2CHAR_CODE('%', '='): return PERCENTEQUAL;
        case GENERATE_2CHAR_CODE('&', '='): return AMPEREQUAL;
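
The snippet above does not show how GENERATE_2CHAR_CODE is defined. One possible definition, given here only as a sketch, packs the two character codes into a single integer by shifting the first one left by 8 bits. The macro bodies, the GENERATE_3CHAR_CODE name, the sketch_* function names, the placeholder token values, and the OP fallback are all assumptions made for illustration, not the code from an actual patch. The sketch also assumes both inputs are byte values (0..255) or EOF; because of the 0xFF mask, anything outside that range could alias a valid pair.

```c
#include <assert.h>

/* Placeholder token IDs for this sketch only; CPython generates the real
   values, and they differ from these. */
enum {
    OP = 0,
    NOTEQUAL,
    PERCENTEQUAL,
    AMPEREQUAL,
    DOUBLESTAREQUAL,
    ELLIPSIS
};

/* Assumed definition: pack two byte-sized character codes into one int.
   Masking with 0xFF keeps each operand in 0..255, so the macro yields a
   non-negative integer constant expression when given character literals,
   which is what a case label requires. */
#define GENERATE_2CHAR_CODE(c1, c2) \
    ((((c1) & 0xFF) << 8) | ((c2) & 0xFF))

/* Hypothetical three-character counterpart (name not taken from the issue). */
#define GENERATE_3CHAR_CODE(c1, c2, c3) \
    ((((c1) & 0xFF) << 16) | (((c2) & 0xFF) << 8) | ((c3) & 0xFF))

static int
sketch_two_chars(int c1, int c2)
{
    switch (GENERATE_2CHAR_CODE(c1, c2)) {
        case GENERATE_2CHAR_CODE('!', '='): return NOTEQUAL;
        case GENERATE_2CHAR_CODE('%', '='): return PERCENTEQUAL;
        case GENERATE_2CHAR_CODE('&', '='): return AMPEREQUAL;
    }
    return OP;  /* assumed fallback when no pair matches */
}

static int
sketch_three_chars(int c1, int c2, int c3)
{
    switch (GENERATE_3CHAR_CODE(c1, c2, c3)) {
        case GENERATE_3CHAR_CODE('*', '*', '='): return DOUBLESTAREQUAL;
        case GENERATE_3CHAR_CODE('.', '.', '.'): return ELLIPSIS;
    }
    return OP;  /* assumed fallback when no triple matches */
}

/* Small usage example: the packed code depends on character order. */
static void
sketch_self_check(void)
{
    assert(sketch_two_chars('!', '=') == NOTEQUAL);
    assert(sketch_two_chars('=', '!') == OP);
    assert(sketch_three_chars('*', '*', '=') == DOUBLESTAREQUAL);
}
```

A side effect of this layout is that two case labels that pack to the same value are rejected by the compiler as duplicates, so an accidentally repeated operator shows up at build time.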

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
