Skip to content

Conversation

@qqwqqw689
Copy link
Contributor

@qqwqqw689 qqwqqw689 commented Nov 6, 2024

In file lexer.c we have

    Py_ssize_t invalid = _PyUnicode_ScanIdentifier(s);
    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

link

But for function _PyUnicode_ScanIdentifier in unicodeobject.c file

Py_ssize_t
_PyUnicode_ScanIdentifier(PyObject *self)
{
    Py_ssize_t i;
    Py_ssize_t len = PyUnicode_GET_LENGTH(self);
    if (len == 0) {
        /* an empty string is not a valid identifier */
        return 0;
    }

    int kind = PyUnicode_KIND(self);
    const void *data = PyUnicode_DATA(self);
    Py_UCS4 ch = PyUnicode_READ(kind, data, 0);
    /* PEP 3131 says that the first character must be in
       XID_Start and subsequent characters in XID_Continue,
       and for the ASCII range, the 2.x rules apply (i.e
       start with letters and underscore, continue with
       letters, digits, underscore). However, given the current
       definition of XID_Start and XID_Continue, it is sufficient
       to check just for these, except that _ must be allowed
       as starting an identifier.  */
    if (!_PyUnicode_IsXidStart(ch) && ch != 0x5F /* LOW LINE */) {
        return 0;
    }

    for (i = 1; i < len; i++) {
        ch = PyUnicode_READ(kind, data, i);
        if (!_PyUnicode_IsXidContinue(ch)) {
            return i;
        }
    }
    return i;
}

link
This function will never return a invalid valiable <0, so

    if (invalid < 0) {
        Py_DECREF(s);
        tok->done = E_ERROR;
        return 0;
    }

will never be executed.

I make this pr to fix it.

@qqwqqw689 qqwqqw689 changed the title gh-126469: fix a error in lexer.c. gh-126469: fix an error in lexer.c. Nov 6, 2024
@skirpichev
Copy link
Contributor

Unfortunately, fix seems to be wrong: pr has CI failures, related to it.

@diegorusso
Copy link
Contributor

diegorusso commented Nov 6, 2024

It is true that the if statement is not executed (at least in the tests) but the fix is not right as tests are failing. Maybe we should really focus on writing a test that exercise that code path and if we cannot, we could refactor the logic there.

@qqwqqw689
Copy link
Contributor Author

qqwqqw689 commented Nov 6, 2024

Actually I'm a rookie, I'd like to wait for some advices.😁

@tomasr8
Copy link
Member

tomasr8 commented Nov 6, 2024

The tests seem to be failing because the error that is being raised is unknown parsing error while the tests expect either invalid character or invalid non-printable character so it doesn't seem to be entirely wrong. I think you can simply remove the if block and the tests should pass:

-    if (invalid < 0) {
-        Py_DECREF(s);
-        tok->done = E_ERROR;
-        return 0;
-    }

@qqwqqw689
Copy link
Contributor Author

qqwqqw689 commented Nov 6, 2024

The tests seem to be failing because the error that is being raised is unknown parsing error while the tests expect either invalid character or invalid non-printable character so it doesn't seem to be entirely wrong. I think you can simply remove the if block and the tests should pass:

-    if (invalid < 0) {
-        Py_DECREF(s);
-        tok->done = E_ERROR;
-        return 0;
-    }

I will try.

@picnixz picnixz changed the title gh-126469: fix an error in lexer.c. gh-126469: remove unnecessary error-checking branch in lexer.c Nov 10, 2024
@pablogsal pablogsal enabled auto-merge (squash) January 1, 2025 20:46
@pablogsal pablogsal disabled auto-merge January 1, 2025 20:46
@pablogsal pablogsal enabled auto-merge (squash) January 1, 2025 20:46
@pablogsal
Copy link
Member

Thanks for your contribution @qqwqqw689

@picnixz
Copy link
Member

picnixz commented Jan 1, 2025

(I've updated the PR since we added new CI checks)

@pablogsal pablogsal merged commit c810ed7 into python:main Jan 1, 2025
39 checks passed
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants