-
-
Notifications
You must be signed in to change notification settings - Fork 33.3k
gh-103997: Fix and test _PyUnicode_Dedent
#138620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| :option:`-c` now dedents like :func:`textwrap.dedent` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -14309,9 +14309,9 @@ unicode_getnewargs(PyObject *v, PyObject *Py_UNUSED(ignored)) | |
| } | ||
|
|
||
| /* | ||
| This function searchs the longest common leading whitespace | ||
| This function searches the longest common leading whitespace | ||
| of all lines in the [src, end). | ||
| It returns the length of the common leading whitespace and sets `output` to | ||
| It returns the length of the common leading whitespace and sets *output* to | ||
StanFromIreland marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| point to the beginning of the common leading whitespace if length > 0. | ||
| */ | ||
| static Py_ssize_t | ||
|
|
@@ -14331,9 +14331,7 @@ search_longest_common_leading_whitespace( | |
|
|
||
| // scan the whole line | ||
| while (iter < end && *iter != '\n') { | ||
| if (!leading_whitespace_end && *iter != ' ' && *iter != '\t') { | ||
| /* `iter` points to the first non-whitespace character | ||
| in this line */ | ||
| if (!leading_whitespace_end && !Py_ISSPACE(Py_CHARMASK(*iter))) { | ||
| if (iter == line_start) { | ||
| // some line has no indent, fast exit! | ||
| return 0; | ||
|
|
@@ -14389,8 +14387,8 @@ search_longest_common_leading_whitespace( | |
| } | ||
|
|
||
| /* Dedent a string. | ||
| Behaviour is expected to be an exact match of `textwrap.dedent`. | ||
| Return a new reference on success, NULL with exception set on error. | ||
| Behaviour is expected to be an exact match of textwrap.dedent. | ||
StanFromIreland marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| Return a new reference on success, NULL with an exception set on error. | ||
| */ | ||
| PyObject * | ||
| _PyUnicode_Dedent(PyObject *unicode) | ||
|
|
@@ -14410,13 +14408,9 @@ _PyUnicode_Dedent(PyObject *unicode) | |
| // [whitespace_start, whitespace_start + whitespace_len) | ||
| // describes the current longest common leading whitespace | ||
| const char *whitespace_start = NULL; | ||
| Py_ssize_t whitespace_len = search_longest_common_leading_whitespace( | ||
| const Py_ssize_t whitespace_len = search_longest_common_leading_whitespace( | ||
StanFromIreland marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| src, end, &whitespace_start); | ||
|
|
||
| if (whitespace_len == 0) { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Keep the fast path. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can't, we need to clear lines, see the tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we really need to do it? in general there is nothing to dedent, so it'll slow down normal cases. Even if the comment says that it's meant to match textwrap.dedent(), I don't think it's needed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we want to respect the behavior of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But I don't think we need to! It's an internal function. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. But our documentation says we do:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If you would rather just remove the false claims, we can do: https://github.com/python/cpython/compare/main...StanFromIreland:remove-misleading-notes?expand=1 Though I think this may cause confusion in the future. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, but we can just say that we don't normalize whitespaces and only consider spaces and tabs. No one should ever write spaces with other space-like characters. Let's just amend the NEWS. As for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, I am working on the PEP :-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I opened #138620 with your alternative, though I still think fixing it is better. |
||
| return Py_NewRef(unicode); | ||
| } | ||
|
|
||
| // now we should trigger a dedent | ||
| char *dest = PyMem_Malloc(src_len); | ||
| if (!dest) { | ||
|
|
@@ -14431,25 +14425,26 @@ _PyUnicode_Dedent(PyObject *unicode) | |
|
|
||
| // iterate over a line to find the end of a line | ||
| while (iter < end && *iter != '\n') { | ||
| if (in_leading_space && *iter != ' ' && *iter != '\t') { | ||
| if (in_leading_space && !Py_ISSPACE(Py_CHARMASK(*iter))) { | ||
| in_leading_space = false; | ||
| } | ||
| ++iter; | ||
| } | ||
|
|
||
| // invariant: *iter == '\n' or iter == end | ||
| bool append_newline = iter < end; | ||
| const bool append_newline = iter < end; | ||
|
|
||
| // if this line has all white space, write '\n' and continue | ||
| if (in_leading_space && append_newline) { | ||
| *dest_iter++ = '\n'; | ||
| if (in_leading_space) { | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was this the issue? or was it There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There were multiple issues. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please indicate the issues for posterity. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are two issues:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Honestly, I don't think it's worth changing this function. We should just change the comment. It's an internal function. |
||
| if (append_newline) { | ||
| *dest_iter++ = '\n'; | ||
| } | ||
| continue; | ||
| } | ||
|
|
||
| /* copy [new_line_start + whitespace_len, iter) to buffer, then | ||
| conditionally append '\n' */ | ||
|
|
||
StanFromIreland marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| Py_ssize_t new_line_len = iter - line_start - whitespace_len; | ||
| const Py_ssize_t new_line_len = iter - line_start - whitespace_len; | ||
| assert(new_line_len >= 0); | ||
| memcpy(dest_iter, line_start + whitespace_len, new_line_len); | ||
|
|
||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.