@StanFromIreland
Member

@StanFromIreland StanFromIreland commented Sep 7, 2025

While working on #62535, I noticed that several textwrap.dedent tests fail with this implementation.

Member

@picnixz picnixz left a comment

_PyUnicode_Dedent is used in pymain_run_command, so it may be performance-critical. Please keep the same implementation logic by working with char* only. Or show that this doesn't result in a performance loss.

@bedevere-app bedevere-app bot requested a review from picnixz September 7, 2025 15:07
Member

@picnixz picnixz left a comment

Please avoid adding const qualifiers.

@bedevere-app bedevere-app bot requested a review from picnixz September 7, 2025 15:13
Py_ssize_t whitespace_len = search_longest_common_leading_whitespace(
    src, end, &whitespace_start);

if (whitespace_len == 0) {
Member

Keep the fast path.

Member Author

@StanFromIreland StanFromIreland Sep 7, 2025

We can't; we need to clear whitespace-only lines (see the tests).
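
For context, a minimal Python sketch of why returning early on a zero-length common margin would diverge from textwrap.dedent: the standard-library function still empties whitespace-only lines when there is nothing to strip. The sample input is made up for illustration and is not taken from the test suite.

```python
import textwrap

# Illustrative input: no common indentation, but the middle line is only spaces.
source = "a\n   \nb"

dedented = textwrap.dedent(source)

# No margin was removed, yet the whitespace-only line was emptied,
# so handing back the input unchanged would not match textwrap.dedent.
print(repr(dedented))
print(dedented == source)
```

Whether that difference matters for an internal helper is the design question discussed in this thread.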

Member

Do we really need to do it? In general there is nothing to dedent, so it'll slow down normal cases. Even if the comment says that it's meant to match textwrap.dedent(), I don't think it's needed.

Member Author

If we want to respect the behavior of textwrap.dedent, as noted in What's New in Python 3.14, the docs, and the comments, then yes.

Member

@picnixz picnixz Sep 7, 2025

But I don't think we need to! It's an internal function. textwrap.dedent is implemented in pure Python.

Member Author

@StanFromIreland StanFromIreland Sep 7, 2025

But our documentation says we do:

What's New in Python 3.14

The auto-dedentation behavior mirrors textwrap.dedent().

Member Author

If you would rather just remove the false claims, we can do: https://github.com/python/cpython/compare/main...StanFromIreland:remove-misleading-notes?expand=1

Though I think this may cause confusion in the future.

Member

@picnixz picnixz Sep 7, 2025

The auto-dedentation behavior mirrors textwrap.dedent().

Yes, but we can just say that we don't normalize whitespace and only consider spaces and tabs. No one should ever indent with other space-like characters. Let's just amend the NEWS. As for str.dedent(), it will need a PEP, which still doesn't exist, and the discussion seems stalled IMO.

Member Author

Yes, I am working on the PEP :-)

Member Author

I opened #138620 with your alternative, though I still think fixing it is better.

// if this line has all white space, write '\n' and continue
if (in_leading_space && append_newline) {
*dest_iter++ = '\n';
if (in_leading_space) {
Member

Was this the issue? Or was it *iter != ' ' ...?

Member Author

There were multiple issues.

Member

Please indicate the issues for posterity.

Member Author

There are two issues:

  1. Whitespace-only lines are not cleared, whereas textwrap.dedent clears them.
  2. Only '\t' and ' ' are considered whitespace, whereas textwrap.dedent uses str.isspace.
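
To make the second point concrete, here is a small Python sketch comparing the two character classes. The helper `is_space_or_tab` is a hypothetical stand-in for a check limited to ' ' and '\t'; it is not the C code under review, and the sample characters are chosen only for illustration.

```python
# Characters that str.isspace() accepts; only the first two are ' ' or '\t'.
SAMPLES = {
    "space": " ",
    "tab": "\t",
    "vertical tab": "\x0b",
    "form feed": "\x0c",
    "no-break space": "\u00a0",
    "em space": "\u2003",
}


def is_space_or_tab(ch: str) -> bool:
    """Hypothetical narrow check: only ' ' and '\\t' count as whitespace."""
    return ch in (" ", "\t")


# Print both classifications side by side for each sample character.
for name, ch in SAMPLES.items():
    print(f"{name:<15} isspace={ch.isspace()!s:<5} space_or_tab={is_space_or_tab(ch)}")

# Names of the samples that the narrow check accepts.
narrow = [name for name, ch in SAMPLES.items() if is_space_or_tab(ch)]
```

Every sample satisfies str.isspace(), while the narrow check accepts only the space and the tab, so a line made of the other characters is classified differently by the two approaches.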

Member

Honestly, I don't think it's worth changing this function. We should just change the comment. It's an internal function.

@picnixz
Member

picnixz commented Sep 7, 2025

I would still be interested in knowing the answer to that question:

Or show that this doesn't result in a performance loss

Did your refactoring improve the overall performance or not?

@StanFromIreland
Member Author

StanFromIreland commented Sep 7, 2025

Did your refactoring improve the overall performance or not?

Using PyUnicodeWriter has a ~20% performance penalty.

@StanFromIreland
Member Author

I have made the requested changes; please review again

@bedevere-app bedevere-app bot requested a review from picnixz September 7, 2025 15:24
@picnixz
Member

picnixz commented Sep 7, 2025

We can still test _PyUnicode_Dedent to check that it matches a "subset" of the features of textwrap.dedent.
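
One way such a subset check might look, sketched in Python. `simple_dedent` is a hypothetical pure-Python model of a spaces-and-tabs-only dedent, used here purely as a stand-in; how the C function would actually be reached from a test is not assumed. The inputs are restricted to spaces and tabs, where the two are expected to agree.

```python
import os.path
import textwrap


def simple_dedent(text: str) -> str:
    """Hypothetical model: remove the longest common prefix of ' ' and '\\t'."""
    lines = text.split("\n")
    # Leading indentation of every line that has non-space/tab content.
    indents = [
        line[: len(line) - len(line.lstrip(" \t"))]
        for line in lines
        if line.strip(" \t")
    ]
    # commonprefix compares character by character; it returns "" for [].
    margin = os.path.commonprefix(indents)
    # Lines made only of spaces/tabs become empty, others lose the margin.
    return "\n".join(
        line[len(margin):] if line.strip(" \t") else ""
        for line in lines
    )


# Inputs limited to spaces and tabs: the "subset" of textwrap.dedent behavior.
CASES = [
    "    a\n    b\n",
    "  a\n    b\n  c",
    "\ta\n\tb",
    "a\n  b",
    "  a\n\n  b",
    "\t a\n\tb",
    "  a\n \t \n  b",
]

for case in CASES:
    assert simple_dedent(case) == textwrap.dedent(case), repr(case)
```

Inputs containing other whitespace characters are deliberately left out, since that is where the behaviors are allowed to differ.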

@sunmy2019
Member

I'm the implementor of the C function. Sorry that I did not read the code of textwrap.dedent; I just wrote it according to the test cases. 😳

I understand now that it's a matter of design.

@picnixz
Member

picnixz commented Oct 16, 2025

Yes, and I think we do not need to exactly mimic textwrap.dedent unless there is a compelling reason. I personally do not find one: the function is private and, I think, used internally only by the parser. I doubt anyone would have a script that uses whitespace characters other than spaces/tabs for indentation, and if they do, I do not think we should support this at the cost of slowing down the regular use cases.

For instance, I would suggest that we currently keep a simplified version as it is only used for the parser and, if the PEP for str.dedent() is accepted, possibly revisit this design question later, possibly by adding support for normalisation as well (PyUnicode_Dedent and PyUnicode_DedentNormalize).

WDYT?

@sunmy2019
Member

For instance, I would suggest that we currently keep a simplified version as it is only used for the parser and, if the PEP for str.dedent() is accepted, possibly revisit this design question later, possibly by adding support for normalisation as well (PyUnicode_Dedent and PyUnicode_DedentNormalize).

WDYT?

I think so. Let's just change the documentation for now.
