          gh-69619: Add whitespace term to glossary and reference in stdtypes.rst
          #132568
        
base: main
Changes from 2 commits
a560791
              9e90956
              448fd5f
              240cbae
              dd9ddd0
| Original file line number | Diff line number | Diff line change | 
|---|---|---|
|  | @@ -2092,8 +2092,9 @@ expression support in the :mod:`re` module). | |
|  | ||
| Return a copy of the string with leading characters removed. The *chars* | ||
| argument is a string specifying the set of characters to be removed. If omitted | ||
| or ``None``, the *chars* argument defaults to removing whitespace. The *chars* | ||
| argument is not a prefix; rather, all combinations of its values are stripped:: | ||
| or ``None``, the *chars* argument defaults to removing :term:`whitespace`. | ||
| Review comment: Better to link to https://docs.python.org/3/library/stdtypes.html#str.isspace? Reply: I'll yield, but I think a glossary term is the right way to go here. | ||
| The *chars* argument is not a prefix; rather, all combinations of its values | ||
| are stripped:: | ||
|  | ||
| >>> ' spacious '.lstrip() | ||
| 'spacious ' | ||
|  | @@ -2211,8 +2212,9 @@ expression support in the :mod:`re` module). | |
|  | ||
| Return a copy of the string with trailing characters removed. The *chars* | ||
| argument is a string specifying the set of characters to be removed. If omitted | ||
| or ``None``, the *chars* argument defaults to removing whitespace. The *chars* | ||
| argument is not a suffix; rather, all combinations of its values are stripped:: | ||
| or ``None``, the *chars* argument defaults to removing :term:`whitespace`. | ||
| The *chars* argument is not a suffix; rather, all combinations of its values | ||
| are stripped:: | ||
|  | ||
| >>> ' spacious '.rstrip() | ||
| ' spacious' | ||
|  | @@ -2348,7 +2350,7 @@ expression support in the :mod:`re` module). | |
|  | ||
| Return a copy of the string with the leading and trailing characters removed. | ||
| The *chars* argument is a string specifying the set of characters to be removed. | ||
| If omitted or ``None``, the *chars* argument defaults to removing whitespace. | ||
| If omitted or ``None``, the *chars* argument defaults to removing :term:`whitespace`. | ||
| (StanFromIreland marked this conversation as resolved. Outdated.) | ||
| The *chars* argument is not a prefix or suffix; rather, all combinations of its | ||
| values are stripped:: | ||
|  | ||
|  | @@ -2735,7 +2737,7 @@ data and are closely related to string objects in a variety of other ways. | |
|  | ||
| This :class:`bytes` class method returns a bytes object, decoding the | ||
| given string object. The string must contain two hexadecimal digits per | ||
| byte, with ASCII whitespace being ignored. | ||
| byte, with :term:`ASCII whitespace <whitespace>` being ignored. | ||
|  | ||
| >>> bytes.fromhex('2Ef0 F1f2 ') | ||
| b'.\xf0\xf1\xf2' | ||
|  | @@ -2824,7 +2826,7 @@ objects. | |
|  | ||
| This :class:`bytearray` class method returns bytearray object, decoding | ||
| the given string object. The string must contain two hexadecimal digits | ||
| per byte, with ASCII whitespace being ignored. | ||
| per byte, with :term:`ASCII whitespace <whitespace>` being ignored. | ||
|  | ||
| >>> bytearray.fromhex('2Ef0 F1f2 ') | ||
| bytearray(b'.\xf0\xf1\xf2') | ||
|  | @@ -3243,8 +3245,8 @@ produce new objects. | |
| *chars* argument is a binary sequence specifying the set of byte values to | ||
| be removed - the name refers to the fact this method is usually used with | ||
| ASCII characters. If omitted or ``None``, the *chars* argument defaults | ||
| to removing ASCII whitespace. The *chars* argument is not a prefix; | ||
| rather, all combinations of its values are stripped:: | ||
| to removing :term:`ASCII whitespace <whitespace>`. The *chars* argument is | ||
| Review comment: Likewise, better to link to https://docs.python.org/3/library/stdtypes.html#bytes.isspace? | ||
| not a prefix; rather, all combinations of its values are stripped:: | ||
|  | ||
| >>> b' spacious '.lstrip() | ||
| b'spacious ' | ||
|  | @@ -3287,8 +3289,8 @@ produce new objects. | |
| Split the binary sequence into subsequences of the same type, using *sep* | ||
| as the delimiter string. If *maxsplit* is given, at most *maxsplit* splits | ||
| are done, the *rightmost* ones. If *sep* is not specified or ``None``, | ||
| any subsequence consisting solely of ASCII whitespace is a separator. | ||
| Except for splitting from the right, :meth:`rsplit` behaves like | ||
| any subsequence consisting solely of :term:`ASCII whitespace <whitespace>` | ||
| is a separator. Except for splitting from the right, :meth:`rsplit` behaves like | ||
| :meth:`split` which is described in detail below. | ||
|  | ||
|  | ||
|  | @@ -3299,8 +3301,8 @@ produce new objects. | |
| *chars* argument is a binary sequence specifying the set of byte values to | ||
| be removed - the name refers to the fact this method is usually used with | ||
| ASCII characters. If omitted or ``None``, the *chars* argument defaults to | ||
| removing ASCII whitespace. The *chars* argument is not a suffix; rather, | ||
| all combinations of its values are stripped:: | ||
| removing :term:`ASCII whitespace <whitespace>`. The *chars* argument is not | ||
| a suffix; rather, all combinations of its values are stripped:: | ||
|  | ||
| >>> b' spacious '.rstrip() | ||
| b' spacious' | ||
|  | @@ -3352,7 +3354,8 @@ produce new objects. | |
| [b'1', b'2', b'3<4'] | ||
|  | ||
| If *sep* is not specified or is ``None``, a different splitting algorithm | ||
| is applied: runs of consecutive ASCII whitespace are regarded as a single | ||
| is applied: runs of consecutive :term:`ASCII whitespace <whitespace>` are | ||
| regarded as a single | ||
| separator, and the result will contain no empty strings at the start or | ||
| end if the sequence has leading or trailing whitespace. Consequently, | ||
| splitting an empty sequence or a sequence consisting solely of ASCII | ||
|  | @@ -3376,8 +3379,8 @@ produce new objects. | |
| removed. The *chars* argument is a binary sequence specifying the set of | ||
| byte values to be removed - the name refers to the fact this method is | ||
| usually used with ASCII characters. If omitted or ``None``, the *chars* | ||
| argument defaults to removing ASCII whitespace. The *chars* argument is | ||
| not a prefix or suffix; rather, all combinations of its values are | ||
| argument defaults to removing :term:`ASCII whitespace <whitespace>`. The *chars* | ||
| argument is not a prefix or suffix; rather, all combinations of its values are | ||
| stripped:: | ||
|  | ||
| >>> b' spacious '.strip() | ||
|  | @@ -3519,10 +3522,8 @@ place, and instead produce new objects. | |
| .. method:: bytes.isspace() | ||
| bytearray.isspace() | ||
|  | ||
| Return ``True`` if all bytes in the sequence are ASCII whitespace and the | ||
| sequence is not empty, ``False`` otherwise. ASCII whitespace characters are | ||
| those byte values in the sequence ``b' \t\n\r\x0b\f'`` (space, tab, newline, | ||
| carriage return, vertical tab, form feed). | ||
| Return ``True`` if all bytes in the sequence are :term:`ASCII whitespace <whitespace>` | ||
| (StanFromIreland marked this conversation as resolved. Outdated.) | ||
| and the sequence is not empty, ``False`` otherwise. | ||
|  | ||
|  | ||
| .. method:: bytes.istitle() | ||
|  | ||
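For readers of this diff, the behaviors the touched paragraphs describe can be checked with a short sketch. The variable names and sample values below are illustrative and not part of the PR; only the `bytes.fromhex` input mirrors the doctest in the diff.

```python
# chars is a *set* of characters to strip, not a prefix or suffix.
url = "www.example.com"
left = url.lstrip("w.")       # strips any leading 'w' or '.'
both = url.strip("cmowz.")    # strips from both ends

# With chars omitted, the bytes methods strip ASCII whitespace.
raw = b"\t spacious \r\n"
raw_stripped = raw.strip()

# fromhex() ignores ASCII whitespace between the hex digit pairs.
decoded = bytes.fromhex("2Ef0 F1f2  ")

# split() with no sep treats runs of ASCII whitespace as one separator
# and produces no empty items at the start or end.
fields = b"  1 2\t\t3\n".split()

print(left, both, raw_stripped, decoded, fields)
```
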
If we decide to keep this glossary entry (see other comments), it should mention Unicode first, and reduce the table to an in-line description (see the entry for bytes.isspace()) to take up less space.
Oh, I suggested the table. I didn't realize there was precedent for the inline format.
I find the table significantly easier to read, though.
The glossary page is very long; we should avoid making it longer. Perhaps split up the characters though, e.g. "\t (horizontal tab), ...".
Why is it bad for the glossary to be long? I don't think people read it in order, they just click on terms elsewhere and get redirected. I would think that users prefer more information on individual terms rather than the overall glossary page being short.
It's not bad for it to be long, only for it to be longer than it needs to be. A full table here isn't needed to describe six characters, and as mentioned it takes the focus away from Unicode whitespace, which is the default set of whitespace operated on unless you are using bytes/buffer functions or re.ASCII. The more common thing (Unicode) should be the focus, and we should avoid giving readers the expectation that whitespace is limited to the ASCII set.
Hmm, fair point. Maybe there's a better way to emphasize Unicode here? I'm really not a fan of the inline version based on bytes.isspace.
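To make the Unicode-versus-ASCII distinction in this thread concrete, here is a minimal sketch. The sample characters (NO-BREAK SPACE and EM SPACE) and the variable names are my own choices for illustration, not taken from the PR.

```python
import re

nbsp = "\u00a0"  # NO-BREAK SPACE: Unicode whitespace, not ASCII whitespace

# str methods use the Unicode definition of whitespace.
str_is_space = nbsp.isspace()

# bytes methods only recognize the six ASCII whitespace bytes.
bytes_is_space = nbsp.encode("utf-8").isspace()
six_ascii = all(bytes([b]).isspace() for b in b" \t\n\r\x0b\f")

# strip() follows the same split: str removes Unicode whitespace,
# bytes leaves the encoded non-ASCII whitespace untouched.
text = "\u2003spacious\u00a0"
text_stripped = text.strip()
data_stripped = text.encode("utf-8").strip()

# In the re module, \s matches Unicode whitespace for str patterns
# unless re.ASCII is given.
unicode_matches = re.findall(r"\s", "a\u00a0b")
ascii_matches = re.findall(r"\s", "a\u00a0b", flags=re.ASCII)

print(str_is_space, bytes_is_space, six_ascii)
print(repr(text_stripped), data_stripped)
print(unicode_matches, ascii_matches)
```

A reader who only sees the six-character ASCII list could reasonably expect `"\u00a0".isspace()` to be false, which is the expectation the comment above wants the glossary entry to avoid.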