diff --git a/spec-0012/index.md b/spec-0012/index.md
new file mode 100644
index 00000000..d81a366c
--- /dev/null
+++ b/spec-0012/index.md
@@ -0,0 +1,201 @@
+---
+title: "SPEC 12 — Formatting mathematical expressions"
+number: 12
+date: 2024-06-06
+author:
+  - "Pamphile Roy "
+  - "Matt Haberland "
+discussion: https://discuss.scientific-python.org/t/spec-12-formatting-mathematical-expressions
+endorsed-by:
+---
+
+## Description
+
+[PEP 8](https://peps.python.org/pep-0008)
+and other established style guides either
+
+- lack comprehensive guidelines about mathematical expressions, or
+- provide simple rules that ignore the relationship between formatting and readability.
+
+In practice, this leads to varying, even conflicting, mathematical expression
+styles across the ecosystem. We seek to standardize the representation of
+mathematical code for the same reason we standardize formatting of other code:
+it brings consistency to the ecosystem and allows collaborators to focus on
+more important aspects of their work.
+
+## Implementation
+
+These rules are intended to respect and
+complement the [PEP 8 standards](https://peps.python.org/pep-0008), such as using
+[implied line continuation](https://peps.python.org/pep-0008/#maximum-line-length)
+and [breaking lines before binary operators](https://peps.python.org/pep-0008/#should-a-line-break-before-or-after-a-binary-operator)[^1].
+
+0. Unless otherwise specified, rely on the implicit order of operations;
+   i.e., do not add extraneous parentheses. For example, prefer `u**v + y**z`
+   over `(u**v) + (y**z)`, and prefer `x + y + z` over `(x + y) + z`. A full
+   list of implicit operator priority levels is given in the
+   [Operator Precedence](https://docs.python.org/3/reference/expressions.html#operator-precedence)
+   section of the Python language reference.
+1. Always write the `**` operator and the unary `+`, `-`, and `~` operators _without_
+   surrounding whitespace. For example, prefer `y = -x**4` over `y = - (x ** 4)`.
+2. Always surround non-PEMDAS[^2] operators with whitespace, and always make the priority of
+   non-PEMDAS operators explicit. For example, prefer `(x == y) or (w == t)` over
+   `x==y or w==t`.[^3]
+3. Always surround AS[^2] operators with whitespace.
+4. Typically, surround MD[^2] operators with whitespace, except in the following situations:
+   - When there are lower-priority operators (namely AS) within the same compound
+     expression[^4]. For example, prefer `z = -x * y**t` over `z = -x*y**t`, but
+     prefer `z = w - x*y**t` over `z = w - x * y**t` due to the presence of the
+     lower-priority subtraction operator.
+   - When the division operation would be written mathematically as a fraction with a
+     horizontal bar. For example, prefer `z = t/v * x/y` over `z = t / v * x / y`
+     if this would be written mathematically as the product of two fractions,
+     e.g., $\frac{t}{v} \cdot \frac{x}{y}$.
+5. Considering the previous rules, only `**`, `*`, `/`, and the unary `+`, `-`, and `~`
+   operators can appear in implicit subexpressions[^4] without spaces. In such subexpressions:
+
+   - Use at most one unary operator, and if used, ensure that it is the leftmost operator.
+   - Use at most one `**` operator, and if used, ensure that it is the rightmost operator.
+   - Use at most one `/` operator, and if used, ensure that it is the rightmost operator except for `**`.
+
+   To achieve these goals, simplification or the addition of parentheses may be required.
+   For example:
+
+   - The expressions `--x` and `-~x` would be implicit subexpressions without spaces
+     containing more than one unary operator. The former can be simplified to `+x` or
+     simply `x`, and the latter requires explicit parentheses, i.e., `-(~x)`.
+   - The expression `x**y**z` would be an implicit subexpression without spaces
+     containing more than one `**` operator. This code would be executed as `x**(y**z)`
+     following the implicit order, but the explicit parentheses should be included for
+     clarity.
+   - In the expression `t**v*x**y + z`, no spaces are used around the multiplication
+     operator due to the presence of the lower-priority addition operator. However,
+     this would lead to `t**v*x**y` being an implicit subexpression without spaces
+     containing more than one `**` operator. This code would be executed as
+     `(t**v)*(x**y) + z`, but the explicit parentheses should be included for clarity.
+   - In the expression `z + x**y/w`, no spaces are used around the division operator
+     due to the presence of the lower-priority addition operator. However, this would
+     lead to `x**y/w` being an implicit subexpression without spaces containing `**`
+     to the left of another operator. This code would be executed as `z + (x**y)/w`,
+     but the explicit parentheses should be included for clarity.
+
+6. Simplify combinations of unary and binary `+` and `-` operators when possible.
+   For example,
+   - prefer `x + y` over `x + +y`,
+   - prefer `x + y` over `x - -y`,
+   - prefer `x - y` over `x - +y`, and
+   - prefer `x - y` over `x + -y`.
+7. If required to satisfy other style requirements, include line breaks before
+   the outermost explicit subexpression possible. For example, if
+   `t + (w + (x + (y + z)))` must be broken, prefer
+   ```python3
+   (t
+    + (w + (x + (y + z))))
+   ```
+   over
+   ```python3
+   (t + (w + (x + (y
+    + z))))
+   ```
+   If there are multiple candidates, include the break at the first opportunity.
+8. If line breaks must occur within a compound subexpression, place the break
+   before the operator with the lowest priority. For example, if
+   `(x + y*z)` must be broken, prefer
+   ```python3
+   (x
+    + y*z)
+   ```
+   over
+   ```python3
+   (x + y
+    * z)
+   ```
+   If there are multiple candidates, include the break at the first opportunity.
+9. Any of the preceding rules may be broken if there is a clear reason to do so.
+   - _Conflict with other style rules_. For example, there is not supposed to be
+     whitespace surrounding the `**` operator, but one can imagine a chain of `**`
+     operations that exhausts the character limit of a line.
+   - _Domain knowledge_. For instance, in the expression
+     `t = (x + y) - z`, it may be important to emphasize that the addition should be
+     performed first for numerical reasons or because `(x + y)` is a conceptually
+     important quantity. In such cases, consider adding a comment, e.g.,
+     ```python3
+     t = (x + y) - z  # perform `x + y` first for precision
+     ```
+     or breaking the expression into separate logical lines, e.g.,
+     ```python3
+     w = x + y
+     t = w - z
+     ```
+
+## Terminology
+
+An "explicit" expression is a code expression enclosed within parentheses or
+otherwise syntactically separated from other expressions (i.e., by code other
+than operators, whitespace, literals, or variables). For example, in the list
+comprehension
+
+```python3
+[j for j in range(1, i + 1)]
+```
+
+the output expression `j` is one explicit expression and the input sequence
+`range(1, i + 1)` is another.
+
+A "subexpression" is a subset of an expression that is either explicit or could
+be made explicit (e.g., with parentheses) without affecting the order of
+operations. In the example above, `j` and `range(1, i + 1)` can also be
+referred to as explicit subexpressions of the whole expression, and `1` and
+`i + 1` are explicit subexpressions of the expression `range(1, i + 1)`. In turn, `i` and
+`1` are "implicit" subexpressions of `i + 1`: they could be written as explicit
+subexpressions `(i)` and `(1)` without affecting the order of operations, but they
+are not explicit as written.
+
+As another example, in `x + y*z`, `y*z` is a subexpression because it could be made
+explicit as in `x + (y*z)` without changing the order of operations. However, `x + y`
+would not be a subexpression because `(x + y)*z` would change the order of operations.
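+
+The grouping that makes `y*z`, but not `x + y`, a subexpression can also be checked
+mechanically. The sketch below is illustrative only and not part of this SPEC: it
+assumes Python 3.9 or later (for `ast.unparse`) and uses the standard-library `ast`
+module to rewrite an expression with every implicit subexpression made explicit. The
+helper name `parenthesize` is hypothetical, only the operators `+`, `-`, `*`, `/`,
+`**`, and `~` are handled, and the output shows grouping only; its spacing does not
+follow this SPEC.
+
+```python3
+import ast
+
+# Operators covered by this sketch; any other arithmetic operator
+# (e.g. `//` or `%`) raises KeyError.
+OPERATORS = {
+    ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/", ast.Pow: "**",
+    ast.UAdd: "+", ast.USub: "-", ast.Invert: "~",
+}
+
+
+def parenthesize(source: str) -> str:
+    """Return `source` with every unary and binary operation wrapped in parentheses."""
+
+    def render(node: ast.AST) -> str:
+        if isinstance(node, ast.BinOp):
+            left, right = render(node.left), render(node.right)
+            return f"({left} {OPERATORS[type(node.op)]} {right})"
+        if isinstance(node, ast.UnaryOp):
+            return f"({OPERATORS[type(node.op)]}{render(node.operand)})"
+        # Other nodes (names, literals, calls, ...) are rendered by ast.unparse as-is.
+        return ast.unparse(node)
+
+    return render(ast.parse(source, mode="eval").body)
+
+
+print(parenthesize("x + y*z"))        # (x + (y * z))
+print(parenthesize("x**y**z"))        # (x ** (y ** z))
+print(parenthesize("-x**4"))          # (-(x ** 4))
+print(parenthesize("t**v*x**y + z"))  # (((t ** v) * (x ** y)) + z)
+```
+
+The output agrees with the rules above: `x**y**z` groups as `x**(y**z)`, `-x**4`
+means `-(x**4)`, and `t**v*x**y + z` is executed as `(t**v)*(x**y) + z`.
+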
+Note that `x + y*z` as a whole may also be referred to as a "subexpression" rather than
+an "expression" even though it is not a proper subset of itself.
+
+A "simple" expression is an expression involving only one operator priority level,
+ignoring the operators within explicit subexpressions.
+A "compound" expression is an expression involving more than one operator
+priority level, again ignoring the contents of explicit subexpressions.
+For example,
+
+- `x + y - z` is a simple expression because `+` and `-` have the
+  same priority level. There are no explicit subexpressions to be ignored.
+- `x * (y + z)` is also a simple expression because there is only one operator
+  between `x` and the explicit subexpression `(y + z)`. We ignore the contents (and
+  especially the operator) within the explicit subexpression; conceptually, it may
+  be regarded as `(...)`.
+- `x*y + z` is a compound expression; there are two operators of different priority
+  levels and no explicit subexpressions that can be ignored.
+
+[^1]:
+    Although the examples do not show the use of hanging indents, any of the indentation styles
+    allowed by [PEP 8 Indentation](https://peps.python.org/pep-0008/#indentation) are permitted
+    by this SPEC.
+
+[^2]:
+    The acronym PEMDAS commonly stands for "parentheses", "exponentiation", "multiplication",
+    "division", "addition", and "subtraction". Herein, we consider the corresponding operators
+    to be "PEMDAS operators", and we also include the unary `+`, `-`, and `~` in
+    this category for convenience. The order of operations of PEMDAS operators is typically
+    taught in primary school and reinforced throughout a programmer's training and
+    experience, so we assume that most programmers are comfortable relying on the
+    implicit order of operations in expressions involving a few PEMDAS operations. The implicit
+    order of operations becomes less obvious as the number of distinct operator priority
+    levels increases and when multiple non-PEMDAS operators are involved. Portions of this
+    acronym, namely MD and AS, are used to refer to the corresponding operators.
+
+[^3]:
+    There is a case for simply eliminating spaces to reinforce the implicit order
+    of operations, as in `x==y or w==t`. However, such a rule would require users
+    to remember the full order-of-operations hierarchy and apply it without mistakes.
+    Use of explicit parentheses with non-PEMDAS operators leads to simpler rules,
+    is more explicit, and is not uncommon in existing code.
+
+[^4]:
+    For definitions of "explicit"/"implicit" and "simple"/"compound"
+    "expressions"/"subexpressions", see the Terminology section.