# What is Operator Precedence? In Python you can write expressions, sometimes in various ways. Consider: ```python a*x**2 + b*x + c ``` This is a concise way of writing: ```python (a*(x**2)) + (b*x) + c ``` In the more concise expression, parentheses do not appear. It is understood that when an expression with exponentiation (`**`) occurs combined with an expression with multiplication (`*`), do the exponentiation first and then combine with the multiplication. Similarly, in the absence of parentheses dictating otherwise, perform multiplication (`*`) before addition (`+`). Decompilation outputs expressions with parenthesis omitted when they are not necessary. The process by which the semantic actions, or tree phase of the decompiler figures this out is by using a precedence table which specifies which operator to perform when it is appears next to another operator. Again, operator precedence is used to decide whether or not to add parenthesis around an _expression_. Note that an operator like multiplication is involved here and in the context of another expression. In the expression: ``` sin(2 * theta) + pi ``` the multiplication appears in the expression `2 * theta`, but that expression doesn't butt up with another expression. Instead the expression is used as the argument in a function call. The call is used as the first operand of an addition operation, but the important thing is that two expressions with operators don't appear next to each other as was the case in `a*x**2 + b*x + c`. # How does operator precedence work? To recap, in `a*x**2` the expression with a multiplication operator appears next to an expression with an exponentiation operator. How can we specify whether this means `(a*x)**2` or `a*(x**2)` to someone who does not know this? In Python's the reference manual operators are separated into a hierarchy. [Here](https://docs.python.org/3/reference/expressions.html) is an excerpt of a much larger table: | operator | Notes | | ------------------------- | -------------- | | .. . | ... | | `**` | Exponentiation | | `+x`, `-x`, `~x` | Positive, negative, bitwise NOT | | `*`, `@`, `/`, `//`, `%` | Multiplication, matrix multiplication, division, floor division, remainder | | ... | ... | In this table, we see that `**` appears in a table entry line before `*`. And that means to put the expression with `**` in a group before grouping an expression with `*` when the two expressions are adjacent. Using the table above, grouping behavior would be identical if we had written `a@x**2`, `a/x**2`, etc. The way this is specified in the decompiler though is to give for each operator a numeric value. Here is the corresponding code in `consts.py` that captures the above: ``` PRECEDENCE = { # ... "BINARY_POWER": 4, # Exponentiation: ** "unary_op": 6, # Positive, negative, bitwise NOT: +x, -x, ~x "BINARY_MULTIPLY": 8, # * "BINARY_MATRIX_MULTIPLY": 8, # @ "BINARY_DIVIDE": 8, # / "BINARY_FLOOR_DIVIDE": 8, # // "BINARY_MODULO": 8, # Remainder, % # ... } ``` In the actual dictionary in the Python code, we list things from high number (8) to low (4) rather than as shown above. I ordered things this way just to make the correspondences with the Python reference manual more clear. The thing to note is that lower numbers, like 4 group first, and higher numbers like 8, group last. # Examples Above, we described how we see things from the human side, and how to specify precedence in a way the decompiler can act on. But from the standpoint of the decompiler, how does this all work? We will describe this in terms of some examples before summarizing with the general process. Using the example `a*x**2`, The decompiler in the semantic or tree phase sees: ``` 1 expr 2 bin_op (BINARY_MULTIPLY, precedence 8) 3 0. expr 4 L. 1 0 LOAD_NAME a 5 1. expr 6 bin_op (BINARY_POWER, precedence 4) 7 0. expr 8 2 LOAD_NAME x 9 1. expr 10 constant 11 4 LOAD_CONST 2 12 2. binary_operator 13 6 BINARY_POWER 14 2. binary_operator 15 8 BINARY_MULTIPLY ... ``` As we traverse from the top-level tree for `expr` at line 1 down, four more `expr`s are encountered below it. The `expr` at line 3 we don't have to worry about since that covers `LOAD_NAME` at line 4 and `LOAD_NAME` is not an operator. However, the tree node for the `expr` at line 5 we do have to take into account operator precedence. At the `expr` tree node at line 5, the precedence value set from line 2 in `BINARY_MULTIPLY` is passed down. The `expr` first child in line 6 is `bin_op`, an operator which has a precedence of 4. Since parent precedence value 8 is greater than child precedence value 4, the expression with the parent should group first. When the parent groups first, we don't have to add parenthesis. The other `expr`s under the second `bin_op` we don't have to worry about either since, again, those involve `LOAD_NAME` or `LOAD_CONST` which are not operators. Now let's compare that with `(a*x)**2`: The decompiler in the semantic or tree phase sees: ``` 1 expr 2 bin_op (BINARY_POWER, precedence 4) 3 0. expr 4 bin_op (BINARY_MULTIPLY, precedence 8) 5 0. expr 6 L. 1 0 LOAD_NAME a 7 1. expr 8 2 LOAD_NAME x 9 2. binary_operator 10 4 BINARY_MULTIPLY 11 1. expr 12 constant 13 6 LOAD_CONST 2 14 2. binary_operator 15 8 BINARY_POWER ``` Again, as we descend the top-level tree for `expr` at line 1, our more `expr`s are seen: the first `expr` at line 3 involving another `bin_op` at line 4. Passed down to the `expr` at line 3 is the precedence `BINARY_POWER` with value 4. But in contrast to its `bin_op` parent at line 2, the `bin_op` child at line 4 has a precedence value 8. Since the parent value 4 is less than the child value 8, the child expression needs to group first and that is done by adding a parentheses in the handling of the `expr` at line 3. In general, in the function that handles expressions, `n_expr`, a precedence value is passed to it via `self.prec`, That is compared to the expression's single child, if that happens to be an operator with a precedence value. If the precedence value of the child is less than the precedence value of the parent, then a set parenthesis needs to be added. # %p and %P specifiers There are situations where operator composition appears in the tree in a more implicit way. This occurs when the template engine expands strings that have operators mentioned in a template string. Going through an example will make this more clear. Consider this Python code: ```python x = "system" if not "site-packages" in file else "local" ``` We have a grammar rule `if_exp_not` that corresponds roughly to `IfExp(test=[UnaryOp(...)])` in a Python AST. The output template an `if_exp_not` node of a parse tree is: ``` "if_exp_not": ( "%p if not %p else %p", ... (0, "expr", PRECEDENCE["unary_not"]), ...) ``` The way to read this is that if the nonterminal type of the parse tree is a `if_exp_not` node, then the first child of the parse tree is an expression (or "expr") that will get expanded in between the words "not" and "else". The thing to note though is that in contrast to the Python AST for this, where there is an explicit UnaryOp, the UnaryOp operator has been folded into the nonterminal `if_exp_not`. But we still need a way to know whether we need parenthesis around the expression. In the example given above, `... if not "site-packages" in file else ...`, we don't need additional parenthesis. However if the expression is `a or b`, then, yes, do need parenthesis around this expression. Otherwise Python will interpret `... if not a or b else ...` as `... if (not a) or b else ... ` rather than `... if not (a or b) else ...`. The way indicate the correct meaning we intend is to set the precedence of the parent precedence for that particular "expr" to be the precedence of "unary_not": logically that is the surrounding context that this expression is getting used. Other "expr"s that might appear as placeholders in other parts of the template can have different precedences or no precedence. In particular, there probably will never be a parenthesis required by the expression before "if not" or the expression after "else". So that precedence can be set to a high value. If we change first and last "%p" to "%c", the precedence value is whatever was passed down to us from above. Rather than rely on that we set that to be something equal to what matches the precedence of an `if ... else` expression. %P description to be continued...