@@ -15,17 +15,20 @@ We also provide guidelines on when to use each form of UB.
1515
1616Introduction
1717============
18- Undefined behavior is used to specify the behavior of corner cases for which we
19- don't wish to specify the concrete results. UB is also used to provide
18+ Undefined behavior (UB) is used to specify the behavior of corner cases for
19+ which we don't wish to specify the concrete results. UB is also used to provide
2020additional constraints to the optimizers (e.g., assumptions that the frontend
2121guarantees through the language type system or the runtime).
2222For example, we could specify the result of division by zero as zero, but
2323since we are not really interested in the result, we say it is UB.
2424
25- There exist two forms of undefined behaviour in LLVM: immediate UB and deferred UB.
26- The latter comes in two flavours: undef and poison values.
25+ There exist two forms of undefined behavior in LLVM: immediate UB and deferred
26+ UB. The latter comes in two flavors: undef and poison values.
27+ There is also a ``freeze `` instruction to tame the propagation of deferred UB.
2728The lattice of values in LLVM is:
28- immediate UB > poison > undef > freeze > concrete value.
29+ immediate UB > poison > undef > freeze(poison) > concrete value.
30+
31+ We explain each of the concepts in detail below.
2932
3033
3134Immediate UB
@@ -81,6 +84,31 @@ As a rule of thumb, use immediate UB only for the cases that trap the CPU for
8184most of the supported architectures.
8285
8386
87+ Time Travel
88+ -----------
89+ Immediate UB in LLVM IR allows the so-called time travelling. What this means
90+ is that if a program triggers UB, then we are not required to preserve any of
91+ its observable behavior, including I/O.
92+ For example, the following function triggers UB after calling ``printf ``:
93+
94+ .. code-block :: llvm
95+
96+ define void @fn() {
97+ call void @printf(...) willreturn
98+ unreachable
99+ }
100+
101+ Since we know that ``printf `` will always return, and because LLVM's UB can
102+ time-travel, it is legal to remove the call to ``printf `` altogether and
103+ optimize the function to simply:
104+
105+ .. code-block :: llvm
106+
107+ define void @fn() {
108+ unreachable
109+ }
110+
111+
84112 Deferred UB
85113===========
86114Deferred UB is a lighter form of UB. It enables instructions to be executed
@@ -311,8 +339,8 @@ We can make the loop unswitching optimization above correct as follows:
311339 br i1 %c2, label %then, label %else
312340
313341
314- Writing Tests that Avoid UB
315- ===========================
342+ Writing Tests Without Undefined Behavior
343+ ========================================
316344
317345When writing tests, it is important to ensure that they don't trigger UB
318346unnecessarily. Some automated test reduces sometimes use undef or poison
@@ -349,3 +377,17 @@ conditions and dereferencing undef/poison/null pointers.
349377 If you need a placeholder value to pass as an argument to an instruction
350378 that may trigger UB, add a new argument to the function rather than using
351379 undef or poison.
380+
381+
382+ Summary
383+ =======
384+ Undefined behavior (UB) in LLVM IR consists of two well-defined concepts:
385+ immediate and deferred UB (undef and poison values).
386+ Passing deferred UB values to certain operations leads to immediate UB.
387+ This can be avoided in some cases through the use of the ``freeze ``
388+ instruction.
389+
390+ The lattice of values in LLVM is:
391+ immediate UB > poison > undef > freeze(poison) > concrete value.
392+ It is only valid to transform values from the left to the right (e.g., a poison
393+ value can be replaced with a concrete value, but not the other way around).
0 commit comments