llvm · nunoplopes · Dec 11, 2024 · Dec 9, 2024 · Dec 9, 2024 · Dec 9, 2024
diff --git a/llvm/docs/Reference.rst b/llvm/docs/Reference.rst
@@ -52,6 +52,7 @@ LLVM and API reference documentation.
    TestingGuide
    TransformMetadata
    TypeMetadata
+   UndefinedBehavior
    XRay
    XRayExample
    XRayFDRFormat
@@ -120,6 +121,9 @@ LLVM IR
   Defines the LLVM intermediate representation and the assembly form of the
   different nodes.
 
+:doc:`Undefined Behavior (UB) <UndefinedBehavior>`
+  A guide on what UB/undef/poison are and when to use each one.
+
 :doc:`InAlloca`
   Description of the ``inalloca`` argument attribute.
 

diff --git a/llvm/docs/UndefinedBehavior.rst b/llvm/docs/UndefinedBehavior.rst
@@ -0,0 +1,368 @@
+======================================
+LLVM IR Undefined Behavior (UB) Manual
+======================================
+
+.. contents::
+   :local:
+   :depth: 2
+
+Abstract
+========
+This document describes the undefined behavior (UB) in LLVM's IR, including
+undef and poison values, as well as the ``freeze`` instruction.
+We also provide guidelines on when to use each form of UB.
+
+
+Introduction
+============
+Undefined behavior is used to specify the behavior of corner cases for which we
+don't wish to specify the concrete results.
+For example, we could specify the result of division by zero as zero, but
+since we are not really interested in the result, we say it is UB.
+
+There are two forms of UB in LLVM: immediate UB and deferred UB (undef and
+poison values).
+The lattice of values in LLVM is:
+immediate UB > poison > undef > freeze > concrete value.
+
+
+Immediate UB
+============
+Immediate UB is the most severe form of UB. It should be avoided whenever
+possible.
+Immediate UB should be used only for operations that trap in most CPUs supported
+by LLVM.
+Examples include division by zero, dereferencing a null pointer, etc.
+
+The reason that immediate UB should be avoided is that it makes optimizations
+such as hoisting a lot harder.
+Consider the following example:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      br i1 %c, label %then, label %else
+
+    then:
+      %div = udiv i32 3, %v
+      br label %ret
+
+    else:
+      br label %ret
+
+    ret:
+      %r = phi i32 [ %div, %then ], [ 0, %else ]
+      ret i32 %r
+    }
+
+We might be tempted to simplify this function by removing the branching and
+executing the division speculatively because ``%c`` is true most of times.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @f(i1 %c, i32 %v) {
+      %div = udiv i32 3, %v
+      %r = select i1 %c, i32 %div, i32 0
+      ret i32 %r
+    }
+
+However, this transformation is not correct! Since division triggers UB
+when the divisor is zero, we can only execute speculatively if we are sure we
+don't hit that condition.
+For the function above, when called like ``f(false, 0)``, before the optimization
+it would return 0, and after the optimization it now triggers UB.
+
+This example highlights why we minimize the cases that trigger immediate UB
+as much as possible.
+As a rule of thumb, use immediate UB only for the cases that trap the CPU for
+most of the supported architectures.
+
+
+Deferred UB
+===========
+Deferred UB is a lighter form of UB. It enables instructions to be executed
+speculatively while marking some corner cases having erroneous values.
+Deferred UB should be used for cases where the semantics offered by common
+CPUs differs,but the CPU does not trap.
+
+As an example, consider the shift instructions. The x86 and ARM architectures
+offer different semantics when the shift amount is equal to or greater than
+the bitwidth.
+We could solve this tension in one of two ways: 1) pick one of the x86/ARM
+semantics for LLVM, which would make the code emitted for the other architecture
+slower; 2) define that case as yielding ``poison``.
+LLVM chose the latter option. For frontends for languages like C or C++
+(e.g., clang), they can map shifts in the source program directly to a shift in
+LLVM IR, since the semantics of C and C++ define such shifts as UB.
+For languages that offer strong semantics, they must use the value of the shift
+conditionally, e.g.:
+
+.. code-block:: llvm
+
+    define i32 @x86_shift(i32 %a, i32 %b) {
+      %mask = and i32 %b, 31
+      %shift = shl i32 %a, %mask
+      ret i32 %shift
+    }
+
+
+There are two deferred UB values in LLVM: ``undef`` and ``poison``, which we
+describe next.
+
+
+Undef Values
+------------
+.. warning::
+   Undef values are deprecated and should be used only when strictly necessary.
+   No new uses should be added unless justified.
+
+An undef value represents any value of a given type. Moreover, each use of
+an instruction that depends on undef can observe a different value.
+For example:
+
+.. code-block:: llvm
+
+    define i32 @fn() {
+      %add = add i32 undef, 0
+      %ret = add i32 %add, %add
+      ret i32 %ret
+    }
+
+Unsurprisingly, the first addition yields ``undef``.
+However, the result of the second addition is more subtle. We might be tempted
+to think that it yields an even number. But it might not be!
+Since each (transitive) use of ``undef`` can observe a different value,
+the second addition is equivalent to ``add i32 undef, undef``, which is
+equivalent to ``undef``.
+Hence, the function above is equivalent to:
+
+.. code-block:: llvm
+
+    define i32 @fn() {
+      ret i32 undef
+    }
+
+Each call to this function may observe a different value, namely any 32-bit
+number (even and odd).
+
+Because each use of undef can observe a different value, some optimizations
+are wrong if we are not sure a value is not undef.
+Consider a function that multiplies a number by 2:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %v) {
+      %mul2 = mul i32 %v, 2
+      ret i32 %mul2
+    }
+
+This function is guaranteed to return an even number, even if ``%v`` is
+undef.
+However, as we've seen above, the following function does not:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %v) {
+      %mul2 = add i32 %v, %v
+      ret i32 %mul2
+    }
+
+This optimization is wrong just because undef values exist, even if they are
+not used in this part of the program as LLVM has no way to tell if ``%v`` is
+undef or not.
+
+.. note::
+   Uses of undef values should be restricted to representing loads of
+   uninitialized memory. This is the only part of the IR semantics that cannot
+   be replaced with alternatives yet (work in ongoing).
+
+Looking at the value lattice, ``undef`` values can only be replaced with either
+a ``freeze`` instruction or a concrete value.
+A consequence is that giving undef as an operand to an instruction that triggers
+UB for some values of that operand makes the program UB. For example,
+``udiv %x, undef`` is UB since we replace undef with 0 (``udiv %x, 0``),
+becoming obvious that it is UB.
+
+
+Poison Values
+-------------
+Poison values are a stronger from of deferred UB than undef. They still
+allow instructions to be executed speculatively, but they taint the whole
+expression DAG (with some exceptions), akin to floating point NaN values.
+
+Example:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %a, i32 %b, i32 %c) {
+      %add = add nsw i32 %a, %b
+      %ret = add nsw i32 %add, %c
+      ret i32 %ret
+    }
+
+The ``nsw`` attribute in the additions indicates that the operation yields
+poison if there is a signed overflow.
+If the first addition overflows, ``%add`` is poison and thus ``%ret`` is also
+poison since it taints the whole expression DAG.
+
+Poison values can be replaced with any value of type (undef, concrete values,
+or a ``freeze`` instruction).
+
+
+The Freeze Instruction
+======================
+Both undef and poison values sometimes propagate too much down an expression
+DAG. Undef values because each transitive use can observe a different value,
+and poison values because they make the whole DAG poison.
+There are some cases where it is important to stop such propagation.
+This is where the ``freeze`` instruction comes in.
+
+Take the following example function:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %n, i1 %c) {
+    entry:
+      br label %loop
+
+   loop:
+      %i = phi i32 [ 0, %entry ], [ %i2, %loop.end ]
+      %cond = icmp ule i32 %i, %n
+      br i1 %cond, label %loop.cont, label %exit
+
+   loop.cont:
+      br i1 %c, label %then, label %else
+
+    then:
+      ...
+      br label %loop.end
+
+    else:
+      ...
+      br label %loop.end
+
+    loop.end:
+      %i2 = add i32 %i, 1
+      br label %loop
+
+    exit:
+      ...
+    }
+
+Imagine we want to perform loop unswitching on the loop above since the branch
+condition inside the loop is loop invariant.
+We would obtain the following IR:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %n, i1 %c) {
+    entry:
+      br i1 %c, label %then, label %else
+
+   then:
+      %i = phi i32 [ 0, %entry ], [ %i2, %then.cont ]
+      %cond = icmp ule i32 %i, %n
+      br i1 %cond, label %then.cont, label %exit
+
+   then.cont:
+      ...
+      %i2 = add i32 %i, 1
+      br label %then
+
+   else:
+      %i3 = phi i32 [ 0, %entry ], [ %i4, %else.cont ]
+      %cond = icmp ule i32 %i3, %n
+      br i1 %cond, label %else.cont, label %exit
+
+   else.cont:
+      ...
+      %i4 = add i32 %i3, 1
+      br label %else
+
+    exit:
+      ...
+    }
+
+There is a subtle catch: when the function is called with ``%n`` being zero,
+the original function did not branch on ``%c``, while the optimized one does.
+Branching on a deferred UB value is immediate UB, hence the transformation is
+wrong in general because ``%c`` may be undef or poison.
+
+Cases like this need a way to tame deferred UB values. This is exactly what the
+``freeze`` instruction is for!
+When given a concrete value as argument, ``freeze`` is a no-op, returning the
+argument as-is. When given an undef or poison value, ``freeze`` returns a
+non-deterministic value of the type.
+This is not the same as undef: the value returned by ``freeze`` is the same
+for all users.
+
+Branching on a value returned by ``freeze`` is always safe since it either
+evaluates to true or false consistently.
+We can make the loop unswitching optimization above correct as follows:
+
+.. code-block:: llvm
+
+    define i32 @fn(i32 %n, i1 %c) {
+    entry:
+      %c2 = freeze i1 %c
+      br i1 %c2, label %then, label %else
+
+
+Writing Tests
+=============
+
+Avoiding UB
+-----------
+When writing tests, it is important to ensure that they don't trigger UB
+unnecessarily. Some automated test reduces sometimes use undef or poison
+values as dummy values, but this is considered a bad practice if this leads
+to triggering UB.
+
+For example, imagine that we want to write a test and we don't care about the
+particular divisor value because our optimization kicks in regardless:
+
+.. code-block:: llvm
+
+    define i32 @fn(i8 %a) {
+      %div = udiv i8 %a, poison
+      ...
+   }
+
+The issue with this test is that it triggers immediate UB. This prevents
+verification tools like Alive from validating the correctness of the
+optimization. Hence, it is considered a bad practice to have tests with
+unnecessary immediate UB (unless that is exactly what the test is for).
+The test above should use a dummy function argument instead of using poison:
+
+.. code-block:: llvm
+
+    define i32 @fn(i8 %a, i8 %dummy) {
+      %div = udiv i8 %a, %dummy
+      ...
+   }
+
+Common sources of immediate UB in tests include branching on undef/poison
+conditions and dereferencing undef/poison/null pointers.
+
+.. note::
+   If you need a placeholder value to pass as an argument to an instruction
+   that may trigger UB, add a new argument to the function rather than using
+   undef or poison.
+
+
+Reducing bitwidth
+-----------------
+To speed up automated verification of tests (e.g., using Alive), it is
+recommended that tests use low bitwidth formats and small vector sizes.
+For example, if we write a test to check that a multiplication by two is
+replaced by a shift left, we can do so using 8-bit integers instead of the
+usual 32-bit integers:
+
+.. code-block:: llvm
+
+    define i8 @fn(i8 %val) {
+      ; CHECK: %mul2 = shl %val, 1
+      %mul2 = mul i8 %val, 2
+      ret i8 %mul2
+   }