[HLSL][Doc] Document multi-argument resolution (llvm#104474)

llvm-beanz · web-flow · commit 02654f737063 · 2024-08-30T16:18:46.000-05:00
This updates the expected differences document to capture the difference in multi-argument overload resolution between Clang and DXC. Fixes llvm#99530
diff --git a/clang/docs/HLSL/ExpectedDifferences.rst b/clang/docs/HLSL/ExpectedDifferences.rst
@@ -54,6 +54,19 @@ HLSL 202x based on proposal
 and
 `0008 <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.
 
+The largest difference between Clang and DXC's overload resolution is the
+algorithm used for identifying best-match overloads. There are more details
+about the algorithmic differences in the :ref:`multi_argument_overloads` section
+below. Three high-level differences should be highlighted:
+
+* **There should be no cases** where DXC and Clang both successfully
+  resolve an overload where the resolved overload is different between the two.
+* There are cases where Clang will successfully resolve an overload that DXC
+  wouldn't because we've trimmed the overload set in Clang to remove ambiguity.
+* There are cases where DXC will successfully resolve an overload that Clang
+  will not for two reasons: (1) DXC only generates partial overload sets for
+  builtin functions and (2) DXC resolves cases that probably should be ambiguous.
+
 Clang's implementation extends standard overload resolution rules to HLSL
 library functionality. This causes subtle changes in overload resolution
 behavior between Clang and DXC. Some examples include:
@@ -71,18 +84,23 @@ behavior between Clang and DXC. Some examples include:
     uint U;
     int I;
     float X, Y, Z;
-    double3 A, B;
+    double3 R, G;
   }
 
-  void twoParams(int, int);
-  void twoParams(float, float);
+  void takesSingleDouble(double);
+  void takesSingleDouble(vector<double, 1>);
+
+  void scalarOrVector(double);
+  void scalarOrVector(vector<double, 2>);
 
   export void call() {
-    halfOrInt16(U); // DXC: Fails with call ambiguous between int16_t and uint16_t overloads
-                    // Clang: Resolves to halfOrInt16(uint16_t).
-    halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).
     half H;
+    halfOrInt16(I); // All: Resolves to halfOrInt16(int16_t).
+
   #ifndef IGNORE_ERRORS
+    halfOrInt16(U); // All: Fails with call ambiguous between int16_t and uint16_t
+                    // overloads
+
     // asfloat16 is a builtin with overloads for half, int16_t, and uint16_t.
     H = asfloat16(I); // DXC: Fails to resolve overload for int.
                       // Clang: Resolves to asfloat16(int16_t).
@@ -94,21 +112,28 @@ behavior between Clang and DXC. Some examples include:
 
     takesDoubles(X, Y, Z); // Works on all compilers
   #ifndef IGNORE_ERRORS
-    fma(X, Y, Z); // DXC: Fails to resolve no known conversion from float to double.
+    fma(X, Y, Z); // DXC: Fails to resolve no known conversion from float to
+                  //   double.
                   // Clang: Resolves to fma(double,double,double).
-  #endif
 
-    double D = dot(A, B); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
+    double D = dot(R, G); // DXC: Resolves to dot(double3, double3), fails DXIL Validation.
                           // FXC: Expands to compute double dot product with fmul/fadd
-                          // Clang: Resolves to dot(float3, float3), emits conversion warnings.
+                          // Clang: Fails to resolve as ambiguous against
+                          //   dot(half, half) or dot(float, float)
+  #endif
 
   #ifndef IGNORE_ERRORS
     tan(B); // DXC: resolves to tan(float).
             // Clang: Fails to resolve, ambiguous between integer types.
 
-    twoParams(I, X); // DXC: resolves twoParams(int, int).
-                     // Clang: Fails to resolve ambiguous conversions.
   #endif
+
+    double D;
+    takesSingleDouble(D); // All: Fails to resolve ambiguous conversions.
+    takesSingleDouble(R); // All: Fails to resolve ambiguous conversions.
+
+    scalarOrVector(D); // All: Resolves to scalarOrVector(double).
+    scalarOrVector(R); // All: Fails to resolve ambiguous conversions.
   }
 
 .. note::
@@ -119,3 +144,75 @@ behavior between Clang and DXC. Some examples include:
   diagnostic notifying the user of the conversion rather than silently altering
   precision relative to the other overloads (as FXC does) or generating code
   that will fail validation (as DXC does).
+
+.. _multi_argument_overloads:
+
+Multi-Argument Overloads
+------------------------
+
+In addition to the differences in single-element conversions, Clang and DXC
+differ dramatically in multi-argument overload resolution. C++ multi-argument
+overload resolution behavior (or something very similar) is required to
+implement
+`non-member operator overloading <https://github.com/microsoft/hlsl-specs/blob/main/proposals/0008-non-member-operator-overloading.md>`_.
+
+Clang adopts the C++-inspired language from the
+`draft HLSL specification <https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf>`_,
+where an overload ``f1`` is a better candidate than ``f2`` if, for every argument,
+its conversion sequence is not worse than the corresponding sequence for ``f2``
+and, for at least one argument, it is better.
+
+.. code-block:: c++
+
+  cbuffer CB {
+    int I;
+    float X;
+    float4 V;
+  }
+
+  void twoParams(int, int);
+  void twoParams(float, float);
+  void threeParams(float, float, float);
+  void threeParams(float4, float4, float4);
+
+  export void call() {
+    twoParams(I, X); // DXC: resolves twoParams(int, int).
+                     // Clang: Fails to resolve ambiguous conversions.
+
+    threeParams(X, V, V); // DXC: resolves threeParams(float4, float4, float4).
+                          // Clang: Fails to resolve ambiguous conversions.
+  }
+
+In the examples above, calling ``twoParams`` with mixed arguments produces the
+implicit conversion sequences { ExactMatch, FloatingIntegral } and {
+FloatingIntegral, ExactMatch }. Each candidate has an argument whose conversion
+is worse than in the other candidate, so the call is ambiguous.
+
+In the ``threeParams`` example the sequences are { ExactMatch, VectorTruncation,
+VectorTruncation } and { VectorSplat, ExactMatch, ExactMatch }. Again, each
+candidate has at least one argument with a worse conversion than in the other,
+so the call is ambiguous.
+
+.. note::
+
+  DXC's behavior described below is not officially documented; this description
+  is gleaned from observation and a bit of reading the source.
+
+DXC's approach for determining the best overload produces an integer score value
+for each implicit conversion sequence for each argument expression. Scores for
+casts are based on a bitmask construction that is complicated to reverse
+engineer. It seems that:
+
+* Exact match is 0
+* Dimension increase is 1
+* Promotion is 2
+* Integral -> Float conversion is 4
+* Float -> Integral conversion is 8
+* Cast is 16
+
+The masks are OR'd together to produce a score for the cast.
+
+The scores of each conversion sequence are then summed to generate a score for
+the overload candidate. The overload candidate with the lowest score is the best
+candidate. If two or more candidates tie for the lowest score, the call
+is ambiguous.
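
The two ranking strategies described in the new section can be contrasted with a small C++ sketch. This is only an illustration and is not taken from Clang or DXC: the names `Costs`, `bestBySum`, `isBetter`, and `bestByPairwise` are hypothetical, each conversion is reduced to a single integer cost (lower is better), and the costs used are arbitrary rather than DXC's real scores. It also assumes every candidate has the same number of arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <optional>
#include <vector>

// One illustrative cost per argument; lower means a better conversion.
using Costs = std::vector<int>;

// Summed-score ranking: the lowest total wins, a tie for lowest is ambiguous.
std::optional<std::size_t> bestBySum(const std::vector<Costs> &candidates) {
  std::optional<std::size_t> best;
  int bestScore = 0;
  bool tie = false;
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    int score = std::accumulate(candidates[i].begin(), candidates[i].end(), 0);
    if (!best || score < bestScore) {
      best = i;
      bestScore = score;
      tie = false;
    } else if (score == bestScore) {
      tie = true;
    }
  }
  if (tie)
    return std::nullopt;
  return best;
}

// True if `a` is not worse for any argument and better for at least one.
bool isBetter(const Costs &a, const Costs &b) {
  bool strictlyBetter = false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] > b[i])
      return false;
    if (a[i] < b[i])
      strictlyBetter = true;
  }
  return strictlyBetter;
}

// Pairwise ranking: a candidate wins only if it is better than every other one.
std::optional<std::size_t> bestByPairwise(const std::vector<Costs> &candidates) {
  for (std::size_t i = 0; i < candidates.size(); ++i) {
    bool beatsAll = true;
    for (std::size_t j = 0; j < candidates.size(); ++j) {
      if (i != j && !isBetter(candidates[i], candidates[j])) {
        beatsAll = false;
        break;
      }
    }
    if (beatsAll)
      return i;
  }
  return std::nullopt;
}
```

With two candidates whose per-argument costs are `{0, 3}` and `{2, 0}`, the summed ranking picks the second candidate (total 2 versus 3), while the pairwise ranking reports ambiguity because each candidate is worse than the other on one argument. That is the shape of disagreement the section describes for mixed-argument calls.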