The documentation correctly says that AreaUnderCurve calculates the AUC metric for the binary case: "It is expected that ŷ be a vector of distributions over the binary set of unique elements of y."
However, the function still runs with more than two categorical levels. Looking at the code, the positive class is always assumed to be the last level: positive_class = CategoricalArrays.levels(first(ŷ)) |> last. So, when passing UnivariateFinite{Multiclass{N}} predictions with N > 2, the value returned is the AUC of the last level versus all other levels.
This is not a bug, as the code does what the documentation says it will, but I'm wondering whether the way it silently returns a misleading result in the multiclass case can be improved (maybe with a warning or an assertion error?).
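To make the suggestion concrete, here is a rough sketch of such a guard in plain Julia. It is only an illustration: check_binary and its strict keyword are hypothetical names, not part of StatisticalMeasures, and class_levels stands in for whatever CategoricalArrays.levels(first(ŷ)) returns.

```julia
# Hypothetical helper, NOT part of StatisticalMeasures.
# `class_levels` is assumed to be the ordered levels of the predicted
# distributions, e.g. the result of CategoricalArrays.levels(first(ŷ)).
function check_binary(class_levels; strict::Bool=false)
    n = length(class_levels)
    n == 2 && return true
    msg = "AUC is defined for binary targets, but $n levels were found; " *
          "the result would be the AUC of $(repr(last(class_levels))) " *
          "versus all other levels."
    if strict
        # Assertion-style behaviour: refuse to compute anything.
        throw(ArgumentError(msg))
    else
        # Warning-style behaviour: keep going, but tell the user.
        @warn msg
    end
    return false
end
```

With strict=false the current return value would be preserved and only a warning added; strict=true corresponds to the assertion-error option.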
Ideally, StatisticalMeasures could implement the macro AUC for that case, which is the average of the individual per-class AUC scores.
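For reference, a minimal sketch of what I mean by macro AUC, written in plain Julia and independent of the StatisticalMeasures API. The names binary_auc and macro_auc are hypothetical, a probability matrix stands in for the vector of UnivariateFinite distributions, and each individual score is assumed to be a one-vs-rest AUC.

```julia
# Binary AUC via the Mann-Whitney statistic: the fraction of
# (positive, negative) pairs in which the positive observation gets the
# higher score, with ties counted as 1/2. O(n^2), for illustration only.
function binary_auc(scores::AbstractVector{<:Real}, is_positive::AbstractVector{Bool})
    pos = scores[is_positive]
    neg = scores[.!is_positive]
    if isempty(pos) || isempty(neg)
        throw(ArgumentError("need at least one positive and one negative observation"))
    end
    total = 0.0
    for p in pos, n in neg
        total += p > n ? 1.0 : (p == n ? 0.5 : 0.0)
    end
    return total / (length(pos) * length(neg))
end

# Macro AUC: unweighted mean of the one-vs-rest AUCs.
# Assumption: `probs` is an n_obs x n_classes matrix whose k-th column holds
# the predicted probability of `class_levels[k]`.
function macro_auc(probs::AbstractMatrix{<:Real}, y::AbstractVector, class_levels::AbstractVector)
    if size(probs, 2) != length(class_levels)
        throw(ArgumentError("need one probability column per class level"))
    end
    aucs = [binary_auc(probs[:, k], y .== c) for (k, c) in enumerate(class_levels)]
    return sum(aucs) / length(aucs)
end

# Toy data, made up purely for illustration.
probs = [0.7 0.2 0.1;
         0.1 0.8 0.1;
         0.2 0.2 0.6;
         0.5 0.1 0.4]
y = [:a, :b, :c, :b]
class_levels = [:a, :b, :c]

println(macro_auc(probs, y, class_levels))
```

On this toy data the last-level-vs-rest value that the current behaviour corresponds to is binary_auc(probs[:, 3], y .== :c), which can differ from the macro average because it ignores how well the other classes are separated.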