Calculates the area under the receiver operating characteristic (ROC) curve.
A ROC curve is created by plotting True Positive Rate (TPR) on the y-axis and False Positive Rate (FPR) on the x-axis across all thresholds.
The resulting value ranges from zero to one, with a higher value indicating better model performance.

The ROC AUC (also known as simply AUC) is a concept in machine learning.
For more details, please see [here](https://developers.google.com/machine-learning/glossary#pr-auc-area-under-the-pr-curve), [here](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc#expandable-1) and [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).
{"scores", "Scores given by the prediction model. [`Array(T)`](/sql-reference/data-types/array) of [Integers](../data-types/int-uint.md) or [Floats](../data-types/float.md)."},
{"labels", "Labels of samples, usually 1 for a positive sample and 0 for a negative sample. [Array](/sql-reference/data-types/array) of [Integers](../data-types/int-uint.md) or [Enums](../data-types/enum.md)."},
{"scale", "Decides whether to return the normalized area. If false, returns the area under the TP (true positives) x FP (false positives) curve instead. Default value: true. [Bool](../data-types/boolean.md). Optional."},
{"partial_offsets", R"(
- An array of four non-negative integers for calculating a partial area under the ROC curve (equivalent to a vertical band of the ROC space) instead of the whole AUC. This option is useful for distributed computation of the ROC AUC. The array must contain the following elements [`higher_partitions_tp`, `higher_partitions_fp`, `total_positives`, `total_negatives`]. [Array](/sql-reference/data-types/array) of non-negative [Integers](../data-types/int-uint.md). Optional.
- `higher_partitions_tp`: The number of positive labels in the higher-scored partitions.
- `higher_partitions_fp`: The number of negative labels in the higher-scored partitions.
- `total_positives`: The total number of positive samples in the entire dataset.
- `total_negatives`: The total number of negative samples in the entire dataset.

::::note
When `partial_offsets` is used, `scores` and `labels` should contain only one partition of the entire dataset, covering a single interval of scores.
The dataset should be divided into contiguous partitions, where each partition contains the subset of the data whose scores fall within a specific range.
For example:
- One partition could contain all scores in the range [0, 0.5).
- Another partition could contain scores in the range [0.5, 1.0].
::::
)"}
};
FunctionDocumentation::ReturnedValue returned_value_roc = "Returns the area under the receiver operating characteristic (ROC) curve. [Float64](../data-types/float.md).";
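To make the `scale` and `partial_offsets` arguments concrete, here is a minimal Python sketch of a trapezoidal ROC AUC. It is an illustration under stated assumptions, not ClickHouse's implementation: the name `roc_auc`, treating labels greater than 0 as positive, grouping tied scores into one threshold, and returning NaN when one class is missing are all choices made for this example.

```python
import math


def roc_auc(scores, labels, scale=True, partial_offsets=None):
    """Trapezoidal area under the ROC curve (hypothetical helper, not ClickHouse code)."""
    # Assumed offsets layout, following the argument description:
    # [higher_partitions_tp, higher_partitions_fp, total_positives, total_negatives]
    if partial_offsets is None:
        tp, fp = 0, 0
        total_pos = sum(1 for label in labels if label > 0)
        total_neg = len(labels) - total_pos
    else:
        tp, fp, total_pos, total_neg = partial_offsets

    # Walk the samples from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    area = 0.0
    i = 0
    while i < len(pairs):
        prev_tp, prev_fp = tp, fp
        j = i
        # Samples with equal scores share one threshold (assumption).
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] > 0:
                tp += 1
            else:
                fp += 1
            j += 1
        # Trapezoid between consecutive points of the TP x FP curve.
        area += (fp - prev_fp) * (tp + prev_tp) / 2
        i = j

    if not scale:
        return area  # raw area under the TP x FP curve
    if total_pos == 0 or total_neg == 0:
        return math.nan  # assumption: undefined without both classes
    return area / (total_pos * total_neg)


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
whole = roc_auc(scores, labels)

# Two contiguous score partitions: [0.5, 1.0] and [0, 0.5).
high = roc_auc([0.8], [1], partial_offsets=[0, 0, 2, 2])
low = roc_auc([0.1, 0.4, 0.35], [0, 0, 1], partial_offsets=[1, 0, 2, 2])
print(whole, high + low)
```

Under these assumptions the whole-dataset value and the sum of the two partial values are both 0.75, which is the property that makes the offsets useful for distributed computation.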
Calculates the area under the precision-recall (PR) curve.
A precision-recall curve is created by plotting precision on the y-axis and recall on the x-axis across all thresholds.
The resulting value ranges from 0 to 1, with a higher value indicating better model performance.
The PR AUC is particularly useful for imbalanced datasets, where it gives a clearer picture of model performance than the ROC AUC.
For more details, please see [here](https://developers.google.com/machine-learning/glossary#pr-auc-area-under-the-pr-curve), [here](https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc#expandable-1) and [here](https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve).
{"scores", "Scores given by the prediction model. [Array](/sql-reference/data-types/array) of [Integers](../data-types/int-uint.md) or [Floats](../data-types/float.md)."},
{"labels", "Labels of samples, usually 1 for a positive sample and 0 for a negative sample. [Array](/sql-reference/data-types/array) of [Integers](../data-types/int-uint.md) or [Enums](../data-types/enum.md)."},
{"partial_offsets", R"(
- An array of three non-negative integers for calculating a partial area under the PR curve (equivalent to a vertical band of the PR space) instead of the whole AUC. This option is useful for distributed computation of the PR AUC. The array must contain the following elements [`higher_partitions_tp`, `higher_partitions_fp`, `total_positives`]. [Array](/sql-reference/data-types/array) of non-negative [Integers](../data-types/int-uint.md). Optional.
- `higher_partitions_tp`: The number of positive labels in the higher-scored partitions.
- `higher_partitions_fp`: The number of negative labels in the higher-scored partitions.
- `total_positives`: The total number of positive samples in the entire dataset.

::::note
When `partial_offsets` is used, `scores` and `labels` should contain only one partition of the entire dataset, covering a single interval of scores.
The dataset should be divided into contiguous partitions, where each partition contains the subset of the data whose scores fall within a specific range.
For example:
- One partition could contain all scores in the range [0, 0.5).
- Another partition could contain scores in the range [0.5, 1.0].
::::
)"}
};
FunctionDocumentation::ReturnedValue returned_value_pr = "Returns the area under the precision-recall (PR) curve. [Float64](../data-types/float.md).";
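The description above does not say how the PR curve is integrated. The Python sketch below assumes one common choice, the step-wise sum of (recall gained at a threshold) x (precision at that threshold), often called average precision. The name `pr_auc`, the rule that labels greater than 0 are positive, the grouping of tied scores, and the NaN result when there are no positives are all assumptions made for illustration, not ClickHouse's implementation.

```python
import math


def pr_auc(scores, labels, partial_offsets=None):
    """Step-wise area under the precision-recall curve (hypothetical helper)."""
    # Assumed offsets layout, following the argument description:
    # [higher_partitions_tp, higher_partitions_fp, total_positives]
    if partial_offsets is None:
        tp, fp = 0, 0
        total_pos = sum(1 for label in labels if label > 0)
    else:
        tp, fp, total_pos = partial_offsets
    if total_pos == 0:
        return math.nan  # assumption: recall is undefined without positives

    # Walk the samples from the highest score to the lowest.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    area = 0.0
    i = 0
    while i < len(pairs):
        prev_tp = tp
        j = i
        # Samples with equal scores share one threshold (assumption).
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] > 0:
                tp += 1
            else:
                fp += 1
            j += 1
        # Recall gained at this threshold, weighted by the precision reached there.
        area += (tp - prev_tp) / total_pos * tp / (tp + fp)
        i = j
    return area


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
whole = pr_auc(scores, labels)

# Two contiguous score partitions: [0.5, 1.0] and [0, 0.5).
high = pr_auc([0.8], [1], partial_offsets=[0, 0, 2])
low = pr_auc([0.1, 0.4, 0.35], [0, 0, 1], partial_offsets=[1, 0, 2])
print(whole, high + low)
```

With this estimator the partial areas of the two partitions add up to the whole-dataset value (5/6 for this input), because each partition only needs the true-positive and false-positive counts of the higher-scored partitions and the global number of positives.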
{"func(x [, y1, ..., yN])", "A lambda function `func(x [, y1, y2, ... yN]) → F(x [, y1, y2, ... yN])` which operates on elements of the source array (`x`) and condition arrays (`y`). [Lambda function](/sql-reference/functions/overview#arrow-operator-and-lambda)."},
{"source", "The source array to process. [`Array(T)`](/sql-reference/data-types/array)."},
{"[, cond1, ... , condN]", "Optional. N condition arrays providing additional arguments to the lambda function. [`Array(T)`](/sql-reference/data-types/array)."},
};
FunctionDocumentation::ReturnedValue returned_value = "Returns an array. [`Array(T)`](/sql-reference/data-types/array).";
FunctionDocumentation::Examples examples = {
{"Example with single array", "SELECT arrayFill(x -> not isNull(x), [1, null, 2, null]) AS res", "[1,1,2,2]"},
{"Example with two arrays", "SELECT arrayFill(x, y, z -> x > y AND x < z, [5, 3, 6, 2], [4, 7, 1, 3], [10, 2, 8, 5]) AS res", "[5,5,6,6]"}
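The two examples above can be reproduced with a small Python model of the forward-fill behavior. This is a sketch, not the ClickHouse implementation: `array_fill` is a hypothetical helper, and it assumes that an element is replaced by the previous output element whenever the lambda returns a falsy value, while the first element is always kept.

```python
def array_fill(func, source, *conds):
    """Forward-fill model of arrayFill (hypothetical helper for illustration)."""
    result = []
    for i, args in enumerate(zip(source, *conds)):
        if i == 0 or func(*args):
            # Keep the element: it is the first one or the lambda accepted it.
            result.append(args[0])
        else:
            # Otherwise repeat the most recent output element.
            result.append(result[-1])
    return result


single = array_fill(lambda x: x is not None, [1, None, 2, None])
double = array_fill(
    lambda x, y, z: x > y and x < z,
    [5, 3, 6, 2],
    [4, 7, 1, 3],
    [10, 2, 8, 5],
)
print(single, double)
```

Both calls give the same arrays as the SQL examples, `[1, 1, 2, 2]` and `[5, 5, 6, 6]`.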
{"func(x[, y1, ..., yN])", "A lambda function which operates on elements of the source array (`x`) and condition arrays (`y`). [Lambda function](/sql-reference/functions/overview#arrow-operator-and-lambda)."},
{"source", "The source array to process. [`Array(T)`](/sql-reference/data-types/array)."},
{"[, cond1, ... , condN]", "Optional. N condition arrays providing additional arguments to the lambda function. [`Array(T)`](/sql-reference/data-types/array)."},
};
FunctionDocumentation::ReturnedValue returned_value_reverse = "Returns an array in which each element for which the lambda returns `0` is replaced by the element that follows it, scanning from the last element to the first. [`Array(T)`](/sql-reference/data-types/array).";
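For the reverse variant, a minimal Python model may help. It assumes the mirror image of forward fill: the array is scanned from the last element to the first, an element is replaced by the following output element when the lambda returns a falsy value, and the last element is always kept. `array_reverse_fill` is a hypothetical name used only for this sketch.

```python
def array_reverse_fill(func, source, *conds):
    """Backward-fill model of the reverse fill (hypothetical helper for illustration)."""
    rows = list(zip(source, *conds))
    result = [None] * len(rows)
    last = len(rows) - 1
    for i in range(last, -1, -1):
        if i == last or func(*rows[i]):
            # Keep the element: it is the last one or the lambda accepted it.
            result[i] = rows[i][0]
        else:
            # Otherwise copy the output element to its right.
            result[i] = result[i + 1]
    return result


filled = array_reverse_fill(lambda x: x is not None, [1, None, 2, None])
print(filled)
```

Here the trailing `None` stays in place because nothing follows it, while the inner `None` takes the value `2` from its right-hand neighbor.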
FunctionDocumentation::Description description = "Returns an array containing only the elements in the source array for which a lambda function returns something other than `0`.";
{"func(x[, y1, ..., yN])", "A lambda function which operates on elements of the source array (`x`) and condition arrays (`y`). [Lambda function](/sql-reference/functions/overview#arrow-operator-and-lambda)."},
{"source", "The source array to process. [`Array(T)`](/sql-reference/data-types/array)."},
{"[, cond1, ... , condN]", "Optional. N condition arrays providing additional arguments to the lambda function. [`Array(T)`](/sql-reference/data-types/array)."},
};
FunctionDocumentation::ReturnedValue returned_value = "Returns a subset of the source array. [`Array(T)`](/sql-reference/data-types/array).";
FunctionDocumentation::Examples examples = {
{"Example 1", "SELECT arrayFilter(x -> x LIKE '%World%', ['Hello', 'abc World']) AS res", "['abc World']"},
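The filtering rule, including the role of the optional condition arrays, can be modeled in a few lines of Python. This is a sketch with a hypothetical helper name (`array_filter`); Python truthiness stands in for "returns something other than `0`", and the substring test stands in for the `LIKE '%World%'` pattern.

```python
def array_filter(func, source, *conds):
    """Model of arrayFilter: keep source elements the lambda accepts (hypothetical helper)."""
    # The lambda sees one element from the source and one from each condition array.
    return [args[0] for args in zip(source, *conds) if func(*args)]


words = ["Hello", "abc World"]
by_value = array_filter(lambda x: "World" in x, words)

# With a condition array, the source holds positions and the condition drives the test.
by_position = array_filter(lambda i, x: "World" in x, [1, 2], words)
print(by_value, by_position)
```

The first call mirrors the SQL example and yields `['abc World']`; the second returns `[2]`, the position of the matching string, which shows that only elements of the first array are ever returned.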
{"func", "A lambda function which operates on elements of the source array (`x`) and condition arrays (`y`). [Lambda function](/sql-reference/functions/overview#arrow-operator-and-lambda)."},
{"arr", "N arrays to process. [`Array(T)`](/sql-reference/data-types/array)."},
};
FunctionDocumentation::ReturnedValue returned_value = "Returns an array from the lambda results. [`Array(T)`](/sql-reference/data-types/array).";
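As a rough model of "returns an array from the lambda results", the Python sketch below applies a function element-wise across N arrays. `array_map` is a hypothetical helper, and the equal-length check is an assumption added for this example rather than something stated in the text above.

```python
def array_map(func, *arrays):
    """Model of arrayMap: apply func to aligned elements of N arrays (hypothetical helper)."""
    # Assumption: all input arrays must have the same length.
    if len({len(a) for a in arrays}) > 1:
        raise ValueError("all arrays must have the same length")
    return [func(*args) for args in zip(*arrays)]


plus_two = array_map(lambda x: x + 2, [1, 2, 3])
paired = array_map(lambda x, y: (x, y), [1, 2, 3], [4, 5, 6])
print(plus_two, paired)
```

The output has one element per input position, so the result length always equals the length of the input arrays.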