GH-47287: [C++][Compute] Add constraint for kernel signature matching and use it for binary decimal arithmetic kernels by zanmato1984 · Pull Request #47297 · apache/arrow

zanmato1984 · 2025-08-08T17:43:02Z

Rationale for this change

A rework of #40223 using a more systematic approach.

What changes are included in this PR?

Introduce a structure MatchConstraint for applying extra (and optional) constraints to kernel signature matching, in addition to the plain input type checks.

Also implement two concrete MatchConstraints for binary decimal arithmetic kernels. They suppress an exact match even when the input types are OK: add and subtract require all decimals to have the same scale, and divide requires s1 >= s2.
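The following is a minimal, self-contained sketch of how an optional constraint can refine exact matching once the ordinary type checks have passed. It does not use Arrow. `DecimalDesc`, `MatchConstraintSketch`, `SameScaleConstraint`, `Scale1GeScale2Constraint` and `ExactMatch` are hypothetical names, and the arity check is only a stand-in for Arrow's real input-type matching.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a decimal type descriptor (not Arrow's DecimalType).
struct DecimalDesc {
  int32_t precision;
  int32_t scale;
};

// Hypothetical analogue of MatchConstraint: an extra predicate that is
// evaluated only after the input types already fit the kernel signature.
class MatchConstraintSketch {
 public:
  virtual ~MatchConstraintSketch() = default;
  virtual bool Matches(const std::vector<DecimalDesc>& types) const = 0;
};

// add/subtract: every decimal input must share the same scale.
class SameScaleConstraint : public MatchConstraintSketch {
 public:
  bool Matches(const std::vector<DecimalDesc>& types) const override {
    for (const auto& t : types) {
      if (t.scale != types.front().scale) return false;
    }
    return true;
  }
};

// divide: the dividend's scale must be >= the divisor's scale.
class Scale1GeScale2Constraint : public MatchConstraintSketch {
 public:
  bool Matches(const std::vector<DecimalDesc>& types) const override {
    return types.size() == 2 && types[0].scale >= types[1].scale;
  }
};

// A signature "exactly matches" only if the types fit AND the optional
// constraint (when present) accepts them.
bool ExactMatch(const std::vector<DecimalDesc>& types, std::size_t arity,
                const MatchConstraintSketch* constraint) {
  if (types.size() != arity) return false;  // stand-in for real type checks
  return constraint == nullptr || constraint->Matches(types);
}
```

A `nullptr` constraint keeps the old behavior, which mirrors the "optional" part of the design: existing kernels are unaffected unless they opt in.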

This should also be a fundamental enhancement to further resolve similar issues like:

Are these changes tested?

Unit tests included.

Are there any user-facing changes?

New public class MatchConstraint.

GitHub Issue: [C++][Compute] Binary kernels should consider decimal scale constraints when matching #47287


github-actions · 2025-08-08T17:43:31Z

⚠️ GitHub issue #47287 has been automatically assigned in GitHub to PR creator.

zanmato1984 · 2025-08-08T17:46:15Z

@github-actions crossbow submit -g cpp

zanmato1984 · 2025-08-08T17:47:38Z

Looping in some people who might be interested. @pitrou @kou @westonpace @bkietz @felipecrv @ZhangHuiGui

github-actions · 2025-08-08T17:48:47Z

Revision: 4897960

Submitted crossbow builds: ursacomputing/crossbow @ actions-fc605b322e

Task	Status
example-cpp-minimal-build-static
example-cpp-minimal-build-static-system-dependency
example-cpp-tutorial
test-build-cpp-fuzz
test-conda-cpp
test-conda-cpp-valgrind
test-cuda-cpp-ubuntu-22.04-cuda-11.7.1
test-debian-12-cpp-amd64
test-debian-12-cpp-i386
test-fedora-42-cpp
test-ubuntu-22.04-cpp
test-ubuntu-22.04-cpp-20
test-ubuntu-22.04-cpp-bundled
test-ubuntu-22.04-cpp-emscripten
test-ubuntu-22.04-cpp-no-threading
test-ubuntu-24.04-cpp
test-ubuntu-24.04-cpp-bundled-offline
test-ubuntu-24.04-cpp-gcc-13-bundled
test-ubuntu-24.04-cpp-gcc-14
test-ubuntu-24.04-cpp-minimal-with-formats
test-ubuntu-24.04-cpp-thread-sanitizer

Copilot

Pull Request Overview

This PR introduces a new MatchConstraint mechanism for applying additional validation constraints to kernel signature matching beyond basic input type checks. The primary purpose is to add more refined matching logic for binary decimal arithmetic operations.

Key changes include:

Introduction of MatchConstraint abstract base class and two concrete implementations for decimal operations
Integration of constraints into KernelSignature and ScalarFunction::AddKernel
Application of constraints to binary decimal arithmetic kernels (add/subtract require same scale, divide requires first scale >= second scale)

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


File	Description
cpp/src/arrow/compute/kernel.h	Defines `MatchConstraint` interface and declares constraint factory functions
cpp/src/arrow/compute/kernel.cc	Implements `MatchConstraint` concrete classes and integrates constraint validation into signature matching
cpp/src/arrow/compute/function.h	Updates `ScalarFunction::AddKernel` to accept optional constraint parameter
cpp/src/arrow/compute/function.cc	Passes constraint to `KernelSignature::Make` in `AddKernel` implementation
cpp/src/arrow/compute/kernels/scalar_arithmetic.cc	Applies specific constraints to decimal binary arithmetic kernels
cpp/src/arrow/compute/kernel_test.cc	Adds comprehensive tests for the new constraint functionality

Copilot · 2025-08-08T17:50:08Z

cpp/src/arrow/compute/kernel.cc

+      return true;
+    }
+  };
+  static auto instance = std::make_shared<DecimalsHaveSameScaleConstraint>();


The static instance may not be thread-safe during initialization. Consider using std::call_once or similar synchronization mechanism to ensure thread-safe initialization.

Suggested change

-  static auto instance = std::make_shared<DecimalsHaveSameScaleConstraint>();
+  static std::once_flag once;
+  static std::shared_ptr<DecimalsHaveSameScaleConstraint> instance;
+  std::call_once(once, []() {
+    instance = std::make_shared<DecimalsHaveSameScaleConstraint>();
+  });
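For context on this suggestion: since C++11, initialization of a function-local `static` variable is guaranteed to run exactly once, and concurrent first callers block until it completes. So the original one-liner should already be thread-safe without `std::call_once`. The sketch below illustrates this with a hypothetical stub type (`SameScaleConstraintStub`) and getter (`GetSameScaleConstraint`); it is not the code in `kernel.cc`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical stand-in for DecimalsHaveSameScaleConstraint.
struct SameScaleConstraintStub {};

// C++11 "magic statics": the initializer runs once even if several threads
// call this function concurrently for the first time.
std::shared_ptr<SameScaleConstraintStub> GetSameScaleConstraint() {
  static auto instance = std::make_shared<SameScaleConstraintStub>();
  return instance;
}

// Calls the getter from n threads and records the pointer each one saw.
std::vector<const SameScaleConstraintStub*> CollectFromThreads(int n) {
  std::vector<const SameScaleConstraintStub*> seen(static_cast<std::size_t>(n),
                                                   nullptr);
  std::vector<std::thread> threads;
  for (int i = 0; i < n; ++i) {
    // Each thread writes a distinct element, so there is no data race.
    threads.emplace_back([&seen, i] {
      seen[static_cast<std::size_t>(i)] = GetSameScaleConstraint().get();
    });
  }
  for (auto& t : threads) t.join();
  return seen;
}
```

Every thread observes the same instance, which is the property the suggested `std::call_once` version was trying to guarantee.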

Hmm, did someone decide to add Copilot reviews to the Arrow repo, or is that up to each PR author?

IIRC it was in the "suggested reviewer" list, and I clicked "request" then.

It's not an apache/arrow configuration.
PR authors and apache/arrow committers can set Copilot as a reviewer manually.

To use Copilot, they need to enable Copilot explicitly for their account: https://docs.github.com/en/copilot/get-started/quickstart#sign-up-for-github-copilot

cpp/src/arrow/compute/kernel.cc

zanmato1984 · 2025-08-11T06:02:10Z

also cc @lidavidm

kou

+1

cpp/src/arrow/compute/kernel.cc

cpp/src/arrow/compute/kernel.h

cpp/src/arrow/compute/kernel.cc

kou · 2025-08-12T02:08:30Z

Can we merge this now?
Should we wait for a few days before we merge this?

zanmato1984 · 2025-08-12T03:54:04Z

Can we merge this now? Should we wait for a few days before we merge this?

I think we can keep it open for another two or three days. And thanks for reviewing!

pitrou · 2025-08-12T15:45:42Z

I can take a look on Monday.

pitrou · 2025-08-19T09:45:29Z

cpp/src/arrow/compute/kernel.h

+
+/// \brief Constraint that all binary input types are decimal types and the first type's
+/// scale >= the second type's.
+ARROW_EXPORT std::shared_ptr<MatchConstraint> BinaryDecimalScaleComparisonGE();


The name is unfortunately a bit cryptic. Also, it's not obvious to me why we would require the first decimal to have a larger scale...

Is BinaryDecimalScale1GeScale2 better?

Also, it's not obvious to me why we would require the first decimal to have a larger scale...

It's required by divide(decimal, decimal) kernels:

arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

Lines 624 to 639 in 69d2487

Result<TypeHolder> ResolveDecimalDivisionOutput(KernelContext*,
                                                const std::vector<TypeHolder>& types) {
  return ResolveDecimalBinaryOperationOutput(
      types,
      [](int32_t p1, int32_t s1, int32_t p2,
         int32_t s2) -> Result<std::pair<int32_t, int32_t>> {
        if (s1 < s2) {
          return Status::Invalid("Division of two decimal types scale1 < scale2. ", "(",
                                 s1, s2, ").");
        }
        DCHECK_GE(s1, s2);
        const int32_t scale = s1 - s2;
        const int32_t precision = p1;
        return std::make_pair(precision, scale);
      });
}

Is BinaryDecimalScale1GeScale2 better?

Yes :) Not pretty, but more explicit anyway.

Thanks. Updated.

pitrou · 2025-08-19T09:49:05Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

    out_type = OutputType(ResolveDecimalMultiplicationOutput);
  } else if (op == "divide") {
    out_type = OutputType(ResolveDecimalDivisionOutput);
+    constraint = BinaryDecimalScaleComparisonGE();


Since we add some constraints here, does it mean that we can remove some error return paths from the various ResolveDecimal functions?

Good point. Yes, that's possible, since we now suppress the match for decimal arguments that do not satisfy the constraints. The explicit error return could now be a simple DCHECK.
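A minimal sketch of that idea, assuming the constraint is enforced during matching: the resolver below follows the quoted `ResolveDecimalDivisionOutput` (precision `p1`, scale `s1 - s2`) but turns the `s1 < s2` error into a debug-only assertion. `Scale1GeScale2` and `ResolveDivisionOutput` are hypothetical stand-ins, and `assert` plays the role of `DCHECK`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical predicate mirroring the divide constraint: the kernel only
// matches when the dividend's scale is at least the divisor's.
bool Scale1GeScale2(int32_t s1, int32_t s2) { return s1 >= s2; }

// Output (precision, scale) for decimal division as in the quoted resolver.
// Because matching already rejected s1 < s2, the former error path becomes
// an assertion that documents the precondition.
std::pair<int32_t, int32_t> ResolveDivisionOutput(int32_t p1, int32_t s1,
                                                  int32_t /*p2*/, int32_t s2) {
  assert(Scale1GeScale2(s1, s2) && "matching should have rejected s1 < s2");
  return {p1, s1 - s2};
}
```

For example, a `(10, 4)` dividend and a `(5, 2)` divisor pass the constraint and resolve to precision 10, scale 2, while a `(10, 1)` / `(5, 2)` pair would never reach the resolver because matching rejects it.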

pitrou · 2025-08-19T09:49:58Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

+  DCHECK_OK(func->AddKernel({in_type128, in_type128}, out_type, exec128, /*init=*/nullptr,
+                            constraint));
+  DCHECK_OK(func->AddKernel({in_type256, in_type256}, out_type, exec256, /*init=*/nullptr,
+                            constraint));


Side note: do we have an issue open to extend this to Decimal32 and Decimal64, or are these kernels already registered elsewhere?

There is a TODO in #43956
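The registration pattern in the diff above, where the `decimal128` and `decimal256` kernels share one constraint object, can be sketched as follows. `FunctionSketch`, `KernelEntry`, `DecType` and `MakeSameScaleConstraint` are hypothetical and only loosely modeled on `ScalarFunction::AddKernel` with its optional constraint parameter; Arrow's real signatures differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

enum class DecWidth { k128, k256 };

// Hypothetical type descriptor: decimal width plus precision/scale.
struct DecType {
  DecWidth width;
  int32_t precision;
  int32_t scale;
};

using Constraint = std::function<bool(const std::vector<DecType>&)>;

// Shared "all inputs have the same scale" constraint.
std::shared_ptr<Constraint> MakeSameScaleConstraint() {
  return std::make_shared<Constraint>([](const std::vector<DecType>& types) {
    for (const auto& t : types) {
      if (t.scale != types.front().scale) return false;
    }
    return true;
  });
}

struct KernelEntry {
  std::vector<DecWidth> in_widths;
  std::shared_ptr<Constraint> constraint;  // nullptr means "no extra check"
};

struct FunctionSketch {
  std::vector<KernelEntry> kernels;

  void AddKernel(std::vector<DecWidth> in_widths,
                 std::shared_ptr<Constraint> constraint = nullptr) {
    kernels.push_back({std::move(in_widths), std::move(constraint)});
  }

  // Index of the first kernel whose input widths match and whose optional
  // constraint accepts the arguments, or -1 if none does.
  int DispatchExact(const std::vector<DecType>& args) const {
    for (std::size_t i = 0; i < kernels.size(); ++i) {
      const auto& k = kernels[i];
      if (k.in_widths.size() != args.size()) continue;
      bool types_ok = true;
      for (std::size_t j = 0; j < args.size(); ++j) {
        if (k.in_widths[j] != args[j].width) types_ok = false;
      }
      if (!types_ok) continue;
      if (k.constraint && !(*k.constraint)(args)) continue;
      return static_cast<int>(i);
    }
    return -1;
  }
};
```

Sharing one `shared_ptr` across both kernels avoids duplicating the predicate per decimal width; extending to Decimal32/64 would just mean more `AddKernel` calls with the same constraint.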

pitrou

Looks good in principle; I posted various minor comments with possible opportunities for improvement.

zanmato1984 · 2025-08-19T12:03:22Z

Thanks for looking, @pitrou. I'll update soon.

zanmato1984 · 2025-08-20T10:34:32Z

Hi @pitrou , can we merge it now?

zanmato1984 · 2025-08-20T15:22:35Z

I'll follow up later by validating the related issues listed in the PR description. Hopefully we can close or fix most of them.

conbench-apache-arrow · 2025-08-20T22:18:47Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit dadc21f.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

…47445)

### Rationale for this change

The issues in #40911 and #39875 are the same: we have a fundamental defect when dispatching kernels for decimal division. The following statements assume both the dividend and the divisor are of the same decimal type (`Decimal32/64/128/256`), with possibly different `(p, s)`.

* When doing `DispatchBest`, which is directly invoked through `CallFunction("divide", ...)` without trying `DispatchExact` first, the dividend is ALWAYS promoted and the result will have the same `(p, s)` as the dividend, according to the rule listed in our documentation [1] (which actually adopts the Redshift rule [2]).
* When doing `DispatchExact`, which is tried first by expression evaluation, there is a match without any promotion, so the subsequent `DispatchBest` attempt never happens.

The issue is obvious: `DispatchExact` and `DispatchBest` conflict. One says "OK, any `decimal128(p1, s1) / decimal128(p2, s2)` is a match", and the other says "No, we must promote the dividend according to `(p1, s1)` and `(p2, s2)`".

There are two ways to fix it:

1. Consider `DispatchBest` to be doing the right thing (justified by [1]), and NEVER "exactly match" any kernel for decimal division. This is what this PR does. The only drawback is that we ALWAYS reject an "exact match" for a kernel: odd, though functionally correct.
2. Consider `DispatchExact` to be doing the right thing, and do NOT promote the dividend in `DispatchBest`. The kernel is matched based only on the decimal type (not considering `(p, s)`), and only the result is promoted (this also complies with [1]). This is what the alternative PR #40969 does. But that PR only declares a promoted result type without actually promoting the computation (i.e., the memory representation of a decimal needs to be promoted when doing the division), so the result is wrong. Though this could be amended by supporting basic decimal methods like `PromoteAndDivide` that promote the dividend and divide in a single pass, the modification can be cumbersome: the "scale up" needs to be propagated from the kernel definition all the way down to the basic decimal primitives. Besides, I assume this may not be as performant as batch promotion followed by batch division.

[1] https://arrow.apache.org/docs/cpp/compute.html#arithmetic-functions
[2] https://docs.aws.amazon.com/redshift/latest/dg/r_numeric_computations201.html#r_numeric_computations201-precision-and-scale-of-computed-decimal-results

### What changes are included in this PR?

Suppress `DispatchExact` for decimal division. Also, the match constraint `BinaryDecimalScale1GeScale2` introduced in #47297 becomes unnecessary and is therefore removed.

### Are these changes tested?

Yes.

### Are there any user-facing changes?

None.

* GitHub Issue: #40911

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Antoine Pitrou <antoine@python.org>
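The dispatch conflict described above can be modeled in a few lines. This is a hypothetical sketch, not Arrow's dispatcher: `DivideSketch`, `ExpressionPath` and `CallFunctionPath` are illustrative names, and the returned strings only record which dispatch path was taken. It shows that once the exact match is suppressed, expression evaluation falls through to the best match and agrees with `CallFunction`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Dec {
  int32_t precision;
  int32_t scale;
};

// Hypothetical single-kernel decimal "divide". suppress_exact models the fix:
// the exact match is never taken for decimal division.
struct DivideSketch {
  bool suppress_exact;

  bool DispatchExact(const std::vector<Dec>& args) const {
    if (args.size() != 2) return false;
    return !suppress_exact;  // otherwise any (p, s) pair would "match"
  }
};

// Expression evaluation: try the exact match first, then fall back to best.
std::string ExpressionPath(const DivideSketch& f, const std::vector<Dec>& args) {
  return f.DispatchExact(args) ? "exact" : "best";
}

// CallFunction: goes straight to the best match (implicit casts/promotion).
std::string CallFunctionPath(const DivideSketch&, const std::vector<Dec>&) {
  return "best";
}
```

Without suppression the two entry points take different paths for the same inputs, which is exactly how the result types diverged.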

…ith different precisions and scales (#47479)

### Rationale for this change

Another case of decimal kernels not being able to suppress exact matching when the precisions and scales of the arguments differ, causing a wrong result type. After #47297, we have a systematic way to do that and to guide the matching to the "best match" (applying implicit casts).

### What changes are included in this PR?

Simply added a match constraint that checks whether the precisions and scales of the decimal arguments are the same. Also added corresponding tests in the form of both expressions (exact match first, then best match) and function calls (best match only).

### Are these changes tested?

Yes.

### Are there any user-facing changes?

None.

* GitHub Issue: #41336

Authored-by: Rossi Sun <zanmato1984@gmail.com>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
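A minimal sketch of such a predicate, assuming it receives only the decimal value arguments (for `case_when`, the leading struct of boolean conditions would be excluded). `Dec` and `AllSamePrecisionAndScale` are hypothetical names, not Arrow's implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Dec {
  int32_t precision;
  int32_t scale;
};

// Exact matching is allowed only when every decimal argument shares both
// precision and scale; otherwise dispatch should fall back to the best match.
bool AllSamePrecisionAndScale(const std::vector<Dec>& args) {
  for (const auto& a : args) {
    if (a.precision != args.front().precision || a.scale != args.front().scale) {
      return false;
    }
  }
  return true;
}
```

Checking precision as well as scale matters here because the output type is taken from the inputs, so any mismatch would yield a wrong result type.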

… not handle decimal arguments with different scales (#47459)

### Rationale for this change

Previously we were not able to suppress exact matching for decimal arguments with different scales when the decimal comparison kernel actually requires the scales to be the same. This caused issues like #41011. The "match constraint" introduced in #47297 is designed to fix issues like this: simply add a proper constraint.

### What changes are included in this PR?

Added a match constraint that requires all decimal inputs to have the same scale (as for decimal addition and subtraction).

### Are these changes tested?

Yes.

### Are there any user-facing changes?

None.

* GitHub Issue: #41011

Lead-authored-by: Rossi Sun <zanmato1984@gmail.com>
Co-authored-by: Antoine Pitrou <pitrou@free.fr>
Signed-off-by: Rossi Sun <zanmato1984@gmail.com>
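The unscaled integer representation of decimals shows why a comparison kernel needs equal scales. In this hypothetical sketch (`DecValue`, `NaiveEqual` and `RescaledEqual` are illustrative names, and overflow is ignored), comparing unscaled integers directly gives wrong answers when scales differ, while rescaling to a common scale first gives the right one. A same-scale constraint lets the kernel exact-match only inputs for which the direct comparison is valid, and leaves the rest to the best match with implicit casts.

```cpp
#include <cassert>
#include <cstdint>

// A decimal value as (unscaled integer, scale): value = unscaled / 10^scale.
struct DecValue {
  int64_t unscaled;
  int32_t scale;
};

// Compares unscaled integers directly: correct only when scales are equal.
bool NaiveEqual(DecValue a, DecValue b) { return a.unscaled == b.unscaled; }

// Scales the value with the smaller scale up to the larger one, then compares.
// Overflow handling is omitted in this sketch.
bool RescaledEqual(DecValue a, DecValue b) {
  while (a.scale < b.scale) {
    a.unscaled *= 10;
    ++a.scale;
  }
  while (b.scale < a.scale) {
    b.unscaled *= 10;
    ++b.scale;
  }
  return a.unscaled == b.unscaled;
}
```

For example, 1.50 stored as `{150, 2}` and 1.5 stored as `{15, 1}` are equal in value, yet their unscaled integers differ; conversely, `{150, 2}` (1.50) and `{150, 1}` (15.0) have equal unscaled integers but different values.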

Add constraint for kernel signature matching and use it for binary decimal arithmetic kernels

4897960

github-actions bot added Component: C++ awaiting review Awaiting review labels Aug 8, 2025

Fix lint

921b230

zanmato1984 requested review from Copilot, kou and pitrou August 8, 2025 17:49

Copilot AI reviewed Aug 8, 2025


github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 8, 2025

zanmato1984 added 2 commits August 10, 2025 22:46

Merge main

f5c430c

Add dispatch exact test for decimal binary arithmetics

ce3e755

kou approved these changes Aug 11, 2025


cpp/src/arrow/compute/kernel.cc (outdated, resolved)

cpp/src/arrow/compute/kernel.h (outdated, resolved)

github-actions bot added awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review labels Aug 11, 2025

Address comments

b281c17

kou reviewed Aug 11, 2025


cpp/src/arrow/compute/kernel.cc (outdated, resolved)

github-actions bot added awaiting changes Awaiting changes and removed awaiting merge Awaiting merge labels Aug 11, 2025

github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 12, 2025

zanmato1984 force-pushed the fix/gh-47287 branch from 310471f to 75c8761 Compare August 13, 2025 16:29

pitrou reviewed Aug 19, 2025


pitrou approved these changes Aug 19, 2025


zanmato1984 added 3 commits August 19, 2025 20:21

Address comment: convenience constraint maker from function

61ff30e

Address comment: remove stale error handling code

636bb36

Address comment: rename cryptic name

d20efc0

github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 19, 2025

pitrou approved these changes Aug 20, 2025


zanmato1984 merged commit dadc21f into apache:main Aug 20, 2025
41 of 43 checks passed

zanmato1984 removed the awaiting changes Awaiting changes label Aug 20, 2025

github-actions bot added the awaiting committer review Awaiting committer review label Aug 20, 2025

zanmato1984 mentioned this pull request Aug 20, 2025

[C++][Compute] Binary kernels should consider decimal scale constraints when matching #47287

Closed

zanmato1984 deleted the fix/gh-47287 branch August 20, 2025 15:22

zanmato1984 mentioned this pull request Aug 20, 2025

GH-47268: [C++][Compute] Fix discarded bad status for call binding #47284

Merged

zanmato1984 mentioned this pull request Aug 29, 2025

GH-41011: [C++][Compute] Fix the issue that comparison function could not handle decimal arguments with different scales #47459

Merged

zanmato1984 mentioned this pull request Sep 3, 2025

GH-41336: [C++][Compute] Fix case_when kernel dispatch for decimals with different precisions and scales #47479

Merged
