GH-40911: [C++][Compute] Fix the decimal division kernel dispatching by zanmato1984 · Pull Request #47445 · apache/arrow

zanmato1984 · 2025-08-27T17:21:55Z

Rationale for this change

The issues in #40911 and #39875 are the same: we have a fundamental defect when dispatching kernels for decimal division. The following statements assume both the dividend and the divisor are of the same decimal type (Decimal32/64/128/256), with possibly different (p, s).

- When doing DispatchBest, which CallFunction("divide", ...) invokes directly without trying DispatchExact first, the dividend is ALWAYS promoted and the result type is derived from the promoted dividend, according to the rule listed in our documentation [1] (which in turn adopts the Redshift one [2]).
- When doing DispatchExact, which expression evaluation tries first, there is a match without any promotion, so the subsequent DispatchBest attempt never happens.

The issue is obvious - DispatchExact and DispatchBest are conflicting - one saying "OK, for any decimal128(p1, s1) / decimal128(p2, s2), it is a match" and the other saying "No, we must promote the dividend according to (p1, s1) and (p2, s2)".
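
The conflict can be made concrete with a small, self-contained sketch. It does not use Arrow's classes: `Dec`, `ResolveDivisionOutput`, `PromoteDividend`, `ViaExact` and `ViaBest` are hypothetical names. The formulas are a reading of the documented rule (result scale max(4, s1 + p2 - s2 + 1), result precision p1 - s1 + s2 + that scale), combined with the assumption that the kernel's output type is (p1, s1 - s2) of whatever dividend it is handed.

```cpp
#include <algorithm>

// Hypothetical stand-in for a decimal type's (precision, scale).
struct Dec {
  int p;
  int s;
};

// Assumed output rule of the division kernel itself: it keeps the dividend's
// precision and subtracts the divisor's scale from the dividend's scale.
inline Dec ResolveDivisionOutput(Dec lhs, Dec rhs) {
  return Dec{lhs.p, lhs.s - rhs.s};
}

// Promotion of the dividend so that the kernel output lands on the
// documented result type: scale = max(4, s1 + p2 - s2 + 1),
// precision = p1 - s1 + s2 + scale.
inline Dec PromoteDividend(Dec lhs, Dec rhs) {
  const int result_scale = std::max(4, lhs.s + rhs.p - rhs.s + 1);
  const int scale_up = result_scale + rhs.s - lhs.s;
  return Dec{lhs.p + scale_up, lhs.s + scale_up};
}

// Path taken when an exact match is accepted: no promotion at all.
inline Dec ViaExact(Dec lhs, Dec rhs) { return ResolveDivisionOutput(lhs, rhs); }

// Path taken by best-match dispatch: promote the dividend, then divide.
inline Dec ViaBest(Dec lhs, Dec rhs) {
  return ResolveDivisionOutput(PromoteDividend(lhs, rhs), rhs);
}
```

Under these assumptions, decimal(5, 1) / decimal(5, 1) comes out as decimal(5, 0) via the exact path and as decimal(11, 6) via the best-match path, with the dividend promoted to decimal(11, 7) along the way.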

Then we actually have two choices to fix it:

1. Consider DispatchBest to be doing the right thing (justified by [1]), and NEVER "exact match" any kernel for decimal division. This is what this PR does. The only oddity is that we basically ALWAYS reject a kernel from being "exactly matched", which is weird though functionally correct.
2. Consider DispatchExact to be doing the right thing, and do NOT promote the dividend in DispatchBest. The kernel is matched based only on the decimal type of the arguments (not considering their (p, s)), and only the result is promoted (this also complies with [1]). This is what the other attempt, "GH-40911: [C++] Remove decimal division's precision and scale calculate logic from implicit casts" (#40969), does. But that PR only claims a promoted result type without actually promoting the computation (i.e., the memory representation of a decimal needs to be promoted when doing the division), so the result is wrong. This could be fixed by supporting basic decimal methods like PromoteAndDivide that do the promotion of the dividend and the division together in one run, but the modification can be cumbersome: the "scale up" needs to be propagated from the kernel definition all the way down to the basic decimal primitives. Besides, I assume this may not be as performant as doing batch promotion + batch division.
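
To illustrate why claiming a promoted result type is not enough, and what "batch promotion + batch division" amounts to, here is a toy sketch on plain `int64_t` unscaled values. `ScaleUp` and `DivideUnscaled` are hypothetical helpers, not Arrow functions; overflow checks, division by zero and Arrow's actual rounding behavior are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Batch promotion: multiply every unscaled value by 10^scale_up.
// Assumes the products fit in int64_t (no overflow handling in this sketch).
inline std::vector<int64_t> ScaleUp(const std::vector<int64_t>& unscaled,
                                    int scale_up) {
  int64_t factor = 1;
  for (int i = 0; i < scale_up; ++i) factor *= 10;
  std::vector<int64_t> out;
  out.reserve(unscaled.size());
  for (int64_t v : unscaled) out.push_back(v * factor);
  return out;
}

// Batch division: element-wise integer division of unscaled values.
// Assumes equal lengths and non-zero divisors; the quotient is truncated.
inline std::vector<int64_t> DivideUnscaled(const std::vector<int64_t>& lhs,
                                           const std::vector<int64_t>& rhs) {
  std::vector<int64_t> out;
  out.reserve(lhs.size());
  for (std::size_t i = 0; i < lhs.size(); ++i) out.push_back(lhs[i] / rhs[i]);
  return out;
}
```

Take dividends 1.0 and 12.5 (unscaled 10 and 125 at scale 1) and divisors 3.0 and 2.5 (unscaled 30 and 25 at scale 1). Dividing directly yields 0 and 5, which would read as 0.000000 and 0.000005 if the result were merely declared with scale 6. Scaling the dividends up by 10^6 first yields 333333 and 5000000, i.e. 0.333333 and 5.000000 at scale 6.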

[1] https://arrow.apache.org/docs/cpp/compute.html#arithmetic-functions
[2] https://docs.aws.amazon.com/redshift/latest/dg/r_numeric_computations201.html#r_numeric_computations201-precision-and-scale-of-computed-decimal-results

What changes are included in this PR?

Suppress the DispatchExact for decimal division.

Also, the match constraint BinaryDecimalScale1GeScale2 introduced in #47297 is no longer needed and is therefore removed.
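
A minimal sketch of the idea, using hypothetical stand-in classes rather than Arrow's actual `Function` hierarchy: the division function overrides `DispatchExact` so that it never reports a match, which forces a caller that tries the exact match first to fall back to `DispatchBest`, where the dividend promotion is decided. All names below (`ToyKernel`, `ToyFunction`, `ToyDecimalDivide`, `BindRequestsCasts`) are made up for illustration.

```cpp
// Hypothetical stand-ins; none of these are Arrow's real classes.
struct ToyKernel {
  const char* signature;
};

struct ToyFunction {
  virtual ~ToyFunction() = default;

  // Returns a kernel only when the argument types can be used as they are.
  virtual const ToyKernel* DispatchExact() const { return &kernel_; }

  // Always finds a kernel here and records that implicit casts (such as the
  // dividend promotion) were requested.
  const ToyKernel* DispatchBest(bool* casts_requested) const {
    *casts_requested = true;
    return &kernel_;
  }

 private:
  ToyKernel kernel_{"divide(decimal, decimal)"};
};

// Decimal division in this sketch: never report an exact match.
struct ToyDecimalDivide : ToyFunction {
  const ToyKernel* DispatchExact() const override { return nullptr; }
};

// Mimics a caller that tries the exact match first and only then falls back
// to the best match. Returns true if implicit casts were requested.
inline bool BindRequestsCasts(const ToyFunction& fn) {
  bool casts_requested = false;
  if (fn.DispatchExact() == nullptr) {
    fn.DispatchBest(&casts_requested);
  }
  return casts_requested;
}
```

With the base `ToyFunction`, the exact match short-circuits and no casts are requested; with `ToyDecimalDivide`, the caller always reaches `DispatchBest`.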

Are these changes tested?

Yes.

Are there any user-facing changes?

None.

GitHub Issue: [C++] Decimal divide promotion rule output wrong precision and scale in expression Bind when scale1 >= scale2 #40911

zanmato1984 · 2025-08-27T17:23:36Z

cpp/src/arrow/compute/kernels/scalar_arithmetic.cc

    out_type = OutputType(ResolveDecimalMultiplicationOutput);
  } else if (op == "divide") {
    out_type = OutputType(ResolveDecimalDivisionOutput);
-    constraint = BinaryDecimalScale1GeScale2();


We don't really need this constraint to suppress the exact matching, as this is now done via an overridden DispatchExact.

Why not a DecimalsHaveSameScaleAndPrecision? (or full type equality, which is exactly equivalent here)

That's the tricky part about decimal division - there is no "exact match" at all.

By definition we ALWAYS promote the dividend, no matter what the (p, s) of the operands are. For example, decimal(5, 1) / decimal(5, 1) = decimal(11, 6).

As long as we allow any exact match, the promotion won't happen.

Well, {decimal(5, 1), decimal(5, 1)} looks like an exact match in this example. The result type is unrelated to this.

Or you mean the dividend gets promoted to decimal(11, 6)?

> Or you mean the dividend gets promoted to decimal(11, 6)?

Exactly. Except that it is actually promoted to decimal(11, 7) but you get the idea.

Hmm, thanks. Perhaps the PR description can be clearer about this?

And this is how we obey the resulting type rule we claim - promoting the dividend.

That said, there is an alternative, as you implied in your previous comment:

> looks like an exact match in this example. The result type is unrelated to this.

This is also explained as approach 2 in my PR description. I didn't take that approach because it would require the promotion to happen during the underlying division for each individual value in the array. That can be cumbersome in terms of both coding and performance.

Ok, thanks for the explanation!

zanmato1984 · 2025-08-27T17:54:35Z

Some decimal division kernel dispatching tests need to be updated. Will do.

cc @pitrou @bkietz @ZhangHuiGui

zanmato1984 · 2025-08-28T10:18:26Z

The fix is now complete. @pitrou @bkietz @westonpace @ZhangHuiGui do you want to take a look? Thanks.

pitrou · 2025-08-28T10:44:22Z

cpp/src/arrow/compute/kernel_test.cc

@@ -476,26 +459,27 @@ TEST(KernelSignature, MatchesInputsWithConstraint) {
  auto small_scale_decimal = decimal128(precision, small_scale);
  auto big_scale_decimal = decimal128(precision, big_scale);


Perhaps the test would be more interesting if those types had different precisions?

This is not directly related to this PR, though. Updated with more interesting combinations of (p, s).

cpp/src/arrow/compute/kernels/test_util_internal.cc

pitrou

Thanks a lot for the elaborate explanations and answers @zanmato1984 . This is looking good and CI failures look unrelated.

conbench-apache-arrow · 2025-09-01T19:48:31Z

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 2e2aa0b.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

zanmato1984 requested a review from westonpace as a code owner August 27, 2025 17:21

github-actions bot added the Component: C++ and awaiting review labels Aug 27, 2025

github-actions bot added the awaiting committer review label and removed the awaiting review label Aug 27, 2025

zanmato1984 force-pushed the fix/gh-40911 branch from e981aa5 to 9a2879a on August 28, 2025 10:12

Fix the decimal division kernel dispatching

19672e1

zanmato1984 force-pushed the fix/gh-40911 branch from 9a2879a to 19672e1 on August 28, 2025 10:13

pitrou reviewed Aug 28, 2025

Address comment: make match constraint test richer

61da81d

pitrou approved these changes Sep 1, 2025

pitrou merged commit 2e2aa0b into apache:main Sep 1, 2025
40 of 42 checks passed

pitrou removed the awaiting committer review label Sep 1, 2025

pitrou mentioned this pull request Sep 1, 2025

[C++] Decimal divide promotion rule output wrong precision and scale in expression Bind when scale1 >= scale2 #40911

Closed

github-actions bot added the awaiting committer review label Sep 1, 2025

This was referenced Sep 1, 2025

GH-40911: [C++] Remove decimal division's precision and scale calculate logic from implicit casts #40969

Closed

[C++] Why arrow decimal divide precision and scale is not correct? #39875

Closed

zanmato1984 mentioned this pull request Sep 3, 2025

GH-41336: [C++][Compute] Fix case_when kernel dispatch for decimals with different precisions and scales #47479

Merged
