Skip to content

Conversation

@zhjwpku
Copy link
Collaborator

@zhjwpku zhjwpku commented Aug 17, 2025

No description provided.

@zhjwpku zhjwpku force-pushed the int128_and_decimal branch 2 times, most recently from 70068f9 to e48c9f8 Compare August 22, 2025 14:03
@zhjwpku zhjwpku marked this pull request as ready for review August 22, 2025 15:43
@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Aug 22, 2025

This is ready for review. I've extracted the Decimal128 implementation and tests from Arrow. Please help to review. @wgtmac @lidavidm @mapleFU

@wgtmac
Copy link
Member

wgtmac commented Aug 23, 2025

Could you explicitly mark which part is exactly ported from Arrow to make the current review and future sync process easier?

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Aug 24, 2025

Could you explicitly mark which part is exactly ported from Arrow to make the current review and future sync process easier?

That's not easy, Arrow's Decimal type uses a complex inheritance hierarchy and supports bit lengths in any 64-bit multiple. Since we only need Decimal128, l omitted the inheritance layer. But the algorithms adapted from Arrow is quite straightforward and should be easy for future sync.

@mapleFU
Copy link
Member

mapleFU commented Aug 24, 2025

What about a thirdparty/ vendor directory, and we can use it same as <arrow/xxx> but it's vendored by us? (Current solving is also ok to me)

@wgtmac
Copy link
Member

wgtmac commented Aug 24, 2025

I guess we cannot port arrow/util/decimal.h without significant change due to dependency on arrow utilities.

My concern is that we use std::array<uint64_t, 2> as the internal representation but use int128_t for testing. I'm not sure if we can directly use int128_t as the physical representation for the Decimal class. I did exactly this when I worked for my previous employer and it should work well.

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Aug 25, 2025

I guess we cannot port arrow/util/decimal.h without significant change due to dependency on arrow utilities.

My concern is that we use std::array<uint64_t, 2> as the internal representation but use int128_t for testing. I'm not sure if we can directly use int128_t as the physical representation for the Decimal class. I did exactly this when I worked for my previous employer and it should work well.

uint128_t is now used for Decimal multiplier, I think we can use int128_t for the division, let me try the int128_t physical representation idea for the next few days.

@zhjwpku zhjwpku force-pushed the int128_and_decimal branch 4 times, most recently from 6ab1a93 to a4b5dde Compare August 30, 2025 03:41
@zhjwpku zhjwpku force-pushed the int128_and_decimal branch 2 times, most recently from 2b23ca3 to bfe9fc1 Compare September 4, 2025 00:06
@zhjwpku zhjwpku requested a review from wgtmac September 6, 2025 01:41

Decimal::Decimal(std::string_view str) {
auto result = Decimal::FromString(str);
if (!result) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then please throw IcebergError and add a comment with \throw by recommending Decimal::FromString to be used in production.

@zhjwpku
Copy link
Collaborator Author

zhjwpku commented Sep 7, 2025

@mapleFU Can you take a look at this again, thanks.

@wgtmac
Copy link
Member

wgtmac commented Sep 8, 2025

This is mainly ported from Arrow C++. Could you help take a quick pass? @lidavidm @zeroshade

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me @zhjwpku Thanks for working on this, and thanks @wgtmac and @dongxiao1198 for the review 🙌

uint128_t value = 0;
ShiftAndAdd(dec.while_digits, value);
ShiftAndAdd(dec.fractional_digits, value);
Decimal result(static_cast<int128_t>(value));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind that Iceberg might also store the Decimal as an int32 or int64:
image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reminder, will keep in mind, thanks.

@Fokko Fokko merged commit 0fc573a into apache:main Sep 16, 2025
7 checks passed
@zhjwpku zhjwpku deleted the int128_and_decimal branch September 17, 2025 10:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants