Commits (23)
cba31c7
Support decimal32/64 in schema conversion
curioustien Jan 19, 2025
a9398a2
Support decimal32/64 in column writer
curioustien Jan 19, 2025
e1dc023
Restrict column writer with correct decimal types
curioustien Jan 19, 2025
6032b02
Support decimal32/64 in reader & vector kernels & tests
curioustien Jan 19, 2025
290de24
Pyarrow parquet to pandas
curioustien Jan 26, 2025
e5b996e
Address comments
curioustien Feb 15, 2025
44f1adc
Add more tests in arrow_schema_test
curioustien Feb 15, 2025
c017323
Add more tests in arrow_reader_writer_test
curioustien Feb 16, 2025
63d307b
Add more typed tests for small decimals
curioustien Feb 16, 2025
77dd7d3
Document new flag
curioustien Feb 16, 2025
d81cf13
Add decimal32/64 list type support arrow to pandas
curioustien Feb 16, 2025
424472f
Support smallest_decimal_enabled flag in pyarrow
curioustien Feb 16, 2025
d1687a7
Revert writer schema manifest arg passing change
curioustien Mar 9, 2025
1f0fb7b
Merge remote-tracking branch 'upstream/main' into parquet-decimal-test
curioustien Mar 22, 2025
52711d5
Fix lint
curioustien Mar 22, 2025
f64d6d9
Remove extra doc
curioustien Mar 22, 2025
3fb307e
Revert FileReader changes
curioustien Mar 29, 2025
f279349
Delay scratch buffer pointer cast
curioustien Mar 29, 2025
8a78c72
Use ArrowReaderProperties
curioustien Mar 29, 2025
29e98ff
Merge remote-tracking branch 'upstream/main' into parquet-decimal-test
curioustien Mar 29, 2025
d2e1ffa
Revert "Delay scratch buffer pointer cast"
curioustien Apr 4, 2025
a8304f3
Remove mistake include
curioustien Apr 4, 2025
de295e3
Merge remote-tracking branch 'upstream/main' into parquet-decimal-test
curioustien Apr 4, 2025
4 changes: 3 additions & 1 deletion cpp/src/arrow/compute/kernels/vector_hash.cc
@@ -555,6 +555,7 @@ KernelInit GetHashInit(Type::type type_id) {
case Type::DATE32:
case Type::TIME32:
case Type::INTERVAL_MONTHS:
case Type::DECIMAL32:
Member:
Are the changes to the compute kernels required to support Parquet? I can't see why, but I might be missing something. Otherwise, we should move adding decimal32 and decimal64 support to those compute kernels into a different PR and leave this one with only the required Parquet changes.

Member:
OK, I see now. The description says this is required for some tests:
Allow decimal32/64 in Arrow compute vector hash which is needed for some of the existing Parquet tests

Contributor Author:
I'm happy to split this change into another PR that can cover this support with more tests on the Arrow compute side. But yes, there are a few tests in Parquet that hit the Arrow vector kernel code path.

return HashInit<RegularHashKernel<UInt32Type, Action>>;
case Type::INT64:
case Type::UINT64:
@@ -564,6 +565,7 @@ KernelInit GetHashInit(Type::type type_id) {
case Type::TIMESTAMP:
case Type::DURATION:
case Type::INTERVAL_DAY_TIME:
case Type::DECIMAL64:
return HashInit<RegularHashKernel<UInt64Type, Action>>;
case Type::BINARY:
case Type::STRING:
@@ -707,7 +709,7 @@ void AddHashKernels(VectorFunction* func, VectorKernel base, OutputType out_ty)
DCHECK_OK(func->AddKernel(base));
}

for (auto t : {Type::DECIMAL128, Type::DECIMAL256}) {
for (auto t : {Type::DECIMAL32, Type::DECIMAL64, Type::DECIMAL128, Type::DECIMAL256}) {
base.init = GetHashInit<Action>(t);
base.signature = KernelSignature::Make({t}, out_ty);
DCHECK_OK(func->AddKernel(base));
3 changes: 2 additions & 1 deletion cpp/src/arrow/compute/kernels/vector_selection.cc
@@ -308,7 +308,8 @@ std::shared_ptr<VectorFunction> MakeIndicesNonZeroFunction(std::string name,
AddKernels(NumericTypes());
AddKernels({boolean()});

for (const auto& ty : {Type::DECIMAL128, Type::DECIMAL256}) {
for (const auto& ty :
{Type::DECIMAL32, Type::DECIMAL64, Type::DECIMAL128, Type::DECIMAL256}) {
kernel.signature = KernelSignature::Make({ty}, uint64());
DCHECK_OK(func->AddKernel(kernel));
}
4 changes: 3 additions & 1 deletion cpp/src/arrow/dataset/file_parquet.cc
@@ -132,6 +132,8 @@ parquet::ArrowReaderProperties MakeArrowReaderProperties(
parquet_scan_options.arrow_reader_properties->cache_options());
arrow_properties.set_io_context(
parquet_scan_options.arrow_reader_properties->io_context());
arrow_properties.set_smallest_decimal_enabled(
parquet_scan_options.arrow_reader_properties->smallest_decimal_enabled());
arrow_properties.set_use_threads(options.use_threads);
return arrow_properties;
}
@@ -532,7 +534,7 @@ Future<std::shared_ptr<parquet::arrow::FileReader>> ParquetFileFormat::GetReader
metadata)
.Then(
[=](const std::unique_ptr<parquet::ParquetFileReader>& reader) mutable
-> Result<std::shared_ptr<parquet::arrow::FileReader>> {
-> Result<std::shared_ptr<parquet::arrow::FileReader>> {
auto arrow_properties = MakeArrowReaderProperties(
*self, *reader->metadata(), *options, *parquet_scan_options);
