Commit cbd36b8
### Rationale for this change
Currently we drop all statistics if `SortOrder` is `UNKNOWN`. This seems too broad: some statistics, such as `null_count`, do not depend on ordering and could be kept.
https://github.com/apache/arrow/blob/6f6138b7eedece0841b04f4e235e3bedf5a3ee29/cpp/src/parquet/metadata.cc#L330-L335
Clearing only `min/max` while keeping `null_count` when `SortOrder` is `UNKNOWN` would let users still use the statistics that remain valid.
### What changes are included in this PR?
Keep statistics when reading them if the sort order is `SortOrder::UNKNOWN`, but clear min/max.
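As a rough illustration of the behavior change, here is a minimal, self-contained sketch. The `ColumnStats` struct, the local `SortOrder` enum, and both `ReadStats*` functions are hypothetical stand-ins written for this example; they are not the parquet-cpp types or the code touched by this PR.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical stand-ins for illustration only; not the parquet-cpp types.
enum class SortOrder { SIGNED, UNSIGNED, UNKNOWN };

struct ColumnStats {
  std::optional<std::string> min;
  std::optional<std::string> max;
  std::optional<int64_t> null_count;
};

// Previous behavior as described above: an UNKNOWN sort order
// discards every statistic, including the order-independent ones.
std::optional<ColumnStats> ReadStatsDropAll(const ColumnStats& stored,
                                            SortOrder order) {
  if (order == SortOrder::UNKNOWN) return std::nullopt;
  return stored;
}

// Behavior after this change: only min/max, which are meaningless
// without a known ordering, are cleared; null_count is preserved.
std::optional<ColumnStats> ReadStatsKeepCounts(ColumnStats stored,
                                               SortOrder order) {
  if (order == SortOrder::UNKNOWN) {
    stored.min.reset();
    stored.max.reset();
  }
  return stored;
}
```

With this sketch, a reader that only needs `null_count` gets a usable value from `ReadStatsKeepCounts` even when the sort order is unknown, whereas `ReadStatsDropAll` returns nothing at all.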
### Are these changes tested?
Yes, there is a file in parquet-testing that allows us to validate this exact scenario.
### Are there any user-facing changes?
No API changes. Users will now be able to read statistics in this case.
* GitHub Issue: #47449
Lead-authored-by: Raúl Cumplido <[email protected]>
Co-authored-by: Gang Wu <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
1 parent ef1af63 commit cbd36b8
4 files changed in cpp/src/parquet: +57 -13 lines changed