Commit 7189472
### Rationale for this change
The `is_{min/max}_value_exact` fields exist in the Thrift definition, and some implementations already use them when truncating min and max values. This PR exposes those fields and defaults them to true when writing files from C++, since no truncation happens there at the moment: whenever min/max statistics are generated, `is_{min/max}_value_exact` can be set to true.
Truncation of string and binary min/max values is out of scope for this PR; it can be done in a follow-up.
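To illustrate why the flags matter once truncation is introduced, here is a minimal sketch of how a writer could shorten string bounds and mark them as inexact. It is not part of this PR and does not use the Parquet C++ API: `MinMaxSketch`, `MakeExact`, and `MakeTruncated` are hypothetical names, and the byte-increment rule for the max is one common approach, assumed here for illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical stand-in for illustration only; not a Parquet C++ type.
struct MinMaxSketch {
  std::string min;
  std::string max;
  bool is_min_value_exact = true;
  bool is_max_value_exact = true;
};

// Without truncation the stored bounds are real column values,
// so both flags can simply be true (the situation this PR describes).
MinMaxSketch MakeExact(const std::string& min, const std::string& max) {
  MinMaxSketch out;
  out.min = min;
  out.max = max;
  return out;
}

// Shorten bounds to at most `limit` bytes while keeping them valid bounds.
// Returns std::nullopt when no short upper bound exists.
std::optional<MinMaxSketch> MakeTruncated(const std::string& min,
                                          const std::string& max,
                                          std::size_t limit) {
  MinMaxSketch out = MakeExact(min, max);
  if (min.size() > limit) {
    // A prefix compares <= the original, so it is still a lower bound.
    out.min = min.substr(0, limit);
    out.is_min_value_exact = false;
  }
  if (max.size() > limit) {
    // A prefix alone would be too small; bump the last byte that can grow.
    std::string prefix = max.substr(0, limit);
    while (!prefix.empty() &&
           static_cast<unsigned char>(prefix.back()) == 0xFF) {
      prefix.pop_back();
    }
    if (prefix.empty()) return std::nullopt;
    prefix.back() = static_cast<char>(
        static_cast<unsigned char>(prefix.back()) + 1);
    out.max = prefix;
    out.is_max_value_exact = false;
  }
  return out;
}
```

Under these assumptions, `MakeTruncated("apple pie", "zebra crossing", 3)` produces the bounds `"app"` and `"zec"` with both flags set to false, while values that already fit within the limit keep their flags set to true.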
### What changes are included in this PR?
- The fields have been added to `EncodedStatistics` and `Statistics`, along with the Thrift integration.
- Tests and validation with a new parquet-testing file in which these fields are present (apache/parquet-testing#88).
- Tests with existing files that lack the fields.
- Existing tests updated to validate the new fields.
- New fields added to `ParquetFilePrinter`.
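Because existing files may lack the fields, each flag effectively has three states: true, false, or not present. The sketch below models that with `std::optional<bool>` and renders it as text. It is an assumption for illustration, not the actual `ParquetFilePrinter` output or the real `Statistics` interface: `ExactnessFlags`, `FormatExactness`, and `Describe` are hypothetical names.

```cpp
#include <optional>
#include <string>

// Hypothetical container mirroring the two new flags; an empty optional
// models a file whose metadata does not carry the field at all.
struct ExactnessFlags {
  std::optional<bool> is_min_value_exact;
  std::optional<bool> is_max_value_exact;
};

// Render one flag, distinguishing "absent" from an explicit false.
std::string FormatExactness(const std::optional<bool>& is_exact) {
  if (!is_exact.has_value()) return "unset";
  return *is_exact ? "true" : "false";
}

// Render both flags on one line, in the spirit of a metadata printer.
std::string Describe(const ExactnessFlags& flags) {
  return "is_min_value_exact=" + FormatExactness(flags.is_min_value_exact) +
         ", is_max_value_exact=" + FormatExactness(flags.is_max_value_exact);
}
```

Keeping "unset" separate from "false" lets a reader tell an older file apart from one whose writer truncated the bounds.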
### Are these changes tested?
Yes, on CI.
### Are there any user-facing changes?
Yes, the new fields are available to users through the API when reading Parquet files.
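As a rough sketch of how a reader might act on these flags: inexact bounds are still valid for skipping data, but only an exact bound can stand in for the real minimum or maximum. The types and functions below (`Int64StatsSketch`, `MayContain`, `ExactMax`) are hypothetical and do not reflect the Parquet C++ API; treating an absent flag as "not known to be exact" is a conservative assumption made for this example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical, simplified view of column-chunk statistics.
struct Int64StatsSketch {
  std::int64_t min;
  std::int64_t max;
  std::optional<bool> is_min_value_exact;
  std::optional<bool> is_max_value_exact;
};

// Bounds, exact or not, are enough to decide whether a chunk can be skipped
// for a predicate such as `col == value`.
bool MayContain(const Int64StatsSketch& stats, std::int64_t value) {
  return value >= stats.min && value <= stats.max;
}

// Answering MAX(col) from metadata alone is only safe when the stored max
// is flagged as an actual column value. An absent flag is treated as unknown.
std::optional<std::int64_t> ExactMax(const Int64StatsSketch& stats) {
  if (stats.is_max_value_exact.value_or(false)) return stats.max;
  return std::nullopt;
}
```

A caller that receives `std::nullopt` from `ExactMax` would fall back to scanning the data rather than trusting the stored bound.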
* GitHub Issue: #46905
Authored-by: Raúl Cumplido <[email protected]>
Signed-off-by: Antoine Pitrou <[email protected]>
1 parent 2059243 commit 7189472
File tree: 8 files changed (+322, -20 lines)
- cpp
- src/parquet
- submodules