Commit 1a000ab
refactor: rename scan.allowIncompatible to scan.unsignedSmallIntSafetyCheck (apache#3238)
* refactor: rename scan.allowIncompatible to scan.unsignedSmallIntSafetyCheck
This change renames `spark.comet.scan.allowIncompatible` to
`spark.comet.scan.unsignedSmallIntSafetyCheck` and sets its default to
`true`, so the safety check is enabled by default.
The key change is that ByteType is removed from the safety check entirely,
leaving only ShortType subject to fallback behavior.
## Why ByteType is Safe
ByteType columns are always safe for native execution because:
1. **Parquet type mapping**: Spark's ByteType can only originate from signed
INT8 in Parquet. There is no unsigned 8-bit Parquet type (UINT_8) that maps
to ByteType.
2. **UINT_8 maps to ShortType**: When Parquet files contain unsigned UINT_8
columns, Spark maps them to ShortType (16-bit), not ByteType. This is
because UINT_8 values (0-255) exceed the signed byte range (-128 to 127).
3. **Truncation preserves signed values**: Truncating a wider representation
of a signed INT8 value to 8 bits preserves the value, because the low-order
8 bits of a two's complement number already encode the signed 8-bit value.
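A minimal sketch of point 3, written for illustration only and not taken from Comet's code. The helper names `truncate_to_i8` and `all_i8_round_trip` are hypothetical. It widens every signed 8-bit value to 32 bits, truncates it back, and checks that the value is unchanged:

```rust
/// Truncate a wider signed value to 8 bits by keeping only the low byte.
/// (Hypothetical helper for illustration; not part of Comet.)
fn truncate_to_i8(wide: i32) -> i8 {
    // An `as` cast between integer widths keeps the low-order bits.
    wide as i8
}

/// Returns true if every signed 8-bit value survives widening to 32 bits
/// followed by truncation back to 8 bits.
fn all_i8_round_trip() -> bool {
    (i8::MIN..=i8::MAX).all(|v| truncate_to_i8(i32::from(v)) == v)
}

fn main() {
    // -1 widened to 32 bits is 0xFFFF_FFFF; its low byte 0xFF is still -1.
    assert_eq!(truncate_to_i8(-1), -1);
    assert_eq!(truncate_to_i8(i32::from(i8::MIN)), i8::MIN);
    assert!(all_i8_round_trip());
    println!("all signed 8-bit values round-trip: {}", all_i8_round_trip());
}
```

The check only covers values that were signed 8-bit to begin with, which is the case the commit message argues is the only one possible for ByteType.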
## Why ShortType Needs the Safety Check
ShortType columns may be problematic because:
1. **Ambiguous origin**: ShortType can come from either signed INT16 (safe) or
unsigned UINT_8 (potentially incompatible).
2. **Different reader behavior**: Arrow-based readers like DataFusion respect
the unsigned UINT_8 logical type and read data as unsigned, while Spark
ignores the logical type and reads as signed. This can produce different
results for values 128-255.
3. **No metadata available**: At scan time, Comet cannot determine whether a
ShortType column originated from INT16 or UINT_8, so the safety check
conservatively falls back to Spark for all ShortType columns.
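The divergence in point 2 can be modeled on a single 8-bit pattern. This is a simplified sketch, assuming the two readers differ only in whether they treat the stored bits as unsigned or signed. It does not reproduce how DataFusion or Spark actually decode Parquet pages, and the function names are hypothetical:

```rust
/// Interpret the raw byte as unsigned, then widen to 16 bits.
/// (Models a reader that honors the unsigned logical type.)
fn read_as_unsigned(raw: u8) -> i16 {
    i16::from(raw)
}

/// Interpret the same byte as signed, then widen to 16 bits (sign-extends).
/// (Models a reader that ignores the unsigned logical type.)
fn read_as_signed(raw: u8) -> i16 {
    i16::from(raw as i8)
}

/// Count the 8-bit patterns for which the two interpretations disagree.
fn mismatch_count() -> usize {
    (0u8..=255)
        .filter(|&raw| read_as_unsigned(raw) != read_as_signed(raw))
        .count()
}

fn main() {
    for &raw in &[0u8, 127, 128, 200, 255] {
        println!(
            "raw=0x{:02X} unsigned={} signed={}",
            raw,
            read_as_unsigned(raw),
            read_as_signed(raw)
        );
    }
    println!("patterns that disagree: {}", mismatch_count());
}
```

Under this model the two interpretations agree for 0-127 and disagree for every value in 128-255, which matches the range called out above.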
Users who know their data does not contain unsigned UINT_8 columns can disable
the safety check with `spark.comet.scan.unsignedSmallIntSafetyCheck=false`.
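The resulting fallback rule can be sketched as below. This is an illustrative model only: `ColumnType` and `falls_back_to_spark` are hypothetical names, and the real check is not implemented in Rust. It assumes the rule is exactly "fall back when the check is enabled and the column is ShortType":

```rust
/// Hypothetical stand-in for the Spark data types relevant to the check.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ColumnType {
    Byte,
    Short,
    Int,
}

/// Returns true if a scan over this column should fall back to Spark.
/// Assumption: only ShortType is subject to the safety check.
fn falls_back_to_spark(column: ColumnType, safety_check_enabled: bool) -> bool {
    safety_check_enabled && column == ColumnType::Short
}

fn main() {
    for &column in &[ColumnType::Byte, ColumnType::Short, ColumnType::Int] {
        println!(
            "{:?}: check on -> {}, check off -> {}",
            column,
            falls_back_to_spark(column, true),
            falls_back_to_spark(column, false)
        );
    }
}
```

With the check enabled (the new default), only ShortType falls back. ByteType never does, and disabling the check lets ShortType run natively as well.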
Co-Authored-By: Claude Opus 4.5 <[email protected]>
* format
* rename
* rename
* Fix clippy warnings for Rust 1.93
- Use local `root_op` variable instead of unwrapping `exec_context.root_op`
- Replace `is_some()` + `unwrap()` pattern with `if let Some(...)`
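The second clippy fix follows a common Rust pattern. The sketch below uses a hypothetical `ExecContext` struct with a `root_op: Option<String>` field purely for illustration. The real type and field contents in Comet differ:

```rust
/// Hypothetical stand-in for the execution context; not Comet's real type.
struct ExecContext {
    root_op: Option<String>,
}

fn describe_root(ctx: &ExecContext) -> String {
    // Before: check with `is_some()` and then call `unwrap()`:
    //   if ctx.root_op.is_some() {
    //       let op = ctx.root_op.as_ref().unwrap();
    //       ...
    //   }
    // After: bind the inner value directly with `if let`.
    if let Some(op) = &ctx.root_op {
        format!("root operator: {}", op)
    } else {
        String::from("no root operator")
    }
}

fn main() {
    let with_op = ExecContext {
        root_op: Some(String::from("scan")),
    };
    let without_op = ExecContext { root_op: None };
    println!("{}", describe_root(&with_op));
    println!("{}", describe_root(&without_op));
}
```

The `if let` form removes the separate `unwrap()` call, so the presence check and the access cannot drift apart.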
Co-Authored-By: Claude Opus 4.5 <[email protected]>
---------
Co-authored-by: Claude Opus 4.5 <[email protected]>
1 parent 1b75777 commit 1a000ab
File tree
8 files changed (+37, -29 lines changed)
- common/src/main/scala/org/apache/comet
- docs/source/contributor-guide
- spark/src
  - main/scala/org/apache/comet/rules
  - test/scala/org/apache
    - comet
      - parquet
      - rules
    - spark/sql