Commit d53e74e
[GLUTEN-11550][VL][UT] Enable Variant test suites
Enable GlutenVariantEndToEndSuite, GlutenVariantShreddingSuite, and
GlutenParquetVariantShreddingSuite for both spark40 and spark41.
Fixes:
1. VeloxValidatorApi: Detect shredded variant structs (produced by
Spark's PushVariantIntoScan) by checking the field metadata for
__VARIANT_METADATA_KEY, and fall back to vanilla Spark's Parquet
reader when it is present.
2. Spark41Shims: Detect Parquet variant logical type annotations and
fall back to vanilla Spark when PARQUET_IGNORE_VARIANT_ANNOTATION
is not set, since the Velox native reader does not check variant
annotations.
3. pom.xml: Add -Dfile.encoding=UTF-8 to test JVM args.
On JDK 17 and earlier, java.nio.charset.Charset.defaultCharset()
is determined by the OS locale. On CI containers (centos-8/9)
where LANG=C, the default charset is US-ASCII (ANSI_X3.4-1968).
JDK 18+ changed this via JEP 400 (https://openjdk.org/jeps/400)
to always default to UTF-8 regardless of locale.
Spark's VariantUtil.getString() uses new String(byte[], offset,
length) without specifying a charset, so it decodes with the JVM
default charset. With JDK 17 and LANG=C, UTF-8 encoded multi-byte
characters (e.g. Chinese) are decoded as US-ASCII, producing garbled
output.
Call chain:
VariantEndToEndSuite.check("\"你好,世界...\"")
-> to_json(parse_json(col("v")))
-> StructsToJsonEvaluator.evaluate()
-> JacksonGenerator.write(VariantVal)
-> VariantVal.toJson()
-> Variant.toJsonImpl()
-> VariantUtil.getString(byte[], pos)
-> new String(value, start, length) // no charset specified
https://github.com/apache/spark/blob/v4.0.1/common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java#L508
https://github.com/apache/spark/blob/v4.1.0/common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java#L509
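Fix 1's detection rule can be illustrated with a minimal sketch. It assumes a hypothetical, simplified schema model (the Field class below is NOT Spark's StructField) and treats any field carrying __VARIANT_METADATA_KEY, at any nesting depth, as a reason to fall back. Whether the actual VeloxValidatorApi check recurses into nested structs is not stated in the commit; this sketch does so by assumption, and the names containsShreddedVariant and shouldFallback are hypothetical.

```java
import java.util.List;
import java.util.Map;

public class ShreddedVariantCheck {
    // Metadata key named in the commit message.
    static final String VARIANT_METADATA_KEY = "__VARIANT_METADATA_KEY";

    // Hypothetical stand-in for a schema field (NOT Spark's StructField):
    // a name, string-valued metadata, and child fields for struct types.
    static final class Field {
        final String name;
        final Map<String, String> metadata;
        final List<Field> children;

        Field(String name, Map<String, String> metadata, List<Field> children) {
            this.name = name;
            this.metadata = metadata;
            this.children = children;
        }
    }

    // True if this field, or any field nested under it, carries the variant key.
    static boolean containsShreddedVariant(Field field) {
        if (field.metadata.containsKey(VARIANT_METADATA_KEY)) {
            return true;
        }
        for (Field child : field.children) {
            if (containsShreddedVariant(child)) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical validation hook: any shredded variant in the read schema
    // means the scan should fall back to vanilla Spark's Parquet reader.
    static boolean shouldFallback(List<Field> readSchema) {
        for (Field f : readSchema) {
            if (containsShreddedVariant(f)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Field id = new Field("id", Map.of(), List.of());
        // A struct whose child is tagged with the key; name and value are illustrative only.
        Field v = new Field("v", Map.of(), List.of(
            new Field("a", Map.of(VARIANT_METADATA_KEY, "illustrative"), List.of())));
        System.out.println(shouldFallback(List.of(id)));     // false
        System.out.println(shouldFallback(List.of(id, v)));  // true
    }
}
```

Returning a single boolean mirrors the all-or-nothing nature of a validator fallback: one tagged field anywhere in the read schema is enough to route the whole scan to vanilla Spark.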
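The charset problem in fix 3 can be reproduced without Spark. The sketch below encodes the test's Chinese text as UTF-8, then decodes it three ways: implicitly with the JVM default charset (the same constructor shape as in VariantUtil.getString()), explicitly as UTF-8, and explicitly as US-ASCII. The class and helper names are illustrative only. The text is written as Unicode escapes so that the source file itself does not depend on the compiler's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode a byte range with an explicit charset.
    static String decode(byte[] value, int start, int length, Charset cs) {
        return new String(value, start, length, cs);
    }

    public static void main(String[] args) {
        // The Chinese text from the test case, written as Unicode escapes
        // so this file compiles the same under any source encoding.
        String original = "\u4f60\u597d,\u4e16\u754c";
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        // No charset argument: uses Charset.defaultCharset(), as VariantUtil does.
        String implicit = new String(utf8, 0, utf8.length);
        String asUtf8 = decode(utf8, 0, utf8.length, StandardCharsets.UTF_8);
        // Each non-ASCII byte becomes a replacement character (U+FFFD).
        String asAscii = decode(utf8, 0, utf8.length, StandardCharsets.US_ASCII);

        System.out.println("default charset:      " + Charset.defaultCharset());
        System.out.println("implicit == original: " + implicit.equals(original));
        System.out.println("UTF-8 == original:    " + asUtf8.equals(original));
        System.out.println("US-ASCII == original: " + asAscii.equals(original));
    }
}
```

On JDK 17 in a LANG=C container, the default charset is expected to print as US-ASCII and the implicit decode to compare false. Adding -Dfile.encoding=UTF-8 (as this commit does for the test JVM), or running on JDK 18+, should make it compare true. Printing comparisons rather than the decoded strings keeps the output readable even when the terminal itself is not UTF-8.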
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

1 parent ff89faf, commit d53e74e
8 files changed, +84 -6 lines changed

File tree:
- backends-velox/src/main/scala/org/apache/gluten
  - backendsapi/velox
  - utils
- gluten-ut
  - spark40/src/test/scala/org/apache/gluten/utils/velox
  - spark41/src/test/scala/org/apache/gluten/utils/velox
- shims
  - common/src/main/scala/org/apache/gluten/sql/shims
  - spark41/src/main/scala/org/apache/gluten/sql/shims/spark41
Per-file line changes:
- 8 additions, 0 deletions
- 9 additions, 0 deletions
- 42 additions, 0 deletions
- 2 additions, 2 deletions
- 3 additions, 3 deletions
- 1 addition, 0 deletions
- 2 additions, 0 deletions
- 17 additions, 1 deletion