
Commit 09a91d9

[SPARK-26021][SQL][FOLLOWUP] add test for special floating point values

## What changes were proposed in this pull request?

A follow-up of apache#23043. Add a test to show the minor behavior change introduced by apache#23043, and add a migration guide entry.

## How was this patch tested?

A new test.

Closes apache#23141 from cloud-fan/follow.

Authored-by: Wenchen Fan <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>

Parent: 8c68718

File tree: 5 files changed, +48 -12 lines

- common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java
- docs/sql-migration-guide-upgrade.md
- sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java
- sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala
- sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala

common/unsafe/src/test/java/org/apache/spark/unsafe/PlatformUtilSuite.java

Lines changed: 8 additions & 4 deletions
@@ -165,10 +165,14 @@ public void writeMinusZeroIsReplacedWithZero() {
     byte[] floatBytes = new byte[Float.BYTES];
     Platform.putDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET, -0.0d);
     Platform.putFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET, -0.0f);
-    double doubleFromPlatform = Platform.getDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET);
-    float floatFromPlatform = Platform.getFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET);
 
-    Assert.assertEquals(Double.doubleToLongBits(0.0d), Double.doubleToLongBits(doubleFromPlatform));
-    Assert.assertEquals(Float.floatToIntBits(0.0f), Float.floatToIntBits(floatFromPlatform));
+    byte[] doubleBytes2 = new byte[Double.BYTES];
+    byte[] floatBytes2 = new byte[Float.BYTES];
+    Platform.putDouble(doubleBytes, Platform.BYTE_ARRAY_OFFSET, 0.0d);
+    Platform.putFloat(floatBytes, Platform.BYTE_ARRAY_OFFSET, 0.0f);
+
+    // Make sure the bytes we write from 0.0 and -0.0 are same.
+    Assert.assertArrayEquals(doubleBytes, doubleBytes2);
+    Assert.assertArrayEquals(floatBytes, floatBytes2);
   }
 }
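The rewritten assertion compares raw bytes rather than read-back values because 0.0 and -0.0 are numerically equal yet have different IEEE 754 bit patterns, so only a binary comparison can observe the normalization. A minimal, Spark-independent Scala sketch of that property (illustrative only, not part of the commit):

```scala
object MinusZeroBits {
  def main(args: Array[String]): Unit = {
    // Numerically, -0.0 equals 0.0 for both float and double.
    assert(-0.0d == 0.0d && -0.0f == 0.0f)

    // At the bit level they differ: -0.0 is all zeros except the sign bit.
    assert(java.lang.Double.doubleToRawLongBits(0.0d) == 0L)
    assert(java.lang.Double.doubleToRawLongBits(-0.0d) == 0x8000000000000000L)
    assert(java.lang.Float.floatToRawIntBits(0.0f) == 0)
    assert(java.lang.Float.floatToRawIntBits(-0.0f) == 0x80000000)
  }
}
```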

docs/sql-migration-guide-upgrade.md

Lines changed: 4 additions & 2 deletions
@@ -17,14 +17,16 @@ displayTitle: Spark SQL Upgrading Guide
 
   - Since Spark 3.0, the `from_json` functions supports two modes - `PERMISSIVE` and `FAILFAST`. The modes can be set via the `mode` option. The default mode became `PERMISSIVE`. In previous versions, behavior of `from_json` did not conform to either `PERMISSIVE` nor `FAILFAST`, especially in processing of malformed JSON records. For example, the JSON string `{"a" 1}` with the schema `a INT` is converted to `null` by previous versions but Spark 3.0 converts it to `Row(null)`.
 
-  - In Spark version 2.4 and earlier, the `from_json` function produces `null`s for JSON strings and JSON datasource skips the same independetly of its mode if there is no valid root JSON token in its input (` ` for example). Since Spark 3.0, such input is treated as a bad record and handled according to specified mode. For example, in the `PERMISSIVE` mode the ` ` input is converted to `Row(null, null)` if specified schema is `key STRING, value INT`.
+  - In Spark version 2.4 and earlier, the `from_json` function produces `null`s for JSON strings and JSON datasource skips the same independetly of its mode if there is no valid root JSON token in its input (` ` for example). Since Spark 3.0, such input is treated as a bad record and handled according to specified mode. For example, in the `PERMISSIVE` mode the ` ` input is converted to `Row(null, null)` if specified schema is `key STRING, value INT`.
 
   - The `ADD JAR` command previously returned a result set with the single value 0. It now returns an empty result set.
 
   - In Spark version 2.4 and earlier, users can create map values with map type key via built-in function like `CreateMap`, `MapFromArrays`, etc. Since Spark 3.0, it's not allowed to create map values with map type key with these built-in functions. Users can still read map values with map type key from data source or Java/Scala collections, though they are not very useful.
-
+
   - In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a grouped dataset with key attribute wrongly named as "value", if the key is non-struct type, e.g. int, string, array, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute to "key". The old behaviour is preserved under a newly added configuration `spark.sql.legacy.dataset.nameNonStructGroupingKeyAsValue` with a default value of `false`.
 
+  - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but users can still distinguish them via `Dataset.show`, `Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 internally, and users can't distinguish them any more.
+
 ## Upgrading From Spark SQL 2.3 to 2.4
 
   - In Spark version 2.3 and earlier, the second parameter to array_contains function is implicitly promoted to the element type of first array type parameter. This type promotion can be lossy and may cause `array_contains` function to return wrong result. This problem has been addressed in 2.4 by employing a safer type promotion mechanism. This can cause some change in behavior and are illustrated in the table below.
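The new migration note about -0.0 can be reproduced directly. A sketch of the user-visible difference (assuming a SparkSession named `spark`, e.g. in spark-shell; the expected Spark 3.0 values mirror the new DatasetPrimitiveSuite test below):

```scala
import spark.implicits._

val ds = Seq(-0.0d, 0.0d, Double.NaN).toDS()

// Spark 2.4 and earlier: collect() returns Array(-0.0, 0.0, NaN), so the sign of zero is visible.
// Spark 3.0: -0.0 is replaced by 0.0 on write, so the result is Array(0.0, 0.0, NaN).
ds.collect()

// Likewise, show() prints 0.0 instead of -0.0 for the first row in Spark 3.0; NaN stays NaN.
ds.show()
```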

sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeArrayData.java

Lines changed: 0 additions & 6 deletions
@@ -334,17 +334,11 @@ public void setLong(int ordinal, long value) {
   }
 
   public void setFloat(int ordinal, float value) {
-    if (Float.isNaN(value)) {
-      value = Float.NaN;
-    }
     assertIndexIsValid(ordinal);
     Platform.putFloat(baseObject, getElementOffset(ordinal, 4), value);
   }
 
   public void setDouble(int ordinal, double value) {
-    if (Double.isNaN(value)) {
-      value = Double.NaN;
-    }
     assertIndexIsValid(ordinal);
     Platform.putDouble(baseObject, getElementOffset(ordinal, 8), value);
   }
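The deleted branches canonicalized NaN before writing it out. For background, here is the property such normalization relies on, as a plain Scala sketch (the normalization itself is presumably handled centrally elsewhere after apache#23043; that is an assumption, not something this hunk shows):

```scala
object NaNBits {
  def main(args: Array[String]): Unit = {
    // NaN never compares equal to itself, so it can only be matched by bit pattern.
    assert(Double.NaN != Double.NaN)

    // Many bit patterns encode NaN; this quiet NaN is not the canonical one.
    val otherNaN = java.lang.Double.longBitsToDouble(0x7ff8000000000001L)
    assert(otherNaN.isNaN)
    assert(java.lang.Double.doubleToRawLongBits(otherNaN) !=
      java.lang.Double.doubleToRawLongBits(Double.NaN))

    // Overwriting the value with the Double.NaN literal (what the removed code did)
    // collapses every NaN to one canonical bit pattern before it is written.
    val normalized = if (otherNaN.isNaN) Double.NaN else otherNaN
    assert(java.lang.Double.doubleToRawLongBits(normalized) ==
      java.lang.Double.doubleToRawLongBits(Double.NaN))
  }
}
```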

sql/core/src/test/scala/org/apache/spark/sql/DatasetPrimitiveSuite.scala

Lines changed: 29 additions & 0 deletions
@@ -393,4 +393,33 @@ class DatasetPrimitiveSuite extends QueryTest with SharedSQLContext {
     val ds = spark.createDataset(data)
     checkDataset(ds, data: _*)
   }
+
+  test("special floating point values") {
+    import org.scalatest.exceptions.TestFailedException
+
+    // Spark treats -0.0 as 0.0
+    intercept[TestFailedException] {
+      checkDataset(Seq(-0.0d).toDS(), -0.0d)
+    }
+    intercept[TestFailedException] {
+      checkDataset(Seq(-0.0f).toDS(), -0.0f)
+    }
+    intercept[TestFailedException] {
+      checkDataset(Seq(Tuple1(-0.0)).toDS(), Tuple1(-0.0))
+    }
+
+    val floats = Seq[Float](-0.0f, 0.0f, Float.NaN).toDS()
+    checkDataset(floats, 0.0f, 0.0f, Float.NaN)
+
+    val doubles = Seq[Double](-0.0d, 0.0d, Double.NaN).toDS()
+    checkDataset(doubles, 0.0, 0.0, Double.NaN)
+
+    checkDataset(Seq(Tuple1(Float.NaN)).toDS(), Tuple1(Float.NaN))
+    checkDataset(Seq(Tuple1(-0.0f)).toDS(), Tuple1(0.0f))
+    checkDataset(Seq(Tuple1(Double.NaN)).toDS(), Tuple1(Double.NaN))
+    checkDataset(Seq(Tuple1(-0.0)).toDS(), Tuple1(0.0))
+
+    val complex = Map(Array(Seq(Tuple1(Double.NaN))) -> Map(Tuple2(Float.NaN, null)))
+    checkDataset(Seq(complex).toDS(), complex)
+  }
 }

sql/core/src/test/scala/org/apache/spark/sql/QueryTest.scala

Lines changed: 7 additions & 0 deletions
@@ -132,6 +132,13 @@ abstract class QueryTest extends PlanTest {
       a.length == b.length && a.zip(b).forall { case (l, r) => compare(l, r)}
     case (a: Iterable[_], b: Iterable[_]) =>
       a.size == b.size && a.zip(b).forall { case (l, r) => compare(l, r)}
+    case (a: Product, b: Product) =>
+      compare(a.productIterator.toSeq, b.productIterator.toSeq)
+    // 0.0 == -0.0, turn float/double to binary before comparison, to distinguish 0.0 and -0.0.
+    case (a: Double, b: Double) =>
+      java.lang.Double.doubleToRawLongBits(a) == java.lang.Double.doubleToRawLongBits(b)
+    case (a: Float, b: Float) =>
+      java.lang.Float.floatToRawIntBits(a) == java.lang.Float.floatToRawIntBits(b)
     case (a, b) => a == b
   }
 
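For reference, a self-contained sketch of the comparison pattern the added cases implement: recurse into `Product` values (tuples, case classes) and compare floats/doubles by raw bits, so 0.0 and -0.0 are distinguished while NaN matches NaN. The object and helper names here are illustrative, not Spark's:

```scala
object BitwiseCompare {
  def compare(a: Any, b: Any): Boolean = (a, b) match {
    case (a: Product, b: Product) =>
      // Compare tuples / case classes field by field.
      a.productArity == b.productArity &&
        a.productIterator.zip(b.productIterator).forall { case (l, r) => compare(l, r) }
    case (a: Double, b: Double) =>
      java.lang.Double.doubleToRawLongBits(a) == java.lang.Double.doubleToRawLongBits(b)
    case (a: Float, b: Float) =>
      java.lang.Float.floatToRawIntBits(a) == java.lang.Float.floatToRawIntBits(b)
    case (a, b) => a == b
  }

  def main(args: Array[String]): Unit = {
    assert(!compare(-0.0d, 0.0d))            // == would treat these as equal
    assert(compare(Double.NaN, Double.NaN))  // == would treat these as different
    assert(compare((1, 0.0f), (1, 0.0f)))
    assert(!compare((1, -0.0f), (1, 0.0f)))  // nested -0.0 is also detected
  }
}
```

In QueryTest itself the `Product` case simply delegates to the existing `Iterable` comparison via `productIterator.toSeq`.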
