[SPARK-30082][SQL] Do not replace Zeros when replacing NaNs
### What changes were proposed in this pull request?
When replacing values, do not cast `NaN` to an `Integer`, `Long`, `Short` or `Byte`. Casting `NaN` to any of these types yields `0`, so the replacement erroneously applies to `0`s as well, when only `NaN`s should be replaced.
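The rule described above can be sketched in plain Python. This is a minimal model under stated assumptions, not Spark's actual implementation: each column is a plain list tagged with a hypothetical type name (`"long"`, `"double"`, and so on), and the helpers `is_nan`, `applicable_replacements` and `replace_column` are illustrative names. The key idea is that a `NaN` key is only kept for float/double columns and is dropped entirely for integral columns, instead of being cast.

```python
import math

# Hypothetical type tags standing in for Spark SQL's ByteType, ShortType,
# IntegerType, LongType, FloatType and DoubleType.
INTEGRAL_TYPES = {"byte", "short", "int", "long"}
FRACTIONAL_TYPES = {"float", "double"}


def is_nan(x):
    """True only for floating-point NaN (ints are never NaN)."""
    return isinstance(x, float) and math.isnan(x)


def applicable_replacements(col_type, replacements):
    """Filter a {old: new} map down to the rules that may touch this column.

    NaN keys are dropped for integral columns: casting NaN to an integral
    type would yield 0, and the rule would then wrongly replace zeros.
    """
    if col_type in FRACTIONAL_TYPES:
        return dict(replacements)
    if col_type in INTEGRAL_TYPES:
        return {old: new for old, new in replacements.items() if not is_nan(old)}
    raise ValueError(f"unsupported column type: {col_type!r}")


def replace_column(values, col_type, replacements):
    """Apply the filtered rules to one column (NaN must be matched explicitly,
    because NaN != NaN)."""
    cast = float if col_type in FRACTIONAL_TYPES else int
    rules = applicable_replacements(col_type, replacements)
    result = []
    for v in values:
        for old, new in rules.items():
            if v == old or (is_nan(v) and is_nan(old)):
                v = cast(new)
                break
        result.append(v)
    return result


nan = float("nan")
print(replace_column([1.0, 0.0, nan], "double", {nan: 2}))  # [1.0, 0.0, 2.0]
print(replace_column([0, 3, 0], "long", {nan: 2}))          # [0, 3, 0]
```

Dropping the rule, rather than casting it, means an integral column can never be affected by a `NaN` replacement, which matches the fixed output shown further down.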
### Why are the changes needed?
This Scala code snippet:
```scala
// Converting a Double NaN to a Long does not fail; it silently yields 0
println(Double.NaN.toLong) // prints 0
```
prints `0`. This is problematic because, if you run the following PySpark code, the `0`s in the `value` column get replaced as well:
```
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+
>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    2|
|  0.0|    3|
|  2.0|    2|
+-----+-----+
```
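To see why the zeros change, here is a simplified Python model of the pre-fix behavior. `jvm_double_to_long` mimics the JVM's double-to-long narrowing conversion (NaN becomes 0, out-of-range values saturate at the `Long` bounds, everything else truncates toward zero), which is the conversion behind `Double.NaN.toLong`. `buggy_replace_long_column` is a hypothetical helper, not Spark code, that assumes the `NaN` key is cast to the column's type before matching.

```python
import math

# Bounds of a 64-bit signed long (Scala/Java Long).
LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1


def jvm_double_to_long(x: float) -> int:
    """Mimic the JVM's double-to-long narrowing conversion."""
    if math.isnan(x):
        return 0  # NaN silently becomes 0, as with Double.NaN.toLong
    if x >= LONG_MAX:
        return LONG_MAX  # too large (including +inf): saturate
    if x <= LONG_MIN:
        return LONG_MIN  # too small (including -inf): saturate
    return int(x)  # otherwise truncate toward zero


def buggy_replace_long_column(values, to_replace, replacement):
    """Hypothetical model of the pre-fix path for a long column: the NaN key
    is cast to long first, so it turns into 0 and matches every zero."""
    key = jvm_double_to_long(to_replace)
    return [replacement if v == key else v for v in values]


# The "value" column from the example above.
print(buggy_replace_long_column([0, 3, 0], float("nan"), 2))  # [2, 3, 2]
```

The result reproduces the `value` column of the buggy output: both zeros become `2` even though the column never contained a `NaN`.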
### Does this PR introduce any user-facing change?
Yes. After this PR, running the same code snippet as above returns the expected results:
```
>>> df = spark.createDataFrame([(1.0, 0), (0.0, 3), (float('nan'), 0)], ("index", "value"))
>>> df.show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  NaN|    0|
+-----+-----+
>>> df.replace(float('nan'), 2).show()
+-----+-----+
|index|value|
+-----+-----+
|  1.0|    0|
|  0.0|    3|
|  2.0|    0|
+-----+-----+
```
### How was this patch tested?
Added unit tests verifying that replacing `NaN` only affects columns of type `Float` and `Double`.
Closes apache#26738 from johnhany97/SPARK-30082.
Lead-authored-by: John Ayad <[email protected]>
Co-authored-by: John Ayad <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>