[SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
This PR proposes to support R 4.1.0+ in SparkR. Currently the tests fail as below:
```
══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow optimi
collect(createDataFrame(rdf)) not equal to `expected`.
Component “g”: 'tzone' attributes are inconsistent ('UTC' and '')
── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - type
collect(ret) not equal to `rdf`.
Component “b”: 'tzone' attributes are inconsistent ('UTC' and '')
── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - type
collect(ret) not equal to `rdf`.
Component “b”: 'tzone' attributes are inconsistent ('UTC' and '')
── 4. Error (test_sparkSQL.R:1454:3): column functions ─────────────────────────
Error: (converted from warning) cannot xtfrm data frames
Backtrace:
1. base::sort(collect(distinct(select(df, input_file_name())))) test_sparkSQL.R:1454:2
2. base::sort.default(collect(distinct(select(df, input_file_name()))))
5. base::order(x, na.last = na.last, decreasing = decreasing)
6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x)
7. base:::FUN(X[[i]], ...)
10. base::xtfrm.data.frame(x)
── 5. Failure (test_utils.R:67:3): cleanClosure on R functions ─────────────────
`actual` not equal to `g`.
names for current but not for target
Length mismatch: comparison on first 0 components
── 6. Failure (test_utils.R:80:3): cleanClosure on R functions ─────────────────
`actual` not equal to `g`.
names for current but not for target
Length mismatch: comparison on first 0 components
```
It fixes three issues as below:
- Avoid a sort on DataFrame which isn't legitimate: apache#32709 (comment)
- Treat the empty timezone and local timezone as equivalent in SparkR: apache#32709 (comment)
- Disable `check.environment` in the cleaned closure comparison (enabled by default since R 4.1, see https://cran.r-project.org/doc/manuals/r-release/NEWS.html), and keep the test as is: apache#32709 (comment)
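
The three fixes above can be illustrated with a minimal base R sketch. This is not the code changed in this PR: the data, the helper names `effective_tzone` and `same_tzone`, and the `local_tz` parameter are hypothetical and only show the idea behind each item.

```r
# 1. Sort the extracted column rather than the data frame itself.
#    Calling sort() on a whole data frame is what triggered the
#    "cannot xtfrm data frames" warning in the failure above.
files <- data.frame(path = c("b.json", "a.json"), stringsAsFactors = FALSE)
sorted_paths <- sort(files$path)

# 2. Treat an empty (or missing) 'tzone' attribute as the local timezone.
#    Hypothetical helpers; `local_tz` defaults to the session timezone.
effective_tzone <- function(x, local_tz = Sys.timezone()) {
  tz <- attr(x, "tzone")
  if (is.null(tz) || !nzchar(tz[1])) local_tz else tz[1]
}
same_tzone <- function(a, b, local_tz = Sys.timezone()) {
  identical(effective_tzone(a, local_tz), effective_tzone(b, local_tz))
}

utc_time <- as.POSIXct("2021-06-01 12:00:00", tz = "UTC")
local_time <- utc_time
attr(local_time, "tzone") <- ""  # empty timezone, as in the failing tests
tz_match <- same_tzone(utc_time, local_time, local_tz = "UTC")

# 3. Compare two closures while ignoring their enclosing environments.
g <- function(x) x + 1
actual <- g
environment(actual) <- new.env()                     # same code, different environment
assign("captured", 42, envir = environment(actual))  # extra captured variable
closures_match <- isTRUE(all.equal(actual, g, check.environment = FALSE))
```

Passing `local_tz` explicitly keeps the timezone comparison independent of the machine the sketch runs on. Passing `check.environment = FALSE` restricts the closure comparison to the functions' code, which is what the cleaned closure test is meant to check.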
Newer R versions include bug fixes and improvements. More importantly, R users tend to use the latest R versions.
Yes, SparkR will work with R 4.1.0+.
```bash
./R/run-tests.sh
```
```
sparkSQL_arrow:
SparkSQL Arrow optimization: .................
...
sparkSQL:
SparkSQL functions: ........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
........................................................................................................................................................................................................
...
utils:
functions in utils.R: ..............................................
```
Closes apache#32709 from HyukjinKwon/SPARK-35573.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>