
Commit efa4458

HyukjinKwon authored and Sumedh Wale committed
[SPARK-35573][R][TESTS] Make SparkR tests pass with R 4.1+
### What changes were proposed in this pull request?

This PR proposes to support R 4.1.0+ in SparkR. Currently the tests fail as below:

```
══ Failed ══════════════════════════════════════════════════════════════════════
── 1. Failure (test_sparkSQL_arrow.R:71:3): createDataFrame/collect Arrow optimi
collect(createDataFrame(rdf)) not equal to `expected`.
Component “g”: 'tzone' attributes are inconsistent ('UTC' and '')

── 2. Failure (test_sparkSQL_arrow.R:143:3): dapply() Arrow optimization - type
collect(ret) not equal to `rdf`.
Component “b”: 'tzone' attributes are inconsistent ('UTC' and '')

── 3. Failure (test_sparkSQL_arrow.R:229:3): gapply() Arrow optimization - type
collect(ret) not equal to `rdf`.
Component “b”: 'tzone' attributes are inconsistent ('UTC' and '')

── 4. Error (test_sparkSQL.R:1454:3): column functions ─────────────────────────
Error: (converted from warning) cannot xtfrm data frames
Backtrace:
  1. base::sort(collect(distinct(select(df, input_file_name())))) test_sparkSQL.R:1454:2
  2. base::sort.default(collect(distinct(select(df, input_file_name()))))
  5. base::order(x, na.last = na.last, decreasing = decreasing)
  6. base::lapply(z, function(x) if (is.object(x)) as.vector(xtfrm(x)) else x)
  7. base:::FUN(X[[i]], ...)
 10. base::xtfrm.data.frame(x)

── 5. Failure (test_utils.R:67:3): cleanClosure on R functions ─────────────────
`actual` not equal to `g`.
names for current but not for target
Length mismatch: comparison on first 0 components

── 6. Failure (test_utils.R:80:3): cleanClosure on R functions ─────────────────
`actual` not equal to `g`.
names for current but not for target
Length mismatch: comparison on first 0 components
```

It fixes three issues, as below:

- Avoid a sort on a DataFrame, which isn't legitimate: apache#32709 (comment)
- Treat the empty timezone and the local timezone as equivalent in SparkR: apache#32709 (comment)
- Disable `check.environment` in the cleaned closure comparison (enabled by default from R 4.1+, https://cran.r-project.org/doc/manuals/r-release/NEWS.html), and keep the test as is: apache#32709 (comment)

### Why are the changes needed?

Higher R versions have bug fixes and improvements. More importantly, R users tend to use the highest R versions.

### Does this PR introduce any user-facing change?

Yes, SparkR will work together with R 4.1.0+.

### How was this patch tested?

```bash
./R/run-tests.sh
```

```
sparkSQL_arrow: SparkSQL Arrow optimization: .................
...
sparkSQL: SparkSQL functions: ........................................................................................................................................................................................................
...
utils: functions in utils.R: ..............................................
```

Closes apache#32709 from HyukjinKwon/SPARK-35573.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
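A note on the first three failures above: R 4.1 made `all.equal()` on date-time values also compare their `'tzone'` attributes, so a `POSIXct` column collected with an explicit time zone no longer compares equal to one whose attribute is the empty string (the session's local time zone). A minimal sketch of that behavior, added here for illustration and not part of the commit:

```r
# Illustrative sketch only (not from the patch): reproduces the
# "'tzone' attributes are inconsistent" message seen in the failures
# when run under R 4.1+.
a <- as.POSIXct("2021-06-01 12:00:00", tz = "UTC")
b <- a
attr(b, "tzone") <- ""   # "" stands for the session's local time zone

all.equal(a, b)
# R 4.1+ : "'tzone' attributes are inconsistent ('UTC' and '')"
# Older R versions typically returned TRUE here.
```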
1 parent f5d16b8 commit efa4458

2 files changed: +17 −2 lines changed

R/pkg/tests/fulltests/test_sparkSQL.R

Lines changed: 5 additions & 0 deletions
```diff
@@ -1267,6 +1267,11 @@ test_that("column functions", {
   expect_equal(collect(df2)[[3, 1]], FALSE)
   expect_equal(collect(df2)[[3, 2]], TRUE)
 
+  # Test that input_file_name()
+  actual_names <- collect(distinct(select(df, input_file_name())))
+  expect_equal(length(actual_names), 1)
+  expect_equal(basename(actual_names[1, 1]), basename(jsonPath))
+
   df3 <- select(df, between(df$name, c("Apache", "Spark")))
   expect_equal(collect(df3)[[1, 1]], TRUE)
   expect_equal(collect(df3)[[2, 1]], FALSE)
```
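The new assertions above sidestep the fourth failure in the log: `collect()` returns a `data.frame`, and `sort()` on a data frame goes through `xtfrm()`, which R 4.1 flags with a "cannot xtfrm data frames" warning that testthat promotes to an error. A minimal sketch of the old versus new pattern, using a made-up path in place of the real collected value:

```r
# Illustrative sketch only (not from the patch); the path is hypothetical.
collected <- data.frame(file = "file:/tmp/sparkr-tests/people.json")

# Old pattern: sorting the collected data frame triggers xtfrm(), which
# warns "cannot xtfrm data frames" on R 4.1+ (escalated to an error in tests).
# sort(collected)

# New pattern: inspect the collected result directly, no sort needed.
length(collected)            # 1 (a single column)
basename(collected[1, 1])    # "people.json"
```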

R/pkg/tests/fulltests/test_utils.R

Lines changed: 12 additions & 2 deletions
```diff
@@ -64,7 +64,12 @@ test_that("cleanClosure on R functions", {
   actual <- get("y", envir = env, inherits = FALSE)
   expect_equal(actual, y)
   actual <- get("g", envir = env, inherits = FALSE)
-  expect_equal(actual, g)
+  if (as.numeric(R.Version()$major) >= 4 && !startsWith(R.Version()$minor, "0")) {
+    # 4.1+ checks environment in the function
+    expect_true(all.equal(actual, g, check.environment = FALSE))
+  } else {
+    expect_equal(actual, g)
+  }
 
   # Test for nested enclosures and package variables.
   env2 <- new.env()
@@ -77,7 +82,12 @@ test_that("cleanClosure on R functions", {
   actual <- get("y", envir = env, inherits = FALSE)
   expect_equal(actual, y)
   actual <- get("g", envir = env, inherits = FALSE)
-  expect_equal(actual, g)
+  if (as.numeric(R.Version()$major) >= 4 && !startsWith(R.Version()$minor, "0")) {
+    # 4.1+ checks environment in the function
+    expect_true(all.equal(actual, g, check.environment = FALSE))
+  } else {
+    expect_equal(actual, g)
+  }
 
   base <- c(1, 2, 3)
   l <- list(field = matrix(1))
```
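The version guard above exists because R 4.1 made `all.equal()` on functions compare their enclosing environments by default, and the `check.environment` argument that relaxes this only exists on R 4.1+, so it cannot be passed unconditionally on older R. A minimal sketch of the behavior, assuming R 4.1+:

```r
# Illustrative sketch only (not from the patch): two functions with
# identical bodies but distinct enclosing environments.
f <- local(function(x) x + 1)
g <- local(function(x) x + 1)

isTRUE(all.equal(f, g))                              # FALSE on R 4.1+: environments differ
isTRUE(all.equal(f, g, check.environment = FALSE))   # TRUE: compares the code only
```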
