(Only reporting now, seeing that data.table is being developed again.)
Checking released data.table 1.15.4, my Alpine Linux server gives
Error: 3 error(s) out of 11070. Search tests/tests.Rraw.bz2 for test number(s) 1590.05, 1590.06, 1997.14. Duration: 34.4s elapsed (34.9s cpu).
but at this point it is probably more useful to look at the development version of data.table.
So in a vanilla Alpine Linux container,
docker run --rm -it alpine
running
export TZ=UTC
apk add R R-dev R-doc
## get data.table (devel) and suggested packages
R -s -e 'install.packages("data.table", repos = "https://rdatatable.gitlab.io/data.table", dependencies = TRUE, destdir = "/tmp")'
export _R_CHECK_TESTS_NLINES_=0
R CMD check --extra-arch /tmp/data.table_*.tar.gz
gives only 2 failures for test numbers 1590.05 and 1590.06:
Error in test.data.table()
* using R version 4.4.0 (2024-04-24)
* using platform: x86_64-pc-linux-musl
* R was compiled by
gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309
GNU Fortran (Alpine 13.2.1_git20240309) 13.2.1 20240309
* running under: Alpine Linux v3.20
* using session charset: UTF-8
[...]
Running the tests in ‘tests/main.R’ failed.
Complete output:
> require(data.table)
Loading required package: data.table
>
> test.data.table() # runs the main test suite of 5,000+ tests in /inst/tests/tests.Rraw
getDTthreads(verbose=TRUE):
OpenMP version (_OPENMP) 201511
omp_get_num_procs() 12
R_DATATABLE_NUM_PROCS_PERCENT unset (default 50)
R_DATATABLE_NUM_THREADS unset
R_DATATABLE_THROTTLE unset (default 1024)
omp_get_thread_limit() 2147483647
omp_get_max_threads() 12
OMP_THREAD_LIMIT unset
OMP_NUM_THREADS unset
RestoreAfterFork true
data.table is using 6 threads with throttle==1024. See ?setDTthreads.
test.data.table() running: //data.table.Rcheck/data.table/tests/tests.Rraw
Test 1590.05 ran without errors but failed check that x equals y:
> x = x1 != x2
First 1 of 1 (type 'logical'):
[1] FALSE
> y = TRUE
First 1 of 1 (type 'logical'):
[1] TRUE
1 element mismatch
Test 1590.06 ran without errors but failed check that x equals y:
> x = forderv(c(x2, x1, x1, x2))
First 0 of 0 (type 'integer'):
integer(0)
> y = INT(1, 4, 2, 3)
First 4 of 4 (type 'integer'):
[1] 1 4 2 3
Numeric: lengths (0, 4) differ
Unloading package bit64
Sat Aug 3 13:25:45 2024 endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ=='UTC', Sys.timezone()=='UTC', Sys.getlocale()=='C.UTF-8;C;C;C;C;C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==12; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==12; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 6 threads with throttle==1024. See ?setDTthreads.', .libPaths()=='//data.table.Rcheck','/usr/lib/R/library', zlibVersion()==1.3.1 ZLIB_VERSION==1.3.1
Error in test.data.table() :
2 error(s) out of 11369. Search tests/tests.Rraw for test number(s) 1590.05, 1590.06. Duration: 26.9s elapsed (29.1s cpu).
Here is the relevant R code, with comments indicating results on Alpine Linux:
x1 <- "fa\xE7ile"
Encoding(x1) <- "latin1"
x2 <- iconv(x1, "latin1", "UTF-8")
identical(x1, x2) # TRUE, ok
x1 == x2 # TRUE, ok
Encoding(x2) <- "unknown" # <-- an invalid string in a non-UTF-8 locale
identical(x1, x2) # TRUE on Alpine even in the C locale, but FALSE on, e.g., Ubuntu in the C locale
x1 == x2 # the same
It seems this test (1590.05) relies on (undocumented) platform-dependent behaviour for invalid strings, so should probably be dropped.
I cannot say anything about the unexpected length-0 result of data.table:::forderv(c(x2,x1,x1,x2)) (test number 1590.06).
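For reference, the byte-level difference between the two strings can be seen without R. This is only a minimal sketch: the helper name hexdump_bytes is made up for illustration, and the octal escapes are hand-written equivalents of the latin1 string x1 and the UTF-8 bytes that x2 should hold after iconv(). It says nothing about how R or data.table compares the strings internally.

```shell
# Hypothetical helper: print stdin as one lowercase hex string.
hexdump_bytes() {
  od -An -tx1 | tr -d ' \n'
}

# "façile" in latin1: the c-cedilla is the single byte 0xE7 (octal 347).
latin1_hex=$(printf 'fa\347ile' | hexdump_bytes)

# The same text in UTF-8: the c-cedilla becomes the two bytes 0xC3 0xA7.
utf8_hex=$(printf 'fa\303\247ile' | hexdump_bytes)

echo "latin1: $latin1_hex"   # latin1: 6661e7696c65
echo "utf8:   $utf8_hex"     # utf8:   6661c3a7696c65
```

Once the encoding mark is reset to "unknown", only these bytes remain, and whether 0xC3 0xA7 is a valid character depends on the locale that interprets them.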