
encoding-related test failures on Alpine Linux #6350

@bastistician

(Only reporting now, seeing that data.table is being developed again.)

Checking released data.table 1.15.4, my Alpine Linux server gives

Error: 3 error(s) out of 11070. Search tests/tests.Rraw.bz2 for test number(s) 1590.05, 1590.06, 1997.14. Duration: 34.4s elapsed (34.9s cpu).

but at this point it is probably more useful to look at the development version of data.table.

So in a vanilla Alpine Linux container,

docker run --rm -it alpine

running

export TZ=UTC
apk add R R-dev R-doc
## get data.table (devel) and suggested packages
R -s -e 'install.packages("data.table", repos = "https://rdatatable.gitlab.io/data.table", dependencies = TRUE, destdir = "/tmp")'
export _R_CHECK_TESTS_NLINES_=0
R CMD check --extra-arch /tmp/data.table_*.tar.gz

gives only 2 failures for test numbers 1590.05 and 1590.06:

Error in test.data.table()
* using R version 4.4.0 (2024-04-24)
* using platform: x86_64-pc-linux-musl
* R was compiled by
    gcc (Alpine 13.2.1_git20240309) 13.2.1 20240309
    GNU Fortran (Alpine 13.2.1_git20240309) 13.2.1 20240309
* running under: Alpine Linux v3.20
* using session charset: UTF-8
[...]
Running the tests in ‘tests/main.R’ failed.
Complete output:
  > require(data.table)
  Loading required package: data.table
  > 
  > test.data.table()  # runs the main test suite of 5,000+ tests in /inst/tests/tests.Rraw
  getDTthreads(verbose=TRUE):
    OpenMP version (_OPENMP)       201511
    omp_get_num_procs()            12
    R_DATATABLE_NUM_PROCS_PERCENT  unset (default 50)
    R_DATATABLE_NUM_THREADS        unset
    R_DATATABLE_THROTTLE           unset (default 1024)
    omp_get_thread_limit()         2147483647
    omp_get_max_threads()          12
    OMP_THREAD_LIMIT               unset
    OMP_NUM_THREADS                unset
    RestoreAfterFork               true
    data.table is using 6 threads with throttle==1024. See ?setDTthreads.
  test.data.table() running: //data.table.Rcheck/data.table/tests/tests.Rraw
  Test 1590.05 ran without errors but failed check that x equals y:
  > x = x1 != x2 
  First 1 of 1 (type 'logical'): 
  [1] FALSE
  > y = TRUE 
  First 1 of 1 (type 'logical'): 
  [1] TRUE
  1 element mismatch
  Test 1590.06 ran without errors but failed check that x equals y:
  > x = forderv(c(x2, x1, x1, x2)) 
  First 0 of 0 (type 'integer'): 
  integer(0)
  > y = INT(1, 4, 2, 3) 
  First 4 of 4 (type 'integer'): 
  [1] 1 4 2 3
  Numeric: lengths (0, 4) differ
  Unloading package bit64
  
  Sat Aug  3 13:25:45 2024  endian==little, sizeof(long double)==16, longdouble.digits==64, sizeof(pointer)==8, TZ=='UTC', Sys.timezone()=='UTC', Sys.getlocale()=='C.UTF-8;C;C;C;C;C', l10n_info()=='MBCS=TRUE; UTF-8=TRUE; Latin-1=FALSE; codeset=UTF-8', getDTthreads()=='OpenMP version (_OPENMP)==201511; omp_get_num_procs()==12; R_DATATABLE_NUM_PROCS_PERCENT==unset (default 50); R_DATATABLE_NUM_THREADS==unset; R_DATATABLE_THROTTLE==unset (default 1024); omp_get_thread_limit()==2147483647; omp_get_max_threads()==12; OMP_THREAD_LIMIT==unset; OMP_NUM_THREADS==unset; RestoreAfterFork==true; data.table is using 6 threads with throttle==1024. See ?setDTthreads.', .libPaths()=='//data.table.Rcheck','/usr/lib/R/library', zlibVersion()==1.3.1 ZLIB_VERSION==1.3.1
  Error in test.data.table() : 
    2 error(s) out of 11369. Search tests/tests.Rraw for test number(s) 1590.05, 1590.06. Duration: 26.9s elapsed (29.1s cpu).

Here is the relevant R code, with comments indicating results on Alpine Linux:

x1 <- "fa\xE7ile"
Encoding(x1) <- "latin1"
x2 <- iconv(x1, "latin1", "UTF-8")
identical(x1, x2)  # TRUE, ok
x1 == x2           # TRUE, ok

Encoding(x2) <- "unknown"  #  <-- an invalid string in a non-UTF-8 locale
identical(x1, x2)  # TRUE on Alpine even in the C locale, but FALSE on, e.g., Ubuntu in the C locale
x1 == x2           # the same

This test (1590.05) seems to rely on undocumented, platform-dependent behaviour for invalid strings, so it should probably be dropped.
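To make the platform dependence easier to inspect, here is a minimal sketch in base R that rebuilds the two strings and looks at their raw bytes and declared encodings. The variable names `bytes1`, `bytes2`, `marks` and `utf8_ok` are only illustrative. The final comparison is printed without an expected value because, as described above, its result differs between platforms.

```r
# Rebuild the two strings from the report.
x1 <- "fa\xE7ile"
Encoding(x1) <- "latin1"
x2 <- iconv(x1, "latin1", "UTF-8")
Encoding(x2) <- "unknown"   # drops the UTF-8 mark; the bytes are unchanged

# The underlying bytes differ: the c-cedilla is one byte (e7) in x1
# and two bytes (c3 a7) in x2.
bytes1 <- charToRaw(x1)
bytes2 <- charToRaw(x2)
print(bytes1)
print(bytes2)

# Declared encodings of the two strings.
marks <- Encoding(c(x1, x2))
print(marks)

# validUTF8() inspects the bytes only, ignoring the declared encoding.
utf8_ok <- validUTF8(c(x1, x2))
print(utf8_ok)

# Session details that influence how the unmarked x2 is interpreted.
print(Sys.getlocale("LC_CTYPE"))
print(l10n_info())

# Platform-dependent: whether the unmarked bytes of x2 are read as UTF-8.
print(x1 == x2)
```

Comparing this output between Alpine and, e.g., Ubuntu in the C locale should show identical bytes and marks, with only the last line differing.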

I cannot say anything about the unexpected length-0 result of data.table:::forderv(c(x2,x1,x1,x2)) (test number 1590.06).
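One possible reading, offered as an assumption and not confirmed anywhere in this report: the internal `forderv()` returns a zero-length integer vector when its input is already in order. If x1 and x2 are treated as the same string on Alpine, then `c(x2, x1, x1, x2)` would consist of four equal elements and count as already sorted. The sketch below only demonstrates the zero-length return on plain ASCII input; it uses the unexported `data.table:::forderv` and is skipped when data.table is not installed.

```r
if (requireNamespace("data.table", quietly = TRUE)) {
  # forderv is internal to data.table, hence the ':::' access.
  forderv <- data.table:::forderv

  sorted_in   <- c("a", "b", "c")   # already in order
  unsorted_in <- c("b", "a", "c")   # needs reordering

  o_sorted   <- forderv(sorted_in)
  o_unsorted <- forderv(unsorted_in)

  # Length 0 signals "input already ordered".
  print(length(o_sorted))
  # For unsorted input, the ordering permutation is returned.
  print(as.integer(o_unsorted))
}
```

If that reading is right, 1590.06 would fail for the same underlying reason as 1590.05 and not because of a separate bug in the ordering code.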


Labels: encoding (issues related to Encoding)
