Commit 4e95497

add UTF-8 as standard encoding
1 parent 4b8028d commit 4e95497

3 files changed: 4 additions, 5 deletions

NEWS.md

Lines changed: 2 additions & 1 deletion
@@ -34,6 +34,8 @@
 
 5. Negative and missing values of `n` argument of adaptive rolling functions trigger an error.
 
+6. `unique()` and `duplicated()` warn now if columns with encodings other than UTF-8 are present, since these are converted to UTF-8 for comparison, which may lead to unexpected results, [#469](https://github.com/Rdatatable/data.table/issues/469). Additionally, `fread(encoding=)` now defaults to `"UTF-8"` (previously `"unknown"`) since most systems already use UTF-8 as the native encoding. Thanks to @arunsrinivasan for the request and @ben-schwen for the implementation.
+
 ### NOTICE OF INTENDED FUTURE POTENTIAL BREAKING CHANGES
 
 1. `data.table(x=1, <expr>)`, where `<expr>` is an expression resulting in a 1-column matrix without column names, will eventually have names `x` and `V2`, not `x` and `V1`, consistent with `data.table(x=1, <expr>)` where `<expr>` results in an atomic vector, for example `data.table(x=1, cbind(1))` and `data.table(x=1, 1)` will both have columns named `x` and `V2`. In this release, the matrix case continues to be named `V1`, but the new behavior can be activated by setting `options(datatable.old.matrix.autoname)` to `FALSE`. See point 5 under Bug Fixes for more context; this change will provide more internal consistency as well as more consistency with `data.frame()`.
@@ -357,7 +359,6 @@
 
 7. In rare situations a data.table object may lose its internal attribute that holds a self-reference. New helper function `.selfref.ok()` tests just that. It is only intended for technical use cases. See manual for examples.
 
-8. `unique()` and `duplicated()` warn now if columns with encodings other than UTF-8 are present, since these are converted to UTF-8 for comparison, which may lead to unexpected results, [#469](https://github.com/Rdatatable/data.table/issues/469). Thanks to @arunsrinivasan for the request and @ben-schwen for the implementation.
 
 ## data.table [v1.17.8](https://github.com/Rdatatable/data.table/milestone/41) (6 July 2025)
 

R/fread.R

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ fread = function(
 input="", file=NULL, text=NULL, cmd=NULL, sep="auto", sep2="auto", dec="auto", quote="\"", nrows=Inf, header="auto",
 na.strings=getOption("datatable.na.strings","NA"), stringsAsFactors=FALSE, verbose=getOption("datatable.verbose",FALSE),
 skip="__auto__", select=NULL, drop=NULL, colClasses=NULL, integer64=getOption("datatable.integer64","integer64"),
-col.names, check.names=FALSE, encoding="unknown", strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL, index=NULL,
+col.names, check.names=FALSE, encoding="UTF-8", strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL, index=NULL,
 showProgress=getOption("datatable.showProgress",interactive()), data.table=getOption("datatable.fread.datatable",TRUE),
 nThread=getDTthreads(verbose), logical01=getOption("datatable.logical01",FALSE),
 logicalYN=getOption("datatable.logicalYN", FALSE),

inst/tests/tests.Rraw

Lines changed: 1 addition & 3 deletions
@@ -7900,9 +7900,7 @@ test(1547, foo(1L, 5L, a=2L, "c"), c("2", "c"))
 # Fix for encoding issues in windows, #563
 f = testDir("issue_563_fread.txt")
 ans1 <- fread(f, sep=",", header=TRUE)
-ans2 <- fread(f, sep=",", header=TRUE, encoding="UTF-8")
-test(1548.1, unique(unlist(lapply(ans1, Encoding))), "unknown")
-test(1548.2, unique(unlist(lapply(ans2, Encoding))), "UTF-8")
+test(1548.1, unique(unlist(lapply(ans1, Encoding))), "UTF-8")
 
 # 1549 moved to benchmark.Rraw, #5517
 
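
The effect of the new default can be sketched outside the test suite. The snippet below is a minimal illustration, not part of the commit: it assumes the data.table package is installed, and the temporary file, its contents, and the variable names are hypothetical. It passes `encoding=` explicitly in both calls, so it behaves the same whether or not the installed version already includes this change.

```r
library(data.table)

# Hypothetical sample file: write UTF-8 bytes regardless of the session locale.
f <- tempfile(fileext = ".csv")
writeLines(enc2utf8(c("id,name", "1,caf\u00e9", "2,na\u00efve")), f, useBytes = TRUE)

# Previous default: non-ASCII strings are left unmarked.
old <- fread(f, sep = ",", header = TRUE, encoding = "unknown")

# New default introduced by this commit: non-ASCII strings are marked as UTF-8.
new <- fread(f, sep = ",", header = TRUE, encoding = "UTF-8")

# Same check as the test above: collect the distinct encoding marks of the column.
enc_old <- unique(Encoding(old$name))
enc_new <- unique(Encoding(new$name))
print(enc_old)
print(enc_new)

unlink(f)
```

Code that relied on unmarked strings can keep the earlier behaviour by passing `encoding="unknown"` explicitly, as the first call does.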
