Commit 436bd6c

MichaelChirico and Michael Chirico authored
fread parse Y/N as bool (#4564)
* basic idea for Y/N bool parser
* New logicalYN argument
* NEWS
* tests
* missing \arguments{} entry
* Some comments

---------

Co-authored-by: Michael Chirico <[email protected]>
1 parent 17a7c3e commit 436bd6c

8 files changed, +84 -21 lines changed


NEWS.md

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -67,6 +67,8 @@ rowwiseDT(
6767

6868
5. `setcolorder()` gains `skip_absent` to ignore unrecognized columns (i.e. columns included in `neworder` but not present in the data), [#6044, #6068](https://github.com/Rdatatable/data.table/pull/6044). Default behavior (`skip_absent=FALSE`) remains unchanged, i.e. unrecognized columns result in an error. Thanks to @sluga for the suggestion and @sluga & @Nj221102 for the PRs.
6969

70+
6. `fread()` gains `logicalYN` argument to read columns consisting only of strings `Y`, `N` as `logical` (as opposed to character), [#4563](https://github.com/Rdatatable/data.table/issues/4563). The default is controlled by option `datatable.logicalYN`, itself defaulting to `FALSE`, for back-compatibility -- some smaller tables (especially sharded tables) might inadvertently read a "true" string column as `logical` and cause bugs. This is particularly important for tables with a column named `y` or `n` -- automatic header detection under `logicalYN=TRUE` will see these values in the first row as being "data" as opposed to column names. A parallel option was not included for `fwrite()` at this time -- users looking for a compact representation of logical columns can still use `fwrite(logical01=TRUE)`. We also opted for now to check only `Y`, `N` and not `Yes`/`No`/`YES`/`NO`.
71+
7072
## BUG FIXES
7173

7274
1. `fwrite()` respects `dec=','` for timestamp columns (`POSIXct` or `nanotime`) with sub-second accuracy, [#6446](https://github.com/Rdatatable/data.table/issues/6446). Thanks @kav2k for pointing out the inconsistency and @MichaelChirico for the PR.
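
The header-detection caveat in NEWS item 6 can be illustrated with a deliberately simplified model in C. This is a sketch under stated assumptions, not `fread`'s actual heuristic: it assumes the first row is taken as a header only when every field is string-typed, and it ignores all other types such as integers. The names `is_yn_bool` and `first_row_looks_like_header` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* True if the field is a single Y, y, N or n, i.e. would parse as a Y/N boolean. */
static bool is_yn_bool(const char *f)
{
  return (f[0] == 'Y' || f[0] == 'y' || f[0] == 'N' || f[0] == 'n') && f[1] == '\0';
}

/* Simplified model (assumption): the first row counts as a header only if
 * no field in it parses as a non-string type. With logicalYN enabled, a
 * column named "y" or "n" parses as boolean, so the row looks like data. */
bool first_row_looks_like_header(const char *const *fields, size_t n, bool logicalYN)
{
  for (size_t i = 0; i < n; i++) {
    if (logicalYN && is_yn_bool(fields[i])) return false;
  }
  return true;
}
```

Under this model a first row of `x,y` is a header with `logicalYN=FALSE` but is treated as data with `logicalYN=TRUE`, which is why the NEWS entry suggests care with columns named `y` or `n`; passing `header=TRUE` explicitly avoids relying on detection.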

R/fread.R

Lines changed: 5 additions & 3 deletions
@@ -4,7 +4,9 @@ na.strings=getOption("datatable.na.strings","NA"), stringsAsFactors=FALSE, verbo
44
skip="__auto__", select=NULL, drop=NULL, colClasses=NULL, integer64=getOption("datatable.integer64","integer64"),
55
col.names, check.names=FALSE, encoding="unknown", strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL, index=NULL,
66
showProgress=getOption("datatable.showProgress",interactive()), data.table=getOption("datatable.fread.datatable",TRUE),
7-
nThread=getDTthreads(verbose), logical01=getOption("datatable.logical01",FALSE), keepLeadingZeros=getOption("datatable.keepLeadingZeros",FALSE),
7+
nThread=getDTthreads(verbose), logical01=getOption("datatable.logical01",FALSE),
8+
logicalYN=getOption("datatable.logicalYN", FALSE),
9+
keepLeadingZeros=getOption("datatable.keepLeadingZeros",FALSE),
810
yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
911
{
1012
if (missing(input)+is.null(file)+is.null(text)+is.null(cmd) < 3L) stopf("Used more than one of the arguments input=, file=, text= and cmd=.")
@@ -24,7 +26,7 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
2426
}
2527
stopifnot(
2628
isTRUEorFALSE(strip.white), isTRUEorFALSE(blank.lines.skip), isTRUEorFALSE(fill) || is.numeric(fill) && length(fill)==1L && fill >= 0L, isTRUEorFALSE(showProgress),
27-
isTRUEorFALSE(verbose), isTRUEorFALSE(check.names), isTRUEorFALSE(logical01), isTRUEorFALSE(keepLeadingZeros), isTRUEorFALSE(yaml),
29+
isTRUEorFALSE(verbose), isTRUEorFALSE(check.names), isTRUEorFALSE(logical01), isTRUEorFALSE(logicalYN), isTRUEorFALSE(keepLeadingZeros), isTRUEorFALSE(yaml),
2830
isTRUEorFALSE(stringsAsFactors) || (is.double(stringsAsFactors) && length(stringsAsFactors)==1L && 0.0<=stringsAsFactors && stringsAsFactors<=1.0),
2931
is.numeric(nrows), length(nrows)==1L
3032
)
@@ -277,7 +279,7 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC")
277279
tz="UTC"
278280
}
279281
ans = .Call(CfreadR,input,identical(input,file),sep,dec,quote,header,nrows,skip,na.strings,strip.white,blank.lines.skip,
280-
fill,showProgress,nThread,verbose,warnings2errors,logical01,select,drop,colClasses,integer64,encoding,keepLeadingZeros,tz=="UTC")
282+
fill,showProgress,nThread,verbose,warnings2errors,logical01,logicalYN,select,drop,colClasses,integer64,encoding,keepLeadingZeros,tz=="UTC")
281283
if (!length(ans)) return(null.data.table()) # test 1743.308 drops all columns
282284
nr = length(ans[[1L]])
283285
require_bit64_if_needed(ans)

inst/tests/tests.Rraw

Lines changed: 29 additions & 9 deletions
@@ -5567,6 +5567,10 @@ test(1343.5, fread("A,B\n1,true\n2,\n3,false"), data.table(A=1:3, B=c(TRUE,NA,FA
55675567
test(1343.6, fread("A,B\n1,true\n2,NA\n3,"), data.table(A=1:3, B=c(TRUE,NA,NA)))
55685568
test(1344.1, fread("A,B\n1,2\n0,3\n,1\n", logical01=FALSE), data.table(A=c(1L,0L,NA), B=c(2L,3L,1L)))
55695569
test(1344.2, fread("A,B\n1,2\n0,3\n,1\n", logical01=TRUE), data.table(A=c(TRUE,FALSE,NA), B=c(2L,3L,1L)))
5570+
test(1344.3, fread("A,B\nY,2\nN,3\nNA,1\n", logicalYN=FALSE), data.table(A=c('Y','N',NA), B=c(2L,3L,1L)))
5571+
test(1344.4, fread("A,B\nY,2\nN,3\nNA,1\n", logicalYN=TRUE), data.table(A=c(TRUE,FALSE,NA), B=c(2L,3L,1L)))
5572+
test(1344.5, fread("A,B\nY,2\nN,3\n,1\n", logicalYN=FALSE, na.strings=""), data.table(A=c('Y','N',NA), B=c(2L,3L,1L)))
5573+
test(1344.6, fread("A,B\nY,2\nN,3\n,1\n", logicalYN=TRUE, na.strings=""), data.table(A=c(TRUE,FALSE,NA), B=c(2L,3L,1L)))
55705574

55715575
# .N now available in i
55725576
DT = data.table(a=1:3,b=1:6)
@@ -7870,9 +7874,14 @@ str = "a,b\n1.5,\"at the 5\" end of the gene.\""
78707874
test(1551.1, fread(str), data.table(a = 1.5, b = "at the 5\" end of the gene."), warning=w<-"resolved improper quoting")
78717875
#1256
78727876
str = "x,y\nx1,\"oops\" y1\n"
7873-
test(1551.2, fread(str), data.table(x = "x1", y = "\"oops\" y1"), warning=w)
7877+
test(1551.21, fread(str), data.table(x='x1', y='"oops" y1'), warning=w)
7878+
# during header detection, 'y' is seen as a valid value --> header determined 'FALSE' despite later non-Y/N data
7879+
test(1551.22, fread(str, logicalYN=TRUE), data.table(V1=c('x', 'x1'), V2=c('y', '"oops" y1')), warning=w)
7880+
test(1551.23, fread(str, logicalYN=TRUE, header=TRUE), data.table(x='x1', y='"oops" y1'), warning=w)
78747881
str = "x,y\nx1,\"oops\" y1"
7875-
test(1551.3, fread(str), data.table(x = "x1", y = "\"oops\" y1"), warning=w)
7882+
test(1551.31, fread(str), data.table(x="x1", y='"oops" y1'), warning=w)
7883+
test(1551.32, fread(str, logicalYN=TRUE), data.table(V1=c('x', 'x1'), V2=c('y', '"oops" y1')), warning=w)
7884+
test(1551.33, fread(str, logicalYN=TRUE, header=TRUE), data.table(x="x1", y='"oops" y1'), warning=w)
78767885
#1077
78777886
str = '2,3\n""foo,bar'
78787887
test(1551.4, fread(str), data.table(V1=c("2","\"\"foo"), V2=c("3","bar")), warning=w)
@@ -7882,8 +7891,12 @@ test(1551.5, fread(str),
78827891
data.table(L1 = c("L2", "L3"), some = c("some", "this"), unquoted = c("\"half\" quoted", "should work"), stuff = c("stuff", "ok though")),
78837892
warning = w)
78847893
#1095
7885-
rhs = read.table(testDir("issue_1095_fread.txt.bz2"), sep=",", comment.char="", stringsAsFactors=FALSE, quote="", strip.white=TRUE)
7886-
if (test_R.utils) test(1551.6, fread(testDir("issue_1095_fread.txt.bz2"), logical01=FALSE), setDT(rhs), warning=w)
7894+
rhs = setDT(read.table(testDir("issue_1095_fread.txt.bz2"), sep=",", comment.char="", stringsAsFactors=FALSE, quote="", strip.white=TRUE))
7895+
if (test_R.utils) {
7896+
test(1551.61, fread(testDir("issue_1095_fread.txt.bz2"), logical01=FALSE), rhs, warning=w)
7897+
rhs[, names(.SD) := lapply(.SD, \(x) x == "Y"), .SDcols = c("V16", "V17", "V45")]
7898+
test(1551.62, fread(testDir("issue_1095_fread.txt.bz2"), logical01=FALSE, logicalYN=TRUE), rhs, warning=w)
7899+
}
78877900

78887901
# FR #1314 rest of na.strings issue
78897902
str = "a,b,c,d\n#N/A,+1,5.5,FALSE\n#N/A,5,6.6,TRUE\n#N/A,+1,#N/A,-999\n#N/A,#N/A,-999,FALSE\n#N/A,1,NA,TRUE"
@@ -7896,8 +7909,10 @@ test(1552.3, fread(str, na.strings=c("#N/A", "-999", "+1")), read_table(str, na.
78967909
test(1552.4, fread(str, na.strings=c("#N/A", "-999", "+1", "1")), read_table(str, na.strings=c("#N/A", "-999", "+1", "1"))) # enabled by FR #2927
78977910
test(1552.5, fread(str, na.strings=c("#N/A", "-999", "FALSE")), error="NAstring <<FALSE>>.*boolean.*not permitted")
78987911
test(1552.6, fread("A\n1.0\n2\n-", na.strings=c("-")), data.table(A=c(1.0, 2.0, NA)))
7899-
test(1552.7, fread(str, na.strings=c("#N/A", "-999", "+1", "1"), logical01=TRUE),
7912+
test(1552.71, fread(str, na.strings=c("#N/A", "-999", "+1", "1"), logical01=TRUE),
79007913
error="NAstring <<1>> and logical01=TRUE.*not permitted")
7914+
test(1552.72, fread(str, na.strings=c("#N/A", "-999", "+1", "Y"), logicalYN=TRUE),
7915+
error="NAstring <<Y>> and logicalYN=TRUE.*not permitted")
79017916
str = "a,b,c\n0,1,2\n1,0,2"
79027917
test(1552.8, fread(str, na.strings = "0"), data.table(a=c(NA,1L), b=c(1L,NA), c=c(2L,2L)))
79037918
test(1552.9, fread(str, na.strings = c("0","1")), data.table(a=c(NA,NA), b=c(NA,NA), c=c(2L,2L)))
@@ -8896,8 +8911,10 @@ test(1618.5, fread("a,c,b\n1,2,3", select=c("b", "c"), col.names=c("q", "r")), d
88968911
test(1618.6, fread("a,c,b\n1,2,3", select=c("b", "z")), data.table(b=3L), warning="Column name 'z' not found.*skipping")
88978912

88988913
# Additional test for 1445 for non-monotonic integer select
8899-
select1618.8 <- c(4, 9, 8, 23, 1, 21, 5, 18, 11, 13)
8900-
test(1618.8, names(fread("a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z\na,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z", select = select1618.8)), letters[select1618.8])
8914+
select1618 <- c(4, 9, 8, 23, 1, 21, 5, 18, 11, 13)
8915+
str = paste0(paste(letters, collapse=','), '\n', paste(letters, collapse=','))
8916+
test(1618.8, names(fread(str, select=select1618)), letters[select1618])
8917+
test(1618.9, names(fread(str, select=select1618, logicalYN=TRUE)), paste0('V', select1618))
89018918

89028919
# fix for #1270. Have been problems with R before vs after 3.1.0 here. But now ok in all R versions.
89038920
DT = data.table(x=1:2, y=5:6)
@@ -9106,7 +9123,8 @@ test(1626.91, fsetdiff(DT, DT["b"]), DT[c(1,4)])
91069123
# fix for #1087 and #1465
91079124
test(1627.1, charToRaw(names(fread(testDir("issue_1087_utf8_bom.csv")))[1L]), as.raw(97L))
91089125
test(1627.2, names(fread(testDir("issue_1087_utf8_bom.csv"), verbose=TRUE))[1L], "a", output="UTF-8 byte order mark EF BB BF found")
9109-
test(1627.3, names(fread(testDir("gb18030.txt")))[1L], "x", warning="GB-18030 encoding detected")
9126+
test(1627.31, names(fread(testDir("gb18030.txt")))[1L], "x", warning="GB-18030 encoding detected")
9127+
test(1627.32, names(fread(testDir("gb18030.txt"), logicalYN=TRUE))[1L], "V1", warning="GB-18030 encoding detected")
91109128
test(1627.4, fread(testDir("utf16le.txt")), error="File is encoded in UTF-16")
91119129
test(1627.5, fread(testDir("utf16be.txt")), error="File is encoded in UTF-16")
91129130

@@ -11448,9 +11466,11 @@ unlink(f)
1144811466
test(1753.1, fread("X,Y\n1,2\n3,4\n5,6"), data.table(X=INT(1,3,5),Y=INT(2,4,6)))
1144911467
test(1753.2, fread("X,Y\n1,2\n3,4,\n5,6",logical01=TRUE), ans<-data.table(X=TRUE,Y=2L), warning="Stopped.*line 3. Expected 2 fields but found 3.*discarded.*<<3,4,>>")
1145011468
test(1753.3, fread("X,Y\n1,2\n3,4,7\n5,6",logical01=TRUE), ans, warning="Stopped.*line 3. Expected 2 fields but found 3.*discarded.*<<3,4,7>>")
11469+
test(1753.4, fread("X,Y\nY,2\n3,4,\n5,6",logicalYN=TRUE), ans<-data.table(X=TRUE,Y=2L), warning="Stopped.*line 3. Expected 2 fields but found 3.*discarded.*<<3,4,>>")
11470+
test(1753.5, fread("X,Y\nY,2\n3,4,7\n5,6",logicalYN=TRUE), ans, warning="Stopped.*line 3. Expected 2 fields but found 3.*discarded.*<<3,4,7>>")
1145111471

1145211472
# issue 2051 where a quoted field contains ", New quote rule detection handles it.
11453-
if (test_R.utils) test(1753.4, fread(testDir("issue_2051.csv.gz"))[2,grep("^Our.*tool$",COLUMN50)], 1L)
11473+
if (test_R.utils) test(1753.6, fread(testDir("issue_2051.csv.gz"))[2,grep("^Our.*tool$",COLUMN50)], 1L)
1145411474

1145511475
# check omp critical around SET_STRING_ELT
1145611476
# minimal construction big enough for parallelism with 8 or less threads. On a machine with more, do setDTthreads(8) first otherwise
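
Tests 1552.71 and 1552.72 expect an error when an NA string collides with an enabled boolean parser. A minimal standalone sketch of that check in C follows; it mirrors the two `strcmp` conditions added in `src/fread.c`, but the function name `na_string_conflict` and its return convention are hypothetical, introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns the name of the option that conflicts with the NA string `na`,
 * or NULL when there is no conflict. Like the diff, the Y/N comparison
 * is against uppercase "Y" and "N" only. */
const char *na_string_conflict(const char *na, bool logical01, bool logicalYN)
{
  if ((strcmp(na, "1") == 0 || strcmp(na, "0") == 0) && logical01) return "logical01";
  if ((strcmp(na, "Y") == 0 || strcmp(na, "N") == 0) && logicalYN) return "logicalYN";
  return NULL;
}
```

Returning the option name keeps a single message template usable for both cases, in the same spirit as the `%s=TRUE` format string in the diff.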

man/fread.Rd

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,7 @@ showProgress=getOption("datatable.showProgress", interactive()),
2323
data.table=getOption("datatable.fread.datatable", TRUE),
2424
nThread=getDTthreads(verbose),
2525
logical01=getOption("datatable.logical01", FALSE), # due to change to TRUE; see NEWS
26+
logicalYN=getOption("datatable.logicalYN", FALSE),
2627
keepLeadingZeros = getOption("datatable.keepLeadingZeros", FALSE),
2728
yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC"
2829
)
@@ -61,6 +62,7 @@ yaml=FALSE, autostart=NA, tmpdir=tempdir(), tz="UTC"
6162
\item{data.table}{ TRUE returns a \code{data.table}. FALSE returns a \code{data.frame}. The default for this argument can be changed with \code{options(datatable.fread.datatable=FALSE)}.}
6263
\item{nThread}{The number of threads to use. Experiment to see what works best for your data on your hardware.}
6364
\item{logical01}{If TRUE a column containing only 0s and 1s will be read as logical, otherwise as integer.}
65+
\item{logicalYN}{If TRUE a column containing only Ys and Ns will be read as logical, otherwise as character.}
6466
\item{keepLeadingZeros}{If TRUE a column containing numeric data with leading zeros will be read as character, otherwise leading zeros will be removed and converted to numeric.}
6567
\item{yaml}{ If \code{TRUE}, \code{fread} will attempt to parse (using \code{\link[yaml]{yaml.load}}) the top of the input as YAML, and further to glean parameters relevant to improving the performance of \code{fread} on the data itself. The entire YAML section is returned as parsed into a \code{list} in the \code{yaml_metadata} attribute. See \code{Details}. }
6668
\item{autostart}{ Deprecated and ignored with warning. Please use \code{skip} instead. }

src/data.table.h

Lines changed: 1 addition & 1 deletion
@@ -294,7 +294,7 @@ SEXP setcharvec(SEXP, SEXP, SEXP);
294294
SEXP chmatch_R(SEXP, SEXP, SEXP);
295295
SEXP chmatchdup_R(SEXP, SEXP, SEXP);
296296
SEXP chin_R(SEXP, SEXP);
297-
SEXP freadR(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
297+
SEXP freadR(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
298298
SEXP fwriteR(SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP, SEXP);
299299
SEXP rbindlist(SEXP, SEXP, SEXP, SEXP, SEXP);
300300
SEXP setlistelt(SEXP, SEXP, SEXP);

src/fread.c

Lines changed: 24 additions & 3 deletions
@@ -75,8 +75,9 @@ static freadMainArgs args = {0}; // global for use by DTPRINT; static implies =
7575
static int mmp_fd = -1;
7676
#endif
7777

78-
const char typeName[NUMTYPE][10] = {"drop", "bool8", "bool8", "bool8", "bool8", "bool8", "int32", "int64", "float64", "float64", "float64", "int32", "float64", "string"};
79-
int8_t typeSize[NUMTYPE] = { 0, 1, 1, 1, 1, 1, 4, 8, 8, 8, 8, 4, 8 , 8 };
78+
// See header for more explanation.
79+
const char typeName[NUMTYPE][10] = {"drop", "bool8", "bool8", "bool8", "bool8", "bool8", "bool8", "int32", "int64", "float64", "float64", "float64", "int32", "float64", "string"};
80+
int8_t typeSize[NUMTYPE] = { 0, 1, 1, 1, 1, 1, 1, 4, 8, 8, 8, 8, 4, 8 , 8 };
8081

8182
// In AIX, NAN and INFINITY don't qualify as constant literals. Refer: PR #3043
8283
// So we assign them through below init function.
@@ -1154,6 +1155,21 @@ static void parse_bool_lowercase(FieldParseContext *ctx)
11541155
}
11551156
}
11561157

1158+
/* Parse Y | y | N | n as boolean */
1159+
static void parse_bool_yesno(FieldParseContext *ctx)
1160+
{
1161+
const char *ch = *(ctx->ch);
1162+
int8_t *target = (int8_t*) ctx->targets[sizeof(int8_t)];
1163+
if (ch[0] == 'Y' || ch[0] == 'y') {
1164+
*target = 1;
1165+
*(ctx->ch) = ch + 1;
1166+
} else if (ch[0] == 'N' || ch[0] == 'n') {
1167+
*target = 0;
1168+
*(ctx->ch) = ch + 1;
1169+
} else {
1170+
*target = NA_BOOL8;
1171+
}
1172+
}
11571173

11581174
/* How to register a new parser
11591175
* (1) Write the parser
@@ -1170,6 +1186,7 @@ static reader_fun_t fun[NUMTYPE] = {
11701186
(reader_fun_t) &parse_bool_uppercase,
11711187
(reader_fun_t) &parse_bool_titlecase,
11721188
(reader_fun_t) &parse_bool_lowercase,
1189+
(reader_fun_t) &parse_bool_yesno,
11731190
(reader_fun_t) &StrtoI32,
11741191
(reader_fun_t) &StrtoI64,
11751192
(reader_fun_t) &parse_double_regular,
@@ -1326,7 +1343,9 @@ int freadMain(freadMainArgs _args) {
13261343
strcmp(ch,"True")==0 || strcmp(ch,"False")==0)
13271344
STOP(_("freadMain: NAstring <<%s>> is recognized as type boolean, this is not permitted."), ch);
13281345
if ((strcmp(ch,"1")==0 || strcmp(ch,"0")==0) && args.logical01)
1329-
STOP(_("freadMain: NAstring <<%s>> and logical01=TRUE, this is not permitted."), ch);
1346+
STOP(_("freadMain: NAstring <<%s>> and %s=TRUE, this is not permitted."), ch, "logical01");
1347+
if ((strcmp(ch,"Y")==0 || strcmp(ch,"N")==0) && args.logicalYN)
1348+
STOP(_("freadMain: NAstring <<%s>> and %s=TRUE, this is not permitted."), ch, "logicalYN");
13301349
char *end;
13311350
errno = 0;
13321351
(void)strtod(ch, &end); // careful not to let "" get to here as strtod considers "" numeric
@@ -1335,6 +1354,7 @@ int freadMain(freadMainArgs _args) {
13351354
nastr++;
13361355
}
13371356
disabled_parsers[CT_BOOL8_N] = !args.logical01;
1357+
disabled_parsers[CT_BOOL8_Y] = !args.logicalYN;
13381358
disabled_parsers[CT_ISO8601_DATE] = disabled_parsers[CT_ISO8601_TIME] = args.oldNoDateTime; // temporary new option in v1.13.0; see NEWS
13391359
if (verbose) {
13401360
if (*NAstrings == NULL) {
@@ -1353,6 +1373,7 @@ int freadMain(freadMainArgs _args) {
13531373
if (args.skipString) DTPRINT(_(" skip to string = <<%s>>\n"), args.skipString);
13541374
DTPRINT(_(" show progress = %d\n"), args.showProgress);
13551375
DTPRINT(_(" 0/1 column will be read as %s\n"), args.logical01? "boolean" : "integer");
1376+
DTPRINT(_(" Y/N column will be read as %s\n"), args.logicalYN? "boolean" : "character");
13561377
}
13571378
if (*NAstrings==NULL || // user sets na.strings=NULL
13581379
(**NAstrings=='\0' && *(NAstrings+1)==NULL)) { // user sets na.strings=""
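
To try the new parser's branching outside of `fread`, here is a self-contained sketch in C. `MiniCtx` is a hypothetical, pared-down stand-in for `FieldParseContext`, `NA_BOOL8_SKETCH` is an assumed sentinel standing in for `NA_BOOL8`, and `parse_yesno_field` is an illustrative helper that assumes the surrounding reader rejects a value when the field does not end right after the parsed character.

```c
#include <stddef.h>
#include <stdint.h>

#define NA_BOOL8_SKETCH INT8_MIN  /* assumed sentinel; the real NA_BOOL8 is defined in fread.h */

/* Hypothetical, pared-down stand-in for FieldParseContext. */
typedef struct {
  const char **ch;  /* cursor into the input, advanced on success */
  int8_t *target;   /* where the parsed value is written */
} MiniCtx;

/* Same branching as parse_bool_yesno in the diff. */
static void mini_parse_bool_yesno(MiniCtx *ctx)
{
  const char *ch = *(ctx->ch);
  if (ch[0] == 'Y' || ch[0] == 'y') {
    *ctx->target = 1;
    *(ctx->ch) = ch + 1;
  } else if (ch[0] == 'N' || ch[0] == 'n') {
    *ctx->target = 0;
    *(ctx->ch) = ch + 1;
  } else {
    *ctx->target = NA_BOOL8_SKETCH;
  }
}

/* Parse one field that must end at `sep` or at the end of the string.
 * Returns 1 or 0 for Y or N, and the NA sentinel otherwise, including
 * inputs such as "Yes" where text remains after the first character. */
int8_t parse_yesno_field(const char *field, char sep)
{
  const char *cur = field;
  int8_t out = NA_BOOL8_SKETCH;
  MiniCtx ctx = { &cur, &out };
  mini_parse_bool_yesno(&ctx);
  if (out != NA_BOOL8_SKETCH && *cur != sep && *cur != '\0') return NA_BOOL8_SKETCH;
  return out;
}
```

The parser itself only consumes one character; it is the end-of-field check in the helper that keeps `Yes` or `No` from being read as boolean, consistent with the NEWS note that only `Y` and `N` are checked.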

src/fread.h

Lines changed: 14 additions & 1 deletion
@@ -17,6 +17,12 @@
1717
#endif
1818

1919
// Ordered hierarchy of types
20+
// Each of these corresponds to a parser; they must be ordered "preferentially", i.e., if the same
21+
// input could be validly parsed as both types t1 and t2, and we "prefer" type t1, t1 must come
22+
// before t2. Most commonly, we prefer types using less storage. For example, characters '1.34'
23+
// in a file could be double, complex, or string. We prefer double, which uses only 8 bytes.
24+
// Similarly, '1234' could be integer, double, integer64, complex, or string. We prefer integer,
25+
// which uses only 4 bytes.
2026
typedef enum {
2127
NEG = -1, // dummy to force signed type; sign bit used for out-of-sample type bump management
2228
CT_DROP = 0, // skip column requested by user; it is navigated as a string column with the prevailing quoteRule
@@ -25,6 +31,7 @@ typedef enum {
2531
CT_BOOL8_U,
2632
CT_BOOL8_T,
2733
CT_BOOL8_L,
34+
CT_BOOL8_Y, // Y/N-as-bool
2835
CT_INT32, // int32_t
2936
CT_INT64, // int64_t
3037
CT_FLOAT64, // double (64-bit IEEE 754 float)
@@ -38,8 +45,10 @@ typedef enum {
3845

3946
#define IS_DEC_TYPE(x) ((x) == CT_FLOAT64 || (x) == CT_FLOAT64_EXT || (x) == CT_ISO8601_TIME) // types where dec matters
4047

41-
extern int8_t typeSize[NUMTYPE];
48+
// Used to govern when coercion is allowed. We cannot coerce to a "lower" type, unless it has the same typeName.
4249
extern const char typeName[NUMTYPE][10];
50+
extern int8_t typeSize[NUMTYPE];
51+
4352
extern const long double pow10lookup[301];
4453
extern const uint8_t hexdigits[256];
4554

@@ -149,6 +158,10 @@ typedef struct freadMainArgs
149158
// will become integer.
150159
bool logical01;
151160

161+
// If true, then column of Ns and Ys will be read as logical, otherwise it
162+
// will become character.
163+
bool logicalYN;
164+
152165
bool keepLeadingZeros;
153166

154167
// should datetime with no Z or UTZ-offset be read as UTC?
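
The ordered hierarchy described in the header comment, together with `disabled_parsers[CT_BOOL8_Y] = !args.logicalYN`, can be sketched as a loop that tries each enabled parser in order and falls through to string. This is an illustrative model only: the three-entry `MiniType` enum, the `try_*` parsers, and `detect_type` are hypothetical and far simpler than the real type-bumping logic in `fread.c`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical three-step hierarchy; the real enum in fread.h has many more entries. */
typedef enum { T_BOOL_YN = 0, T_INT32, T_STRING, T_COUNT } MiniType;

/* Each parser returns a pointer just past what it consumed, or NULL on failure. */
static const char *try_bool_yn(const char *s)
{
  return (*s == 'Y' || *s == 'y' || *s == 'N' || *s == 'n') ? s + 1 : NULL;
}

static const char *try_int32(const char *s)
{
  const char *p = s;
  if (*p == '-' || *p == '+') p++;
  if (*p < '0' || *p > '9') return NULL;
  while (*p >= '0' && *p <= '9') p++;
  return p;
}

static const char *try_string(const char *s)
{
  while (*s) s++;
  return s;
}

typedef const char *(*mini_parser)(const char *);
static const mini_parser parsers[T_COUNT] = { try_bool_yn, try_int32, try_string };

/* Return the first enabled type, at or above `start`, whose parser consumes the whole field. */
MiniType detect_type(const char *field, MiniType start, const bool disabled[T_COUNT])
{
  for (int t = start; t < T_STRING; t++) {
    if (disabled[t]) continue;              /* parser switched off by an option */
    const char *end = parsers[t](field);
    if (end != NULL && *end == '\0') return (MiniType)t;
  }
  return T_STRING;                          /* string accepts anything */
}

/* Convenience wrapper mirroring disabled_parsers[CT_BOOL8_Y] = !args.logicalYN. */
MiniType detect_with_option(const char *field, bool logicalYN)
{
  bool disabled[T_COUNT] = { false };
  disabled[T_BOOL_YN] = !logicalYN;
  return detect_type(field, T_BOOL_YN, disabled);
}
```

Placing the Y/N parser ahead of the integer and string parsers is what lets a `Y`/`N` column resolve to a one-byte boolean before falling back to a wider type, and disabling it makes the same column fall through to string.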
