NEWS.md
3 additions & 1 deletion
@@ -22,7 +22,9 @@
5. `transpose` gains `list.cols=` argument, [#5639](https://github.com/Rdatatable/data.table/issues/5639). Use this to return output with list columns and avoid type promotion (an exception is `factor` columns which are promoted to `character` for consistency between `list.cols=TRUE` and `list.cols=FALSE`). This is convenient for creating a row-major representation of a table. Thanks to @MLopez-Ibanez for the request, and Benjamin Schwendinger for the PR.
-4. Using `dt[, names(.SD) := lapply(.SD, fx)]` now works, [#795](https://github.com/Rdatatable/data.table/issues/795) -- one of our [most-requested issues (see #3189)](https://github.com/Rdatatable/data.table/issues/3189). Thanks to @brodieG for the report, 20 or so others for chiming in, and @ColeMiller1 for PR.
+6. Using `dt[, names(.SD) := lapply(.SD, fx)]` now works, [#795](https://github.com/Rdatatable/data.table/issues/795) -- one of our [most-requested issues (see #3189)](https://github.com/Rdatatable/data.table/issues/3189). Thanks to @brodieG for the report, 20 or so others for chiming in, and @ColeMiller1 for PR.
+
+7. `fread`'s `fill` argument now also accepts an `integer` in addition to boolean values. `fread` always guesses the number of columns based on reading a sample of rows in the file. When `fill=TRUE`, `fread` stops reading and ignores subsequent rows when this estimate winds up too low, e.g. when the sampled rows happen to exclude some rows that are even wider, [#2727](https://github.com/Rdatatable/data.table/issues/2727) [#2691](https://github.com/Rdatatable/data.table/issues/2691) [#4130](https://github.com/Rdatatable/data.table/issues/4130) [#3436](https://github.com/Rdatatable/data.table/issues/3436). Providing an `integer` as argument for `fill` allows for a manual estimate of the number of columns instead, [#1812](https://github.com/Rdatatable/data.table/issues/1812) [#5378](https://github.com/Rdatatable/data.table/issues/5378). Thanks to @jangorecki, @christellacaze, @Yiguan, @alexdthomas, @ibombonato, @Befrancesco, @TobiasGold for reporting/requesting, and Benjamin Schwendinger for the PR.
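The failure mode described in item 7 can be illustrated with a small, simplified C model. This is only a sketch and not `fread`'s actual code: the helpers `count_fields`, `estimate_ncol`, and `first_too_wide` are hypothetical names, quoting is ignored, and the "sample" is assumed to be simply the first few rows. It shows how a column estimate taken from a sample can be too low for a later, wider row, and how a caller-supplied upper bound avoids that.

```c
#include <stddef.h>

/* Hypothetical helper: count separator-delimited fields in one line.
   No quote handling; an empty line counts as one (empty) field. */
int count_fields(const char *line, char sep) {
  int n = 1;
  for (const char *p = line; *p != '\0'; ++p) {
    if (*p == sep) ++n;
  }
  return n;
}

/* Hypothetical helper: widest row among the first `nsample` lines.
   This plays the role of the automatic column estimate. */
int estimate_ncol(const char *const *lines, size_t nlines, size_t nsample) {
  int widest = 0;
  for (size_t i = 0; i < nlines && i < nsample; ++i) {
    int n = count_fields(lines[i], ',');
    if (n > widest) widest = n;
  }
  return widest;
}

/* Hypothetical helper: index of the first line that has more than `ncol`
   fields, or `nlines` if every row fits within `ncol` columns. */
size_t first_too_wide(const char *const *lines, size_t nlines, int ncol) {
  for (size_t i = 0; i < nlines; ++i) {
    if (count_fields(lines[i], ',') > ncol) return i;
  }
  return nlines;
}
```

In this model, a row wider than the sampled estimate is where reading would stop early, whereas passing a large enough bound (the analogue of an integer `fill`) lets every row fit.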
+test(2251.05, fread(text = dt_str, fill=5L, verbose=TRUE), ans, output="Provided number of fill columns: 5 but only found 3\n Dropping 2 overallocated columns") # user guess slightly too big
+test(2251.06, fread(text = dt_str, fill=1000L), ans) # user guess much too big
+lines = c(
+  "12223, University",
+  "12227, bridge, Sky",
+  "12828, Sunset",
+  "13801, Ground",
+  "14853, Tranceamerica",
+  "14854, San Francisco",
+  "15595, shibuya, Shrine",
+  "16126, fog, San Francisco",
+  "16520, California, ocean, summer, golden gate, beach, San Francisco",
\item{encoding}{ default is \code{"unknown"}. Other possible options are \code{"UTF-8"} and \code{"Latin-1"}. Note: it is not used to re-encode the input, rather enables handling of encoded strings in their native encoding. }
\item{quote}{ By default (\code{"\""}), if a field starts with a double quote, \code{fread} handles embedded quotes robustly as explained under \code{Details}. If it fails, then another attempt is made to read the field \emph{as is}, i.e., as if quotes are disabled. By setting \code{quote=""}, the field is always read as if quotes are disabled. It is not expected to ever need to pass anything other than \"\" to quote; i.e., to turn it off. }
\item{strip.white}{ default is \code{TRUE}. Strips leading and trailing whitespaces of unquoted fields. If \code{FALSE}, only header trailing spaces are removed. }
-\item{fill}{logical (default is \code{FALSE}). If \code{TRUE} then in case the rows have unequal length, blank fields are implicitly filled.}
+\item{fill}{logical or integer (default is \code{FALSE}). If \code{TRUE} then in case the rows have unequal length, number of columns is estimated and blank fields are implicitly filled. If an integer is provided it is used as an upper bound for the number of columns. }
\item{blank.lines.skip}{\code{logical}, default is \code{FALSE}. If \code{TRUE} blank lines in the input are ignored.}
\item{key}{Character vector of one or more column names which is passed to \code{\link{setkey}}. It may be a single comma separated string such as \code{key="x,y,z"}, or a vector of names such as \code{key=c("x","y","z")}. Only valid when argument \code{data.table=TRUE}. Where applicable, this should refer to column names given in \code{col.names}. }
\item{index}{ Character vector or list of character vectors of one or more column names which is passed to \code{\link{setindexv}}. As with \code{key}, comma-separated notation like \code{index="x,y,z"} is accepted for convenience. Only valid when argument \code{data.table=TRUE}. Where applicable, this should refer to column names given in \code{col.names}. }
@@ -2611,8 +2636,13 @@ int freadMain(freadMainArgs _args) {
else {
  ch=headPos;
  int tt = countfields(&ch);
-  DTWARN(_("Stopped early on line %"PRIu64". Expected %d fields but found %d. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<%s>>"),
+  if (fill>0) {
+    DTWARN(_("Stopped early on line %"PRIu64". Expected %d fields but found %d. Consider fill=%d or even more based on your knowledge of the input file. First discarded non-empty line: <<%s>>"),
DTWARN(_("Stopped early on line %"PRIu64". Expected %d fields but found %d. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<%s>>"),