vignettes/datatable-fread-and-fwrite.Rmd
Lines changed: 4 additions & 42 deletions
@@ -146,45 +146,7 @@ All detection logic and any rereads are detailed when `verbose=TRUE` is enabled.
146
146
147
147
Because the large sample explicitly includes the very end of the file, critical issues—such as an inconsistent number of columns, a malformed footer, or an opening quote without a matching closing quote—can be detected and reported almost instantly. This early error detection avoids the unnecessary overhead of processing the entire file or allocating excessive memory, only to encounter a failure at the final step. It ensures faster feedback and more efficient resource usage, especially when working with large datasets.
148
148
149
-
### 1.5 Reading SQL Insert Scripts
150
-
151
-
`fread()` doesn't directly support SQL `INSERT` scripts, but they can be processed via command-line tools. For example, given `insert_script.sql`:
152
-
153
-
```
154
-
INSERT INTO tbl VALUES (1, 'asd', 923123123, 'zx');
155
-
INSERT INTO tbl VALUES (1, NULL, 923123123, 'zxz');
156
-
INSERT INTO tbl VALUES (3, 'asd3', 923123123, NULL);
157
-
```
158
-
159
-
Use this command in R [link](https://stackoverflow.com/questions/32026398/transform-sql-insert-script-into-csv-format):
160
-
161
-
```{r}
162
-
# 1. Create the SQL file programmatically
163
-
writeLines(
164
-
c(
165
-
"INSERT INTO tbl VALUES (1, 'asd', 923123123, 'zx');",
166
-
"INSERT INTO tbl VALUES (1, NULL, 923123123, 'zxz');",
167
-
"INSERT INTO tbl VALUES (3, 'asd3', 923123123, NULL);"
168
-
),
169
-
"insert_script.sql"
170
-
)
171
-
sql_lines = readLines("insert_script.sql")
172
-
173
-
values_only = gsub("INSERT INTO tbl VALUES \\((.*)\\);", "\\1", sql_lines)
- The `gsub()` function in R transforms each INSERT line into a comma-separated list of its values.
182
-
183
-
-`na.strings = "NULL"` in fread is crucial: it tells fread to interpret the literal string `"NULL"` (output by awk for SQL NULLs) as R's NA value.
184
-
185
-
- Quoted strings (e.g., 'asd') are preserved and read as character by `fread`.
186
-
187
-
### 1.6 `integer64` Support
149
+
### 1.5 `integer64` Support
188
150
189
151
By default, `fread` detects integers larger than 2<sup>31</sup> and reads them as `bit64::integer64` to preserve full precision. This behavior can be overridden in three ways:
190
152
@@ -210,7 +172,7 @@ options(datatable.integer64 = "double") # Example: set globally to "double"
210
172
getOption("datatable.integer64")
211
173
```
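As a complement to the global option shown above, the same choice can be made for a single call. The following is a minimal sketch, assuming the per-call `integer64` argument of `fread()`; the inline data and the names `big`, `as_chr`, and `as_dbl` are hypothetical and only serve as illustration.

```r
library(data.table)

# Hypothetical inline data: the id values exceed the 32-bit integer range
big = c("id,n", "9007199254740993,1", "9007199254740994,2")

# Override the default for one call only, leaving the global option untouched
as_chr = fread(text = big, integer64 = "character")  # keep every digit as text
as_dbl = fread(text = big, integer64 = "double")     # may lose precision above 2^53

class(as_chr$id)
class(as_dbl$id)
```

Reading as `"character"` keeps the digits exactly, while `"double"` trades exactness for a plain numeric column.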
212
174
213
-
### 1.7 Drop or Select Columns by Name or Position
175
+
### 1.6 Drop or Select Columns by Name or Position
214
176
215
177
To save memory and improve performance, use `fread()`'s `select` or `drop` arguments to read only the columns you need.
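A minimal sketch of both arguments, using a hypothetical three-column CSV supplied inline through `text=`; the column names and the variables `by_name` and `by_position` are illustrative assumptions, not part of the vignette.

```r
library(data.table)

# Hypothetical three-column CSV, passed inline instead of from a file
csv_lines = c("id,name,score", "1,a,0.5", "2,b,0.7")

# Keep only the listed columns, identified by name
by_name = fread(text = csv_lines, select = c("id", "score"))

# Drop the second column, identified by position
by_position = fread(text = csv_lines, drop = 2)

names(by_name)
names(by_position)
```

Both calls should leave the same two columns, `id` and `score`; the `name` column is not kept in either result.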
216
178
@@ -225,11 +187,11 @@ Key points:
225
187
226
188
For details, see the manual page by running `?fread` in R.
227
189
228
-
### 1.8 Skip to a Sub-Table’s Header Row Using a Column Name Substring
190
+
### 1.7 Skip to a Sub-Table’s Header Row Using a Column Name Substring
229
191
230
192
Use `skip="string"` in `fread` to search for a line containing a substring (typically from the column names, e.g., `skip="Date"`). Reading begins at the first matching line. This is useful for skipping metadata or selecting sub-tables in multi-table files. This feature is inspired by the `read.xls` function in the gdata package.
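The behaviour described above can be sketched as follows, assuming a hypothetical file whose first two lines are metadata; the contents of `raw_lines` are invented for illustration.

```r
library(data.table)

# Hypothetical file contents: two metadata lines precede the real header row
raw_lines = c(
  "Exported by a hypothetical tool",
  "Units: arbitrary",
  "Date,Value",
  "2024-01-01,10",
  "2024-01-02,20"
)

# Start reading at the first line that contains the substring "Date"
dt = fread(text = raw_lines, skip = "Date")

names(dt)
nrow(dt)
```

The search is a plain substring match, so the chosen string should not also appear in the lines that are meant to be skipped.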
`fread` automatically detects how quotes are escaped—including doubled ("") or backslash-escaped (\") quotes—without requiring user input. This is determined using a large sample of the data (see point 3), and validated against the entire file.