Commit ef4aa63

committed
<- by =
1 parent 66a5ea8 commit ef4aa63

1 file changed

+26
-42
lines changed

vignettes/datatable-fread-and-fwrite.Rmd

Lines changed: 26 additions & 42 deletions
@@ -1,5 +1,5 @@
11
---
2-
title: "fread() and fwrite()"
2+
title: "Fast read and fast write"
33
date: "`r Sys.Date()`"
44
output: rmarkdown::html_vignette # <--- Changed
55
vignette: >
@@ -46,8 +46,8 @@ HEADER: Yet more
4646
4747
library(data.table)
4848
49-
all_lines <- readLines("example_data.txt")
50-
data_lines <- grep("HEADER", all_lines, value = TRUE, invert = TRUE)
49+
all_lines = readLines("example_data.txt")
50+
data_lines = grep("HEADER", all_lines, value = TRUE, invert = TRUE)
5151
fread(text = data_lines)
5252
5353
file.remove("example_data.txt")
@@ -65,44 +65,28 @@ On Windows we recommend [Cygwin](https://www.cygwin.com/) (run one .exe to insta
6565
)`.
6666

6767
```{r}
68-
my_data_string <- "colA,colB,colC\n1,apple,TRUE\n2,banana,FALSE\n3,orange,TRUE"
69-
dt_from_text <- fread(text = my_data_string)
68+
my_data_string = "colA,colB,colC\n1,apple,TRUE\n2,banana,FALSE\n3,orange,TRUE"
69+
dt_from_text = fread(text = my_data_string)
7070
print(dt_from_text)
7171
```
7272

73-
#### 1.1.2 **Reading from R connections and URLs**
73+
#### 1.1.2 **Reading from URLs**
7474

75-
`fread()` is highly versatile and can accept R connection objects as input to its file (or input) argument. This allows you to read from various sources, including:
75+
`fread()` can read data directly from web URLs by passing the URL as a character string to its `file` argument. This allows you to download and read data from the internet in one step.
7676

77-
- Web URLs: Directly download and read data from the internet.
78-
79-
- Compressed files: Using connections like gzfile(), bzfile(), xzfile().
80-
81-
- Other connection types: Any R connection that supports text reading.
77+
```{r}
78+
# dt = fread("https://people.sc.fsu.edu/~jburkardt/data/csv/airtravel.csv")
79+
# print(dt)
80+
```
8281

8382
#### 1.1.3 **Automatic decompression of compressed files**
8483

85-
In many cases, `fread()` can automatically detect and decompress files with common compression extensions directly, without needing an explicit connection object like gzfile() or using shell commands. This works by checking the file extension.
86-
Supported extensions typically include:
87-
`.gz` (gzip)
88-
`.bz2` (bzip2)
89-
`.xz` (xz)
90-
`.zip` (ZIP archives - `fread` will attempt to read the first file within the `.zip` archive if no specific file inside the archive is specified using the file#entry syntax, or if the `.zip` contains only one file. For more control over which file to read from a multi-file zip, see `?fread` examples for file#entry or use unzip first).
91-
This functionality may rely on the `R.utils` package or system tools being available for some formats.
92-
93-
```{r}
94-
# 1. Create a sample data.table and write it to a gzipped CSV
95-
set.seed(123)
96-
original_dt <- data.table(A = 1:5, B = runif(5))
97-
fwrite(original_dt, "my_data_auto.csv.gz") # fwrite can also write directly to compressed files
98-
99-
# 2. fread automatically decompresses and reads it
100-
dt_auto_decompressed <- fread("my_data_auto.csv.gz")
101-
print(dt_auto_decompressed)
84+
In many cases, `fread()` can automatically detect and decompress files with common compression extensions directly, without needing an explicit connection object or shell commands. This works by checking the file extension.
10285

103-
# Clean up
104-
file.remove("my_data_auto.csv.gz")
105-
```
86+
**Supported extensions typically include:**
87+
- `.gz` (gzip): Supported and works out of the box.
88+
- `.xz` (xz): Supported and works out of the box.
89+
- `.zip` (ZIP archives, single file): Supported—`fread()` will read the first file in the archive if only one file is present.
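
Since this change drops the earlier gzip example, the extension-based behaviour described above can be illustrated with a minimal round-trip sketch. The file name and column names here are hypothetical. The sketch also assumes that reading `.gz` input may need the `R.utils` package on some installations, so that step is guarded rather than assumed to work everywhere.

```r
library(data.table)

# Hypothetical temporary file; the .gz suffix is what triggers compression handling
gz_path = tempfile(fileext = ".csv.gz")
original_dt = data.table(id = 1:5, label = letters[1:5])

# With the default compress = "auto", fwrite() gzips output when the name ends in .gz
fwrite(original_dt, gz_path)

# Assumption: fread() may rely on R.utils to decompress .gz input, so guard the read
if (requireNamespace("R.utils", quietly = TRUE)) {
  roundtrip_dt = fread(gz_path)
  print(all.equal(original_dt, roundtrip_dt))
}

# Clean up the temporary file
unlink(gz_path)
```

If `R.utils` is not installed, the sketch still writes and removes the compressed file but skips the read step.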
10690

10791
### 1.2 **Automatic separator and skip detection**
10892

@@ -182,11 +166,11 @@ c(
182166
),
183167
"insert_script.sql"
184168
)
185-
sql_lines <- readLines("insert_script.sql")
169+
sql_lines = readLines("insert_script.sql")
186170
187-
values_only <- gsub("INSERT INTO tbl VALUES \\((.*)\\);", "\\1", sql_lines)
171+
values_only = gsub("INSERT INTO tbl VALUES \\((.*)\\);", "\\1", sql_lines)
188172
189-
dt_sql <- fread(text = values_only, na.strings = "NULL")
173+
dt_sql = fread(text = values_only, na.strings = "NULL")
190174
print(dt_sql)
191175
192176
file.remove("insert_script.sql")
@@ -275,12 +259,12 @@ When data is written as strings (either inherently, like character columns, or b
275259
**Bypassed for Direct Numeric Output**: If specific columns are written as their underlying numeric types (e.g., via `dateTimeAs="epoch"` for POSIXct, or if a user pre-converts Date to integer), then quoting logic is naturally bypassed for those numeric fields, contributing to efficiency.
276260

277261
```{r}
278-
dt_quoting_scenario <- data.table(
262+
dt_quoting_scenario = data.table(
279263
text_field = c("Contains,a,comma", "Contains \"a quote\"", "Contains\na newline", "Clean_text"),
280264
epoch_time = as.numeric(as.POSIXct("2023-10-26 14:35:00", tz = "GMT") + 1:4), # Already numeric
281265
numeric_field = 1:4
282266
)
283-
temp_quote_adv <- tempfile(fileext = ".csv")
267+
temp_quote_adv = tempfile(fileext = ".csv")
284268
285269
# Quoting applies to text_field, but not to epoch_time or numeric_field
286270
fwrite(dt_quoting_scenario, temp_quote_adv)
@@ -296,11 +280,11 @@ Offers precise control for POSIXct/Date types:
296280
- `dateTimeAs="epoch"`: POSIXct as seconds since epoch (numeric).
297281

298282
```{r}
299-
dt_timestamps <- data.table(
283+
dt_timestamps = data.table(
300284
ts = as.POSIXct("2023-10-26 14:35:45.123456", tz = "GMT"),
301285
dt = as.Date("2023-11-15")
302286
)
303-
temp_dt_iso <- tempfile(fileext = ".csv")
287+
temp_dt_iso = tempfile(fileext = ".csv")
304288
fwrite(dt_timestamps, temp_dt_iso, dateTimeAs = "ISO")
305289
cat(readLines(temp_dt_iso), sep = "\n")
306290
unlink(temp_dt_iso)
@@ -314,8 +298,8 @@ unlink(temp_dt_iso)
314298

315299
```{r}
316300
if (requireNamespace("bit64", quietly = TRUE)) {
317-
dt_i64 <- data.table(uid = bit64::as.integer64("1234567890123456789"), val = 100)
318-
temp_i64_out <- tempfile(fileext = ".csv")
301+
dt_i64 = data.table(uid = bit64::as.integer64("1234567890123456789"), val = 100)
302+
temp_i64_out = tempfile(fileext = ".csv")
319303
fwrite(dt_i64, temp_i64_out)
320304
cat(readLines(temp_i64_out), sep = "\n")
321305
@@ -328,7 +312,7 @@ if (requireNamespace("bit64", quietly = TRUE)) {
328312
To control the order and subset of columns written to file, subset the data.table before calling `fwrite()`. The `col.names` argument in `fwrite()` is a logical (TRUE/FALSE) that controls whether the header row is written, not which columns are written.
329313

330314
```{r}
331-
dt <- data.table(A = 1:3, B = 4:6, C = 7:9)
315+
dt = data.table(A = 1:3, B = 4:6, C = 7:9)
332316
333317
# Write only columns C and A, in that order
334318
fwrite(dt[, .(C, A)], "out.csv")
