-#### 1.1.2 **Reading from R connections and URLs**
+#### 1.1.2 **Reading from URLs**

-`fread()` is highly versatile and can accept R connection objects as input to its file (or input) argument. This allows you to read from various sources, including:
+`fread()` can read data directly from web URLs by passing the URL as a character string to its `file` argument. This allows you to download and read data from the internet in one step.

-- Web URLs: Directly download and read data from the internet.
-
-- Compressed files: Using connections like gzfile(), bzfile(), xzfile().
-
-- Other connection types: Any R connection that supports text reading.
#### 1.1.3 **Automatic decompression of compressed files**

-In many cases, `fread()` can automatically detect and decompress files with common compression extensions directly, without needing an explicit connection object like gzfile() or using shell commands. This works by checking the file extension.
-Supported extensions typically include:
-`.gz` (gzip)
-`.bz2` (bzip2)
-`.xz` (xz)
-`.zip` (ZIP archives - `fread` will attempt to read the first file within the `.zip` archive if no specific file inside the archive is specified using the file#entry syntax, or if the `.zip` contains only one file. For more control over which file to read from a multi-file zip, see `?fread` examples for file#entry or use unzip first).
-This functionality may rely on the `R.utils` package or system tools being available for some formats.
-
-```{r}
-# 1. Create a sample data.table and write it to a gzipped CSV
-set.seed(123)
-original_dt <- data.table(A = 1:5, B = runif(5))
-fwrite(original_dt, "my_data_auto.csv.gz") # fwrite can also write directly to compressed files
-
-# 2. fread automatically decompresses and reads it
+In many cases, `fread()` can automatically detect and decompress files with common compression extensions directly, without needing an explicit connection object or shell commands. This works by checking the file extension.

-# Clean up
-file.remove("my_data_auto.csv.gz")
-```
+**Supported extensions typically include:**
+- `.gz` (gzip): Supported and works out of the box.
+- `.xz` (xz): Supported and works out of the box.
+- `.zip` (ZIP archives, single file): Supported. `fread()` will read the first file in the archive if only one file is present.

### 1.2 **Automatic separator and skip detection**
@@ -182,11 +166,11 @@ c(
),
"insert_script.sql"
)
-sql_lines <- readLines("insert_script.sql")
+sql_lines = readLines("insert_script.sql")

-values_only <- gsub("INSERT INTO tbl VALUES \\((.*)\\);", "\\1", sql_lines)
+values_only = gsub("INSERT INTO tbl VALUES \\((.*)\\);", "\\1", sql_lines)
@@ -275,12 +259,12 @@ When data is written as strings (either inherently, like character columns, or b
**Bypassed for Direct Numeric Output**: If specific columns are written as their underlying numeric types (e.g., via `dateTimeAs="epoch"` for POSIXct, or if a user pre-converts Date to integer), then quoting logic is naturally bypassed for those numeric fields, contributing to efficiency.
-dt_i64 <- data.table(uid = bit64::as.integer64("1234567890123456789"), val = 100)
-temp_i64_out <- tempfile(fileext = ".csv")
+dt_i64 = data.table(uid = bit64::as.integer64("1234567890123456789"), val = 100)
+temp_i64_out = tempfile(fileext = ".csv")
fwrite(dt_i64, temp_i64_out)
cat(readLines(temp_i64_out), sep = "\n")
@@ -328,7 +312,7 @@ if (requireNamespace("bit64", quietly = TRUE)) {
To control the order and subset of columns written to file, subset the data.table before calling `fwrite()`. The `col.names` argument in `fwrite()` is a logical (TRUE/FALSE) that controls whether the header row is written, not which columns are written.
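
As a minimal sketch of the point in the paragraph above (not part of the changed lines), the following subsets and reorders columns before calling `fwrite()`. The table `dt`, its column names, and `out_file` are hypothetical and exist only for illustration.

```r
library(data.table)

# Hypothetical example table; the column names are made up for illustration.
dt = data.table(id = 1:3, name = c("a", "b", "c"), score = c(9.5, 7.2, 8.8))

out_file = tempfile(fileext = ".csv")

# Choose and order the columns first, then write.
# col.names = TRUE only controls whether the header row is written.
fwrite(dt[, .(score, id)], out_file, col.names = TRUE)

# Read the raw lines back to inspect what was written.
written = readLines(out_file)
print(written)

# Clean up the temporary file.
invisible(file.remove(out_file))
```

Subsetting with `dt[, .(score, id)]` makes a copy containing only the selected columns, so the original table is left unchanged.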