vignettes/datatable-fread-and-fwrite.Rmd
Lines changed: 10 additions & 6 deletions
@@ -71,7 +71,8 @@ print(dt_from_text)

 #### 1.1.2 Reading from URLs

-`fread()` can read data directly from web URLs by passing the URL as a character string to its `file` argument. This allows you to download and read data from the internet in one step.
+`fread()` can read data directly from web URLs by passing the URL as a character string to its `file` argument.
+This allows you to download and read data from the internet in one step.
@@ -86,7 +87,7 @@ In many cases, `fread()` can automatically detect and decompress files with comm
 -`.gz` / `.bz2` (gzip / bzip2): Supported and works out of the box.
 -`.zip` / `.tar` (ZIP / tar archives, single file): Supported—`fread()` will read the first file in the archive if only one file is present.

-> Note: If there are multiple files in the archive, `fread()` will fail with an error.
+**Note**: If there are multiple files in the archive, `fread()` will fail with an error.

 ### 1.2 Automatic separator and skip detection
@@ -112,7 +113,7 @@ By default (`skip="auto"`), `fread` will automatically skip blank lines and comm

 ### 1.3 High-Quality Automatic Column Type Detection

-Many real-world datasets contain columns that are initially blank, zero-filled, or appear numeric but later contain characters. To handle such inconsistencies, `fread()`in `data.table`employs a robust column type detection strategy.
+Many real-world datasets contain columns that are initially blank, zero-filled, or appear numeric but later contain characters. To handle such inconsistencies, `fread()` employs a robust column type detection strategy.

 Since v1.10.5, `fread()` samples rows by reading blocks of contiguous rows from multiple equally spaced points across the file, including the start, middle, and end. The total number of rows sampled is chosen dynamically based on the file size and structure, and is typically around 10,000, but can be smaller or slightly larger. This wide sampling helps detect type changes that occur later in the data (e.g., `001` to `0A0` or blanks becoming populated).
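
The type detection described in the hunk above can be tried on a small inline input. This is a minimal sketch, assuming the `data.table` package is installed; the column names `id` and `value` are made up for the illustration. With only three rows the whole input is read, so it shows the outcome of detection rather than the sampling itself.

```r
library(data.table)

# Hypothetical inline data: `id` looks numeric on the first row ("001")
# but contains a letter on the second row ("0A0").
dt <- fread(text = "id,value\n001,10\n0A0,20\n002,30")

# Inspect the column classes that fread() settled on.
col_classes <- sapply(dt, class)
print(col_classes)
```

Passing `verbose = TRUE` to the same call prints the detection steps, which is a convenient way to check how a larger file was sampled.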
@@ -142,7 +143,9 @@ All detection logic and any rereads are detailed when `verbose=TRUE` is enabled.

 ### 1.4 Early Error Detection at End-of-File

-Because the large sample explicitly includes the very end of the file, critical issues—such as an inconsistent number of columns, a malformed footer, or an opening quote without a matching closing quote—can be detected and reported almost instantly. This early error detection avoids the unnecessary overhead of processing the entire file or allocating excessive memory, only to encounter a failure at the final step. It ensures faster feedback and more efficient resource usage, especially when working with large datasets.
+Because the large sample explicitly includes the very end of the file, critical issues—such as an inconsistent number of columns, a malformed footer, or an opening quote without a matching closing quote—can be detected and reported almost instantly.
+This early error detection avoids the unnecessary overhead of processing the entire file or allocating excessive memory, only to encounter a failure at the final step.
+It ensures faster feedback and more efficient resource usage, especially when working with large datasets.

 ### 1.5 `integer64` Support
@@ -195,7 +198,7 @@ Use `skip="string"` in `fread` to search for a line containing a substring (typi

 Supported Scenarios:
 - Unescaped quotes inside quoted fields
-e.g., `"This "quote" is invalid, but fread works anyway"` — supported as long as column count remains consistent.
+e.g., `"This "quote" is invalid, but fread works anyway"` — supported as long as column count remains consistent :

 ```{r}
 data.table::fread(text='x,y\n"This "quote" is invalid, but fread works anyway",1')
@@ -218,7 +221,8 @@ From v1.10.6, `fread` resolves ambiguities more reliably across the entire file

 ## 2. fwrite()

-`fwrite()` is the fast file writer companion to `fread()`. It’s designed for speed, sensible defaults, and ease of use, mirroring many of the conveniences found in `fread`.
+`fwrite()` is the fast file writer companion to `fread()`.
+It’s designed for speed, sensible defaults, and ease of use, mirroring many of the conveniences found in `fread`.

 ### 2.1 Intelligent and Minimalist Quoting (quote="auto")
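
The diff stops at this heading, so the following is only a rough sketch of what `quote="auto"` is expected to do, assuming the `data.table` package is installed. The table contents and the temporary file are hypothetical. The idea is that only fields that need quoting, such as one containing the separator, are wrapped in quotes, and that `fread()` can read the result back.

```r
library(data.table)

# Hypothetical data: only the second label contains the separator.
dt <- data.table(label = c("plain", "has,comma"), n = 1:2)

# Write with quote = "auto" (the default) to a temporary file,
# then read the raw lines back to see which fields were quoted.
tmp <- tempfile(fileext = ".csv")
fwrite(dt, tmp, quote = "auto")
written <- readLines(tmp)
print(written)

# Round trip: read the file back with fread() and clean up.
roundtrip <- fread(tmp)
unlink(tmp)
```

Comparing `written` with the original table shows which fields were quoted, and `all.equal(dt, roundtrip)` is a quick check that nothing was lost on the way.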