Commit 9aee7f0

committed
updating news and readme.md
1 parent c5ca2e6 commit 9aee7f0

File tree

2 files changed: +77 −70 lines


NEWS.md

Lines changed: 9 additions & 17 deletions
# Version 0.6

* removed dependency on trunc_mat (change in dplyr)

# Version 0.5.1

* Use DBI functions for db access instead of the dbplyr versions. Thanks to @hadley
* Changed the default settings stringsAsFactors to `FALSE`, compliant with R version 4.0

# Version 0.5.0

* Fix for release of dplyr 1.0.0

# Version 0.4.1

* Fix for change of default in stringsAsFactors for R 4.0
* Implementation now uses `rlang` instead of `lazyeval`
* Added `stringsAsFactors` as argument to `read_chunkwise` functions.

# Version 0.3.1

* Fix for dplyr upgrade from 0.5 to 0.6

# Version 0.2.1

* Updated tests because of `testthat` changes

# Version 0.2.0

* implemented `summarize` and `group_by` per chunk.
* fixed a bug in `head` (`n` was not working)

README.md

Lines changed: 68 additions & 53 deletions
# chunked

[![version](https://cran.r-project.org/package=chunked)](https://cran.r-project.org/package=chunked)
[![Downloads](https://cranlogs.r-pkg.org/badges/chunked)](https://cran.r-project.org/package=chunked)
[![R-CMD-check](https://github.com/edwindj/chunked/workflows/R-CMD-check/badge.svg)](https://github.com/edwindj/chunked/actions)
[![Coverage Status](https://coveralls.io/repos/edwindj/chunked/badge.svg?branch=master&service=github)](https://coveralls.io/github/edwindj/chunked?branch=master)

R is a great tool, but processing data in large text files is cumbersome. `chunked` helps you to process large text files with *dplyr* while loading only a part of the data in memory. It builds on the excellent R package [*LaF*](https://github.com/djvanderlaan/LaF).

Processing commands are written in dplyr syntax, and `chunked` (using `LaF`) will take care that the data is processed chunk by chunk, taking far less memory than otherwise. `chunked` is useful for **select**-ing columns, **mutate**-ing columns and **filter**-ing rows. It is less helpful in **group**-ing and **summarize**-ation of large text files. It can be used in data pre-processing.

## Install

chunked can be installed with

``` r
install.packages('chunked')
```

the beta version with:

``` r
install.packages('chunked', repos=c('https://cran.rstudio.com', 'https://edwindj.github.io/drat'))
```

and the development version with:

``` r
devtools::install_github('edwindj/chunked')
```

Enjoy! Feedback is welcome…

# Usage

## Text file -> process -> text file

The most common case is processing a large text file: select or add columns, filter it, and write the result back to a text file.

``` r
read_chunkwise("./large_file_in.csv", chunk_size=5000) %>%
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise("./large_file_out.csv")
```

`chunked` will process the above statement in chunks of 5000 records. This is different from, for example, `read.csv`, which reads all data into memory before processing it.

## Text file -> process -> database

Another option is to use `chunked` as a preprocessing step before adding it to a database.

``` r
con <- DBI::dbConnect(RSQLite::SQLite(), 'test.db', create=TRUE)
db <- dbplyr::src_dbi(con)

tbl <-
  ...
# tbl now points to the table in sqlite.
```

## Db -> process -> Text file

Chunked can be used to export chunkwise to a text file. Note however that in that case processing takes place in the database and the chunkwise restrictions only apply to the writing.
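This page shows no code for that direction; a minimal sketch, assuming the `db` connection from the previous section, a hypothetical table name `my_large_table`, and that `read_chunkwise` also accepts a database `tbl` (an assumption, not confirmed by this page):

``` r
# Sketch under assumptions: 'db' is the src_dbi connection from the
# previous section; "my_large_table" and the column names are hypothetical.
library(chunked)
library(dplyr)

my_tbl <- tbl(db, "my_large_table")

my_tbl %>%
  filter(col1 > 10) %>%                  # runs inside the database
  select(col1, col2) %>%                 # runs inside the database
  read_chunkwise(chunk_size = 5000) %>%  # fetch the result chunkwise
  write_chunkwise("./table_out.csv")     # only the writing is chunked
```

Because the dplyr verbs are translated to SQL, only the final write benefits from chunking here, which is exactly the restriction the paragraph above describes.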

## Lazy processing

`chunked` will not start processing until `collect` or `write_chunkwise` is called.

``` r
data_chunks <-
  read_chunkwise("./large_file_in.csv", chunk_size=5000) %>%
  select(col1, col3)

...

write_chunkwise(data_chunks, "test.csv")
# or
write_chunkwise(data_chunks, db, "test")
```
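The `collect` route mentioned above is not shown on this page; a small sketch, assuming the `data_chunks` object from the example:

``` r
# collect() triggers the deferred processing and returns a single
# in-memory data.frame -- use it only when the selected columns fit in RAM.
df <- collect(data_chunks)
head(df)
```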

Syntax completion of variables of a chunkwise file in RStudio works like a charm…

# Dplyr verbs

`chunked` implements the following dplyr verbs:

- `filter`
- `select`
- `rename`
- `mutate`
- `mutate_each`
- `transmute`
- `do`
- `tbl_vars`
- `inner_join`
- `left_join`
- `semi_join`
- `anti_join`

Since data is processed in chunks, some dplyr verbs are not implemented:

- `arrange`
- `right_join`
- `full_join`

`summarize` and `group_by` are implemented but generate a warning: they operate on each chunk and **not** on the whole data set. However, this makes it easier to process a large file, by repeatedly aggregating the resulting data.

- `summarize`
- `group_by`

``` r
tmp <- tempfile()
write.csv(iris, tmp, row.names=FALSE, quote=FALSE)
iris_cw <- read_chunkwise(tmp, chunk_size = 30) # read in chunks of 30 rows for this example
```
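A sketch of the two-step aggregation described above — per-chunk summaries followed by a second aggregation over the combined result — assuming the `iris_cw` object from the example (the continuation is not shown on this page):

``` r
library(dplyr)

# group_by/summarize act on each chunk of 30 rows and emit a warning.
per_chunk <-
  iris_cw %>%
  group_by(Species) %>%
  summarize(n = n()) %>%
  collect()

# A second aggregation over the combined per-chunk results
# yields the summary for the whole file.
per_chunk %>%
  group_by(Species) %>%
  summarize(n = sum(n))
```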

0 commit comments