# chunked

[![version](https://www.r-pkg.org/badges/version/chunked)](https://cran.r-project.org/package=chunked)
[![Downloads](https://cranlogs.r-pkg.org/badges/chunked)](https://cran.r-project.org/package=chunked)
[![R-CMD-check](https://github.com/edwindj/chunked/workflows/R-CMD-check/badge.svg)](https://github.com/edwindj/chunked/actions)
[![Coverage Status](https://coveralls.io/repos/edwindj/chunked/badge.svg?branch=master&service=github)](https://coveralls.io/github/edwindj/chunked?branch=master)

R is a great tool, but processing data in large text files is cumbersome. `chunked` helps you process large text files with *dplyr* while loading only a part of the data in memory. It builds on the excellent R package [*LaF*](https://github.com/djvanderlaan/LaF).

Processing commands are written in dplyr syntax, and `chunked` (using `LaF`) takes care of processing the file chunk by chunk, using far less memory than reading it in whole. `chunked` is useful for **select**-ing columns, **mutate**-ing columns and **filter**-ing rows. It is less helpful for **group**-ing and **summarize**-ation of large text files. It can be used in data pre-processing.

## Install

`chunked` can be installed with

```r
install.packages('chunked')
```

the beta version with:

```r
install.packages('chunked', repos = c('https://cran.rstudio.com', 'https://edwindj.github.io/drat'))
```

and the development version with:

```r
devtools::install_github('edwindj/chunked')
```

Enjoy! Feedback is welcome...

# Usage

## Text file -> process -> text file

The most common use case is processing a large text file: select or add columns, filter rows, and write the result back to a text file.

```r
read_chunkwise("./large_file_in.csv", chunk_size = 5000) %>%
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise("./large_file_out.csv")
```

`chunked` processes the above statement in chunks of 5000 records. This differs from, for example, `read.csv`, which reads all data into memory before processing it.

## Text file -> process -> database

Another option is to use `chunked` as a preprocessing step before adding data to a database.

```r
con <- DBI::dbConnect(RSQLite::SQLite(), 'test.db', create = TRUE)
db <- dbplyr::src_dbi(con)

tbl <-
  read_chunkwise("./large_file_in.csv", chunk_size = 5000) %>%
  write_chunkwise(db, 'test')
# tbl now points to the table in sqlite.
```

## Db -> process -> Text file

`chunked` can be used to export a database table chunkwise to a text file. Note however that in that case processing takes place in the database and the chunkwise restrictions apply only to the writing.

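A minimal sketch of such an export, reusing the `test` table from the previous section; the column name `col1`, the output path and the `chunk_size` argument are illustrative assumptions, not part of the original example:

```r
library(chunked)
library(dplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), 'test.db')
test_tbl <- tbl(dbplyr::src_dbi(con), 'test')

# the filter is executed by the database;
# only the writing happens chunk by chunk
test_tbl %>%
  filter(col1 > 10) %>%
  write_chunkwise('./db_export.csv', chunk_size = 5000)
```
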
## Lazy processing

`chunked` will not start processing until `collect` or `write_chunkwise` is called.

```r
data_chunks <-
  read_chunkwise("./large_file_in.csv", chunk_size = 5000) %>%
  select(col1, col3)

# no processing done until
collect(data_chunks)
# or
write_chunkwise(data_chunks, "test.csv")
# or
write_chunkwise(data_chunks, db, "test")
```

Syntax completion of variables of a chunkwise file in RStudio works like a charm...

# Dplyr verbs

`chunked` implements the following dplyr verbs:

- `filter`
- `select`
- `rename`
- `mutate`
- `mutate_each`
- `transmute`
- `do`
- `tbl_vars`
- `inner_join`
- `left_join`
- `semi_join`
- `anti_join`

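For example, the join verbs can be used to enrich each chunk with a small in-memory table. A sketch, assuming the join target fits in memory; `lookup`, its columns and the file paths are made up for illustration:

```r
library(chunked)
library(dplyr)

# small in-memory lookup table (hypothetical)
lookup <- data.frame(col1 = 1:100, label = paste0("item_", 1:100))

read_chunkwise("./large_file_in.csv", chunk_size = 5000) %>%
  left_join(lookup, by = "col1") %>%   # joined chunk by chunk
  write_chunkwise("./large_file_labeled.csv")
```
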
Since data is processed in chunks, some dplyr verbs are not implemented:

- `arrange`
- `right_join`
- `full_join`

`summarize` and `group_by` are implemented but generate a warning: they operate on each chunk and **not** on the whole data set. However, this makes it easier to process a large file by repeatedly aggregating the resulting data.

- `summarize`
- `group_by`

```r
tmp <- tempfile()
write.csv(iris, tmp, row.names = FALSE, quote = FALSE)
iris_cw <- read_chunkwise(tmp, chunk_size = 30) # read in chunks of 30 rows for this example

iris_cw %>%
  group_by(Species) %>%
  summarize(m = mean(Sepal.Width)) # gives the mean per group for each chunk
```
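
Because the aggregation happens per chunk, an exact overall result needs a second, in-memory aggregation of the collected chunk summaries. A sketch of that pattern, assuming the per-chunk summaries fit in memory:

```r
# first pass: per-chunk sums and counts (chunkwise, with a warning)
chunk_sums <- iris_cw %>%
  group_by(Species) %>%
  summarize(s = sum(Sepal.Width), n = n()) %>%
  collect()

# second pass: aggregate the chunk summaries to the exact overall mean
chunk_sums %>%
  group_by(Species) %>%
  summarize(m = sum(s) / sum(n))
```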