
Commit c738222

edits

1 parent a82f11c

1 file changed: +65 -28 lines changed


proposal.Rmd: 65 additions & 28 deletions
@@ -14,7 +14,9 @@ knitr::opts_chunk$set(echo = TRUE)
 R is an amazing interpreted language, giving a flexible and agile foundation for Data Science.
 Efforts such as Rcpp and reticulate have established that it can be an advantage
 to pair R with another programming language. Sometimes for speed, but most importantly
-to have alternative options of expression.
+to have alternative options of expression.
+
+> Maybe mention that we sometimes want to use existing implementations?
 
 [Go](https://golang.org) is an open source programming language that makes it easy to
 build simple, reliable and efficient software. It is sometimes said to be the language
@@ -23,21 +25,53 @@ compatibility to C and a taste for complexity.
 
 Go is beautiful and simple; its standard library is one of the most impressive
 for a programming language. It comes with concurrency built in, which includes
-(but is not limited to) running code in parallel. The static site generator [hugo](https://gohugo.io)
-and the containerization platform [docker](https://www.docker.com/) are examples
-of systems that are built on Go.
+(but is not limited to) running code in parallel. The static site generator [hugo](https://gohugo.io),
+the containerization platform [docker](https://www.docker.com/), and the profiling utility [pprof](https://github.com/google/pprof) are examples
+of systems that are built with Go.
 
 There currently is no end-to-end solution to easily connect R and Go, i.e., to invoke Go code
 from R, and this is what the `ergo` project is about: the ability for R packages to
-leverage existing or original Go code. There admittedly are no
-specific use cases in mind, but at the same time it would have been impossible
-to imagine the importance of Rcpp when it was first developed.
-
-Having Go as an alternative high performance language will open
+leverage existing or original Go code. Having Go as an alternative high-performance language will open
 interesting avenues for R package development.
 
+# Prior art
+
+## Rcpp
+
+The Rcpp package by Eddelbuettel and François is the current state of the practice to connect C++ code with R, used by over 1300 CRAN packages.
+With Rcpp it is very easy to make a C++ function callable from R: mark the function with [attributes](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-attributes.pdf), and Rcpp generates the glue code that takes care of converting input and output data and propagating errors.
+Rcpp offers the `Vector` class, with specializations like `IntegerVector`, `CharacterVector`, and `List`, that allow accessing R data structures in a way idiomatic to C++.
+Rcpp sugar supports writing C++ code that almost looks like R code but implements the operations internally without round-tripping to R.
+
+## rmq
+
+The rmq package (https://github.com/glycerine/rmq) by Jason E. Aten is scoped
+as a proof of concept to embed a Go library in an R package.
+The code uses msgpack, a serialization protocol, to pass data between R and Go,
+and implements a client-server system using websockets.
+According to the author, the project is finished.
+Unfortunately, installation of this package requires tweaking the code; so far I
+was unable to install the package and test the code on OS X and Ubuntu.
+
+
+## r-go proof of concept
+
+Independently of rmq, I have written four blog posts at
+https://purrple.cat/tags/go/ that describe how to
+embed a Go library in an R package so that it is compiled when
+the R package is installed.
+I show how to call functions in the library and how to pass data to and from Go.
+Even though the code in the blog posts has been written manually, it is explicitly
+divided into two different categories:
+
+- Code that the user would write. This is typical Go code using Go data structures such
+as Go strings and slices.
+- Code that uses both `cgo` and R's internal C APIs. This code eventually is supposed to
+be generated automatically at development time.
+
 # The plan
 
+The previous efforts to connect C++ and Go to R give me enough confidence about the feasibility of the project.
 As opposed to `Rcpp`, which is a dependency in all stages (development, build, runtime),
 `ergo` will only be a *development-time dependency* that facilitates the generation of
 code to interface R and Go via their respective C APIs. From the point of view of the user of `ergo`,
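The split between the two categories of code in the r-go proof of concept can be sketched in plain Go. This is a hypothetical illustration, not code from the blog posts. `rCharacter` is an assumed stand-in for an R character vector: real glue code would receive a `SEXP` through `cgo` and R's C API, which are left out here so the sketch runs without R. `Greet` plays the part of user code, and `wrapGreet` plays the part of the boilerplate that would be generated.

```go
package main

import (
	"fmt"
	"strings"
)

// rCharacter is a hypothetical stand-in for an R character vector.
// R character vectors can contain NA, which a Go string cannot represent,
// so NA is tracked in a parallel slice of flags.
type rCharacter struct {
	values []string
	isNA   []bool
}

// Greet is "user code": plain Go working on Go strings and slices,
// with no knowledge of R.
func Greet(names []string) []string {
	out := make([]string, len(names))
	for i, name := range names {
		out[i] = "Hello, " + strings.TrimSpace(name) + "!"
	}
	return out
}

// wrapGreet is the kind of boilerplate a generator would emit:
// convert the R-side input to Go types, call the user function,
// and convert the result back, propagating NA (an assumed policy).
func wrapGreet(x rCharacter) rCharacter {
	// Input conversion: real glue would read each element from a SEXP;
	// a plain copy stands in for that step here.
	in := make([]string, len(x.values))
	copy(in, x.values)

	res := Greet(in)

	// Output conversion: NA in, NA out.
	out := rCharacter{values: res, isNA: make([]bool, len(res))}
	copy(out.isNA, x.isNA)
	for i, na := range out.isNA {
		if na {
			out.values[i] = ""
		}
	}
	return out
}

func main() {
	x := rCharacter{
		values: []string{" Ada", "", "Grace "},
		isNA:   []bool{false, true, false},
	}
	y := wrapGreet(x)
	fmt.Println(y.values, y.isNA)
}
```

Because `Greet` uses no R-specific type, the glue layer can be regenerated without touching user code. Propagating NA element by element is only one possible policy.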
@@ -49,26 +83,26 @@ the workflow will be:
 
 The role of `ergo` is to hide the C layer entirely, so that users can focus on writing Go and R code.
 In a way, this is similar to the feature of
-[Rcpp attributes](https://cran.r-project.org/web/packages/Rcpp/vignettes/Rcpp-attributes.pdf)
-but it goes further. Once `ergo` has generated the code, the target package is autonomous.
+Rcpp attributes,
+but it goes further. Once `ergo` has generated the code, the target package is autonomous.
 
-Prior [work](https://purrple.cat/tags/go/) acts as a proof of concept,
-and even though the code in the blog posts has been written manually, it is explicitly
-divided into two different categories:
-- Code that the user would write. This is typical Go code using Go data structures such
-as Go strings and slices.
-- Code that uses both `cgo` and R's internal C APIs. This code eventually is supposed to
-be generated automatically at development time.
-This previous effort gives me enough confidence about the feasibility of the project.
+## First iteration
 
-The first step: Automatic generation
+Automatic generation
 of boilerplate code to connect all basic R vector types and their associated
-scalar types in both function inputs and return types. The Go standard library
+scalar types in both function inputs and return types.
+The Go standard library
 includes a [parser](https://golang.org/pkg/go/parser/)
 package, which produces an abstract syntax tree of Go code that
 can drive the code generation.
+The details of the connection implemented by the boilerplate are subject to
+further research; I will also consider a msgpack-based connection like the one in rmq.
+
+At that stage, the project will need community adoption.
+
+## Second iteration
 
-At that stage, the project will need community adoption. The second step will
+The second step will
 involve promotion and development of use cases that demonstrate the use of `ergo`;
 this will without doubt reveal needs that were not planned for.
 
@@ -78,6 +112,10 @@ separate sections to isolate the technical
 issues and feedback related to the development of `ergo` itself, from use case
 material, perhaps featuring invited posts from the community.
 
+The third step of the plan will consider the distribution of such packages:
+can we use CRAN? If not, what else? Do we need code inside the base R distribution,
+i.e., something similar to `R CMD javareconf`, to help mitigate these issues?
+
 ## Failure modes
 
 Go is currently not one of the languages supported by R, which might create friction
@@ -87,9 +125,9 @@ having installation instructions about the tools needed to use `ergo`.
 But ultimately a package with Go code should be as easy to install as any other R package,
 on all supported platforms.
 
-The third step of the plan will consider the distribution of such packages,
-can we use CRAN? If not, what else? Do we need code inside the base R distribution,
-i.e. something similar to `R CMD javareconf` to help mitigate these issues.
+Admittedly, this project does not have
+specific use cases in mind, but at the same time it would have been impossible
+to imagine the importance of Rcpp when it was first developed.
 
 ## About the author
 
@@ -144,14 +182,14 @@ as the `ergo` package.
 # Dissemination
 
 The project will most likely require several public GitHub repositories.
-The [rstats-go](https://github.com/rstats-go) organisation is setup to manage these.
+I have set up the [rstats-go](https://github.com/rstats-go) organisation to manage these.
 The community will be encouraged to engage with these repos.
 
 A blogdown/hugo website (`https://go.rbind.io`) is in place
 to host blog posts related to the development, case studies, and documentation.
 
 In addition, I plan to document the progress from a bird's eye view on the consortium's blog.
-Depending on the community's need for instant interaction, we can setup
+Depending on the community's need for instant interaction, we can set up
 a Slack team or a Gitter community.
 
 A more formal scientific article in, e.g., the R Journal or the Journal of Statistical Software
@@ -160,4 +198,3 @@ will be considered once the project is stable enough.
 It is unclear at the time of writing this proposal whether `ergo` and the packages
 containing `ergo`-generated code can be hosted on CRAN. Both situations
 can be considered, but CRAN-delivered packages are preferable.
-