-
Notifications
You must be signed in to change notification settings - Fork 26
Automatic differentiation in NIMBLE
We have a rough first version of generating C++ to use CppAD to obtain automatic derivatives from nimbleFunctions.
Here is how to try it:
Install NIMBLE from the R_derivs branch.
You need to install CppAD on your own. It's not in our repo yet. To do so:
- Download CppAD from https://www.coin-or.org/CppAD/Doc/install.htm
- Unpack using "tar -xvzf [downloaded file name]"
- Go into the unpacked directory and "mkdir build".
- Go into build: "cd build"
- Make sure you have cmake installed. On OS X I installed via homebrew (which you can install using the shell line provided at their web site) and then "brew install cmake"
- From the build directory, do "cmake .." You do not need to do the "make install" step, although I guess you could.
After these steps, copy the cppad/ directory from the unpacked directory into packages/nimble/inst/include/. The cppad/ directory must contain the file cppad.hpp.
Before using any AD features in NIMBLE, be sure to set nimbleOptions(experimentalEnableDerivs = TRUE).
Then try this example:
ADfun <- nimbleFunction(
setup = function(){},
run = function(x = double(0)) {
outList <- derivs(testMethod(x))
returnType(ADNimbleList())
return(outList)
},
methods = list(
testMethod = function(x = double(0)) {
out <- dnorm(x,0,1)
returnType(double())
return(out)
}
), enableDerivs = list('testMethod')
)
ADfunInst <- ADfun()
cADfunInst <- compileNimble(ADfunInst)
## Two methods for getting derivatives:
cADfunInst$run(x)
cADfunInst$testMethod_deriv(x, c(0, 1, 2))
## Second argument in above call to `testMethod_deriv` specifies which orders of derivs we are interested in.
## Currently available are 0th order (function value), 1st order (gradient), and 2nd order (Hessian).On the R_derivs branch, derivatives of uncompiled nimbleFunctions can be obtained as follows:
Suppose we have a function foo, which takes a single double scalar argument x. We can obtain the derivatives of foo using the syntax
derivOutput <- nimDerivs(foo(5.1), order = c(0, 1, 2), wrt = 'x')This will take the derivatives of foo with respect to x, evaluated at x = 5.1. The object returned by a call to nimDerivs() is a nimbleList with the following elements:
-
value(a vector): The function evaluated at the given argument values. -
gradient(a matrix, should be renamed Jacobian): This is the M by N Jacobian matrix of first derivatives, where M is the length of the value returned byfoo(), and N is the sum of the lengths of the input arguments. Currently, the columns are ordered according to the order given by thewrtargument tonimDerivs(). Thus, ifwrt = c('x', 'y'), the first N_x columns ofgradientwill be derivatives w.r.t. elements ofx, and the last N_y columns will be derivatives w.r.t elements ofy. Here, N_x and N_y are the lengths of thexandyarguments, respectively. -
hessian(an array): This is the N by N by M array of second derivatives, where again M is the length of the value returned byfoo(), and N is the sum of the lengths of the input arguments. To clarify,derivOutput$hessian[ , ,1]will be the Hessian matrix of second derivatives for the first dimension of the return value offoo(). Iffoo()returns a vector,derivOutput$hessian[ , ,2]will be the Hessian matrix of second derivatives for the second dimension of the return value offoo(), and so on.
The wrt argument to nimDerivs determines the ordering of the first and second derivatives that are returned. In the future, it may be nice to have some method of labeling the columns of gradient (and the rows and columns for each dimension of hessian) so that users will know which parameter a given gradient value is taken with respect to.
The wrt argument currently accepts indexed arguments (e.g. wrt = c('x[1:2]', 'y[1]') works), but does not yet accept multiple arguments for the same node (e.g. wrt = c('x[1]', 'x[2]') will currently fail).
If wrt is not manually specified, derivatives will be taken w.r.t. all arguments.
Derivatives of model$calculate(nodes) and calculate(model, nodes) are currently functional in uncompiled nimbleFunctions on the R_derivs branch. An example of the syntax for these derivatives is:
calcDerivs <- derivs(model$calculate(nodes), wrt = subsetOfNodes)
As hinted at above, it makes the most sense for the wrt argument of a call to derivs(model$calculate(nodes)) to be some subset of the nodes that are being calculate-ed. However, it is technically possible to take derivs w.r.t. any node that exists in the model.
Some initial ideas:
First issue is how to ask for different (multiple) orders of derivatives
y <- foo(x, derivs = c(0, 1, 2)) ## return a nimbleList of 0th, 1st, and 2nd order derivatives
## This implies a deriv-enabled nimbleFunction automatically gets a "derivs" argument
## Potential confusion:
y[[1]] ## would be 0th order derivs
y[[2]] ## would be 1st order derivs
## but could be named
y[['value']]
y[['gradient']]
y[['hessian']] ## or other naming ideas
y <- derivs(foo(x), order = c(0, 1, 2)) ## This adds a call layer to the syntax but doesn't mess with foo's arguments
## similar issues/ideas on using y as aboveAnother set of issues is about naming which arguments one wants derivatives with respect to.
y <- derivs(foo(x, a, b), order = c(0, 2), args = c('x', 'a[2]'))
## I'm assuming x is scalar but a is vector. This raises all our familiar issues: writing a[2] as code, or character, and converting the meaning of 'a[2]' to a[2].Similar issues to above will come up for model$calculate:
logProb <- model$calculate(nodes, derivs = c(1, 2))
## but now information variables should established in setup code
logProb <- model$calculate(nodes, derivs = c(1), args = subsetOfNodes)Or if we go with the extra call layer above, then should it be:
logProb <- derivs(model$calculate(nodes), order = c(1, 2), args = subsetOfNodes)