Skip to content

Automatic differentiation in NIMBLE

Nicholas Michaud edited this page Jul 3, 2017 · 15 revisions

We have a rough first version of generating C++ to use CppAD to obtain automatic derivatives from nimbleFunctions.

Here is how to try it:

Install NIMBLE from the R_derivs branch.

You need to install CppAD on your own. It's not in our repo yet. To do so:

  1. Download CppAD from https://www.coin-or.org/CppAD/Doc/install.htm
  2. Unpack using "tar -xvzf [downloaded file name]"
  3. Go into the unpacked directory and "mkdir build".
  4. Go into build: "cd build"
  5. Make sure you have cmake installed. On OS X I installed via homebrew (which you can install using the shell line provided at their web site) and then "brew install cmake"
  6. From the build directory, do "cmake .." You do not need to do the "make install" step, although I guess you could.

After these steps, copy the cppad/ directory from the unpacked directory into packages/nimble/inst/include/. The cppad/ directory must contain the file cppad.hpp.

Before using any AD features in NIMBLE, be sure to set nimbleOptions(experimentalEnableDerivs = TRUE).

Then try this example:

 ADfun <- nimbleFunction(
      setup = function(){},
      run = function(x = double(0)) {
        outList <- derivs(testMethod(x))
        returnType(ADNimbleList())
        return(outList)
      },
      methods = list(
        testMethod = function(x = double(0)) {
          out <- dnorm(x,0,1)
          returnType(double())
          return(out)
        }
      ), enableDerivs = list('testMethod')
    )
 
    ADfunInst <- ADfun()
    cADfunInst <- compileNimble(ADfunInst)

    ## Two methods for getting derivatives:
    cADfunInst$run(x)
    cADfunInst$testMethod_deriv(x, c(0, 1, 2)) 
    ## Second argument in above call to `testMethod_deriv` specifies which orders of derivs we are interested in.
    ## Currently available are 0th order (function value), 1st order (gradient), and 2nd order (Hessian).

Discussion of current implementation

On the R_derivs branch, derivatives of uncompiled nimbleFunctions can be obtained as follows:

Suppose we have a function foo, which takes a single double scalar argument x. We can obtain the derivatives of foo using the syntax

derivOutput <- nimDerivs(foo(5.1), order = c(0, 1, 2), wrt = 'x')

This will take the derivatives of foo with respect to x, evaluated at x = 5.1. The object returned by a call to nimDerivs() is a nimbleList with the following elements:

  • value (a vector): The function evaluated at the given argument values.
  • gradient (a matrix, should be renamed Jacobian): This is the M by N Jacobian matrix of first derivatives, where M is the length of the value returned by foo(), and N is the sum of the lengths of the input arguments. Currently, the columns are ordered according to the order given by the wrt argument to nimDerivs(). Thus, if wrt = c('x', 'y'), the first N_x columns of gradient will be derivatives w.r.t. elements of x, and the last N_y columns will be derivatives w.r.t elements of y. Here, N_x and N_y are the lengths of the x and y arguments, respectively.
  • hessian (an array): This is the N by N by M array of second derivatives, where again M is the length of the value returned by foo(), and N is the sum of the lengths of the input arguments. To clarify, derivOutput$hessian[ , ,1] will be the Hessian matrix of second derivatives for the first dimension of the return value of foo(). If foo() returns a vector, derivOutput$hessian[ , ,2] will be the Hessian matrix of second derivatives for the second dimension of the return value of foo(), and so on.

The wrt argument:

The wrt argument to nimDerivs determines the ordering of the first and second derivatives that are returned. In the future, it may be nice to have some method of labeling the columns of gradient (and the rows and columns for each dimension of hessian) so that users will know which parameter a given gradient value is taken with respect to.

The wrt argument currently accepts indexed arguments (e.g. wrt = c('x[1:2]', 'y[1]') works), but does not yet accept multiple arguments for the same node (e.g. wrt = c('x[1]', 'x[2]') will currently fail).

If wrt is not manually specified, derivatives will be taken w.r.t. all arguments.

Derivs of model$calculate(nodes)

Derivatives of model$calculate(nodes) and calculate(model, nodes) are currently functional in uncompiled nimbleFunctions on the R_derivs branch. An example of the syntax for these derivatives is:

calcDerivs <- derivs(model$calculate(nodes), wrt = subsetOfNodes)

As hinted at above, it makes the most sense for the wrt argument of a call to derivs(model$calculate(nodes)) to be some subset of the nodes that are being calculate-ed. However, it is technically possible to take derivs w.r.t. any node that exists in the model.

Previous discussion of eventual usage(s)

How do we want derivatives to be called from R or nimbleFunctions?

Some initial ideas:

First issue is how to ask for different (multiple) orders of derivatives

y <- foo(x, derivs = c(0, 1, 2)) ## return a nimbleList of 0th, 1st, and 2nd order derivatives
                                 ## This implies a deriv-enabled nimbleFunction automatically gets a "derivs" argument
## Potential confusion:
y[[1]] ## would be 0th order derivs
y[[2]] ## would be 1st order derivs
## but could be named
y[['value']]
y[['gradient']]
y[['hessian']] ## or other naming ideas

y <- derivs(foo(x), order = c(0, 1, 2)) ## This adds a call layer to the syntax but doesn't mess with foo's arguments
## similar issues/ideas on using y as above

Another set of issues is about naming which arguments one wants derivatives with respect to.

y <- derivs(foo(x, a, b), order = c(0, 2), args = c('x', 'a[2]'))
## I'm assuming x is scalar but a is vector.  This raises all our familiar issues: writing a[2] as code, or character, and converting the meaning of 'a[2]' to a[2].

What about from model$calculate?

Similar issues to above will come up for model$calculate:

logProb <- model$calculate(nodes, derivs = c(1, 2))
## but now information variables should established in setup code
logProb <- model$calculate(nodes, derivs = c(1), args = subsetOfNodes)

Or if we go with the extra call layer above, then should it be:

logProb <- derivs(model$calculate(nodes), order = c(1, 2), args = subsetOfNodes)

Clone this wiki locally