Design Doc: Tensorflow as a backend
Speed up large NIMBLE DSL computations by compiling DSL code to Tensorflow.
See also: Vectorizing model operations
Nimble currently uses the Eigen C++ library as a back-end for tensor and linear algebra computations.
Tensorflow is a tensor computation library that targets multicore CPUs, NVIDIA GPUs (via CUDA), AMD GPUs (via OpenCL), and Google TPUs. The Tensorflow architecture consists of a C++ core exposed through a C API, with multiple language clients: a Python client (the most mature), a C++ client (less mature), and an R client (which wraps the Python client).
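For concreteness, here is a minimal sketch of that architecture using the Python client and the TF 1.x API contemporary with this document: the client only describes a graph, and `Session.run` hands it to the C++ core for execution on whatever device is available.

```python
import tensorflow as tf

# The Python client builds a symbolic graph; no computation happens here.
x = tf.placeholder(tf.float64, shape=[3], name="x")
A = tf.constant([[1.0, 0.0, 0.0],
                 [0.0, 2.0, 0.0],
                 [0.0, 0.0, 3.0]], dtype=tf.float64)
y = tf.reduce_sum(tf.matmul(A, tf.reshape(x, [3, 1])))

# Session.run() ships the graph to the C++ core, which executes it
# on CPU, GPU, or TPU depending on what is available.
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [1.0, 1.0, 1.0]}))  # prints 6.0
```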
This design doc proposes to support Tensorflow as an alternative to Eigen as a computational back-end for Nimble.
Nimble currently handles Eigenization logic in the sizeProcessing compiler stage.
The first step to supporting Tensorflow as a NIMBLE back-end is to factor this Eigen-specific logic out into a separate compiler stage, so that an alternative Tensorflow stage can be implemented alongside it.
To take full advantage of Tensorflow, we plan to compile large chunks of DSL code to Tensorflow. These chunks can be much larger than the Eigen expressions that Nimble currently compiles. Specifically, we can compile math expressions, multiple assignment statements, some conditional statements, and limited control flow to Tensorflow (see the sketch below). Tran et al. (2017) found that Edward achieved a 6x speedup over PyMC3 on one task because Edward compiled to a single Tensorflow graph, whereas PyMC3 compiled to multiple smaller graphs and was bottlenecked by shuttling data between CPU and GPU.
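As a hypothetical illustration (the DSL fragment and all names below are invented for this sketch), a multi-statement NIMBLE chunk such as `y <- exp(A %*% x + b); z <- sum(y)` could be emitted as one Tensorflow graph, so intermediate results never leave the device:

```python
import tensorflow as tf

# Hypothetical translation of the DSL chunk:
#   y <- exp(A %*% x + b)
#   z <- sum(y)
# Both statements become nodes in a single graph, avoiding the
# CPU/GPU data shuttling that bottlenecked PyMC3 in Tran et al.'s comparison.
A = tf.placeholder(tf.float64, shape=[None, None], name="A")
x = tf.placeholder(tf.float64, shape=[None, 1], name="x")
b = tf.placeholder(tf.float64, shape=[None, 1], name="b")

y = tf.exp(tf.matmul(A, x) + b)
z = tf.reduce_sum(y)

with tf.Session() as sess:
    print(sess.run(z, feed_dict={A: [[1.0, 2.0], [3.0, 4.0]],
                                 x: [[1.0], [1.0]],
                                 b: [[0.0], [0.0]]}))
```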
There are multiple possibilities for dynamically generating Tensorflow code from the NIMBLE DSL.
As of June 2017, Tensorflow supports custom C++ extensions (e.g. custom ops), but does not distribute a C++ library interface for using Tensorflow inside existing projects (see Github issue).
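For reference, the extension path that does exist goes in one direction only: a custom op compiled as a shared library can be loaded into a client via `tf.load_op_library`. A sketch (the library and op names below are invented):

```python
import tensorflow as tf

# Hypothetical: Nimble-generated C++ kernels compiled into a shared
# library and loaded into the Python client. 'nimble_ops.so' and the
# op name 'nimble_op' are placeholders, not real artifacts.
nimble_module = tf.load_op_library("./nimble_ops.so")
result = nimble_module.nimble_op([[1.0, 2.0], [3.0, 4.0]])
```

The missing piece is the reverse direction: linking the Tensorflow runtime into Nimble's own generated C++.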