86 | 86 | \item \code{has.nf=FALSE} uses faster implementation that does not support non-finite values. Then depending on the rolling function it will either: |
87 | 87 | \itemize{ |
88 | 88 | \item (\emph{mean, sum, prod}) detect non-finite, re-run non-finite aware. |
89 | | - \item (\emph{max, min, median}) does not detect non-finites and may silently give incorrect answer. |
| 89 | + \item (\emph{max, min, median}) does not detect non-finites and may silently produce an incorrect answer. |
90 | 90 | } |
91 | 91 | In general \code{has.nf=FALSE && any(!is.finite(x))} should be considered as undefined behavior. Therefore \code{has.nf=FALSE} should be used with care. |
92 | 92 | } |
93 | 93 | } |
94 | 94 | \section{Implementation}{ |
95 | | - Each rolling function has 4 different implementations. First factor that decides which implementation is being used is \code{adaptive} argument, see setion below for details. Then for each of those two algorithms (adaptive \code{TRUE} or \code{FALSE}) there are usually two implementations depending on the \code{algo} argument. |
| 95 | + Each rolling function has 4 different implementations. First factor that decides which implementation is used is the \code{adaptive} argument (either \code{TRUE} or \code{FALSE}), see section below for details. Then for each of those two algorithms there are usually two implementations depending on the \code{algo} argument. |
96 | 96 | \itemize{ |
97 | 97 | \item \code{algo="fast"} uses \emph{"online"}, single pass, algorithm. |
98 | 98 | \itemize{ |
99 | | - \item \emph{max} and \emph{min} rolling function will not do only a single pass but, on average, \code{length(x)/n} nested loops will be computed. The bigger the window the bigger advantage over algo \emph{exact} which computes \code{length(x)} nested loops. Note that \emph{exact} uses multiple CPUs so for a small window size and many CPUs it is possible it will be actually faster than \emph{fast} but in those cases elapsed timings will likely be far below a single second. |
| 99 | + \item \emph{max} and \emph{min} rolling functions will not do only a single pass but, on average, they will compute \code{length(x)/n} nested loops. The larger the window, the greater the advantage over the \emph{exact} algorithm, which computes \code{length(x)} nested loops. Note that \emph{exact} uses multiple CPUs so for small window sizes and many CPUs it may actually be faster than \emph{fast}. However, in such cases the elapsed timings will likely be far below a single second. |
100 | 100 | \item \emph{median} will use a novel algorithm described by \emph{Jukka Suomela} in his paper \emph{Median Filtering is Equivalent to Sorting (2014)}. See references section for the link. Implementation here is extended to support arbitrary length of input and an even window size. Despite extensive validation of results this function should be considered experimental. When missing values are detected it will fall back to slower \code{algo="exact"} implementation. |
101 | 101 | \item Not all functions have \emph{fast} implementation available. As of now adaptive \emph{max}, adaptive \emph{min} and adaptive \emph{median} do not have \emph{fast} implementation, therefore it will automatically fall back to \emph{exact} implementation. \code{datatable.verbose} option can be used to check that. |
102 | 102 | } |
103 | | - \item \code{algo="exact"} will make rolling functions to use a more computationally-intensive algorithm. For each observation from input vector it will compute a function on a rolling window from scratch (complexity \eqn{O(n^2)}). |
| 103 | + \item \code{algo="exact"} will make the rolling functions use a more computationally-intensive algorithm. For each observation in the input vector it will compute a function on a rolling window from scratch (complexity \eqn{O(n^2)}). |
104 | 104 | \itemize{ |
105 | 105 | \item Depending on the function, this algorithm may suffer less from floating point rounding error (the same consideration applies to base \code{\link[base]{mean}}). |
106 | 106 | \item In case of \emph{mean} (and possibly other functions in future), it will additionally make extra pass to perform floating point error correction. Error corrections might not be truly exact on some platforms (like Windows) when using multiple threads. |