
potential improvements to speed in deriv_criterion_lr() #10

@adw96


I'm looking to see if anything can be done to speed up tinyvamp for very large numbers of species J (e.g., 1000-2000). I'd love for this to be feasible for a collaborative project.

Some profiling with Rprof() shows that a lot of time is spent in deriv_criterion_lr(), though mostly inside C functions. However, I suspect that the way some of these are called could be made more efficient through vectorization. Namely, lapply() is called four times, and I think all of these calls could be vectorized.
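For anyone who wants to reproduce this kind of profile, here is a minimal sketch of the Rprof() workflow. toy_workload() is a made-up stand-in for deriv_criterion_lr() (it only mimics the many-small-R-calls pattern of an lapply() loop), and the sampling interval and temp file are arbitrary choices. In the summary, by.self attributes time to the function at the top of the call stack, while by.total counts time spent in a function plus everything it calls. That distinction is what lets a function look expensive overall even when most of its own time is spent in lower-level calls.

```r
# Hypothetical workload standing in for deriv_criterion_lr(): many small
# R-level calls, the pattern that lapply()/sapply() loops produce.
toy_workload <- function() {
  for (i in 1:100) {
    sapply(1:20000, function(v) sqrt(v))
  }
}

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file, interval = 0.005)  # start sampling the call stack
toy_workload()
Rprof(NULL)                         # stop profiling

prof <- summaryRprof(prof_file)
# by.self: time attributed to the function at the top of the stack.
# by.total: time spent in the function or anything it calls.
head(prof$by.self)
head(prof$by.total)
```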

A brief proof of concept:

```r
# Requires the microbenchmark package.
microbenchmark::microbenchmark(
  loop       = sapply(1000:2000, sqrt),
  vectorized = sqrt(1000:2000)
)
```

shows that, in this example, the vectorized version is ~70x faster.

I would love some help vectorizing the following lapply() call and its relatives:

```r
log_P <- lapply(1:K, function(k) {
  if (k %in% which_k_p) {
    rho <- c(varying_lr_df$value[varying_lr_df$param == "rho" &
                                   varying_lr_df$k == k], 0)
    rho_cent <- rho -
      sum_of_logs(rho) +
      log(fixed_P_multipliers[k])
  } else {
    rho_cent <- NULL
  }
  return(rho_cent)
})
```

It seems to me that it could look a lot like this:

```r
get_varying_lr_df_values <- varying_lr_df$value[varying_lr_df$param == "rho"]
get_varying_lr_df_ks <- varying_lr_df$k
rhos_maybe <- c(get_varying_lr_df_values[get_varying_lr_df_ks], 0)
log_P[which_k_p] <- rhos_maybe - sum_of_logs(rhos_maybe) + log(fixed_P_multipliers)
log_P[setdiff(1:K, which_k_p)] <- NULL
```

This is obviously wrong (!!!), but I provide it as an example of what I think code that avoids so many equality checks on large vectors could look like.
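One way to get most of the benefit, sketched here with made-up inputs, is to scan varying_lr_df once, group the rho values by k with split(), and then just look up each k's group. This removes the two full-length `==` comparisons per k, so the lookup cost no longer grows like K × nrow(varying_lr_df). It still uses lapply(), but only over which_k_p and over short vectors. Assumptions: sum_of_logs() is taken to be a log-sum-exp (a guess from its name, not tinyvamp's code), and log_P_loop(), log_P_split(), and the toy data are hypothetical names. The final stopifnot() checks that both versions give exactly identical output on this toy input.

```r
# ASSUMPTION: stand-in for tinyvamp's sum_of_logs(), taken to be log-sum-exp.
sum_of_logs <- function(x) {
  m <- max(x)
  m + log(sum(exp(x - m)))
}

# Original loop, same logic: each k re-scans every row of varying_lr_df.
log_P_loop <- function(varying_lr_df, which_k_p, fixed_P_multipliers, K) {
  lapply(1:K, function(k) {
    if (k %in% which_k_p) {
      rho <- c(varying_lr_df$value[varying_lr_df$param == "rho" &
                                     varying_lr_df$k == k], 0)
      rho - sum_of_logs(rho) + log(fixed_P_multipliers[k])
    } else {
      NULL
    }
  })
}

# Grouped version: one pass over the rows, then a cheap lookup per k.
log_P_split <- function(varying_lr_df, which_k_p, fixed_P_multipliers, K) {
  is_rho <- varying_lr_df$param == "rho"
  # split() keeps the original row order within each group; fixing the
  # levels to 1:K gives every k an entry (numeric(0) if it has no rho rows).
  rho_by_k <- split(varying_lr_df$value[is_rho],
                    factor(varying_lr_df$k[is_rho], levels = seq_len(K)))
  log_P <- vector("list", K)  # k not in which_k_p stays NULL, as before
  log_P[which_k_p] <- lapply(which_k_p, function(k) {
    rho <- c(rho_by_k[[k]], 0)
    rho - sum_of_logs(rho) + log(fixed_P_multipliers[k])
  })
  log_P
}

# Made-up inputs: rho rows for k = 1, 3, 4 plus unrelated "gamma" rows.
set.seed(1)
K <- 5
which_k_p <- c(1, 3, 4)
fixed_P_multipliers <- c(1, 2, 0.5, 3, 1)
varying_lr_df <- rbind(
  data.frame(param = "rho", k = rep(which_k_p, each = 3), value = rnorm(9)),
  data.frame(param = "gamma", k = 1:K, value = rnorm(K))
)

stopifnot(identical(
  log_P_loop(varying_lr_df, which_k_p, fixed_P_multipliers, K),
  log_P_split(varying_lr_df, which_k_p, fixed_P_multipliers, K)
))
```

If only the values in varying_lr_df change between calls while its param and k columns stay fixed, the grouping could in principle be computed once and reused (for example, by splitting row indices instead of values); whether deriv_criterion_lr() is called that way would need checking.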

By my estimate, the function inside the lapply() loop is called ~100 million times, so any reduction in its run time would be a massive help.

It is absolutely critical that nothing in the output of this function changes in any test that can be run. If an issue introduced here goes unnoticed now, it could be almost impossible to debug later.
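Because the requirement is exact equality, a comparison harness should use identical() rather than all.equal(), which by default accepts differences up to a small tolerance (about 1.5e-8). A minimal sketch of such a harness follows; check_identical_outputs() and make_x() are hypothetical helpers, demonstrated on the sqrt() comparison from earlier in this issue, and seeding each trial means any failing input can be regenerated for debugging.

```r
# Hypothetical helper: run an old and a new implementation on many random
# inputs and stop at the first input where their outputs differ at all.
check_identical_outputs <- function(old_fn, new_fn, make_inputs, n_trials = 100) {
  for (trial in seq_len(n_trials)) {
    args <- make_inputs(trial)
    old <- do.call(old_fn, args)
    new <- do.call(new_fn, args)
    # identical() demands exact equality, attributes included.
    if (!identical(old, new)) {
      stop("Outputs differ on trial ", trial)
    }
  }
  invisible(TRUE)
}

old_sqrt <- function(x) sapply(x, sqrt)
new_sqrt <- function(x) sqrt(x)

# Seeded per trial so a failing input can be regenerated exactly.
make_x <- function(trial) {
  set.seed(trial)
  list(x = sample(1:1e6, size = sample(1:50, 1)))
}

check_identical_outputs(old_sqrt, new_sqrt, make_x)
```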

Thanks to anyone who can help with this!
