I'm looking to see whether anything can be done to speed up tinyvamp for very large numbers of species J (e.g., 1000-2000). I'd love for this to be feasible for a collaborative project.
Some profiling with Rprof() shows that a lot of time is spent in deriv_criterion_lr(), mostly in C functions. However, I suspect that the way some of these calls are made could be more efficient through vectorization. Namely, lapply() is called four times, and I think all four calls could be vectorized.
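For reference, this is roughly the profiling workflow I used; `slow_workload()` below is just a stand-in for the actual tinyvamp fit, and the numbers are arbitrary:

```r
# Sketch of the Rprof() workflow; slow_workload() is a placeholder for the
# real model fit. Rprof samples the call stack (default every 20 ms) and
# summaryRprof() aggregates time by function.
slow_workload <- function() {
  x <- 0
  for (i in 1:1e7) x <- x + sqrt(i)  # deliberately slow R-level loop
  x
}

prof_file <- tempfile()
Rprof(prof_file)       # start sampling
slow_workload()
Rprof(NULL)            # stop sampling
head(summaryRprof(prof_file)$by.total)  # functions ranked by total time
```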
A brief proof of concept:
```r
microbenchmark::microbenchmark(
  {sapply(1000:2000, sqrt)},
  {sqrt(1000:2000)}
)
```
shows that (here) vectorization improves speed by ~70x.
I would love some help vectorizing the following function and its relatives:
```r
log_P <- lapply(1:K,
                function(k) {
                  if (k %in% which_k_p) {
                    rho <- c(varying_lr_df$value[varying_lr_df$param == "rho" &
                                                   varying_lr_df$k == k], 0)
                    rho_cent <- rho -
                      sum_of_logs(rho) +
                      log(fixed_P_multipliers[k])
                  } else {
                    rho_cent <- NULL
                  }
                  return(rho_cent)
                })
```
which it seems to me could look something like:
```r
get_varying_lr_df_values <- varying_lr_df$value[varying_lr_df$param == "rho"]
get_varying_lr_df_ks <- varying_lr_df$k
rhos_maybe <- c(get_varying_lr_df_values[get_varying_lr_df_ks], 0)
log_P[which_k_p] <- rhos_maybe - sum_of_logs(rhos_maybe) + log(fixed_P_multipliers)
log_P[setdiff(1:K, which_k_p)] <- NULL
```
This is obviously wrong (!!!); I provide it only as an example of what I think it could look like, to avoid so many equality checks over large vectors.
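In case it helps, here is one self-contained sketch of how the per-k data-frame scans could be hoisted out of the loop with a single split(). This assumes sum_of_logs() is a log-sum-exp, i.e. sum_of_logs(x) = log(sum(exp(x))) (I haven't checked its actual definition), and all the data below are invented for illustration:

```r
# Assumption: sum_of_logs() is a log-sum-exp; replace with the real one.
sum_of_logs <- function(x) log(sum(exp(x)))

# Invented toy inputs mirroring the structures in deriv_criterion_lr().
set.seed(1)
K <- 5
which_k_p <- c(1, 3, 4)
fixed_P_multipliers <- runif(K)
varying_lr_df <- data.frame(
  param = "rho",
  k = rep(which_k_p, each = 2),
  value = rnorm(6)
)

# Original pattern: one full data-frame scan per k inside lapply().
log_P_old <- lapply(1:K, function(k) {
  if (k %in% which_k_p) {
    rho <- c(varying_lr_df$value[varying_lr_df$param == "rho" &
                                   varying_lr_df$k == k], 0)
    rho - sum_of_logs(rho) + log(fixed_P_multipliers[k])
  } else NULL
})

# Alternative: scan the data frame once, group rho values by k with split(),
# then centre each group. sum_of_logs() still runs per group because it
# couples all entries of that group, but the repeated equality checks on
# large vectors are gone.
rho_rows <- varying_lr_df$param == "rho"
rho_by_k <- split(varying_lr_df$value[rho_rows], varying_lr_df$k[rho_rows])
log_P_new <- vector("list", K)
log_P_new[which_k_p] <- Map(function(rho_k, mult) {
  rho <- c(rho_k, 0)
  rho - sum_of_logs(rho) + log(mult)
}, rho_by_k[as.character(which_k_p)], fixed_P_multipliers[which_k_p])

stopifnot(isTRUE(all.equal(log_P_old, log_P_new)))
```

The key design point is that all the subsetting work (`param == "rho"`, grouping by `k`) happens exactly once, outside the loop, instead of once per k.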
The function inside this lapply() loop is called roughly 100 million times (by my estimate), so any reduction in its runtime would be a massive help.
It is absolutely critical that nothing in the output of this function changes in any test that can be run. If an issue is introduced here that we don't spot now, it could be almost impossible to debug later.
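Concretely, I'd suggest any rewrite be checked for exact (bitwise, not just near-equal) agreement with the current code on many random inputs. A minimal sketch, where `f_old`/`f_new` are stand-ins for the current and rewritten deriv_criterion_lr():

```r
# f_old / f_new are placeholders for the current and candidate rewrites;
# in practice both would be the real function before and after refactoring.
f_old <- function(x) x - log(sum(exp(x)))
f_new <- function(x) x - log(sum(exp(x)))  # candidate rewrite

set.seed(42)
for (i in 1:100) {
  x <- rnorm(sample(1:20, 1))
  # identical() demands bitwise-equal doubles, stricter than all.equal().
  stopifnot(identical(f_old(x), f_new(x)))
}
```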
Thanks to anyone who can help with this!