Faster assess_temporal_independence() #353

@Rafnuss

Current signature: `assess_temporal_independence(df, minDeltaTime_dur, deltaTimeComparedTo)`

Here is a suggestion to make assess_temporal_independence() faster by using a more vectorized approach and more base R functions. It is a slightly modified version that takes only a timestamp vector rather than the whole data.frame, for increased modularity.

Let me know if you're interested and I can create a PR.

#' Assess temporal independence
#'
#' @param timestamp A vector of datetimes (or numeric in minutes), sorted in
#'   increasing order.
#' @param minDeltaTime_dur Duration in minutes between records of the same
#'   species at the same station to be considered independent.
#' @param deltaTimeComparedTo Character, `"lastIndependentRecord"` or
#'   `"lastRecord"`.
#'   For two records to be considered independent, must the second one be at
#'   least `minDeltaTime_dur` minutes after the last independent record of the
#'   same species (`deltaTimeComparedTo = "lastIndependentRecord"`), or
#'   `minDeltaTime_dur` minutes after the last record (`deltaTimeComparedTo =
#'   "lastRecord"`)?
#'   If `minDeltaTime_dur` is 0, `deltaTimeComparedTo` should be NULL.
#' @noRd
assess_temporal_independence <- function(timestamp,
                                         minDeltaTime_dur = 60,
                                         deltaTimeComparedTo = "lastRecord") {
  # Nothing to assess for an empty input
  if (length(timestamp) == 0) {
    return(logical(0))
  }

  # Convert to numeric and express the threshold in the same unit:
  # datetimes are stored in seconds, numeric input is documented as minutes.
  if (inherits(timestamp, "POSIXt")) {
    t <- as.numeric(as.POSIXct(timestamp))
    delta <- minDeltaTime_dur * 60
  } else {
    t <- as.numeric(timestamp)
    delta <- minDeltaTime_dur
  }

  # Compute for lastRecord:
  # A record is independent if the time since the previous record is greater
  # than the threshold. The first record always starts a new event.
  # This assumes timestamp is sorted in increasing order.
  independent <- c(TRUE, diff(t) > delta)

  # For lastIndependentRecord, it's a bit more complicated.
  # identical() also copes with deltaTimeComparedTo = NULL.
  if (identical(deltaTimeComparedTo, "lastIndependentRecord")) {
    # Keep a copy to compare later if needed.
    independent_old <- independent

    # lastIndependentRecord can only yield more sequences/events than
    # lastRecord, so start from the lastRecord sequences and split them into
    # new sequences where required.
    # cumsum(independent) creates one group id per lastRecord sequence.
    independent <- split(t, cumsum(independent), drop = FALSE) |>
      lapply(\(tt) {
        idpt <- rep(FALSE, length(tt))
        i <- 1
        repeat {
          idpt[i] <- TRUE
          # findInterval() is a fast way to find the last record within the
          # threshold of record i; the record after it starts a new sequence.
          e <- findInterval(tt[i] + delta, tt)
          if (e >= length(tt)) break
          i <- e + 1
        }
        idpt
      }) |>
      unlist() |>
      unname()

    # Should always be zero
    # sum(independent_old & !independent)
    # New group/event/sequence
    # sum(!independent_old & independent)
  }
  independent
}
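
To illustrate the two building blocks the function relies on, here is a minimal sketch on made-up data. The timestamps and the 60-minute threshold are hypothetical values chosen for the example and are not taken from the package. It shows how `cumsum()` turns the `lastRecord` flags into group ids, and how `findInterval()` locates the next independent record under the `lastIndependentRecord` rule.

```r
# Hypothetical sorted timestamps, in minutes since the first record
t <- c(0, 10, 50, 70, 200, 230)
delta <- 60 # hypothetical threshold, in minutes

# lastRecord rule: compare each record with the one just before it.
# The first record always starts a new event.
independent <- c(TRUE, diff(t) > delta)

# cumsum() on the logical vector gives one id per event
event_id <- cumsum(independent)

# findInterval(x, t) counts how many sorted values of t are <= x, so it
# returns the index of the last record within `delta` of the first record
last_in_window <- findInterval(t[1] + delta, t)

# The record right after it is the next independent one under the
# lastIndependentRecord rule
next_independent <- last_in_window + 1

print(independent)
print(event_id)
print(t[next_independent])
```

With `lastRecord`, the record at minute 70 stays in the first event because it is only 20 minutes after the previous record. With `lastIndependentRecord`, it starts a new event because it is more than 60 minutes after the first record.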
