Skip to content

Implement parallelization#159

Open
sbamin wants to merge 1 commit intocytomining:masterfrom
sbamin:patch-1
Open

Implement parallelization#159
sbamin wants to merge 1 commit intocytomining:masterfrom
sbamin:patch-1

Conversation

@sbamin
Copy link

@sbamin sbamin commented May 8, 2024

Suggesting an improvement by adding parallelization for scaling using furrr::future_map function. On my end, resulting scaled df is identical with or without using parallel mode.

Also, using dispersion default value of 0.000001 instead of 0 to avoid NaN for entries not divisible by zero.

        dispersion <-
          stratum %>%
          dplyr::summarise_at(.funs = dispersion, .vars = variables) %>%
          dplyr::mutate(across(everything(), ~ if_else(. == 0, 0.000001, .))) %>%
          dplyr::collect()

Minor: Though I used data %>% dplyr::select(! any_of(variables)) instead of data %>% dplyr::select(-variables) for one of select statements, I think it should be data %>% dplyr::select(- all_of(variables)), to have a stricter implementation of select statement. Use of all_of will stop the code from running if not all variables are being excluded using a select query versus any_of will let it pass without any warning or error.

Suggesting an improvement by adding parallelization for scaling using furrr::future_map function. On my end, resulting scaled df is identical with or without using parallel mode.

Also, using  dispersion default value of 0.000001 instead of 0 to avoid NaN for entries not divisible by zero.

```
        dispersion <-
          stratum %>%
          dplyr::summarise_at(.funs = dispersion, .vars = variables) %>%
          dplyr::mutate(across(everything(), ~ if_else(. == 0, 0.000001, .))) %>%
          dplyr::collect()
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant