Use of pmax(na.rm =TRUE) in addPeak2GeneLinks #1522

RegnerM2015 · 2022-07-25T18:25:35Z

I wanted to better understand the logic for using the pmax function in the code below for addPeak2GeneLinks:

.logDiffTime(main="Computing Correlations", t1=tstart, verbose=verbose, logFile=logFile)
o$Correlation <- rowCorCpp(as.integer(o$A), as.integer(o$B), assay(seATAC), assay(seRNA))
o$VarAssayA <- .getQuantiles(matrixStats::rowVars(assay(seATAC)))[o$A]
o$VarAssayB <- .getQuantiles(matrixStats::rowVars(assay(seRNA)))[o$B]
o$TStat <- (o$Correlation / sqrt((pmax(1-o$Correlation^2, 0.00000000000000001, na.rm = TRUE))/(ncol(seATAC)-2))) #T-statistic P-value
o$Pval <- 2*pt(-abs(o$TStat), ncol(seATAC) - 2)
o$FDR <- p.adjust(o$Pval, method = "fdr")

To my understanding, pmax with na.rm = TRUE is used to remove potential NaNs from the output of rowCorCpp. Is this correct?

Can we interpret NaNs from rowCorCpp as low correlations? Or should users not interpret NaN correlations at all? Based on the pmax code above, it seems like the T statistic and P-value are still calculated for NaN correlations by swapping in a value of 0.00000000000000001.

Alternatively, could users just ignore the NaN correlations or remove them from the analysis altogether?
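To see how a missing correlation moves through the steps quoted above, here is a minimal sketch in base R. The correlation vector, the sample size `n`, and the helper name `tstat_from_cor` are made up for illustration, and the sketch assumes that a failed correlation arrives as NaN; it is not taken from ArchR itself.

```r
# Hypothetical inputs: three correlations, one of them missing (NaN).
n <- 10                      # assumed number of columns (e.g. metacells)
r <- c(0.8, NaN, 0.1)

# Same arithmetic as the quoted TStat line, wrapped in a hypothetical helper.
tstat_from_cor <- function(r, n, eps = 1e-17) {
  r / sqrt(pmax(1 - r^2, eps, na.rm = TRUE) / (n - 2))
}

tstat <- tstat_from_cor(r, n)
pval  <- 2 * pt(-abs(tstat), df = n - 2)
fdr   <- p.adjust(pval, method = "fdr")

# The missing correlation is still in the numerator, so it stays missing
# in every downstream column.
print(data.frame(r, tstat, pval, fdr))
```

In this sketch the floor only replaces the term under the square root; the NaN in the numerator still produces a missing TStat, Pval, and FDR rather than a low-correlation value. To my understanding, `p.adjust` leaves missing p-values missing and adjusts the rest, so such rows can be filtered afterwards with `!is.na(...)`.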

Thank you for your help.

jeffmgranja · 2022-08-05T18:12:25Z

Maintainer

Hi @RegnerM2015, I think the basic idea was to make sure that the TStat is 0 for a correlation of 0, instead of NaN, when computing the Pvalues and FDR. If you have an NA correlation you should still get NA because it's in the numerator. I just did it in this manner to be safe in some sense. To my understanding, rowCorCpp shouldn't give NA correlations unless there are NAs in the input. I guess the na.rm=TRUE doesn't do anything; I am just exceedingly paranoid about NAs creating unwanted bugs...

> cor <- 0
> (cor / sqrt((pmax(cor, 0.00000000000000001, na.rm = TRUE))/(10-2)))
[1] 0
> (cor / sqrt(cor)/(10-2))
[1] NaN

> cor <- NA
> (cor / sqrt((pmax(cor, 0.00000000000000001, na.rm = TRUE))/(10-2)))
[1] NA
> (cor / sqrt(cor)/(10-2))
[1] NA

Please let me know if that helps or if I am mistaken!
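For comparison, the package code quoted at the top of the thread applies the floor to 1 - Correlation^2 rather than to the correlation itself, so the floor would matter when a correlation is at, or numerically just beyond, ±1, not at 0. A minimal sketch of that reading, using an assumed sample size and hypothetical helper names:

```r
n <- 10  # assumed number of columns, as in the toy example above

# Hypothetical helpers: the t-statistic with and without the pmax floor.
t_floor <- function(r) r / sqrt(pmax(1 - r^2, 1e-17, na.rm = TRUE) / (n - 2))
t_plain <- function(r) r / sqrt((1 - r^2) / (n - 2))

# r = 0: 1 - r^2 is 1, so both versions give a t-statistic of 0.
zero_floor <- t_floor(0)
zero_plain <- t_plain(0)

# r = 1: 1 - r^2 is exactly 0, so the plain version divides by zero.
one_floor <- t_floor(1)   # large but finite
one_plain <- t_plain(1)   # Inf

# r slightly above 1 (possible through floating-point rounding):
# 1 - r^2 is negative, so sqrt() in the plain version returns NaN.
over_floor <- t_floor(1 + 1e-9)
over_plain <- suppressWarnings(t_plain(1 + 1e-9))

print(c(zero_floor, zero_plain, one_floor, one_plain, over_floor, over_plain))
```

Under these assumptions the floor keeps the t-statistic finite for perfect or slightly out-of-range correlations, while a correlation of 0 gives a t-statistic of 0 with or without it.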

3 replies

RegnerM2015 Aug 5, 2022
Author

In some cases, rowCorCpp may give NA correlations when there is no variation in one of the variables being tested (to my understanding). Like this example below:

> x
[1] 0 0 0 0 0 0 0 0
> y
[1] 0.10 0.30 0.10 0.25 0.40 0.10 0.20 0.40
> cor(x,y)
[1] NA
Warning message:
In cor(x, y) : the standard deviation is zero

I think in the context of ArchR, you will sometimes have a gene or a peak that has the same value across all metacells. This would then return an NA correlation.

I guess from a big-picture standpoint, we can think of correlation as variation in one variable explaining the variation in another variable. When one variable has no variation to be explained, there is really no correlation or association. Therefore, I would interpret the NA correlations as little to no correlation.

Let me know what you think or if I could be mistaken as well. While I have found instances of NA correlations in my data, I have not tested this on the tutorial data, so this could depend on the user's situation and may not generalize to all scenarios.
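One way to check for this situation before interpreting the links is to look for rows with zero variance. The sketch below uses base R on a small made-up matrix; the object names are hypothetical, and base `cor()` stands in for rowCorCpp, whose behavior on constant rows is assumed here rather than verified.

```r
# Made-up data: rows are features, columns are metacells.
mat <- rbind(
  constant = c(0, 0, 0, 0, 0, 0, 0, 0),
  varying  = c(0.10, 0.30, 0.10, 0.25, 0.40, 0.10, 0.20, 0.40),
  other    = c(0.20, 0.50, 0.10, 0.30, 0.60, 0.20, 0.25, 0.55)
)

# Flag rows with no variation across columns.
row_var <- apply(mat, 1, var)
is_constant <- row_var == 0

# Correlate every row against the "varying" row; the constant row yields NA
# (base R also warns that the standard deviation is zero).
cors <- suppressWarnings(
  apply(mat, 1, function(x) cor(x, mat["varying", ]))
)

# Keep only correlations that are defined.
usable <- cors[!is.na(cors)]
print(is_constant)
print(usable)
```

Whether to drop such rows or to read them as "no evidence of association" is a judgment call; the sketch only makes them easy to find.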

jeffmgranja Aug 5, 2022
Maintainer

You are probably right. I think the implementation as is is fine? Let me know if you think something should be altered.

RegnerM2015 Aug 5, 2022
Author

I also think the current implementation is fine!

I just wanted to brainstorm and improve my understanding of how things work. Thanks for your help!
