DModX values are normalized on RSD of new observations instead of RSD of the model?

Hello!

I ran upon a strange behavior  of the DModX() function when projecting new data into my PCA model. I've pasted some code to reproduce this below. First, I train a PCA model on the majority of the observations in the `mtcars` dataset:

```r
# load package and demo data
set.seed(32)
data("mtcars")
library(pcaMethods)

# split dataset
ind <- sample(1:nrow(mtcars), 5)
test <- mtcars[ind,]
train <- mtcars[-ind,]

# train pca model
pca.mod <- pca(train, scale = "uv", center = TRUE, nPcs = 3)
```

When evaluating the DModX values of new observations, I noticed that I get different results depending on how many new observations I evaluate:

```r
> DModX(pca.mod, dat = test, newdata = TRUE, type = "normalized")
 Maserati Bora        Valiant      Merc 280C Toyota Corolla       Merc 230 
     3.6666773      0.9110368      1.7617564      1.7419626      1.6084993

> DModX(pca.mod, dat = test[1:3,], newdata = TRUE, type = "normalized")
Maserati Bora       Valiant     Merc 280C 
     4.218257      1.048084      2.026778 
```

 You also get the same value for a single new observation (no matter which observation):
```r
> DModX(pca.mod, dat = test[1,], newdata = TRUE, type = "normalized")
Maserati Bora 
     4.795832 

> DModX(pca.mod, dat = test[3,], newdata = TRUE, type = "normalized")
Merc 280C 
 4.795832 
```

Finally, it only seems to occur for normalized DModX values (type = "normalized"), which is the default setting. The example below show that the same DModX values are obtained when using absolute DModX values (type = "absolute").
```r
> DModX(pca.mod, dat = test, newdata = TRUE, type = "absolute")
 Maserati Bora        Valiant      Merc 280C Toyota Corolla       Merc 230 
     27.268176       6.775156      13.101749      12.954547      11.962013 

> DModX(pca.mod, dat = test[1:3,], newdata = TRUE, type = "absolute")
Maserati Bora       Valiant     Merc 280C 
    27.268176      6.775156     13.101749
```

In the documentation you write that the _"Normalized values are adjusted to the total RSD of the model"_. Hence, as I am providing the same PCA model, the generated DModX value of a single new observation should not vary depending on the set of new observations that are projected into the model space.

When looking at the function code (pasted below), I have a guess on where this issue might arise. In the calculation of the normalization factor `s0` it looks like the squared residuals of the **new observations** are summed (`sum(E2)`). Should this not be the sum of the squared residuals of the **training data** (i.e., `sum(resid(object, object@completeObs)^2)`)?

```r
setMethod("DModX", "pcaRes",
          function(object, dat, newdata=FALSE, type=c("normalized","absolute"), ...) {
            type <- match.arg(type)
            if(missing(dat)) {
              if(!is.null(completeObs(object)))
                dat <- completeObs(object)
              else stop("missing data when calculating DModX")
            }
            A0 <- as.integer(centered(object))
            ny <- ifelse(newdata, 1,
                         sqrt(nObs(object) /
                              (nObs(object) - nP(object) - A0)))
            E2 <- resid(object, dat)^2
            s <- sqrt(rowSums(E2) / (nVar(object) - nP(object))) * ny
            if(type == "absolute")
              return(s)
            s0 <- sqrt(sum(E2) / ((nObs(object) - nP(object) - A0) *
                                  (nVar(object) - nP(object))))
            s / s0
          })
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DModX values are normalized on RSD of new observations instead of RSD of the model? #22

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

DModX values are normalized on RSD of new observations instead of RSD of the model? #22

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions