Commit 343883e

Add files via upload
1 parent a5da22f commit 343883e

1 file changed, 6 additions and 32 deletions


vignettes/o2plsda.Rmd

Lines changed: 6 additions & 32 deletions
@@ -40,12 +40,6 @@ The relation between $T$ and $U$ makes the joint part the joint part: $U = TB_U
 In order to avoid overfitting of the model, the optimal number of latent variables for each model structure was estimated using group-balanced Monte Carlo cross-validation (MCCV). The package could use the group information when we select the best parameters with cross-validation. In cross-validation (CV) one minimizes a certain measure of error over some parameters that should be determined a priori. Here, we have three parameters: $(nc, nx, ny)$. A popular measure is the prediction error $||Y - \hat{Y}||$, where $\hat{Y}$ is a prediction of $Y$. In our case the O2PLS method is symmetric in $X$ and $Y$, so we minimize the sum of the prediction errors:
 $||X - \hat{X}||+||Y - \hat{Y}||$.
 
-And we also calculate the the average $Q^2$ values:
-
-$Q^2$ = 1 - $err$ / $Var_{total}$;
-
-$err$ = $Var_{expected}$ - $Var_{estimated}$;
-
 Here $nc$ should be a positive integer, and $nx$ and $ny$ should be non-negative. The 'best' integers are then the minimizers of the prediction error.
 
 The O2PLS-DA analysis was performed as described by Bylesjö et al. (2007); briefly, the O2PLS predictive variation [$TW^\top$, $UC^\top$] was used for a subsequent O2PLS-DA analysis. The Variable Importance in the Projection (VIP) value was calculated as a weighted sum of the squared correlations between the OPLS-DA components and the original variable.
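
To illustrate the group-balanced Monte Carlo cross-validation described in the hunk above, here is a minimal base-R sketch that draws one balanced split and evaluates the symmetric error $||X - \hat{X}||+||Y - \hat{Y}||$. The helpers `balanced_mccv_split` and `sym_pred_error` and the `test_frac` parameter are hypothetical names introduced for this example; they are not part of o2plsda, and the Frobenius norm is an assumed choice for $||\cdot||$. The package's own `o2cv` may work differently.

```r
# Hypothetical helper (not part of o2plsda): draw one group-balanced
# Monte Carlo split by holding out the same fraction of rows in every group.
balanced_mccv_split <- function(group, test_frac = 0.1) {
  idx_by_group <- split(seq_along(group), group)
  test <- unlist(lapply(idx_by_group, function(idx) {
    n_test <- max(1, round(length(idx) * test_frac))  # at least one row per group
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(group), test), test = sort(test))
}

# Symmetric prediction error ||X - X_hat|| + ||Y - Y_hat||,
# assuming the Frobenius norm for both terms.
sym_pred_error <- function(X, X_hat, Y, Y_hat) {
  norm(X - X_hat, type = "F") + norm(Y - Y_hat, type = "F")
}

set.seed(1)
group <- rep(c("a", "b"), times = c(20, 30))
split_once <- balanced_mccv_split(group, test_frac = 0.2)
table(group[split_once$test])  # 4 held-out rows from "a", 6 from "b"
```

Repeating the split many times and averaging `sym_pred_error` over the held-out rows for each candidate $(nc, nx, ny)$ would give one way to rank the candidates. Which rows are held out varies with the seed, but the per-group counts do not.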
@@ -81,39 +75,19 @@ set.seed(123)
 ## ncores : parallel paramaters for large datasets
 cv <- o2cv(X,Y,1:5,1:3,1:3, group = group, nr_folds = 10)
 #####################################
-# The best parameters are nc = 5 , nx = 3 , ny = 3
+#The best parameters are nc = 1, nx = 2, ny = 3
 #####################################
-# The Qxy is 0.08222935 and the RMSE is: 2.030108
+#The the RMSE is: 1.98311611667341
 #####################################
 ```
 
 Then we can do the O2PLS analysis with nc = 5, nx = 3, ny =3. You can also select the best parameters by looking at the cross validation results.
 ```{r}
-fit <- o2pls(X,Y,5,3,3)
+fit <- o2pls(X,Y,1,2,3)
 summary(fit)
-######### Summary of the O2PLS results #########
-### Call o2pls(X, Y, nc= 5 , nx= 3 , ny= 3 ) ###
-### Total variation
-### X: 4900 ; Y: 4900 ###
-### Total modeled variation ### X: 0.286 ; Y: 0.304 ###
-### Joint, Orthogonal, Noise (proportions) ###
-# X Y
-#Joint 0.176 0.192
-#Orthogonal 0.110 0.112
-#Noise 0.714 0.696
-### Variation in X joint part predicted by Y Joint part: 0.906
-### Variation in Y joint part predicted by X Joint part: 0.908
-### Variation in each Latent Variable (LV) in Joint part:
-# LV1 LV2 LV3 LV4 LV5
-#X 181.764 179.595 191.210 152.174 157.819
-#Y 229.308 204.829 175.926 173.382 155.934
-### Variation in each Latent Variable (LV) in X Orthogonal part:
-# LV1 LV2 LV3
-#X 227.856 166.718 143.602
-### Variation in each Latent Variable (LV) in Y Orthogonal part:
-# LV1 LV2 LV3
-#Y 225.833 166.231 157.976
 
+
+############################################
 ```
 
 Extract the loadings and scores from the fit results and generated figures
@@ -127,7 +101,7 @@ plot(fit,type="loading",var="Xjoint", group=group,repel=F,rotation=TRUE)
 
 Do the OPLSDA based on the O2PLS results and calculate the VIP values
 ```{r}
-res <- oplsda(fit,group, nc=5)
+res <- oplsda(fit,group, nc=1)
 plot(res,type="score", group=group)
 vip <- vip(res)
 plot(res,type="vip", group = group, repel = FALSE,order=TRUE)
