vignettes/o2plsda.Rmd
+6 -32 (6 additions & 32 deletions)
@@ -40,12 +40,6 @@ The relation between $T$ and $U$ makes the joint part the joint part: $U = TB_U
40
40
To avoid overfitting, the optimal number of latent variables for each model structure was estimated using group-balanced Monte Carlo cross-validation (MCCV). The package can use the group information when selecting the best parameters by cross-validation. In cross-validation (CV), one minimizes a measure of error over parameters that must be chosen a priori. Here, we have three parameters: $(nc, nx, ny)$. A popular measure is the prediction error $||Y - \hat{Y}||$, where $\hat{Y}$ is a prediction of $Y$. Because the O2PLS method is symmetric in $X$ and $Y$, we minimize the sum of the prediction errors:
41
41
$||X - \hat{X}||+||Y - \hat{Y}||$.
42
42
43
-
And we also calculate the average $Q^2$ values:
44
-
45
-
$Q^2$ = 1 - $err$ / $Var_{total}$;
46
-
47
-
$err$ = $Var_{expected}$ - $Var_{estimated}$;
48
-
49
43
Here $nc$ should be a positive integer, and $nx$ and $ny$ should be non-negative. The 'best' integers are then the minimizers of the prediction error.
50
44
51
45
The O2PLS-DA analysis was performed as described by Bylesjö et al. (2007); briefly, the O2PLS predictive variation [$TW^\top$, $UC^\top$] was used for a subsequent O2PLS-DA analysis. The Variable Importance in the Projection (VIP) value was calculated as a weighted sum of the squared correlations between the OPLS-DA components and the original variable.
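
The cross-validation criterion described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `cv_error` is hypothetical, the Frobenius norm is an assumed reading of $||\cdot||$, and the matrices and error values are toy numbers, not results from the vignette.

```r
# Hypothetical helper (not part of o2plsda): combined prediction error
# ||X - X_hat|| + ||Y - Y_hat||, assuming the Frobenius norm.
cv_error <- function(X, X_hat, Y, Y_hat) {
  norm(X - X_hat, type = "F") + norm(Y - Y_hat, type = "F")
}

# Toy data: X_hat is off by 1 in every entry, Y_hat is exact.
X     <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3)
X_hat <- X - 1
Y     <- matrix(c(1, 2, 3), nrow = 3)
Y_hat <- Y

err <- cv_error(X, X_hat, Y, Y_hat)  # equals sqrt(6) for this toy data

# Picking the 'best' (nc, nx, ny) is then a matter of taking the row
# with the smallest error; the values below are made up for illustration.
errors <- data.frame(nc  = c(1, 2, 3),
                     nx  = c(0, 1, 2),
                     ny  = c(1, 1, 0),
                     err = c(2.5, 1.5, 3.0))
best <- errors[which.min(errors$err), c("nc", "nx", "ny")]
```

In practice the error for each candidate $(nc, nx, ny)$ would be averaged over the cross-validation folds before the minimum is taken.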
@@ -81,39 +75,19 @@ set.seed(123)
81
75
## ncores : parallel parameters for large datasets
82
76
cv <- o2cv(X,Y,1:5,1:3,1:3, group = group, nr_folds = 10)
83
77
#####################################
84
-
#The best parameters are nc = 5 , nx = 3 , ny = 3
78
+
#The best parameters are nc = 1, nx = 2, ny = 3
85
79
#####################################
86
-
#The Qxy is 0.08222935 and the RMSE is: 2.030108
80
+
#The the RMSE is: 1.98311611667341
87
81
#####################################
88
82
```
89
83
90
84
Then we can do the O2PLS analysis with nc = 5, nx = 3, ny = 3. You can also select the best parameters by looking at the cross-validation results.