Commit e864a04

update vignette, depricate computeStructuralMetrics
1 parent 4d5bfc1 commit e864a04

File tree

5 files changed

+16
-93
lines changed

DESCRIPTION

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ Description: MsImpute is a package for imputation of peptide intensity in proteo
1515
MNAR ("v2-mnar"), or by Peptide Identity Propagation (PIP).
1616
Depends: R (> 4.1.0)
1717
SystemRequirements: python
18-
Imports: softImpute, methods, stats, graphics, pdist, reticulate,
18+
Imports: softImpute, methods, stats, graphics, pdist, LaplacesDemon,
1919
data.table, FNN, matrixStats, limma, mvtnorm,
2020
tidyr, dplyr
2121
License: GPL (>=2)

NAMESPACE

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@
33
export(CPD)
44
export(KNC)
55
export(KNN)
6-
export(computeStructuralMetrics)
76
export(evidenceToMatrix)
87
export(msImpute)
98
export(mspip)

R/computeStructuralMetrics.R

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
1-
#' Metrics for the assessment of post-imputation structural preservation
1+
#' Metrics for the assessment of post-imputation structural preservation
22
#'
3-
#' For an imputed dataset, it computes within phenotype/experimental condition similarity
3+
#' DEPRICATED. For an imputed dataset, it computes within phenotype/experimental condition similarity
44
#' (i.e. preservation of local structures), between phenotype distances
55
#' (preservation of global structures), and the Gromov-Wasserstein (GW)
66
#' distance between original (source) and imputed data.
@@ -52,7 +52,7 @@
5252
#' group <- as.factor(gsub("_[1234]", "", colnames(y)))
5353
#' computeStructuralMetrics(y, group, y=NULL)
5454
#'
55-
#' @export
55+
#'
5656
computeStructuralMetrics <- function(x, group=NULL, y = NULL, k=2){
5757
if(!is.null(group)){
5858
out <- list(withinness = log(withinness(x, group)),
@@ -114,8 +114,8 @@ gromov_wasserstein <- function(x, y, k, min.mean = 0.1){
114114

115115

116116
cat("Computing GW distance using k=", k, "Principal Components\n")
117-
reticulate::source_python(system.file("python", "gw.py", package = "msImpute"))
118-
return(gw(C1,C2, ncol(x)))
117+
# reticulate::source_python(system.file("python", "gw.py", package = "msImpute"))
118+
# return(gw(C1,C2, ncol(x)))
119119
}
120120

121121

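Reviewer note: this change marks `computeStructuralMetrics` as deprecated only through the roxygen text and by dropping `@export`. Base R also offers a runtime signal, `.Deprecated()`, which raises a warning when the function is called. The sketch below illustrates that pattern only; it is not the package's code, and `old_metric` and `new_metric` are hypothetical names.

```r
# Hypothetical replacement function (not part of msImpute).
new_metric <- function(x) {
  sum(x^2)
}

# Hypothetical deprecated wrapper: warn the caller, then delegate.
old_metric <- function(x) {
  .Deprecated("new_metric")
  new_metric(x)
}

# Call the deprecated function, reporting and muffling the warning
# so the result is still returned.
res <- withCallingHandlers(
  old_metric(1:3),
  warning = function(w) {
    message("caught: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)
```

Whether a runtime warning is preferable to simply un-exporting the function depends on how many downstream users still call it.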
man/computeStructuralMetrics.Rd

Lines changed: 2 additions & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vignettes/msImpute-vignette.Rmd

Lines changed: 8 additions & 85 deletions
Original file line numberDiff line numberDiff line change
@@ -43,8 +43,6 @@ The package consists of the following main functions:
4343

4444
- `findVariableFeatures`: finds peptide with high biological variance. We use this in `computeStructuralMetrics`
4545

46-
- `computeStructuralMetrics`: returns a number of metrics that measure distortions into the data after imputation.
47-
4846
- `plotCV2`: Plots the square of coefficient of variation versus average log-expression i.e. mean-$CV^2$ plot
4947

5048

@@ -59,7 +57,6 @@ The data was acquired in two batches (over two days). We are interested to know
5957

6058

6159
```{r setup, message=FALSE}
62-
library(reticulate)
6360
library(msImpute)
6461
library(limma)
6562
library(imputeLCMD)
@@ -236,80 +233,16 @@ condition of each sample.
236233
237234
y_qrilc <- impute.QRILC(y)[[1]]
238235
239-
y_msImpute <- msImpute(y, method = "v2-mnar", group = group)
240-
241-
242-
243236
group <- as.factor(sample_annot$group)
237+
design <- model.matrix(~0+group)
238+
y_msImpute <- msImpute(y, method = "v2-mnar", design = design)
244239
245240
```
246241

247-
## Assessment of preservation of local and global structures
248-
249-
If you've installed python, and have set up a python environment in your session, you can run this section to compute the GW distance. Please see the user's guide for setup instructions. Note that you can still run `computeStructuralMetrics` by setting `y=NULL`, if there are no python environments setup.
250-
251-
**Withinness, betweenness and Gromov-Wasserstein (GW) distance**
252-
253-
`computeStructuralMerics` returns three metrics that can be used to compare various imputation procedures:
254-
255-
- `withinness` is the sum of the squared distances between samples from the same experimental group (e.g. control, treatment, Het, WT). More specifically the similarity of the samples is measured by the distance of the (expression profile of the) sample from group centroid. This is a measure of preservation of local structures.
256-
257-
- `betweenness` is the sum of the squared distances between the experimental groups, more specifically the distance between group centroids. This is a measure of preservation of global structures.
258-
259-
- `gw_dist` is the Gromov-Wasserstein distance computed between Principal Components of imputed and source data. It is a measure of how well the structures are overall preserved over all principal axis of variation in the data. Hence, it captures preservation of both local and global structures. PCs of the source data are computed using highly variable peptides (i.e. peptides with high biological variance).
260-
261-
An ideal imputation method results in smaller `withinness`, larger `withinness` and smaller `gw_dist` among other imputation methods.
262242

263-
```{r eval=FALSE}
264-
virtualenv_create('msImpute-reticulate')
265-
virtualenv_install("msImpute-reticulate","scipy")
266-
virtualenv_install("msImpute-reticulate","cython")
267-
virtualenv_install("msImpute-reticulate","POT")
268-
269-
use_virtualenv("msImpute-reticulate")
270-
271-
top.hvp <- findVariableFeatures(y)
272-
273-
computeStructuralMetrics(y_msImpute, group, y[rownames(top.hvp)[1:50],], k = 16)
274-
```
275-
276-
```
277-
Computing GW distance using k= 16 Principal Components
278-
$withinness
279-
Mild Control Moderate Severe
280-
10.39139 11.53781 10.54993 10.46477
281-
282-
$betweenness
283-
[1] 11.50008
284-
285-
$gw_dist
286-
[1] 0.01717915
287-
```
288243

289244

290-
```{r eval=FALSE}
291-
computeStructuralMetrics(y_qrilc, group, y[rownames(top.hvp)[1:50],], k = 16)
292-
```
293-
294-
```
295-
Computing GW distance using k= 16 Principal Components
296-
$withinness
297-
Mild Control Moderate Severe
298-
10.34686 11.84049 10.62378 10.73958
299-
300-
$betweenness
301-
[1] 11.62664
302-
303-
$gw_dist
304-
[1] 0.008877501
305-
```
306-
307-
308-
309-
`Withinness` tends to be smaller by `msImpute`, which indicates that local structures are better preserved by these two methods. The `gw_dist` over all PCs for the two methods is very similar (rounded to 2 decimals). This suggests the enhancements in `v2-mnar` is just as good as left-censored MNAR methods such as `QRILC`. Note that `k` is set to the number of samples to capture all dimensions of the data.
310-
311-
312-
Also note that that, unlike `QRILC`, msImpute `v2-mnar` dose not drastically increase the variance of peptides (measured by squared coefficient of variation) post imputation.
245+
Note that that, unlike `QRILC`, msImpute `v2-mnar` dose not drastically increase the variance of peptides (measured by squared coefficient of variation) post imputation.
313246
```{r}
314247
par(mfrow=c(2,2))
315248
pcv <- plotCV2(y, main = "data")
@@ -395,24 +328,14 @@ missing peptides exhibit structured missing out of total number of partially obs
395328

396329

397330
```{r}
398-
y_msImpute_mar <- msImpute(y, method = "v2") # no need to specify group if data is MAR.
399-
y_msImpute_mnar <- msImpute(y, method = "v2-mnar", group = group)
400-
```
401-
402-
## Assessment of preservation of local and global structures
403-
404-
In this example, we do not compute `gw_dist` and only rely on `withinness` and `betweenness` metrics to assess imputation.
405-
```{r}
406-
407-
computeStructuralMetrics(y_msImpute_mar, group, y=NULL)
408-
```
331+
design <- model.matrix(~0+group)
332+
y_msImpute_mar <- msImpute(y, method = "v2") # no need to specify group/design if data is MAR.
333+
y_msImpute_mnar <- msImpute(y, method = "v2-mnar", design = design)
409334
410-
411-
```{r}
412-
computeStructuralMetrics(y_msImpute_mnar, group, y = NULL)
335+
# rank-2 approximation allowing peptides with less than 4 measurements
336+
y_msImpute_mnar <- msImpute(y, method = "v2-mnar", design = design, rank.max = 2, relax_min_obs = TRUE)
413337
```
414338

415-
There do not appear to be substantial difference between the two methods.
416339

417340

418341
Additionally, both of the method preserve variations in the data well:

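Reviewer note: the vignette hunks above replace `group = group` with `design = model.matrix(~0+group)`. As a minimal sketch of what that design matrix looks like, using base R only and a made-up four-sample annotation (the `group` values here are hypothetical, not the vignette's data):

```r
# Hypothetical sample annotation: two conditions, two replicates each.
group <- factor(c("Control", "Control", "Severe", "Severe"))

# No-intercept design: one 0/1 indicator column per group level.
design <- model.matrix(~ 0 + group)

# Column names are the factor name followed by the level.
print(colnames(design))

# Each sample belongs to exactly one group, so every row sums to 1.
print(rowSums(design))
```

Dropping the intercept (`~0+`) gives one column per condition rather than a baseline plus contrasts, which is the form passed to `msImpute()` in the updated chunks.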