Commit 50a4f40

Updating a bit the documentation and small bug correction.
1 parent f467f77 commit 50a4f40

12 files changed: +234 -128 lines changed

DESCRIPTION

Lines changed: 27 additions & 27 deletions
@@ -1,27 +1,27 @@
-Package: EPIC
-Type: Package
-Title: Estimate the Proportion of Immune and Cancer cells
-Version: 1.1.6
-Authors@R: as.person(c(
-    "Julien Racle <julien.racle@unil.ch> [aut, cre]",
-    "David Gfeller <david.gfeller@unil.ch> [aut]"
-    ))
-Description: Package implementing EPIC method to estimate the proportion of
-    immune, stromal, endothelial and cancer or other cells from bulk gene
-    expression data.
-    It is based on reference gene expression profiles for the main non-malignant
-    cell types and it predicts the proportion of these cells and of the
-    remaining "other cells" (that are mostly cancer cells) for which no
-    reference profile is given.
-Depends:
-    R (>= 3.2.0)
-License: file LICENSE
-LazyData: TRUE
-RoxygenNote: 7.2.1
-Suggests:
-    testthat,
-    knitr,
-    rmarkdown
-Imports:
-    stats
-VignetteBuilder: knitr
+Package: EPIC
+Type: Package
+Title: Estimate the Proportion of Immune and Cancer cells
+Version: 1.1.7
+Authors@R: as.person(c(
+    "Julien Racle <julien.racle@unil.ch> [aut, cre]",
+    "David Gfeller <david.gfeller@unil.ch> [aut]"
+    ))
+Description: Package implementing EPIC method to estimate the proportion of
+    immune, stromal, endothelial and cancer or other cells from bulk gene
+    expression data.
+    It is based on reference gene expression profiles for the main non-malignant
+    cell types and it predicts the proportion of these cells and of the
+    remaining "other cells" (that are mostly cancer cells) for which no
+    reference profile is given.
+Depends:
+    R (>= 3.2.0)
+License: file LICENSE
+LazyData: TRUE
+RoxygenNote: 7.2.1
+Suggests:
+    testthat,
+    knitr,
+    rmarkdown
+Imports:
+    stats
+VignetteBuilder: knitr

NAMESPACE

Lines changed: 3 additions & 3 deletions
@@ -1,3 +1,3 @@
-# Generated by roxygen2: do not edit by hand
-
-export(EPIC)
+# Generated by roxygen2: do not edit by hand
+
+export(EPIC)

NEWS

Lines changed: 10 additions & 0 deletions
@@ -1,3 +1,13 @@
+Version 1.1.7
+------------------------------------------------------------------------
+* Small changes in the documentation (in particular, explaining in the
+README's FAQ section when to use the *mRNAProportions* or *cellFractions*).
+* Removed the warning message about unknown *mRNA_cell* values that was written
+nearly in all runs (writing the caution message about this directly in the FAQ
+section).
+* Corrected a bug when there were duplicated *empty* gene names (i.e., genes
+named simply as "").
+
 Version 1.1.6
 ------------------------------------------------------------------------
 * Changed person of contact for commercial licenses to Nadette Bulgin.

R/EPIC_descr.R

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@
 #' estimate the proportion of immune, stromal, endothelial and cancer or other
 #' cells from bulk gene expression data.
 #'
-#' See the package \link[=../doc/info.html]{vignette} and function definitions
-#' below.
+#' See the package vignette (command in the R console: \emph{vignette("EPIC")} )
+#' and function definitions below.
 #'
 #' @section EPIC functions:
 #' \code{\link{EPIC}} is the main function to call to estimate the

R/EPIC_fun.R

Lines changed: 22 additions & 14 deletions
@@ -112,7 +112,11 @@
 #' @return A list of 3 matrices:\describe{
 #' \item{\code{mRNAProportions}}{(\code{nSamples} x (\code{nCellTypes+1})) the
 #' proportion of mRNA coming from all cell types with a ref profile + the
-#' uncharacterized other cell.}
+#' uncharacterized other cell. Please note that if working with reconstructed
+#' in silico bulk samples built for example from single-cell RNA-seq data,
+#' then you should compare the 'true' proportions against these
+#' 'mRNAProportions', while if working with true bulk samples, then you should
+#' compare the cell proportions against the 'cellFractions'.}
 #' \item{\code{cellFractions}}{(\code{nSamples} x (\code{nCellTypes+1})) this
 #' gives the proportion of cells from each cell type after accounting for
 #' the mRNA / cell value.}
@@ -392,18 +396,20 @@ EPIC <- function(bulk, reference=NULL, mRNA_cell=NULL, mRNA_cell_sub=NULL,
   if (anyNA(tInds)){
     defaultInd <- match("default", names(mRNA_cell))
     if (is.na(defaultInd)){
-      tStr <- paste(" and no default value is given for this mRNA per cell,",
-        "so we cannot estimate the cellFractions, only",
-        "the mRNA proportions")
+      warning("mRNA_cell value unknown for some cell types: ",
+        paste(colnames(mRNAProportions)[is.na(tInds)], collapse=", "),
+        " and no default value is given for the mRNA per cell, so we cannot ",
+        "estimate the cellFractions, only the mRNA proportions")
     } else {
-      tStr <- paste(" - using the default value of", mRNA_cell[defaultInd],
-        "for these but this might bias the true cell proportions from",
-        "all cell types.")
+      # warning("mRNA_cell value unknown for some cell types: ",
+      # paste(colnames(mRNAProportions)[is.na(tInds)], collapse=", "),
+      # " - using the default value of", mRNA_cell[defaultInd], " for these but ",
+      # "this might bias the true cell proportions from all cell types.")
+      # Not indicating this warning message as it comes about always if the
+      # user doesn't define additional mRNA_cell values by himself. Instead,
+      # I've indicated this warning in the documentation directly.
+      tInds[is.na(tInds)] <- defaultInd
     }
-    warning("mRNA_cell value unknown for some cell types: ",
-      paste(colnames(mRNAProportions)[is.na(tInds)], collapse=", "),
-      tStr)
-    tInds[is.na(tInds)] <- defaultInd
   }
   cellFractions <- t( t(mRNAProportions) / mRNA_cell[tInds])
   cellFractions <- cellFractions / rowSums(cellFractions, na.rm=FALSE)
@@ -465,15 +471,17 @@ merge_duplicates <- function(mat, warn=TRUE, in_type=NULL){
     if (warn){
       warning("There are ", length(dupl_genes), " duplicated gene names",
         ifelse(!is.null(in_type), paste(" in the", in_type), ""),
-        ". We'll use the median value for each of these cases.")
+        " (e.g., ", paste0("'", dupl_genes[1:(min(5, length(dupl_genes)))],
+        "'", collapse=", "), "). We'll use the median value for ",
+        "each of these cases.")
     }
     mat_dupl <- mat[rownames(mat) %in% dupl_genes,,drop=F]
     mat_dupl_names <- rownames(mat_dupl)
     mat <- mat[!dupl,,drop=F]
     # First put the dupl cases in a separate matrix and keep only the unique
     # gene names in the mat matrix.
-    mat[dupl_genes,] <- t(sapply(dupl_genes, FUN=function(cgene)
-      apply(mat_dupl[mat_dupl_names == cgene,,drop=F], MARGIN=2, FUN=median)))
+    mat[match(dupl_genes, rownames(mat)),] <- t(sapply(dupl_genes, FUN=function(cgene)
+      apply(mat_dupl[mat_dupl_names == cgene,,drop=F], MARGIN=2, FUN=stats::median)))
   }
   return(mat)
 }
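
The switch from `mat[dupl_genes,]` to `mat[match(dupl_genes, rownames(mat)),]` in the last hunk matters because, in R, an empty string used as a character subscript never matches a row name, not even an empty one. The following is a minimal sketch of that behaviour on a toy matrix. The matrix values, the sample names, the definitions of `dupl` and `dupl_genes` (not shown in the diff) and the helper variables `merged`, `medians` and `old_failed` are assumptions made for illustration, not the package's actual code.

```r
# Toy expression matrix with two rows sharing the empty gene name "".
mat <- matrix(c(1, 2, 3, 5,
                10, 20, 30, 50), ncol = 2,
              dimnames = list(c("A", "", "B", ""), c("s1", "s2")))

# Assumed definitions (these lines are not part of the diff above).
dupl <- duplicated(rownames(mat))
dupl_genes <- unique(rownames(mat)[dupl])          # here: ""

mat_dupl <- mat[rownames(mat) %in% dupl_genes, , drop = FALSE]
mat_dupl_names <- rownames(mat_dupl)
merged <- mat[!dupl, , drop = FALSE]               # keeps the first "" row

# Median per sample for each duplicated gene name.
medians <- t(sapply(dupl_genes, FUN = function(cgene)
  apply(mat_dupl[mat_dupl_names == cgene, , drop = FALSE],
        MARGIN = 2, FUN = stats::median)))

# Old indexing: the character subscript "" matches no row name, so this fails.
old_failed <- inherits(
  try({ tmp <- merged; tmp[dupl_genes, ] <- medians }, silent = TRUE),
  "try-error")

# New indexing: match() returns the row position, which also works for "".
merged[match(dupl_genes, rownames(merged)), ] <- medians
print(merged)
```

With position-based indexing the row named "" receives the per-sample medians of the two duplicated rows, while the other rows are left unchanged.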

README.Rmd

Lines changed: 20 additions & 5 deletions
@@ -84,6 +84,21 @@ and David Gfeller ([david.gfeller@unil.ch](mailto:david.gfeller@unil.ch)).
 
 
 ## FAQ
+##### Which proportions returned by EPIC should I use?
+* EPIC is returning two proportion values: *mRNAProportions* and *cellFractions*,
+where the 2nd represents the true proportion of cells coming from the different
+cell types when considering differences in mRNA expression between cell types.
+So in principle, it is best to consider these *cellFractions*.
+
+However, please note, that when the goal is to benchmark EPIC predictions, if
+the 'bulk samples' correspond in fact to in silico samples reconstructed for
+example from single-cell RNA-seq data, then it is usually better to compare the
+'true' proportions against the *mRNAProportions* from EPIC. Indeed, when
+building such in silico samples, the fact that different cell types express
+different amount of mRNA is usually not taken into account. On the other side,
+if working with true bulk samples, then you should compare the true cell
+proportions (measured e.g., by FACS) against the *cellFractions*.
+
 ##### What do the "*other cells*" represent?
 * EPIC predicts the proportions of the various cell types for which we have
 gene expression reference profiles (and corresponding gene signatures). But,
@@ -99,7 +114,7 @@ epithelial cells for example.
 Please make sure that your bulk data is in the form of a matrix (and also
 your reference gene expression profiles if using custom ones).
 
-##### What is the meaning of the warning message telling that some mRNA_cell values are unknown?
+##### Is there some caution to consider about the *cellFractions* and *mRNA_cell* values?
 * As described in our manuscript, EPIC first estimates the proportion of mRNA
 per cell type in the bulk and then it uses the fact that some cell types have
 more mRNA copies per cell than other to normalize this and obtain an estimate of
@@ -108,10 +123,10 @@ if you need the one or the other). For this normalization we had either measured
 the amount of mRNA per cell or found it in the literature (fig. 1 – fig.
 supplement 2 of our paper). However we don’t currently have such values for the
 endothelial cells and CAFs. Therefore for these two cell types, we use an average
-value, which might not reflect their true value and this is the reason why we
-output this message. If you have some values for these mRNA/cell abundances, you
-can also add them into EPIC, with help of the parameter "*mRNA_cell*" or
-*mRNA_cell_sub*” (and that would be great to share these values).
+value, which might not reflect their true value and this could bias a bit the
+predictions, especially for these cell types. If you have some values for these
+mRNA/cell abundances, you can also add them into EPIC, with help of the parameter
+"*mRNA_cell*" or *mRNA_cell_sub*” (and that would be great to share these values).
 
 If the mRNA proportions of these cell types are low, then even if you don't
 correct the results with their true mRNA/cell abundances, it would not really
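
To make the difference between the two outputs concrete, here is a small sketch of the normalization visible in the `R/EPIC_fun.R` hunk above (`cellFractions <- t( t(mRNAProportions) / mRNA_cell[tInds])`, followed by a row-wise renormalization), including the fallback on the `default` entry. The cell type names and all numeric values are invented for illustration; they are not EPIC's reference values.

```r
# Hypothetical mRNA proportions for two samples (each row sums to 1).
mRNAProportions <- matrix(c(0.25, 0.25, 0.50,
                            0.50, 0.25, 0.25), nrow = 2, byrow = TRUE,
                          dimnames = list(c("sample1", "sample2"),
                                          c("Bcells", "CAFs", "otherCells")))

# Hypothetical mRNA per cell values; "CAFs" has no entry of its own.
mRNA_cell <- c(Bcells = 0.5, otherCells = 1, default = 1)

# Match each cell type to its mRNA/cell value, falling back on "default".
tInds <- match(colnames(mRNAProportions), names(mRNA_cell))
tInds[is.na(tInds)] <- match("default", names(mRNA_cell))

# Divide each column by its mRNA/cell value, then renormalize each row to 1.
cellFractions <- t(t(mRNAProportions) / mRNA_cell[tInds])
cellFractions <- cellFractions / rowSums(cellFractions, na.rm = FALSE)
print(round(cellFractions, 3))
```

In this toy case `Bcells`, assumed to carry less mRNA per cell, ends up with a larger cell fraction than mRNA proportion, while `CAFs` silently uses the `default` value.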

README.md

Lines changed: 24 additions & 6 deletions
@@ -85,6 +85,24 @@ Julien Racle (<julien.racle@unil.ch>), and David Gfeller
 
 ## FAQ
 
+##### Which proportions returned by EPIC should I use?
+
+- EPIC is returning two proportion values: *mRNAProportions* and
+*cellFractions*, where the 2nd represents the true proportion of cells
+coming from the different cell types when considering differences in
+mRNA expression between cell types. So in principle, it is best to
+consider these *cellFractions*.
+
+However, please note, that when the goal is to benchmark EPIC
+predictions, if the ‘bulk samples’ correspond in fact to in silico
+samples reconstructed for example from single-cell RNA-seq data, then
+it is usually better to compare the ‘true’ proportions against the
+*mRNAProportions* from EPIC. Indeed, when building such in silico
+samples, the fact that different cell types express different amount
+of mRNA is usually not taken into account. On the other side, if
+working with true bulk samples, then you should compare the true cell
+proportions (measured e.g., by FACS) against the *cellFractions*.
+
 ##### What do the “*other cells*” represent?
 
 - EPIC predicts the proportions of the various cell types for which we
@@ -104,7 +122,7 @@ Julien Racle (<julien.racle@unil.ch>), and David Gfeller
 matrix (and also your reference gene expression profiles if using
 custom ones).
 
-##### What is the meaning of the warning message telling that some mRNA_cell values are unknown?
+##### Is there some caution to consider about the *cellFractions* and *mRNA_cell* values?
 
 - As described in our manuscript, EPIC first estimates the proportion of
 mRNA per cell type in the bulk and then it uses the fact that some
@@ -115,11 +133,11 @@ Julien Racle (<julien.racle@unil.ch>), and David Gfeller
 mRNA per cell or found it in the literature (fig. 1 – fig. supplement
 2 of our paper). However we don’t currently have such values for the
 endothelial cells and CAFs. Therefore for these two cell types, we use
-an average value, which might not reflect their true value and this is
-the reason why we output this message. If you have some values for
-these mRNA/cell abundances, you can also add them into EPIC, with help
-of the parameter “*mRNA_cell*” or*mRNA_cell_sub*” (and that would be
-great to share these values).
+an average value, which might not reflect their true value and this
+could bias a bit the predictions, especially for these cell types. If
+you have some values for these mRNA/cell abundances, you can also add
+them into EPIC, with help of the parameter “*mRNA_cell*” or
+*mRNA_cell_sub*” (and that would be great to share these values).
 
 If the mRNA proportions of these cell types are low, then even if you
 don’t correct the results with their true mRNA/cell abundances, it

inst/doc/EPIC.Rmd

Lines changed: 20 additions & 5 deletions
@@ -80,6 +80,21 @@ and David Gfeller ([david.gfeller@unil.ch](mailto:david.gfeller@unil.ch)).
 
 
 ## FAQ
+##### Which proportions returned by EPIC should I use?
+* EPIC is returning two proportion values: *mRNAProportions* and *cellFractions*,
+where the 2nd represents the true proportion of cells coming from the different
+cell types when considering differences in mRNA expression between cell types.
+So in principle, it is best to consider these *cellFractions*.
+
+However, please note, that when the goal is to benchmark EPIC predictions, if
+the 'bulk samples' correspond in fact to in silico samples reconstructed for
+example from single-cell RNA-seq data, then it is usually better to compare the
+'true' proportions against the *mRNAProportions* from EPIC. Indeed, when
+building such in silico samples, the fact that different cell types express
+different amount of mRNA is usually not taken into account. On the other side,
+if working with true bulk samples, then you should compare the true cell
+proportions (measured e.g., by FACS) against the *cellFractions*.
+
 ##### What do the "*other cells*" represent?
 * EPIC predicts the proportions of the various cell types for which we have
 gene expression reference profiles (and corresponding gene signatures). But,
@@ -95,7 +110,7 @@ epithelial cells for example.
 Please make sure that your bulk data is in the form of a matrix (and also
 your reference gene expression profiles if using custom ones).
 
-##### What is the meaning of the warning message telling that some mRNA_cell values are unknown?
+##### Is there some caution to consider about the *cellFractions* and *mRNA_cell* values?
 * As described in our manuscript, EPIC first estimates the proportion of mRNA
 per cell type in the bulk and then it uses the fact that some cell types have
 more mRNA copies per cell than other to normalize this and obtain an estimate of
@@ -104,10 +119,10 @@ if you need the one or the other). For this normalization we had either measured
 the amount of mRNA per cell or found it in the literature (fig. 1 – fig.
 supplement 2 of our paper). However we don’t currently have such values for the
 endothelial cells and CAFs. Therefore for these two cell types, we use an average
-value, which might not reflect their true value and this is the reason why we
-output this message. If you have some values for these mRNA/cell abundances, you
-can also add them into EPIC, with help of the parameter "*mRNA_cell*" or
-*mRNA_cell_sub*” (and that would be great to share these values).
+value, which might not reflect their true value and this could bias a bit the
+predictions, especially for these cell types. If you have some values for these
+mRNA/cell abundances, you can also add them into EPIC, with help of the parameter
+"*mRNA_cell*" or *mRNA_cell_sub*” (and that would be great to share these values).
 
 If the mRNA proportions of these cell types are low, then even if you don't
 correct the results with their true mRNA/cell abundances, it would not really

inst/doc/EPIC.html

Lines changed: 31 additions & 10 deletions
@@ -12,7 +12,7 @@
 
 <meta name="author" content="Julien Racle and David Gfeller" />
 
-<meta name="date" content="2023-03-13" />
+<meta name="date" content="2023-07-12" />
 
 <title>EPIC package</title>
 
@@ -340,7 +340,7 @@
 
 <h1 class="title toc-ignore">EPIC package</h1>
 <h4 class="author">Julien Racle and David Gfeller</h4>
-<h4 class="date">2023-03-13</h4>
+<h4 class="date">2023-07-12</h4>
 
 
 
@@ -409,6 +409,26 @@ <h2>Contact information</h2>
 </div>
 <div id="faq" class="section level2">
 <h2>FAQ</h2>
+<div id="which-proportions-returned-by-epic-should-i-use" class="section level5">
+<h5>Which proportions returned by EPIC should I use?</h5>
+<ul>
+<li><p>EPIC is returning two proportion values: <em>mRNAProportions</em>
+and <em>cellFractions</em>, where the 2nd represents the true proportion
+of cells coming from the different cell types when considering
+differences in mRNA expression between cell types. So in principle, it
+is best to consider these <em>cellFractions</em>.</p>
+<p>However, please note, that when the goal is to benchmark EPIC
+predictions, if the ‘bulk samples’ correspond in fact to in silico
+samples reconstructed for example from single-cell RNA-seq data, then it
+is usually better to compare the ‘true’ proportions against the
+<em>mRNAProportions</em> from EPIC. Indeed, when building such in silico
+samples, the fact that different cell types express different amount of
+mRNA is usually not taken into account. On the other side, if working
+with true bulk samples, then you should compare the true cell
+proportions (measured e.g., by FACS) against the
+<em>cellFractions</em>.</p></li>
+</ul>
+</div>
 <div id="what-do-the-other-cells-represent" class="section level5">
 <h5>What do the “<em>other cells</em>” represent?</h5>
 <ul>
@@ -433,9 +453,9 @@ <h5>I receive an error message “<em>attempt to set ‘colnames’ on an
 ones).</li>
 </ul>
 </div>
-<div id="what-is-the-meaning-of-the-warning-message-telling-that-some-mrna_cell-values-are-unknown" class="section level5">
-<h5>What is the meaning of the warning message telling that some
-mRNA_cell values are unknown?</h5>
+<div id="is-there-some-caution-to-consider-about-the-cellfractions-and-mrna_cell-values" class="section level5">
+<h5>Is there some caution to consider about the <em>cellFractions</em>
+and <em>mRNA_cell</em> values?</h5>
 <ul>
 <li><p>As described in our manuscript, EPIC first estimates the
 proportion of mRNA per cell type in the bulk and then it uses the fact
446466
mRNA per cell or found it in the literature (fig. 1 – fig. supplement 2
447467
of our paper). However we don’t currently have such values for the
448468
endothelial cells and CAFs. Therefore for these two cell types, we use
449-
an average value, which might not reflect their true value and this is
450-
the reason why we output this message. If you have some values for these
451-
mRNA/cell abundances, you can also add them into EPIC, with help of the
452-
parameter “<em>mRNA_cell</em>” or “<em>mRNA_cell_sub</em>” (and that
453-
would be great to share these values).</p>
469+
an average value, which might not reflect their true value and this
470+
could bias a bit the predictions, especially for these cell types. If
471+
you have some values for these mRNA/cell abundances, you can also add
472+
them into EPIC, with help of the parameter “<em>mRNA_cell</em>” or
473+
<em>mRNA_cell_sub</em>” (and that would be great to share these
474+
values).</p>
454475
<p>If the mRNA proportions of these cell types are low, then even if you
455476
don’t correct the results with their true mRNA/cell abundances, it would
456477
not really have a big impact on the results. On the other side, if there
