Commit 6d3e353

Merge pull request #450 from atrigila/add_complex_contrasts
feat: allow usage of strings for `makeContrasts` in `DREAM`
2 parents 69669dd + 07ae87f commit 6d3e353

44 files changed

+600
-243
lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
### Added
99

10+
- [[#450](https://github.com/nf-core/differentialabundance/pull/450)] - Allow usage of strings for makeContrasts in DREAM. ([@atrigila](https://github.com/atrigila), review by [@pinin4fjords](https://github.com/pinin4fjords), [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst)).
1011
- [[#441](https://github.com/nf-core/differentialabundance/pull/441)] - Add dream differential module. ([@nschcolnicov](https://github.com/nschcolnicov) and [@alanmmobbs93](https://github.com/alanmobbs93), review by [@pinin4fjords](https://github.com/pinin4fjords), [@suzannejin](https://github.com/suzannejin) and [@grst](https://github.com/grst)).
1112
- [[#440](https://github.com/nf-core/differentialabundance/pull/440)] - Add handling for formula based contrasts. ([@nschcolnicov](https://github.com/nschcolnicov), review by [@pinin4fjords](https://github.com/pinin4fjords))
1213
- [[#437](https://github.com/nf-core/differentialabundance/pull/437)] - Add nf-tests to deseq2/differential, rmarkdownnotebook, and proteus modules. ([@nschcolnicov](https://github.com/nschcolnicov), review by [@TODO](TODO)).

assets/schema_contrasts.json

Lines changed: 5 additions & 2 deletions
@@ -15,7 +15,7 @@
1515
},
1616
"formula": {
1717
"type": "string",
18-
"pattern": "^~\\s*[a-zA-Z_][a-zA-Z0-9_]*(\\s*\\+\\s*[a-zA-Z_][a-zA-Z0-9_]*)*$"
18+
"pattern": "^~\\s*[a-zA-Z_][a-zA-Z0-9_]*(\\s*([:+*])\\s*[a-zA-Z_][a-zA-Z0-9_]*)*$"
1919
},
2020
"comparison": {
2121
"type": "array",
@@ -31,9 +31,12 @@
3131
},
3232
"minItems": 1,
3333
"uniqueItems": true
34+
},
35+
"make_contrasts_str": {
36+
"type": "string"
3437
}
3538
},
36-
"required": ["id", "comparison"],
39+
"required": ["id"],
3740
"additionalProperties": false
3841
},
3942
"minItems": 1
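
The effect of the `formula` pattern change above can be checked with a short Python sketch. It copies the two patterns from the hunk (with the JSON escaping removed) and assumes Python's `re` module behaves like the schema validator's regex engine for these simple expressions. The helper name `formula_matches` and the sample formulas are illustrative only.

```python
import re

# Patterns copied from the schema hunk, with JSON string escaping removed.
OLD_PATTERN = r"^~\s*[a-zA-Z_][a-zA-Z0-9_]*(\s*\+\s*[a-zA-Z_][a-zA-Z0-9_]*)*$"
NEW_PATTERN = r"^~\s*[a-zA-Z_][a-zA-Z0-9_]*(\s*([:+*])\s*[a-zA-Z_][a-zA-Z0-9_]*)*$"


def formula_matches(pattern: str, formula: str) -> bool:
    """Return True if the formula string satisfies the given schema pattern."""
    return re.search(pattern, formula) is not None


# Hypothetical sample formulas, chosen to exercise each operator.
SAMPLES = [
    "~ condition",
    "~ condition + replicate",
    "~ genotype * treatment",
    "~ genotype:treatment",
    "condition",  # no leading tilde
]

for formula in SAMPLES:
    old = formula_matches(OLD_PATTERN, formula)
    new = formula_matches(NEW_PATTERN, formula)
    print(f"{formula!r}: old={old} new={new}")
```

Under these assumptions, additive formulas pass both patterns, while formulas using `*` or `:` pass only the new one.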

conf/test.config

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ params {
2020
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv'
2121
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv'
2222
transcript_length_matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.spoofed_lengths.tsv'
23-
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/formula_contrasts/SRP254919.contrasts.yaml'
23+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/SRP254919.contrasts.yaml'
2424

2525
// To do: replace this with a cut-down mouse GTF matching the matrix for testing
2626
gtf = 'https://ftp.ensembl.org/pub/release-81/gtf/mus_musculus/Mus_musculus.GRCm38.81.gtf.gz'

conf/test_affy.config

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ params {
2020

2121
// Input data
2222
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/differentialabundance/testdata/GSE50790.csv'
23-
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/formula_contrasts/GSE50790_contrasts.yaml'
23+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/GSE50790_contrasts.yaml'
2424
affy_cel_files_archive = 'https://raw.githubusercontent.com/nf-core/test-datasets/differentialabundance/testdata/GSE50790_RAW.tar'
2525

2626
// Observations

conf/test_full.config

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@ params {
1616

1717
// Input data
1818
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/differentialabundance/testdata/rnaseq_featurecounts_sample_preparations.tsv'
19-
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/formula_contrasts/rnaseq_featurecounts_contrast_file.yaml'
19+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/rnaseq_featurecounts_contrast_file.yaml'
2020
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/differentialabundance/testdata/rnaseq_featurecounts_merged_gene_counts.tsv'
2121
gtf = 'https://ftp.ensembl.org/pub/release-81/gtf/mus_musculus/Mus_musculus.GRCm38.81.gtf.gz'
2222

conf/test_maxquant.config

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ params {
2121
// Input data
2222
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/proteomics/maxquant/MaxQuant_samplesheet.tsv'
2323
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/proteomics/maxquant/MaxQuant_proteinGroups.txt'
24-
contrasts_yml = 'https://github.com/nf-core/test-datasets/raw/refs/heads/differentialabundance/testdata/formula_contrasts/MaxQuant_contrasts.yaml'
24+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/MaxQuant_contrasts.yaml'
2525

2626
// Observations
2727
observations_id_col = 'Experiment'

conf/test_nogtf.config

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ params {
2222

2323
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv'
2424
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv'
25-
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/formula_contrasts/SRP254919.contrasts.yaml'
25+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/SRP254919.contrasts.yaml'
2626

2727
//Features
2828
features_metadata_cols = 'gene_id,gene_name'

conf/test_rnaseq_limma.config

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ params {
2222
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.samplesheet.csv'
2323
matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.salmon.merged.gene_counts.top1000cov.tsv'
2424
transcript_length_matrix = 'https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/mus_musculus/rnaseq_expression/SRP254919.spoofed_lengths.tsv'
25-
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/formula_contrasts/SRP254919.contrasts.yaml'
25+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/SRP254919.contrasts.yaml'
2626

2727
// To do: replace this with a cut-down mouse GTF matching the matrix for testing
2828
gtf = 'https://ftp.ensembl.org/pub/release-81/gtf/mus_musculus/Mus_musculus.GRCm38.81.gtf.gz'

conf/test_soft.config

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@ params {
2020

2121
// Input
2222
input = 'https://raw.githubusercontent.com/nf-core/test-datasets/differentialabundance/testdata/GSE50790.csv'
23-
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/formula_contrasts/GSE50790_contrasts.yaml'
23+
contrasts_yml = 'https://raw.githubusercontent.com/nf-core/test-datasets/refs/heads/differentialabundance/testdata/GSE50790_contrasts.yaml'
2424
querygse = 'GSE50790'
2525

2626
}

docs/usage.md

Lines changed: 39 additions & 11 deletions
@@ -175,30 +175,58 @@ contrasts:
175175
blocking_factors: ["replicate"]
176176
```
177177
178-
Alternatively, the YAML contrasts also supports formula based model definitions:
178+
The necessary fields in order are:
179+
180+
- `id` - an arbitrary identifier, will be used to name contrast-wise output files
181+
- `comparison`(respectively):
182+
- - `variable` - which column from the observations information will be used to define groups
183+
- - `reference` - the base/ reference level for the comparison. If features have higher values in this group than target they will generate negative fold changes
184+
- - `target` - the target/ non-reference level for the comparison. If features have higher values in this group than the reference they will generate positive fold changes
185+
- `blocking_factors` - Any additional variables (also observation columns) that should be modelled alongside the contrast variable
186+
- `exclude_samples_col` and `exclude_samples_values` - the former being a valid column in the samples sheet, the latter a list of values in that column which should be used to select samples prior to differential modelling. This is helpful where certain samples need to be excluded prior to analysis of a given contrast.
187+
188+
Alternatively, the YAML contrasts also supports formula based model definitions for tools such as `VARIANCEPARTITION_DREAM`:
179189

180190
```yaml
181191
contrasts:
182192
- id: condition_control_treated
183193
formula: "~ condition"
184-
comparison: ["condition", "control", "treated"]
194+
make_contrasts_str: "conditiontreated"
185195
- id: condition_control_treated_blockrep
186196
formula: "~ condition + replicate"
187-
comparison: ["condition", "control", "treated"]
197+
make_contrasts_str: "conditiontreated"
188198
```
189199

190200
The necessary fields in order are:
191201

192-
- `id` - an arbitrary identifier, will be used to name contrast-wise output files
193-
- `comparison`(respectively):
194-
- - `variable` - which column from the observations information will be used to define groups
195-
- - `reference` - the base/ reference level for the comparison. If features have higher values in this group than target they will generate negative fold changes
196-
- - `target` - the target/ non-reference level for the comparison. If features have higher values in this group than the reference they will generate positive fold changes
202+
- `formula` - A string representation of the model formula. It is used to build the design matrix.
203+
- `make_contrasts_str` - An explicit literal contrast string (e.g., "treatmenthND6 - treatmentmCherry") that is passed directly to [`limma::makeContrasts()`](https://rdrr.io/bioc/limma/man/makeContrasts.html) in `VARIANCEPARTITION_DREAM`. The parameter names must be syntactically valid variable names in R (see [`make.names`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/make.names.html)). This field provides full control for complex designs. Requires `formula`.
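
The field rules described here can be expressed as a small pre-flight check. The following Python sketch is not part of the pipeline; `check_contrast` is a hypothetical helper that encodes only two rules taken from this commit: `id` is the sole required field in the schema, and `make_contrasts_str` requires `formula`.

```python
def check_contrast(entry: dict) -> list:
    """Return a list of problems found in one contrast entry (empty if none).

    Hypothetical helper: it covers only the two rules stated in this commit,
    not the full JSON schema.
    """
    problems = []
    if "id" not in entry:
        problems.append("missing required field 'id'")
    if "make_contrasts_str" in entry and "formula" not in entry:
        problems.append("'make_contrasts_str' requires 'formula'")
    return problems


# Entries mirroring the YAML examples in the docs, written as Python dicts.
good = {
    "id": "condition_control_treated",
    "formula": "~ condition",
    "make_contrasts_str": "conditiontreated",
}
bad = {
    "id": "condition_control_treated",
    "make_contrasts_str": "conditiontreated",
}

print(check_contrast(good))
print(check_contrast(bad))
```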
197204

198-
You can optionally supply:
205+
> [!WARNING]
206+
> Formula-based contrasts are currently only supported by `VARIANCEPARTITION_DREAM`. They **do not work** with tools like `DESEQ2` or base `LIMMA` implementations yet.
199207

200-
- `blocking_factors` - Any additional variables (also observation columns) that should be modelled alongside the contrast variable
201-
- `exclude_samples_col` and `exclude_samples_values` - the former being a valid column in the samples sheet, the latter a list of values in that column which should be used to select samples prior to differential modelling. This is helpful where certain samples need to be excluded prior to analysis of a given contrast.
208+
> [!NOTE]
209+
>
210+
> #### Notes on `make_contrasts_str`
211+
>
212+
> This string must match exactly the name of the coefficient in the model matrix as generated by the specified `formula`. It is passed to `limma::makeContrasts()` without modification. For example:
213+
>
214+
> - `formula: "~ condition"` will generate model coefficients like `conditiontreated` (if `control` is the reference).
215+
> - Then, `make_contrasts_str: "conditiontreated"` selects that coefficient for testing.
216+
>
217+
> This gives full control over the contrast definition but requires understanding of the model matrix.
218+
219+
Beyond the basic one-factor comparison, the YAML contrasts format supports advanced experimental designs through interaction terms and custom contrast strings. These are particularly useful in multifactorial experiments where the effect of one variable may depend on the level of another (e.g. genotype × treatment). To model an interaction between genotype and treatment, use a formula like `~ genotype * treatment`; the corresponding YAML entry is:
220+
221+
```yaml
222+
contrasts:
223+
- id: genotype_WT_KO_treatment_Control_Treated
224+
formula: "~ genotype * treatment"
225+
comparison: ["genotype", "WT", "KO"]
226+
make_contrasts_str: "genotypeKO.treatmentTreated"
227+
```
228+
229+
To help construct and validate such models and contrast strings, consider using the [`ExploreModelMatrix`](https://www.bioconductor.org/packages/release/bioc/html/ExploreModelMatrix.html) Shiny app, which lets you inspect the design matrix visually and build contrasts interactively. Another helpful resource is the [guide to creating design matrices for gene expression experiments](https://bioconductor.org/packages/release/workflows/vignettes/RNAseq123/inst/doc/designmatrices.html).
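
To see where strings such as `conditiontreated` or `genotypeKO.treatmentTreated` come from, the naming convention can be imitated in a few lines of Python. This is only a rough sketch of how R's default treatment coding names coefficients for a two-factor `~ a * b` design, assuming the first level listed for each factor is the reference. The function names are hypothetical and `sanitize` is a simplified stand-in for R's `make.names`. The authoritative names are always the column names of the design matrix that R actually builds.

```python
import re


def main_effect_names(variable: str, levels: list) -> list:
    """Coefficient names for non-reference levels; levels[0] is the reference."""
    return [f"{variable}{level}" for level in levels[1:]]


def sanitize(name: str) -> str:
    """Simplified stand-in for R's make.names: invalid characters become '.'."""
    return re.sub(r"[^A-Za-z0-9._]", ".", name)


def two_factor_names(factors: dict) -> list:
    """Approximate coefficient names for '~ a * b', excluding the intercept.

    `factors` maps exactly two variable names to their levels, reference first.
    """
    (var_a, levels_a), (var_b, levels_b) = factors.items()
    a = main_effect_names(var_a, levels_a)
    b = main_effect_names(var_b, levels_b)
    # Interaction terms are written 'a:b' in the model matrix; the ':' is not
    # valid in an R variable name, so it ends up as '.' after sanitizing.
    interactions = [f"{x}:{y}" for y in b for x in a]
    return [sanitize(name) for name in a + b + interactions]


names = two_factor_names(
    {"genotype": ["WT", "KO"], "treatment": ["Control", "Treated"]}
)
print(names)
```

With `WT` and `Control` as reference levels, the last name produced is the interaction coefficient used as `make_contrasts_str` in the example above.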
202230

203231
## Feature annotations
204232
