1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### `Fixed`

 - [#182](https://github.com/nf-core/seqinspector/pull/182) Keep modules diff to a minimum
+- [#183](https://github.com/nf-core/seqinspector/pull/183) Fix tag collision warning message that was actually printed for every tag

 ### `Changed`

15 changes: 15 additions & 0 deletions docs/usage.md
@@ -61,6 +61,21 @@ sample4 path/to/run_dir/sample4_lane2_group3_r1.fq.gz path/to/run_dir co

 Another [example samplesheet](../assets/samplesheet.csv) has been provided with the pipeline.

+### tags
+
+Tags can be used to group samples in special reports, for example in the MultiQC per tag report.
+They are optional and can be used for any purpose you like.
+For example, you could use them to group samples by experimental condition, or by sequencing run.
+Tags are meant to be case-sensitive and should be separated by a colon (`:`) if you want to use multiple tags for a sample.
+Some file systems are not case sensitive, e.g. on MacOS. We recommend precaution when using similar tags with different cases on such file systems.
+A warning will be displayed if you have multiple tags that only differ in case, but the pipeline will not stop and will run as normal.
+
+```bash
+WARN: Tag name collision: [lane1, Lane1, LANE1]
+WARN: Tag name collision: [group1, Group1]
+WARN: Tag name collision: [test, Test, TEST]
+```
+
 ## Running the pipeline

 A typical command for running the pipeline is as follows:
34 changes: 11 additions & 23 deletions subworkflows/local/utils_nfcore_seqinspector_pipeline/main.nf
@@ -104,46 +104,34 @@ workflow PIPELINE_INITIALISATION {
         .toList()
         .size()

-    channel.fromList(samplesheetToList(input, "${projectDir}/assets/schema_input.json"))
+    ch_samplesheet = channel.fromList(samplesheetToList(input, "${projectDir}/assets/schema_input.json"))
         .toList()
         .flatMap { item -> item.withIndex().collect { entry, idx -> entry + "${idx + 1}" } }
         .map { meta, fastq_1, fastq_2, idx ->
             def tags = meta.tags ? meta.tags.tokenize(":") : []
             def pad_positions = [nr_samples.length(), 2].max()
             def zero_padded_idx = idx.padLeft(pad_positions, "0")
-            def updated_meta = meta + [id: "${meta.sample}_${zero_padded_idx}", tags: tags]
-            if (!fastq_2) {
-                return [
-                    updated_meta.id,
-                    updated_meta + [single_end: true],
-                    [fastq_1],
-                ]
-            }
-            else {
-                return [
-                    updated_meta.id,
-                    updated_meta + [single_end: false],
-                    [fastq_1, fastq_2],
-                ]
-            }
+            def new_meta = [id: "${meta.sample}_${zero_padded_idx}"]
+            return [
+                new_meta.id,
+                meta + [id: new_meta.id, tags: tags, single_end: fastq_2 ? false : true],
+                fastq_2 ? [fastq_1, fastq_2] : [fastq_1],
+            ]
         }
         .groupTuple()
-        .map { meta ->
-            validateInputSamplesheet(meta)
-        }
+        .map { meta -> validateInputSamplesheet(meta) }
         .transpose()
-        .set { ch_samplesheet }

     ch_samplesheet
         .map { meta, _fastqs ->
-            meta.tags
+            [meta.tags]
         }
         .flatten()
         .unique()
-        .map { tag_name -> [tag_name.toLowerCase(), tag_name] }
+        .map { tag -> [tag.toLowerCase(), tag] }
         .groupTuple()
         .map { _tag_lowercase, tags ->
-            if (tags.size() == 1) {
+            if (tags.size() != 1) {
                 log.warn("Tag name collision: " + tags)

Member:
Could we make this warning clearer? For example:

Suggested change
- log.warn("Tag name collision: " + tags)
+ log.warn("Tag name collision, these tags will be handled as one tag: " + tags)

Member Author:
It's more that each of these tags will be a separate one.

Collaborator:
Maybe "Tag name collision, on Macs, these tags will be handled as one tag: "?

Member Author:
Yeah, something like that is better.

             }
         }
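
For readers following the thread above, the collision check in this hunk can be approximated outside Nextflow with plain Groovy collections. This is only a sketch under stated assumptions: `findTagCollisions` is a hypothetical helper that does not exist in the pipeline, the sample tag strings are invented for illustration, and ordinary list methods stand in for the channel operators (`flatten`, `unique`, `groupTuple`) used in `main.nf`.

```groovy
// Hypothetical helper (not part of the pipeline): returns every group of
// distinct tags that differ only in letter case.
def findTagCollisions(List<String> tagStrings) {
    tagStrings
        .collectMany { it ? it.tokenize(':') : [] }   // split "a:b" into ["a", "b"], as meta.tags is split
        .unique()                                     // drop exact duplicates, keeping first occurrences
        .groupBy { it.toLowerCase() }                 // key each tag by its lowercase form
        .values()
        .findAll { it.size() > 1 }                    // keep only groups with more than one spelling
        .toList()
}

// Invented tag columns for four samples.
def collisions = findTagCollisions(['lane1:group1', 'Lane1:test', 'LANE1:Group1', 'run2'])

collisions.each { println "WARN: Tag name collision: ${it}" }
// WARN: Tag name collision: [lane1, Lane1, LANE1]
// WARN: Tag name collision: [group1, Group1]
```

As the Member Author notes in the thread, the check only reports such groups. Each spelling is still treated as its own tag, so the sketch returns the groups and does not merge them.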
6 changes: 2 additions & 4 deletions tests/MiSeq.nf.test.snap
@@ -699,11 +699,9 @@
                 "SAMPLE_PAIRED_END_1_01.bam:md5,ac811b20d56132f5e039b3e8fd217405",
                 "SAMPLE_PAIRED_END_2_02.bam:md5,57892fc4314e66e8e20de4f5d82f9b5d"
             ],
-            [
-                "WARN: Tag name collision: [Bpacificus]"
-            ]
+            "No warnings"
         ],
-        "timestamp": "2026-02-19T16:07:01.63849991",
+        "timestamp": "2026-02-26T16:06:35.381277259",
         "meta": {
             "nf-test": "0.9.4",
             "nextflow": "25.10.4"
6 changes: 6 additions & 0 deletions tests/NovaSeq6000.nf.test
@@ -35,6 +35,12 @@ nextflow_pipeline {
                 params: [
                     input : pipelines_testdata_base_path + 'seqinspector/testdata/NovaSeq6000/samplesheet_no_rundirs.csv'
                 ]
+            ],
+            [
+                name: "NovaSeq6000 data test - tag collision",
+                params: [
+                    input : pipelines_testdata_base_path + 'seqinspector/testdata/NovaSeq6000/samplesheet_tag_collision.csv'
+                ]
             ]
         ]
