Commit 0fbad98

SantaMcCloud, paulzierep, and bernt-matthias authored
Add fastq-groupmerge (#7424)
* save some work
* Add fastq-groupmerge
* finish wrapper
* Update tools/fastq_groupmerge/.shed.yml
  Co-authored-by: paulzierep <[email protected]>
* change long desc in .shed
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* change help text form param
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: paulzierep <[email protected]>
* change the help section a bit
* fix linting
* delet uneeded param
* wrong element name
* Update fastq_groupmerge.xml
* change wrapper because of parseing
* change , to comma
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: M Bernt <[email protected]>
* Update tools/fastq_groupmerge/fastq_groupmerge.xml
  Co-authored-by: M Bernt <[email protected]>
* Update fastq_groupmerge.xml
* Update metadata_single.csv
* Update fastq_groupmerge.xml
* change to a tabular test

---------

Co-authored-by: paulzierep <[email protected]>
Co-authored-by: M Bernt <[email protected]>
1 parent 6935ad9 commit 0fbad98

24 files changed: 280 additions and 0 deletions

tools/fastq_groupmerge/.shed.yml

Lines changed: 13 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,13 @@
1+
name: fastq_groupmerge
2+
owner: iuc
3+
description: A tool for merging fastq reads by metadata
4+
homepage_url: https://github.com/SantaMcCloud/fastq-groupmerge
5+
long_description: |
6+
This tool takes multiple fastq read files as input and merges them based on
7+
the provided metadata file (multiple groups allowed).
8+
It is designed to support workflows that require specific sample grouping
9+
- such as co- and group-assembly - or any other use cases where merging reads is necessary.
10+
remote_repository_url: https://github.com/galaxyproject/tools-iuc/tree/main/tools/fastq_groupmerge
11+
type: unrestricted
12+
categories:
13+
- Metagenomics
tools/fastq_groupmerge/fastq_groupmerge.xml

Lines changed: 229 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,229 @@
1+
<tool id="fastq_groupmerge" name="Fastq groupmerge" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="@PROFILE@">
2+
<macros>
3+
<token name="@TOOL_VERSION@">1.0.1</token>
4+
<token name="@VERSION_SUFFIX@">0</token>
5+
<token name="@PROFILE@">25.0</token>
6+
</macros>
7+
<requirements>
8+
<requirement type="package" version="@TOOL_VERSION@">fastq-groupmerge</requirement>
9+
</requirements>
10+
<command detect_errors="exit_code">
11+
<![CDATA[
12+
13+
mkdir 'output' 'samples' &&
14+
15+
#if $input.is_select == "pair":
16+
#for $sample in $samples:
17+
ln -s '$sample.forward' 'samples/${sample.element_identifier}_forward.${sample.forward.ext}' &&
18+
ln -s '$sample.reverse' 'samples/${sample.element_identifier}_reverse.${sample.reverse.ext}' &&
19+
#end for
20+
#else:
21+
#for $sample in $samples:
22+
ln -s '$sample' 'samples/${sample.element_identifier}.${sample.ext}' &&
23+
#end for
24+
#end if
25+
26+
fastq_groupmerge.py
27+
'samples'
28+
'output'
29+
#if $metadata:
30+
--metadata '$metadata'
31+
--group_col '$group_col'
32+
#if $metadata.ext == "csv"
33+
--sep ","
34+
#else
35+
--sep "\t"
36+
#end if
37+
#end if
38+
#if $input.is_select == 'pair':
39+
--forward_suffix '_forward'
40+
--reverse_suffix '_reverse'
41+
#else:
42+
--single_reads
43+
#end if
44+
45+
]]>
46+
</command>
47+
<inputs>
48+
<conditional name="input">
49+
<param name="is_select" type="select" label="Select the type of FASTQ read library">
50+
<option value="single">Single reads</option>
51+
<option value="pair" selected="true">Paired reads</option>
52+
</param>
53+
<when value="single">
54+
<param name="samples" type="data_collection" collection_type="list" format="fastq,fastq.gz" label="Input single sample(s) read(s)"/>
55+
</when>
56+
<when value="pair">
57+
<param name="samples" type="data_collection" collection_type="list:paired" format="fastq,fastq.gz" label="Input paired sample(s) read(s) collection"/>
58+
</when>
59+
</conditional>
60+
<param argument="--metadata" type="data" multiple="false" format="tabular,csv,tsv" optional="true" label="Metadata table file" help="Metadata file whose first column holds the sample names and another column holds the group IDs. Multiple groupings are allowed; see the help section. If no metadata table is provided, this tool will merge all samples!"/>
61+
<param argument="--group_col" type="text" value="group" label="Name of the group column" help="The metadata file should contain two columns, one with the sample names and one with the sample group IDs. Use the same ID for samples that should be grouped. Look at the help section for more information!"/>
62+
</inputs>
63+
<outputs>
64+
<collection name="merged_samples_pairs" type="list:paired" label="${tool.name} on ${on_string}: Merged samples (pairs)">
65+
<discover_datasets pattern="(?P&lt;identifier_0&gt;[^_]+)_(?P&lt;identifier_1&gt;[^_]+)\.fastq\.gz" ext="fastq.gz" directory="output"/>
66+
<filter>input['is_select'] == 'pair'</filter>
67+
</collection>
68+
<collection name="merged_samples_single" type="list" label="${tool.name} on ${on_string}: Merged samples (single)">
69+
<discover_datasets pattern="(?P&lt;identifier_0&gt;[^_]+)\.fastq\.gz" ext="fastq.gz" directory="output"/>
70+
<filter>input['is_select'] == 'single'</filter>
71+
</collection>
72+
</outputs>
73+
<tests>
74+
<test expect_num_outputs="1">
75+
<conditional name="input">
76+
<param name="is_select" value="pair"/>
77+
<param name="samples">
78+
<collection type="list:paired">
79+
<element name="A1">
80+
<collection type="paired">
81+
<element name="forward" value="A1_forward.fastq.gz" ftype="fastq.gz"/>
82+
<element name="reverse" value="A1_reverse.fastq.gz" ftype="fastq.gz"/>
83+
</collection>
84+
</element>
85+
<element name="B1">
86+
<collection type="paired">
87+
<element name="forward" value="B1_forward.fastq" ftype="fastq"/>
88+
<element name="reverse" value="B1_reverse.fastq" ftype="fastq"/>
89+
</collection>
90+
</element>
91+
</collection>
92+
</param>
93+
</conditional>
94+
<param name="metadata" value="metadata_1.csv" ftype="tabular"/>
95+
<param name="group_col" value="TEST_COLUMN"/>
96+
<output_collection name="merged_samples_pairs" type="list:paired" count="2">
97+
<element name="control">
98+
<element name="forward" value="control_forward.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
99+
<element name="reverse" value="control_reverse.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
100+
</element>
101+
<element name="single">
102+
<element name="forward" value="single_forward.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
103+
<element name="reverse" value="single_reverse.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
104+
</element>
105+
</output_collection>
106+
</test>
107+
<test expect_num_outputs="1">
108+
<conditional name="input">
109+
<param name="is_select" value="pair"/>
110+
<param name="samples">
111+
<collection type="list:paired">
112+
<element name="A2">
113+
<collection type="paired">
114+
<element name="forward" value="A2_R1.fastq" ftype="fastq"/>
115+
<element name="reverse" value="A2_R2.fastq" ftype="fastq"/>
116+
</collection>
117+
</element>
118+
<element name="B2">
119+
<collection type="paired">
120+
<element name="forward" value="B2_R1.fastq" ftype="fastq"/>
121+
<element name="reverse" value="B2_R2.fastq" ftype="fastq"/>
122+
</collection>
123+
</element>
124+
</collection>
125+
</param>
126+
</conditional>
127+
<param name="metadata" value="metadata_2.csv" ftype="csv"/>
128+
<output_collection name="merged_samples_pairs" type="list:paired" count="1">
129+
<element name="treatment">
130+
<element name="forward" value="treatment_forward.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
131+
<element name="reverse" value="treatment_reverse.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
132+
</element>
133+
</output_collection>
134+
</test>
135+
<test expect_num_outputs="1">
136+
<conditional name="input">
137+
<param name="is_select" value="single"/>
138+
<param name="samples">
139+
<collection type="list">
140+
<element name="A1_forward" value="A1_forward.fastq.gz" ftype="fastq.gz"/>
141+
<element name="A1_reverse" value="A1_reverse.fastq.gz" ftype="fastq.gz"/>
142+
<element name="B1_forward" value="B1_forward.fastq" ftype="fastq"/>
143+
<element name="B1_reverse" value="B1_reverse.fastq" ftype="fastq"/>
144+
</collection>
145+
</param>
146+
</conditional>
147+
<param name="metadata" value="metadata_single.csv" ftype="csv"/>
148+
<output_collection name="merged_samples_single" type="list" count="1">
149+
<element name="Test" value="Test.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
150+
</output_collection>
151+
</test>
152+
<test expect_num_outputs="1">
153+
<conditional name="input">
154+
<param name="is_select" value="pair"/>
155+
<param name="samples">
156+
<collection type="list:paired">
157+
<element name="A1">
158+
<collection type="paired">
159+
<element name="forward" value="A1_forward.fastq.gz" ftype="fastq.gz"/>
160+
<element name="reverse" value="A1_reverse.fastq.gz" ftype="fastq.gz"/>
161+
</collection>
162+
</element>
163+
<element name="B1">
164+
<collection type="paired">
165+
<element name="forward" value="B1_forward.fastq" ftype="fastq"/>
166+
<element name="reverse" value="B1_reverse.fastq" ftype="fastq"/>
167+
</collection>
168+
</element>
169+
</collection>
170+
</param>
171+
</conditional>
172+
<output_collection name="merged_samples_pairs" type="list:paired" count="1">
173+
<element name="merged">
174+
<element name="forward" value="merged_forward.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
175+
<element name="reverse" value="merged_reverse.fastq.gz" ftype="fastq.gz" compare="sim_size"/>
176+
</element>
177+
</output_collection>
178+
</test>
179+
</tests>
180+
<help>
181+
<![CDATA[
182+
183+
**What this tool does**
184+
185+
This tool is designed to group sample fastq reads together based on a grouping defined in a metadata file.
186+
This tool can be used to support grouped assembly. In some cases you may want to group samples in multiple ways, e.g. merge technical replicates but also merge samples of a similar origin (e.g. all from the gut). To this end you can provide multiple groupings.
187+
188+
**Input**
189+
190+
- A collection of paired reads (or a list of single reads) in fastq or fastq.gz format
191+
- OPTIONAL BUT RECOMMENDED: a metadata file, either tab-separated (format tabular/tsv) or comma-separated (format csv)
192+
193+
The metadata file can look like this, for example:
194+
195+
.. metadata table::
196+
197+
sample_id,group
198+
A1,control
199+
B1,control
200+
A1,A1
201+
Test,
202+
,Test
203+
204+
Important notes:
205+
206+
- The metadata file is required to have a sample_id column with the sample names. When using the paired collection option, these are the pair names: for example, if 'A1' is the name of a pair in the collection, 'A1' has to be written in the sample_id column.
207+
- The column 'group' can be called anything. All samples with the same ID will be merged together in the output file. In the example file the output 'control_forward.fastq.gz' will contain the forward reads from 'A1' and 'B1'
208+
- When there is an empty entry in any column, this line will be ignored!
209+
- When using the single read option, note that the 'sample_id' column has to contain the complete file name without its extension: for example, for the input 'test_read.fastq' the metadata table needs a line with 'test_read'.
210+
- If a metadata file is given, only the samples listed in it are taken into account. You can therefore pass a collection that contains additional samples; they are ignored if they are not listed in the metadata file!
211+
212+
**Output**
213+
214+
- For each group stated in the 'group' column a forward file [{group_name}_{forward_suffix}.fastq.gz] and a reverse file [{group_name}_{reverse_suffix}.fastq.gz] will be created
215+
- When no metadata is given, all inputs that match the 'forward_suffix' and 'reverse_suffix' will be merged into one file each for forward and reverse!
216+
217+
]]>
218+
</help>
219+
<citations>
220+
<citation type="bibtex">@misc{BibEntry2025Oct,
221+
title = {{fastq-groupmerge}},
222+
author = {Santino Faack (SantaMcCloud)},
223+
journal = {GitHub},
224+
year = {2025},
225+
month = oct,
226+
url = {https://github.com/SantaMcCloud/fastq-groupmerge}
227+
}</citation>
228+
</citations>
229+
</tool>
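
As a rough illustration of the grouping rules described in the help section above, the following Python sketch maps group IDs to sample names from a metadata table. It is not the code of `fastq_groupmerge.py`: the function name `group_samples` and its parameters are hypothetical, and it assumes that the first column holds the sample names and that only the sample column and the selected group column are checked for empty entries.

```python
import csv
import io
from collections import defaultdict


def group_samples(metadata_text, group_col="group", sep=","):
    """Map each group ID to the list of sample names assigned to it.

    Hypothetical helper for illustration only; the real tool may differ.
    """
    reader = csv.DictReader(io.StringIO(metadata_text), delimiter=sep)
    sample_col = reader.fieldnames[0]  # assumption: first column = sample names
    groups = defaultdict(list)
    for row in reader:
        sample = (row.get(sample_col) or "").strip()
        group = (row.get(group_col) or "").strip()
        if not sample or not group:
            continue  # rows with an empty sample or group entry are skipped
        groups[group].append(sample)
    return dict(groups)


# The example table from the help section.
metadata = "sample_id,group\nA1,control\nB1,control\nA1,A1\nTest,\n,Test\n"
print(group_samples(metadata))
```

With the example table, `A1` and `B1` end up together in the `control` group, `A1` additionally forms its own group, and the two rows with an empty entry are dropped.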
214 Bytes
Binary file not shown.
213 Bytes
Binary file not shown.
219 Bytes
Binary file not shown.
218 Bytes
Binary file not shown.
Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,4 @@
1+
@S0R226/1
2+
GGTTTACCAATTGAAAATGCAATTCAAAAATTAGGTGTCAATCGTAAAGAAATGCCGACATACGAATTTAGAGCACTTTGTGAGAAATATGCGCGTGAACAAGTTGCAATTCAAATCGGTGATTTTAAATCGTTAGGTAAAAGTGGGGAT
3+
+
4+
DAAGGGCGBIEHHKKGHKGKFKKKJGJKIGIIJJ@KHDHHKIJE@GGE=IKCEJKJKEEEEIG8EGJEICJGEE0EIE;3$HCDEA6EEDEF?E$DCE4<@$EEEEEEDEEDEEED$E)D@FDCDDEECEEEEEDCE=$;$EDECA;C4:
Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,4 @@
1+
@S0R226/2
2+
ATGCCCTTCATCGCCATTACCGCAAATACACGAATTTGTGTTGCTTCAAAATCTTTATTGAGTGTTAAATAAGGATTCTTATAATCCCCAATTTTACATAACTATTTAAAATAAGCCATTTGAATTGCAATTTATTTATGAGCAGATTGC
3+
+
4+
<CAGGG@GIIIIEJKK9J$CK=KHKHKCKKJKIKJGKJKICJBFAJ=BA>CJA:<0EJADIGCAEIE@EEEE:G$BE=EEBEEDEEEBEEEBDFEEE$9CDE$EE$E@;B$C$EEEAC;E6E1DEECEED$;C$E$$EA;$BE5$D$@$=
Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,4 @@
1+
@S0R15658/1
2+
AATGGGAAGATGATTTTAGGGAATATCTCAAAATGCTGGATGAGACTAAACCTGTAGTCTTATGTGGGGACTTAAGCGTCGCTCATAAAGAGATTGACTTGAAAAATCCTTCAGCGAATCGTAAAAACCCTGGCTTTAGTTATCAAGAAC
3+
+
4+
AADGGGEEIIHII<JCKKJKKKFFJFJKIJIJKKKII@JHKKKDKEEJKJJHEKJK$CIEH>JEKGIEGDGE@EE$K$EIEEEADC?DEEE?EEEECDC1E9EEEDEEDEE=)CCCEDE1BC??FE7?7CAEDEDAAC3E$EE$;;CEE$
Lines changed: 4 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,4 @@
1+
@S0R11796/2
2+
AACCCCGTTGCTCACGACGGAATCGTTGTAAGTGATCTTTCTTAAGTTTTGTAATATTTTGACCAGTTACACCAATGGTAAGGCTGGTCACCGAGTCAATTGTGGAAATGCCATTAAGAAGCGTCGTTTTACCAGAACCTTATGGTCCCA
3+
+
4+
ADD>E$3E=I=?IHHJKKKJK$GKKFHKJGFKKK*KK:FKJGKCJCIH8I:KHDKDF>B$DEK<B$848DKEFIGD;EG@)$:ECGGE6?EDEEEEEE$$:FEEEDA$$??$?BEEAECCFAEE=DD4@D6CDEBCAECB$EC9$=EA$;
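
The `discover_datasets` patterns in the wrapper use named groups to derive collection identifiers from output file names. The sketch below exercises a pattern of the same shape with Python's `re` module. The helper `identifiers` is hypothetical, and using `fullmatch` is an assumption made for this illustration rather than a statement about how Galaxy applies the pattern.

```python
import re

# Same shape as the paired-output pattern: <group>_<direction>.fastq.gz
PAIRED = re.compile(r"(?P<identifier_0>[^_]+)_(?P<identifier_1>[^_]+)\.fastq\.gz")


def identifiers(filename):
    """Return (group, direction) for a matching file name, else None."""
    match = PAIRED.fullmatch(filename)
    if match is None:
        return None
    return match.group("identifier_0"), match.group("identifier_1")


print(identifiers("control_forward.fastq.gz"))
print(identifiers("my_group_forward.fastq.gz"))
```

Because `[^_]+` cannot match an underscore, a group ID that itself contains `_` does not match this pattern, which is worth keeping in mind when choosing group IDs in the metadata file.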
