Commit f7a004d

Merge pull request #584 from Smeds/update-pretext
feat: updated version of pretext workflow developed by Delphine that takes HiFi and HiC as input.
2 parents 0eb35f0 + 7c5c96b commit f7a004d

5 files changed: +2925 -0 lines changed

Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
version: 1.2
workflows:
  - name: main
    subclass: Galaxy
    publish: true
    primaryDescriptorPath: /hi-c-map-for-assembly-manual-curation.ga
    testParameterFiles:
      - /hi-c-map-for-assembly-manual-curation-tests.yml
    authors:
      - name: Patrik Smeds
        orcid: 0000-0001-6228-2785
      - name: Delphine Lariviere
        orcid: 0000-0001-6421-3484
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
# Changelog

## [1.0beta1] 2024-11-12

- Creation of a workflow for the generation of Hi-C maps with coverage, gap, and telomere tracks
Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Hi-C Contact map generation for manual curation of genome assemblies

This workflow generates Hi-C contact maps for diploid genome assemblies in the Pretext format. It includes tracks for PacBio read coverage, gaps, and telomeres. The Pretext files can be opened in PretextView for the manual curation of genome assemblies.

## Inputs

1. **Haplotype 1** [fasta]
2. **Will you use a second haplotype?**
3. **Haplotype 2** [fasta]
4. **Do you want to add suffixes to the scaffold names?** Select yes if the scaffold names in your assembly do not contain haplotype information.
5. **Haplotype 1 suffix** This suffix is added to haplotype 1 scaffold names if you chose to add suffixes to the scaffold names.
6. **Haplotype 2 suffix** This suffix is added to haplotype 2 scaffold names if you chose to add suffixes to the scaffold names.
7. **Hi-C reads** [fastq] Paired collection containing the Hi-C data.
8. **Do you want to trim the Hi-C data?** If *yes*, remove 5 bp at the end of Hi-C reads. Use with Arima Hi-C data if the Hi-C map looks "noisy".
9. **Telomere repeat to suit species** Expected sequence of the telomeric repeat. The default value [CCCTAA] is suited to vertebrates.
10. **PacBio reads** [fastq] Collection of PacBio reads.
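
The suffix inputs can be illustrated with a minimal Python sketch. It assumes the suffix is joined to the scaffold name with a `.` separator, which is inferred from the test expectation `>scaffold_10.H1` for suffix `H1`. The function name `add_suffix_to_headers` is hypothetical, and the tool the workflow actually uses for renaming is not shown in this commit.

```python
def add_suffix_to_headers(fasta_lines, suffix, sep="."):
    """Yield FASTA lines with `sep + suffix` appended to each sequence name.

    Assumption: the separator is "." (inferred from ">scaffold_10.H1").
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Keep any free-text description that follows the sequence name.
            name, _, description = line[1:].partition(" ")
            renamed = f">{name}{sep}{suffix}"
            yield f"{renamed} {description}" if description else renamed
        else:
            # Sequence lines pass through unchanged.
            yield line


if __name__ == "__main__":
    hap1 = [">scaffold_10", "ACGTNNNNACGT"]
    for out_line in add_suffix_to_headers(hap1, "H1"):
        print(out_line)
```

Appending the suffix to the name rather than to the whole header line keeps the name unambiguous for downstream tools that truncate headers at the first space.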

## Outputs

1. Concatenated assembly [fasta] (if two haplotypes are used)
2. Trimmed Hi-C data [fastq] (if the trimming option is selected)
3. Mapped Hi-C reads [bam]
4. Telomeres track [bedgraph]
5. Gap track [bedgraph]
6. Coverage track [bigwig]
7. Pretext map without tracks [pretext]
8. Pretext map with tracks [pretext]
9. Pretext snapshot image of the Hi-C contact map [png]
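
The test expectations in this commit include gap bedgraph rows such as `scaffold_10.H1 835498 835698 200`, where the fourth column equals `end - start`. The telomere bedgraph rows show the same relation. The Python sketch below reproduces that row layout under the assumption that gaps are runs of `N` in the assembly. The function name `gap_bedgraph` is hypothetical and the workflow's actual tools may compute the track differently.

```python
import re


def gap_bedgraph(name, sequence):
    """Return bedGraph rows (name, start, end, length) for each run of N.

    Coordinates are 0-based and end-exclusive, as in BED.
    Assumption: a gap is any run of "N" or "n" characters.
    """
    rows = []
    for match in re.finditer(r"[Nn]+", sequence):
        start, end = match.start(), match.end()
        # The track value is the gap length, matching end - start.
        rows.append((name, start, end, end - start))
    return rows


if __name__ == "__main__":
    toy_sequence = "ACGT" + "N" * 200 + "ACGT"
    for row in gap_bedgraph("scaffold_10.H1", toy_sequence):
        print("\t".join(str(field) for field in row))
```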
Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
- doc: Test outline for hi-c-map-for-assembly-manual-curation.ga 1
  job:
    Haplotype 1:
      class: File
      location: https://zenodo.org/records/14230702/files/Haplotype%202.fasta
      filetype: fasta
    Haplotype 2:
      class: File
      location: https://zenodo.org/records/14230702/files/Haplotype%202.fasta
      filetype: fasta
    Hi-C reads:
      class: Collection
      collection_type: list:paired
      elements:
        - class: Collection
          type: paired
          identifier: Hi-C reads
          elements:
            - identifier: forward
              class: File
              path: https://zenodo.org/records/14230702/files/HiC%20forward.fastqsanger.gz
            - identifier: reverse
              class: File
              path: https://zenodo.org/records/14230702/files/HiC%20reverse.fastqsanger.gz
    PacBio reads:
      class: Collection
      collection_type: list
      elements:
        - class: File
          identifier: PacBio reads.fastq.gz
          location: https://zenodo.org/records/14230702/files/PacBio%20reads.fastq.gz
    Do you want to add suffixes to the scaffold names?: true
    Will you use a second haplotype?: false
    First Haplotype suffix: H1
    Second Haplotype suffix: H2
    Do you want to trim the Hi-C data?: true
    Telomere repeat to suit species: CCCTAA
  outputs:
    Assembly for curation:
      asserts:
        has_text:
          text: ">scaffold_10.H1"
    Gaps Bed:
      asserts:
        has_text:
          text: "scaffold_10.H1 835498 835698"
    Seqtk-telo Output:
      asserts:
        has_text:
          text: "scaffold_10.H1 0 11012 139653677"
    Gaps Bedgraph:
      asserts:
        has_text:
          text: "scaffold_10.H1 835498 835698 200"
    BigWig Coverage:
      asserts:
        has_size:
          value : 100000
          delta: 40000
    Telomeres Bedgraph:
      asserts:
        has_text:
          text: "scaffold_10.H1 0 11012 11012"
    Merged Hi-C Alignments:
      asserts:
        has_size:
          value : 400000000
          delta: 50000000
    Pretext All tracks:
      asserts:
        has_size:
          value : 1700000
          delta: 500000
- doc: Test outline for hi-c-map-for-assembly-manual-curation.ga 2
  job:
    Haplotype 1:
      class: File
      location: https://zenodo.org/records/14230702/files/Haplotype%201.fasta
      filetype: fasta
    Haplotype 2:
      class: File
      location: https://zenodo.org/records/14230702/files/Haplotype%201.fasta
      filetype: fasta
    Hi-C reads:
      class: Collection
      collection_type: list:paired
      elements:
        - class: Collection
          type: paired
          identifier: Hi-C reads
          elements:
            - identifier: forward
              class: File
              path: https://zenodo.org/records/14230702/files/HiC%20forward.fastqsanger.gz
            - identifier: reverse
              class: File
              path: https://zenodo.org/records/14230702/files/HiC%20reverse.fastqsanger.gz
    PacBio reads:
      class: Collection
      collection_type: list
      elements:
        - class: File
          identifier: PacBio reads.fastq.gz
          location: https://zenodo.org/records/14230702/files/PacBio%20reads.fastq.gz
    Do you want to add suffixes to the scaffold names?: true
    Will you use a second haplotype?: false
    First Haplotype suffix: H1
    Second Haplotype suffix: H2
    Do you want to trim the Hi-C data?: true
    Telomere repeat to suit species: CCCTAA
  outputs:
    Assembly for curation:
      asserts:
        has_text:
          text: ">scaffold_10.H1"
    Gaps Bed:
      asserts:
        has_text:
          text: "scaffold_10.H1 34145604 34145804"
    Gaps Bedgraph:
      asserts:
        has_text:
          text: "scaffold_10.H1 34145604 34145804 200"
    BigWig Coverage:
      asserts:
        has_size:
          value : 100000
          delta: 40000
    Merged Hi-C Alignments:
      asserts:
        has_size:
          value : 400000000
          delta: 50000000
    Pretext All tracks:
      asserts:
        has_size:
          value : 1600000
          delta: 500000
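
The `has_size` assertions in these tests pair a `value` with a `delta`, so an output passes when its size in bytes lies within `value ± delta`. The Python sketch below restates that check for illustration only. The function name `size_within` is hypothetical and this is not Galaxy's own implementation.

```python
def size_within(observed_bytes, value, delta=0):
    """Return True when observed_bytes lies in [value - delta, value + delta]."""
    return abs(observed_bytes - value) <= delta


if __name__ == "__main__":
    # Tolerance used for "Pretext All tracks" in the first test: 1700000 +/- 500000.
    print(size_within(1_250_000, value=1_700_000, delta=500_000))
    print(size_within(1_100_000, value=1_700_000, delta=500_000))
```

Wide tolerances like these let binary outputs such as BAM, BigWig, and Pretext files vary between tool versions without failing the test.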
