|
1 | 1 | # Assembly pipeline for Mytilus genomes |
2 | 2 |
|
| 3 | +Assembly pipeline for 10x Chromium reads, as described in the preprint |
| 4 | +"Three new genome assemblies of blue mussel lineages: North and South European Mytilus edulis and Mediterranean Mytilus galloprovincialis", bioRxiv ([https://doi.org/10.1101/2022.09.02.506387](https://doi.org/10.1101/2022.09.02.506387)). |
| 5 | + |
3 | 6 | [`snakemake`](https://snakemake.readthedocs.io/en/stable/) (in a conda environment, for example) and |
4 | 7 | [`singularity`](https://github.com/hpcng/singularity) need to be installed. |
5 | 8 |
|
| 9 | +## Supernova storage workarounds |
| 10 | + |
| 11 | +Supernova uses a large amount of storage for temporary and final results. |
| 12 | + |
6 | 13 | The supernova results are stored on a remote NAS, which first needs to be mounted on the local system: |
7 | 14 | ``` |
8 | 15 | sshfs nas4:/share/sea/sea/projects/ref_genomes/assembly_10x/results/supernova_assemblies \ |
9 | 16 | results/supernova_assemblies \ |
10 | 17 | -o idmap=user,compression=no,uid=1000,gid=1000,allow_root |
11 | 18 | ``` |
12 | 19 |
|
13 | | -I also use a 4T disk as a temporary local storage for supernova computation |
| 20 | +I also used a 4 TB disk as temporary local storage for the supernova computation: |
14 | 21 | `sudo mount /dev/sd[x]1 /data/ref_genomes/assembly_10x/tmp` |
15 | 22 |
|
16 | | -To run use: |
17 | | -``` |
18 | | -conda activate snake_env |
19 | | -
|
20 | | -snakemake --use-conda --conda-frontend mamba --conda-prefix .conda \ |
21 | | ---use-singularity --singularity-args "-B /nas_sea:/nas_sea" \ |
22 | | --j {threads} |
23 | | -``` |
24 | | - |
25 | | -Final versions are *_v6.pseudohap.fasta.gz and they correspond to: |
26 | | -- mgal_01 |
27 | | -- medu_01 |
28 | | -- mtro_01 |
29 | 23 |
|
30 | | -Another version of mtro is done, tros_v7, also called mtro_02 which is improved by LRScaf with nanopore reads, scaffolding on the *Mytilus coruscus* reference genome and Pilon corrections. |
| 24 | +## How to run |
31 | 25 |
|
| 26 | +To run the pipeline, use: |
32 | 27 | ``` |
33 | 28 | conda activate snake_env |
34 | 29 |
|
35 | | -snakemake --use-conda --conda-frontend mamba --conda-prefix .conda \ |
| 30 | +snakemake --use-conda \ |
36 | 31 | --use-singularity --singularity-args "-B /nas_sea:/nas_sea" \ |
37 | | --j {threads} mtro_improvement |
38 | | -``` |
39 | | - |
40 | | -## Calling for pop check |
41 | | - |
42 | | -This part uses another dataset of reference individuals called with angsd. |
43 | | -For comparison we also call with angsd (especially ANGSD puts major allele as REF in bcf and is therefore incompatible with bcftools call). |
44 | | -``` |
45 | | -ln -s /data2/myt_popgen/angsd_calling/results/post_analysis/subset.sites resources/angsd_subset.sites |
46 | | -ln -s /data2/myt_popgen/angsd_calling/results/post_analysis/subset.beagle.gz resources/angsd_ref_subset.beagle.gz |
| 32 | +-j {threads} \ |
| 33 | +[either all_v6, asm_improvement, stats, repeats, annotation, finalize or ncbi_submission (see workflow/Snakefile)] |
47 | 34 | ``` |
48 | 35 |
|
49 | | -## Annotation tools to build beforehand |
50 | | - |
51 | | -``` |
52 | | -sudo singularity build -F resources/cactus_v1.3.0-gpu.sif \ |
53 | | -docker://quay.io/comparative-genomics-toolkit/cactus:v1.3.0-gpu |
54 | | -
|
55 | | -sudo singularity build resources/cat.sif docker://quay.io/ucsc_cgl/cat |
56 | | -``` |