@KateSakharova commented Sep 22, 2021

- Drep-subwf:
I tried to reduce the number of returned files and intermediate copies.

  1. The drep step returns the Cdb.csv and Mdb.csv files for split_drep, and Sdb.csv to identify the cluster representative in each cluster.
  2. split_drep creates a split_text file that defines the clusters.
    Example:
many_genomes:1_1:CAJJTO01.fa,CAJKGB01.fa,CAJLGA01.fa
many_genomes:2_1:CAJKRE01.fa,CAJKXJ01.fa
one_genome:3_0:CAJKRY01.fa
one_genome:4_0:CAJKXZ01.fa

This script also creates the mash files in mash_folder (as the classify_drep step did before).
  3. classify_drep uses the text file to create the many_genomes and one_genome folders. It does nothing with the mash files.
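
A minimal sketch of how the split_text format shown above could be read, assuming each line has exactly three colon-separated fields (cluster kind, cluster id, comma-separated genome files). The `Cluster` type and `parse_split_text` function are hypothetical names for illustration, not the pipeline's actual code.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    kind: str            # "many_genomes" or "one_genome"
    cluster_id: str      # e.g. "1_1"
    genomes: List[str]   # genome file names in the cluster


def parse_split_text(lines):
    """Parse lines of the form kind:cluster_id:genome1,genome2,..."""
    clusters = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        kind, cluster_id, members = line.split(":", 2)
        clusters.append(Cluster(kind, cluster_id, members.split(",")))
    return clusters


# The example lines from the description above.
example = [
    "many_genomes:1_1:CAJJTO01.fa,CAJKGB01.fa,CAJLGA01.fa",
    "many_genomes:2_1:CAJKRE01.fa,CAJKXJ01.fa",
    "one_genome:3_0:CAJKRY01.fa",
    "one_genome:4_0:CAJKXZ01.fa",
]
clusters = parse_split_text(example)
for c in clusters:
    print(c.kind, c.cluster_id, len(c.genomes))
```

With this structure, classify_drep only needs the `kind` field to decide which folder a cluster belongs in.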

- GUNC
One-genome-subwf doesn't return all the per-genome files with GUNC decisions (_complete.txt or _empty.txt). Instead, there is a step that generates two reports: one for complete genomes and one for genomes that didn't pass filtering.
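
A rough sketch of the report step, under two assumptions that are not confirmed by this PR: decision files are named `<genome>_complete.txt` or `<genome>_empty.txt`, and `_complete.txt` marks a genome that passed while `_empty.txt` marks one that didn't. The function name and the file names in the example are hypothetical.

```python
def split_gunc_decisions(filenames):
    """Return (passed, failed) genome names based on decision-file suffixes."""
    passed, failed = [], []
    for name in filenames:
        if name.endswith("_complete.txt"):
            # Assumption: "_complete.txt" means the genome passed GUNC filtering.
            passed.append(name[: -len("_complete.txt")])
        elif name.endswith("_empty.txt"):
            # Assumption: "_empty.txt" means the genome did not pass.
            failed.append(name[: -len("_empty.txt")])
    return sorted(passed), sorted(failed)


# Hypothetical decision files for two genomes.
decisions = ["CAJKRY01_complete.txt", "CAJKXZ01_empty.txt"]
passed, failed = split_gunc_decisions(decisions)
print("passed:", passed)
print("failed:", failed)
```

Collapsing many small decision files into two lists is what lets the subworkflow return two reports instead of one file per genome.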

- GTDB-Tk and rRNA input folder
The drep step doesn't return a folder with dereplicated genomes. It returns Sdb.csv with scores, which are used to identify the cluster representative in each cluster.
Added a step that builds the list of dereplicated genomes and genomes that passed GUNC filtering. The script takes the split_text file from drep-subwf and identifies the many-genomes clusters and their genomes. Then, using Sdb.csv, the script picks the best representative of each cluster. The chosen genomes go into the list of drep-filtered genomes.
The script also adds the genomes from the GUNC report that passed filtering. Finally, this step creates a folder with the drep-filtered + GUNC-passed genomes. This folder is the input to GTDB-Tk and rRNA detection.
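
The selection logic described above can be sketched as follows. This assumes Sdb.csv has `genome` and `score` columns and that the highest score wins; the scores below are invented for illustration, and `read_scores` / `pick_representatives` are hypothetical names rather than the script's actual functions.

```python
import csv
import io

# Hypothetical Sdb.csv content; the scores are made up for this example.
SDB_CSV = """genome,score
CAJJTO01.fa,75.2
CAJKGB01.fa,91.0
CAJLGA01.fa,60.5
CAJKRE01.fa,88.1
CAJKXJ01.fa,88.4
"""


def read_scores(handle):
    """Map genome file name -> score (assumes 'genome' and 'score' columns)."""
    return {row["genome"]: float(row["score"]) for row in csv.DictReader(handle)}


def pick_representatives(many_genome_clusters, scores):
    """Pick the highest-scoring genome from each many-genomes cluster."""
    return [max(genomes, key=scores.__getitem__) for genomes in many_genome_clusters]


many_genome_clusters = [
    ["CAJJTO01.fa", "CAJKGB01.fa", "CAJLGA01.fa"],
    ["CAJKRE01.fa", "CAJKXJ01.fa"],
]
gunc_passed = ["CAJKRY01.fa"]  # assumed to come from the GUNC report

scores = read_scores(io.StringIO(SDB_CSV))
selected = pick_representatives(many_genome_clusters, scores) + gunc_passed
print(selected)
```

The resulting list is what would be copied into the folder passed to GTDB-Tk and rRNA detection.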

Issues:

  1. The condition for GTDB-Tk doesn't seem to work. This step was fully commented out.
  2. drep-subwf doesn't work in main.cwl. All steps from drep-subwf were moved to main.cwl.
  3. mgyg + drep-subwf still don't work together. The Singularity container mount/build failed.
    Related: Reduce bind mounting (and thus --bind args) in containers. common-workflow-language/cwltool#1387
singularity \
	    --quiet \
	    exec \
	    --contain \
	    --ipc \
	    --pid \
	    --home \
	    /tmp/5b6632eee34f5fd7bafb4df0fff87c91/9feb/e97b/tmped6yf0er/tmp-out1ovad92y:/ZyhnaJ \
	    --bind \
	    /tmp/5b6632eee34f5fd7bafb4df0fff87c91/9feb/e97b/tmp2apzt19papp1cjnv:/tmp:rw \
	    --bind \
	    /tmp/d7e49f9a67cc50f0850ee65330c22031/6c4b/dcca/tmpu4f2nbr4/out/mgyg_genomes:/var/lib/cwl/stg26ff51f8-440d-4afe-b649-b38b754ace9f/mgyg_genomes:ro \
	    --bind \
	    /hps/nobackup/rdf/metagenomics/toil-jobstore/genomes-pipeline-test/marine-12/files/for-job/kind-CWLJob/instance-yhofsng8/file-ae38c694f6044584b802c6595ec8de87/MGYG000296542.fa:/var/lib/cwl/stg26ff51f8-440d-4afe-b649-b38b754ace9f/mgyg_genomes/MGYG000296542.fa:ro \
...... all genomes .......
	    --bind \
	    /hps/nobackup/rdf/metagenomics/toil-jobstore/genomes-pipeline-test/marine-12/files/for-job/kind-CWLJob/instance-yhofsng8/file-741f2433a0e94b7889d5ce1f7720011c/MGYG000296143.fa:/var/lib/cwl/stg26ff51f8-440d-4afe-b649-b38b754ace9f/mgyg_genomes/MGYG000296143.fa:ro \
	    --pwd \
	    /ZyhnaJ \
	    /hps/nobackup/rdf/metagenomics/singularity_cache/microbiomeinformatics_genomes-pipeline.genome-catalog-update:v1.sif \
	    generate_extra_weight_table.py \
	    -o \
	    extra_weight_table.txt \
	    -d \
	    /var/lib/cwl/stg26ff51f8-440d-4afe-b649-b38b754ace9f/mgyg_genomes
	WARNING: Overriding HOME environment variable with SINGULARITYENV_HOME is not permitted
	WARNING: skipping mount of /tmp/d7e49f9a67cc50f0850ee65330c22031/6c4b/dcca/tmpu4f2nbr4/out/mgyg_genomes: stat /tmp/d7e49f9a67cc50f0850ee65330c22031/6c4b/dcca/tmpu4f2nbr4/out/mgyg_genomes: no such file or directory
	FATAL:   container creation failed: mount /tmp/d7e49f9a67cc50f0850ee65330c22031/6c4b/dcca/tmpu4f2nbr4/out/mgyg_genomes->/var/lib/cwl/stg26ff51f8-440d-4afe-b649-b38b754ace9f/mgyg_genomes error: while mounting /tmp/d7e49f9a67cc50f0850ee65330c22031/6c4b/dcca/tmpu4f2nbr4/out/mgyg_genomes: mount source /tmp/d7e49f9a67cc50f0850ee65330c22031/6c4b/dcca/tmpu4f2nbr4/out/mgyg_genomes doesn't exist

Solution:
The generate_extra_weight_table step now runs without a container.

  1. The input was too long for the container: too many input files to mount. Reducing the number of mounted files solves the problem:
  • cat step (container was removed)
  • generate_extra_weight_table (container was removed)
  • generate_gunc_report (one intermediate step was added)
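
To illustrate why the number of input files matters, here is a small sketch comparing the `--bind` arguments needed when every file is mounted separately (as in the log above) with a single directory mount. It is not what this PR does (the PR removes the container from the affected steps), and the paths and function names are hypothetical.

```python
import posixpath


def bind_args_per_file(host_files, container_dir):
    """One read-only --bind pair per input file, as in the failing command."""
    args = []
    for path in host_files:
        target = posixpath.join(container_dir, posixpath.basename(path))
        args += ["--bind", f"{path}:{target}:ro"]
    return args


def bind_args_per_dir(host_dir, container_dir):
    """A single read-only --bind pair for a whole directory."""
    return ["--bind", f"{host_dir}:{container_dir}:ro"]


# Hypothetical staging layout with 5000 genomes.
n = 5000
host_files = [f"/jobstore/file-{i}/genome_{i}.fa" for i in range(n)]
per_file = bind_args_per_file(host_files, "/var/lib/cwl/stg/mgyg_genomes")
per_dir = bind_args_per_dir("/jobstore/mgyg_genomes", "/var/lib/cwl/stg/mgyg_genomes")

print(len(per_file), "arguments,", sum(len(a) + 1 for a in per_file), "characters")
print(len(per_dir), "arguments,", sum(len(a) + 1 for a in per_dir), "characters")
```

The per-file variant grows linearly with the number of genomes, which is how a large catalogue ends up with a command line too long for the container runtime.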

@KateSakharova self-assigned this Sep 22, 2021
@KateSakharova marked this pull request as draft September 22, 2021 16:08
@KateSakharova marked this pull request as ready for review September 24, 2021 12:58
@KateSakharova merged commit 1ec55cd into master Oct 28, 2021
@KateSakharova

This subwf was also updated in a later branch, prepare-output, but it still doesn't work as part of the pipeline (the separate steps work).

@KateSakharova deleted the modify-drep-subwf branch October 29, 2021 10:13