Please consider adding the approach documented at dib-lab/genome-grist#284 for quickly inserting lists into a config file.
It takes the TSV sample below and inserts the Assembly Accession identifiers directly under the query_genomes key in the config file for the spacegraphcats workflow:
The config:
# This is the file path to the metadata file.
# In this case, the file is the full metadata
# output of the SRA Run Selector.
metadata_file_path: metadata/SraRunTable.txt
# Directories
workdir: ~/dissertation-project/seqs
outdir: dissertation-project/seqs
prevent_sra_download: False
# The kmer size within the database (`sourmash sig fileinfo`)
k_size:
- 21
- 31
# - 51 is too large for khmer abundtrimming
# Query genomes for spacegraphcats
query_genomes:
- GCA_000349525.1
query_radius:
- 1
- 5
- 10
# The amount to scale representative kmer set
scale:
- 1000
The TSV (tab-separated; empty fields are left blank):
Assembly Accession	Assembly Name	Organism Name	Annotation Name	Assembly Stats Total Sequence Length	Assembly Level	Assembly Release Date	WGS project accession
GCA_000143535.4	ASM14353v4	Botrytis cinerea B05.10	Annotation submitted by Syngenta Biotechnology, Inc.	42630066	Complete Genome	2015-02-05	
GCF_000143535.2	ASM14353v4	Botrytis cinerea B05.10	Annotation submitted by Syngenta Biotechnology, Inc.	42630066	Complete Genome	2015-02-05	
GCA_019186565.1	ASM1918656v1	Botrytis cinerea		42721243	Contig	2021-07-09	JAHHFM01
GCA_019186575.1	ASM1918657v1	Botrytis cinerea		42739314	Contig	2021-07-09	JAHHFN01
GCA_031205075.1	Bcin_M3a_1.1	Botrytis cinerea		43592014	Contig	2023-09-07	JARWBL01
GCA_015148055.1	ASM1514805v1	Botrytis cinerea		41439596	Contig	2020-10-30	JACVFN01
The code:
awk -F'\t' 'NR>1 && NF {print " - " $1}' assembly-test.tsv | sed "/query_genomes:/r /dev/stdin" -i sgc-prep-config.yml
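The one-liner above relies on GNU sed (in-place editing with -i and reading the inserted text from /dev/stdin), so it may not behave the same with BSD/macOS sed. A minimal awk-only sketch of the same insertion follows. The function name `insert_accessions` and the optional indent argument are hypothetical, introduced only for illustration, and the sketch assumes the TSV has a header row plus at least one data row and that the config contains a line starting with `query_genomes:`.

```shell
#!/bin/sh
# Sketch: insert accessions from a TSV under the `query_genomes:` key
# without GNU-specific sed features. Names here are illustrative only.
set -eu

insert_accessions() {
    tsv=$1                 # tab-separated file, header row, accession in column 1
    config=$2              # YAML config containing a `query_genomes:` line
    indent=${3:-'  '}      # assumed list indent; match your config's style
    tmp=$(mktemp)

    # First file (NR == FNR): collect accessions, skipping the header
    # and blank lines. Second file: copy the config through and emit
    # the collected list items right after the `query_genomes:` line.
    awk -F'\t' -v indent="$indent" '
        NR == FNR {
            if (FNR > 1 && NF) items = items indent "- " $1 "\n"
            next
        }
        { print }
        /^query_genomes:/ { printf "%s", items }
    ' "$tsv" "$config" > "$tmp"

    # Overwrite in place so the config keeps its original permissions.
    cat "$tmp" > "$config"
    rm -f "$tmp"
}

# Usage (file names taken from the command above):
# insert_accessions assembly-test.tsv sgc-prep-config.yml
```

As with the sed version, new entries land directly after the `query_genomes:` line, ahead of any entries that were already listed.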
The updated config:
# This is the file path to the metadata file.
# In this case, the file is the full metadata
# output of the SRA Run Selector.
metadata_file_path: metadata/SraRunTable.txt
# Directories
workdir: ~/dissertation-project/seqs
outdir: dissertation-project/seqs
prevent_sra_download: False
# The kmer size within the database (`sourmash sig fileinfo`)
k_size:
- 21
- 31
# - 51 is too large for khmer abundtrimming
# Query genomes for spacegraphcats
query_genomes:
- GCA_000143535.4
- GCF_000143535.2
- GCA_019186565.1
- GCA_019186575.1
- GCA_031205075.1
- GCA_015148055.1
- GCA_000349525.1
query_radius:
- 1
- 5
- 10
# The amount to scale representative kmer set
scale:
- 1000
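One caveat worth checking: the insertion appends on every run, so running the command twice adds each accession twice. The sketch below prints any entry that appears more than once under `query_genomes:`. The function name `duplicate_query_genomes` is hypothetical, and the check assumes the list items directly follow the key with no blank lines in between.

```shell
#!/bin/sh
# Sketch: report accessions listed more than once under `query_genomes:`.
# The function name is illustrative, not part of any existing tool.
duplicate_query_genomes() {
    awk '
        /^query_genomes:/ { in_list = 1; next }   # start of the list
        in_list && /^[ \t]*#/ { next }            # skip comment lines
        in_list && /^[ \t]*-[ \t]/ {
            sub(/^[ \t]*-[ \t]*/, "")             # strip the "- " prefix
            if (seen[$0]++ == 1) print            # print once, on the 2nd hit
            next
        }
        { in_list = 0 }                           # any other line ends the list
    ' "$1"
}

# Usage (config name taken from the command above):
# duplicate_query_genomes sgc-prep-config.yml
```

No output means every accession in the list is unique.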