
Prebuilt databases lack the nodes.dmp file needed by the classifiedRefiner module; segfault when using gtdb_r226 instead of the gtdb_r220+virus+human database #175

@TomKLHui


(1) The prebuilt database worked well for the classify workflow, but when I went on to remove the unclassified/human portions with classifiedRefiner, it failed because nodes.dmp is missing (log below). Should I just copy the NCBI taxdump files into the database directory?
(2) Is it normal for ~95% of reads to be unclassified? What proportion should I expect?

```
classifiedRefiner 07a.metabuli/JL304_B27_2_classifications.tsv ../metabuli/gtdb+virus+human --threads 4 --remove-unclassified --report 1

Metabuli Version (commit): 1.1.1
Remove unclassified reads true
Exclude taxId as well as its children
Select taxId as well as its children
Select columns with number, (7:full lineage, generated if absent)
Make report of refined classification file true
Adjust classification to the specified rank
0: without higher rank, 1: with higher rank, 2: separate file for higher rank classification 0
Threads 4
Min. sequence similarity score 0

Loading nodes file ...File ../metabuli/gtdb+virus+human/nodes.dmp not found!
```
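A quick way to see which taxonomy files a database directory actually contains is sketched below. It is only a diagnostic under stated assumptions: the file list is the standard NCBI taxdump trio (nodes.dmp, names.dmp, merged.dmp), and whether classifiedRefiner needs all three, or reads taxonomy from some other file in newer prebuilt databases, is not confirmed here. The helper name `missing_taxdump_files` is hypothetical.

```python
import sys
from pathlib import Path

# Assumption: the standard NCBI taxdump files; which of these classifiedRefiner
# actually requires is not confirmed by the log above (it only names nodes.dmp).
TAXDUMP_FILES = ("nodes.dmp", "names.dmp", "merged.dmp")


def missing_taxdump_files(db_dir: str) -> list[str]:
    """Hypothetical helper: return the taxdump file names absent from db_dir."""
    root = Path(db_dir)
    return [name for name in TAXDUMP_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Default path taken from the classifiedRefiner command in the log above.
    db = sys.argv[1] if len(sys.argv) > 1 else "../metabuli/gtdb+virus+human"
    missing = missing_taxdump_files(db)
    if missing:
        print(f"{db}: missing {', '.join(missing)}")
    else:
        print(f"{db}: all taxdump files present")
```

Run it as `python check_taxdump.py ../metabuli/gtdb+virus+human` (the script name is arbitrary) to list what is absent before deciding whether copying taxdump files in is appropriate.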

(3) Also, the classify workflow only succeeds on trimmed reads with the prebuilt gtdb_r220+virus+human database; it fails with the prebuilt gtdb_r226 database, and every other combination I tried segfaults. Could this be caused by the large input size and limited RAM? (Remote server maximum: 500 GB RAM; input: ~5 GB, paired-end.)
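One way to test the RAM hypothesis is to record how the run ends and how much memory it peaked at. The sketch below is a POSIX-only wrapper, not part of Metabuli: it runs any command, reports its exit status, and reads the peak resident set size of terminated children via `resource.getrusage`. On POSIX, a negative return code from `subprocess.run` is the terminating signal, so `-SIGSEGV` points to a genuine segfault, while `-SIGKILL` is what the Linux OOM killer typically produces. The function name `run_and_report` is hypothetical.

```python
import resource
import signal
import subprocess
import sys


def run_and_report(cmd: list[str]) -> tuple[int, int]:
    """Hypothetical wrapper: run cmd, return (return code, peak child RSS).

    ru_maxrss for RUSAGE_CHILDREN is the largest peak RSS among terminated,
    waited-for children; units are kilobytes on Linux and bytes on macOS.
    """
    completed = subprocess.run(cmd)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    code = completed.returncode
    if code == -signal.SIGSEGV:
        print("terminated by SIGSEGV (segmentation fault)")
    elif code == -signal.SIGKILL:
        print("terminated by SIGKILL (possibly the OOM killer)")
    print(f"return code {code}, peak RSS {peak}")
    return code, peak


if __name__ == "__main__":
    if len(sys.argv) < 2:
        sys.exit("usage: python run_and_report.py <command> [args...]")
    run_and_report(sys.argv[1:])
```

Prefixing the failing classify command with `python run_and_report.py` would show whether the peak approaches the server's 500 GB limit before the crash, or whether it segfaults well below it.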

Tom
