(1) The prebuilt database worked well for the classify workflow, but when I proceeded to remove the unclassified/human portions, it failed (log below). Should I just copy the NCBI taxdump files into the database directory?
(2) Is it normal to have ~95% of reads unclassified? What should I expect?
'''
classifiedRefiner 07a.metabuli/JL304_B27_2_classifications.tsv ../metabuli/gtdb+virus+human --threads 4 --remove-unclassified --report 1
Metabuli Version (commit): 1.1.1
Remove unclassified reads true
Exclude taxId as well as its children
Select taxId as well as its children
Select columns with number, (7:full lineage, generated if absent)
Make report of refined classification file true
Adjust classification to the specified rank
0: without higher rank, 1: with higher rank, 2: separate file for higher rank classification 0
Threads 4
Min. sequence similarity score 0
Loading nodes file ...File ../metabuli/gtdb+virus+human/nodes.dmp not found!
'''
(3) Also, the classify workflow only worked for trimmed reads with the prebuilt gtdb_r220+virus+human database, not with the prebuilt gtdb_r226. Every other use case caused a segfault. I wonder whether this is caused by the large data size and limited RAM (remote server max: 500 GB; input file size: ~5 GB, paired).
Tom