Description
Hello,
I am using SpacePHARER for a project and have a question about how query database size affects predictmatch results. I ran spacepharer predictmatch with two different query databases, both times against the same target database. Only 31% of the spacers with hits in the first run (which used the smaller query database) also had hits in the second run.
#First run spacer query DB:
2,206 spacers from 38 genomes.
#First run results:
161 spacer hits to viral target DB.
#Second run spacer query DB:
15,730 spacers from 450 genomes.
#Second run results:
1,764 spacer hits to viral target DB.
50 of the 161 spacers with hits in the first run were retained in the second run output.
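The overlap above (50 of 161, about 31%) is a set intersection of spacer IDs between the two outputs. A minimal sketch of that comparison, assuming the spacer identifier can be read from one tab-separated column of each hit line; the column index and the helper names `read_ids` and `retained_fraction` are hypothetical and would need adapting to the actual predictmatch output format:

```python
def read_ids(lines, column=0, sep="\t"):
    """Collect the identifiers found in one column of delimited text lines.

    `column` is an assumption for illustration; adjust it to wherever the
    spacer ID sits in the real output.
    """
    ids = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(sep)
        if column < len(fields):
            ids.add(fields[column])
    return ids


def retained_fraction(first_run_ids, second_run_ids):
    """Return (retained, total_first, fraction) for IDs from the first run
    that also appear in the second run."""
    first = set(first_run_ids)
    second = set(second_run_ids)
    if not first:
        raise ValueError("first run contains no IDs")
    retained = first & second
    return len(retained), len(first), len(retained) / len(first)


if __name__ == "__main__":
    # Toy data only; these are not real SpacePHARER results.
    run1 = read_ids(["s1\tphageA", "s2\tphageB", "s3\tphageC", "s4\tphageA"])
    run2 = read_ids(["s2\tphageB", "s4\tphageD", "s9\tphageE"])
    kept, total, frac = retained_fraction(run1, run2)
    print(f"{kept} of {total} spacers retained ({frac:.0%})")
```

Comparing on spacer ID alone ignores which target was hit; to check whether the same spacer-target pair was retained, the same approach works with a tuple of both columns as the set element.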
Main question
I am wondering why spacers from the smaller query database that had a hit in the first run are absent from the output of the second run, which used the larger query database. Does the --simple-best-hit setting affect this?
The tmp folder was emptied after each run.
Environment
SpacePHARER Version: 2.fc5e668
Conda
Ubuntu 16.04
#The same parameters are used for both runs:
--strand 2 --fmt 2 --fdr 0.01 --simple-best-hit 1 --use-all-table-starts 1 --translate 1 --search-type 1 --translation-table 11 --rescore-mode 0 --num-iterations 4 --cov 0.50 --e-profile 0.0001 -s 5.70 --report-pam 1 --gap-open 16 --gap-extend 2 --cov-mode 0 --min-seq-id 0.95 --max-seq-id 1.00 --orf-start-mode 1 --remove-tmp-files 0
I'm reading through the supplemental info of the bioRxiv paper to try to understand the algorithm better.
Thank you very much for your time and help.
Cheers,
Ian