Query database size and predictmatch matches

Hello,

I am using SpacePHARER for a project and have a few questions about how query database size affects predictmatch results. I ran spacepharer predictmatch with two query databases, both against the same target database. Only 31% of the hits from the first run (which used the smaller query database) were also found in the second run.

#First run spacer query DB:
2,206 spacers from 38 genomes. 

#First run results:
161 spacer hits to viral target DB.

#Second run spacer query DB:
15,730 spacers from 450 genomes.

#Second run results:
1,764 spacer hits to viral target DB.
50 of the 161 spacers with hits in the first run were retained in the second run output.
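
A minimal sketch of how an overlap like this can be counted, assuming the spacer identifiers with hits have already been extracted from each run's output into two lists. The function name `summarize_overlap` and the toy identifiers are hypothetical, and this is not necessarily how the numbers above were produced.

```python
def summarize_overlap(first_run_ids, second_run_ids):
    """Count first-run spacers that also appear in the second run.

    Both arguments are iterables of spacer identifiers (hypothetical input:
    the IDs must be pulled from the predictmatch output beforehand).
    Returns (number retained, number lost, fraction of first-run IDs retained).
    """
    first = set(first_run_ids)
    second = set(second_run_ids)
    retained = first & second   # hits present in both runs
    lost = first - second       # hits only present in the first run
    fraction = len(retained) / len(first) if first else 0.0
    return len(retained), len(lost), fraction


# Toy identifiers for illustration only, not real SpacePHARER output.
first_run = ["spacer_a", "spacer_b", "spacer_c", "spacer_d"]
second_run = ["spacer_b", "spacer_d", "spacer_e"]

retained, lost, fraction = summarize_overlap(first_run, second_run)
print(f"{retained} retained, {lost} lost, {fraction:.0%} of first-run hits kept")
```

With 50 retained out of 161 first-run hits, the same calculation gives 50 / 161, or roughly 31%.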

## Main question
I am wondering why spacers from the smaller query database that had a hit in the first run are absent from the output of the second run, which used the larger query database. Does the --simple-best-hit setting affect this?

The tmp folder was emptied after each run.

## Environment
SpacePHARER Version: 2.fc5e668
Conda
Ubuntu 16.04

#The same parameters are used for both runs:
--strand 2 --fmt 2 --fdr 0.01 --simple-best-hit 1 --use-all-table-starts 1 --translate 1 --search-type 1 --translation-table 11 --rescore-mode 0 --num-iterations 4 --cov 0.50 --e-profile 0.0001 -s 5.70 --report-pam 1 --gap-open 16 --gap-extend 2 --cov-mode 0 --min-seq-id 0.95 --max-seq-id 1.00 --orf-start-mode 1 --remove-tmp-files 0


I'm reading through the supplemental info on the bioRxiv paper to try and understand the algorithm better.

Thank you very much for your time and help. 

Cheers, 
Ian


