Query database size and predictmatch matches #1

@imrambo

Hello,

I am using SpacePHARER for a project and have a few questions about how query database size can affect predictmatch results. I ran spacepharer predictmatch with two different query databases, both times against the same target database. Only 31% of the spacers with hits in the first run (which used the smaller query database) also had hits in the second run.

#First run spacer query DB:
2,206 spacers from 38 genomes.

#First run results:
161 spacer hits to viral target DB.

#Second run spacer query DB:
15,730 spacers from 450 genomes.

#Second run results:
1,764 spacer hits to viral target DB.
50 of the 161 spacers with hits in the first run were retained in the second run output.
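
For reference, the 31% figure above is just the overlap of spacer IDs between the two outputs. Below is a minimal sketch of that calculation, assuming the spacer IDs with hits have already been extracted from each output file. The function name `retained_fraction` and the placeholder IDs are hypothetical and only reproduce the counts reported here; they are not real data.

```python
def retained_fraction(first_run_ids, second_run_ids):
    """Return (retained, total_first, fraction) for spacer IDs shared by both runs."""
    first = set(first_run_ids)
    second = set(second_run_ids)
    if not first:
        raise ValueError("first run has no spacer hits")
    retained = first & second  # spacers with a hit in both runs
    return len(retained), len(first), len(retained) / len(first)


# Placeholder IDs that only mimic the reported counts (161 and 1,764 hits).
first_run = [f"spacer_{i}" for i in range(161)]
second_run = [f"spacer_{i}" for i in range(50)] + [f"other_{i}" for i in range(1714)]

n_retained, n_first, frac = retained_fraction(first_run, second_run)
print(f"{n_retained} of {n_first} retained ({frac:.0%})")
```

Comparing sets of IDs rather than raw output lines avoids counting a spacer twice if it appears with more than one target.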

Main question

Why are spacers from the smaller query database that had a hit in the first run absent from the output of the second run, which used the larger query database? Does the --simple-best-hit setting affect this?

The tmp folder was emptied after each run.

Environment

SpacePHARER Version: 2.fc5e668
Conda
Ubuntu 16.04

#The same parameters are used for both runs:
--strand 2 --fmt 2 --fdr 0.01 --simple-best-hit 1 --use-all-table-starts 1 --translate 1 --search-type 1 --translation-table 11 --rescore-mode 0 --num-iterations 4 --cov 0.50 --e-profile 0.0001 -s 5.70 --report-pam 1 --gap-open 16 --gap-extend 2 --cov-mode 0 --min-seq-id 0.95 --max-seq-id 1.00 --orf-start-mode 1 --remove-tmp-files 0

I'm reading through the supplemental info on the bioRxiv paper to try and understand the algorithm better.

Thank you very much for your time and help.

Cheers,
Ian
