Pathogenicity of DNA Sequences Predictor

This tool predicts the pathogenicity of DNA sequences read from a FASTA file using a pre-trained RandomForest model. Each sequence is encoded as a vector of k-mer frequencies, and the model outputs the probability that the sequence is pathogenic. For details, see the report.

Features

Reads sequences from a FASTA file.
Encodes sequences into k-mer frequencies.
Uses a RandomForest model to predict pathogenicity.
Outputs predictions to a CSV file.
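
The k-mer encoding step listed above can be illustrated with a minimal sketch. It is not taken from the tool's source: the function name `kmer_frequencies`, the default `k=3`, the restriction to the A/C/G/T alphabet, and the normalisation by the number of valid k-mers are all assumptions made for illustration. The actual model may use a different k or feature order.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3):
    """Return the relative frequency of every A/C/G/T k-mer in `sequence`.

    Hypothetical helper: k and the normalisation are assumptions,
    not values confirmed by the tool.
    """
    sequence = sequence.upper()
    # Fixed vocabulary so every sequence maps to a vector of the same length.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    # Windows containing other characters (e.g. N) are not in the vocabulary
    # and therefore do not contribute to the total.
    total = sum(counts[kmer] for kmer in vocabulary)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in vocabulary}


freqs = kmer_frequencies("ACGTACGT", k=2)
print(len(freqs), freqs["AC"])
```

In the example call, the sequence contains seven overlapping 2-mers and "AC" occurs twice, so its frequency is 2/7. Using a fixed vocabulary keeps the feature vector the same length for every sequence, which a tabular model such as a random forest requires.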

Requirements

Python 3.6+
pandas
Biopython
joblib

Installation

First, clone the repository or download the source code. Then, navigate to the project directory and install the required Python packages using pip:

pip install -r requirements.txt

Usage

To run the tool, you'll need to specify the path to the pre-trained RandomForest model, the input FASTA file, and the desired output CSV file for the predictions.

python patho_predict.py --model_path "path/to/model.joblib" --input_fasta "path/to/input.fasta" --output_csv "path/to/output.csv"

Parameters

--model_path (required): Path to the pre-trained RandomForest model file (.joblib).
--input_fasta (required): Path to the FASTA file containing sequences to predict.
--output_csv (required): Path where the prediction output CSV file will be saved.
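
For readers who want to see how three required options like these are typically declared, here is a short sketch using Python's standard argparse module. Whether the script itself uses argparse is an assumption; the function name `build_parser` and the help strings are illustrative only. The flag names are the ones documented above.

```python
import argparse


def build_parser():
    """Build a parser for the three documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Predict pathogenicity of sequences in a FASTA file."
    )
    parser.add_argument("--model_path", required=True,
                        help="Path to the pre-trained model file (.joblib).")
    parser.add_argument("--input_fasta", required=True,
                        help="Path to the FASTA file with sequences to predict.")
    parser.add_argument("--output_csv", required=True,
                        help="Path where the prediction CSV will be written.")
    return parser


# Parse an explicit argument list so the example runs without a command line.
args = build_parser().parse_args([
    "--model_path", "path/to/model.joblib",
    "--input_fasta", "path/to/input.fasta",
    "--output_csv", "path/to/output.csv",
])
print(args.model_path, args.input_fasta, args.output_csv)
```

Because all three options are marked required, argparse exits with a usage message if any of them is omitted.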

Output

The output CSV file will contain three columns:

sequence_id: The ID of the sequence from the FASTA file.
prediction_value: The probability of the sequence being pathogenic.
label: "pathogenic" if the probability is 0.5 or higher, and "non-pathogenic" otherwise.
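
The labelling rule and column layout described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's code: the helper names `label_for` and `write_predictions` are hypothetical, and the standard csv module is used only to keep the example dependency-free (the tool lists pandas as a requirement and may write the file with it instead).

```python
import csv
import io

COLUMNS = ["sequence_id", "prediction_value", "label"]


def label_for(probability, threshold=0.5):
    """Map a probability to a label; the 0.5 cutoff is the documented one."""
    return "pathogenic" if probability >= threshold else "non-pathogenic"


def write_predictions(predictions, handle):
    """Write (sequence_id, probability) pairs as a three-column CSV.

    Hypothetical helper; `handle` is any writable text file object.
    """
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    for sequence_id, probability in predictions:
        writer.writerow({
            "sequence_id": sequence_id,
            "prediction_value": probability,
            "label": label_for(probability),
        })


# Made-up IDs and probabilities, purely to show the file layout.
buffer = io.StringIO()
write_predictions([("seq_a", 0.5), ("seq_b", 0.12)], buffer)
print(buffer.getvalue())
```

Note that a probability of exactly 0.5 is labelled "pathogenic", matching the "0.5 or higher" rule.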

Demo

Run the script with the following command:

python patho_predict.py --model_path "demo/random_forest.joblib" --input_fasta "demo/demo.fna" --output_csv "result_demo.csv"
