Proteus

  • Requires python and the machine learning package 'scikit-learn'
  • A slightly modified version of DISOPRED3.0 is distributed with this package
  • 'scikit-learn' may require additional packages to be installed
  • Also requires the uniref90.fasta database and its associated files in the folder DB
  • Download the most recent version of the 'uniref90.fasta' file (sequence database) from the web (http://www.ebi.ac.uk/uniprot/database/download.html)
  • Run formatdb on it to generate the associated files
  • An empty DB directory is provided with the installation
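
Before the first run it can help to confirm that the DB folder is populated. The following is a minimal sketch, not part of Proteus: the function name `missing_db_files` is hypothetical, and it assumes that formatdb has written the usual protein index files (.phr, .pin, .psq) next to uniref90.fasta, possibly split into numbered volumes.

```python
from pathlib import Path

# Index extensions that formatdb writes for a protein database (assumption:
# a standard protein formatdb run; adjust if your setup differs).
INDEX_SUFFIXES = (".phr", ".pin", ".psq")


def missing_db_files(db_dir="DB", fasta_name="uniref90.fasta"):
    """Return a list of human-readable problems found in the DB folder."""
    db = Path(db_dir)
    if not db.is_dir():
        return [f"{db} is not a directory"]

    problems = []
    if not (db / fasta_name).is_file():
        problems.append(f"{fasta_name} not found in {db}")

    for suffix in INDEX_SUFFIXES:
        # Large databases are split into volumes (e.g. uniref90.fasta.00.phr),
        # so accept either the single-volume or the multi-volume name.
        single = db / f"{fasta_name}{suffix}"
        volumes = list(db.glob(f"{fasta_name}.*{suffix}"))
        if not single.is_file() and not volumes:
            problems.append(f"no {suffix} index file found; run formatdb")
    return problems


if __name__ == "__main__":
    for problem in missing_db_files():
        print(problem)
```

An empty result means the sequence file and all three index types were found; it does not verify that the indexes are complete or up to date.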

Installation Notes for scikit-learn

scikit-learn and the packages it depends on can be installed as follows:

Installation of scikit-learn in Ubuntu 14.04

sudo apt-get install python-sklearn
sudo apt-get update
sudo apt-get install build-essential python-dev python-setuptools python-numpy python-scipy libatlas-dev libatlas3gf-base
pip install --user --install-option="--prefix=" -U scikit-learn

Installing Proteus

$ git clone https://github.com/bjornwallner/proteus
$ cd proteus
$ chmod +x proteus/run_proteus.py

The program takes a single input:

    1. A FASTA file containing a single amino acid sequence

Run Step:
$ ./proteus/run_proteus.py <basename.fasta>
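
Because the program expects exactly one sequence per file, the input can be checked before running. The following is a minimal sketch of such a check; `read_single_fasta` is a hypothetical helper, not something shipped with Proteus, and it only checks the record count, not the amino acid alphabet.

```python
def read_single_fasta(path):
    """Return (header, sequence) from a FASTA file holding exactly one record."""
    header = None
    chunks = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    raise ValueError(f"{path}: more than one sequence found")
                header = line[1:]
            elif header is None:
                raise ValueError(f"{path}: sequence data before any '>' header")
            else:
                chunks.append(line)

    sequence = "".join(chunks)
    if header is None or not sequence:
        raise ValueError(f"{path}: no sequence found")
    return header, sequence
```

A call such as `read_single_fasta("basename.fasta")` raises `ValueError` for an empty or multi-record file and otherwise returns the header and the joined sequence.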

EXAMPLE OUTPUT:

$ cat basename.seq.proteus
#Proteus v1.1
#pos res pred prob
1 M 1 0.518
2 R 1 0.561
3 V 0 0.439
4 K 0 0.416
5 E 0 0.438
6 I 0 0.439
7 R 0 0.392
8 K 0 0.427
9 N 0 0.405
10 Y 0 0.400

The program also writes a graphical representation (.png) of the same data (Protean segment prediction score vs. residue).

Example output graph
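
The text output shown above is easy to load for further analysis. The sketch below parses the four whitespace-separated columns named in the `#pos res pred prob` header and lists the positions with `pred` equal to 1. The names `Residue`, `parse_proteus` and `predicted_positions` are hypothetical, and reading `pred` as the binary call and `prob` as its score is an assumption based on the column names.

```python
from typing import NamedTuple


class Residue(NamedTuple):
    pos: int     # residue position (1-based in the example output)
    res: str     # one-letter amino acid code
    pred: int    # binary prediction column
    prob: float  # prediction score column


def parse_proteus(lines):
    """Parse lines of a .proteus file, skipping '#' comment lines."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pos, res, pred, prob = line.split()
        records.append(Residue(int(pos), res, int(pred), float(prob)))
    return records


def predicted_positions(records):
    """Return the positions whose pred column is 1."""
    return [r.pos for r in records if r.pred == 1]


if __name__ == "__main__":
    sample = """#Proteus v1.1
#pos res pred prob
1 M 1 0.518
2 R 1 0.561
3 V 0 0.439
4 K 0 0.416
"""
    records = parse_proteus(sample.splitlines())
    print(predicted_positions(records))  # prints [1, 2]
```

For a real run, pass an open file instead of the sample string, for example `parse_proteus(open("basename.seq.proteus"))`.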