01 Feb 2024.
Based on Python 3.11. Run environment: preferably Ubuntu / 64-bit Linux.
Dependencies:
- biopython
- pandas
- mmseqs2
To install, simply run:
conda create --name OPUs python=3.11
conda activate OPUs
pip install pandas biopython
mamba install -c bioconda mmseqs2
python3 main.py -h
Usage: main.py [options]
options:
  -h, --help              show this help message and exit
  -ofolder OFOLDER        Output folder path
  -pfolder PFOLDER        Project folder path
  -annofolder ANNOFOLDER  Annotation folder path
  -annoxt ANNOXT          Annotation file extension
  -minseqid MINSEQID      Minimum sequence identity (0.975 default)
  -threads THREADS        Number of threads (3 default)
  -minocc MINOCC          Minimum number of occurrences to keep a OPU (2 default)
  -minlen MINLEN          Minimum number of residues to consider true a protein (35 default)
  -xt XT                  Protein sequences fasta file extension
  -maxmem MAXMEM          Maximum memory (4G default)
NOTE: When running OPUs_pipe, set the maxmem parameter to a reasonable share of the RAM available on your machine. You need at least 4 GB of RAM to get results; with less, the program will fail for lack of memory.
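As a rough illustration of the note above, the sketch below derives a -maxmem value from the machine's total RAM. The helper names (total_ram_bytes, suggest_maxmem) and the 0.5 ratio are hypothetical and not part of the pipeline. The "<number>G" format is inferred from the "4G default" shown in the help text.

```python
import os


def total_ram_bytes() -> int:
    """Total physical memory in bytes (Linux only, via sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def suggest_maxmem(total_bytes: int, ratio: float = 0.5, minimum_gb: int = 4) -> str:
    """Return a value such as '8G' for -maxmem.

    ratio is the share of RAM to allocate (an assumption, tune it to your
    workload); minimum_gb mirrors the 4 GB lower bound given in the note.
    """
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    gib = int(total_bytes * ratio / 1024**3)
    return f"{max(gib, minimum_gb)}G"


if __name__ == "__main__":
    # Print the flag fragment to append to the main.py command line.
    print("-maxmem", suggest_maxmem(total_ram_bytes()))
```

If the machine has less than 4 GB of RAM in total, the suggested value still reads 4G, and the run is expected to fail as described in the note.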
As inputs you need:
- [i]. FASTA files with the predicted proteins of each sample; they may be gzip- or xz-compressed;
- [ii]. TSV files from the emapper2 output, one for each protein sequence file.
Files from [i] and [ii] must share the same prefix, which identifies the sample they belong to. You can place them in the same folder or, alternatively, in separate folders for sequences and annotations.
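Before a run, it can help to check that every sample has both files. The sketch below is one possible check, not the pipeline's own logic. It assumes that the sample prefix is the file name with the extension removed, and the extensions in the usage example (.faa.gz, .emapper.annotations) are hypothetical stand-ins for whatever you pass to -xt and -annoxt.

```python
from pathlib import Path


def pair_by_prefix(seq_dir, anno_dir, seq_ext, anno_ext):
    """Pair sequence and annotation files that share a sample prefix.

    Returns (paired, unpaired): paired maps each prefix to a
    (sequence file, annotation file) tuple; unpaired lists the prefixes
    found in only one of the two folders.
    """
    def index(folder, ext):
        # Prefix = file name without the given extension (an assumption).
        return {
            p.name[: -len(ext)]: p
            for p in Path(folder).iterdir()
            if p.is_file() and p.name.endswith(ext)
        }

    seqs = index(seq_dir, seq_ext)
    annos = index(anno_dir, anno_ext)
    paired = {s: (seqs[s], annos[s]) for s in sorted(seqs.keys() & annos.keys())}
    unpaired = sorted(seqs.keys() ^ annos.keys())
    return paired, unpaired


if __name__ == "__main__":
    # Same folder for both kinds of file; pass two folders if you keep them apart.
    paired, unpaired = pair_by_prefix(".", ".", ".faa.gz", ".emapper.annotations")
    print(f"{len(paired)} complete samples; incomplete: {unpaired}")
```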
A detailed list and description of the generated files is available at the links below:
- general_OPUs_annotation.tsv.xz
- OPUs_cluster_relationship.tsv.xz
- summarized_OPUs_annotation.tsv.xz
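The output tables are xz-compressed TSV files, so they can be inspected without decompressing them first. The sketch below uses only the Python standard library. The helper name read_tsv_xz is hypothetical, and no column names are assumed: the header is simply read from the first line of the file.

```python
import csv
import lzma


def read_tsv_xz(path, limit=5):
    """Return the header and up to `limit` data rows of an xz-compressed TSV."""
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])
        rows = [row for _, row in zip(range(limit), reader)]
    return header, rows


if __name__ == "__main__":
    header, rows = read_tsv_xz("summarized_OPUs_annotation.tsv.xz")
    print(header)
    for row in rows:
        print(row)
```

Since pandas is already a dependency, pandas.read_csv(path, sep="\t") should also work, as it infers xz compression from the .xz extension.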
After installation, you can check that everything works by running the script supplied in the testing folder.
You will need at least 4 GB of RAM to run the test.
cd testing/
./test_run.sh
It should return the status of your installation.