Impact of genetic features of Pseudomonas putida on its bioremediation potential

This repository collects the scripts and pipelines used to identify the genetic features that determine the bioremediation potential of Pseudomonas putida.

Requirements

Prokka
Roary
chewBBACA
local BLASTn

Phylogenetic, functional and pangenomic analysis

The genomic analysis pipeline is described in the Pangenomic__phylogenetic_and_functional_analysis_pipeline.pdf document, which contains Python and Bash code.

Bioremediation potential determination

The procedure for determining the bioremediation potential of each strain, understood as the sum of the bioremediation-related genes present in its genome, is summarized in the Bioremediation_potential_determination_pipeline.pdf document. This pipeline describes the steps taken to assign a bioremediation potential value to each strain.
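
As a rough illustration of the scoring idea, the sketch below sums gene presence flags per strain. It assumes a binary presence/absence call per gene; the strain names, gene names and the `bioremediation_potential` helper are hypothetical placeholders, and the PDF pipeline remains the reference for how the values were actually obtained.

```python
# Hypothetical presence/absence table: strain -> {gene: 1 if detected, 0 otherwise}.
# Strain and gene names are illustrative placeholders, not data from this study.
presence = {
    "strain_A": {"gene_1": 1, "gene_2": 0, "gene_3": 1},
    "strain_B": {"gene_1": 1, "gene_2": 1, "gene_3": 1},
    "strain_C": {"gene_1": 0, "gene_2": 0, "gene_3": 0},
}


def bioremediation_potential(gene_calls):
    """Count how many bioremediation-related genes are flagged as present."""
    return sum(1 for present in gene_calls.values() if present)


# One score per strain: the number of genes detected in that strain.
scores = {strain: bioremediation_potential(calls) for strain, calls in presence.items()}
print(scores)
```

Under this assumption, a strain carrying every gene in the panel gets the maximum score and a strain carrying none gets 0.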

Apart from bioremediation-related genes, the pipeline can also be adapted to retrieve other types of genes from different genomes, provided that the sequences of those genes are available.
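
One possible way to turn sequence searches into gene presence calls is sketched below. It assumes the query genes were searched against a genome with local BLASTn using the standard tabular output (`-outfmt 6`). The identity and e-value thresholds, the example rows and the `genes_detected` helper are illustrative assumptions, not the settings used in this repository.

```python
import csv
import io

# Made-up example rows in BLAST tabular format (-outfmt 6).
# Columns: qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EXAMPLE_HITS = (
    "gene_1\tcontig_1\t98.5\t900\t13\t0\t1\t900\t100\t999\t0.0\t1600\n"
    "gene_2\tcontig_4\t72.0\t300\t84\t2\t1\t300\t50\t349\t1e-20\t95.0\n"
    "gene_3\tcontig_2\t95.1\t450\t22\t0\t1\t450\t10\t459\t1e-180\t700\n"
)


def genes_detected(handle, min_identity=90.0, max_evalue=1e-10):
    """Return the query genes with at least one hit passing both thresholds.

    The default thresholds are hypothetical and should be tuned per gene set.
    """
    detected = set()
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        gene, identity, evalue = row[0], float(row[2]), float(row[10])
        if identity >= min_identity and evalue <= max_evalue:
            detected.add(gene)
    return detected


found = genes_detected(io.StringIO(EXAMPLE_HITS))
print(sorted(found))
```

To look for a different class of genes, only the query sequences need to change; the parsing step stays the same. With a real results file, the `io.StringIO` object would be replaced by an open file handle.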

ML approach of bioremediation potential prediction

The prediction of the bioremediation potential of different Pseudomonas putida strains is summarized in the Bioremediation_potential_prediction.ipnyb file. It contains a script with the steps we followed to explore the bioremediation potential data and the genome metadata:

Data cleaning
Correlation analysis
Unsupervised learning
- Dimensionality reduction (PCA)
- PERMANOVA analysis
Supervised learning
- XGBoost model
- SHAP analysis

The metrics of the model are the following:

MAE: 2.89
MSE: 12.58
MAPE: 0.12
R2: 0.8
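
For readers unfamiliar with these metrics, the sketch below shows their standard definitions in plain Python. It assumes MAPE is expressed as a fraction (so 0.12 would correspond to roughly 12%). The `regression_metrics` helper and the example values are hypothetical and do not reproduce the numbers reported above.

```python
def regression_metrics(y_true, y_pred):
    """Compute MAE, MSE, MAPE (as a fraction) and R2 for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides each absolute error by the true value, so true values must be non-zero.
    mape = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - (n * mse) / ss_tot  # n * mse is the residual sum of squares
    return {"MAE": mae, "MSE": mse, "MAPE": mape, "R2": r2}


# Made-up true and predicted bioremediation potential values.
y_true = [10.0, 20.0, 30.0, 40.0]
y_pred = [12.0, 18.0, 33.0, 37.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```

MAE and MSE are in the units of the bioremediation potential score (and its square), while MAPE and R2 are unitless, which makes them easier to compare across datasets.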
