A Streamlit web application for analyzing codon usage patterns across echinoderm species using machine learning and statistical methods.
- Upload and process CDS FASTA files from multiple echinoderm species
- Calculate codon usage statistics and GC content
- Perform dimensionality reduction (PCA, t-SNE)
- Cluster analysis (K-means, Hierarchical)
- Machine learning classification
- Statistical analysis (ANOVA)
- Interactive visualizations
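As a rough illustration of the codon usage and GC content features listed above, the sketch below counts in-frame codons and computes the GC fraction of one sequence. The function names (`codon_counts`, `codon_frequencies`, `gc_content`) are hypothetical and are not taken from the app's code; the sketch assumes each sequence starts in frame and skips codons containing ambiguous bases.

```python
from collections import Counter
from typing import Dict


def codon_counts(cds: str) -> Counter:
    """Count codons in a coding sequence, reading in frame from position 0."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon, if any
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Assumption: codons with ambiguous bases (e.g. N) are skipped.
    return Counter(c for c in codons if set(c) <= set("ACGT"))


def codon_frequencies(cds: str) -> Dict[str, float]:
    """Relative frequency of each observed codon (values sum to 1)."""
    counts = codon_counts(cds)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def gc_content(cds: str) -> float:
    """Fraction of unambiguous bases that are G or C."""
    seq = cds.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt
```

For example, `codon_counts("ATGGCCGCCTAA")` counts `GCC` twice and `ATG` and `TAA` once each. Per-species values would then be aggregated over all sequences in a file.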
- Clone the repository:
    git clone https://github.com/yourusername/EchinoML.git
    cd EchinoML
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install required packages:
    pip install -r requirements.txt
- Start the Streamlit app:
    streamlit run app.py
- Upload your CDS FASTA files (.fa.gz format) through the web interface
- Explore the analysis sections:
  - Basic Statistics
  - GC Content Analysis
  - Codon Usage Analysis
  - Dimensionality Reduction
  - Clustering Analysis
  - Machine Learning Analysis
  - Statistical Analysis
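The Statistical Analysis section is described as using ANOVA. To show what a one-way ANOVA computes, here is a minimal standard-library sketch of the F statistic. Which groups the app actually compares is not stated here, so the grouping (for example, per-sequence GC content grouped by species) is an assumption, and `one_way_anova_f` is a hypothetical helper name.

```python
from statistics import fmean
from typing import Sequence


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """Return the one-way ANOVA F statistic for two or more groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and n > k")

    grand_mean = fmean([x for g in groups for x in g])
    means = [fmean(g) for g in groups]

    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    # F = mean square between / mean square within.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A large F value suggests that group means differ by more than within-group variation would explain. For p-values, a library routine such as SciPy's `scipy.stats.f_oneway` is the usual choice; whether the app uses it is not confirmed here.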
The app expects gzipped CDS (coding sequence) FASTA files (.fa.gz) from echinoderm species. Each file should contain the nucleotide sequences of protein-coding regions.
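To make the expected input concrete, the following sketch reads records from a gzipped FASTA file using only the standard library. `read_fasta_gz` is a hypothetical helper, not necessarily how the app parses its uploads; it assumes a conventional FASTA layout in which header lines begin with `>` and sequences may wrap across several lines.

```python
import gzip
from typing import Iterator, Tuple


def read_fasta_gz(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a gzipped FASTA file."""
    header = None
    chunks = []
    with gzip.open(path, "rt") as handle:  # "rt" decodes the stream as text
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)  # sequence may span multiple lines
    if header is not None:
        yield header, "".join(chunks)
```

Usage might look like `for name, seq in read_fasta_gz("species.fa.gz"): ...`, where the file name is a placeholder.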
The app provides:
- Interactive visualizations
- Statistical summaries
- Machine learning model performance metrics
- Downloadable results in CSV format
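The layout of the downloadable CSV files is not specified here. As one plausible shape, the sketch below serializes a per-species codon frequency table with one row per species and one column per codon. The function name `frequencies_to_csv`, the column layout, and the species labels are all assumptions made for illustration.

```python
import csv
import io
from typing import Dict


def frequencies_to_csv(table: Dict[str, Dict[str, float]]) -> str:
    """Serialize {species: {codon: frequency}} to CSV text."""
    # Use the union of all codons as columns, in a stable sorted order.
    codons = sorted({codon for freqs in table.values() for codon in freqs})
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["species", *codons])
    for species in sorted(table):
        # Codons not observed in a species are written as 0.0.
        writer.writerow([species, *(table[species].get(c, 0.0) for c in codons)])
    return buffer.getvalue()
```

In a Streamlit app, text like this could be passed to `st.download_button` to offer the file for download.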
[Add your license here]
Contributions are welcome! Please feel free to submit a Pull Request.