A reproducible computational framework for analyzing SSR (Simple Sequence Repeat) marker datasets in plant breeding and population genetics.
This repository provides a fully automated pipeline for:
- Marker quality control
- Genetic diversity estimation
- Population structure analysis
- Linkage disequilibrium network inference
- Genetic differentiation and gene flow analysis
The workflow is designed for crop genetics, molecular breeding, and population genomics studies.
This pipeline is suitable for:
- Genetic diversity analysis
- Germplasm characterization
- Population structure studies
- Molecular breeding programs
- Marker-assisted selection
- Plant population genetics
Example organisms:
- Rice
- Wheat
- Maize
- Barley
- Other crop species with SSR datasets
| Module | Description |
|---|---|
| Data Processing | Load and validate SSR marker datasets |
| Genetic Diversity | Estimate MAF, PIC, He, Shannon and Simpson indices |
| Population Structure | Compute genetic distances and multivariate ordination |
| Network Analysis | Construct linkage disequilibrium networks |
| Genetic Differentiation | Estimate Fst and gene flow (Nm) |
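The diversity statistics named in the table can be illustrated with a short, self-contained sketch. This is not the code in `genetic_diversity.py`. The function name `diversity_indices` and its input format (a plain list of allele calls for one marker) are assumptions made for illustration. "MAF" is read here as major allele frequency, which is common in SSR studies, and "Simpson" as the dominance index (sum of squared frequencies); the repository may define either differently.

```python
import math
from collections import Counter


def diversity_indices(alleles):
    """Compute diversity statistics for one marker (hypothetical helper).

    `alleles` is a list of allele calls; None marks a missing call.
    """
    observed = [a for a in alleles if a is not None]
    if not observed:
        raise ValueError("no allele calls for this marker")

    counts = Counter(observed)
    n = len(observed)
    freqs = [c / n for c in counts.values()]

    sum_sq = sum(p * p for p in freqs)

    # Expected heterozygosity (gene diversity): He = 1 - sum(p_i^2)
    he = 1.0 - sum_sq

    # PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    pic = he - cross

    # Shannon index with natural logarithm: H = -sum(p_i * ln p_i)
    shannon = -sum(p * math.log(p) for p in freqs)

    return {
        "n_alleles": len(freqs),
        "major_allele_freq": max(freqs),
        "He": he,
        "PIC": pic,
        "shannon": shannon,
        "simpson_dominance": sum_sq,  # 1 - this value equals He
    }


if __name__ == "__main__":
    # Two alleles at equal frequency, one missing call
    print(diversity_indices(["A", "A", "B", "B", None]))
```

For two equally frequent alleles, He is 0.5 and PIC is 0.375, which shows that PIC is always at most He.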
SSR-Genetic-Diversity-Pipeline
│
├── data
│ └── example_ssr_dataset.xlsx
│
├── pipeline
│ ├── data_processing.py
│ ├── genetic_diversity.py
│ ├── population_structure.py
│ ├── network_analysis.py
│ └── genetic_differentiation.py
│
├── notebooks
│ └── SSR_analysis_colab.ipynb
│
├── results
│
├── README.md
├── requirements.txt
└── LICENSE
SSR Marker Dataset
│
▼
Marker Quality Control
│
▼
Genetic Diversity Analysis
│
▼
Genetic Distance Estimation
│
▼
Population Structure Analysis
├── PCA
├── MDS
└── Hierarchical Clustering
│
▼
Linkage Disequilibrium Network
│
▼
Genetic Differentiation
├── Fst
└── Gene Flow (Nm)
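As a rough illustration of the genetic distance step in the workflow above, the sketch below computes pairwise Jaccard distances in plain Python. It assumes markers are scored as binary presence/absence (1/0) bands, which the repository's data format may or may not match. The names `jaccard_distance` and `distance_matrix` and the sample profiles are hypothetical and not part of the pipeline.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two equal-length 0/1 band profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of markers")
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    if either == 0:
        # Neither sample has any band; treated as identical by convention
        return 0.0
    return 1.0 - both / either


def distance_matrix(profiles):
    """Return {(name_i, name_j): distance} for every ordered pair of samples."""
    names = list(profiles)
    return {
        (p, q): jaccard_distance(profiles[p], profiles[q])
        for p in names
        for q in names
    }


if __name__ == "__main__":
    # Made-up band profiles for three samples across four markers
    samples = {
        "G1": [1, 1, 0, 1],
        "G2": [1, 0, 0, 1],
        "G3": [0, 0, 1, 0],
    }
    for pair, d in distance_matrix(samples).items():
        print(pair, round(d, 3))
```

Shared absences (0/0) do not count toward similarity under Jaccard, which is why it is often preferred over simple matching for band data.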
The pipeline automatically generates:
results
├── diversity_indices.csv
├── fst_nm_results.csv
├── dendrogram.png
├── PCA.png
├── MDS.png
└── LD_network.png
The CSV files hold the numerical results (diversity indices, Fst, and Nm), and the PNG files hold the dendrogram, PCA, MDS, and LD network plots.
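To help interpret the values in `fst_nm_results.csv`, here is a minimal sketch of one common way to relate differentiation and gene flow. It uses Nei's Gst, (Ht - Hs) / Ht, as the Fst estimate and Wright's island-model approximation Nm = (1 - Fst) / (4 Fst). Whether `calculate_fst_nm` uses this estimator is not stated in this README, so treat it as an assumption. The function names and the example frequencies are hypothetical, and subpopulations are weighted equally.

```python
import math


def expected_het(freqs):
    """Expected heterozygosity for one locus: 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in freqs)


def gst_and_nm(subpop_freqs):
    """Estimate Fst (as Nei's Gst) and Nm for one locus.

    `subpop_freqs` holds one allele-frequency list per subpopulation,
    with alleles in the same order in every list.
    """
    k = len(subpop_freqs)
    if k < 2:
        raise ValueError("need at least two subpopulations")
    n_alleles = len(subpop_freqs[0])

    # Hs: mean within-subpopulation expected heterozygosity
    hs = sum(expected_het(f) for f in subpop_freqs) / k

    # Ht: expected heterozygosity of the pooled (mean) allele frequencies
    mean_freqs = [sum(f[i] for f in subpop_freqs) / k for i in range(n_alleles)]
    ht = expected_het(mean_freqs)

    if ht == 0:
        # Locus fixed everywhere; no differentiation can be measured
        return 0.0, math.inf

    fst = (ht - hs) / ht
    # Island-model approximation; infinite when there is no differentiation
    nm = math.inf if fst == 0 else (1.0 - fst) / (4.0 * fst)
    return fst, nm


if __name__ == "__main__":
    # Two subpopulations with mirrored frequencies at a two-allele locus
    fst, nm = gst_and_nm([[0.8, 0.2], [0.2, 0.8]])
    print(f"Fst = {fst:.3f}, Nm = {nm:.3f}")
```

Under this approximation, Nm below about 1 is usually read as gene flow too low to counteract drift, but the island model's assumptions rarely hold exactly in breeding material.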
Clone the repository:
git clone https://github.com/yourusername/SSR-Genetic-Diversity-Pipeline.git
cd SSR-Genetic-Diversity-Pipeline
Install dependencies:
pip install -r requirements.txt
Example execution in Python:
from pipeline.data_processing import *
from pipeline.genetic_diversity import *
from pipeline.population_structure import *
from pipeline.network_analysis import *
from pipeline.genetic_differentiation import *
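df = load_ssr_data("data/example_ssr_dataset.xlsx")  # load the example SSR dataset
df_clean, dropped, mono = validate_markers(df)  # marker quality control
diversity = analyze_ssr_diversity(df_clean)  # diversity indices (MAF, PIC, He, Shannon, Simpson)
dist = compute_jaccard(df_clean)  # Jaccard genetic distances
plot_dendrogram(dist, df_clean.index, "results")  # hierarchical clustering dendrogram
pca_analysis(df_clean, "results")  # PCA
mds_analysis(dist, "results")  # MDS on the distance matrix
ld_df = ld_analysis(df_clean)  # linkage disequilibrium analysis
plot_ld_network(ld_df, "results")  # LD network plot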
fst_nm = calculate_fst_nm(df_clean)
If you use this pipeline in your research, please cite:
SSR Genetic Diversity & Population Structure Pipeline
GitHub Repository
Md Rezve, Research Assistant, Plant Protection Lab, Khulna University, Bangladesh
Research interests:
- Plant molecular genetics
- Population genomics
- Omics-driven breeding
- Computational biology
This project is released under the MIT License.