This repository contains a Python script for classifying multiple myeloma patients into ultra-high-risk and low-risk categories based on RNA-seq gene expression data and survival information.
- Expression data: TSV/CSV with one row per gene (`GENE_ID`) and one column per sample ID (e.g., `MMRF_XXXX_1_BM`). Values are TPM.
- Survival data: CSV with columns `public_id` (e.g., `MMRF_XXXX`) and `ttcos` (survival time in months).
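Expression columns are sample IDs (`MMRF_XXXX_1_BM`), while the survival file is keyed by `public_id` (`MMRF_XXXX`), so the two inputs have to be linked before classification. The sketch below assumes the `public_id` is simply the first two underscore-separated tokens of the sample ID. The helper names (`sample_to_public_id`, `load_survival`, `match_samples`) are hypothetical and not taken from `mmrc.py`.

```python
import csv
import io


def sample_to_public_id(sample_id: str) -> str:
    """Map 'MMRF_1234_1_BM' to 'MMRF_1234'.

    Assumption: the public_id is the first two underscore-separated tokens.
    """
    return "_".join(sample_id.split("_")[:2])


def load_survival(csv_text: str) -> dict[str, float]:
    """Read public_id -> ttcos (months) from survival CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["public_id"]: float(row["ttcos"]) for row in reader}


def match_samples(sample_ids, survival):
    """Return (sample_id, public_id, ttcos) for samples that have survival data."""
    matched = []
    for sid in sample_ids:
        pid = sample_to_public_id(sid)
        if pid in survival:  # samples without survival data are skipped
            matched.append((sid, pid, survival[pid]))
    return matched


# Illustrative inputs, not real CoMMpass data.
survival_csv = "public_id,ttcos\nMMRF_1001,12.5\nMMRF_1002,60.0\n"
samples = ["MMRF_1001_1_BM", "MMRF_1002_1_BM", "MMRF_1003_1_BM"]
print(match_samples(samples, load_survival(survival_csv)))
# → [('MMRF_1001_1_BM', 'MMRF_1001', 12.5), ('MMRF_1002_1_BM', 'MMRF_1002', 60.0)]
```

If a patient has several samples (e.g., a later `_2_BM` visit), all of them map to the same `public_id` in this sketch, so you may want to keep only one sample per patient.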
```bash
pip install -r requirements.txt
python mmrc.py --expression_file path/to/exp.tpm.tsv --survival_file path/to/survival_months.csv --output_dir output/
```
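The command-line interface takes three flags. As a minimal sketch, here is how an `argparse` parser accepting those flags could look. Whether `mmrc.py` uses `argparse`, and whether each flag is required or has a default, are assumptions; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage command."""
    parser = argparse.ArgumentParser(
        description="Classify multiple myeloma patients into ultra-high-risk and low-risk groups."
    )
    # Assumption: all three flags are required.
    parser.add_argument("--expression_file", required=True,
                        help="TSV/CSV of TPM values (GENE_ID rows x sample columns)")
    parser.add_argument("--survival_file", required=True,
                        help="CSV with public_id and ttcos columns")
    parser.add_argument("--output_dir", required=True,
                        help="Directory for intermediate CSV files and plots")
    return parser


args = build_parser().parse_args([
    "--expression_file", "path/to/exp.tpm.tsv",
    "--survival_file", "path/to/survival_months.csv",
    "--output_dir", "output/",
])
print(args.expression_file, args.survival_file, args.output_dir)
# → path/to/exp.tpm.tsv path/to/survival_months.csv output/
```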
- Intermediate CSV files and plots in the specified output directory.
- Console output with model performance.
- This is a simplified classification approach. Before any clinical use, account for censoring in the survival data and consult domain experts.
- Immunoglobulin genes are retrieved from the HGNC API and removed from the expression data.
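The censoring caveat matters because a short `ttcos` can mean either early death or early loss to follow-up. A minimal sketch of a censoring-aware labelling rule follows. It assumes a hypothetical event indicator (`True` = death observed), which the survival input described here does not include, and a hypothetical 24-month cutoff; neither is taken from `mmrc.py`.

```python
from typing import Optional


def risk_label(ttcos: float, event: bool, threshold: float = 24.0) -> Optional[str]:
    """Hypothetical censoring-aware dichotomisation of survival time.

    - followed beyond the threshold (alive or not): 'low-risk'
    - died before the threshold: 'ultra-high-risk'
    - censored before the threshold: None (outcome unknown, exclude)
    """
    if ttcos >= threshold:
        return "low-risk"
    if event:
        return "ultra-high-risk"
    return None


# Illustrative patients: (public_id, ttcos in months, death observed)
patients = [("MMRF_1001", 12.5, True), ("MMRF_1002", 60.0, False), ("MMRF_1003", 10.0, False)]
labels = {pid: risk_label(t, e) for pid, t, e in patients}
print(labels)
# → {'MMRF_1001': 'ultra-high-risk', 'MMRF_1002': 'low-risk', 'MMRF_1003': None}
```

A rule based on `ttcos` alone would put `MMRF_1003` in the ultra-high-risk group even though that patient was only followed for 10 months; excluding such cases avoids this mislabelling.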
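Once the immunoglobulin gene identifiers are available, removing them is a row filter on `GENE_ID`. The sketch below leaves out the HGNC request itself and assumes two things: the excluded IDs are already in a set, and `GENE_ID` holds Ensembl IDs that may carry a version suffix (e.g. `.8`), which is stripped before comparison. The function names and example values are hypothetical.

```python
def strip_version(gene_id: str) -> str:
    """'ENSG00000141510.16' -> 'ENSG00000141510'; unversioned IDs are unchanged."""
    return gene_id.split(".", 1)[0]


def remove_genes(expression: dict[str, list[float]], excluded: set[str]) -> dict[str, list[float]]:
    """Return a copy of `expression` without rows whose GENE_ID is in `excluded`."""
    excluded_base = {strip_version(g) for g in excluded}
    return {
        gene: values
        for gene, values in expression.items()
        if strip_version(gene) not in excluded_base
    }


# Illustrative TPM rows (GENE_ID -> values per sample); numbers are made up.
expression = {
    "ENSG00000211592.8": [1200.0, 950.3],
    "ENSG00000141510.16": [35.2, 41.0],
}
ig_ids = {"ENSG00000211592"}  # assumed to come from the HGNC lookup
filtered = remove_genes(expression, ig_ids)
print(list(filtered))
# → ['ENSG00000141510.16']
```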
MIT License. See LICENSE for details.
Pull requests and suggestions are welcome! Please see CONTRIBUTING.md for guidelines.