|
30 | 30 | - **Networks**: `output/networks/<network_type>/` (e.g., `ppi/`, `regulatory/`, `pathways/`) |
31 | 31 | - **ML**: `output/ml/<task>/` (e.g., `classification/`, `regression/`, `features/`) |
32 | 32 | - **Multi-Omics**: `output/multiomics/<integration>/` (e.g., `integrated/`, `plots/`) |
| 33 | +- **Long Read**: `output/longread/<analysis_type>/` (e.g., `basecalling/`, `assembly/`, `methylation/`) |
| 34 | +- **Metagenomics**: `output/metagenomics/<analysis_type>/` (e.g., `amplicon/`, `assembly/`, `functional/`) |
| 35 | +- **Structural Variants**: `output/structural_variants/<analysis_type>/` (e.g., `detection/`, `annotation/`) |
| 36 | +- **Spatial**: `output/spatial/<analysis_type>/` (e.g., `clustering/`, `deconvolution/`, `integration/`) |
| 37 | +- **Pharmacogenomics**: `output/pharmacogenomics/<analysis_type>/` (e.g., `alleles/`, `clinical/`, `reports/`) |
33 | 38 |
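As an illustration of the `output/<module>/<analysis_type>/` convention for the newly listed modules, here is a minimal sketch. The helper `module_output_dir` and the `NEW_MODULE_DIRS` set are hypothetical names introduced for this example and are not part of the repository. Only the directory names come from the list above.

```python
from __future__ import annotations

from pathlib import Path

# Directory names copied from the list above; the set itself is an assumption
# made for illustration, not an existing constant in the codebase.
NEW_MODULE_DIRS = {
    "longread",
    "metagenomics",
    "structural_variants",
    "spatial",
    "pharmacogenomics",
}


def module_output_dir(module: str, analysis_type: str, base: str | Path = "output") -> Path:
    """Build output/<module>/<analysis_type> without touching the filesystem."""
    if module not in NEW_MODULE_DIRS:
        raise ValueError(f"unknown module: {module!r}")
    return Path(base) / module / analysis_type


# Example: where long-read methylation results would be written.
methylation_dir = module_output_dir("longread", "methylation")
print(methylation_dir.as_posix())
```

Returning a `Path` rather than creating the directory keeps the sketch free of side effects; a caller would decide when to create it.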
|
34 | 39 | ## Path and I/O |
35 | 40 |
|
@@ -167,6 +172,11 @@ with io.open_text_auto("data/large_file.txt.gz") as f: |
167 | 172 | - **Networks Module**: Use prefix `NET_` (e.g., `NET_THREADS`, `NET_WORK_DIR`) |
168 | 173 | - **ML Module**: Use prefix `ML_` (e.g., `ML_THREADS`, `ML_WORK_DIR`, `ML_MODEL_DIR`) |
169 | 174 | - **Multi-Omics Module**: Use prefix `MULTI_` (e.g., `MULTI_THREADS`, `MULTI_WORK_DIR`) |
| 175 | +- **Long Read Module**: Use prefix `LR_` (e.g., `LR_THREADS`, `LR_WORK_DIR`) |
| 176 | +- **Metagenomics Module**: Use prefix `META_` (e.g., `META_THREADS`, `META_WORK_DIR`) |
| 177 | +- **Structural Variants Module**: Use prefix `SV_` (e.g., `SV_THREADS`, `SV_WORK_DIR`) |
| 178 | +- **Spatial Module**: Use prefix `SPATIAL_` (e.g., `SPATIAL_THREADS`, `SPATIAL_WORK_DIR`) |
| 179 | +- **Pharmacogenomics Module**: Use prefix `PHARMA_` (e.g., `PHARMA_THREADS`, `PHARMA_DB_PATH`) |
170 | 180 |
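The prefix convention above can be sketched as a small lookup over environment variables. The function `read_prefixed_env` is a hypothetical helper written for this example; the project's own configuration loader may work differently.

```python
from __future__ import annotations

import os
from typing import Mapping


def read_prefixed_env(prefix: str, environ: Mapping[str, str] | None = None) -> dict[str, str]:
    """Collect variables that start with `prefix`, keyed by the lowercased remainder.

    Hypothetical helper: passing `environ` explicitly makes the function easy to
    test; by default it reads the real process environment.
    """
    env = os.environ if environ is None else environ
    return {
        name[len(prefix):].lower(): value
        for name, value in env.items()
        if name.startswith(prefix)
    }


# Example with an explicit mapping instead of the real environment.
sample_env = {"LR_THREADS": "8", "LR_WORK_DIR": "/tmp/lr", "SV_THREADS": "4"}
long_read_settings = read_prefixed_env("LR_", sample_env)
print(long_read_settings)
```

Including the trailing underscore in the prefix (`LR_`, `SV_`, `SPATIAL_`) avoids matching unrelated variables that merely begin with the same letters.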
|
171 | 181 | ### Configuration File Structure |
172 | 182 | ```yaml |
@@ -202,7 +212,7 @@ def load_domain_config(config_file: str | Path, prefix: str = "DOMAIN") -> Domai |
202 | 212 | - RNA: `AmalgkitWorkflowConfig` with prefix `"AK"` |
203 | 213 | - GWAS: `GWASWorkflowConfig` with prefix `"GWAS"` |
204 | 214 | - Life Events: `LifeEventsWorkflowConfig` with prefix `"LE"` |
205 | | -- Other modules: Follow pattern `{MODULE}_` prefix (e.g., `DNA_`, `PROT_`, `EPI_`, `ONT_`, `PHEN_`, `ECO_`, `MATH_`, `INFO_`, `VIZ_`, `SIM_`, `SC_`, `QC_`, `NET_`, `ML_`, `MULTI_`) |
| 215 | +- Other modules: Follow pattern `{MODULE}_` prefix (e.g., `DNA_`, `PROT_`, `EPI_`, `ONT_`, `PHEN_`, `ECO_`, `MATH_`, `INFO_`, `VIZ_`, `SIM_`, `SC_`, `QC_`, `NET_`, `ML_`, `MULTI_`, `LR_`, `META_`, `SV_`, `SPATIAL_`, `PHARMA_`) |
206 | 216 |
|
207 | 217 | ## Code Quality Policy (STRICTLY NO MOCKS/FAKES/PLACEHOLDERS) |
208 | 218 |
|
@@ -303,6 +313,12 @@ Module-specific rules are organized in the `cursorrules/` directory. Each module |
303 | 313 | - `cursorrules/networks.cursorrules` - Biological network analysis |
304 | 314 | - `cursorrules/ml.cursorrules` - Machine learning for biological data |
305 | 315 | - `cursorrules/multiomics.cursorrules` - Multi-omic data integration |
| 316 | +- `cursorrules/longread.cursorrules` - Long-read sequencing (PacBio/Nanopore) |
| 317 | +- `cursorrules/metagenomics.cursorrules` - Metagenomic analysis (amplicon, shotgun) |
| 318 | +- `cursorrules/structural_variants.cursorrules` - CNV/SV detection and annotation |
| 319 | +- `cursorrules/spatial.cursorrules` - Spatial transcriptomics (Visium, MERFISH, Xenium) |
| 320 | +- `cursorrules/pharmacogenomics.cursorrules` - Clinical pharmacogenomics |
| 321 | +- `cursorrules/menu.cursorrules` - Interactive menu and discovery system |
306 | 322 |
|
307 | 323 | **See `cursorrules/README.md` for detailed information about the modular structure.** |
308 | 324 |
|
@@ -489,6 +505,19 @@ Each module should have: |
489 | 505 | - **Quality → All**: Quality control for all data types |
490 | 506 | - **Simulation → All**: Synthetic data generation for testing |
491 | 507 | - **Multi-Omics**: Integration of DNA, RNA, protein, epigenome, and other omics types |
| 508 | +- **Long Read → DNA**: Long-read variant calling and genomic coordinates |
| 509 | +- **Long Read → Epigenome**: Methylation from modified base detection |
| 510 | +- **Long Read → Structural Variants**: Long-read SV detection that complements short-read methods |
| 511 | +- **Metagenomics → Ecology**: Community diversity from amplicon/shotgun data |
| 512 | +- **Metagenomics → Networks**: Microbial co-occurrence networks |
| 513 | +- **Metagenomics → Ontology**: Functional annotation via GO/KEGG |
| 514 | +- **Structural Variants → DNA**: Genomic coordinates and variant calling |
| 515 | +- **Structural Variants → GWAS**: Structural variants in association studies |
| 516 | +- **Spatial → Single-Cell**: scRNA-seq reference for deconvolution |
| 517 | +- **Spatial → Networks**: Spatial interaction networks and ligand-receptor interactions |
| 518 | +- **Pharmacogenomics → GWAS**: Variant data from association studies |
| 519 | +- **Pharmacogenomics → DNA**: Genomic coordinates and variant calling |
| 520 | +- **Pharmacogenomics → Phenotype**: Clinical phenotype data |
492 | 521 |
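The integration points added above can also be read as a small dependency map. The sketch below is illustrative only: the `INTEGRATIONS` dictionary, its lowercase keys (for example `single_cell`), and the `modules_linked_to` helper are assumptions for this example, not names from the codebase.

```python
from __future__ import annotations

# Source module -> modules it integrates with, transcribed from the list above.
# The key spellings are assumptions chosen for this sketch.
INTEGRATIONS: dict[str, list[str]] = {
    "longread": ["dna", "epigenome", "structural_variants"],
    "metagenomics": ["ecology", "networks", "ontology"],
    "structural_variants": ["dna", "gwas"],
    "spatial": ["single_cell", "networks"],
    "pharmacogenomics": ["gwas", "dna", "phenotype"],
}


def modules_linked_to(target: str) -> list[str]:
    """Return, in alphabetical order, the new modules that integrate with `target`."""
    return sorted(source for source, targets in INTEGRATIONS.items() if target in targets)


# Example: which of the new modules integrate with the DNA module?
print(modules_linked_to("dna"))
```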
|
493 | 522 | ### Workflow Patterns |
494 | 523 | ```python |
|