feat: 8 new tools, 4 new skills, 100-skill audit, reasoning frameworks#151
Open
RGD_get_gene, RGD_search_genes, RGD_get_annotations, RGD_get_orthologs
Search uses the Alliance of Genome Resources API (RGD's own is unreliable).
Tested with Brca1 (RGD:2218): gene info, 530 disease annotations, 10 orthologs.
All 3 follow the reasoning-framework pattern with interpretation tables, evidence grading, computational procedures, and honest limitations.

Lipidomics:
- LIPID MAPS 8-category classification with biological role table
- Key lipid pathways (sphingolipid, eicosanoid, steroid) mapped to KEGG
- Disease interpretation framework (ceramide↑→Alzheimer's, oxPL↑→CVD)
- Lipid class enrichment analysis procedure (scipy)

Non-coding RNA:
- miRNA/lncRNA/circRNA identification and classification
- Target evidence grading (validated > high-confidence prediction > prediction)
- lncRNA mechanism types (chromatin modifier, sponge, scaffold, enhancer)
- Key ncRNA-disease associations table (miR-21, HOTAIR, MALAT1, etc.)

Aging & Senescence:
- 12 hallmarks of aging framework (Lopez-Otin 2023) with gene/pathway mapping
- Senescence marker interpretation with caveats
- Senolytic drug table (D+Q, navitoclax, fisetin) with clinical status
- Geroprotector table (rapamycin, metformin, NAD+ precursors)
- KEGG cellular senescence pathway (hsa04218) integration
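The lipid class enrichment procedure listed under Lipidomics is not shown in this PR text, so the following is only a minimal sketch of one common approach: a one-sided hypergeometric test per lipid class. The function names and the input layout (a dict mapping lipid ID to class) are assumptions for illustration. The sketch uses only the standard library so it runs without dependencies; `scipy.stats.hypergeom.sf(k - 1, M, K, n)` computes the same tail probability.

```python
from collections import Counter
from math import comb


def hypergeom_tail(k, M, K, n):
    """P(X >= k) when drawing n lipids from M, of which K belong to the class."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(M - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(M, n)


def lipid_class_enrichment(background, hits):
    """Test each lipid class for over-representation among the hits.

    background: dict of lipid_id -> class label for every measured lipid
    hits: iterable of lipid_ids flagged as changed (hypothetical input)
    Returns {class: (hits_in_class, class_size, p_value)}.
    """
    hits = [h for h in hits if h in background]
    class_sizes = Counter(background.values())
    hit_counts = Counter(background[h] for h in hits)
    total, n_hits = len(background), len(hits)
    return {
        cls: (k, class_sizes[cls], hypergeom_tail(k, total, class_sizes[cls], n_hits))
        for cls, k in hit_counts.items()
    }


# Illustrative data only: 10 ceramides and 90 phosphatidylcholines measured.
background = {f"Cer{i}": "Cer" for i in range(10)}
background.update({f"PC{i}": "PC" for i in range(90)})
print(lipid_class_enrichment(background, [f"Cer{i}" for i in range(5)]))
```

With many classes tested at once, the raw p-values would normally need a multiple-testing correction before interpretation.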
…nings, Orphanet filter
- Monarch: biolink:GeneToDiseaseAssociation → biolink:CausalGeneToDiseaseAssociation (old category returns HTTP 422)
- Monarch: biolink:DiseaseToGeneAssociation → biolink:CorrelatedGeneToDiseaseAssociation
- DisGeNET: add API key requirement warning + fallback to OpenTargets/Monarch
- OMIM: add API key requirement warning + Monarch fallback
- Orphanet: add substring match warning (BRCA1 also matches BAP1, BRCC3)
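One way to act on the Orphanet substring-match warning is to post-filter results by exact gene symbol. This is a sketch under assumptions: the record layout and the `gene_symbol` key are hypothetical, and the actual Orphanet tool output may be shaped differently.

```python
def exact_gene_matches(records, symbol, key="gene_symbol"):
    """Keep only records whose gene symbol equals `symbol`, ignoring case."""
    wanted = symbol.strip().upper()
    return [r for r in records if str(r.get(key, "")).strip().upper() == wanted]


# Hypothetical search results for the query "BRCA1"; field names are assumed.
records = [
    {"gene_symbol": "BRCA1", "hit": "example-1"},
    {"gene_symbol": "BAP1", "hit": "example-2"},
    {"gene_symbol": "BRCC3", "hit": "example-3"},
]
print(exact_gene_matches(records, "brca1"))
```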
Lipidomics:
- LIPIDMAPS_search → LipidMaps_search_by_name (correct registry name)
- LIPIDMAPS_get_compound → LipidMaps_get_compound_by_id

Aging/Senescence:
- Add GWAS search limitation note (trait search works better than gene search)
- DisGeNET_search_gene: param is gene= not query=, needs DISGENET_API_KEY
…tool
- LNCipedia_search→LNCipedia_search_lncrna, LNCipedia_get_transcript→LNCipedia_get_lncrna, LNCipedia_get_gene→LNCipedia_get_lncrna_xrefs, LNCipedia_list_transcripts→LNCipedia_search_ncrna_by_type, LNCipedia_get_sequence→LNCipedia_get_lncrna_publications
- miRBase_get_mirna_targets does NOT exist; replaced with PubMed literature search + built-in reference table for common oncomiR targets
- GTEx param: gene→gene_symbol
Lipidomics:
- HMDB params: query→compound_name for both HMDB_search and HMDB_get_metabolite
- DisGeNET param: query→gene
- Add LIPID MAPS search tips (species abbreviations may fail, use generic names or formula search as fallback)

Aging/Senescence:
- Reorder GWAS strategy: gwas_get_snps_for_gene first (gene-centric, works), gwas_search_associations second (trait-centric, "longevity" may return 0)
- Add PubMed as essential fallback for centenarian studies not in GWAS Catalog (Willcox 2008, Flachsbart 2009 used targeted genotyping, not GWAS arrays)
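The reordered GWAS strategy amounts to a fallback chain: try the gene-centric lookup, then the trait-centric search, then the literature. A minimal sketch of that control flow follows. The three `stub_*` functions are placeholders standing in for the real tools, their return values are invented for illustration, and the real tool signatures are not shown in this PR.

```python
def first_nonempty(strategies):
    """Run (label, callable) pairs in order and return the first non-empty result."""
    for label, fetch in strategies:
        results = fetch()
        if results:
            return label, results
    return None, []


# Placeholder stand-ins for the real tools; the canned data is illustrative only.
def stub_snps_for_gene(gene):
    return {"GENE_A": ["snp-placeholder-1"]}.get(gene, [])


def stub_trait_search(trait):
    return []  # mimics a trait query such as "longevity" returning 0 hits


def stub_pubmed(query):
    return [f"literature hit for {query}"]


def lookup(gene, trait):
    return first_nonempty([
        ("gwas_get_snps_for_gene", lambda: stub_snps_for_gene(gene)),
        ("gwas_search_associations", lambda: stub_trait_search(trait)),
        ("pubmed", lambda: stub_pubmed(f"{gene} {trait}")),
    ])


print(lookup("GENE_A", "longevity"))
print(lookup("GENE_B", "longevity"))
```

Returning the label alongside the results keeps track of which source answered, which matters when literature hits and catalog hits carry different evidence weight.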
New tools: T3DB_get_toxin, T3DB_search_toxins (XML API, no auth)
ncRNA skill: TargetScan + miRTarBase download-and-process procedures
…oncology, systems-biology, pharmacogenomics

disease-research (6→7/10):
- OSL_get_efo_id→OSL_get_efo_id_by_disease_name
- ols_search/get_efo_terms→ols_search_efo_terms, ols_get_efo_term
- umls_search→umls_search_concepts, icd_search→icd_search_codes
- snomed_search→snomed_search_concepts
- HumanBase PPI→humanbase_ppi_analysis

precision-oncology (7→8/10):
- NvidiaNIM_alphafold2→alphafold_get_prediction (NvidiaNIM not in registry)

systems-biology (6→7/10):
- pc_search_pathways→PathwayCommons_search (2 occurrences)

pharmacogenomics (9→9.5/10):
- PharmGKB_get_clinical_annotations IS in registry (removed false "not available" note, fixed strikethrough in reference table)
rare-disease-diagnosis (6→7.5/10):
- NvidiaNIM_alphafold2→alphafold_get_prediction
- gnomAD_get_variant_frequencies→gnomad_get_variant (lowercase)

drug-research (8→8.5/10):
- FDA_OrangeBook_search→FDA_OrangeBook_search_drug

target-research (8→8.5/10):
- get_protein_metadata_by_pdb_id→RCSBData_get_entry
- GtoPdb (bare)→GtoPdb_search_ligands

Remaining 5 skills in batch audited clean (0 errors each): cancer-variant-interpretation 9/10, gwas-snp-interpretation 8/10, literature-deep-research 9/10, adverse-event-detection 9/10, regulatory-genomics 8/10
network-pharmacology: clinical_trials_get_details→get_clinical_trial_descriptions
clinical-trial-matching: clinical_trials_get_details→get_clinical_trial_descriptions, clinical_trials_search→search_clinical_trials

Batch 5: 6/8 skills clean (antibody-engineering, drug-drug-interaction, immunotherapy-response, protein-interactions, sequence-analysis, epigenomics-chromatin all scored 10/10)
spatial-omics: clinical_trials_search→search_clinical_trials, HuBMAP_Dataverse_get_dataset→HuBMAP_get_dataset
precision-medicine-stratification: clinical_trials_search→search_clinical_trials
clinical-trial-design: FDA_OrangeBook_search_drugs→FDA_OrangeBook_search_drug, gnomAD_search_gene_variants→gnomad_search_variants, gnomAD_get_variant_details→gnomad_get_variant

Batch 6: 5/8 clean (drug-target-validation, variant-to-mechanism, multiomic-disease-characterization, gene-enrichment, rnaseq-deseq2 all 10/10)
proteomics-data-retrieval: MassIVE/ProteomeXchange _Dataverse_ artifacts
spatial-transcriptomics: HuBMAP_Dataverse_get_dataset→HuBMAP_get_dataset
protein-structure-retrieval: pdbe_get_molecules→pdbe_get_entry_molecules, pdbe_get_binding_sites→PDBe_KB_get_ligand_sites, download_pdb_structure_file→RCSBData_get_entry
pharmacovigilance: PharmGKB_search_drug→PharmGKB_search_drugs (plural)

Batch 7-9 (30 skills audited): 23 clean, 7 with fixes applied.
Cumulative: 54 skills audited out of 100.
protein-modification-analysis: MassIVE_Dataverse→MassIVE_get_dataset
structural-proteomics: ProteomeXchange_Dataverse→ProteomeXchange_get_dataset
statistical-modeling: clinical_trials_search→search_clinical_trials
rare-disease-diagnosis: gnomAD_get_variant→gnomad_get_variant (2 remaining)

Full audit complete: 100 skills checked, all tool name issues resolved.
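The kind of check behind this audit can be sketched as a scan of skill text for stale tool names. The snippet below is an assumption about how such a scan might look, not the script actually used; the rename map contains only pairs quoted in the commits above, and `find_stale_names` is a hypothetical helper name.

```python
import re

# Old name -> registry name, taken from the renames listed in this PR.
RENAMES = {
    "clinical_trials_search": "search_clinical_trials",
    "NvidiaNIM_alphafold2": "alphafold_get_prediction",
    "gnomAD_get_variant": "gnomad_get_variant",
    "FDA_OrangeBook_search": "FDA_OrangeBook_search_drug",
}


def find_stale_names(text, renames=RENAMES):
    """Return (line_number, old_name, new_name) for each stale tool name found."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for old, new in renames.items():
            # \b keeps FDA_OrangeBook_search from matching the already-correct
            # FDA_OrangeBook_search_drug, because "_" counts as a word character.
            if re.search(rf"\b{re.escape(old)}\b", line):
                findings.append((lineno, old, new))
    return findings


sample = (
    "Use clinical_trials_search here\n"
    "FDA_OrangeBook_search_drug is fine\n"
    "gnomAD_get_variant(variant_id)"
)
print(find_stale_names(sample))
```

Whole-word matching is the important design choice here: several correct names extend an old name with a suffix, so a plain substring search would flag lines that are already fixed.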
disease-research: add evidence grading (T1-T4), 5 synthesis questions for executive summary, cross-database concordance interpretation, conflicting data resolution table
target-research: add Target Validation Scorecard (0-18 scale, 6 dimensions), GO/NO-GO interpretation rules (genetic evidence is strongest predictor, essential genes = poor targets)
gwas-drug-discovery: add GWAS signal strength assessment (gold/strong/moderate/weak), 4-step target prioritization decision tree (druggable? direction? effect size? precedent?), evidence integration scoring table
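A 0-18 scale over 6 dimensions is consistent with each dimension being scored 0-3, so the scorecard total can be sketched as below. The 0-3 range per dimension is an inference, and the dimension names are placeholders invented for illustration; the PR text names neither, and no GO/NO-GO cutoff is encoded because none is stated here.

```python
# Placeholder dimension names: the PR does not list the actual six dimensions.
DIMENSIONS = (
    "dimension_1", "dimension_2", "dimension_3",
    "dimension_4", "dimension_5", "dimension_6",
)
MAX_PER_DIMENSION = 3  # assumed: 18 points / 6 dimensions


def scorecard_total(scores):
    """Validate per-dimension scores and return the 0-18 total."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    for name in DIMENSIONS:
        if not 0 <= scores[name] <= MAX_PER_DIMENSION:
            raise ValueError(f"{name} must be between 0 and {MAX_PER_DIMENSION}")
    return sum(scores[name] for name in DIMENSIONS)


example = {name: 2 for name in DIMENSIONS}
print(scorecard_total(example))
```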
Computational vaccine design pipeline covering:
- Antigen selection with prioritization criteria (surface/conservation/essentiality)
- T-cell epitope prediction (MHC-I/II via IEDB NetMHCpan)
- B-cell epitope prediction (linear + conformational)
- Population coverage analysis with HLA supertype strategy
- Conservation analysis across pathogen strains
- Multi-epitope construct design with linker guidance
- Binding affinity interpretation table (IC50 thresholds)
- Population coverage targets (>90%=excellent, <50%=redesign)
- Evidence grading (T1-T4 for vaccine evidence levels)
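The population coverage targets can be applied mechanically once a coverage value is available. The sketch below makes two assumptions that are not in the PR: coverage is estimated for a single HLA locus under Hardy-Weinberg proportions (an individual is covered if at least one of their two alleles is in the covered set), and the unnamed 50-90% band is labelled "intermediate". Dedicated tools such as the IEDB population coverage calculator handle multiple loci and real frequency tables.

```python
def single_locus_coverage(allele_freqs):
    """Fraction of individuals carrying at least one covered allele at one locus.

    allele_freqs: population frequencies of the alleles the epitope set binds.
    Assumes Hardy-Weinberg proportions (a simplification).
    """
    covered = sum(allele_freqs)
    if covered < 0.0 or covered > 1.0 + 1e-9:
        raise ValueError("allele frequencies must sum to a value in [0, 1]")
    covered = min(covered, 1.0)
    return 1.0 - (1.0 - covered) ** 2


def coverage_band(coverage):
    """Map a coverage fraction to the targets quoted above."""
    if coverage > 0.90:
        return "excellent"
    if coverage < 0.50:
        return "redesign"
    return "intermediate"  # label assumed; the source names only the two extremes


# Illustrative allele frequencies, not real HLA data.
for freqs in ([0.4, 0.3], [0.3, 0.2], [0.1, 0.1]):
    cov = single_locus_coverage(freqs)
    print(freqs, round(cov, 2), coverage_band(cov))
```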
cancer-genomics-tcga: mutation frequency interpretation (>10%=driver), survival analysis guidance (HR, p-value, cohort caveats), CNV interpretation (focal vs arm-level), T1-T4 evidence grading
drug-regulatory: approval pathway interpretation (505(b)(1) vs ANDA), Orange Book patent/exclusivity codes, DailyMed label section guide
metabolomics: metabolite ID confidence levels (L1-L4), pathway enrichment interpretation, biomarker discovery criteria
spatial-transcriptomics: spatial domain interpretation, cell-cell proximity significance (z-score thresholds), SVG interpretation (Moran's I thresholds)
microbiome-research: alpha diversity (Shannon thresholds), beta diversity (PERMANOVA R^2), taxonomic composition significance, functional profiling (potential vs activity)
sequence-retrieval: sequence quality tiers, accession type guidance (RefSeq vs GenBank routing), cross-database reconciliation
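For the microbiome-research alpha diversity item, the Shannon index itself is straightforward to compute from taxon counts. This is a minimal sketch using the natural logarithm; the specific thresholds the skill applies are not quoted in this PR, so none are encoded, and the function name is an assumption.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Illustrative counts: four equally abundant taxa versus one dominant taxon.
print(shannon_index([10, 10, 10, 10]))
print(shannon_index([97, 1, 1, 1]))
```

An even community of n taxa gives H = ln(n), the maximum for that richness, which is a useful sanity check when comparing samples.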
Summary
Comprehensive ToolUniverse improvement round covering new tools, new skills, full 100-skill audit, and reasoning framework upgrades.
New Tools (8)
New Skills (4)
Full 100-Skill Tool Name Audit
_Dataverse_ artifacts, clinical_trials_search, gnomAD casing, NvidiaNIM_alphafold2
Reasoning Framework Upgrades (10 skills)
Added evidence grading, interpretation tables, synthesis questions, and scoring to:
disease-research, target-research, gwas-drug-discovery, cancer-genomics-tcga, drug-regulatory, metabolomics, spatial-transcriptomics, microbiome-research, sequence-retrieval
77/101 skills now have reasoning frameworks (up from ~30).
Test plan