Skip to content

feat: 8 new tools, 4 new skills, 100-skill audit, reasoning frameworks#151

Open
gasvn wants to merge 17 commits intomainfrom
fix/round-155-gaps
Open

feat: 8 new tools, 4 new skills, 100-skill audit, reasoning frameworks#151
gasvn wants to merge 17 commits intomainfrom
fix/round-155-gaps

Conversation

@gasvn
Copy link
Member

@gasvn gasvn commented Mar 26, 2026

Summary

Comprehensive ToolUniverse improvement round covering new tools, new skills, full 100-skill audit, and reasoning framework upgrades.

New Tools (8)

  • RGD (Rat Genome Database): 4 tools — get_gene, search_genes, get_annotations, get_orthologs
  • T3DB (Toxin Database): 2 tools — get_toxin (XML API), search_toxins
  • MyVariant_get_pathogenicity_scores: dedicated REVEL/AlphaMissense/SIFT lookup

New Skills (4)

  • tooluniverse-lipidomics: LIPID MAPS classification, sphingolipid/eicosanoid pathways, disease interpretation
  • tooluniverse-noncoding-rna: miRNA/lncRNA analysis with target prediction procedures
  • tooluniverse-aging-senescence: 12 hallmarks framework, senolytic drug tables, GWAS longevity loci
  • tooluniverse-vaccine-design: epitope prediction, population coverage, multi-epitope construct design

Full 100-Skill Tool Name Audit

  • Audited all 100 skills in 9 batches
  • Fixed ~50 wrong tool references across 20+ skills
  • Common patterns: _Dataverse_ artifacts, clinical_trials_search, gnomAD_ casing, NvidiaNIM_alphafold2

Reasoning Framework Upgrades (10 skills)

Added evidence grading, interpretation tables, synthesis questions, and scoring to:
disease-research, target-research, gwas-drug-discovery, cancer-genomics-tcga, drug-regulatory, metabolomics, spatial-transcriptomics, microbiome-research, sequence-retrieval

77/101 skills now have reasoning frameworks (up from ~30).

Test plan

  • RGD tools tested (Brca1, Tp53)
  • T3DB tool tested (Arsenic)
  • 4 new skills tested with real scientist questions
  • Tool name audit: 100/100 skills checked
  • Pre-commit hooks pass

gasvn added 17 commits March 25, 2026 22:34
RGD_get_gene, RGD_search_genes, RGD_get_annotations, RGD_get_orthologs
Search uses Alliance of Genome Resources API (RGD's own is unreliable).
Tested: Brca1 (RGD:2218) — gene info, 530 disease annotations, 10 orthologs.
All 3 follow the reasoning-framework pattern with interpretation tables,
evidence grading, computational procedures, and honest limitations.

Lipidomics:
- LIPID MAPS 8-category classification with biological role table
- Key lipid pathways (sphingolipid, eicosanoid, steroid) mapped to KEGG
- Disease interpretation framework (ceramide↑→Alzheimer's, oxPL↑→CVD)
- Lipid class enrichment analysis procedure (scipy)

Non-coding RNA:
- miRNA/lncRNA/circRNA identification and classification
- Target evidence grading (validated > high-confidence prediction > prediction)
- lncRNA mechanism types (chromatin modifier, sponge, scaffold, enhancer)
- Key ncRNA-disease associations table (miR-21, HOTAIR, MALAT1, etc.)

Aging & Senescence:
- 12 hallmarks of aging framework (Lopez-Otin 2023) with gene/pathway mapping
- Senescence marker interpretation with caveats
- Senolytic drug table (D+Q, navitoclax, fisetin) with clinical status
- Geroprotector table (rapamycin, metformin, NAD+ precursors)
- KEGG cellular senescence pathway (hsa04218) integration
…nings, Orphanet filter

- Monarch: biolink:GeneToDiseaseAssociation → biolink:CausalGeneToDiseaseAssociation
  (old category returns HTTP 422)
- Monarch: biolink:DiseaseToGeneAssociation → biolink:CorrelatedGeneToDiseaseAssociation
- DisGeNET: add API key requirement warning + fallback to OpenTargets/Monarch
- OMIM: add API key requirement warning + Monarch fallback
- Orphanet: add substring match warning (BRCA1 also matches BAP1, BRCC3)
Lipidomics:
- LIPIDMAPS_search → LipidMaps_search_by_name (correct registry name)
- LIPIDMAPS_get_compound → LipidMaps_get_compound_by_id

Aging/Senescence:
- Add GWAS search limitation note (trait search works better than gene search)
- DisGeNET_search_gene: param is gene= not query=, needs DISGENET_API_KEY
…tool

- LNCipedia_search→LNCipedia_search_lncrna, LNCipedia_get_transcript→
  LNCipedia_get_lncrna, LNCipedia_get_gene→LNCipedia_get_lncrna_xrefs,
  LNCipedia_list_transcripts→LNCipedia_search_ncrna_by_type,
  LNCipedia_get_sequence→LNCipedia_get_lncrna_publications
- miRBase_get_mirna_targets does NOT exist; replaced with PubMed
  literature search + built-in reference table for common oncomiR targets
- GTEx param: gene→gene_symbol
Lipidomics:
- HMDB params: query→compound_name for both HMDB_search and HMDB_get_metabolite
- DisGeNET param: query→gene
- Add LIPID MAPS search tips (species abbreviations may fail, use generic
  names or formula search as fallback)

Aging/Senescence:
- Reorder GWAS strategy: gwas_get_snps_for_gene first (gene-centric, works),
  gwas_search_associations second (trait-centric, "longevity" may return 0)
- Add PubMed as essential fallback for centenarian studies not in GWAS Catalog
  (Willcox 2008, Flachsbart 2009 used targeted genotyping, not GWAS arrays)
New tools: T3DB_get_toxin, T3DB_search_toxins (XML API, no auth)
ncRNA skill: TargetScan + miRTarBase download-and-process procedures
…oncology, systems-biology, pharmacogenomics

disease-research (6→7/10):
- OSL_get_efo_id→OSL_get_efo_id_by_disease_name
- ols_search/get_efo_terms→ols_search_efo_terms, ols_get_efo_term
- umls_search→umls_search_concepts, icd_search→icd_search_codes
- snomed_search→snomed_search_concepts
- HumanBase PPI→humanbase_ppi_analysis

precision-oncology (7→8/10):
- NvidiaNIM_alphafold2→alphafold_get_prediction (NvidiaNIM not in registry)

systems-biology (6→7/10):
- pc_search_pathways→PathwayCommons_search (2 occurrences)

pharmacogenomics (9→9.5/10):
- PharmGKB_get_clinical_annotations IS in registry (removed false
  "not available" note, fixed strikethrough in reference table)
rare-disease-diagnosis (6→7.5/10):
- NvidiaNIM_alphafold2→alphafold_get_prediction
- gnomAD_get_variant_frequencies→gnomad_get_variant (lowercase)

drug-research (8→8.5/10):
- FDA_OrangeBook_search→FDA_OrangeBook_search_drug

target-research (8→8.5/10):
- get_protein_metadata_by_pdb_id→RCSBData_get_entry
- GtoPdb (bare)→GtoPdb_search_ligands

Remaining 5 skills in batch audited clean (0 errors each):
cancer-variant-interpretation 9/10, gwas-snp-interpretation 8/10,
literature-deep-research 9/10, adverse-event-detection 9/10,
regulatory-genomics 8/10
network-pharmacology: clinical_trials_get_details→get_clinical_trial_descriptions
clinical-trial-matching: clinical_trials_get_details→get_clinical_trial_descriptions,
  clinical_trials_search→search_clinical_trials

Batch 5: 6/8 skills clean (antibody-engineering, drug-drug-interaction,
immunotherapy-response, protein-interactions, sequence-analysis,
epigenomics-chromatin all scored 10/10)
spatial-omics: clinical_trials_search→search_clinical_trials,
  HuBMAP_Dataverse_get_dataset→HuBMAP_get_dataset
precision-medicine-stratification: clinical_trials_search→search_clinical_trials
clinical-trial-design: FDA_OrangeBook_search_drugs→FDA_OrangeBook_search_drug,
  gnomAD_search_gene_variants→gnomad_search_variants,
  gnomAD_get_variant_details→gnomad_get_variant

Batch 6: 5/8 clean (drug-target-validation, variant-to-mechanism,
multiomic-disease-characterization, gene-enrichment, rnaseq-deseq2 all 10/10)
proteomics-data-retrieval: MassIVE/ProteomeXchange _Dataverse_ artifacts
spatial-transcriptomics: HuBMAP_Dataverse_get_dataset→HuBMAP_get_dataset
protein-structure-retrieval: pdbe_get_molecules→pdbe_get_entry_molecules,
  pdbe_get_binding_sites→PDBe_KB_get_ligand_sites,
  download_pdb_structure_file→RCSBData_get_entry
pharmacovigilance: PharmGKB_search_drug→PharmGKB_search_drugs (plural)

Batch 7-9 (30 skills audited): 23 clean, 7 with fixes applied.
Cumulative: 54 skills audited out of 100.
protein-modification-analysis: MassIVE_Dataverse→MassIVE_get_dataset
structural-proteomics: ProteomeXchange_Dataverse→ProteomeXchange_get_dataset
statistical-modeling: clinical_trials_search→search_clinical_trials
rare-disease-diagnosis: gnomAD_get_variant→gnomad_get_variant (2 remaining)

Full audit complete: 100 skills checked, all tool name issues resolved.
disease-research: add evidence grading (T1-T4), 5 synthesis questions
  for executive summary, cross-database concordance interpretation,
  conflicting data resolution table

target-research: add Target Validation Scorecard (0-18 scale, 6
  dimensions), GO/NO-GO interpretation rules (genetic evidence is
  strongest predictor, essential genes = poor targets)

gwas-drug-discovery: add GWAS signal strength assessment (gold/strong/
  moderate/weak), 4-step target prioritization decision tree (druggable?
  direction? effect size? precedent?), evidence integration scoring table
Computational vaccine design pipeline covering:
- Antigen selection with prioritization criteria (surface/conservation/essentiality)
- T-cell epitope prediction (MHC-I/II via IEDB NetMHCpan)
- B-cell epitope prediction (linear + conformational)
- Population coverage analysis with HLA supertype strategy
- Conservation analysis across pathogen strains
- Multi-epitope construct design with linker guidance
- Binding affinity interpretation table (IC50 thresholds)
- Population coverage targets (>90%=excellent, <50%=redesign)
- Evidence grading (T1-T4 for vaccine evidence levels)
cancer-genomics-tcga: mutation frequency interpretation (>10%=driver),
  survival analysis guidance (HR, p-value, cohort caveats), CNV
  interpretation (focal vs arm-level), T1-T4 evidence grading

drug-regulatory: approval pathway interpretation (505(b)(1) vs ANDA),
  Orange Book patent/exclusivity codes, DailyMed label section guide

metabolomics: metabolite ID confidence levels (L1-L4), pathway
  enrichment interpretation, biomarker discovery criteria

spatial-transcriptomics: spatial domain interpretation, cell-cell
  proximity significance (z-score thresholds), SVG interpretation
  (Moran's I thresholds)

microbiome-research: alpha diversity (Shannon thresholds), beta
  diversity (PERMANOVA R^2), taxonomic composition significance,
  functional profiling (potential vs activity)

sequence-retrieval: sequence quality tiers, accession type guidance
  (RefSeq vs GenBank routing), cross-database reconciliation
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant