
Commit 8ecf0d5

Add 22 new tools: PharmacoDB, SYNERGxDB, CancerPrognosis, NEB Tm, Add… (#92)
* Add 22 new tools: PharmacoDB, SYNERGxDB, CancerPrognosis, NEB Tm, Addgene

New tool groups:
- PharmacoDB (6): search/get compound, cell line, experiments, datasets, biomarker associations via GraphQL API
- SYNERGxDB (7): search combos, get matrix/drug/stats, list drugs/cell lines/datasets
- CancerPrognosis (4): survival data, gene expression, study search/summary via cBioPortal
- NEB Tm Calculator (2): calculate Tm/Ta for primers, list NEB polymerases
- Addgene (3): search/get plasmids, search depositors (requires ADDGENE_API_KEY)

All tools: 100% pass rate, 100% schema valid across 24 integration tests

* Add 11 tools: ZINC20, SwissTargetPrediction, IDT OligoAnalyzer, DrugSynergy extensions

Workflow gaps filled:
- ZINC20 (5): search/get purchasable compounds, SMILES similarity, Lipinski property filter (step before IBM RXN synthesis and PharmacoDB drug sensitivity)
- SwissTargetPrediction (2): predict protein targets from SMILES, list organisms (step before PharmacoDB; correctly identifies COX1/COX2 for aspirin)
- IDT OligoAnalyzer (2): comprehensive oligo Tm/GC/MW/extinction, self-dimer risk assessment (step alongside NEB_Tm for primer QC before ordering)
- DrugSynergy extensions (2): Loewe additivity index, Chou-Talalay combination index (completes the synergy toolkit: Bliss/HSA/ZIP/Loewe/CI are now all available)

All 11 tools: 20/20 tests pass, 0 schema invalid

Notes: REBASE is down (NEB site returning errors); NCI DTP has no public REST API; SynergyFinder is an R Shiny app (no REST API)

* Fix SwissTargetPrediction User-Agent: server rejects bot-style UA strings

* Add 17 tools: IntOGen, Mcule, PDC, MEME Suite

- IntOGen (4 tools): cancer driver gene identification
  - IntOGen_get_drivers, IntOGen_get_gene_info, IntOGen_list_cohorts, IntOGen_list_cancer_types
  - HTML scraping, embedded JSON parsing
- Mcule (4 tools): compound purchasing and lookup
  - Mcule_lookup_compound, Mcule_get_compound, Mcule_list_databases, Mcule_get_database
  - Public endpoints + optional MCULE_API_KEY
- PDC (5 tools): NCI Proteomics Data Commons
  - PDC_search_studies, PDC_get_gene_protein, PDC_list_programs, PDC_get_study_summary, PDC_get_clinical_data
  - GraphQL API
- MEME Suite (4 tools): motif discovery and scanning
  - MEME_fimo_scan, MEME_discover_motifs, MEME_tomtom_compare, MEME_list_databases
  - Multipart form POST, XML status polling

All 23 tests pass (100%), 0 schema invalid

* Add 14 tools: CellMarker, ProteomicsDB, SwissADME

- CellMarker 2.0 (4 tools): cell type marker database for scRNA-seq annotation
  - search_by_gene, search_by_cell_type, list_cell_types, search_cancer_markers
  - HTML scraping (3,000+ cell types, 30,000+ marker genes)
- ProteomicsDB (4 tools): MS-based human proteome expression
  - get_protein_expression, search_proteins, get_expression_summary, list_tissues
  - SAP XSEngine + OData v2 APIs; TP53 expressed in 340 sources
- SwissADME (2 tools): ADMET/drug-likeness prediction from SMILES
  - calculate_adme (49 properties: lipophilicity, solubility, PK, drug-likeness)
  - check_druglikeness (Lipinski/Ghose/Veber/Egan/Muegge filters + PAINS)
  - HTML form POST → CSV result parsing

All 17 tests pass (100%), 0 schema invalid

* Add 4 MetaboAnalyst tools: pathway enrichment and metabolite ID mapping

Uses the KEGG REST API for compound resolution and pathway-metabolite mappings, with local scipy-based hypergeometric enrichment + BH FDR correction. This hybrid approach is needed because the MetaboAnalyst REST API returns HTTP 500 errors.

Tools added:
- MetaboAnalyst_pathway_enrichment: ORA against KEGG metabolic pathways
- MetaboAnalyst_name_to_id: Map metabolite names to KEGG/HMDB/PubChem IDs
- MetaboAnalyst_get_pathway_library: Browse KEGG pathways by species
- MetaboAnalyst_biomarker_enrichment: Enrichment against 20 curated metabolite sets

7/7 tests passing (100%)

* Add broken_apis tracking folder for confirmed non-functional APIs

Establishes a workflow for documenting APIs that still fail after multiple investigation attempts, so future agents skip them and use workarounds.

Files:
- data/broken_apis/README.md: folder purpose, retry policy, entry format
- data/broken_apis/metaboanalyst_rest.json: first entry, covering the MetaboAnalyst public REST API (rest.xialab.ca/api/mapcompounds), broken since Dec 2024. Root cause: the servlet is broken, and the internal R API requires binary .rds serialization that is not accessible from Python. Workaround: KEGG + scipy.

* Fix null/weak return_schema in 11 tool configs (49 tools)

Audit identified tools where return_schema was null or used a loose {"type":"object"} with no properties, causing schema validation to be silently skipped in test_new_tools.py.

Fixed files and affected tools:
- ncbi_nucleotide_tools.json: 3 tools (search, fetch, get_sequence)
- ncbi_sra_tools.json: 4 tools (search, run_info, download_urls, biosample)
- nvidia_nim_tools.json: 16 tools (structure prediction, ESMFold, imaging)
- biogrid_tools.json: 3 tools (additionalProperties pattern for dict-of-objects)
- pharmgkb_tools.json: 4 tools (fixed field type mismatches)
- chipatlas_tools.json: 4 tools (experiment, peak, enrichment, liftover)
- biomodels_tools.json: 2 tools (list_files, search_parameters)
- pubchem_tools.json: 2 assay tools (assay summary, assay data)
- cellxgene_census_tools.json: 2 tools (obs, var queries)
- emdb_tools.json: 1 tool (search_structures array fix)
- mcp_auto_loader_esm.json: 1 tool (additionalProperties)

All 11 files: 100% tests pass, Schema Valid count now non-zero.

* Bump version to 1.0.22
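The MetaboAnalyst entry above computes pathway enrichment locally. It runs a hypergeometric over-representation test per KEGG pathway and then applies Benjamini-Hochberg (BH) FDR correction. The sketch below shows that computation using only the standard library. `math.comb` stands in for scipy, and the tail sum equals `scipy.stats.hypergeom.sf(k - 1, M, n, N)`. The function names and the `pathways` dict layout are illustrative assumptions, not the committed tool's internals.

```python
from math import comb
from typing import Dict, List, Set


def hypergeom_sf(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(M population, n successes, N draws).

    Same value as scipy.stats.hypergeom.sf(k - 1, M, n, N).
    """
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible terms drop out of the sum automatically.
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return tail / comb(M, N)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def pathway_ora(
    hits: Set[str], pathways: Dict[str, Set[str]], background: Set[str]
) -> List[Dict[str, object]]:
    """Over-representation analysis of `hits` against each pathway's members."""
    query = hits & background  # only count compounds inside the universe
    names, overlaps, pvals = [], [], []
    for name, members in pathways.items():
        members = members & background
        k = len(query & members)
        names.append(name)
        overlaps.append(k)
        pvals.append(hypergeom_sf(k, len(background), len(members), len(query)))
    fdrs = benjamini_hochberg(pvals)
    rows = [
        {"pathway": name, "hits": k, "p_value": pv, "fdr": q}
        for name, k, pv, q in zip(names, overlaps, pvals, fdrs)
    ]
    return sorted(rows, key=lambda r: r["p_value"])
```

In this sketch, pathways with zero hits are kept (their p-value is 1.0), so BH corrects over every tested pathway. Whether the committed tool filters pathways before correction is not visible in this diff.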
1 parent 1304bf8 commit 8ecf0d5
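The DrugSynergy extensions listed in the commit message add a Chou-Talalay combination index (CI). Under the median-effect model, the dose of drug i alone that produces fractional effect fa is Dx_i = Dm_i · (fa / (1 − fa))^(1/m_i). For a combination (d1, d2) that produced fa, CI = d1/Dx_1 + d2/Dx_2, which is the same sum that Loewe additivity sets equal to 1 for an additive combination. The following is a minimal sketch that assumes Dm and m have already been fitted for each drug. The class and function names and the ±0.1 "additive" band are hypothetical choices, not the committed tool's API or thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MedianEffect:
    """Fitted single-drug median-effect parameters (assumed already estimated)."""

    dm: float  # median-effect dose (the dose giving fa = 0.5)
    m: float  # slope of the median-effect plot

    def dose_for_effect(self, fa: float) -> float:
        """Dose of this drug alone that yields fraction affected `fa`."""
        if not 0.0 < fa < 1.0:
            raise ValueError("fa must lie strictly between 0 and 1")
        return self.dm * (fa / (1.0 - fa)) ** (1.0 / self.m)


def combination_index(
    d1: float, d2: float, fa: float, drug1: MedianEffect, drug2: MedianEffect
) -> float:
    """Chou-Talalay CI = d1/Dx1 + d2/Dx2 for a combination that produced `fa`."""
    return d1 / drug1.dose_for_effect(fa) + d2 / drug2.dose_for_effect(fa)


def classify(ci: float, band: float = 0.1) -> str:
    """Label a CI value; the +/- `band` additive window is an illustrative choice."""
    if ci < 1.0 - band:
        return "synergy"
    if ci > 1.0 + band:
        return "antagonism"
    return "additive"
```

For example, with Dm = 1 and Dm = 2 (both m = 1), a half-and-half dose pair (0.25, 0.5) reaching fa = 0.5 gives CI = 0.25 + 0.25 = 0.5, which falls in the synergy range.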

118 files changed: +15901 / -71 lines changed

.env.template

Lines changed: 4 additions & 0 deletions
```diff
@@ -1,6 +1,8 @@
 # API Keys for ToolUniverse
 # Copy this file to .env and fill in your actual API keys
 
+ADDGENE_API_KEY=your_api_key_here
+
 BOLTZ_MCP_SERVER_HOST=your_api_key_here
 
 BRENDA_EMAIL=your_api_key_here
@@ -9,6 +11,8 @@ BRENDA_PASSWORD=your_api_key_here
 
 DISGENET_API_KEY=your_api_key_here
 
+ESM_MCP_SERVER_HOST=your_api_key_here
+
 EXPERT_FEEDBACK_MCP_SERVER_URL=your_api_key_here
 
 NVIDIA_API_KEY=your_api_key_here
```
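According to the template's own comment, users copy it to `.env` and fill in real keys. This diff does not show how ToolUniverse reads that file; python-dotenv's `load_dotenv()` is one common choice. The following is therefore only a standard-library sketch. It also skips values still set to the `your_api_key_here` placeholder. Without that check, `AddgeneTool` would likely fail authentication, because it only checks that `ADDGENE_API_KEY` is non-empty and would otherwise send the placeholder as its token. The function names here are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict

PLACEHOLDER = "your_api_key_here"  # value used throughout .env.template


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments and untouched placeholders."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so tokens ending in '==' survive.
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip().strip("\"'")
        if value and value != PLACEHOLDER:
            values[key] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> Dict[str, str]:
    """Copy parsed values into os.environ; existing variables win unless override=True."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Inline `# comments` after a value are not handled here. A real loader such as python-dotenv covers more of the `.env` syntax.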

pyproject.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "tooluniverse"
-version = "1.0.21"
+version = "1.0.22"
 description = "A comprehensive collection of scientific tools for Agentic AI, offering integration with the ToolUniverse SDK and MCP Server to support advanced scientific workflows."
 authors = [
     { name = "Shanghua Gao", email = "shanghuagao@gmail.com" }
```

src/tooluniverse/_lazy_registry_static.py

Lines changed: 16 additions & 0 deletions
```diff
@@ -8,6 +8,7 @@
 STATIC_LAZY_REGISTRY = {
     "ADAStandardsTool": "clinical_society_tools",
     "ADMETAITool": "admetai_tool",
+    "AddgeneTool": "addgene_tool",
     "AHAACCGuidelineTool": "clinical_society_tools",
     "AgenticTool": "agentic_tool",
     "AllenBrainTool": "allen_brain_tool",
@@ -37,9 +38,11 @@
     "Boltz2DockingTool": "boltz_tool",
     "CADDTool": "cadd_tool",
     "CATHTool": "cath_tool",
+    "CancerPrognosisTool": "cancer_prognosis_tool",
     "CBioPortalRESTTool": "cbioportal_tool",
     "CDCRESTTool": "cdc_tool",
     "CELLxGENECensusTool": "cellxgene_census_tool",
+    "CellMarkerTool": "cellmarker_tool",
     "CellPaintingTool": "cellpainting_tool",
     "CIViCTool": "civic_tool",
     "CMAGuidelinesTool": "unified_guideline_tools",
@@ -243,6 +246,7 @@
     "HealthDisparitiesTool": "health_disparities_tool",
     "HumanBaseTool": "humanbase_tool",
     "IBMRXNTool": "ibmrxn_tool",
+    "IDTTool": "idt_tool",
     "ICD10Tool": "icd_tool",
     "ICDTool": "icd_tool",
     "IEDBTool": "iedb_tool",
@@ -252,6 +256,7 @@
     "ITISTool": "itis_tool",
     "IdentifiersOrgTool": "identifiers_org_tool",
     "IntActRESTTool": "intact_tool",
+    "IntOGenTool": "intogen_tool",
     "InterProDomainArchTool": "interpro_domain_arch_tool",
     "InterProEntryTool": "interpro_entry_tool",
     "InterProExtTool": "interpro_ext_tool",
@@ -272,8 +277,10 @@
     "ListTools": "tool_discovery_tools",
     "ListToolsTool": "tool_discovery_tools",
     "MCPAutoLoaderTool": "mcp_client_tool",
+    "MculeTool": "mcule_tool",
     "MCPClientTool": "mcp_client_tool",
     "MCPProxyTool": "mcp_client_tool",
+    "MEMETool": "meme_tool",
     "MCPServerDiscovery": "mcp_client_tool",
     "MGnifyAnalysesTool": "mgnify_tool",
     "MGnifyExpandedTool": "mgnify_expanded_tool",
@@ -282,6 +289,7 @@
     "MarkItDownTool": "markitdown_tool",
     "MeSHTool": "mesh_tool",
     "MedlinePlusRESTTool": "medlineplus_tool",
+    "MetaboAnalystTool": "metaboanalyst_tool",
     "MetaCycTool": "metacyc_tool",
     "MetaboLightsRESTTool": "metabolights_tool",
     "MetabolomicsWorkbenchTool": "metabolomics_workbench_tool",
@@ -305,6 +313,7 @@
     "NCCNGuidelineTool": "clinical_society_tools",
     "NCIThesaurusTool": "nci_thesaurus_tool",
     "NDExTool": "ndex_tool",
+    "NEBTmTool": "neb_tm_tool",
     "NHANESTool": "nhanes_tool",
     "NICEGuidelineFullTextTool": "unified_guideline_tools",
     "NICEWebScrapingTool": "unified_guideline_tools",
@@ -357,13 +366,16 @@
     "PaleobiologyRESTTool": "paleobiology_tool",
     "PathwayCommonsTool": "pathway_commons_tool",
     "PfamTool": "pfam_tool",
+    "PharmacoDBTool": "pharmacodb_tool",
+    "PDCTool": "pdc_tool",
     "PharmGKBTool": "pharmgkb_tool",
     "PharosTool": "pharos_tool",
     "PlantReactomeTool": "plant_reactome_tool",
     "PomBaseTool": "pombase_tool",
     "ProtacDBTool": "protacdb_tool",
     "ProteinStructure3DTool": "protein_structure_3d_tool",
     "ProteinsAPIRESTTool": "proteins_api_tool",
+    "ProteomicsDBTool": "proteomicsdb_tool",
     "ProteinsPlusRESTTool": "proteinsplus_tool",
     "ProteomeXchangeTool": "proteomexchange_tool",
     "PubChemBioAssayTool": "pubchem_bioassay_tool",
@@ -414,6 +426,9 @@
     "SurvivalTool": "survival_tool",
     "SwissDockTool": "swissdock_tool",
     "SwissModelTool": "swissmodel_tool",
+    "SwissADMETool": "swissadme_tool",
+    "SwissTargetTool": "swiss_target_tool",
+    "SYNERGxDBTool": "synergxdb_tool",
     "SynBioHubTool": "synbiohub_tool",
     "TIMERTool": "timer_tool",
     "TRIPDatabaseTool": "unified_guideline_tools",
@@ -463,6 +478,7 @@
     "XMLDatasetTool": "xml_tool",
     "XMLTool": "xml_tool",
     "ZenodoRESTTool": "zenodo_tool",
+    "ZincTool": "zinc_tool",
     "dbSNPGetFrequencies": "dbsnp_tool",
     "dbSNPGetVariantByRsID": "dbsnp_tool",
     "dbSNPRESTTool": "dbsnp_tool",
```

src/tooluniverse/addgene_tool.py

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@

```python
"""
Addgene Developers API Tool

Provides programmatic access to Addgene's plasmid catalog via the official
Addgene Developers API (https://developers.addgene.org/).

Supports:
- Searching plasmids by name, gene, species, vector type, purpose, etc.
- Retrieving detailed plasmid information (cloning, inserts, resistance, growth)
- Browsing depositors (PIs) via plasmid catalog queries

API Base: https://api.developers.addgene.org
Authentication: Token-based via ADDGENE_API_KEY environment variable.
Register at https://developers.addgene.org/ to obtain a free token.
"""

import os
import requests
from typing import Dict, Any, Optional, List
from .base_tool import BaseTool
from .tool_registry import register_tool

ADDGENE_API_URL = "https://api.developers.addgene.org"


@register_tool("AddgeneTool")
class AddgeneTool(BaseTool):
    """
    Tool for querying the Addgene plasmid repository.

    Addgene is a nonprofit global plasmid repository that archives and
    distributes plasmids for the scientific community. This tool provides
    access to:
    - Plasmid search (by name, gene, species, vector type, purpose)
    - Plasmid detail retrieval (cloning info, inserts, resistance markers)
    - Depositor/PI search

    Requires API token via ADDGENE_API_KEY environment variable.
    Register at https://developers.addgene.org/ for access.
    """

    def __init__(self, tool_config: Dict[str, Any]):
        super().__init__(tool_config)
        self.timeout = tool_config.get("timeout", 30)
        self.parameter = tool_config.get("parameter", {})
        self.required = self.parameter.get("required", [])
        self.api_key = os.environ.get("ADDGENE_API_KEY", "")

    def _get_headers(self):
        """Get request headers with auth token."""
        headers = {"Accept": "application/json"}
        if self.api_key:
            headers["Authorization"] = "Token " + self.api_key
        return headers

    def run(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        operation = arguments.get("operation")
        if not operation:
            return {"status": "error", "error": "Missing required parameter: operation"}

        if not self.api_key:
            return {
                "status": "error",
                "error": (
                    "Addgene API key required. Set ADDGENE_API_KEY environment variable. "
                    "Register at https://developers.addgene.org/ for access."
                ),
            }

        handlers = {
            "search_plasmids": self._search_plasmids,
            "get_plasmid": self._get_plasmid,
            "search_depositors": self._search_depositors,
        }

        handler = handlers.get(operation)
        if not handler:
            return {
                "status": "error",
                "error": "Unknown operation: {}. Available: {}".format(
                    operation, ", ".join(handlers.keys())
                ),
            }

        try:
            return handler(arguments)
        except requests.exceptions.Timeout:
            return {"status": "error", "error": "Addgene API request timed out"}
        except requests.exceptions.ConnectionError:
            return {"status": "error", "error": "Failed to connect to Addgene API"}
        except Exception as e:
            return {"status": "error", "error": "Operation failed: {}".format(str(e))}

    def _search_plasmids(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """
        Search Addgene plasmid catalog.

        Supports filtering by name, genes, species, vector_types, purpose,
        experimental_use, expression, and more.
        """
        query = arguments.get("query")
        organism = arguments.get("organism")
        vector_type = arguments.get("vector_type")
        limit = min(int(arguments.get("limit", 10)), 100)

        params = {"page_size": limit}

        if query:
            params["name"] = query
        if organism:
            params["species"] = organism
        if vector_type:
            params["vector_types"] = vector_type

        response = requests.get(
            ADDGENE_API_URL + "/catalog/plasmid/",
            params=params,
            headers=self._get_headers(),
            timeout=self.timeout,
        )

        if response.status_code == 401:
            return {
                "status": "error",
                "error": "Authentication failed. Check your ADDGENE_API_KEY.",
            }

        response.raise_for_status()
        data = response.json()

        results = data.get("results", [])
        return {
            "status": "success",
            "data": {
                "plasmids": results,
                "total_count": data.get("count", len(results)),
                "next_page": data.get("next"),
            },
            "metadata": {
                "source": "Addgene",
                "query": query,
                "organism": organism,
                "vector_type": vector_type,
            },
        }

    def _get_plasmid(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """
        Get detailed information about a specific plasmid.

        Returns full plasmid record including cloning info, inserts,
        resistance markers, growth conditions, article, depositor comments.
        """
        plasmid_id = arguments.get("plasmid_id")
        if not plasmid_id:
            return {
                "status": "error",
                "error": "Missing required parameter: plasmid_id",
            }

        plasmid_id = str(plasmid_id).strip()

        response = requests.get(
            "{}/catalog/plasmid/{}/".format(ADDGENE_API_URL, plasmid_id),
            headers=self._get_headers(),
            timeout=self.timeout,
        )

        if response.status_code == 401:
            return {
                "status": "error",
                "error": "Authentication failed. Check your ADDGENE_API_KEY.",
            }
        if response.status_code == 404:
            return {
                "status": "error",
                "error": "Plasmid ID {} not found in Addgene".format(plasmid_id),
            }

        response.raise_for_status()
        plasmid = response.json()

        return {
            "status": "success",
            "data": plasmid,
            "metadata": {
                "source": "Addgene",
                "plasmid_id": plasmid_id,
                "url": "https://www.addgene.org/{}/".format(plasmid_id),
            },
        }

    def _search_depositors(self, arguments: Dict[str, Any]) -> Dict[str, Any]:
        """
        Search for depositors (PIs) by querying plasmids and extracting
        unique depositor information.

        The Addgene API does not have a dedicated depositor search endpoint,
        so this searches plasmids filtered by PI name or institution, then
        extracts and deduplicates depositor info from results.
        """
        name = arguments.get("name")
        institution = arguments.get("institution")

        if not name and not institution:
            return {
                "status": "error",
                "error": "At least one of 'name' or 'institution' is required.",
            }

        params = {"page_size": 50}
        if name:
            params["pis"] = name
        if institution:
            # Institution is not a direct filter in the API;
            # we search by article_authors which may contain institution info
            params["article_authors"] = institution

        response = requests.get(
            ADDGENE_API_URL + "/catalog/plasmid/",
            params=params,
            headers=self._get_headers(),
            timeout=self.timeout,
        )

        if response.status_code == 401:
            return {
                "status": "error",
                "error": "Authentication failed. Check your ADDGENE_API_KEY.",
            }

        response.raise_for_status()
        data = response.json()

        results = data.get("results", [])

        # Extract unique depositors from plasmid results
        depositors = {}
        for plasmid in results:
            depositor_list = plasmid.get("depositor", [])
            for dep in depositor_list:
                dep_str = str(dep)
                if dep_str not in depositors:
                    depositors[dep_str] = {
                        "name": dep_str,
                        "plasmid_count": 0,
                        "example_plasmids": [],
                    }
                depositors[dep_str]["plasmid_count"] += 1
                if len(depositors[dep_str]["example_plasmids"]) < 3:
                    depositors[dep_str]["example_plasmids"].append(
                        {"id": plasmid.get("id"), "name": plasmid.get("name")}
                    )

        depositor_list = sorted(
            depositors.values(), key=lambda x: x["plasmid_count"], reverse=True
        )

        return {
            "status": "success",
            "data": {
                "depositors": depositor_list,
                "total_depositors": len(depositor_list),
                "total_plasmids_searched": data.get("count", len(results)),
            },
            "metadata": {
                "source": "Addgene",
                "name_query": name,
                "institution_query": institution,
            },
        }
```
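Addgene has no dedicated depositor endpoint, so `_search_depositors` groups the plasmid results by depositor and ranks them by plasmid count. That aggregation step can be isolated and tested without network access. The sketch below mirrors it as a pure function; the name `summarize_depositors` is hypothetical. It keeps the committed code's `str(dep)` keying, because this diff does not show the shape of each `depositor` entry. It also treats a `null` `depositor` field as empty, a small hardening over `plasmid.get("depositor", [])`.

```python
from typing import Any, Dict, Iterable, List


def summarize_depositors(
    plasmids: Iterable[Dict[str, Any]], max_examples: int = 3
) -> List[Dict[str, Any]]:
    """Group plasmid records by depositor, mirroring AddgeneTool._search_depositors.

    Each record is assumed to carry a "depositor" list plus "id" and "name";
    depositors are keyed by str(dep), as in the committed code.
    """
    summary: Dict[str, Dict[str, Any]] = {}
    for plasmid in plasmids:
        for dep in plasmid.get("depositor") or []:  # None -> no depositors
            key = str(dep)
            entry = summary.setdefault(
                key, {"name": key, "plasmid_count": 0, "example_plasmids": []}
            )
            entry["plasmid_count"] += 1
            if len(entry["example_plasmids"]) < max_examples:
                entry["example_plasmids"].append(
                    {"id": plasmid.get("id"), "name": plasmid.get("name")}
                )
    # Most prolific depositors first; sorted() stays stable with reverse=True,
    # so ties keep first-seen order.
    return sorted(summary.values(), key=lambda e: e["plasmid_count"], reverse=True)
```

The committed method requests a single page (`page_size=50`), so `plasmid_count` only reflects depositors within those first 50 matches. By contrast, `total_plasmids_searched` reports the API's `count` field, which appears to be the total number of matches.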
