Add GTN training agent to ChatGXY by dannon · Pull Request #22097 · galaxyproject/galaxy

dannon · 2026-03-13T04:32:34Z

Summary

Adds a GTN training agent so ChatGXY can answer training questions using real content from the Galaxy Training Network rather than generic LLM responses.

The agent is backed by a SQLite FTS5 database built from the GTN repository (~400 tutorials and FAQs). The database is hosted on depot.galaxyproject.org and downloaded automatically on first use — nothing is bundled in the repo. A build_database.py script is included to rebuild it from a fresh GTN clone when content changes.

When a user asks something like "How do I do RNA-seq analysis?", the router recognizes it as a training question and hands off to the GTN agent. The agent searches the database, reads the 1-2 most relevant tutorials, and synthesizes a step-by-step answer with links back to the full tutorials on the GTN site. If the database can't be fetched or is corrupt, the agent disables itself gracefully rather than crashing.
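For illustration, the download-on-first-use and graceful-disable behaviour described above could look roughly like the sketch below. The function name `ensure_database` and the validity check are assumptions made for this example, not the code in this PR; the caller is expected to disable the agent when `None` comes back.

```python
import sqlite3
import urllib.request
from pathlib import Path
from typing import Optional


def ensure_database(db_path: Path, download_url: str) -> Optional[Path]:
    """Return a usable database path, or None if it cannot be obtained.

    Hypothetical helper: downloads the file only when it is missing and
    never raises, so the caller can simply disable the agent on None.
    """
    try:
        if not db_path.exists():
            db_path.parent.mkdir(parents=True, exist_ok=True)
            tmp_path = db_path.with_name(db_path.name + ".part")
            # Download to a temporary name so an interrupted transfer
            # never leaves a half-written file at the final path.
            urllib.request.urlretrieve(download_url, str(tmp_path))
            tmp_path.replace(db_path)
        # Cheap validity check: a corrupt or non-SQLite file raises here.
        conn = sqlite3.connect(str(db_path))
        try:
            conn.execute("SELECT count(*) FROM sqlite_master").fetchone()
        finally:
            conn.close()
        return db_path
    except (OSError, ValueError, sqlite3.Error):
        # Network failures (URLError is an OSError), bad URLs, and
        # corrupt files all end up here.
        return None
```

Returning `None` instead of raising keeps the failure local to the GTN agent, so the rest of ChatGXY keeps working when depot is unreachable.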

Draft status

Still working on two things:

- Token usage: certain queries cause the agent to fetch too much tutorial content, inflating context. I've tightened defaults and prompt guidance but want to benchmark more before finalizing.
- Database delivery: the download-on-first-use mechanism works but I want to flesh out the versioning and update story (dated files on depot with symlinks, automated rebuild pipeline, etc.).

SQLite FTS5 database for Galaxy Training Network tutorials and FAQs.

Finds relevant Galaxy Training Network tutorials using SQLite FTS5 search.

- Add import re at module level
- Remove unused SearchResult import
- Use GTNSearchDB | None type annotation
- Replace bare except Exception with specific FileNotFoundError/OSError
- Fix unnecessary f-string prefix

The router now routes analysis workflow and tutorial questions to the GTN training agent instead of answering them directly. This lets the GTN agent use its tutorial database to find relevant training materials.

Updated the ChatGXY component to show response metadata on the right side of the footer: which agent handled the query, the model used, and token count. Also fixed the router prompt to use natural language instead of explicit function names, which were causing the model to output the function name as text instead of calling it.

Added helper methods to BaseGalaxyAgent (_build_metadata and _build_response) that ensure consistent metadata structure across all agent responses. Every response now includes model name, method, and token usage when available. Also formalized the handoff pattern in the router with _serialize_handoff(), and added TokenUsage and HandoffInfo schema models. Agent-specific data is now available both at the top level (backwards compat) and namespaced under agent_data for structured access.

Two fixes for the GTN training agent:
1. Suggestions now link to specific tutorials instead of the generic GTN homepage. When parsing simple text responses, we look up mentioned tutorial names in the GTN database to get their actual URLs.
2. Added normalize_llm_text() to handle literal \n strings in LLM output, which were causing wonky formatting in the UI.
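The body of normalize_llm_text() is not shown on this page. A minimal sketch of the idea, assuming it only needs to turn literal backslash escapes into real whitespace, might be:

```python
def normalize_llm_text(text: str) -> str:
    """Turn literal backslash escapes emitted by an LLM into real whitespace.

    Sketch only: the set of escapes handled here is an assumption.
    """
    # Order matters: handle the two-escape sequence before its parts.
    for literal, real in (("\\r\\n", "\n"), ("\\n", "\n"), ("\\t", "\t")):
        text = text.replace(literal, real)
    return text
```

One trade-off of this approach is that a legitimate `\n` inside a code sample in the LLM output would also be converted, so the real function may need to be more selective.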

…trip verbosity

Context managers for all SQLite connections in build_database.py to prevent leaks on exceptions. Narrowed bare except Exception to specific types. Switched FTS5 tables to content=/content_rowid= so rowid alignment is guaranteed by SQLite rather than assumed. Added re.escape for regex safety in extract_section, deduplicated version into DB_VERSION constant.

GTNSearchDB now downloads the database from a configurable URL when the local file is missing, so the 25MB .db no longer needs to live in git. Removed it from tracking and added .gitignore entry.

Consolidated dead FileNotFoundError/OSError branches in gtn_training.py, removed redundant safety checks, replaced IMPORTANT directive docstrings with concise descriptions throughout.
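To make the content=/content_rowid= change concrete, here is a small self-contained sketch of an external-content FTS5 table. The table and column names (`tutorials`, `tutorials_fts`, `title`, `body`) are invented for the example and are not taken from build_database.py; it also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tutorials (
        id INTEGER PRIMARY KEY,
        title TEXT,
        body TEXT
    );
    -- External-content FTS5 table: the index keeps no copy of the text,
    -- and its rowid is tied to tutorials.id through content_rowid.
    CREATE VIRTUAL TABLE tutorials_fts USING fts5(
        title, body, content='tutorials', content_rowid='id'
    );
    """
)
rows = [
    (10, "Reference-based RNA-seq", "Map reads and count genes for RNA-seq data."),
    (20, "Quality control", "Inspect read quality before mapping."),
]
with conn:
    conn.executemany("INSERT INTO tutorials (id, title, body) VALUES (?, ?, ?)", rows)
    # With external content the index must be populated explicitly;
    # 'rebuild' re-reads every row from the content table.
    conn.execute("INSERT INTO tutorials_fts(tutorials_fts) VALUES ('rebuild')")

# Matching rows come back with the same rowid as tutorials.id.
hits = conn.execute(
    "SELECT rowid, title FROM tutorials_fts WHERE tutorials_fts MATCH ? ORDER BY rank",
    ("mapping",),
).fetchall()
conn.close()
```

Because the rowid of each index entry is the `id` of the content row, joins between the two tables no longer depend on insertion order.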

The YAML parser was treating quoted empty strings (e.g. zenodo_link: "") as list headers because quote stripping happened before the empty-value check. Now tracks whether quotes were present so `key: ""` produces an empty string while `key:` followed by `- items` still creates a list. Also coerces hands_on to bool (some tutorials use "external") and widens per-row exception handling to catch ValueError/TypeError so a single bad tutorial doesn't kill the whole build.
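The quote-tracking fix can be sketched as follows. `parse_scalar` is a hypothetical helper written for this example, not the parser in build_database.py; it takes the text after `key:` and reports whether the key opens a list.

```python
from typing import Optional, Tuple


def parse_scalar(raw_value: str) -> Tuple[Optional[str], bool]:
    """Parse the text after 'key:' and return (value, starts_list)."""
    value = raw_value.strip()
    # Record whether quotes were present *before* stripping them, so that
    # an explicitly quoted empty string is not mistaken for a list header.
    was_quoted = len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'"
    if was_quoted:
        value = value[1:-1]
    if value == "" and not was_quoted:
        # Bare 'key:' with nothing after it: '- item' lines are expected next.
        return None, True
    return value, False
```

With this ordering, `zenodo_link: ""` yields an empty string, while a bare `key:` is still treated as the start of a list.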

…content fetches

SearchResult.to_dict() now returns only the 7 fields the LLM needs (title, topic, tutorial, url, difficulty, time_estimation, snippet) instead of all 12. Default search limit drops from 10 to 5, tutorial content cap from 2000 to 1500 chars. The system prompt now explicitly tells the agent to fetch content for only 1-2 tutorials instead of all search results.

The FTS5 snippet() function wraps matches in <mark> tags which would show as literal text in the chat UI. Strip them in to_dict() at the serialization boundary. Also remove GTNSearchRequest which was defined but never used.
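Stripping the highlight tags at the serialization boundary could be as small as the sketch below. The helper name `strip_highlight_tags` is an assumption for illustration; in the PR the stripping happens inside to_dict().

```python
import re

# Matches the opening and closing highlight tags around FTS5 matches.
_MARK_TAG = re.compile(r"</?mark>", re.IGNORECASE)


def strip_highlight_tags(snippet: str) -> str:
    """Remove <mark>/</mark> wrappers while keeping the matched text."""
    return _MARK_TAG.sub("", snippet)
```

Doing this only when serializing keeps the raw snippet available to any caller that does want to render highlights.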

bgruening · 2026-03-15T08:57:04Z

lib/galaxy/agents/gtn/search.py

+    def __init__(self, db_path: Optional[str] = None, download_url: Optional[str] = None):
+        if db_path is None:
+            current_dir = Path(__file__).parent
+            self.db_path = current_dir / "data" / "gtn_search.db"


This should be configurable, so that admins can put it in a mutable-data directory.
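One way to address this, sketched under assumptions: resolve the path from an explicit argument first, then from an admin-configured data directory, and only then fall back to the package directory. `resolve_db_path` and the `data_dir` parameter are hypothetical names, not existing Galaxy configuration options.

```python
from pathlib import Path
from typing import Optional, Union

DEFAULT_DB_NAME = "gtn_search.db"


def resolve_db_path(
    db_path: Optional[Union[str, Path]] = None,
    data_dir: Optional[Union[str, Path]] = None,
) -> Path:
    """Pick the database location in order of precedence."""
    if db_path is not None:
        # An explicit path always wins.
        return Path(db_path)
    if data_dir is not None:
        # Admin-configured mutable-data directory (hypothetical setting).
        return Path(data_dir) / DEFAULT_DB_NAME
    # Fallback used by the diff above: next to the module itself.
    return Path(__file__).parent / "data" / DEFAULT_DB_NAME
```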

bgruening · 2026-03-15T08:58:21Z

lib/galaxy/agents/gtn/search.py

+
+    def _get_connection(self) -> sqlite3.Connection:
+        """Get a database connection."""
+        conn = sqlite3.connect(str(self.db_path))


Should we set an isolation_level here?
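As one possible answer, and only as a sketch: since the search database is read-only at query time, the connection could be opened in read-only mode with `isolation_level=None`, so Python's sqlite3 module never opens implicit transactions. `open_readonly` is a hypothetical helper, not part of the PR.

```python
import sqlite3
from pathlib import Path


def open_readonly(db_path: Path) -> sqlite3.Connection:
    """Open the search database read-only, without implicit transactions."""
    # mode=ro makes SQLite reject any write on this connection.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    # isolation_level=None puts the sqlite3 module in autocommit mode.
    conn = sqlite3.connect(uri, uri=True, isolation_level=None)
    conn.row_factory = sqlite3.Row
    return conn
```

Read-only mode also guards against a bug in the agent accidentally modifying the downloaded file.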

dannon added 22 commits February 26, 2026 14:59

Add GTN search database infrastructure

645840b

Add GTN training agent

78545c9

Tighten exception handling in GTN training agent

b825838

Use sqlite3.Error in GTN search module

4f9a293

Backend formatting/cleanup (GTN portion)

3e0b04c

Cleaning up mypy (GTN portion)

648ef4b

More linting fixes (GTN portion)

add1359

Fix Python 3.9 compatibility in GTN agent

48432ec

Add explicit agent_type to GTN training agent

de87bd1

Improve type safety and exception handling in GTN agent

33a27f5

Register GTN training agent

c8e319b

Fix lint and mypy issues in GTN agent

66b4f4c

Add GTN training handoff to router agent

c6822a1

Point GTN database download URL to depot chatgxy path

c150f38

Catch RuntimeError from corrupt GTN database during agent init

4646320

bgruening reviewed Mar 15, 2026

dannon wants to merge 22 commits into galaxyproject:dev from dannon:agent-based-ai-gtn
