Unified access to neuroscience and scientific datasets
Full Documentation · pip install scitex-dataset
| # | Problem | Solution |
|---|---|---|
| 1 | Public dataset repositories are balkanized -- OpenNeuro (BIDS), DANDI (NWB), PhysioNet (WFDB), Zenodo (generic), GEO / ChEMBL / ClinicalTrials -- each with different APIs, auth, and download tools | Unified fetcher -- `stx.dataset.neuroscience.openneuro.fetch_all_datasets()` uses the same call shape across all sources; local FTS5 search across metadata |
| 2 | "Download this BIDS dataset" means reading DataLad docs first -- the barrier is tooling, not knowledge | One-line fetch -- no DataLad setup; the module handles auth, resumption, checksums transparently |
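The "local FTS5 search across metadata" in row 1 can be pictured with SQLite's FTS5 extension directly. This is a minimal sketch using only the standard library, not the package's actual index: the table layout, the column names, and the three records (including their IDs) are made up for illustration, and it assumes your Python's SQLite build includes FTS5, which most do.

```python
import sqlite3

# Hypothetical metadata records: (id, source, title, n_subjects).
# The real index schema used by scitex-dataset is not shown in this README.
RECORDS = [
    ("ds-a", "openneuro", "Resting-state EEG in Alzheimer disease", 36),
    ("ds-b", "dandi", "Mouse visual cortex electrophysiology", 12),
    ("ds-c", "physionet", "Scalp EEG recordings of epilepsy patients", 24),
]


def build_index(records):
    """Create an in-memory FTS5 table and load the records into it."""
    conn = sqlite3.connect(":memory:")
    # Only `title` is tokenized for full-text search; the other columns
    # are stored but not indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE datasets USING fts5("
        "id UNINDEXED, source UNINDEXED, title, n_subjects UNINDEXED)"
    )
    conn.executemany("INSERT INTO datasets VALUES (?, ?, ?, ?)", records)
    return conn


def search(conn, query, min_subjects=0):
    """Full-text match on the title, filtered by a minimum subject count."""
    rows = conn.execute(
        "SELECT id, source, title FROM datasets "
        "WHERE datasets MATCH ? AND CAST(n_subjects AS INTEGER) >= ? "
        "ORDER BY rank",
        (query, min_subjects),
    )
    return rows.fetchall()


conn = build_index(RECORDS)
hits = search(conn, "alzheimer EEG", min_subjects=20)
for dataset_id, source, title in hits:
    print(f"{dataset_id} [{source}]: {title}")
```

FTS5 treats space-separated terms as an implicit AND and matches case-insensitively with its default tokenizer, so the query above only returns records whose title contains both words.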
| Repository | Description | Data Types |
|---|---|---|
| OpenNeuro | Open platform for sharing neuroimaging data | MRI, EEG, MEG, iEEG, PET |
| DANDI | BRAIN Initiative data archive | Electrophysiology, Ophys |
| PhysioNet | Physiological signal databases | ECG, EEG, clinical data |
| Zenodo | General scientific data repository (CERN) | Any research data |
Table 1. Supported data repositories. Each source is queried via its public API; no authentication required for metadata access.
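To make Table 1 easier to reason about, the sketch below encodes it as a plain dictionary and looks up which repositories list a given data type. It only restates the table; the `REPOSITORIES` mapping and the `repositories_for` helper are hypothetical names for illustration and are not part of the package API.

```python
# Table 1 as a plain mapping: repository name -> listed data types.
# Hypothetical structure for illustration; not the package's internal registry.
REPOSITORIES = {
    "OpenNeuro": {"MRI", "EEG", "MEG", "iEEG", "PET"},
    "DANDI": {"Electrophysiology", "Ophys"},
    "PhysioNet": {"ECG", "EEG", "clinical data"},
    "Zenodo": {"Any research data"},
}


def repositories_for(data_type):
    """Return the repositories that explicitly list `data_type` in Table 1."""
    wanted = data_type.lower()
    return [
        name
        for name, types in REPOSITORIES.items()
        if wanted in {t.lower() for t in types}
    ]


print(repositories_for("EEG"))
```

Zenodo is a generic repository, so it never appears in a type-specific lookup here even though it may host such data.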
Requires Python >= 3.10.
pip install scitex-dataset
MCP support:
pip install scitex-dataset[mcp]
from scitex_dataset import fetch_all_datasets, format_dataset
# Fetch datasets from OpenNeuro
datasets = fetch_all_datasets(max_datasets=10)
# Format for analysis
for ds in datasets:
    formatted = format_dataset(ds)
    print(f"{formatted['id']}: {formatted['name']} ({formatted['n_subjects']} subjects)")
Python API
from scitex_dataset import fetch_all_datasets, format_dataset, search_datasets, sort_datasets
from scitex_dataset import neuroscience, database
# Fetch from specific sources
datasets = fetch_all_datasets(max_datasets=100) # OpenNeuro
dandi_ds = neuroscience.dandi.fetch_all_datasets(max_datasets=50) # DANDI
phys_ds = neuroscience.physionet.fetch_all_datasets() # PhysioNet
# Search and filter
eeg_datasets = search_datasets(datasets, modality="eeg", min_subjects=20)
popular = sort_datasets(datasets, by="downloads", descending=True)
# Local database for fast full-text search
database.build() # index all sources
results = database.search("alzheimer EEG", min_subjects=20)
CLI Commands
scitex-dataset --help-recursive # Show all commands
# Fetch from repositories
scitex-dataset openneuro -n 100 -o datasets.json -v
scitex-dataset dandi -n 50 -o dandi.json -v
scitex-dataset physionet -n 50 -v
scitex-dataset zenodo -q "neuroscience" -n 20
# Local database
scitex-dataset db build # index all sources
scitex-dataset db search "epilepsy EEG" # full-text search
scitex-dataset db stats # show statistics
# Introspection
scitex-dataset list-python-apis -v # list Python API tree
scitex-dataset mcp list-tools -v # list MCP tools
MCP Server
AI agents can discover and query neuroscience datasets autonomously.
| Tool | Description |
|---|---|
| `dataset_openneuro_fetch` | Fetch datasets from OpenNeuro |
| `dataset_dandi_fetch` | Fetch datasets from DANDI Archive |
| `dataset_physionet_fetch` | Fetch datasets from PhysioNet |
| `dataset_zenodo_fetch` | Fetch datasets from Zenodo |
| `dataset_search` | Filter datasets by modality, subjects, etc. |
| `dataset_list_sources` | List available data repositories |
| `dataset_db_build` | Build local search database |
| `dataset_db_search` | Full-text search across all sources |
| `dataset_db_stats` | Database statistics |
Table 2. Nine MCP tools available for AI-assisted dataset discovery. All tools accept JSON parameters and return JSON results.
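As a rough illustration of "JSON parameters in, JSON results out", the sketch below builds the JSON-RPC 2.0 request that an MCP client sends to invoke a tool (`tools/call` is the method name defined by the MCP specification). The tool name comes from Table 2, but the `query` argument is an assumption: check `scitex-dataset mcp list-tools -v` for the real parameter names.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "query" is a hypothetical argument name, used only for illustration.
request = make_tool_call(1, "dataset_db_search", {"query": "epilepsy EEG"})
wire = json.dumps(request)
print(wire)
```

In practice an MCP client library handles this framing; the point is only that each tool in Table 2 is addressed by name with a JSON object of arguments.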
scitex-dataset mcp start
Skills
Skills provide workflow-oriented guides that AI agents query to discover capabilities and usage patterns.
scitex-dataset skills list # List available skill pages
scitex-dataset skills get SKILL # Show main skill page
scitex-dev skills export --package scitex-dataset # Export to Claude Code
| Skill | Content |
|---|---|
| `quick-start` | Basic usage |
| `data-sources` | OpenNeuro, DANDI, PhysioNet |
| `cli-reference` | CLI commands |
| `mcp-tools` | MCP tools for AI agents |
scitex-dataset is part of SciTeX. Install via
the umbrella with pip install scitex[dataset] to use as
scitex.dataset (Python) or scitex dataset ... (CLI).
import scitex
from scitex_dataset import fetch_all_datasets, format_dataset
@scitex.session
def main(logger=scitex.INJECTED):
    datasets = fetch_all_datasets(max_datasets=100, logger=logger)
    formatted = [format_dataset(ds) for ds in datasets]
    scitex.io.save(formatted, "openneuro_datasets.json")
    return 0
The SciTeX ecosystem follows the Four Freedoms for Research, inspired by the Free Software Definition:
Four Freedoms for Research
- The freedom to run your research anywhere -- your machine, your terms.
- The freedom to study how every step works -- from raw data to final manuscript.
- The freedom to redistribute your workflows, not just your papers.
- The freedom to modify any module and share improvements with the community.
AGPL-3.0 -- because we believe research infrastructure deserves the same freedoms as the software it runs on.