Changes from all commits (35 commits)
f9973bf: removing not necessary function (bruAristimunha, Oct 22, 2025)
f3dc559: updating (bruAristimunha, Oct 24, 2025)
58cf377: updating the fetch dataset (bruAristimunha, Oct 24, 2025)
5d757d4: cloning (bruAristimunha, Oct 24, 2025)
37ebae0: updating (bruAristimunha, Oct 25, 2025)
6a72510: iterating (bruAristimunha, Oct 25, 2025)
705df51: first step, fetching automatically (bruAristimunha, Nov 7, 2025)
feef992: fetch openneuro (bruAristimunha, Nov 7, 2025)
5728ac7: including the 1-fetch-openneuro (bruAristimunha, Nov 7, 2025)
53a85fe: updating the fetch (bruAristimunha, Nov 7, 2025)
e6b6a93: updating the fetch (bruAristimunha, Nov 7, 2025)
26b360c: saving the .json file and updating the fetch (bruAristimunha, Nov 7, 2025)
696a22b: updating the fetch to add (bruAristimunha, Nov 7, 2025)
f35773e: chore: update OpenNeuro & NEMAR dataset listings (github-actions[bot], Nov 7, 2025)
69da4a1: updating the tests (bruAristimunha, Nov 7, 2025)
75f6b82: Merge branch 'diggestion-v2' of https://github.com/sccn/EEGDash into … (bruAristimunha, Nov 7, 2025)
7ae29d6: including and updating (bruAristimunha, Nov 7, 2025)
add6f9e: chore: update OpenNeuro & NEMAR dataset listings and filtered to_dige… (github-actions[bot], Nov 7, 2025)
1a46825: updating the scripts (bruAristimunha, Nov 11, 2025)
a989519: chore: update OpenNeuro & NEMAR dataset listings and filtered to_dige… (github-actions[bot], Nov 11, 2025)
a67911e: renaming for the correct entities (bruAristimunha, Nov 11, 2025)
e56da2c: done with openneuro (bruAristimunha, Nov 11, 2025)
81b92a8: scidb for later (bruAristimunha, Nov 11, 2025)
5e984d1: updating the fetch for zenodo (bruAristimunha, Nov 11, 2025)
2ab4963: figure share (bruAristimunha, Nov 11, 2025)
4f26ea2: updating the json (bruAristimunha, Nov 11, 2025)
fa2eb74: updating the clone please (bruAristimunha, Nov 11, 2025)
851096f: removing .json files (bruAristimunha, Nov 11, 2025)
44c7e3f: testing diggestion (bruAristimunha, Nov 11, 2025)
fa0a05b: 1, to allow downloading (bruAristimunha, Nov 11, 2025)
4a40623: using more constant (bruAristimunha, Nov 11, 2025)
d292a7e: updating the documentation (bruAristimunha, Nov 11, 2025)
39e7f5c: updating (bruAristimunha, Nov 11, 2025)
f350a34: test correctness (bruAristimunha, Nov 11, 2025)
5b1e541: remove the json to move to another place (bruAristimunha, Nov 11, 2025)
130 changes: 130 additions & 0 deletions .github/workflows/1-fetch-openneuro-datasets-nemar.yml
@@ -0,0 +1,130 @@
name: Fetch OpenNeuro & NEMAR Datasets

on:
  pull_request:
    branches:
      - '**'
  # schedule:
  #   # Run weekly on Monday at 00:00 UTC
  #   - cron: '0 0 * * 1'
  workflow_dispatch: # Allow manual triggering

jobs:
  fetch-datasets:
    runs-on: ubuntu-latest
    permissions:
      contents: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          ref: ${{ github.head_ref }}
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install gql[requests] requests
          pip install -e .

      - name: Fetch OpenNeuro datasets
        run: |
          python scripts/ingestions/1_fetch_openneuro_datasets.py \
            --page-size 100 \
            --output consolidated/openneuro_datasets.json

      - name: Fetch NEMAR GitHub repositories
        run: |
          python scripts/ingestions/1_fetch_github_organization.py \
            --organization nemardatasets \
            --output consolidated/nemardatasets_repos.json

      - name: Verify OpenNeuro output
        run: |
          if [ -f consolidated/openneuro_datasets.json ]; then
            echo "✓ OpenNeuro dataset file created successfully"
            python -c "import json; data = json.load(open('consolidated/openneuro_datasets.json')); print(f'Total entries: {len(data)}'); modalities = set(d['modality'] for d in data); print(f'Modalities: {sorted(modalities)}')"
          else
            echo "✗ OpenNeuro dataset file not created"
            exit 1
          fi

      - name: Verify NEMAR output
        run: |
          if [ -f consolidated/nemardatasets_repos.json ]; then
            echo "✓ NEMAR repositories file created successfully"
            python -c "import json; data = json.load(open('consolidated/nemardatasets_repos.json')); print(f'Total repositories: {len(data)}'); topics = set(); [topics.update(d.get('topics', [])) for d in data]; print(f'Topics: {sorted(topics) if topics else \"None\"}')"
          else
            echo "✗ NEMAR repositories file not created"
            exit 1
          fi

      - name: Filter new OpenNeuro datasets
        run: |
          python scripts/ingestions/2_filter_new_datasets.py \
            consolidated/openneuro_datasets.json

      - name: Filter new NEMAR datasets
        run: |
          python scripts/ingestions/2_filter_new_datasets.py \
            consolidated/nemardatasets_repos.json

      - name: Verify filtered outputs
        run: |
          echo "📊 Filtering Results:"
          echo ""
          if [ -f consolidated/to_digest_openneuro_datasets.json ]; then
            echo "✓ OpenNeuro filtered datasets created"
            python -c "import json; data = json.load(open('consolidated/to_digest_openneuro_datasets.json')); print(f' Datasets to digest: {len(data)}')"
          else
            echo "✗ OpenNeuro filtered datasets not created"
            exit 1
          fi
          echo ""
          if [ -f consolidated/to_digest_nemardatasets_repos.json ]; then
            echo "✓ NEMAR filtered datasets created"
            python -c "import json; data = json.load(open('consolidated/to_digest_nemardatasets_repos.json')); print(f' Datasets to digest: {len(data)}')"
          else
            echo "✗ NEMAR filtered datasets not created"
            exit 1
          fi

      - name: Commit and push changes if datasets updated
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

          # Add all dataset files to staging
          git add consolidated/openneuro_datasets.json
          git add consolidated/nemardatasets_repos.json
          git add consolidated/to_digest_openneuro_datasets.json
          git add consolidated/to_digest_nemardatasets_repos.json

          # Check if there are actual changes (not just timestamp differences)
          if git diff --cached --quiet; then
            echo "No changes detected in dataset files, skipping commit"
          else
            echo "Changes detected, committing..."
            git commit -m "chore: update OpenNeuro & NEMAR dataset listings and filtered to_digest files"
            git push origin HEAD:${{ github.head_ref }}
            echo "✓ Changes committed and pushed"
          fi

      - name: Upload artifacts for downstream jobs
        uses: actions/upload-artifact@v4
        with:
          name: dataset-listings
          path: |
            consolidated/openneuro_datasets.json
            consolidated/nemardatasets_repos.json
            consolidated/to_digest_openneuro_datasets.json
            consolidated/to_digest_nemardatasets_repos.json
          retention-days: 7
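The workflow above invokes scripts/ingestions/2_filter_new_datasets.py without showing it. A minimal sketch of what such a filter could look like, assuming each fetched entry carries a "dataset_id" field and that a set of already-ingested ids is available from elsewhere (both are assumptions; the actual script in the repo may differ):

```python
import json
from pathlib import Path


def filter_new_datasets(fetched_path: str, known_ids: set) -> list:
    """Keep only entries whose id has not been ingested yet.

    Assumes each entry is a dict with a 'dataset_id' key (hypothetical schema).
    """
    data = json.loads(Path(fetched_path).read_text())
    return [d for d in data if d.get("dataset_id") not in known_ids]


def write_to_digest(fetched_path: str, known_ids: set) -> Path:
    """Write a to_digest_<name>.json file next to the input file,
    mirroring the consolidated/to_digest_* naming used in the workflow."""
    src = Path(fetched_path)
    new_entries = filter_new_datasets(fetched_path, known_ids)
    out = src.with_name(f"to_digest_{src.name}")
    out.write_text(json.dumps(new_entries, indent=2))
    return out
```

Given consolidated/openneuro_datasets.json, this would emit consolidated/to_digest_openneuro_datasets.json, which is the filename the "Verify filtered outputs" step checks for.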
98 changes: 98 additions & 0 deletions .github/workflows/clone-openneuro-datasets.yml
@@ -0,0 +1,98 @@
name: Clone OpenNeuro Datasets

on:
  schedule:
    # Run weekly on Monday at 02:00 UTC (after fetch completes)
    - cron: '0 2 * * 1'
  workflow_dispatch: # Allow manual triggering
  # TODO: Add other triggers here as needed

jobs:
  clone-datasets:
    runs-on: ubuntu-latest
    timeout-minutes: 720 # 12 hours max for all clones

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Verify Python script and dataset listings
        run: |
          if [ ! -f scripts/ingestions/clone_openneuro_datasets.py ]; then
            echo "Error: clone_openneuro_datasets.py not found"
            exit 1
          fi
          if [ ! -f consolidated/openneuro_datasets.json ]; then
            echo "Error: consolidated/openneuro_datasets.json not found"
            exit 1
          fi
          DATASET_COUNT=$(jq 'length' consolidated/openneuro_datasets.json)
          echo "Found $DATASET_COUNT dataset entries"

      - name: Create test_diggestion directory
        run: mkdir -p test_diggestion

      - name: Clone OpenNeuro datasets
        run: |
          python scripts/ingestions/clone_openneuro_datasets.py \
            --output-dir test_diggestion \
            --timeout 300 \
            --datasets-file consolidated/openneuro_datasets.json
        continue-on-error: true # Don't fail workflow if some clones fail

      - name: Generate clone report
        if: always()
        run: |
          if [ -f test_diggestion/clone_results.json ]; then
            echo "## Clone Results" >> $GITHUB_STEP_SUMMARY
            echo "" >> $GITHUB_STEP_SUMMARY
            jq -r '"- Success: \(.success | length)\n- Failed: \(.failed | length)\n- Timeout: \(.timeout | length)\n- Skipped: \(.skip | length)\n- Errors: \(.error | length)"' test_diggestion/clone_results.json >> $GITHUB_STEP_SUMMARY
          fi

      - name: Upload clone results
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: clone-results
          path: |
            test_diggestion/clone_results.json
            test_diggestion/retry.json
          retention-days: 30

      - name: Create issue if clones failed
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            if (fs.existsSync('test_diggestion/clone_results.json')) {
              const results = JSON.parse(fs.readFileSync('test_diggestion/clone_results.json'));
              const failedCount = (results.failed || []).length + (results.timeout || []).length;
              if (failedCount > 0) {
                github.rest.issues.create({
                  owner: context.repo.owner,
                  repo: context.repo.repo,
                  title: `⚠️ Dataset Cloning: ${failedCount} datasets failed`,
                  body: `Failed/timeout clones detected.\n\nSee artifacts for details: ${context.runId}`,
                  labels: ['ci', 'datasets']
                });
              }
            }

      - name: Commit cloned datasets (optional)
        if: success()
        run: |
          cd test_diggestion
          git config --local user.email "[email protected]"
          git config --local user.name "GitHub Action"
          git add .
          git commit -m "chore: update cloned OpenNeuro datasets" || echo "Nothing to commit"
          git push
        continue-on-error: true
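The jq one-liner in the "Generate clone report" step implies that clone_results.json maps status buckets (success, failed, timeout, skip, error) to lists of dataset ids. The same summary can be produced in Python; this is a sketch under that same schema assumption, and summarize_clone_results is a hypothetical helper, not part of the repo:

```python
import json
from pathlib import Path


def summarize_clone_results(path: str) -> str:
    """Render a Markdown bullet summary of clone_results.json,
    suitable for appending to $GITHUB_STEP_SUMMARY."""
    results = json.loads(Path(path).read_text())
    buckets = ["success", "failed", "timeout", "skip", "error"]
    # Missing buckets count as empty lists, matching jq's behavior on null | length.
    lines = [f"- {name.capitalize()}: {len(results.get(name, []))}" for name in buckets]
    return "\n".join(lines)
```

A Python version like this would also make it easier to extend the report later, for example by listing the ids of failed datasets rather than only their counts.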
1 change: 0 additions & 1 deletion .gitignore
@@ -39,7 +39,6 @@ examples/data
 .DS_Store
 
 data/
-*.json
 *.isorted
 *.py.isorted
 