diff --git a/.github/workflows/doc.yaml b/.github/workflows/doc.yaml
Lines changed: 64 additions & 9 deletions
diff --git a/.github/workflows/tests.yml b/.github/workflows/tests.yml
Lines changed: 64 additions & 10 deletions
diff --git a/DevNotes.md b/DevNotes.md
Lines changed: 2 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 2 additions & 54 deletions
@@ -16,9 +16,6 @@ permissions:
 jobs:
   docs:
     runs-on: ${{ matrix.os }}
-    env:
-      EEGDASH_CACHE_DIR: ${{ github.workspace }}/.eegdash_cache
-      MNE_DATA: ${{ github.workspace }}/.eegdash_cache
     strategy:
       fail-fast: false
       matrix:
@@ -31,6 +28,33 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
+      - name: Configure dataset cache paths
+        id: cache-paths
+        shell: python
+        run: |
+          import os
+          from pathlib import Path
+
+          home = Path.home()
+          workspace = Path(os.environ["GITHUB_WORKSPACE"]).resolve()
+          candidates = {
+              "primary": home / "eegdash_cache",
+              "home_dot": home / ".eegdash_cache",
+              "workspace": workspace / ".eegdash_cache",
+              "mne_data": home / "mne_data",
+          }
+
+          for path in candidates.values():
+              path.mkdir(parents=True, exist_ok=True)
+
+          with open(os.environ["GITHUB_ENV"], "a", encoding="utf-8") as env_file:
+              env_file.write(f"EEGDASH_CACHE_DIR={candidates['primary']}\n")
+              env_file.write(f"MNE_DATA={candidates['primary']}\n")
+
+          with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as output:
+              for key, path in candidates.items():
+                  output.write(f"{key}={path}\n")
+
       - name: Install dependencies
         run: |
           python -m pip install uv
@@ -42,16 +66,47 @@ jobs:
           . .venv/bin/activate
           echo PATH=$PATH >> $GITHUB_ENV
 
-      - name: Create/Restore Data Caches (workspace)
+      - name: Restore Data Caches (pull_request)
+        if: github.event_name == 'pull_request'
+        id: cache-data-restore
+        uses: actions/cache@v4
+        with:
+          path: |
+            ${{ steps.cache-paths.outputs.primary }}
+            ${{ steps.cache-paths.outputs.home_dot }}
+            ${{ steps.cache-paths.outputs.workspace }}
+            ${{ steps.cache-paths.outputs.mne_data }}
+          # Cache includes dataset manifest hash so new datasets invalidate once automatically.
+          key: ${{ runner.os }}-data-${{ github.head_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-v2
+          restore-keys: |
+            ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+            ${{ runner.os }}-data-develop-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+            ${{ runner.os }}-data-main-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+            ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-
+            ${{ runner.os }}-data-develop-
+            ${{ runner.os }}-data-main-
+            ${{ runner.os }}-data-
+          lookup-only: true
+
+      - name: Create/Restore Data Caches (push)
+        if: github.event_name != 'pull_request'
         id: cache-data
         uses: actions/cache@v4
         with:
           path: |
-            ${{ env.EEGDASH_CACHE_DIR }}
-          # Use a stable key so caches can be reused across runs.
-          # Bump the suffix (v1 -> v2) to invalidate when needed.
-          key: ${{ runner.os }}-data-v1
+            ${{ steps.cache-paths.outputs.primary }}
+            ${{ steps.cache-paths.outputs.home_dot }}
+            ${{ steps.cache-paths.outputs.workspace }}
+            ${{ steps.cache-paths.outputs.mne_data }}
+          # Cache includes dataset manifest hash so new datasets invalidate once automatically.
+          key: ${{ runner.os }}-data-${{ github.head_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-v2
           restore-keys: |
+            ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+            ${{ runner.os }}-data-develop-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+            ${{ runner.os }}-data-main-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+            ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-
+            ${{ runner.os }}-data-develop-
+            ${{ runner.os }}-data-main-
             ${{ runner.os }}-data-
 
       - name: Create Docs
@@ -66,4 +121,4 @@ jobs:
         with:
           github_token: ${{ secrets.GITHUB_TOKEN }}
           publish_dir: ./docs/build/html
-          cname: eegdash.org
+          cname: eegdash.org
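
Review note: the "Configure dataset cache paths" step added above can be exercised outside of CI. The following is a minimal sketch, assuming the step's logic is lifted into a function; `configure_cache_paths` and the temporary stand-ins for the home directory, workspace, `GITHUB_ENV`, and `GITHUB_OUTPUT` are hypothetical names introduced here for illustration and are not part of the workflow.

```python
import tempfile
from pathlib import Path


def configure_cache_paths(home, workspace, env_path, output_path):
    """Mirror the workflow step: create the candidate cache directories,
    then append KEY=VALUE lines to the env file and the step-output file."""
    candidates = {
        "primary": home / "eegdash_cache",
        "home_dot": home / ".eegdash_cache",
        "workspace": workspace / ".eegdash_cache",
        "mne_data": home / "mne_data",
    }
    for path in candidates.values():
        path.mkdir(parents=True, exist_ok=True)

    # Both variables point at the same primary directory, as in the diff.
    with open(env_path, "a", encoding="utf-8") as env_file:
        env_file.write(f"EEGDASH_CACHE_DIR={candidates['primary']}\n")
        env_file.write(f"MNE_DATA={candidates['primary']}\n")

    # One output per candidate, later read as steps.cache-paths.outputs.<key>.
    with open(output_path, "a", encoding="utf-8") as output:
        for key, path in candidates.items():
            output.write(f"{key}={path}\n")
    return candidates


# Local dry run: temporary files stand in for the runner-provided paths.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    env_path, output_path = root / "github_env", root / "github_output"
    candidates = configure_cache_paths(
        root / "home", root / "work", env_path, output_path
    )
    env_lines = env_path.read_text(encoding="utf-8").splitlines()
    output_lines = output_path.read_text(encoding="utf-8").splitlines()
```

In this dry run `env_lines` holds the two variables exported to later steps and `output_lines` holds the four paths that the cache steps list under `path:`.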
@@ -14,9 +14,6 @@ on:
 jobs:
   test:
     runs-on: ${{ matrix.os }}
-    env:
-      EEGDASH_CACHE_DIR: ${{ github.workspace }}/.eegdash_cache
-      MNE_DATA: ${{ github.workspace }}/.eegdash_cache
     strategy:
       fail-fast: false
       matrix:
@@ -26,18 +23,75 @@ jobs:
     ## Install Braindecode
     - name: Checking Out Repository
       uses: actions/checkout@v4
+    - name: Configure dataset cache paths
+      id: cache-paths
+      shell: python
+      run: |
+        import os
+        from pathlib import Path
+
+        home = Path.home()
+        workspace = Path(os.environ["GITHUB_WORKSPACE"]).resolve()
+        candidates = {
+            "primary": home / "eegdash_cache",
+            "home_dot": home / ".eegdash_cache",
+            "workspace": workspace / ".eegdash_cache",
+            "mne_data": home / "mne_data",
+        }
+
+        for path in candidates.values():
+            path.mkdir(parents=True, exist_ok=True)
+
+        with open(os.environ["GITHUB_ENV"], "a", encoding="utf-8") as env_file:
+            env_file.write(f"EEGDASH_CACHE_DIR={candidates['primary']}\n")
+            env_file.write(f"MNE_DATA={candidates['primary']}\n")
+
+        with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as output:
+            for key, path in candidates.items():
+                output.write(f"{key}={path}\n")
     # Cache MNE Data
-    # The cache key here is fixed except for os
-    # so if you download a new mne dataset in the code, best to manually increment the key below
-    - name: Create/Restore EEGDash Cache (workspace)
+    # Cache key incorporates the consolidated dataset manifest so new datasets refresh automatically.
+    - name: Restore EEGDash Cache (pull_request)
+      if: github.event_name == 'pull_request'
+      id: cache-mne_data-restore
+      uses: actions/cache@v4
+      with:
+        path: |
+          ${{ steps.cache-paths.outputs.primary }}
+          ${{ steps.cache-paths.outputs.home_dot }}
+          ${{ steps.cache-paths.outputs.workspace }}
+          ${{ steps.cache-paths.outputs.mne_data }}
+        # Cache includes dataset manifest hash so new datasets invalidate once automatically.
+        key: ${{ runner.os }}-data-${{ github.head_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-v2
+        restore-keys: |
+          ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+          ${{ runner.os }}-data-develop-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+          ${{ runner.os }}-data-main-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+          ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-
+          ${{ runner.os }}-data-develop-
+          ${{ runner.os }}-data-main-
+          ${{ runner.os }}-data-
+        lookup-only: true
+
+    - name: Create/Restore EEGDash Cache (push)
+      if: github.event_name != 'pull_request'
       id: cache-mne_data
       uses: actions/cache@v4
       with:
-        path: ${{ env.EEGDASH_CACHE_DIR }}
-        # Use a stable key so caches can be reused across runs.
-        # Keep in sync with docs workflow; bump suffix to invalidate.
-        key: ${{ runner.os }}-data-v1
+        path: |
+          ${{ steps.cache-paths.outputs.primary }}
+          ${{ steps.cache-paths.outputs.home_dot }}
+          ${{ steps.cache-paths.outputs.workspace }}
+          ${{ steps.cache-paths.outputs.mne_data }}
+        # Cache includes dataset manifest hash so new datasets invalidate once automatically.
+        key: ${{ runner.os }}-data-${{ github.head_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-v2
         restore-keys: |
+          ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+          ${{ runner.os }}-data-develop-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+          ${{ runner.os }}-data-main-${{ hashFiles('consolidated/datasets_consolidated.json') }}-
+          ${{ runner.os }}-data-${{ github.base_ref || github.ref_name }}-
+          ${{ runner.os }}-data-develop-
+          ${{ runner.os }}-data-main-
           ${{ runner.os }}-data-
 
     - name: Install uv and set the python version
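
Review note: the long `restore-keys` lists above rely on ordered prefix fallback. The sketch below is a simplified model of that lookup, not the `actions/cache` implementation: it ignores branch scoping and any partial matching on the primary key, and `CacheEntry`, `resolve_cache`, and the example keys are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # hypothetical creation stamp; larger means newer


def resolve_cache(key, restore_keys, entries):
    """Return (entry, exact_hit): try the exact key first, then each
    restore key in order as a prefix, preferring the newest match."""
    for entry in entries:
        if entry.key == key:
            return entry, True
    for prefix in restore_keys:
        matches = [e for e in entries if e.key.startswith(prefix)]
        if matches:
            return max(matches, key=lambda e: e.created_at), False
    return None, False


# A feature branch with no cache of its own falls back to develop's cache
# for the same manifest hash before trying any broader prefix.
entries = [
    CacheEntry("Linux-data-develop-abc123-v2", created_at=1),
    CacheEntry("Linux-data-main-0ldhash-v2", created_at=2),
]
restore_keys = [
    "Linux-data-develop-abc123-",
    "Linux-data-main-abc123-",
    "Linux-data-develop-",
    "Linux-data-main-",
    "Linux-data-",
]
entry, exact_hit = resolve_cache(
    "Linux-data-feature-x-abc123-v2", restore_keys, entries
)
```

Ordering the hash-qualified prefixes before the branch-only ones means a cache built for the current dataset manifest is preferred over a newer cache built for a different manifest. Separately, the `actions/cache` documentation describes `lookup-only: true` as checking whether an entry exists without downloading it, so it may be worth confirming that this is the intended behavior for the pull_request steps.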
 
@@ -2,7 +2,8 @@
 pip install -r requirements.txt
 
 pip uninstall eegdash -y
-python -m pip install --editable /Users/arno/Python/EEG-Dash-Data
+python -m pip install --editable .
+
 # Warning: use the exact command above; pip install by itself might not work
 
 ### check if working from different folders
 
@@ -14,22 +14,6 @@ To leverage recent and ongoing advancements in large-scale computational methods
 
 The data in EEG-DaSh originates from a collaboration involving 25 laboratories, encompassing 27,053 participants. This extensive collection includes MEEG data, which is a combination of EEG and MEG signals. The data is sourced from various studies conducted by these labs, involving both healthy subjects and clinical populations with conditions such as ADHD, depression, schizophrenia, dementia, autism, and psychosis. Additionally, data spans different mental states like sleep, meditation, and cognitive tasks. In addition, EEG-DaSh will incorporate a subset of the data converted from NEMAR, which includes 330 MEEG BIDS-formatted datasets, further expanding the archive with well-curated, standardized neuroelectromagnetic data.
 
-## Featured data
-
-The following HBN datasets are currently featured on EEGDash. Documentation about these datasets is available [here](https://neuromechanist.github.io/data/hbn/).
-
-| DatasetID | Participants | Files | Sessions | Population | Channels | Is 10-20? | Modality | Size |
-|---|---|---|---|---|---|---|---|---|
-| [ds005505](https://nemar.org/dataexplorer/detail?dataset_id=ds005505) | 136 | 5393 | 1 | Healthy | 129 | other | Visual | 103 GB |
-| [ds005506](https://nemar.org/dataexplorer/detail?dataset_id=ds005506) | 150 | 5645 | 1 | Healthy | 129 | other | Visual | 112 GB |
-| [ds005507](https://nemar.org/dataexplorer/detail?dataset_id=ds005507) | 184 | 7273 | 1 | Healthy | 129 | other | Visual | 140 GB |
-| [ds005508](https://nemar.org/dataexplorer/detail?dataset_id=ds005508) | 324 | 13393 | 1 | Healthy | 129 | other | Visual | 230 GB |
-| [ds005510](https://nemar.org/dataexplorer/detail?dataset_id=ds005510) | 135 | 4933 | 1 | Healthy | 129 | other | Visual | 91 GB |
-| [ds005512](https://nemar.org/dataexplorer/detail?dataset_id=ds005512) | 257 | 9305 | 1 | Healthy | 129 | other | Visual | 157 GB |
-| [ds005514](https://nemar.org/dataexplorer/detail?dataset_id=ds005514) | 295 | 11565 | 1 | Healthy | 129 | other | Visual | 185 GB |
-
-A total of [246 other datasets](datasets.md) are also available through EEGDash. 
-
 ## Data format
 
 EEGDash queries return a **PyTorch Dataset** formatted for machine learning (ML) and deep learning (DL) applications. PyTorch Datasets suit EEGDash queries because they provide an efficient, scalable, and flexible structure and integrate seamlessly with PyTorch’s DataLoader, enabling the efficient batching, shuffling, and parallel data loading that training deep learning models on large EEG datasets requires.
@@ -41,47 +25,11 @@ EEGDash datasets are processed using the popular [braindecode](https://braindeco
 ## EEG-Dash usage
 
 ### Install
-Use your preferred Python environment manager with Python > 3.9 to install the package.
+Use your preferred Python environment manager with Python > 3.10 to install the package.
 * To install the eegdash package, use the following command: `pip install eegdash`
 * To verify the installation, start a Python session and type: `from eegdash import EEGDash`
 
-### Data access
-
-To use the data from a single subject, enter:
-
-```python
-from eegdash import EEGDashDataset
-
-ds_NDARDB033FW5 = EEGDashDataset(
-    {"dataset": "ds005514", "task":
-     "RestingState", "subject": "NDARDB033FW5"}, 
-     cache_dir="."
-)
-```
-
-This will search and download the metadata for the task **RestingState** for subject **NDARDB033FW5** in BIDS dataset **ds005514**. The actual data will not be downloaded at this stage. Following standard practice, data is only downloaded once it is processed. The **ds_NDARDB033FW5** object is a fully functional braindecode dataset, which is itself a PyTorch dataset. This [tutorial](https://github.com/sccn/EEGDash/blob/develop/notebooks/tutorial_eoec.ipynb) shows how to preprocess the EEG data, extracting portions of the data containing eyes-open and eyes-closed segments, then perform eyes-open vs. eyes-closed classification using a (shallow) deep-learning model. 
-
-To use the data from multiple subjects, enter:
-
-```python
-from eegdash import EEGDashDataset
-
-ds_ds005505rest = EEGDashDataset(
-    {"dataset": "ds005505", "task": "RestingState"}, target_name="sex", cache_dir=".
-)
-```
-
-This will search and download the metadata for the task 'RestingState' for all subjects in BIDS dataset 'ds005505' (a total of 136). As above, the actual data will not be downloaded at this stage so this command is quick to execute. Also, the target class for each subject is assigned using the target_name parameter. This means that this object is ready to be directly fed to a deep learning model, although the [tutorial script](https://github.com/sccn/EEGDash/blob/develop/notebooks/tutorial_sex_classification.ipynb) performs minimal processing on it, prior to training a deep-learning model. Because 14 gigabytes of data are downloaded, this tutorial takes about 10 minutes to execute.
-
-### Automatic caching
-
-By default, EEGDash caches downloaded data under a single, consistent folder:
-
-- If ``EEGDASH_CACHE_DIR`` is set in your environment, that path is used.
-- Else, if MNE’s ``MNE_DATA`` config is set, that path is used to align with other EEG tooling.
-- Otherwise, ``.eegdash_cache`` in the current working directory is used.
-
-This means that if you run the tutorial [scripts](https://github.com/sccn/EEGDash/tree/develop/notebooks), the data will only be downloaded the first time the script is executed and reused thereafter.
+Please check our tutorial webpages to explore what you can do with [eegdash](https://eegdash.org/)! 
 
 ## Education -- Coming soon...
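
Review note: the "Automatic caching" section removed from the README described a three-step resolution order for the cache directory (`EEGDASH_CACHE_DIR`, then MNE's `MNE_DATA` setting, then `.eegdash_cache` in the current working directory). The sketch below restates that order only; `resolve_cache_dir` is a hypothetical helper, not the eegdash API, and the MNE setting is passed in as a plain argument rather than read from MNE's configuration.

```python
from pathlib import Path
from typing import Mapping, Optional


def resolve_cache_dir(
    environ: Mapping[str, str], mne_data: Optional[str], cwd: Path
) -> Path:
    """Pick the cache directory using the order the README described."""
    explicit = environ.get("EEGDASH_CACHE_DIR")
    if explicit:
        return Path(explicit)  # 1. explicit environment override
    if mne_data:
        return Path(mne_data)  # 2. align with MNE's data directory
    return cwd / ".eegdash_cache"  # 3. fall back to the working directory


cwd = Path("project")
from_env = resolve_cache_dir({"EEGDASH_CACHE_DIR": "ci_cache"}, "mne_data", cwd)
from_mne = resolve_cache_dir({}, "mne_data", cwd)
fallback = resolve_cache_dir({}, None, cwd)
```

Under this order, the workflows' choice to export both `EEGDASH_CACHE_DIR` and `MNE_DATA` with the same value means either lookup lands in the directory that the cache steps save and restore.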