Commit c5bb231

fixing visualisation and updating README
1 parent 4a86226 commit c5bb231

2 files changed (+134 additions, -70 deletions)

README.md

Lines changed: 31 additions & 15 deletions
````diff
@@ -146,12 +146,15 @@ ls /path/to/data/TESTXX/derivatives/cortical_tiles-2026/crops/2mm
 
 ## 4. Generate Champollion Configuration
 
-Create dataset configuration files for Champollion:
+Create dataset configuration files for Champollion.
+
+> **Recommended:** always pass `--external-config` to keep the `local.yaml` outside the pipeline directory. This is required in read-only containers (Apptainer/Docker) and avoids accidentally committing paths specific to your machine.
 
 ```bash
 pixi run python3 src/generate_champollion_config.py \
   /path/to/data/TESTXX/derivatives/cortical_tiles-2026/crops/2mm \
-  --dataset TESTXX
+  --dataset TESTXX \
+  --external-config /path/to/data/TESTXX/derivatives/champollion_V1/configs/local.yaml
 ```
 
 ### Options
````
````diff
@@ -160,18 +163,7 @@ pixi run python3 src/generate_champollion_config.py \
 |--------|-------------|
 | `--champollion_loc` | Path to Champollion binaries (default: external/champollion_V1) |
 | `--output` | Custom output path for config files |
-| `--external-config` | External path for local.yaml (for read-only containers) |
-
-### Read-only Container Support (Apptainer)
-
-When running in a read-only container environment:
-
-```bash
-pixi run python3 src/generate_champollion_config.py \
-  /path/to/crops \
-  --dataset TESTXX \
-  --external-config /writable/path/local.yaml
-```
+| `--external-config` | Path for `local.yaml` outside the pipeline directory (recommended) |
 
 ## 5. Generate Embeddings
 
````
````diff
@@ -355,18 +347,42 @@ pixi run python3 src/generate_snapshots.py \
 | Option | Description |
 |--------|-------------|
 | `--morphologist_dir` | Path to Morphologist output (for sulcal graph snapshots) |
+| `--subject` | Subject folder name to visualize (e.g. `sub_0001`). When omitted the first subject found is used. |
+| `--acquisition` | Acquisition folder to use (e.g. `wk30`, `wk40`). Required when a subject has multiple segmentations. |
 | `--cortical_tiles_dir` | Path to crops/2mm/ directory (for tiles mask snapshots) |
 | `--embeddings_dir` | Path to combined embeddings (for UMAP scatter plots) |
 | `--reference_data_dir` | Path to pre-trained UMAP models and reference coordinates |
+| `--umap_region` | Comma-separated region name(s) to plot (e.g. `FColl-SRh,S.Or.`). Defaults to all regions with available models. |
 | `--output_dir` | Directory to save snapshot images |
 | `--sulcal-only` | Only generate sulcal graph snapshots |
 | `--tiles-only` | Only generate cortical tiles snapshots |
 | `--umap-only` | Only generate UMAP scatter plots |
 | `--width` / `--height` | Snapshot dimensions (default: 800x600) |
 
+### Disambiguating multiple segmentations
+
+If a subject has several Morphologist acquisitions (e.g. two timepoints `wk30` and `wk40`), the script warns and uses the first one found. Specify the acquisition explicitly to avoid ambiguity:
+
+```bash
+pixi run python3 src/generate_snapshots.py \
+  --morphologist_dir /path/to/subjects/ \
+  --subject sub_0001 --acquisition wk40 \
+  --output_dir /path/to/snapshots/ --sulcal-only
+```
+
 ### UMAP Visualization
 
-The UMAP scatter plot projects a new subject's collateral sulcus embedding onto a pre-trained 2D map fitted on 42,433 UKBioBank40 reference subjects. The reference appears as a blue cloud, with the new subject highlighted in red.
+The UMAP scatter plots project a new subject's sulcal region embeddings onto pre-trained 2D maps, one per region and hemisphere. Each plot shows a blue reference cloud (42,433 UKBioBank subjects) with the new subject highlighted in red.
+
+By default all regions for which both an embedding CSV and a pre-trained model exist in `reference_data/` are plotted. Use `--umap_region` to restrict the output:
+
+```bash
+pixi run python3 src/generate_snapshots.py \
+  --embeddings_dir /path/to/embeddings/ \
+  --reference_data_dir reference_data/ \
+  --output_dir /path/to/snapshots/ \
+  --umap-only --umap_region FColl-SRh
+```
 
 Pre-trained UMAP artifacts are stored in `reference_data/` and contain no subject identifiers (only anonymous 2D coordinates and fitted model parameters).
 
````
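The README says that when a subject has several Morphologist acquisitions, the script warns and falls back to the first one found. The selection code itself is not part of this diff. The following is therefore only a hypothetical sketch of the behaviour as documented: `pick_acquisition` and its arguments are invented names. Sorting the candidates, so that "first" is reproducible, is also an assumption and not necessarily what the script does.

```python
import warnings


def pick_acquisition(available, requested=None):
    """Hypothetical helper: choose one acquisition folder for a subject.

    ``available`` lists the acquisition names found (e.g. ["wk30", "wk40"]);
    ``requested`` plays the role of the --acquisition option.
    """
    if not available:
        raise ValueError("no acquisition found for this subject")
    if requested is not None:
        # An explicit choice wins, but it must actually exist.
        if requested not in available:
            raise ValueError(
                f"acquisition {requested!r} not found; available: {available}")
        return requested
    # Assumption: sort so that "the first one found" is deterministic.
    candidates = sorted(available)
    if len(candidates) > 1:
        warnings.warn(
            f"multiple acquisitions {candidates}; using {candidates[0]!r}. "
            "Pass --acquisition to choose explicitly.")
    return candidates[0]


print(pick_acquisition(["wk40", "wk30"], requested="wk40"))  # wk40
```

Raising on an unknown `--acquisition` value, rather than silently falling back, keeps an explicit request from being quietly ignored.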
src/generate_snapshots.py

Lines changed: 103 additions & 55 deletions
````diff
@@ -316,27 +316,73 @@ def generate_tiles_snapshot(crops_dir, output_path, size=(800, 600), level=1,
     return snapshots
 
 
-COLLATERAL_FILES = {
-    "left": "FColl-SRh_left_name06-43-43--210_embeddings.csv",
-    "right": "FColl-SRh_right_name06-56-15--113_embeddings.csv",
-}
+def discover_umap_pairs(embeddings_dir, reference_data_dir, regions=None):
+    """Find (csv_path, model_path, coords_path, region, hemi) tuples.
+
+    Scans ``embeddings_dir`` for ``*_embeddings.csv`` files and matches each
+    one to pre-trained UMAP model artefacts in ``reference_data_dir``.
+
+    CSV filenames are expected to follow the pattern::
+
+        {region}_{hemi}_{identifier}_embeddings.csv
+
+    where ``hemi`` is ``left`` or ``right``. The corresponding model files
+    must be named::
+
+        umap_{region}_{hemi}.pkl
+        umap_{region}_{hemi}_coords.npy
+
+    Args:
+        embeddings_dir: Directory containing embedding CSV files.
+        reference_data_dir: Directory containing pre-trained UMAP models.
+        regions: Optional list of region names to include. ``None`` means
+            include all regions for which model artefacts exist.
+
+    Returns:
+        List of tuples ``(csv_path, model_path, coords_path, region, hemi)``.
+    """
+    csv_files = sorted(glob.glob(osp.join(embeddings_dir, "*_embeddings.csv")))
+    pairs = []
+    for csv_path in csv_files:
+        parts = osp.basename(csv_path).split("_")
+        if len(parts) < 3:
+            continue
+        region, hemi = parts[0], parts[1]
+        if hemi not in ("left", "right"):
+            continue
+        if regions and region not in regions:
+            continue
+        model_path = osp.join(reference_data_dir, f"umap_{region}_{hemi}.pkl")
+        coords_path = osp.join(reference_data_dir,
+                               f"umap_{region}_{hemi}_coords.npy")
+        if osp.exists(model_path) and osp.exists(coords_path):
+            pairs.append((csv_path, model_path, coords_path, region, hemi))
+        else:
+            print(f" UMAP model not found for {region} {hemi} — skipping "
+                  f"(expected {osp.basename(model_path)} in reference_data_dir)")
+    return pairs
 
 
 def generate_umap_snapshot(embeddings_dir, reference_data_dir, output_path,
-                           size=(800, 600)):
-    """Generate UMAP scatter plots for the collateral sulcus region.
+                           size=(800, 600), regions=None):
+    """Generate UMAP scatter plots for embedding regions.
 
-    Projects the pipeline's new subject(s) onto a pre-trained UMAP fitted
-    on UKBioBank40 reference embeddings. Produces one plot per hemisphere.
+    Discovers embedding CSV files in ``embeddings_dir`` and generates a UMAP
+    projection plot for every (region, hemisphere) pair that has both a CSV and
+    pre-trained model artefacts in ``reference_data_dir``.
 
     Args:
-        embeddings_dir: Path to pipeline embeddings (stage 5 output)
-        reference_data_dir: Path to pre-trained UMAP models and coords
-        output_path: Base path for output images (suffixed with _left/_right)
-        size: Tuple of (width, height)
+        embeddings_dir: Path to pipeline embeddings directory.
+        reference_data_dir: Path to pre-trained UMAP models and reference
+            coordinate arrays.
+        output_path: Base path for output images. Each plot is saved as
+            ``{basename}_{region}_{hemi}{ext}``.
+        size: Tuple of (width, height) in pixels.
+        regions: Optional list of region names to include. ``None`` means
+            generate plots for all available regions.
 
     Returns:
-        List of generated snapshot file paths
+        List of generated snapshot file paths.
     """
     import joblib
     import matplotlib
````
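`discover_umap_pairs` matches files using nothing more than `str.split("_")`. The first field becomes the region and the second the hemisphere. The sketch below lifts that parsing out of the function, without the filesystem checks, so the naming convention can be tested against real filenames. `parse_embedding_name` and `model_paths` are hypothetical helper names, not part of the commit.

A consequence of splitting on `_` is that region names are implicitly assumed to contain no underscores. A name such as `FColl-SRh` or `S.Or.` parses correctly. A region like `Region_A` would have `A` read as the hemisphere and would be skipped.

```python
import os.path as osp


def parse_embedding_name(csv_path):
    """Return (region, hemi) for an embeddings CSV, or None if it does not match.

    Reproduces the checks in discover_umap_pairs: at least three
    underscore-separated fields, and a second field of "left" or "right".
    """
    parts = osp.basename(csv_path).split("_")
    if len(parts) < 3:
        return None
    region, hemi = parts[0], parts[1]
    if hemi not in ("left", "right"):
        return None
    return region, hemi


def model_paths(reference_data_dir, region, hemi):
    """Expected UMAP artefact paths (model, reference coords) for one pair."""
    return (osp.join(reference_data_dir, f"umap_{region}_{hemi}.pkl"),
            osp.join(reference_data_dir, f"umap_{region}_{hemi}_coords.npy"))


name = "FColl-SRh_left_name06-43-43--210_embeddings.csv"
print(parse_embedding_name(name))  # ('FColl-SRh', 'left')
print(parse_embedding_name("Region_A_left_x_embeddings.csv"))  # None
```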
````diff
@@ -348,35 +394,23 @@ def generate_umap_snapshot(embeddings_dir, reference_data_dir, output_path,
     ext = osp.splitext(output_path)[1] or ".png"
     snapshots = []
 
-    for hemi, csv_name in COLLATERAL_FILES.items():
-        region = csv_name.split("_")[0]
-        model_path = osp.join(
-            reference_data_dir, f"umap_{region}_{hemi}.pkl"
-        )
-        coords_path = osp.join(
-            reference_data_dir, f"umap_{region}_{hemi}_coords.npy"
-        )
-        if not osp.exists(model_path) or not osp.exists(coords_path):
-            print(f" UMAP artifacts not found for {hemi}, skipping")
-            continue
+    pairs = discover_umap_pairs(embeddings_dir, reference_data_dir,
+                                regions=regions)
+    if not pairs:
+        print(" No matching (embedding CSV, UMAP model) pairs found")
+        return snapshots
 
+    for csv_path, model_path, coords_path, region, hemi in pairs:
         model = joblib.load(model_path)
         ref_coords = np.load(coords_path)
-        print(f" [{hemi}] Loaded {ref_coords.shape[0]} reference points")
-
-        new_csv = osp.join(embeddings_dir, csv_name)
-        if not osp.exists(new_csv):
-            print(f" [{hemi}] No embedding found at {new_csv}, skipping")
-            continue
+        print(f" [{region} {hemi}] Loaded {ref_coords.shape[0]} reference points")
 
-        df = pd.read_csv(new_csv)
+        df = pd.read_csv(csv_path)
         X_new = df.drop(columns=["ID"]).values.astype(np.float32)
         new_coords = model.transform(X_new)
-        print(f" [{hemi}] Projected {X_new.shape[0]} new subject(s)")
+        print(f" [{region} {hemi}] Projected {X_new.shape[0]} new subject(s)")
 
-        fig, ax = plt.subplots(
-            figsize=(size[0] / 100, size[1] / 100)
-        )
+        fig, ax = plt.subplots(figsize=(size[0] / 100, size[1] / 100))
         ax.scatter(
             ref_coords[:, 0], ref_coords[:, 1],
             s=1, c="#4a90d9", alpha=0.08,
````
````diff
@@ -387,16 +421,14 @@ def generate_umap_snapshot(embeddings_dir, reference_data_dir, output_path,
             s=80, c="#e74c3c", edgecolors="white", linewidths=0.8,
             zorder=5, label="Your subject",
         )
-        ax.set_title(
-            f"Collateral sulcus \u2014 {hemi}", fontsize=12
-        )
+        ax.set_title(f"{region} \u2014 {hemi}", fontsize=12)
         ax.legend(loc="best", fontsize=9, framealpha=0.9)
         ax.set_xlabel("UMAP 1", fontsize=9)
         ax.set_ylabel("UMAP 2", fontsize=9)
         ax.tick_params(labelsize=8)
         plt.tight_layout()
 
-        snap = f"{basename}_{hemi}{ext}"
+        snap = f"{basename}_{region}_{hemi}{ext}"
         plt.savefig(snap, dpi=150)
         plt.close(fig)
         snapshots.append(snap)
````
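With this change every plot carries its region in the filename. In `_run_umap`, the single base path `umap.png` therefore expands into one image per (region, hemisphere) pair. The hunk does not show how `basename` is computed. The sketch below assumes it is `osp.splitext(output_path)[0]`, the counterpart of the `ext = osp.splitext(output_path)[1] or ".png"` line in the diff. `snapshot_path` is a hypothetical helper name.

```python
import os.path as osp


def snapshot_path(output_path, region, hemi):
    """Per-plot filename in the style of generate_umap_snapshot.

    Assumption: basename = osp.splitext(output_path)[0]. The extension falls
    back to ".png" when output_path has none, as in the diff.
    """
    basename, ext = osp.splitext(output_path)
    ext = ext or ".png"
    return f"{basename}_{region}_{hemi}{ext}"


print(snapshot_path("snapshots/umap.png", "FColl-SRh", "left"))
# snapshots/umap_FColl-SRh_left.png
```

Under the previous `{basename}_{hemi}{ext}` scheme, two regions plotted for the same hemisphere would have written to the same file. Adding the region keeps the outputs distinct once several regions are discovered.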
````diff
@@ -440,6 +472,10 @@ def __init__(self):
             .add_optional_argument(
                 "--reference_data_dir",
                 "Path to pre-trained UMAP models and reference coords")
+            .add_optional_argument(
+                "--umap_region",
+                "Comma-separated list of region names to generate UMAP plots for "
+                "(e.g. FColl-SRh,S.Or.). Defaults to all regions with available models.")
             .add_optional_argument(
                 "--tiles_level",
                 "Region threshold level (0-3)",
````
````diff
@@ -568,23 +604,35 @@ def _run_tiles(self, size):
     def _run_umap(self, size):
         """Generate UMAP scatter plots."""
         snapshots = []
-        if (self.args.embeddings_dir
-                and osp.exists(self.args.embeddings_dir)
-                and self.args.reference_data_dir
-                and osp.exists(self.args.reference_data_dir)):
-            print("\nGenerating UMAP scatter plots...")
-            out = osp.join(self.args.output_dir, "umap_collateral.png")
-            try:
-                snaps = generate_umap_snapshot(
-                    self.args.embeddings_dir,
-                    self.args.reference_data_dir,
-                    out, size,
-                )
-                snapshots.extend(snaps)
-            except Exception as e:
-                print(f" Error generating UMAP snapshot: {e}")
-        elif self.args.reference_data_dir and not osp.exists(self.args.reference_data_dir):
+
+        if not self.args.embeddings_dir:
+            return snapshots
+        if not osp.exists(self.args.embeddings_dir):
+            print(f"Embeddings directory not found: {self.args.embeddings_dir}")
+            return snapshots
+        if not self.args.reference_data_dir:
+            return snapshots
+        if not osp.exists(self.args.reference_data_dir):
             print(f"Reference data directory not found: {self.args.reference_data_dir}")
+            return snapshots
+
+        regions = None
+        umap_region = getattr(self.args, "umap_region", None)
+        if umap_region:
+            regions = [r.strip() for r in umap_region.split(",")]
+            print(f"\nUMAP region filter: {', '.join(regions)}")
+
+        print("\nGenerating UMAP scatter plots...")
+        out = osp.join(self.args.output_dir, "umap.png")
+        try:
+            snaps = generate_umap_snapshot(
+                self.args.embeddings_dir,
+                self.args.reference_data_dir,
+                out, size, regions=regions,
+            )
+            snapshots.extend(snaps)
+        except Exception as e:
+            print(f" Error generating UMAP snapshot: {e}")
         return snapshots
 
 
````
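`_run_umap` converts the `--umap_region` string into a list with `[r.strip() for r in umap_region.split(",")]`. Empty entries survive that expression, so a trailing comma (`FColl-SRh,`) gives `['FColl-SRh', '']`. This is mostly harmless for the `region not in regions` test in `discover_umap_pairs`. However, an input of only commas produces a non-empty list that matches no region, so nothing is plotted. The sketch below is a stricter variant under the hypothetical name `parse_region_filter`. It drops empty entries and returns `None` (the "all regions" default) when no names remain. This is a possible alternative, not the commit's behaviour.

```python
def parse_region_filter(value):
    """Hypothetical stricter parser for the --umap_region option.

    Splits on commas, strips whitespace and drops empty entries.
    Returns None (no filter, i.e. all regions) when no names remain.
    """
    if not value:
        return None
    regions = [r.strip() for r in value.split(",") if r.strip()]
    return regions or None


print(parse_region_filter("FColl-SRh, S.Or.,"))  # ['FColl-SRh', 'S.Or.']
print(parse_region_filter(None))                 # None
```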