Skip to content

Conversation

effigies
Copy link
Contributor

@effigies effigies commented Feb 4, 2025

This updates the intersects() function in the schema expression language to:

  1. Return the intersection, if non-empty. The order will match the order in the longer list.
  2. Return null on empty intersections to ensure that intersects() remains falsey.

This seems like the simplest resolution for resolving bids-standard/bids-specification#914 along the lines of:

If participants.tsv exists, then the participant_label entries are a superset of subject directories and participant labels found in phenotype/.

With this change, the rule could be written:

SubDirSubset:
  selectors:
    - path == '/participants.tsv'
  checks:
    - |
      allequal(
        sorted(intersects(dataset.subjects.participant_id, dataset.subjects.sub_dirs)),
        sorted(dataset.subjects.sub_dirs)
      )

PhenotypeSubset:
  selectors:
    - path == '/participants.tsv'
    - dataset.subjects.phenotype
  checks:
    - |
      allequal(
        sorted(intersects(dataset.subjects.participant_id, dataset.subjects.phenotype)),
        sorted(dataset.subjects.phenotype)
      )

If we require that validators sort the arrays in dataset.subjects on load, then we can drop the extra sorted() calls here.

Because we've seen datasets with subjects in the thousands, I've updated this function to create a Set() based on the smaller list. There is very likely a small n where this constitutes a performance regression.

@effigies
Copy link
Contributor Author

effigies commented Feb 4, 2025

cc @ericearl

@effigies effigies force-pushed the feat/intersects branch 2 times, most recently from 59b5921 to 44db152 Compare February 4, 2025 19:34
@effigies
Copy link
Contributor Author

effigies commented Feb 6, 2025

@effigies effigies merged commit 3a66a21 into bids-standard:main Feb 6, 2025
17 checks passed
@effigies effigies deleted the feat/intersects branch February 6, 2025 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants