@yarikoptic yarikoptic commented Oct 3, 2024

It is often desirable to be able to state that the values used for entities in a dataset belong to some controlled vocabulary, or are simply defined centrally by some "id" authority. E.g. these could be unique scanning session IDs per scanning center, or similarly subject_ids defined per study or centrally for the center.

This is of particular interest for large studies where multiple datasets could be created, one per site or primary data modality, and later possibly be composed into a single dataset or become parts of one larger multi-site dataset. In such cases it becomes quite important to annotate that particular entities (subject_id, session_id and possibly even _desc- or _acq- values) are defined in the scope of the specific larger study and thus correspond to the "same" thing given the same ContextURI and value.

TODOs:

  • discuss how to centrally define context prefixes (see below)
  • provide an example in the bids-examples datasets

Context Prefixes

In .jsonld etc. it is common to centrally define JSON-LD contexts, which could even be defined externally and pointed to via the @context attribute. E.g. in https://dandiarchive.s3.amazonaws.com/dandisets/000003/draft/dandiset.jsonld we point to https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json, which states within its @context that "ORCID": "https://orcid.org/" and "spdx": "http://spdx.org/licenses/". Now if we specify "license": "spdx:apache-2.0", we know that the license "identity" is really http://spdx.org/licenses/apache-2.0 (the actual URL does not even have to exist).
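The prefix expansion described above can be sketched in a few lines of Python. This is a simplified illustration, not a full JSON-LD processor: `expand_curie` is a hypothetical helper name, and the `CONTEXT` table contains only the two prefixes quoted above.

```python
# Simplified prefix table, limited to the two entries quoted above.
CONTEXT = {
    "ORCID": "https://orcid.org/",
    "spdx": "http://spdx.org/licenses/",
}


def expand_curie(value: str, context: dict[str, str]) -> str:
    """Expand 'prefix:suffix' via the context table.

    Values whose prefix is not in the table (including ordinary
    http/https URLs) are returned unchanged.
    """
    prefix, sep, suffix = value.partition(":")
    if sep and prefix in context:
        return context[prefix] + suffix
    return value


print(expand_curie("spdx:apache-2.0", CONTEXT))
# prints: http://spdx.org/licenses/apache-2.0
```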

So, I wonder if we could/should also define Contexts: dict[str, str] within dataset_description.json, which would provide similar mappings. Then I could

  • in dataset_description.json have "Contexts": {"thelab": "http://thelab.example.com/term/"}
  • in participants.json, for participant_id, have "ContextURI": "thelab:subject", which in turn, for every participant_id, ultimately gets expanded into http://thelab.example.com/term/subject/{participant_id} when mapping across datasets.
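A minimal sketch of how a tool might perform this two-step expansion, assuming the proposed (not yet adopted) `Contexts` and `ContextURI` fields. The dicts stand in for the two JSON files, the function names and the `sub-01` value are hypothetical, and joining the expanded prefix and the value with `/` is an assumption taken from the expansion pattern in the bullet above.

```python
# Stand-ins for dataset_description.json and participants.json.
# "Contexts" and "ContextURI" are the proposed fields, not part of BIDS today.
dataset_description = {"Contexts": {"thelab": "http://thelab.example.com/term/"}}
participants_json = {"participant_id": {"ContextURI": "thelab:subject"}}


def resolve_context_uri(context_uri: str, contexts: dict[str, str]) -> str:
    """Expand a 'prefix:suffix' ContextURI using the dataset-level Contexts."""
    prefix, sep, suffix = context_uri.partition(":")
    if sep and prefix in contexts:
        return contexts[prefix] + suffix
    return context_uri


def identifier(column: str, value: str) -> str:
    """Build the dataset-independent identifier for one value of a column."""
    base = resolve_context_uri(
        participants_json[column]["ContextURI"],
        dataset_description["Contexts"],
    )
    # Assumption: prefix and value are joined with a single "/".
    return f"{base.rstrip('/')}/{value}"


print(identifier("participant_id", "sub-01"))
# prints: http://thelab.example.com/term/subject/sub-01
```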

attn @satra and @tekrajchhetri, who know "linked" stuff better and could recommend how we could align even better

```
"ContextURI": "https://thelab.example.com/term/subject/"
}
}
```

I guess it would also be nice to give an example here with a bids: URI pointing to this particular dataset, to say that the IDs are unique to this dataset?

@yarikoptic

@effigies WDYT about this idea? It came up again in the scope of the

as a way to define context for pointing to an external resource for probe geometries. I think it would overall be handy to complement bids: URIs.

Let me know if we need one more bbq to discuss it "interactively".

effigies commented Oct 1, 2025

Looked very briefly. Apologies if this is a dumb question, but isn't TermURL designed for this?

yarikoptic commented Oct 2, 2025

TermURL:

URL pointing to a formal definition of this type of data in an ontology available on the web.
For example: https://www.ncbi.nlm.nih.gov/mesh/68008297 for "male".

So it is a complete URL that points to a definition and is expected to resolve, etc.

  • Whereas TermURL is meant to be associated with a specific value (hence defined for each Level value), ContextURI is more of a "prefix" defined for an entire column, so it should be taken together with the value (like some "open ended" definitions in HED), be it a Level, just a string, or anything else.
  • ContextURI (and the combination of the value with it) in principle does not even need to be a legit/full URL or resolve! It rather serves to build a "unique identifier" of a kind, so that f"{ContextURI}{value}" would constitute an id for a value.
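To illustrate how f"{ContextURI}{value}" could be used to match entities across datasets, here is a rough sketch. The `global_ids` helper, the site datasets, and the `otherlab.example.org` prefix are all made up for illustration; only the concatenation rule comes from the point above.

```python
def global_ids(context_uri: str, values: list[str]) -> set[str]:
    """Build ids by plain concatenation, i.e. f"{ContextURI}{value}"."""
    return {f"{context_uri}{value}" for value in values}


# Two hypothetical per-site datasets that declare the same ContextURI
# for their participant_id column.
SHARED = "https://thelab.example.com/term/subject/"
site_a = global_ids(SHARED, ["sub-01", "sub-02"])
site_b = global_ids(SHARED, ["sub-02", "sub-03"])

# A third hypothetical dataset with a different context: the same label
# "sub-02" does not denote the same subject there.
other = global_ids("https://otherlab.example.org/id/", ["sub-02"])

print(sorted(site_a & site_b))
# prints: ['https://thelab.example.com/term/subject/sub-02']
print(site_a & other)
# prints: set()
```

Nothing here requires the resulting strings to resolve as URLs; they only need to be equal when the same entity is meant.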

There is probably some notion of "one or the other", as I do not immediately see when I should use both TermURL and ContextURI, but they could still co-exist. E.g. for X I might have ContextURI point to some specific prefix to identify the selection of values I would be using in a corresponding study, while TermURL points to specific ontological descriptions, potentially even in different ontologies.

If this overall makes sense, I would really like to follow through on my promise in the original description and add Contexts to dataset_description.json. I think this would be the most influential and useful addition. I would probably need to mention that bids: already has a special meaning per https://bids-specification.readthedocs.io/en/stable/common-principles.html#bids-uri and should not be overloaded.

effigies commented Oct 3, 2025

From a schema perspective, TermURL is just a URI (we don't try to separate URLs and URNs), so if the goal is to give unresolvable names, you can do that right now. If you want them to be resolvable, you can expand the URI to a full URL.

> I think it would overall be handy to complement bids: URIs.

The problems BIDS-URI aimed to solve (see #471 (comment)) were:

  1. Multiplicity of ways to refer to files within a BIDS dataset.
  2. Compatibility with URIs allowing local and remote files to be referenced as appropriate.
  3. Handling relocated datasets centrally.

There doesn't seem to be a problem analogous to (1) here, and as I state above, I believe (2) is adequately satisfied by TermURL (though TermURI might have been a more forward-thinking name).

The argument here seems to rest on (3), but that seems just as easily satisfied by having the sidecars for your TSVs at the root of the dataset.


I'm concerned about trying to bake too much semantic web into BIDS and ultimately make it harder for a semantic web naïf to understand a dataset. I think TermURL is self-explanatory and everybody knows how to follow a link, while a ContextURI that requires looking the scheme up in the dataset_description.json strikes me as too machine oriented. Are there machines that are trying to do this, and are TermURLs that are populated with URIs inadequate?

Given that BIDS as a whole is not JSONLD-friendly, presumably some kind of scaffolding is necessary to make it understandable. Would a root level .context.jsonld file serve your purposes?
