[ENH] Add ContextURI to allow to define the context for the entity values #1939
It is often desirable to indicate that the values used for entities in a dataset belong to some controlled vocabulary, or are simply defined centrally within some "id" authority. Examples include scanning session IDs that are unique per scanning center, or subject_ids defined per study or centrally for the center. This is of particular interest for large studies where multiple datasets could be created, one per site or primary data modality, to later be composed into a single dataset or to become parts of one larger multi-site dataset. In such cases it becomes quite important to annotate that particular entities (subject_id, session_id, and possibly even _desc- or _acq- values) are defined in the scope of the specific larger study and thus correspond to the "same" thing given the same ContextURI and value.
```
    "ContextURI": "https://thelab.example.com/term/subject/"
  }
}
```
I guess it would also be nice to give an example here with a `bids:` URI pointing to this particular dataset, to say that the IDs are unique to this dataset?
@effigies WDYT about this idea? It came up again in the scope of the as a way to define context for pointing to external resources for probe geometries. I think it would overall be handy to complement Let me know if we need one more bbq to discuss it "interactively".
Looked very briefly. Apologies if this is a dumb question, but isn't
So it is a complete URL pointing to a definition and is expected to resolve, etc.
There is probably some notion of "one or another", as I do not immediately see when I should use both. If this overall makes sense, I would really like to follow through on my promise in the original description and add
From a schema perspective,
The problems BIDS-URI aimed to solve (see #471 (comment)) were:
There doesn't seem to be a problem analogous to (1) here, and as I state above, I believe (2) is adequately satisfied by TermURL (though TermURI might have been a more forward-thinking name). The argument here seems to rest on (3), but that seems just as easily satisfied by having the sidecars for your TSVs at the root of the dataset. I'm concerned about trying to bake too much semantic web into BIDS and ultimately making it harder for a semantic web naïf to understand a dataset. I think Given that BIDS as a whole is not JSONLD-friendly, presumably some kind of scaffolding is necessary to make it understandable. Would a root level
TODOs:
Context Prefixes

In .jsonld etc. it is common to centrally define common JSON-LD Contexts, which could even be defined externally and pointed to via the `@context` attribute. E.g. in https://dandiarchive.s3.amazonaws.com/dandisets/000003/draft/dandiset.jsonld we point to https://raw.githubusercontent.com/dandi/schema/master/releases/0.6.0/context.json which would tell within its `@context` that `"ORCID": "https://orcid.org/",` and `"spdx": "http://spdx.org/licenses/",`. Now if we specify that `"license": "spdx:apache-2.0"` we know that the license "identity" is really `http://spdx.org/licenses/apache-2.0` (the actual URL does not even have to exist).

So, I wonder if we could/should define within `dataset_description.json` also `Context: dict[str, str]` which would provide similar mappings. So then I could have

- `dataset_description.json` have `"Contexts": {"thelab": "http://thelab.example.com/term/"}`
- `participants.json` for `participant_id` to have `"ContextURI": "thelab:subject"`

which in turn for every `participant_id` would ultimately get expanded into `http://thelab.example.com/term/subject/{participant_id}` if to map across datasets.

attn @satra and @tekrajchhetri who know "linked" stuff better and could express their recommendations on how we could align even better
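To make the proposed expansion concrete, here is a minimal Python sketch of how a prefix map from `dataset_description.json` could be applied to a `ContextURI` and an entity value. The function names `expand_context_uri` and `entity_uri`, and the sample value `sub-01`, are hypothetical and only for illustration; nothing here is part of BIDS or of this PR, and the joining rule (a single `/` between the expanded context and the value) is an assumption based on the example URL above.

```python
def expand_context_uri(context_uri: str, contexts: dict[str, str]) -> str:
    """Expand a prefixed value such as "thelab:subject" using a prefix map.

    Hypothetical helper: values whose part before ":" is not a declared
    prefix (e.g. full "https://..." URIs) are returned unchanged.
    """
    prefix, sep, suffix = context_uri.partition(":")
    if sep and prefix in contexts:
        return contexts[prefix] + suffix
    return context_uri


def entity_uri(value: str, context_uri: str, contexts: dict[str, str]) -> str:
    """Build the identity URI for one entity value (assumed "/" joining)."""
    base = expand_context_uri(context_uri, contexts)
    return base.rstrip("/") + "/" + value


# Prefix map as it might appear in dataset_description.json (assumed layout).
contexts = {
    "thelab": "http://thelab.example.com/term/",
    "spdx": "http://spdx.org/licenses/",
}

# "ContextURI" as it might appear for participant_id in participants.json.
subject_uri = entity_uri("sub-01", "thelab:subject", contexts)
license_uri = expand_context_uri("spdx:apache-2.0", contexts)
```

With such a scheme, two datasets that declare the same `ContextURI` for `participant_id` would yield identical URIs for identical values, which is what would allow matching subjects across datasets.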