
TableInfo: collection.tsv

abradyIGS edited this page Sep 3, 2021 · 28 revisions

The C2M2 collection entity is a generalization of a "dataset": a named grouping of files, biosamples, and/or subjects.

Defining collections is optional. The `collection.tsv` table has one row for each collection you define for your program.
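
As a rough illustration, the sketch below writes a tab-separated `collection.tsv` with a header row followed by one row per collection, using Python's standard `csv` module. It assumes the columns appear in the order of the field table on this page and that unused optional fields are left empty. The function name `write_collection_tsv`, the namespace, and the IDs and labels are hypothetical placeholders, not real CFDE values.

```python
import csv
import io

# Column order assumed to follow the field table on this wiki page.
COLUMNS = [
    "id_namespace", "local_id", "persistent_id",
    "creation_time", "abbreviation", "name", "description",
]


def write_collection_tsv(rows, out):
    """Write a header line, then one tab-separated line per collection.

    Optional columns missing from a row dict are written as empty strings.
    """
    writer = csv.DictWriter(out, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical example collections; the namespace and IDs are made up.
collections = [
    {"id_namespace": "tag:example.org,2021:", "local_id": "coll-001",
     "name": "Pilot study samples", "creation_time": "2021-01-08"},
    {"id_namespace": "tag:example.org,2021:", "local_id": "coll-002",
     "abbreviation": "RNAseqV2"},
]

buf = io.StringIO()
write_collection_tsv(collections, buf)
print(buf.getvalue())
```

`restval=""` fills any optional column a row omits, while `DictWriter`'s default `extrasaction="raise"` rejects misspelled column names instead of silently dropping them.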

Please see the technical documentation for a complete treatment of how C2M2 collections are used.

| Field | Description | Required? | Attributes | Extra Info |
|---|---|---|---|---|
| id_namespace | A CFDE-cleared identifier representing the top-level data space containing this collection [part 1 of 2-component composite primary key] | If a row has this value, local_id must also have a value | Value type is string | If your program has not implemented multiple id_namespaces, this will be exactly the same for all rows. This will be the value of id_namespace in project.tsv for the overarching project in your program and/or the value of project_id_namespace in primary_dcc_contact. |
| local_id | An identifier representing this collection, unique within this id_namespace [part 2 of 2-component composite primary key] | If a row has this value, id_namespace must also have a value | The value in each row must be different. Value type is string. | Each individual collection needs a unique local_id value (every row should be different). The local_id column appears in many tables, but values should not be repeated across tables; for example, a 'file' local_id is a separate concept from a 'biosample' local_id. |
| persistent_id | A persistent, resolvable (not necessarily retrievable) URI or compact ID permanently attached to this collection | Non-required: any number of rows after the header can be filled | The value in each row must be different. Value type is string. | Meant to serve as a permanent address to which landing pages (which summarize metadata associated with this collection) and other relevant annotations and functions can optionally be attached, including information enabling resolution to a network location from which the collection's files can be downloaded. Actual network locations must not be embedded directly within this identifier: one level of indirection is required to protect persistent_id values from changes in network location over time as files are moved around. |
| creation_time | An ISO 8601 / RFC 3339 (subset)-compliant timestamp documenting this collection's creation time: `YYYY-MM-DDTHH:MM:SS±NN:NN` | Non-required: any number of rows after the header can be filled | Value must be a datetime | Example valid dates: `2021-01-08`, `2021-01-08T00:45:40Z`, `2021-01-08T00:45:40+00:00` |
| abbreviation | A very short display label for this collection | Non-required: any number of rows after the header can be filled | Value can be any string that is not already in use at the CFDE. Value should be 10 characters or fewer and cannot contain special Unix characters. | This is the display abbreviation for this collection in the portal. |
| name | A short, human-readable, machine-read-friendly label for this collection | Non-required: any number of rows after the header can be filled | Value type is string | This is the display name for this collection in the portal. |
| description | A human-readable description of this collection | Non-required: any number of rows after the header can be filled | Value type is string | This is the display description for this collection in the portal. |
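
The key and uniqueness rules in the field table can be checked mechanically. The following is a minimal sketch, not CFDE's own validation: it flags rows missing either half of the composite primary key, repeated local_id or persistent_id values, and abbreviations longer than 10 characters. The function name `check_collection_rows` is hypothetical, and the "special Unix characters" rule is not checked because this page does not list which characters those are.

```python
import csv
import io


def check_collection_rows(tsv_text):
    """Return a list of human-readable problems found in collection.tsv text.

    Sketch only: covers the key, uniqueness, and abbreviation-length rules
    from the field table, not every C2M2 requirement.
    """
    problems = []
    seen_local_ids = set()
    seen_persistent_ids = set()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        namespace = row.get("id_namespace") or ""
        local_id = row.get("local_id") or ""
        persistent_id = row.get("persistent_id") or ""
        abbreviation = row.get("abbreviation") or ""

        # id_namespace and local_id together form the primary key,
        # so both halves must be filled in.
        if not (namespace and local_id):
            problems.append(
                f"line {line_no}: id_namespace and local_id must both have a value")

        # Every row needs a different local_id.
        if local_id:
            if local_id in seen_local_ids:
                problems.append(f"line {line_no}: duplicate local_id {local_id!r}")
            seen_local_ids.add(local_id)

        # persistent_id is optional, but any value given must be unique.
        if persistent_id:
            if persistent_id in seen_persistent_ids:
                problems.append(
                    f"line {line_no}: duplicate persistent_id {persistent_id!r}")
            seen_persistent_ids.add(persistent_id)

        # abbreviation should be 10 characters or fewer.
        if len(abbreviation) > 10:
            problems.append(
                f"line {line_no}: abbreviation {abbreviation!r} is longer than 10 characters")
    return problems
```

Columns absent from the header are treated as empty, so the function also works on a file that omits optional columns.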

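For creation_time, one way to enforce the documented subset is to match the three example shapes shown in the table (a date alone, or a date and time ending in `Z` or a `±HH:MM` offset) and then let Python's `datetime` module reject impossible values such as month 13. Treating exactly those three shapes as the accepted subset is an assumption drawn from the examples, and `is_valid_creation_time` is a hypothetical helper name.

```python
import re
from datetime import date, datetime

# Assumed accepted shapes, taken from the examples in the field table:
#   YYYY-MM-DD
#   YYYY-MM-DDTHH:MM:SSZ
#   YYYY-MM-DDTHH:MM:SS+HH:MM  (or -HH:MM)
_CREATION_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}"        # date part, always present
    r"(T\d{2}:\d{2}:\d{2}"      # optional time part...
    r"(Z|[+-]\d{2}:\d{2}))?",   # ...which must carry a UTC offset
    re.ASCII,
)


def is_valid_creation_time(value):
    """Return True if value has an assumed-valid shape and is a real date/time."""
    if not _CREATION_TIME.fullmatch(value):
        return False
    try:
        if "T" not in value:
            date.fromisoformat(value)
        else:
            # datetime.fromisoformat only accepts a trailing "Z" from
            # Python 3.11 on, so normalise it to "+00:00" first.
            datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        # Out-of-range fields, e.g. month 13 or hour 25.
        return False
    return True
```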