TableInfo: file.tsv
Amanda Charbonneau edited this page Mar 4, 2021 · 35 revisions
The file table contains one row per file in your program.

| Field | Field Description | Required? | Attributes | Extra Info |
|---|---|---|---|---|
| id_namespace | A CFDE-cleared identifier representing the top-level data space containing this file [part 1 of 2-component composite primary key] | Required | Value type is string | id_namespace is the unique identifier for your program, or some subset of your program, that identifies it as your data. In the simplest case, your program would use the exact same value for the id_namespace column in every row of every table. More complex programs may choose to use multiple namespaces. All id_namespace values should be listed in the primary_dcc_contact table. |
| local_id | An identifier representing this file, unique within this id_namespace [part 2 of 2-component composite primary key] | Required | The value in each row must be different for a given namespace; value type is string | Each individual file needs a unique local_id value (every row should be different). If every file has a unique name, you can use the filename as the local_id (and repeat it in the filename field). If your system does not have unique names for every file, an easy way to generate a unique local_id for every file is to use its path as the local_id, e.g. /Users/amanda/DCCData/Study_1/QualityControl.txt. The local_id column appears in many tables, but values should not be repeated across tables: for example, a 'file' local_id is a separate concept from a 'biosample' local_id. If your program is using a single id_namespace, then every value for local_id across all tables should be unique. |
| project_id_namespace | The id_namespace of the primary project within which this file was created [part 1 of 2-component composite foreign key] | Required | Value type is string | If you have not implemented multiple namespaces, this will be the same as id_namespace. |
| project_local_id | The local_id of the primary project within which this file was created [part 2 of 2-component composite foreign key] | Required | Value can be any string | For each row (each file), this will be the value of 'local_id' in the project table for the project this file came from |
| persistent_id | A persistent, resolvable (not necessarily retrievable) URI generated by a DCC (using, e.g., our minid server) and attached to this file | Non-required: Any number of rows after the header can be filled | The value in each row must be different; value type is string | Meant to serve as a permanent address to which landing pages (which summarize metadata associated with this file) and other relevant annotations and functions can optionally be attached, including information enabling resolution to a network location from which the file can be downloaded. Actual network locations must not be embedded directly within this identifier: one level of indirection is required in order to protect persistent_id values from changes in network location over time as files are moved around. |
| creation_time | An ISO 8601 / RFC 3339 (subset)-compliant timestamp documenting this file's creation time: YYYY-MM-DDTHH:MM:SS±NN:NN | Non-required: Any number of rows after the header can be filled | Value must be datetime | Example valid dates: 2021-01-08, 2021-01-08T00:45:40Z, 2021-01-08T00:45:40+00:00 |
| size_in_bytes | The size of this file in bytes | Non-required: Any number of rows after the header can be filled | Value type is integer | Do not include decimal places or a decimal point (.). |
| uncompressed_size_in_bytes | The total decompressed size in bytes of the contents of this file | Non-required: Any number of rows after the header can be filled | Value type is integer | Do not include decimal places or a decimal point (.). |
| sha256 | (preferred) SHA-256 checksum for this file [sha256, md5 cannot both be null] | Either this field OR md5 must be populated | Value should be a SHA-256 hash or nothing; if you have both, please use sha256 | You may populate both md5 and sha256 for a given row, but only one is required |
| md5 | (allowed) MD5 checksum for this file [sha256, md5 cannot both be null] | Either this field OR sha256 must be populated | Value should be MD5 hash or nothing; If you have both, please use sha256. | You may populate both md5 and sha256 for a given row but only one is required |
| filename | A filename with no prepended PATH information. | Non-required: Any number of rows after the header can be filled | Value type is string | Filenames do not need to be unique. Uniqueness is ensured by the local_id |
| file_format | An EDAM CV term ID identifying the digital format of this file (e.g. TSV or FASTQ) | Non-required: Any number of rows after the header can be filled | Value must be a valid EDAM ID | EDAM format lookup. Example valid EDAM IDs: format:1930, format:3712, format:2310 |
| data_type | An EDAM CV term ID identifying the type of information stored in this file (e.g. RNA sequence reads) | Non-required: Any number of rows after the header can be filled | Value must be a valid EDAM ID | EDAM data type lookup. Example valid EDAM IDs: data:2044, data:2050, data:2082 |
| assay_type | An OBI CV term ID describing the type of experiment that generated the results summarized by this file | Non-required: Any number of rows after the header can be filled | Value must be a valid OBI ID | OBI lookup service. Example valid OBI IDs: OBI:0000366, OBI:0001177, OBI:0002763 |
| mime_type | A MIME type describing this file | Non-required: Any number of rows after the header can be filled | Value must be a valid MIME type | Common MIME types. Tutorial for bulk MIME type identification. Example valid MIME types: image/jpeg, text/html, application/octet-stream |
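
The field rules in the table above can be sketched as a small pre-submission check. This is an illustrative Python sketch only, not the official CFDE validator: the helper names `describe_file` and `check_rows` are hypothetical, the sketch assumes a single id_namespace and uses the file path as the local_id, and the datetime check relies on Python's `datetime.fromisoformat`, which only approximates the RFC 3339 subset. Controlled-vocabulary fields (file_format, data_type, assay_type) and mime_type are not checked here.

```python
import hashlib
import os
import re
from datetime import datetime

REQUIRED = ("id_namespace", "local_id", "project_id_namespace", "project_local_id")
INTEGER_FIELDS = ("size_in_bytes", "uncompressed_size_in_bytes")
SHA256_RE = re.compile(r"[0-9a-fA-F]{64}")
MD5_RE = re.compile(r"[0-9a-fA-F]{32}")


def describe_file(path, id_namespace, project_local_id):
    """Build a partial file.tsv row (as a dict) from a file on disk."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "id_namespace": id_namespace,
        "local_id": path,  # assumption: the path is unique, so it doubles as local_id
        "project_id_namespace": id_namespace,  # assumption: single-namespace program
        "project_local_id": project_local_id,
        "size_in_bytes": str(os.path.getsize(path)),
        "sha256": digest.hexdigest(),
        "filename": os.path.basename(path),  # no prepended path information
    }


def check_rows(rows):
    """Return a list of human-readable problems found in file.tsv rows."""
    problems = []
    seen = set()
    for number, row in enumerate(rows, start=1):
        # Required fields must be non-empty.
        for field in REQUIRED:
            if not row.get(field):
                problems.append(f"row {number}: {field} is required")
        # local_id must be unique within its id_namespace.
        key = (row.get("id_namespace"), row.get("local_id"))
        if key in seen:
            problems.append(f"row {number}: duplicate local_id in this id_namespace")
        seen.add(key)
        # At least one of sha256 / md5 must be populated and well formed.
        sha256 = row.get("sha256") or ""
        md5 = row.get("md5") or ""
        if not sha256 and not md5:
            problems.append(f"row {number}: sha256 and md5 cannot both be empty")
        if sha256 and not SHA256_RE.fullmatch(sha256):
            problems.append(f"row {number}: sha256 is not a 64-character hex string")
        if md5 and not MD5_RE.fullmatch(md5):
            problems.append(f"row {number}: md5 is not a 32-character hex string")
        # Size fields are optional, but must be plain integers when present.
        for field in INTEGER_FIELDS:
            value = row.get(field) or ""
            if value and not re.fullmatch(r"[0-9]+", value):
                problems.append(f"row {number}: {field} must be an integer")
        # filename must not carry path information.
        if "/" in (row.get("filename") or ""):
            problems.append(f"row {number}: filename must not include a path")
        # creation_time is optional; accept a trailing Z as UTC.
        created = row.get("creation_time") or ""
        if created:
            try:
                datetime.fromisoformat(created.replace("Z", "+00:00"))
            except ValueError:
                problems.append(f"row {number}: creation_time is not a valid datetime")
    return problems
```

To check an existing file, the rows could be read with `csv.DictReader(handle, delimiter="\t")` and passed to `check_rows`; an empty list means none of the checks above failed.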