diff --git a/pages/faq.md b/pages/faq.md
index 6d693a9..8e12c40 100644
--- a/pages/faq.md
+++ b/pages/faq.md
@@ -5,6 +5,10 @@ permalink: /faq/
 toc: true
 ---
 
+
+[camtrapdp]: https://inbo.github.io/camtrapdp/
+[frictionless-py]: https://framework.frictionlessdata.io/
+
 {:id="bboxes"}
 ## How to describe bounding boxes of detected objects?
 
@@ -36,7 +40,7 @@ There are two ways to include additional information (values not covered by the
 
 ### Using tags
 
-Deployment and observation tables include [`deploymentTags`](/data/#deployments.deploymentTags) and [`observationTags`](/data/#observations.observationTags) fields. You can use these fields to store additional information as key:value pairs, separated by a pipe character (|). For example, this is how temperature and snow cover information could be represented in the deployment table:
+Deployment and observation tables include [`deploymentTags`](/data/#deployments.deploymentTags) and [`observationTags`](/data/#observations.observationTags) fields. You can use these fields to store additional information as key:value pairs, separated by a pipe character (`|`). For example, this is how temperature and snow cover information could be represented in the deployment table:
 
 deploymentID | deploymentTags
 --- | ---
@@ -51,50 +55,50 @@ You can add a custom table to the data package to store additional information.
 
 ```json
 {
-    "name": "deployment-measurements",
-    "title": "Deployment measurements",
-    "description": "Table with weather measurements for deployments. Associated with deployments (`deploymentID`).",
-    "fields": [
-        {
-            "name": "deploymentID",
-            "description": "Identifier of the deployment. Foreign key to `deployments.deploymentID`.",
-            "skos:broadMatch": "http://rs.tdwg.org/dwc/terms/parentEventID",
-            "type": "string",
-            "constraints": {
-                "required": true
-            },
-            "example": "dep1"
-        },
-        {
-            "name": "temperature",
-            "description": "Temperature (in Celsius) at the time of the observation.)",
-            "type": "number",
-            "constraints": {
-                "required": false,
-                "minimum": -50,
-                "maximum": 100
-            },
-            "example": 19.5
-        },
-        {
-            "name": "snowCover",
-            "description": "Snow cover present at the time of the observation.",
-            "type": "boolean",
-            "constraints": {
-                "required": false
-            },
-            "example": true
-        }
-    ],
-    "foreignKeys": [
-        {
-            "fields": "deploymentID",
-            "reference": {
-                "resource": "deployments",
-                "fields": "deploymentID"
-            }
-        }
-    ]
+  "name": "deployment-measurements",
+  "title": "Deployment measurements",
+  "description": "Table with weather measurements for deployments. Associated with deployments (`deploymentID`).",
+  "fields": [
+    {
+      "name": "deploymentID",
+      "description": "Identifier of the deployment. Foreign key to `deployments.deploymentID`.",
+      "skos:broadMatch": "http://rs.tdwg.org/dwc/terms/parentEventID",
+      "type": "string",
+      "constraints": {
+        "required": true
+      },
+      "example": "dep1"
+    },
+    {
+      "name": "temperature",
+      "description": "Temperature (in Celsius) at the time of the observation.",
+      "type": "number",
+      "constraints": {
+        "required": false,
+        "minimum": -50,
+        "maximum": 100
+      },
+      "example": 19.5
+    },
+    {
+      "name": "snowCover",
+      "description": "Snow cover present at the time of the observation.",
+      "type": "boolean",
+      "constraints": {
+        "required": false
+      },
+      "example": true
+    }
+  ],
+  "foreignKeys": [
+    {
+      "fields": "deploymentID",
+      "reference": {
+        "resource": "deployments",
+        "fields": "deploymentID"
+      }
+    }
+  ]
 }
 ```
 
@@ -120,6 +124,55 @@ We provide an [R package](https://inbo.github.io/camtrapdp/) to read and manipul
 
 Consult the merge function documentation to understand exactly how specific fields are merged to avoid information loss. Please note that when merging data packages x and y, the [`project$samplingDesign`](/metadata/#project.samplingDesign) field in the resulting package will be set to the value of `project$samplingDesign` from data package x. Therefore, we recommend merging data packages only for projects that use the same sampling design.
 
+{:id="large-tables"}
+## Do I need to use CSV files?
+
+No. Some studies have media and observations tables with over a million records, which may be hard to produce or consume as CSV files. Here are two approaches for formatting large files:
+
+### Gzipped CSV files
+
+Compressing a CSV file often reduces its size substantially. We recommend gzip over zip, as gzip-compressed files can be read directly, without unpacking them first. Compressed CSV files are supported in all versions of Camtrap DP, by [frictionless-py][frictionless-py] and the [camtrapdp][camtrapdp] R package.
+
+1. Compress the file:
+
+    ```
+    gzip media.csv
+    ```
+
+2. Refer to the compressed CSV file in the `datapackage.json` as follows:
+
+    ```json
+    {
+      "name": "media",
+      "path": "media.csv.gz",
+      "profile": "tabular-data-resource",
+      "format": "csv",
+      "mediatype": "text/csv",
+      "encoding": "UTF-8",
+      "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.2/media-table-schema.json"
+    }
+    ```
+
+### Apache Parquet
+
+[Apache Parquet](https://parquet.apache.org/) is an open source data file format, designed for efficient data storage and retrieval. Parquet files are supported in Camtrap DP 1.0.2 and by [frictionless-py][frictionless-py] after installing an [extension](https://framework.frictionlessdata.io/docs/formats/parquet.html), but **not by the [camtrapdp][camtrapdp] R package** (as the format is not yet supported by [its dependency](https://github.com/frictionlessdata/frictionless-r/issues/117)).
+
+1. Create the Parquet file (e.g. with the arrow R package).
+
+2. Refer to the Parquet file in the `datapackage.json` as follows:
+
+    ```json
+    {
+      "name": "media",
+      "path": "media.parquet",
+      "profile": "tabular-data-resource",
+      "format": "parquet",
+      "mediatype": "application/vnd.apache.parquet",
+      "encoding": "UTF-8",
+      "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.2/media-table-schema.json"
+    }
+    ```
+
 {:id="ask"}
 ## Have a question?
 