143 changes: 98 additions & 45 deletions pages/faq.md
permalink: /faq/
toc: true
---

<!-- References -->
[camtrapdp]: https://inbo.github.io/camtrapdp/
[frictionless-py]: https://framework.frictionlessdata.io/

{:id="bboxes"}
## How to describe bounding boxes of detected objects?

There are two ways to include additional information (values not covered by the

### Using tags

Deployment and observation tables include [`deploymentTags`](/data/#deployments.deploymentTags) and [`observationTags`](/data/#observations.observationTags) fields. You can use these fields to store additional information as key:value pairs, separated by a pipe character (`|`). For example, this is how temperature and snow cover information could be represented in the deployment table:

deploymentID | deploymentTags
--- | ---
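
Because tags are stored as plain text, they are easy to split programmatically. The Python sketch below is a hypothetical helper, not part of Camtrap DP, frictionless-py or camtrapdp. It assumes tags are separated by `|` (with optional surrounding spaces) and that a key is separated from its value by the first `:`. Tags without a colon are kept with the value `None`, and the sample string is made up for illustration.

```python
def parse_tags(value):
    """Parse a pipe-separated tag string into a dict (hypothetical helper).

    Assumes key:value pairs separated by '|'; whitespace around tags,
    keys and values is ignored. Tags without ':' map to None.
    Values are returned as strings; convert them yourself if needed.
    """
    tags = {}
    if not value:
        return tags
    for tag in value.split("|"):
        tag = tag.strip()
        if not tag:
            continue  # skip empty segments, e.g. from a trailing '|'
        key, sep, val = tag.partition(":")  # split on the first ':' only
        tags[key.strip()] = val.strip() if sep else None
    return tags


def format_tags(tags):
    """Inverse of parse_tags: build a 'key:value | key:value' string."""
    return " | ".join(k if v is None else f"{k}:{v}" for k, v in tags.items())


print(parse_tags("temperature:19.5 | snowCover:true"))
# {'temperature': '19.5', 'snowCover': 'true'}
```

Splitting on the first `:` only means a value may itself contain colons (for example a time of day).
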
You can add a custom table to the data package to store additional information.

```json
{
  "name": "deployment-measurements",
  "title": "Deployment measurements",
  "description": "Table with weather measurements for deployments. Associated with deployments (`deploymentID`).",
  "fields": [
    {
      "name": "deploymentID",
      "description": "Identifier of the deployment. Foreign key to `deployments.deploymentID`.",
      "skos:broadMatch": "http://rs.tdwg.org/dwc/terms/parentEventID",
      "type": "string",
      "constraints": {
        "required": true
      },
      "example": "dep1"
    },
    {
      "name": "temperature",
      "description": "Temperature (in Celsius) at the time of the observation.",
      "type": "number",
      "constraints": {
        "required": false,
        "minimum": -50,
        "maximum": 100
      },
      "example": 19.5
    },
    {
      "name": "snowCover",
      "description": "Snow cover present at the time of the observation.",
      "type": "boolean",
      "constraints": {
        "required": false
      },
      "example": true
    }
  ],
  "foreignKeys": [
    {
      "fields": "deploymentID",
      "reference": {
        "resource": "deployments",
        "fields": "deploymentID"
      }
    }
  ]
}
```
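
Before publishing, it can help to check that the custom table actually satisfies this schema. The sketch below is a hand-rolled illustration that uses only the Python standard library. It is not the validation API of frictionless-py or camtrapdp. It assumes the custom table is a CSV file with the three columns defined above and checks the `required` and `minimum`/`maximum` constraints, the boolean type (using Table Schema's default true/false values) and the foreign key to `deployments.deploymentID`. The function name `check_measurements` and the sample rows are hypothetical.

```python
import csv
import io

# Table Schema's default lexical values for booleans
TRUE_VALUES = {"true", "True", "TRUE", "1"}
FALSE_VALUES = {"false", "False", "FALSE", "0"}


def check_measurements(csv_text, deployment_ids):
    """Check a deployment-measurements CSV against the schema above.

    Returns a list of human-readable problems (empty if none were found).
    """
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Line 1 is the header, so the first data row is line 2
    # (assuming no quoted values span multiple lines).
    for line, row in enumerate(reader, start=2):
        # deploymentID: required, and must exist in the deployments table
        dep = (row.get("deploymentID") or "").strip()
        if not dep:
            errors.append(f"line {line}: deploymentID is required")
        elif dep not in deployment_ids:
            errors.append(f"line {line}: unknown deploymentID {dep!r}")

        # temperature: optional number between -50 and 100
        temp = (row.get("temperature") or "").strip()
        if temp:
            try:
                value = float(temp)
            except ValueError:
                errors.append(f"line {line}: temperature {temp!r} is not a number")
            else:
                if not -50 <= value <= 100:
                    errors.append(f"line {line}: temperature {value} is outside [-50, 100]")

        # snowCover: optional boolean
        snow = (row.get("snowCover") or "").strip()
        if snow and snow not in TRUE_VALUES | FALSE_VALUES:
            errors.append(f"line {line}: snowCover {snow!r} is not a boolean")
    return errors


sample = "deploymentID,temperature,snowCover\ndep1,19.5,true\ndep2,120,maybe\n"
for problem in check_measurements(sample, deployment_ids={"dep1"}):
    print(problem)
```

For real packages, validating with frictionless-py is more complete. This sketch only shows what the constraints and the foreign key in the schema mean in practice.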

We provide an [R package](https://inbo.github.io/camtrapdp/) to read and manipul

Consult the documentation of the merge function to understand exactly how each field is merged and to avoid information loss. Note that when merging data packages x and y, the [`project$samplingDesign`](/metadata/#project.samplingDesign) field in the resulting package is set to the value of `project$samplingDesign` from data package x. We therefore recommend merging data packages only for projects that use the same sampling design.
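
If you are unsure whether two packages share a sampling design, you can compare this field before merging. The following is a minimal Python sketch. It assumes both `datapackage.json` files have already been loaded into dictionaries (for example with `json.load`) and that the value is the `samplingDesign` property of the `project` object. The function name is hypothetical and is not part of the camtrapdp R package, and the two sample packages are made up.

```python
import json


def sampling_designs_match(package_x, package_y):
    """Return True if both packages declare the same project.samplingDesign.

    Packages without the field are treated as having the value None.
    """
    design_x = package_x.get("project", {}).get("samplingDesign")
    design_y = package_y.get("project", {}).get("samplingDesign")
    if design_x != design_y:
        # A merged package would silently keep the value from package x
        print(f"Warning: samplingDesign differs ({design_x!r} vs {design_y!r}); "
              f"a merged package would keep {design_x!r} from package x.")
        return False
    return True


x = json.loads('{"project": {"samplingDesign": "simpleRandom"}}')
y = json.loads('{"project": {"samplingDesign": "targeted"}}')
sampling_designs_match(x, y)
```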

{:id="large-tables"}
## Do I need to use CSV files?

No. Some studies have media and observations tables with over a million records, which can be hard to produce or consume as plain CSV files. Here are two approaches for storing such large tables:

### Gzipped CSV files

Compressing a CSV file often reduces its size considerably. We recommend gzip over zip, because gzipped files can be read directly, without extracting them first. Gzipped CSV files are supported in all versions of Camtrap DP and can be read by [frictionless-py][frictionless-py] and the [camtrapdp][camtrapdp] R package.

1. Compress the file. This replaces `media.csv` with `media.csv.gz`:

   ```shell
   gzip media.csv
   ```

2. Refer to the compressed CSV file in the `datapackage.json` as follows:

   ```json
   {
     "name": "media",
     "path": "media.csv.gz",
     "profile": "tabular-data-resource",
     "format": "csv",
     "mediatype": "text/csv",
     "encoding": "UTF-8",
     "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.2/media-table-schema.json"
   }
   ```
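
The compression step can also be scripted. The Python sketch below uses only the standard library (`gzip`, `shutil`, `csv`). It compresses a CSV file as `gzip media.csv` does, but keeps the original file, whereas the `gzip` command replaces it by default. It then reads the compressed file row by row without extracting it first, which is the direct reading that makes gzip preferable to zip. The function names and the toy media table are hypothetical.

```python
import csv
import gzip
import os
import shutil
import tempfile


def gzip_file(src):
    """Compress src to src + '.gz', keeping the original (like `gzip -k`)."""
    dst = src + ".gz"
    with open(src, "rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst


def read_gzipped_csv(path):
    """Yield the rows of a gzipped CSV as dicts, decompressing on the fly."""
    # "rt" = text mode; newline="" is what the csv module expects
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        yield from csv.DictReader(f)


# Usage with a toy media table in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "media.csv")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("mediaID,deploymentID\nm1,dep1\nm2,dep1\n")
    for row in read_gzipped_csv(gzip_file(src)):
        print(row["mediaID"], row["deploymentID"])
```

Because rows are streamed, memory use stays flat even for tables with millions of records.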

### Apache Parquet

[Apache Parquet](https://parquet.apache.org/) is an open source data file format designed for efficient data storage and retrieval. Parquet files are supported in Camtrap DP 1.0.2 and can be read by [frictionless-py][frictionless-py] after installing its [Parquet extension](https://framework.frictionlessdata.io/docs/formats/parquet.html), but **not by the [camtrapdp][camtrapdp] R package**, because [its dependency](https://github.com/frictionlessdata/frictionless-r/issues/117) does not yet support Parquet.

1. Create the Parquet file (e.g. with the arrow R package).

2. Refer to the Parquet file in the `datapackage.json` as follows:

   ```json
   {
     "name": "media",
     "path": "media.parquet",
     "profile": "tabular-data-resource",
     "format": "parquet",
     "mediatype": "application/vnd.apache.parquet",
     "encoding": "UTF-8",
     "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.2/media-table-schema.json"
   }
   ```
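
If you generate `datapackage.json` programmatically, the gzipped CSV and Parquet resource entries shown in this section differ only in `path`, `format` and `mediatype`. The following is a hypothetical Python helper, not part of frictionless-py or camtrapdp, that builds either entry from the file name. It assumes the same profile, encoding and 1.0.2 schema URL as the examples, and that each table's schema file is named `<name>-table-schema.json`, as it is for `media`.

```python
import json

SCHEMA_BASE = "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.2/"

# (format, mediatype) per file suffix: the two variants shown in this
# section, plus plain CSV, which uses the same values as gzipped CSV
FORMATS = {
    ".csv.gz": ("csv", "text/csv"),
    ".csv": ("csv", "text/csv"),
    ".parquet": ("parquet", "application/vnd.apache.parquet"),
}


def resource_entry(name, path):
    """Build a resource entry for a CSV, gzipped CSV or Parquet file."""
    for suffix, (fmt, mediatype) in FORMATS.items():
        if path.endswith(suffix):
            break
    else:
        raise ValueError(f"unsupported file type: {path}")
    return {
        "name": name,
        "path": path,
        "profile": "tabular-data-resource",
        "format": fmt,
        "mediatype": mediatype,
        "encoding": "UTF-8",
        "schema": f"{SCHEMA_BASE}{name}-table-schema.json",
    }


print(json.dumps(resource_entry("media", "media.parquet"), indent=2))
```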

{:id="ask"}
## Have a question?
