Commit d203f98

Karolina Kuczkowska authored and committed
added FAQ entry about using Parquet files for data
1 parent 82c707e commit d203f98

1 file changed, +41 -1 lines changed

pages/faq.md

Lines changed: 41 additions & 1 deletion
@@ -36,7 +36,7 @@ There are two ways to include additional information (values not covered by the
 
 ### Using tags
 
-Deployment and observation tables include [`deploymentTags`](/data/#deployments.deploymentTags) and [`observationTags`](/data/#observations.observationTags) fields. You can use these fields to store additional information as key:value pairs, separated by a pipe character (|). For example, this is how temperature and snow cover information could be represented in the deployment table:
+Deployment and observation tables include [`deploymentTags`](/data/#deployments.deploymentTags) and [`observationTags`](/data/#observations.observationTags) fields. You can use these fields to store additional information as key:value pairs, separated by a pipe character (|). For example, this is how temperature and snow cover information could be represented in the deployment table:
 
 deploymentID | deploymentTags
 --- | ---
@@ -120,6 +120,46 @@ We provide an [R package](https://inbo.github.io/camtrapdp/) to read and manipul
 
 Consult the merge function documentation to understand exactly how specific fields are merged to avoid information loss. Please note that when merging data packages x and y, the [`project$samplingDesign`](/metadata/#project.samplingDesign) field in the resulting package will be set to the value of `project$samplingDesign` from data package x. Therefore, we recommend merging data packages only for projects that use the same sampling design.
 
+{:id="parquet"}
+## Can I use Parquet format instead of CSV for very large tables (>1M rows)?
+
+[Apache Parquet](https://parquet.apache.org/) is an open source data file format, designed for efficient data storage and retrieval. `"mediatype": "application/vnd.apache.parquet"` is a [registered media type](https://www.iana.org/assignments/media-types/application/vnd.apache.parquet).
+
+Frictionless framework can be used to read and write Parquet files after installing an [extension](https://framework.frictionlessdata.io/docs/formats/parquet.html).
+As of Camtrap DP [1.0.2](https://github.com/tdwg/camtrap-dp/releases/tag/1.0.2), the standard supports using Parquet files for storing data. This is an example of the `resources` section of the package metadata, adapted for using Parquet format files:
+
+```
+"resources": [
+  {
+    "name": "deployments",
+    "type": "table",
+    "profile": "tabular-data-resource",
+    "path": "deployments.parquet",
+    "format": "parquet",
+    "mediatype": "application/vnd.apache.parquet",
+    "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/deployments-table-schema.json"
+  },
+  {
+    "name": "media",
+    "type": "table",
+    "profile": "tabular-data-resource",
+    "path": "media.parquet",
+    "format": "parquet",
+    "mediatype": "application/vnd.apache.parquet",
+    "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/media-table-schema.json"
+  },
+  {
+    "name": "observations",
+    "type": "table",
+    "profile": "tabular-data-resource",
+    "path": "observations.parquet",
+    "format": "parquet",
+    "mediatype": "application/vnd.apache.parquet",
+    "schema": "https://raw.githubusercontent.com/tdwg/camtrap-dp/1.0.1/observations-table-schema.json"
+  }
+],
+```
+
 {:id="ask"}
 ## Have a question?

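Note on the added FAQ entry: the three resource descriptors differ only in the table name, so the `resources` section could be generated rather than written by hand. The following is a minimal sketch and not part of the commit. The helper names `parquet_resource` and `parquet_resources` are hypothetical, and the schema URL pattern (including the `1.0.1` version segment) is copied from the example in the diff, not from any official tooling.

```python
import json

# Values copied from the FAQ example in this commit; the URL pattern is an
# assumption generalized from the three schema URLs shown there.
SCHEMA_URL = (
    "https://raw.githubusercontent.com/tdwg/camtrap-dp/"
    "{version}/{name}-table-schema.json"
)
PARQUET_MEDIATYPE = "application/vnd.apache.parquet"


def parquet_resource(name, version="1.0.1"):
    """Build one resource descriptor for `<name>.parquet` (hypothetical helper)."""
    return {
        "name": name,
        "type": "table",
        "profile": "tabular-data-resource",
        "path": f"{name}.parquet",
        "format": "parquet",
        "mediatype": PARQUET_MEDIATYPE,
        "schema": SCHEMA_URL.format(version=version, name=name),
    }


def parquet_resources(names=("deployments", "media", "observations"),
                      version="1.0.1"):
    """Build the list that goes under the `resources` key of the package metadata."""
    return [parquet_resource(name, version) for name in names]


if __name__ == "__main__":
    # Print the fragment in the same shape as the FAQ example.
    print(json.dumps({"resources": parquet_resources()}, indent=2))
```

The sketch only produces the metadata. Writing the `.parquet` files themselves is a separate step, for example with the Frictionless extension linked in the FAQ entry.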