## How to describe bounding boxes of detected objects?
@@ -120,45 +124,54 @@ We provide an [R package](https://inbo.github.io/camtrapdp/) to read and manipul
Consult the merge function documentation to understand exactly how specific fields are merged to avoid information loss. Please note that when merging data packages x and y, the [`project$samplingDesign`](/metadata/#project.samplingDesign) field in the resulting package will be set to the value of `project$samplingDesign` from data package x. Therefore, we recommend merging data packages only for projects that use the same sampling design.
-{:id="parquet"}
-## Can I use Parquet format instead of CSV for very large tables (>1M rows)?
+{:id="large-tables"}
+## Do I need to use CSV files?
-[Apache Parquet](https://parquet.apache.org/) is an open source data file format, designed for efficient data storage and retrieval. `"mediatype": "application/vnd.apache.parquet"` is a [registered media type](https://www.iana.org/assignments/media-types/application/vnd.apache.parquet).
+No. Some studies have media and observations tables with over a million records, which may be hard to produce or consume as CSV files. Here are two approaches for formatting large files:
-Frictionless framework can be used to read and write Parquet files after installing an [extension](https://framework.frictionlessdata.io/docs/formats/parquet.html).
-As of Camtrap DP [1.0.2](https://github.com/tdwg/camtrap-dp/releases/tag/1.0.2), the standard supports using Parquet files for storing data. This is an example of the `resources` section of the package metadata, adapted for using Parquet format files:
+By compressing a CSV file, you can often reduce its size substantially. We recommend gzip over zip, as it allows direct file reading. Compressed CSV files are supported in all versions of Camtrap DP, by [frictionless-py][frictionless-py] and by the [camtrapdp][camtrapdp] R package.
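As a rough illustration of the "direct file reading" point, the following Python sketch writes and reads a gzip-compressed CSV using only the standard library. It is not part of Camtrap DP or frictionless-py: the helper names `write_gzipped_csv` and `read_gzipped_csv` and the two-column table are hypothetical, chosen only for illustration.

```python
import csv
import gzip
import os
import tempfile


def write_gzipped_csv(path, header, rows):
    """Write a header and rows straight into a gzip-compressed CSV file."""
    with gzip.open(path, "wt", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)


def read_gzipped_csv(path):
    """Read records from a gzip-compressed CSV without unpacking it to disk."""
    with gzip.open(path, "rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


# Hypothetical two-row table; the columns are illustrative, not a full schema.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "media.csv.gz")
write_gzipped_csv(path, ["mediaID", "deploymentID"], [["m1", "d1"], ["m2", "d1"]])
records = read_gzipped_csv(path)
```

A gzip file is a single compressed stream, so it can be opened like an ordinary text file. A zip archive would first have to be opened and a member selected before any rows could be read.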
+
+1. Compress the file:
+
+```
+gzip media.csv
+```
+
+2. Refer to the compressed CSV file in the `datapackage.json` as follows:
+[Apache Parquet](https://parquet.apache.org/) is an open source data file format, designed for efficient data storage and retrieval. Parquet files are supported as of Camtrap DP 1.0.2 and by [frictionless-py][frictionless-py] after installing an [extension](https://framework.frictionlessdata.io/docs/formats/parquet.html), but **not by the [camtrapdp][camtrapdp] R package** (as Parquet is not yet supported by [its dependency](https://github.com/frictionlessdata/frictionless-r/issues/117)).
+
+1. Create the Parquet file (e.g. with the arrow R package).
+
+2. Refer to the Parquet file in the `datapackage.json` as follows: