Commit d49b889

Collection-level updates
1 parent 02576c6 commit d49b889

10 files changed, with 92 additions and 61 deletions


CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -11,11 +11,13 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

 - Property `category`
 - Property `determination_details`
+- Information about the encoding of datatypes at the collection-level

 ### Changed

 - Switched from v0.1.0 to v0.2.0 of the schema language
 - Renamed `fiboa_extensions` to `schemas`
+- GeoJSON: Switched `contentEncoding` for data type `binary` from `binary` to `base64`

 ### Deprecated

best-practices/README.md

Lines changed: 7 additions & 0 deletions
@@ -7,3 +7,10 @@

 All properties should be using snake case.
 For example a field for a land-use class should be named `landuse_class` instead of `landuseClass`.
+
+## Extension prefixes
+
+All properties in an extension should have a common prefix.
+Extensions commonly use the colon (`:`) as separator between prefix and property name, e.g. `crop:name`.
+A single underscore (`_`) should be avoided to prevent conflicts with other property names (see [Casing](#casing)).
+Nevertheless, the separator can be chosen freely by extension authors.
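
The prefix convention in this hunk can be illustrated with a small Python sketch. The helper names `split_prefix` and `shares_prefix` are hypothetical and not part of fiboa or any tooling; the sketch only shows why a colon separates cleanly while an underscore collides with snake-case names.

```python
from typing import List, Optional, Tuple


def split_prefix(name: str, separator: str = ":") -> Tuple[Optional[str], str]:
    """Split a property name into (prefix, local name).

    The prefix is None when the separator does not occur in the name.
    """
    prefix, found, local = name.partition(separator)
    if not found:
        return None, name
    return prefix, local


def shares_prefix(names: List[str], prefix: str, separator: str = ":") -> bool:
    """Return True if every property name carries the given prefix."""
    return all(split_prefix(n, separator)[0] == prefix for n in names)


# With ":" a snake-case core property is not mistaken for a prefixed one.
print(split_prefix("crop:name"))           # prefixed extension property
print(split_prefix("landuse_class"))       # no prefix detected
# With "_" the same snake-case name would be split as if "landuse" were a prefix.
print(split_prefix("landuse_class", "_"))
```

Because the separator may be chosen freely, the sketch takes it as a parameter instead of hard-coding the colon.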

core/README.md

Lines changed: 12 additions & 2 deletions
@@ -23,6 +23,16 @@ This allows to define a clear mapping between the core specification and its enc
 - [Data types](https://github.com/fiboa/schema/blob/v0.2.0/datatypes.md)
 - [Vocabulary](https://github.com/fiboa/schema/blob/v0.2.0/README.md#vocabulary)

+## Collections
+
+A Collection is a group of one or more features with a unique identifier (see property `collection`).
+
+Each collection must have a single set of applicable schemas.
+
+Any property that has the same value across all features can be de-duplicated to the collection-level
+if more than two features are available for the collection.
+The specific location and behaviour of collection-level data is specified in the encoding-specific specifications.
+
 ## General Properties

 | Property Name | Data Type | Description |
@@ -33,10 +43,10 @@ This allows to define a clear mapping between the core specification and its enc
 | category | array\<string> | A set of categories the field boundary belongs to. |

 **schemas:** The schemas the collection implements. Must be URLs to the schema YAML files.
-
 The schema for this specification (see above) is required to be provided.

-**collection:** The collection identifier is usually only needed for merged datasets.
+**collection:** The collection identifier is usually only needed for merged datasets and it is **required** in this case.
+A validator can't check whether the `collection` property is required; the data providers must ensure this.

 **category:** Choose any (unique) combination of the following values:
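
The de-duplication rule from the Collections section in this file can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function name `deduplicate` is hypothetical, features are represented as plain property dictionaries, and the threshold follows the wording "more than two features" literally.

```python
from typing import Any, Dict, List, Tuple


def deduplicate(
    features: List[Dict[str, Any]], min_features: int = 3
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Move properties shared by all features to the collection level.

    Returns (collection_level_properties, per_feature_properties).
    Nothing is moved when fewer than `min_features` features exist.
    """
    if len(features) < min_features:
        return {}, [dict(f) for f in features]

    first = features[0]
    # A property qualifies only if every feature has it with an equal value.
    common = {
        key: value
        for key, value in first.items()
        if all(key in f and f[key] == value for f in features[1:])
    }
    remaining = [
        {k: v for k, v in f.items() if k not in common} for f in features
    ]
    return common, remaining


features = [
    {"collection": "de_nrw", "license": "dl-de/by-2-0", "area": 1.0},
    {"collection": "de_nrw", "license": "dl-de/by-2-0", "area": 2.0},
    {"collection": "de_nrw", "license": "dl-de/by-2-0", "area": 3.0},
]
common, remaining = deduplicate(features)
print(common)
print(remaining)
```

Where the collection-level properties are then stored depends on the encoding, as the text notes.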

core/schema/schema.yaml

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ properties:
       type: string
       format: uri
     contains:
+      type: string
       enum:
         - https://fiboa.github.io/specification/v0.2.0/schema.yaml
   id:
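
As a rough illustration of what the `contains` rule in this hunk expresses, the following Python sketch checks that a `schemas` array includes the core schema URL. It assumes that `contains` carries the JSON Schema meaning (at least one array item must match the sub-schema); the function name `includes_core_schema` is hypothetical.

```python
from typing import Any, Iterable

# URL taken from the enum in the schema hunk above.
CORE_SCHEMA = "https://fiboa.github.io/specification/v0.2.0/schema.yaml"


def includes_core_schema(schemas: Iterable[Any]) -> bool:
    """Return True if at least one item is a string equal to the core schema URL."""
    return any(isinstance(item, str) and item == CORE_SCHEMA for item in schemas)


print(includes_core_schema([
    CORE_SCHEMA,
    "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml",
]))
print(includes_core_schema([
    "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml",
]))
```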

geojson/README.md

Lines changed: 11 additions & 4 deletions
@@ -14,11 +14,18 @@ The generic GeoJSON format is defined in

 ## FeatureCollection

-A FeatureCollection may have a top-level property named `fiboa`.
-If present, it contains all properties that are common across the features.
-In validation they must be copied to the `properties` in each Feature.
+A FeatureCollection may have a top-level property named `fiboa` to contain all collection-level data.
+If present, it contains all properties that are common across the features
+and the features shall not contain those properties.
+Validation must ensure that the collection-level properties are taken into account.
 All features in a FeatureCollection must be fiboa-compliant.

+The following properties can't be collection-level properties:
+
+- `id`
+- `geometry`
+- `bbox`
+
 ## Feature

 Each [fiboa Feature](../core/README.md#features) must be a valid
@@ -42,7 +49,7 @@ The following properties are defined for a GeoJSON Feature (at the top-level of

 ### `properties`

-Must include any property that is required by the fiboa core specification (currently `fiboa_version`).
+Must include any property that is required by the fiboa core specification.
 May include any additional property.
 All properties defined by the core specification (except for `id`, `geometry` and `bbox`) or extensions should be provided here.
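
One way a validator could take the collection-level properties of a FeatureCollection into account is sketched below in Python. This is an assumption about how the rules in this file might be applied, not the behaviour of an existing fiboa validator; `resolve_features` and `NOT_COLLECTION_LEVEL` are hypothetical names.

```python
from typing import Any, Dict, List

# Properties that the text above excludes from the collection level.
NOT_COLLECTION_LEVEL = ("id", "geometry", "bbox")


def resolve_features(feature_collection: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return copies of the features with collection-level properties merged in."""
    collection_props = feature_collection.get("fiboa", {})
    for name in NOT_COLLECTION_LEVEL:
        if name in collection_props:
            raise ValueError(f"'{name}' can't be a collection-level property")

    resolved = []
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        # Features shall not repeat properties given at the collection level.
        duplicated = sorted(set(props) & set(collection_props))
        if duplicated:
            raise ValueError(
                f"feature {feature.get('id')!r} repeats collection-level "
                f"properties: {duplicated}"
            )
        merged = dict(feature)
        merged["properties"] = {**collection_props, **props}
        resolved.append(merged)
    return resolved


fc = {
    "type": "FeatureCollection",
    "fiboa": {"collection": "de_nrw", "license": "dl-de/by-2-0"},
    "features": [
        {"type": "Feature", "id": "2713", "geometry": None,
         "properties": {"area": 1.8975}},
    ],
}
print(resolve_features(fc)[0]["properties"])
```

Each resolved feature can then be validated on its own against the applicable schemas.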

geojson/datatypes.md

Lines changed: 22 additions & 22 deletions
@@ -3,28 +3,28 @@
 The following table shows the data types that are used by fiboa in the Property definitions.
 It also shows the mapping to the GeoJSON data types.

-| fiboa data type | (Geo)JSON |
-| --------------------------------------------------- | ------------------------------------------------------------ |
-| boolean | boolean |
-| int8 | integer<br />minimum: -128<br />maximum: 127 |
-| uint8 | integer<br />minimum: 0<br />maximum: 255 |
-| int16 | integer<br />minimum: -32768<br />maximum: 32767 |
-| uint16 | integer<br />minimum: 0<br />maximum: 65535 |
-| int32 | integer<br />minimum: -2147483648<br />maximum: 2147483647 |
-| uint32 | integer<br />minimum: 0<br />maximum: 4294967295 |
-| int64 | integer<br />minimum: -9223372036854775808<br />maximum: 9223372036854775807 |
-| uint64 | integer<br />minimum: 0<br />maximum: 18446744073709551615 |
-| float<br />IEEE 32-bit | number<br />minimum: ?<br />maximum: ? |
-| double<br />IEEE 64-bit | number<br />minimum: ?<br />maximum: ? |
-| binary | string<br />contentEncoding: binary |
-| string<br />charset: UTF-8 | string |
-| array | array |
-| object<br />keys: string<br />values: any | object<br />additionalProperties: false |
-| date | string<br />format: date |
-| date-time<br />with milliseconds<br />timezone: UTC | string<br />format: date-time<br />pattern: Z$ |
-| geometry | [object with schema](https://geojson.org/schema/Geometry.json) |
-| bounding-box<br />x and y only, no z | array<br />minItems: 4<br />maxItems: 4<br />items: number |
-| *required* (not a datatype) | null |
+| fiboa data type | (Geo)JSON | Collection-level |
+| --------------------------------------------------- | ------------------------------------------------------------ | ---------------- |
+| boolean | boolean | yes |
+| int8 | integer<br />minimum: -128<br />maximum: 127 | yes |
+| uint8 | integer<br />minimum: 0<br />maximum: 255 | yes |
+| int16 | integer<br />minimum: -32768<br />maximum: 32767 | yes |
+| uint16 | integer<br />minimum: 0<br />maximum: 65535 | yes |
+| int32 | integer<br />minimum: -2147483648<br />maximum: 2147483647 | yes |
+| uint32 | integer<br />minimum: 0<br />maximum: 4294967295 | yes |
+| int64 | integer<br />minimum: -9223372036854775808<br />maximum: 9223372036854775807 | yes |
+| uint64 | integer<br />minimum: 0<br />maximum: 18446744073709551615 | yes |
+| float<br />IEEE 32-bit | number<br />minimum: ?<br />maximum: ? | yes |
+| double<br />IEEE 64-bit | number<br />minimum: ?<br />maximum: ? | yes |
+| binary | string<br />contentEncoding: base64 | yes |
+| string<br />charset: UTF-8 | string | yes |
+| array | array | yes |
+| object<br />keys: string<br />values: any | object<br />additionalProperties: false | yes |
+| date | string<br />format: date | yes |
+| date-time<br />with milliseconds<br />timezone: UTC | string<br />format: date-time<br />pattern: Z$ | yes |
+| geometry | [object with schema](https://geojson.org/schema/Geometry.json) | no |
+| bounding-box<br />x and y only, no z | array<br />minItems: 4<br />maxItems: 4<br />items: number | no |
+| *if a property is not required* | null | yes |

 ## Potential issues in conversion
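
The switch of `contentEncoding` for `binary` from `binary` to `base64` means binary values travel as base64 text inside the JSON document. A minimal Python sketch using only the standard library follows; the helper names `encode_binary` and `decode_binary` and the property name `thumbnail` are hypothetical.

```python
import base64
import json


def encode_binary(value: bytes) -> str:
    """Encode raw bytes as a base64 string that is safe to embed in JSON."""
    return base64.b64encode(value).decode("ascii")


def decode_binary(text: str) -> bytes:
    """Decode a base64 string back to bytes, rejecting non-base64 characters."""
    return base64.b64decode(text, validate=True)


raw = bytes([0, 255, 65, 66])
properties = {"thumbnail": encode_binary(raw)}  # hypothetical binary property
document = json.dumps(properties)

# Round trip: the decoded value equals the original bytes.
restored = decode_binary(json.loads(document)["thumbnail"])
print(restored == raw)
```

Raw bytes cannot be placed in a JSON string directly, which is why a text encoding such as base64 is needed here.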

geojson/examples/featurecollection/features.json

Lines changed: 8 additions & 7 deletions
@@ -1,9 +1,10 @@
 {
   "fiboa": {
-    "fiboa_extensions": [
+    "schemas": [
       "https://fiboa.github.io/specification/v0.2.0/schema.yaml",
       "https://fiboa.github.io/inspire-extension/v0.2.0/schema.yaml",
-      "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml"
+      "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml",
+      "https://fiboa.github.io/crop-extension/v0.1.0/schema.yaml"
     ],
     "collection": "de_nrw",
     "license": "dl-de/by-2-0",
@@ -18,8 +19,8 @@
         "inspire:id": "https://geodaten.nrw.de/id/inspire-lc-fb/landcoverunit/12324",
         "flik": "DENWLI0542130247",
         "determination_datetime": "2005-02-28T00:00:00Z",
-        "nutz_code": "A",
-        "nutz_txt": "Ackerland",
+        "crop:code": "A",
+        "crop:name": "Ackerland",
         "area": 1.631100058555603
       },
       "geometry": {
@@ -86,9 +87,9 @@
       "properties": {
         "inspire:id": "https://geodaten.nrw.de/id/inspire-lc-fb/landcoverunit/2713",
         "flik": "DENWLI0540210084",
-        "determination_datetime": "2005-02-28T00:00:00Z",
-        "nutz_code": "A",
-        "nutz_txt": "Ackerland",
+        "determination_datetime": "2005-02-22T00:00:00Z",
+        "crop:code": "W",
+        "crop:name": "Weide",
         "area": 1.8975000381469727
       },
       "geometry": {

geojson/examples/individual-features/2713.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
   "id": "2713",
   "type": "Feature",
   "properties": {
-    "fiboa_extensions": [
+    "schemas": [
       "https://fiboa.github.io/specification/v0.2.0/schema.yaml",
       "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml"
     ],

geoparquet/README.md

Lines changed: 5 additions & 2 deletions
@@ -17,8 +17,6 @@ We aim to support any future version of GeoParquet, too.
 The GeoParquet file must embed the collection-level metadata
 in the Parquet metadata in a property named `fiboa`.

-It is recommended to additionally provide the fiboa Collection as a separate JSON file, too.
-
 ## Features

 Each [fiboa Feature](../core/README.md#features) corresponds to a row in a GeoParquet file.
@@ -31,3 +29,8 @@ i.e. the column can be missing from the GeoParquet file.

 The mapping between the Parquet data types and the fiboa data types, can be found in the
 [data type mapping](datatypes.md).
+
+## Best practices
+
+For data with a lot of repetition, brotli compression is recommended.
+This applies particularly to merged datasets that don't deduplicate properties to the collection-level.

geoparquet/datatypes.md

Lines changed: 23 additions & 23 deletions
@@ -3,28 +3,28 @@
 The following table shows the data types that are used by fiboa in the Property definitions.
 It also shows the mapping to the GeoParquet data types.

-| fiboa Schema data type | (Geo)Parquet |
-| --------------------------------------------------- | ------------------------------------------------------------ |
-| boolean | BOOLEAN |
-| int8 | IntType<br />bitWidth: 8<br />isSigned: true<br />(deprecated: INT_8) |
-| uint8 | IntType<br />bitWidth: 8<br />isSigned: false<br />(deprecated: UINT_8) |
-| int16 | IntType<br />bitWidth: 16<br />isSigned: true<br />(deprecated: INT_16) |
-| uint16 | IntType<br />bitWidth: 16<br />isSigned: false<br />(deprecated: UINT_16) |
-| int32 | IntType<br />bitWidth: 32<br />isSigned: true<br />(deprecated: INT_32) |
-| uint32 | IntType<br />bitWidth: 32<br />isSigned: false<br />(deprecated: UINT_32) |
-| int64 | IntType<br />bitWidth: 64<br />isSigned: true<br />(deprecated: INT_64) |
-| uint64 | IntType<br />bitWidth: 64<br />isSigned: false<br />(deprecated: UINT_64) |
-| float<br />IEEE 32-bit | FLOAT |
-| double<br />IEEE 64-bit | DOUBLE |
-| binary | BYTE_ARRAY |
-| string<br />charset: UTF-8 | STRING (BYTE_ARRAY) |
-| array | LIST |
-| object<br />keys: string<br />values: any | STRUCT / MAP |
-| date | DATE (INT32) |
-| date-time<br />with milliseconds<br />timezone: UTC | TimestampType (INT64)<br />isAdjustedToUTC: true<br />unit: MILLIS<br />(deprecated: TIMESTAMP_MILLIS) |
-| geometry | BYTE_ARRAY<br />encoded as WKB |
-| bounding-box<br />x and y only, no z | STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT) |
-| *if a field is not required* | [Nullity](https://parquet.apache.org/docs/file-format/nulls/) |
+| fiboa Schema data type | (Geo)Parquet | Collection-level |
+| --------------------------------------------------- | ------------------------------------------------------------ | ------------------------------- |
+| boolean | BOOLEAN | yes |
+| int8 | IntType<br />bitWidth: 8<br />isSigned: true<br />(deprecated: INT_8) | yes |
+| uint8 | IntType<br />bitWidth: 8<br />isSigned: false<br />(deprecated: UINT_8) | yes |
+| int16 | IntType<br />bitWidth: 16<br />isSigned: true<br />(deprecated: INT_16) | yes |
+| uint16 | IntType<br />bitWidth: 16<br />isSigned: false<br />(deprecated: UINT_16) | yes |
+| int32 | IntType<br />bitWidth: 32<br />isSigned: true<br />(deprecated: INT_32) | yes |
+| uint32 | IntType<br />bitWidth: 32<br />isSigned: false<br />(deprecated: UINT_32) | yes |
+| int64 | IntType<br />bitWidth: 64<br />isSigned: true<br />(deprecated: INT_64) | yes |
+| uint64 | IntType<br />bitWidth: 64<br />isSigned: false<br />(deprecated: UINT_64) | yes |
+| float<br />IEEE 32-bit | FLOAT | yes |
+| double<br />IEEE 64-bit | DOUBLE | yes |
+| binary | BYTE_ARRAY | as string, base64-encoded |
+| string<br />charset: UTF-8 | STRING (BYTE_ARRAY) | yes |
+| array | LIST | yes |
+| object<br />keys: string<br />values: any | STRUCT / MAP | yes |
+| date | DATE (INT32) | as string, compliant to ISO8601 |
+| date-time<br />with milliseconds<br />timezone: UTC | TimestampType (INT64)<br />isAdjustedToUTC: true<br />unit: MILLIS<br />(deprecated: TIMESTAMP_MILLIS) | as string, compliant to ISO8601 |
+| geometry | BYTE_ARRAY<br />encoded as WKB | no |
+| bounding-box<br />x and y only, no z | STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT) | no |
+| *if a property is not required* | [Nullity](https://parquet.apache.org/docs/file-format/nulls/) | yes |

 The integer data types and the data type string can also be mapped to the ENUM data type in Parquet
 if a pre-defined set of values is available.
@@ -45,4 +45,4 @@ The following data types occur in Parquet, but are not currently supported in fi

 ## Potential issues in conversion

-- The micro/nanosecond precision of Datetime / Times may got lost
+- The micro/nanosecond precision of Datetime / Times may get lost
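
The "Collection-level" column of the GeoParquet table says that binary values become base64 strings and that dates and date-times become ISO 8601 strings. A minimal Python sketch of that conversion is given below. It assumes the collection-level metadata is serialized as JSON, which the diff does not state explicitly; `to_collection_value` is a hypothetical helper name.

```python
import base64
import json
from datetime import date, datetime, timezone
from typing import Any


def to_collection_value(value: Any) -> Any:
    """Convert a Python value to its collection-level representation."""
    if isinstance(value, (bytes, bytearray)):
        # binary: as string, base64-encoded
        return base64.b64encode(bytes(value)).decode("ascii")
    if isinstance(value, datetime):
        # Checked before `date` because datetime is a subclass of date.
        if value.tzinfo is None:
            raise ValueError("date-time values must be timezone-aware")
        utc = value.astimezone(timezone.utc)
        # date-time: ISO 8601 with milliseconds, UTC written as "Z"
        return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    if isinstance(value, date):
        # date: ISO 8601 calendar date
        return value.isoformat()
    return value


collection = {
    "collection": "de_nrw",
    "determination_datetime": datetime(2005, 2, 28, tzinfo=timezone.utc),
}
encoded = {key: to_collection_value(val) for key, val in collection.items()}
print(json.dumps(encoded))
```

Geometry and bounding-box values are not handled because the table marks them as not allowed at the collection level.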
