diff --git a/.github/workflows/test.yaml b/.github/workflows/test.yaml index 43c1462..b7f9e3c 100644 --- a/.github/workflows/test.yaml +++ b/.github/workflows/test.yaml @@ -3,7 +3,7 @@ on: - push - pull_request jobs: - deploy: + docs: runs-on: ubuntu-latest steps: - uses: actions/setup-python@v5 @@ -15,7 +15,30 @@ jobs: pip install pipenv pipenv install - run: pipenv run test-docs + schema: + runs-on: ubuntu-latest + steps: + - uses: actions/setup-python@v5 + with: + python-version: '>=3.9' + - uses: actions/checkout@v4 + - name: Install pipenv + run: | + pip install pipenv + pipenv install - run: pipenv run test-schema + examples: + runs-on: ubuntu-latest + needs: schema + steps: + - uses: actions/setup-python@v5 + with: + python-version: '>=3.9' + - uses: actions/checkout@v4 + - name: Install pipenv + run: | + pip install pipenv + pipenv install - run: pipenv run test-geojson-features - run: pipenv run test-geojson-collection - - run: pipenv run test-geoparquet \ No newline at end of file + - run: pipenv run test-geoparquet diff --git a/CHANGELOG.md b/CHANGELOG.md index 07ee2b6..77a7c73 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -11,27 +11,34 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0. - Property `category` - Property `determination_details` +- Information about the encoding of datatypes at the collection-level ### Changed -- ... - -### Deprecated - -- ... 
+- Switched from v0.1.0 to v0.2.0 of the schema language +- Renamed `fiboa_extensions` to `schemas` +- Schemas must be valid HTTP(S) URLs +- GeoParquet: Renamed Parquet metadata key from `fiboa` to `collection` +- GeoJSON: Switched `contentEncoding` for data type `binary` from `binary` to `base64` +- GeoJSON data types: `null` is not allowed any longer, instead omit the property +- GeoJSON FeatureCollection: Collection-level data is provided at the top-level, not in a `fiboa` property ### Removed -- Value `administrative` was removed from `determination_method` in favour of the new property `category` +- Value `administrative` was removed from `determination_method` in favor of the new property `category` +- `fiboa_version` in favor of adding the schema URL of the specification to `schemas`. +- GeoJSON Feature: `links` property ### Fixed - Various minor clarifications and editorial enhancements -- GeoParquet encoding: Properties that are optional can be omitted if all values are null values -- GeoJSON encoding: Clarify the encoding of the top-level properties (including `links` and `fiboa`) -- GeoJSON encoding: Clarify the use of RFC 7946 -- GeoParquet encoding for bounding boxes and objects - Added descriptions to the allowed values for `determination_method` +- Clarified handling of missing values +- GeoJSON: Clarify the encoding of the top-level properties (including `links` and `fiboa`) +- GeoJSON: Clarify the use of RFC 7946 +- GeoParquet: Properties that are optional can be omitted if all values are null values +- GeoParquet: Added encoding for bounding boxes and objects +- GeoParquet: Clarified the use of Map and Struct data types ## [v0.2.0] - 2024-04-10 diff --git a/CITATION.cff b/CITATION.cff index 5685691..b2fe93f 100644 --- a/CITATION.cff +++ b/CITATION.cff @@ -8,7 +8,7 @@ preferred-citation: type: standard title: "Field Boundaries for Agriculture (fiboa) specification" abstract: "Making field boundaries openly available in a unified way." 
- version: 0.2.0 + version: 0.3.0 year: 2024 date-released: 2024-04-10 license: Apache-2.0 diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 936f0f0..f0bfd14 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -40,6 +40,9 @@ We use pipenv to execute the tests. Start with the following command in the folder where this README is located: `pip install pipenv --user` +Install the dependencies for the test: +`pipenv install` + Finally, you can run the tests as follows: - To check the markdown run: `pipenv run test-docs` diff --git a/README.md b/README.md index 8c2e1e8..546f6f3 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ This repository contains the core specification for fiboa, including the data sc For more context, information on the ecosystem, and points of contact see the [fiboa github organization](https://github.com/fiboa/). -- Version: **0.2.0** +- Version: **0.3.0** > [!IMPORTANT] > The fiboa specification is a work in progress. @@ -29,15 +29,30 @@ The specification in this repository consists of three parts: - [GeoJSON Encoding](geojson/README.md) - [GeoParquet Encoding](geoparquet/README.md) -To completent the specification, there are also best practices and extensions available: +To complement the specification, there are also best practices and extensions available: - [Best Practices](best-practices/README.md) - [Extensions](https://github.com/fiboa/extensions/) -The repository also contains additional information about the project: +## Relation to other standards and working groups -- [Changelog](CHANGELOG.md) -- [Citation Details (as CFF file)](CITATION.cff) +fiboa doesn't aim to reinvent the wheel. +Our aim is to align with existing efforts as much as possible. +Some parts of the specification are already based on the work of other initiatives, +e.g. the determination-related fields in the core specification. 
+ +Related standards and working groups are: + +- [Adapt standard](https://adaptstandard.org), including their [WG17](https://github.com/ADAPT/Standard/issues/97) +- [Varda FieldID](https://www.varda.ag/global-field-id) +- [Deere Boundaries](https://developer.deere.com/dev-docs/boundaries) +- [AgGateway](https://aggateway.org/), including their + [Locking in on Field Boundaries](https://aggateway.org/Portals/1010/WebSite/About%20Us/FIELD%20BOUNDARY%20FLYER%20122123.pdf?ver=2024-01-03-212959-590) initiative + +If you think we are missing relevant work here, we'd love to hear from you. +Please get in touch by [opening an issue](https://github.com/fiboa/specification/issues/new)! + +## Contributing The fiboa community strives to provide a welcoming and transparent environment for all of the project’s participants. You can find additional information about our community best practices and collaborative development processes below: diff --git a/best-practices/README.md b/best-practices/README.md index 3ba9c78..9632de9 100644 --- a/best-practices/README.md +++ b/best-practices/README.md @@ -7,3 +7,10 @@ All properties should be using snake case. For example a field for a land-use class should be named `landuse_class` instead of `landuseClass`. + +## Extension prefixes + +All properties in an extension should have a common prefix. +Extensions commonly use the colon (`:`) as separator between prefix and property name, e.g. `crop:name`. +A single underscore (`_`) should be avoided to prevent conflicts with other property names (see [Casing](#casing)). +Nevertheless, the separator can be chosen freely by extension authors. diff --git a/core/README.md b/core/README.md index bb6259b..defddf0 100644 --- a/core/README.md +++ b/core/README.md @@ -1,57 +1,88 @@ -# Core Specification +# Core Specification -This specification describes the core data and metadata properties for both at the -Collection and Feature level. 
+This specification describes the core data and metadata properties that describe a fiboa Feature. +The specification doesn't distinguish between collection-level and feature-level properties; +common definitions are shared across these levels. - A Collection refers to a group of one or more features. - A Feature is a single field geometry with additional properties. -> [!NOTE] -> The Core Specification is still work in progress. Feedback is welcome! +- **Schema:** -- **Schema:** +## Table of Contents -## Schema +- [General Properties](#general-properties) + - [schemas](#schemas) + - [id](#id) + - [collection](#collection) + - [category](#category) +- [Spatial Properties](#spatial-properties) + - [area / perimeter](#area--perimeter) +- [Determination Properties](#determination-properties) + - [determination\_datetime](#determination_datetime) + - [determination\_method](#determination_method) +- [Schema Language](#schema-language) -The data types in the following document are defined in -[fiboa Schema](https://github.com/fiboa/schema), v0.1.0. +## General Properties -fiboa Schema defines a (limited) set of data types and a vocabulary to express -additional constraints for these data types. -This allows to define a clear mapping between the core specification and its encodings. +| Property Name | Data Type | Description | +| ------------- | ------------------------------- | ----------- | +| schemas | object\<string, array\<string>> | **REQUIRED.** A list of schemas the collection implements. | +| id | string | **REQUIRED.** An identifier for the field. | +| collection | string | **REQUIRED.** The identifier of the collection. | +| category | array\<string> | A set of categories the field boundary belongs to. | + +### schemas + +The schemas the collection implements. +Each schema must be a valid HTTP(S) URL to an existing YAML file compliant with fiboa Schema. +The schema for this specification (see above) is required to be provided. 
+ +Each `collection` must have a single set of applicable schemas. +The key of the dictionary must be equal to the value provided for the `collection` property. -- [Data types](https://github.com/fiboa/schema/blob/v0.1.0/datatypes.md) -- [Vocabulary](https://github.com/fiboa/schema/blob/v0.1.0/README.md#vocabulary) +The schema URI for fiboa that is listed above is required to be present. -## Collection +**Example for `schemas`:** -Collection-level metadata must be provided in an object that contains the properties below. -The invidiual encodings may decide to embed the collection or make it available separately. +This describes two collections `abc` and `xyz`. -### Properties +```json +{ + "abc": [ + "https://fiboa.github.io/specification/v0.3.0/schema.yaml" + ], + "xyz": [ + "https://fiboa.github.io/specification/v0.3.0/schema.yaml", + "https://fiboa.github.io/crop-extension/v0.1.0/schema.yaml" + ] +} +``` -| Property Name | Data Type | Description | -| ---------------- | -------------- | ----------- | -| fiboa_version | string | **REQUIRED.** Version number of the fiboa specification this entity implements. | -| fiboa_extensions | array\ | A list of URIs to extensions this entity implements. | +### id -Generally, the version and the extensions must be uniform per Collection. +It must be unique per collection, i.e. `collection` and `id` form a unique identifier. -Other properties are also allowed to be provided, but are not described by this specification. +### collection -## Features +A collection is a group of one or more features with a unique identifier, stored in the `collection` property. -### General Properties +Encodings may support storing properties that have the same value across all features at the collection-level. +This de-duplicates data for more efficient resource usage, but only applies if more than two features are available for the collection. 
+The specific location and behaviour of collection-level data is specified in the encoding-specific specifications. -| Property Name | Data Type | Description | -| ------------- | -------------- | ----------- | -| id | string | **REQUIRED.** A unique identifier for the field. It must be unique within the [Collection](#collection). | -| collection | string | The identifier of the parent collection. | -| category | array\ | A set of categories the field boundary belongs to. | +**Example:** -**collection:** The collection identifier is usually only needed for merged datasets. +You have two different field boundary datasets named `abc` (CC-0 licensed) and `xyz` (CC-BY-4.0 licensed). +If you store the datasets separately, you can store the license in the collection-level data +as the value for the property is the same for all features. +Once you merge the two datasets, you must ensure that a unique identifier for the collection is provided +(here: `abc` and `xyz`) so that IDs are unique. +Additionally, you have to add the license property on the feature-level as the licenses now differ between features. -**category:** Choose any (unique) combination of the following values: +### category + +Choose any (unique) combination of the following values: - `conceptual`: This boundary represents how the grower thinks of a field, and what they would share with service providers to allocate information at the highest level of the field concept within their operation. @@ -65,7 +96,7 @@ Other properties are also allowed to be provided, but are not described by this The categories are based on the [definitions of the AgGateway initiative](https://aggateway.org/Portals/1010/WebSite/About%20Us/FIELD%20BOUNDARY%20FLYER%20122123.pdf?ver=2024-01-03-212959-590). 
-### Spatial Properties +## Spatial Properties | Property Name | Data Type | Description | | ------------- | ------------ | ----------- | @@ -74,21 +105,26 @@ The categories are based on the [definitions of the AgGateway initiative](https: | area | float | Area of the field, in hectares. Must be > 0 and <= 100,000. | | perimeter | float | Perimeter of the field, in meters. Must be > 0 and <= 125,000. | -**area/perimeter:** These are derived attributes from the geometry itself, +### area / perimeter + +These are derived attributes from the geometry itself, and must match the geometry's area/perimeter. If they do not match then the geometry should be considered canonical. Validators may flag the value as invalid if it exceeds a certain threshold. -### Determination Properties +## Determination Properties -| Property Name | Data Type | Description | -| ---------------------- | --------- | ------------------------------------------------------------ | -| determination_method | string | The boundary creation method, one of the values below. | -| determination_datetime | datetime | The last timestamp at which the field did exist and was observed, in UTC. | +| Property Name | Data Type | Description | +| ---------------------- | --------- | ----------- | +| determination_method | string | The boundary creation method, one of the values below. | +| determination_datetime | datetime | The last timestamp at which the field did exist and was observed. | | determination_details | string | Further details about the determination, especially the methodology. | -**determination_datetime**: In case the source of the information is an -interval or a set of timestamps, use the end. +### determination_datetime + +The last timestamp at which the field did exist and was observed, provided in the UTC timezone. + +In case the source of the information is an interval or a set of timestamps, use the end. 
For example, for ML you'd use the timestamp of the last image and not the timestamp of the actual execution. @@ -96,7 +132,9 @@ timestamp of the actual execution. > We define more temporal properties in the > [timestamps extension](https://github.com/fiboa/timestamps). -**determination_method**: Must be one of the following values: +### determination_method + +The determination method must be one of the following values: - `manual`: Hand created from imagery, e.g. using a tool to point and click on a map. - `surveyed`: Determined through a professional land survey measuring the actual distances and angles on the ground. @@ -107,3 +145,15 @@ timestamp of the actual execution. The determination methods are based on the definitions of the [AgGateway initiative - WG17](https://aggateway.org/). The specific values have [not been published yet](https://github.com/fiboa/specification/issues/31). + +## Schema Language + +The schema language used for fiboa is [fiboa Schema](https://github.com/fiboa/schema), version 0.2.0. + +The data types in the tables above are defined in the document +[Data Types](https://github.com/fiboa/schema/blob/v0.2.0/datatypes.md). + +fiboa Schema defines a (limited) set of data types and a +[vocabulary](https://github.com/fiboa/schema/blob/v0.2.0/README.md#vocabulary) +to express additional constraints for these data types. +This allows to define a clear mapping between the core specification and its encodings. 
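Reviewer note: the `schemas` and `id` rules introduced in the core/README.md changes above can be sketched as a small check. This is only an illustration under stated assumptions, not part of the patch: it follows the dictionary form of `schemas` shown in the README example, the helper names `check_schemas` and `check_unique_ids` are hypothetical, and features are assumed to be flat dictionaries in which `collection` and `id` have already been resolved.

```python
from urllib.parse import urlparse

# Assumption: core schema URL for v0.3.0, copied from the `schemas` example in core/README.md.
CORE_SCHEMA = "https://fiboa.github.io/specification/v0.3.0/schema.yaml"


def check_schemas(schemas, collection):
    """Return a list of problems for one collection (hypothetical helper)."""
    if not isinstance(schemas, dict):
        return ["schemas must be an object keyed by collection identifier"]
    # The dictionary key must equal the value of the `collection` property.
    urls = schemas.get(collection)
    if not isinstance(urls, list):
        return [f"no schema list for collection '{collection}'"]
    problems = []
    for url in urls:
        parsed = urlparse(url) if isinstance(url, str) else None
        # Every schema must be an HTTP(S) URL.
        if parsed is None or parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"not an HTTP(S) URL: {url!r}")
    # The schema of the core specification itself must be listed.
    if CORE_SCHEMA not in urls:
        problems.append("core specification schema is missing")
    return problems


def check_unique_ids(features):
    """Return the (collection, id) pairs that occur more than once."""
    seen, duplicates = set(), []
    for feature in features:
        key = (feature["collection"], feature["id"])
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

The check does not fetch the URLs, so it cannot confirm that a schema file actually exists; a real validator would also need that step.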
diff --git a/core/schema/schema.yaml b/core/schema/schema.yaml index 6dbd5b1..85522f8 100644 --- a/core/schema/schema.yaml +++ b/core/schema/schema.yaml @@ -1,8 +1,25 @@ -$schema: https://fiboa.github.io/schema/v0.1.0/schema.json +$schema: https://fiboa.github.io/schema/v0.3.0/schema.json required: + - schemas - id + - collection - geometry +collection: + schemas: true + id: false + geometry: false + bbox: false properties: + schemas: + type: array + items: + type: string + format: uri + pattern: ^https?:// + contains: + type: string + enum: + - https://fiboa.github.io/specification/v0.3.0/schema.yaml id: type: string minLength: 1 diff --git a/geojson/README.md b/geojson/README.md index a24ff36..55f04d1 100644 --- a/geojson/README.md +++ b/geojson/README.md @@ -1,62 +1,73 @@ # GeoJSON Encoding Specification -The GeoJSON encoding defines how field boundaries compliant to fiboa must be published. -The generic GeoJSON format is defined in -[IETF RFC 7946](https://datatracker.ietf.org/doc/html/rfc7946). +The GeoJSON encoding defines how to encode field boundaries compliant with fiboa as +GeoJSON, as defined in [IETF RFC 7946](https://datatracker.ietf.org/doc/html/rfc7946). -> [!NOTE] -> The GeoJSON encoding is still work in progress. Feedback is welcome! +A single fiboa Feature must be encoded as a GeoJSON [`Feature`](#feature). +Multiple fiboa Features should be provided as a GeoJSON [`FeatureCollection`](#featurecollection). +Other GeoJSON types are not allowed. -- **[Examples](examples/):** - 1. [as a FeatureCollection](examples/featurecollection/features.json) - 2. [as individual Features with a dedicated Collection](examples/individual-features/) -- **[Datatype mapping](datatypes.md)** +Related documents: -## Collection +- [Examples](examples/) +- [Datatype mapping](datatypes.md) -A [fiboa Collection](../core/README.md#collection) must be provided as a JSON object, either -1. 
embedded into the GeoJSON in a top-level property named `fiboa` (see example 1), or -2. 
separately as a JSON file that is linked to from the GeoJSON (see example 2). +## Feature -A fiboa Collection may be a [GeoJSON FeatureCollection](https://datatracker.ietf.org/doc/html/rfc7946#section-3.3). -All features in a FeatureCollection must be fiboa-compliant. +- Example: [individual features](examples/individual-features/) -## Features - -Each [fiboa Feature](../core/README.md#features) must be a valid +Each [fiboa Feature](../core/README.md) must be a valid [GeoJSON Feature](https://datatracker.ietf.org/doc/html/rfc7946#section-3.2). The following properties are defined for a GeoJSON Feature (at the top-level of the object): -| Property Name | Data Type | Description | -| ------------- | ------------------- | ------------------------------------------------------------ | -| id | string | **REQUIRED. ** See [id](../core/README.md#general-properties) in the core specification, must not be a `number` | -| type | string | **REQUIRED. ** The GeoJSON type, must be: `Feature` | +| Property Name | Data Type | Description | +| ------------- | ------------------- | ----------- | +| id | string | **REQUIRED.** See [id](../core/README.md#id) in the core specification, must not be a `number` | +| type | string | **REQUIRED.** The GeoJSON type, must be: `Feature` | | geometry | object | **REQUIRED.** A [GeoJSON Geometry Object](https://datatracker.ietf.org/doc/html/rfc7946#section-3.1), must not be `null` | | bbox | array\ | A [GeoJSON Bounding Box](https://datatracker.ietf.org/doc/html/rfc7946#section-5) | -| properties | object | An object with additional properties (see [`properties`](#properties)) | -| links | array\ | A list of links (see [`links`](#links)) | -| fiboa | object | An object with the [fiboa Collection](../core/README.md#collection) properties if not provided as a link (see [Collection](#collection)). 
| +| properties | object | An object with all additional properties (see [`properties`](#properties)) | + +The mapping between the GeoJSON data types and the fiboa data types can be found in the +[data type mapping](datatypes.md). > [!IMPORTANT] > RFC 7946 doesn't support a property named `crs`, which was only available in an earlier version of GeoJSON (2008). > The CRS of the GeoJSON geometry and bbox must be WGS 84 / OGC CRS 84, -> see the [RFC 7946, chapter 4](https://datatracker.ietf.org/doc/html/rfc7946#section-4) for details. +> see the [RFC 7946, chapter 4](https://datatracker.ietf.org/doc/html/rfc7946#section-4) for details. -### `properties` +[Collection-level](../core/README.md#collection) data is not supported. +All properties are provided in the JSON object with the key [`properties`](#properties). -Must include any property that is required by the fiboa core specification (currently none). +### properties + +Must include any property that is required by the fiboa core specification. May include any additional property. All properties defined by the core specification (except for `id`, `geometry` and `bbox`) or extensions should be provided here. -### `links` +## FeatureCollection + +- Example: [a feature collection](examples/featurecollection/features.json) + +All features in a GeoJSON FeatureCollection must be fiboa-compliant. + +Properties can also be stored at the [collection-level](../core/README.md#collection) +if a specific property has the same value in all features. +This de-duplicates data for more efficient resource usage. +All such properties are stored at the top-level of the FeatureCollection object as +[foreign members](https://datatracker.ietf.org/doc/html/rfc7946#section-6.1). +The individual features shall not contain any properties that are stored at the collection-level. +Validation must ensure that the collection-level properties are taken into account. 
+ +The following properties in Features can't be collection-level properties: -An array of links where each link conforms to the -[Hyperlink Schema](http://schemas.opengis.net/ogcapi/common/part1/1.0/openapi/schemas/link.yaml) -defined in -[OGC API - Common - Part 1](https://docs.ogc.org/is/19-072/19-072.html#_11b9b4f7-42fc-413a-b63a-e7fb060b5e4b). +- `id` +- `geometry` +- `bbox` -The following relation types are commonly used: +Properties with the following names can't be moved to the collection-level due to conflicts with the +FeatureCollection properties defined by GeoJSON: -- `self`: Absolute link to the GeoJSON file itself. -- `collection`: Link to the [Collection](#collection) +- `features` +- `type` diff --git a/geojson/datatypes.md b/geojson/datatypes.md index 5a747de..c0bece5 100644 --- a/geojson/datatypes.md +++ b/geojson/datatypes.md @@ -3,28 +3,33 @@ The following table shows the data types that are used by fiboa in the Property definitions. It also shows the mapping to the GeoJSON data types. -| fiboa data type | (Geo)JSON | -| --------------------------------------------------- | ------------------------------------------------------------ | -| boolean | boolean | -| int8 | integer
minimum: -128
maximum: 127 | -| uint8 | integer
minimum: 0
maximum: 255 | -| int16 | integer
minimum: -32768
maximum: 32767 | -| uint16 | integer
minimum: 0
maximum: 65535 | -| int32 | integer
minimum: -2147483648
maximum: 2147483647 | -| uint32 | integer
minimum: 0
maximum: 4294967295 | -| int64 | integer
minimum: -9223372036854775808
maximum: 9223372036854775807 | -| uint64 | integer
minimum: 0
maximum: 18446744073709551615 | -| float
IEEE 32-bit | number
minimum: ?
maximum: ? | -| double
IEEE 64-bit | number
minimum: ?
maximum: ? | -| binary | string
contentEncoding: binary | -| string
charset: UTF-8 | string | -| array | array | -| object
keys: string
values: any | object
additionalProperties: false | -| date | string
format: date | -| date-time
with milliseconds
timezone: UTC | string
format: date-time
pattern: Z$ | -| geometry | [object with schema](https://geojson.org/schema/Geometry.json) | -| bounding-box
x and y only, no z | array
minItems: 4
maxItems: 4
items: number | -| *required* (not a datatype) | null | +| fiboa data type | (Geo)JSON | Collection-level | +| --------------------------------------------------- | ------------------------------------------------------------ | ---------------- | +| boolean | boolean | yes | +| int8 | integer
minimum: -128
maximum: 127 | yes | +| uint8 | integer
minimum: 0
maximum: 255 | yes | +| int16 | integer
minimum: -32768
maximum: 32767 | yes | +| uint16 | integer
minimum: 0
maximum: 65535 | yes | +| int32 | integer
minimum: -2147483648
maximum: 2147483647 | yes | +| uint32 | integer
minimum: 0
maximum: 4294967295 | yes | +| int64 | integer
minimum: -9223372036854775808
maximum: 9223372036854775807 | yes | +| uint64 | integer
minimum: 0
maximum: 18446744073709551615 | yes | +| float
IEEE 32-bit | number
minimum: ?
maximum: ? | yes | +| double
IEEE 64-bit | number
minimum: ?
maximum: ? | yes | +| binary | string
contentEncoding: base64 | yes | +| string
charset: UTF-8 | string | yes | +| array | array | yes | +| object
keys: string
values: any | object
additionalProperties: false | yes | +| date | string
format: date | yes | +| date-time
with milliseconds
timezone: UTC | string
format: date-time
pattern: Z$ | yes | +| geometry | [object with schema](https://geojson.org/schema/Geometry.json) | no | +| bounding-box
x and y only, no z | array
minItems: 4
maxItems: 4
items: number | no | + +## Missing values + +For optional properties, values might be missing. +This is expressed by omitting the JSON property. +The value `null` is not allowed. ## Potential issues in conversion diff --git a/geojson/examples/featurecollection/features.json b/geojson/examples/featurecollection/features.json index ab48994..d31c1af 100644 --- a/geojson/examples/featurecollection/features.json +++ b/geojson/examples/featurecollection/features.json @@ -1,4 +1,15 @@ { + "schemas": { + "de_nrw": [ + "https://fiboa.github.io/specification/v0.3.0/schema.yaml", + "https://fiboa.github.io/inspire-extension/v0.2.0/schema.yaml", + "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml", + "https://fiboa.github.io/crop-extension/v0.1.0/schema.yaml" + ] + }, + "collection": "de_nrw", + "license": "dl-de/by-2-0", + "attribution": "Land Nordrhein-Westfalen / Open.NRW - https://www.opengeodata.nrw.de/produkte/umwelt_klima/bodennutzung/landwirtschaft/", "type": "FeatureCollection", "features": [ { @@ -8,67 +19,29 @@ "inspire:id": "https://geodaten.nrw.de/id/inspire-lc-fb/landcoverunit/12324", "flik": "DENWLI0542130247", "determination_datetime": "2005-02-28T00:00:00Z", - "nutz_code": "A", - "nutz_txt": "Ackerland", - "area": 1.631100058555603 + "crop:code": "A", + "crop:name": "Ackerland", + "area": 1.6311 }, "geometry": { "type": "Polygon", "coordinates": [ [ - [ - 7.875243329949302, - 51.7469574917968 - ], - [ - 7.8754156210171224, - 51.74865579902567 - ], - [ - 7.87559517961007, - 51.748657516128716 - ], - [ - 7.875727139469757, - 51.74864762337336 - ], - [ - 7.875865723118926, - 51.74861179149097 - ], - [ - 7.876160946694515, - 51.74853656922356 - ], - [ - 7.876274940061089, - 51.748526513043004 - ], - [ - 7.876646213349393, - 51.74852263605798 - ], - [ - 7.876669177898854, - 51.74759587524452 - ], - [ - 7.876683221091441, - 51.7470291214554 - ], - [ - 7.875243329949302, - 51.7469574917968 - ] + [7.8752433, 51.7469574], + [7.8754156, 51.7486557], + 
[7.8755951, 51.7486575], + [7.8757271, 51.7486476], + [7.8758657, 51.7486117], + [7.8761609, 51.7485365], + [7.8762749, 51.7485265], + [7.8766462, 51.7485226], + [7.8766691, 51.7475958], + [7.8766832, 51.7470291], + [7.8752433, 51.7469574] ] ] }, - "bbox": [ - 7.875243329949302, - 51.7469574917968, - 7.876683221091441, - 51.748657516128716 - ] + "bbox": [7.8752433, 51.7469574, 7.8766832, 51.7486575] }, { "id": "2713", @@ -76,90 +49,33 @@ "properties": { "inspire:id": "https://geodaten.nrw.de/id/inspire-lc-fb/landcoverunit/2713", "flik": "DENWLI0540210084", - "determination_datetime": "2005-02-28T00:00:00Z", - "nutz_code": "A", - "nutz_txt": "Ackerland", - "area": 1.8975000381469727 + "determination_datetime": "2005-02-22T00:00:00Z", + "crop:code": "W", + "crop:name": "Weide", + "area": 1.8975 }, "geometry": { "type": "Polygon", "coordinates": [ [ - [ - 9.279072225112648, - 51.925508828714925 - ], - [ - 9.279848170539884, - 51.92582918268683 - ], - [ - 9.280173032315249, - 51.925963048968214 - ], - [ - 9.280599939130775, - 51.92614034991495 - ], - [ - 9.280660193987938, - 51.926028714865886 - ], - [ - 9.280886077078973, - 51.9256102896548 - ], - [ - 9.281335286046785, - 51.924778127406576 - ], - [ - 9.281305739341624, - 51.92472580957354 - ], - [ - 9.280917027691007, - 51.92458295033388 - ], - [ - 9.279903540966059, - 51.92421337118715 - ], - [ - 9.279817610187122, - 51.92423316888092 - ], - [ - 9.279398358118248, - 51.92501015234708 - ], - [ - 9.279241344298002, - 51.925301083950984 - ], - [ - 9.279072225112648, - 51.925508828714925 - ] + [9.2790722, 51.9255088], + [9.2798481, 51.9258291], + [9.280173, 51.925963], + [9.2805999, 51.9261403], + [9.2806601, 51.9260287], + [9.280886, 51.9256102], + [9.2813352, 51.9247781], + [9.2813057, 51.9247258], + [9.280917, 51.9245829], + [9.2799035, 51.9242133], + [9.2798176, 51.9242331], + [9.2793983, 51.9250101], + [9.2792413, 51.925301], + [9.2790722, 51.9255088] ] ] }, - "bbox": [ - 9.279072225112648, - 51.92421337118715, - 
9.281335286046785, - 51.92614034991495 - ] + "bbox": [9.2790722, 51.9242133, 9.2813352, 51.9261403] } - ], - "fiboa": { - "fiboa_version": "0.2.0", - "fiboa_extensions": [ - "https://fiboa.github.io/inspire-extension/v0.2.0/schema.yaml" - ], - "id": "de_nrw", - "title": "Field boundaries for North Rhine-Westphalia (NRW), Germany", - "license": "dl-de/by-2-0", - "attribution": "Land Nordrhein-Westfalen / Open.NRW - https://www.opengeodata.nrw.de/produkte/umwelt_klima/bodennutzung/landwirtschaft/" - } -} \ No newline at end of file + ] +} diff --git a/geojson/examples/individual-features/12324.json b/geojson/examples/individual-features/12324.json index 6402e59..2f8b87d 100644 --- a/geojson/examples/individual-features/12324.json +++ b/geojson/examples/individual-features/12324.json @@ -2,68 +2,35 @@ "id": "12324", "type": "Feature", "properties": { + "schemas": { + "example": [ + "https://fiboa.github.io/specification/v0.3.0/schema.yaml", + "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml" + ] + }, + "collection": "example", "flik": "DENWLI0542130247", "determination_datetime": "2005-02-28T00:00:00Z", "nutz_code": "A", "nutz_txt": "Ackerland", - "area": 1.631100058555603 + "area": 1.6311 }, "geometry": { "type": "Polygon", "coordinates": [ [ - [ - 7.875243329949302, - 51.7469574917968 - ], - [ - 7.8754156210171224, - 51.74865579902567 - ], - [ - 7.87559517961007, - 51.748657516128716 - ], - [ - 7.875727139469757, - 51.74864762337336 - ], - [ - 7.875865723118926, - 51.74861179149097 - ], - [ - 7.876160946694515, - 51.74853656922356 - ], - [ - 7.876274940061089, - 51.748526513043004 - ], - [ - 7.876646213349393, - 51.74852263605798 - ], - [ - 7.876669177898854, - 51.74759587524452 - ], - [ - 7.876683221091441, - 51.7470291214554 - ], - [ - 7.875243329949302, - 51.7469574917968 - ] + [7.8752433, 51.7469574], + [7.8754156, 51.7486557], + [7.8755951, 51.7486575], + [7.8757271, 51.7486476], + [7.8758657, 51.7486117], + [7.8761609, 51.7485365], + [7.8762749, 
51.7485265], + [7.8766462, 51.7485226], + [7.8766691, 51.7475958], + [7.8766832, 51.7470291], + [7.8752433, 51.7469574] ] ] - }, - "links": [ - { - "href": "collection.json", - "rel": "collection", - "type": "application/json" - } - ] -} \ No newline at end of file + } +} diff --git a/geojson/examples/individual-features/2713.json b/geojson/examples/individual-features/2713.json index 7c917ce..d53f07a 100644 --- a/geojson/examples/individual-features/2713.json +++ b/geojson/examples/individual-features/2713.json @@ -2,80 +2,41 @@ "id": "2713", "type": "Feature", "properties": { + "schemas": { + "example": [ + "https://fiboa.github.io/specification/v0.3.0/schema.yaml", + "https://fiboa.github.io/flik-extension/v0.1.0/schema.yaml" + ] + }, + "collection": "example", "flik": "DENWLI0540210084", "determination_datetime": "2005-02-28T00:00:00Z", "nutz_code": "A", "nutz_txt": "Ackerland", - "area": 1.8975000381469727 + "area": 1.8975000 }, "geometry": { "type": "Polygon", "coordinates": [ [ - [ - 9.279072225112648, - 51.925508828714925 - ], - [ - 9.279848170539884, - 51.92582918268683 - ], - [ - 9.280173032315249, - 51.925963048968214 - ], - [ - 9.280599939130775, - 51.92614034991495 - ], - [ - 9.280660193987938, - 51.926028714865886 - ], - [ - 9.280886077078973, - 51.9256102896548 - ], - [ - 9.281335286046785, - 51.924778127406576 - ], - [ - 9.281305739341624, - 51.92472580957354 - ], - [ - 9.280917027691007, - 51.92458295033388 - ], - [ - 9.279903540966059, - 51.92421337118715 - ], - [ - 9.279817610187122, - 51.92423316888092 - ], - [ - 9.279398358118248, - 51.92501015234708 - ], - [ - 9.279241344298002, - 51.925301083950984 - ], - [ - 9.279072225112648, - 51.925508828714925 - ] + [9.2790722, 51.9255088], + [9.2798481, 51.9258291], + [9.2801730, 51.9259630], + [9.2805999, 51.9261403], + [9.2806601, 51.9260287], + [9.2808860, 51.9256102], + [9.2813352, 51.9247781], + [9.2813057, 51.9247258], + [9.2809170, 51.9245829], + [9.2799035, 51.9242133], + [9.2798176, 
51.9242331], + [9.2793983, 51.9250101], + [9.2792413, 51.9253010], + [9.2790722, 51.9255088] ] ] }, - "links": [ - { - "href": "collection.json", - "rel": "collection", - "type": "application/json" - } + "bbox": [ + 9.2790722, 51.9242133, 9.2813352, 51.9261403 ] -} \ No newline at end of file +} diff --git a/geojson/examples/individual-features/collection.json b/geojson/examples/individual-features/collection.json deleted file mode 100644 index 210bda1..0000000 --- a/geojson/examples/individual-features/collection.json +++ /dev/null @@ -1,62 +0,0 @@ -{ - "fiboa_version": "0.2.0", - "fiboa_extensions": [], - "stac_version": "1.0.0", - "type": "Collection", - "id": "de_nrw", - "title": "Field boundaries for North Rhine-Westphalia (NRW), Germany", - "description": "A field block (German: \"Feldblock\") is a contiguous agricultural area surrounded by permanent boundaries, which is cultivated by one or more farmers with one or more crops, is fully or partially set aside or is fully or partially taken out of production. Field blocks are classified separately according to the main land uses of arable land, grassland, permanent crops, 2nd pillar and other. 
Since 2005, field blocks in NRW have represented the area reference within the framework of the Integrated Administration and Control System (IACS) for EU agricultural subsidies.", - "license": "proprietary", - "providers": [ - { - "name": "Land Nordrhein-Westfalen / Open.NRW", - "roles": [ - "producer", - "licensor" - ], - "url": "https://www.opengeodata.nrw.de/produkte/umwelt_klima/bodennutzung/landwirtschaft/" - }, - { - "name": "fiboa CLI", - "roles": [ - "processor" - ], - "url": "https://pypi.org/project/fiboa-cli" - }, - { - "name": "Source Cooperative", - "roles": [ - "host" - ], - "url": "https://beta.source.coop/fiboa/de-nrw/" - } - ], - "extent": { - "spatial": { - "bbox": [ - [ - 5.8659988131, - 50.3226989435, - 9.4476584861, - 52.5310351488 - ] - ] - }, - "temporal": { - "interval": [ - [ - "2005-02-28T00:00:00Z", - "2024-03-28T00:00:00Z" - ] - ] - } - }, - "links": [ - { - "href": "https://www.govdata.de/dl-de/by-2-0", - "title": "Data licence Germany - attribution - Version 2.0", - "type": "text/html", - "rel": "license" - } - ] -} \ No newline at end of file diff --git a/geojson/schema/datatypes.json b/geojson/schema/datatypes.json index 3c92bb7..ab49171 100644 --- a/geojson/schema/datatypes.json +++ b/geojson/schema/datatypes.json @@ -1,6 +1,6 @@ { "$schema": "https://json-schema.org/draft/2020-12/schema", - "$id": "https://fiboa.github.io/specification/v0.2.0/geojson/datatypes.json", + "$id": "https://fiboa.github.io/specification/v0.3.0/geojson/datatypes.json", "$defs": { "boolean": { "type": "boolean" diff --git a/geoparquet/README.md b/geoparquet/README.md index c2fcf39..3af2b42 100644 --- a/geoparquet/README.md +++ b/geoparquet/README.md @@ -1,26 +1,12 @@ # GeoParquet Encoding Specification The Geoparquet encoding defines how field boundaries compliant to fiboa must be published. -The generic GeoParquet format is defined in the -[OGC GeoParquet specification v1.0.0](https://geoparquet.org/releases/v1.0.0/). 
+The generic GeoParquet format is defined in the OGC GeoParquet specification, +either version [v1.0.0](https://geoparquet.org/releases/v1.0.0/) +or [v1.1.0](https://geoparquet.org/releases/v1.1.0/). We aim to support any future version of GeoParquet, too. -> [!NOTE] -> The GeoParquet encoding is still work in progress. Feedback is welcome! - -- **[Examples](examples/)** -- **[Data type mapping](datatypes.md)** - -## Collection - -The GeoParquet file must embed the [fiboa Collection](../core/README.md#collection) -in the Parquet metadata in a property named `fiboa`. - -It is recommended to additionally provide the fiboa Collection as a separate JSON file, too. - -## Features - -Each [fiboa Feature](../core/README.md#features) corresponds to a row in a GeoParquet file. +Each [fiboa Feature](../core/README.md) corresponds to a row in a GeoParquet file. The properties defined for fiboa Features are made available as individual columns in the GeoParquet file. @@ -28,6 +14,20 @@ Properties that are optional can be omitted if all values are [null values](https://parquet.apache.org/docs/file-format/nulls/), i.e. the column can be missing from the GeoParquet file. +Properties can also be stored at the [collection-level](../core/README.md#collection) if all values in a column have the same value. +This de-duplicates data for more efficient resource usage and simplifies the structure of the Parquet file. +The GeoParquet file must embed the properties in the Parquet metadata in a property named `collection`. +The metadata must be JSON-encoded. + The mapping between the Parquet data types and the fiboa data types, can be found in the [data type mapping](datatypes.md). +Related documents: + +- [Examples](examples/) +- [Data type mapping](datatypes.md) + +## Best practices + +For data with a lot of repetition, brotli compression is recommended. +This applies particularly to merged datasets that don't deduplicate properties to the collection-level. 
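Review note: the collection-level de-duplication described in the README hunk above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the fiboa CLI implementation; the helper name `hoist_constant_columns` and the in-memory row layout are assumptions made for the example, and the sample values are taken from the two GeoJSON example features in this diff.

```python
import json


def hoist_constant_columns(rows, candidates):
    """Move candidate columns that hold one identical value in every row
    into a collection-level dict (hypothetical helper, not a fiboa API)."""
    collection = {}
    for name in candidates:
        # A column qualifies only if every row has it and all values are equal.
        if not rows or not all(name in row for row in rows):
            continue
        first = rows[0][name]
        if all(row[name] == first for row in rows):
            collection[name] = first
    # Drop the hoisted columns from the per-feature rows.
    remaining = [
        {key: value for key, value in row.items() if key not in collection}
        for row in rows
    ]
    return collection, remaining


rows = [
    {"id": "12324", "nutz_code": "A", "area": 1.6311},
    {"id": "2713", "nutz_code": "A", "area": 1.8975},
]
collection, remaining = hoist_constant_columns(rows, ["nutz_code", "area"])

# Parquet key-value metadata stores byte strings, so the collection-level
# properties are JSON-encoded under the key `collection`.
parquet_metadata = {b"collection": json.dumps(collection).encode("utf-8")}
```

Here `nutz_code` moves to the collection because both rows share the value `A`, while `area` stays a per-row column. Writing `parquet_metadata` into an actual file is left out on purpose, since that step depends on the Parquet library in use.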
diff --git a/geoparquet/datatypes.md b/geoparquet/datatypes.md index 403bbee..734f640 100644 --- a/geoparquet/datatypes.md +++ b/geoparquet/datatypes.md @@ -3,32 +3,45 @@ The following table shows the data types that are used by fiboa in the Property definitions. It also shows the mapping to the GeoParquet data types. -| fiboa Schema data type | (Geo)Parquet | -| --------------------------------------------------- | ------------------------------------------------------------ | -| boolean | BOOLEAN | -| int8 | IntType
bitWidth: 8
isSigned: true
(deprecated: INT_8) | -| uint8 | IntType
bitWidth: 8
isSigned: false
(deprecated: UINT_8) | -| int16 | IntType
bitWidth: 16
isSigned: true
(deprecated: INT_16) | -| uint16 | IntType
bitWidth: 16
isSigned: false
(deprecated: UINT_16) | -| int32 | IntType
bitWidth: 32
isSigned: true
(deprecated: INT_32) | -| uint32 | IntType
bitWidth: 64
isSigned: false
(deprecated: UINT_32) | -| int64 | IntType
bitWidth: 64
isSigned: true
(deprecated: INT_64) | -| uint64 | IntType
bitWidth: 64
isSigned: false
(deprecated: UINT_64) | -| float
IEEE 32-bit | FLOAT | -| double
IEEE 64-bit | DOUBLE | -| binary | BYTE_ARRAY | -| string
charset: UTF-8 | STRING (BYTE_ARRAY) | -| array | LIST | -| object
keys: string
values: any | STRUCT / MAP | -| date | DATE (INT32) | -| date-time
with milliseconds
timezone: UTC | TimestampType (INT64)
isAdjustedToUTC: true
unit: MILLIS
(deprecated: TIMESTAMP_MILLIS) | -| geometry | BYTE_ARRAY
encoded as WKB | -| bounding-box
x and y only, no z | STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT) | -| *if a field is not required* | [Nullity](https://parquet.apache.org/docs/file-format/nulls/) | +| fiboa Schema data type | (Geo)Parquet | Collection-level | +| --------------------------------------------------- | ------------------------------------------------------------ | ------------------------------- | +| boolean | BOOLEAN | yes | +| int8 | IntType
bitWidth: 8
isSigned: true
(deprecated: INT_8) | yes | +| uint8 | IntType
bitWidth: 8
isSigned: false
(deprecated: UINT_8) | yes | +| int16 | IntType
bitWidth: 16
isSigned: true
(deprecated: INT_16) | yes | +| uint16 | IntType
bitWidth: 16
isSigned: false
(deprecated: UINT_16) | yes | +| int32 | IntType
bitWidth: 32
isSigned: true
(deprecated: INT_32) | yes | +| uint32 | IntType
bitWidth: 32
isSigned: false
(deprecated: UINT_32) | yes | +| int64 | IntType
bitWidth: 64
isSigned: true
(deprecated: INT_64) | yes | +| uint64 | IntType
bitWidth: 64
isSigned: false
(deprecated: UINT_64) | yes | +| float
IEEE 32-bit | FLOAT | yes | +| double
IEEE 64-bit | DOUBLE | yes | +| binary | BYTE_ARRAY | as string, base64-encoded | +| string
charset: UTF-8 | STRING (BYTE_ARRAY) | yes | +| array | LIST | yes | +| object
keys: string
values: any | STRUCT or MAP (see below) | yes | +| date | DATE (INT32) | as string, compliant to ISO8601 | +| date-time
with milliseconds
timezone: UTC | TimestampType (INT64)
isAdjustedToUTC: true
unit: MILLIS
(deprecated: TIMESTAMP_MILLIS) | as string, compliant to ISO8601 | +| geometry | BYTE_ARRAY
encoded as WKB | no | +| bounding-box
x and y only, no z | STRUCT(xmin FLOAT, ymin FLOAT, xmax FLOAT, ymax FLOAT) | no | The integer data types and the data type string can also be mapped to the ENUM data type in Parquet if a pre-defined set of values is available. +## Missing values + +For optional properties, values might be missing. +This is expressed by providing the value `null` +(see data type [Nullity](https://parquet.apache.org/docs/file-format/nulls/)). + +## Struct vs Map + +Parquet has both Map and Struct types. The struct type is similar to a named dictionary while the map type is similar to a list of ordered (key, value) pairs. The main difference is that you need to know the keys up front for the struct type, while you don't for the map type. + +Due to this difference, the **Struct** type can only be used if `additionalProperties` is `false` (the default value) and only `properties` is provided to clearly specify the exact names of the properties. + +Any variability in the keys through the use of `additionalProperties` (except for the default `false`) or `patternProperties` requires the use of the **Map** data type. Please note that the order of the Map type is guaranteed to be preserved. + ## Unsupported Data Types The following data types occur in Parquet, but are not currently supported in fiboa: @@ -45,4 +58,4 @@ The following data types occur in Parquet, but are not currently supported in fi ## Potential issues in conversion -- The micro/nanosecond precision of Datetime / Times may got lost +- The micro/nanosecond precision of Datetime / Times may get lost
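Review note: the Struct vs Map rule added in this hunk could be expressed as a small decision function. The sketch below is only an illustration of the rule as worded in the new section; the function name `parquet_type_for_object` is hypothetical, and falling back to Map when no `properties` are listed is an assumption of this example, not something the specification states.

```python
def parquet_type_for_object(schema):
    """Pick STRUCT or MAP for a fiboa `object` property definition
    (hypothetical helper, not part of any fiboa tooling)."""
    # `additionalProperties` defaults to false according to the text above.
    additional = schema.get("additionalProperties", False)
    has_patterns = bool(schema.get("patternProperties"))
    has_fixed_keys = bool(schema.get("properties"))

    # STRUCT requires the exact key names to be known up front.
    if additional is False and not has_patterns and has_fixed_keys:
        return "STRUCT"
    # Any variability in the keys requires MAP. Assumption: an object
    # without listed properties is also treated as MAP here.
    return "MAP"


fixed = {"properties": {"name": {"type": "string"}}}
open_ended = {"properties": {"name": {"type": "string"}}, "additionalProperties": True}
patterned = {"patternProperties": {"^x_": {"type": "string"}}}

results = [parquet_type_for_object(s) for s in (fixed, open_ended, patterned)]
```

Only the first definition has a closed set of keys, so it is the only one mapped to a Struct; the other two fall back to Map.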