RFC: Raw geophysical data schemas v2

# Summary

The current "raw geophysical data" schemas comprise:
- `frequency-domain-electromagnetic` (FDEM);
- `time-domain-electromagnetic` (TDEM);
- `gravity`;
- `magnetics`;
- `radiometric`.

These objects are unique in that they are intended to represent (primarily) data captured from surveys, conducted in lines. In their current form, they are heavily inspired by Oasis montaj “geodatabases” (“GDBs”).

A few aspects of the raw geophysical data schemas may need cleaning up. Some are general and span _all_ raw geophysical data schemas (e.g., coordinate representation within `survey-line`), while others are specific to a particular schema/discipline (e.g., time-gate handling in the EM schemas, or live/dead-time handling in the radiometric schema).

# Motivation

The raw geophysical data schemas have yet to see wide adoption or integration into existing workflows. At the moment, parallel streams of work are contending with these schemas for the first time. In doing so, some of their initial shortcomings are being brought to light.

Because these schemas are not yet widely adopted, we have a good opportunity to make the changes needed to make them usable and suited to our needs.

While _some_ of the proposed changes do not necessarily constitute breaking changes, it is our recommendation to create a new version of these schemas altogether. We should also mark the `1.x.x` versions of the raw geophysical schemas as officially "deprecated", so as to avoid further investment in, or integration with, them.

# Coordinate representations
In Oasis montaj geodatabases, coordinates are “unpacked”/decomposed into their components and stored in separate “channels” (columns). Below, the (x, y, z) components of each coordinate are distributed among the (`DH_East`, `DH_North`, `DH_RL`) channels, respectively.

![An Oasis montaj geodatabase with spatial channel annotations.](https://github.com/user-attachments/assets/2edb5426-64b3-49c5-a928-a04d703789d6)

Similarly, in the raw geophysical schemas, coordinate components are distributed among three channel “names”. The `X` and `Y` channel names are required, and the `Z` channel name is optional (from `survey-line`):
```json
    "location_channels": {
      "type": "object",
      "description": "Survey location coordinate channels.",
      "properties": {
        "x": {
          "type": "string",
          "description": "Channel name indicating which of the channel attributes corresponds to the X channel."
        },
        "y": {
          "type": "string",
          "description": "Channel name indicating which of the channel attributes corresponds to the Y channel."
        },
        "z": {
          "type": "string",
          "description": "Channel name indicating which of the channel attributes corresponds to the Z channel."
        }
      },
      "required": [
        "x",
        "y"
      ]
    },
```

The coordinate component data would then be stored in the `survey-line.channel_attributes` list:
```json
    "channel_attributes": {
      "title": "List of Channel Attributes",
      "description": "List of channel attributes.",
      "type": "array",
      "items": {
        "$ref": "/components/channel-attribute/1.1.0/channel-attribute.schema.json"
      }
    }
```

An extra burden is placed on integrating workflows to “zip” coordinate components to produce true coordinates. This would seem to violate the intended value of the geoscience object schemas to “bias towards the reader”. In addition, a weak/implicit reference is created between coordinate channel names and the `channel-attribute.name` property, creating the opportunity for name mismatches or failed component name lookups. As it stands, users are free to publish “valid” raw geophysical data objects with nonexistent coordinate channel names, such as (foo, bar, baz). The onus would be on the reader to attempt to make sense of the raw geophysical data, or discard it altogether.

Coordinates should be a "first class" property of survey lines, and not relegated to data channels, referenced by a lookup via a `string` field.
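
To make the reader-side burden concrete, here is a minimal Python sketch of the “zip” step a consumer of the current schemas has to perform. It assumes, purely for illustration, that the channel attributes have already been loaded into a dictionary mapping each `channel-attribute.name` to its list of values; the function name `zip_coordinates` and that dictionary layout are hypothetical and not part of any schema.

```python
from typing import Dict, List, Tuple


def zip_coordinates(
    location_channels: Dict[str, str],
    channels: Dict[str, List[float]],
) -> List[Tuple[float, ...]]:
    """Rebuild coordinates from name-referenced channels.

    `location_channels` mirrors the schema property of the same name.
    `channels` is an assumed in-memory layout: channel name -> values.
    """
    # x and y are required by the schema; z is optional.
    names = [location_channels["x"], location_channels["y"]]
    if "z" in location_channels:
        names.append(location_channels["z"])

    # The schema cannot guarantee that these names exist, so the reader
    # has to check for dangling references itself.
    missing = [name for name in names if name not in channels]
    if missing:
        raise KeyError(f"location channels not found: {missing}")

    # Nothing forces the component channels to have equal lengths either.
    columns = [channels[name] for name in names]
    if len({len(column) for column in columns}) != 1:
        raise ValueError("coordinate channels differ in length")

    return list(zip(*columns))
```

An object that names (foo, bar, baz) as its location channels passes schema validation, yet fails here with a `KeyError` at read time, which is the failure mode described above.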

# Survey line requirements
We should be careful about marking properties of the raw geophysical data schemas as required. Inclusion of default-initialized properties which are not providing any additional value will, at best, increase “clutter” in the object, and, at worst, cause issues downstream from improper handling of default cases. When given the choice of expressing the omission of certain data, we should opt for the exclusion of the property itself, instead of expressing omission through some zero- or default-initialized value. This makes the exclusion explicit, and removes any conflation of the default-initialized value with representing any “real” data.
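
As a small illustration of the difference, the Python sketch below reads `version` from two hypothetical survey lines held as plain dictionaries. The helper name `describe_version` and the sample values are assumptions made for this example only.

```python
from typing import Any, Dict


def describe_version(survey_line: Dict[str, Any]) -> str:
    """Report the version of a survey line, if one was published."""
    version = survey_line.get("version")
    if version is None:
        # The property was omitted: the absence is explicit.
        return "version not provided"
    # A value is present, but a reader cannot tell a deliberate 0
    # apart from a default-initialized 0.
    return f"version {version}"


omitted = {"line_number": "L100"}
defaulted = {"line_number": "L100", "version": 0}
```

With the property omitted, the reader knows no version was supplied. With `version: 0`, the reader has to guess whether the publisher meant "version zero" or simply had to fill in a required field.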

The survey-line schema includes the following properties, all of which are marked as required:
```json
    "line_number": { "description": "The number of the line, can be alphanumeric.", "type": "string" },
    "date": { "description": "Date.", "type": "string", "format": "date-time" },
    "version": { "description": "Version.", "type": "integer", "default": 0, "minimum": 0 },
    "group": { "description": "Represents the group when the data is collected.", "type": "integer", "default": 0, "minimum": 0 },
    "type": { "description": "Survey line type.", "enum": [ "..." ], "type": "string", "default": "Line" },
    "location_channels": { "type": "object", "description": "Survey location coordinate channels." },
    "channel_attributes": { "title": "List of Channel Attributes", "description": "List of channel attributes.", "type": "array" }
```

Raw geophysical data schemas contain an array of survey lines. For example, from the `radiometric` geoscience object:
```json
        "line_list": {
          "description": "Line list.",
          "type": "array",
          "minItems": 1,
          "items": {
            "$ref": "/components/survey-line/1.1.0/survey-line.schema.json"
          }
        }
```

The `line_number` should remain required. Though we cannot impose that it be unique (similar to `base-attribute.key`), in practice, it can serve as a means to provide integrators with a lookup key/reference to survey line lists. Given that initial requirement, the following fields should be marked as optional (removed as required), as we would no longer rely on them for providing unique identifiers for individual survey-line objects:
- `date`: This is additional metadata, and shouldn’t be enforced on publishing;
- `version`: This may not apply to all survey lines, and is likely to be default-initialized in practice;
- `group`: Bundling “groups” of lines is indeed useful in practice, but shouldn’t be enforced by the schemas.

The following fields should remain required:
- `line_number`: Rationale given above, this will be our index into arrays of survey lines;
- `type`: This seems sensible in practice;
- `location_channels`: Identifying x, y, and (optionally) z channels should be enforced; and
- `channel_attributes`: A survey line should always have some attributes to provide.
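
The two lists above can be sketched in Python as a presence check plus a lookup index. This is only an illustration under stated assumptions: survey lines are plain dictionaries, and `REQUIRED_V2`, `missing_required`, and `index_by_line_number` are hypothetical names. Because `line_number` cannot be guaranteed unique, the index maps each line number to a list of matching lines rather than to a single line.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Properties that would remain required under this proposal.
# (Hypothetical constant name; not part of any schema.)
REQUIRED_V2 = ("line_number", "type", "location_channels", "channel_attributes")


def missing_required(survey_line: Dict[str, Any]) -> List[str]:
    """Return the required property names absent from a survey line."""
    return [name for name in REQUIRED_V2 if name not in survey_line]


def index_by_line_number(
    line_list: List[Dict[str, Any]],
) -> Dict[str, List[Dict[str, Any]]]:
    """Group survey lines by `line_number`.

    Uniqueness is not enforced by the schema, so each key maps to a
    list of survey lines instead of a single one.
    """
    index: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for survey_line in line_list:
        index[survey_line["line_number"]].append(survey_line)
    return dict(index)
```

An integrator who needs strict uniqueness can check that every list in the index has exactly one entry, which keeps that policy in the workflow rather than in the schema.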

# Radiometrics data
It’s common practice that spectrometers produce metrics surrounding live (or dead) time, which conveys time spent caching/serializing captured spectrometer data, when measured against the overall sample increment.

The current radiometric schema imposes providing the following values:
```json
        "dead_time": {
          "description": "Dead time (msec).",
          "type": "number",
          "minimum": 0.0
        },
        "live_time": {
          "description": "Live time (msec).",
          "type": "number",
          "minimum": 0.0
        },
        "idle_time": {
          "description": "Idle time (msec).",
          "type": "number",
          "minimum": 0.0
        },
```

We are currently missing the sample increment value, which is the total time that elapses between each record in the radiometrics database. The radiometrics schema should be extended to provide this value `(sample_time: float)`. This will be a property of the `radiometric` schema, as it is constant per `radiometric` schema object.

Live and dead time are complementary. The sum of both should equal the sample increment, which is constant for the survey. In addition, these values should be provided through `channel-attribute` components of `survey-line`s. Each radiometric observation will include _either_ a `live_time` or a `dead_time`, each of which is a value `<= sample_increment`. Dead time conveys the total time the spectrometer spent writing results and not actively capturing data; live time is the remainder of the sample increment. Finally, while these values are common in practice, they are not theoretically required. Modern spectrometers are minimizing dead time sufficiently that a day may come when these values are no longer used or required in practice. `idle_time` is not expected to be provided.
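
The relationship between the two values can be written as a short Python sketch. It assumes all times share one unit (milliseconds, following the descriptions in the current schema) and uses `sample_time` for the sample increment; the function name `resolve_live_dead` is hypothetical.

```python
from typing import Optional, Tuple


def resolve_live_dead(
    sample_time: float,
    live_time: Optional[float] = None,
    dead_time: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (live_time, dead_time) given the sample increment and one of the two.

    Relies on live_time + dead_time == sample_time for each observation.
    """
    # Exactly one of the two values is expected per observation.
    if (live_time is None) == (dead_time is None):
        raise ValueError("provide exactly one of live_time or dead_time")

    given = live_time if live_time is not None else dead_time
    if not 0.0 <= given <= sample_time:
        raise ValueError("value must lie between 0 and sample_time")

    if live_time is None:
        return sample_time - dead_time, dead_time
    return live_time, sample_time - live_time
```

For instance, with a 1000 ms sample increment and a recorded dead time of 20 ms, the live time is the remaining 980 ms. A reader can derive whichever value was not published, so there is no need to store both.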

# Detailed design

I will be creating a pull request to our existing geoscience object schemas and linking it to this RFC.

# Contributing guidelines

- [x] I have read the contributing guidelines
