Open
Labels: enhancement (New feature or request)
Description
We propose implementing data quality diagnostics for processed_cube objects in dubicube. The goal is to support exploratory data analysis and to help users assess whether a data cube is fit for use, either in general or for specific biodiversity indicators.
Motivation
Biodiversity indicators can be highly sensitive to uneven data coverage across space, time, and taxonomy. Currently, users have limited tooling to diagnose these issues before (or alongside) indicator calculation. Providing standardized robustness diagnostics would:
- Improve transparency and interpretability of indicators
- Support informed indicator selection
- Enable early detection of problematic datasets
Scope (high level)
We envision diagnostics along three main dimensions:
Spatial
- Data distribution (clustered vs evenly spread)
- Geographical coverage (localized vs widespread species)
Temporal
- Temporal variation in occurrences
- Stability over time, optionally compared to higher taxonomic levels
Taxonomical
- Species prevalence or abundance
- Overrepresentation of certain species in multispecies indicators
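To make the three dimensions concrete, here is a minimal, language-agnostic sketch in Python (not dubicube code, and not a proposed API). It assumes a toy cube with one row per year, grid cell and species; the column names `year`, `cell`, `species`, `obs` and the three helper functions are hypothetical, and the metrics are only simple stand-ins for whatever diagnostics are eventually chosen.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Toy cube: one row per (year, grid cell, species) with an occurrence count.
# Column names are illustrative assumptions, not dubicube's schema.
cube = [
    {"year": 2020, "cell": "A1", "species": "sp1", "obs": 8},
    {"year": 2020, "cell": "A2", "species": "sp1", "obs": 2},
    {"year": 2021, "cell": "A1", "species": "sp1", "obs": 10},
    {"year": 2021, "cell": "B1", "species": "sp2", "obs": 1},
]

def spatial_coverage(rows):
    """Spatial: fraction of all occupied grid cells in which each species occurs."""
    all_cells = {r["cell"] for r in rows}
    cells = defaultdict(set)
    for r in rows:
        cells[r["species"]].add(r["cell"])
    return {sp: len(c) / len(all_cells) for sp, c in cells.items()}

def temporal_cv(rows):
    """Temporal: coefficient of variation of total occurrences per year."""
    per_year = defaultdict(int)
    for r in rows:
        per_year[r["year"]] += r["obs"]
    totals = list(per_year.values())
    return pstdev(totals) / mean(totals)

def dominance(rows):
    """Taxonomic: share of all occurrences contributed by the most recorded species."""
    per_species = defaultdict(int)
    for r in rows:
        per_species[r["species"]] += r["obs"]
    return max(per_species.values()) / sum(per_species.values())

print(spatial_coverage(cube))
print(temporal_cv(cube))
print(dominance(cube))
```

In this toy cube a single species contributes 20 of the 21 occurrences, which is the kind of overrepresentation a multispecies indicator would need to be warned about.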
Proposed functionality (overview)
- Measuring data cube robustness
  - Input: processed_cube, optionally an indicator function
  - Output: a structured summary with warnings / flags and short explanations
  - Intended for data exploration and pre-indicator checks
- Filtering observations based on robustness criteria
  - Input: processed_cube
  - Output: filtered processed_cube
  - Includes sensible defaults (e.g. excluding species observed only once in a single year for trend analysis)
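The two pieces of functionality could fit together roughly as in the sketch below. It is written in Python purely for illustration; `assess_cube`, `filter_cube`, the `min_years` parameter and the shape of the returned flags are all hypothetical and do not exist in dubicube. The default shown here reads "observed only once in a single year" as "recorded in fewer than two distinct years", which is one possible interpretation.

```python
from collections import defaultdict

def species_years(rows):
    """Map each species to the set of years in which it was recorded."""
    years = defaultdict(set)
    for r in rows:
        years[r["species"]].add(r["year"])
    return years

def assess_cube(rows, min_years=2):
    """Return a list of flags, each with a short explanation (hypothetical shape)."""
    flags = []
    for sp, yrs in sorted(species_years(rows).items()):
        if len(yrs) < min_years:
            flags.append({
                "dimension": "temporal",
                "species": sp,
                "message": f"{sp} recorded in {len(yrs)} year(s); trends may be unreliable",
            })
    return flags

def filter_cube(rows, min_years=2):
    """Keep only rows of species recorded in at least `min_years` distinct years."""
    years = species_years(rows)
    return [r for r in rows if len(years[r["species"]]) >= min_years]

# Toy cube with assumed column names (not dubicube's schema).
cube = [
    {"year": 2020, "cell": "A1", "species": "sp1", "obs": 8},
    {"year": 2020, "cell": "A2", "species": "sp1", "obs": 2},
    {"year": 2021, "cell": "A1", "species": "sp1", "obs": 10},
    {"year": 2021, "cell": "B1", "species": "sp2", "obs": 1},
]

flags = assess_cube(cube)      # diagnostics only, the cube is left unchanged
filtered = filter_cube(cube)   # same row structure, flagged species removed

for flag in flags:
    print(flag["dimension"], flag["message"])
print(len(filtered), "rows kept")
```

Keeping assessment and filtering as separate steps lets users inspect the flags first and decide whether the default filter is appropriate for their indicator.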