-
Notifications
You must be signed in to change notification settings - Fork 180
Description
Environment
CKAN version: 2.10.1
ckanext-scheming version: release-3.0.0
Description
Our CKAN implementation is being used to index datasets across a variety of domains. We have a schema that defines a core set of fields that all datasets need to specify. To support flexibility for dataset authors to add additional key-value metadata that is outside of the required core set of fields, our schema also has a field called custom_fields that uses repeating_subfields to allow users to specify additional key-value pairs to associate with their dataset. The key subfield has a validator that makes sure that the key's name meets certain requirements:
- field_name: custom_fields
label: Custom Fields
repeating_label: Custom Field
repeating_subfields:
- field_name: key
label: Key
validators: key_validator
required: true
- field_name: value
label: Value
required: trueWe want to block users from specifying the same key value in multiple entries in custom_fields. The obvious way to do this is to define a validator function, keys_are_unique(), that can be applied to custom_fields to verify that the keys are unique.
- field_name: custom_fields
label: Custom Fields
validators: keys_are_unique
repeating_label: Custom Field
repeating_subfields:
- field_name: key
label: Key
validators: key_validator
required: true
- field_name: value
label: Value
required: trueAfter trying this & spending some time in my debugger, I discovered that the extension initially saves the top-level field's validator, but then replaces it with the subfields' validator(s). When this happens, the data type of the validators list also changes from list[str] to dict. I haven't had time yet to dig into why/how this works in the grand scheme of CKAN dataset creation, but that's probably my next step.
One example of this behavior is in the _field_validators() function in ckanext/scheming/plugins.py:
ckanext-scheming/ckanext/scheming/plugins.py
Lines 582 to 608 in 8646a9d
| def _field_validators(f, schema, convert_extras): | |
| """ | |
| Return the validators for a scheming field f | |
| """ | |
| if 'validators' in f: | |
| validators = validation.validators_from_string( | |
| f['validators'], | |
| f, | |
| schema | |
| ) | |
| elif helpers.scheming_field_required(f): | |
| validators = [not_empty] | |
| else: | |
| validators = [ignore_missing] | |
| if convert_extras: | |
| validators.append(convert_to_extras) | |
| # If this field contains children, we need a special validator to handle | |
| # them. | |
| if 'repeating_subfields' in f: | |
| validators = { | |
| sf['field_name']: _field_validators(sf, schema, False) | |
| for sf in f['repeating_subfields'] | |
| } | |
| return validators |
As a user, this behavior is unexpected -- I would expect that validators can be applied to both the top-level & sub-level fields. I think that it's worth updating the documentation for repeating_subfields and/or validators to explain that if a field has repeating_subfields, then validators for that field are ignored in favor of validators defined for the subfields.
I also have two specific questions, any insight would be appreciated:
-
We are trying to allow users to add an arbitrary number of "extra" key-value pairs to a dataset that is defined by a
ckanext-schemingschema. Is our current approach (custom_fieldsfield withrepeating_subfields) the "correct" way to do what we are trying to do, or is there another method that is better supported, either byckanext-schemingor by CKAN itself? -
Assuming that what we are doing is correct, roughly how difficult would it be to modify
ckanext-schemingto support validation of both the subfields and the top-level field? Any tips on what we would need to do to support that? It almost seems like this would require changes to the way the data is modelled, which I assume would be nontrivial.