Validators ignored for fields with repeating subfields

## Environment

**CKAN version:** 2.10.1
**ckanext-scheming version:** release-3.0.0

## Description

Our CKAN implementation is being used to index datasets across a variety of domains. We have a schema that defines a core set of fields that all datasets need to specify. To support flexibility for dataset authors to add additional key-value metadata that is outside of the required core set of fields, our schema also has a field called `custom_fields` that uses `repeating_subfields` to allow users to specify additional key-value pairs to associate with their dataset. The `key` subfield has a validator that makes sure that the key's name meets certain requirements:

```yaml
- field_name: custom_fields
  label: Custom Fields
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true
```

We want to block users from specifying the same `key` value in multiple entries in `custom_fields`. The obvious way to do this is to define a validator function, `keys_are_unique()`, that can be applied to `custom_fields` to verify that the keys are unique.

```yaml
- field_name: custom_fields
  label: Custom Fields
  validators: keys_are_unique
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true
```

After trying this & spending some time in my debugger, I discovered that the extension initially saves the top-level field's validator, but then replaces it with the subfields' validator(s). When this happens, the data type of the validators list also changes from `list[str]` to `dict`. I haven't had time yet to dig into why/how this works in the grand scheme of CKAN dataset creation, but that's probably my next step.

One example of this behavior is in the `_field_validators()` function in `ckanext/scheming/plugins.py`:

https://github.com/ckan/ckanext-scheming/blob/8646a9dce79aa0b5a46274271ca6330dc0870b92/ckanext/scheming/plugins.py#L582-L608

As a user, this behavior is unexpected -- I would expect that validators can be applied to *both* the top-level & sub-level fields. I think that it's worth updating the documentation for `repeating_subfields` and/or `validators` to explain that if a field has `repeating_subfields`, then `validators` for that field are ignored in favor of validators defined for the subfields.

I also have two specific questions, any insight would be appreciated:

1) We are trying to allow users to add an arbitrary number of "extra" key-value pairs to a dataset that is defined by a `ckanext-scheming` schema. Is our current approach (`custom_fields` field with `repeating_subfields`) the "correct" way to do what we are trying to do, or is there another method that is better supported, either by `ckanext-scheming` or by CKAN itself?

2) Assuming that what we are doing is correct, roughly how difficult would it be to modify `ckanext-scheming` to support validation of *both* the subfields and the top-level field? Any tips on what we would need to do to support that? It almost seems like this would require changes to the way the data is modelled, which I assume would be nontrivial.

	def _field_validators(f, schema, convert_extras):
	"""
	Return the validators for a scheming field f
	"""
	if 'validators' in f:
	validators = validation.validators_from_string(
	f['validators'],
	f,
	schema
	)
	elif helpers.scheming_field_required(f):
	validators = [not_empty]
	else:
	validators = [ignore_missing]

	if convert_extras:
	validators.append(convert_to_extras)

	# If this field contains children, we need a special validator to handle
	# them.
	if 'repeating_subfields' in f:
	validators = {
	sf['field_name']: _field_validators(sf, schema, False)
	for sf in f['repeating_subfields']
	}

	return validators

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validators ignored for fields with repeating subfields #394

Environment

Description

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Validators ignored for fields with repeating subfields #394

Description

Environment

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions