
Config validation #3613

@astrojuanlu

Description

Today I spoke to a user who had a very long and treacherous parameters.yml like this:

```yaml
sensors:
  sensor1:
    name: "Sensor 1"
    type: "temperature"
    stderr: 0.1
  sensor2:
    name: "Sensor 2"
    type: "humidity"
    stderr: 0.1
  sensor3:
    name: "Sensor 3"
    type: "temperature"
    stderr: 0.1
  sensor4:
    name: "Sensor 4"
    type: "temperature"
    stderr: 0.2
```

And so forth. This causes several problems:

  1. There's lots of repetition. The user mentioned that it would be ideal to be able to "inherit" in YAML, something like:
```yaml
_sensor:
  type: "<unknown>"
  stderr: 0.1

sensors:
  sensor1: ${_sensor}  # All defaults are taken
  sensor2: ${_sensor}
    stderr: 0.2  # Try to override default, but 💥 syntax error
```
  2. It's unclear how to validate this YAML. We did a quick proof of concept combining OmegaConf and Pydantic v2:
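```python
import typing as t

from omegaconf import OmegaConf
from pydantic import BaseModel, Field


class Sensor(BaseModel):
    name: str
    # The YAML key is `type`; without the alias Pydantic would silently
    # ignore it and always fall back to the default
    sensor_type: str = Field(default="<unknown>", alias="type")
    stderr: t.Optional[float] = 0.1


class Config(BaseModel):
    sensors: t.Dict[str, Sensor]


config = OmegaConf.load("conf/base/parameters.yml")
# Convert the DictConfig to plain Python containers, then validate
# (`model_validate` is the Pydantic v2 replacement for `validate`)
c = Config.model_validate(OmegaConf.to_container(config, resolve=True))
print(c.sensors["sensor3"].stderr)
```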

This was cool, because the defaults were filled in from the Sensor model.

However: (2a) it's not clear how to keep the defaults in the YAML, which the user wanted (Pydantic may offer a way to do this); (2b) it's not clear whether this should live in parameters.yml or in a custom sensors.yml; and, most importantly, (2c) it's not clear how or where to perform such validation, because there is no after_config_loaded hook.
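For (1) and (2a), one option is to keep the defaults in the YAML as their own block and merge them into each entry after loading, so every sensor "inherits" the defaults and overrides only what differs. The sketch below assumes the parameters were already loaded into a plain dict (for example with `OmegaConf.to_container`), and it uses the hypothetical `_sensor` key from the wish above; `deep_merge` and `apply_defaults` are illustrative helpers, not Kedro or OmegaConf APIs. `OmegaConf.merge` gives similar "later config wins" semantics on `DictConfig` objects.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where keys in `override` win over keys in `base`.

    Nested dicts are merged recursively; any other value is replaced.
    Neither input is mutated.
    """
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def apply_defaults(
    params: dict[str, Any], defaults_key: str = "_sensor", section: str = "sensors"
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: fill every entry under `section` from `params[defaults_key]`."""
    defaults = params.get(defaults_key, {})
    return {
        # An empty YAML entry (e.g. `sensor3:`) loads as None, so treat it as {}
        name: deep_merge(defaults, overrides or {})
        for name, overrides in params.get(section, {}).items()
    }


# Parameters as they might look after loading: defaults written once,
# each sensor only lists what differs from them.
params = {
    "_sensor": {"type": "<unknown>", "stderr": 0.1},
    "sensors": {
        "sensor1": {"name": "Sensor 1", "type": "temperature"},
        "sensor2": {"name": "Sensor 2", "type": "humidity"},
        "sensor4": {"name": "Sensor 4", "type": "temperature", "stderr": 0.2},
    },
}

sensors = apply_defaults(params)
print(sensors["sensor1"]["stderr"])  # inherited from _sensor
print(sensors["sensor4"]["stderr"])  # overridden per sensor
```

The merged dicts could then be handed to the Pydantic `Config` model above, so the YAML stays the single place where defaults are written down and the model only enforces types.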

I think the closest existing approach might be what kedro-mlflow does with the after_context_created hook https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L78-L79, but it then injects the config into the KedroContext https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L129-L134, with all the problems discussed in #3214.
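For (2c), a validate-only variant of that idea could read `context.params` in `after_context_created` and fail fast on bad parameters, without injecting anything back into the `KedroContext`. This is a minimal sketch under assumptions: `SensorParams`, `validate_sensors` and `ParamsValidationHook` are hypothetical names, the stdlib dataclass stands in for the Pydantic model from the proof of concept, and the fallback `hook_impl` exists only so the sketch runs without Kedro installed.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional

try:
    from kedro.framework.hooks import hook_impl
except ImportError:  # lets this sketch run without Kedro installed
    def hook_impl(func):
        return func


@dataclass(frozen=True)
class SensorParams:
    """Hypothetical schema for one entry under `sensors` in parameters.yml."""
    name: str
    type: str = "<unknown>"
    stderr: Optional[float] = 0.1


def validate_sensors(params: dict[str, Any]) -> dict[str, SensorParams]:
    """Check every sensor entry and fill in defaults; raise ValueError on problems."""
    allowed = {f.name for f in fields(SensorParams)}
    validated: dict[str, SensorParams] = {}
    for sensor_id, raw in (params.get("sensors") or {}).items():
        raw = raw or {}  # an empty YAML entry loads as None
        unknown = set(raw) - allowed
        if unknown:  # catches typos such as `sensor_type` instead of `type`
            raise ValueError(f"{sensor_id}: unknown keys {sorted(unknown)}")
        if "name" not in raw:
            raise ValueError(f"{sensor_id}: missing required key 'name'")
        sensor = SensorParams(**raw)  # defaults fill any omitted optional keys
        if not isinstance(sensor.name, str) or not isinstance(sensor.type, str):
            raise ValueError(f"{sensor_id}: 'name' and 'type' must be strings")
        if sensor.stderr is not None and not isinstance(sensor.stderr, (int, float)):
            raise ValueError(f"{sensor_id}: 'stderr' must be a number or null")
        validated[sensor_id] = sensor
    return validated


class ParamsValidationHook:
    """Hypothetical hook: stop the run early if parameters don't match the schema."""

    @hook_impl
    def after_context_created(self, context) -> None:
        # Only reads the parameters; nothing is stored on the context.
        validate_sensors(context.params)
```

In a project, such a hook would be registered like any other, e.g. `HOOKS = (ParamsValidationHook(),)` in settings.py. Because it only raises and never mutates the context, it avoids the injection problems from #3214, but the validated objects are not available to nodes, which still receive the raw parameters.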

How can we better support this use case?

Paging @datajoely, @Galileo-Galilei

Labels: Issue: Feature Request (New feature or improvement to existing feature)
