Breaking reference cycles during validation #1062

@eslavich

As you may know, over in asdf we have a history of taking outrageous liberties with jsonschema's implementation details. Thanks to some new ideas from @braingram and the changes coming in 4.18, we're going to be able to eliminate most of that and use public interfaces instead.

One remaining problem we need to solve is how to break reference cycles during validation. For example, validating the self-referential instance below against a recursive schema leads to a RecursionError:

from referencing import Registry, Resource
import jsonschema

schema = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "id": "http://example.com/reference-cycle",
    "type": "object",
    "properties": {
        "foo": {"$ref": "#"}
    }
}

instance = {}
instance["foo"] = instance

resource = Resource.from_contents(schema)
registry = resource @ Registry()
validator = jsonschema.Draft4Validator(schema, registry=registry)

validator.validate(instance)

My best idea so far is to replace the validator methods with our own doctored-up versions. For example, we might replace the "properties" method with something like this:

from jsonschema.exceptions import ValidationError

# Errors recorded so far, keyed by (id of instance value, id of subschema).
seen = {}

def properties(validator, properties, instance, schema):
    if not validator.is_type(instance, "object"):
        return

    for property, subschema in properties.items():
        if property in instance:
            key = (id(instance[property]), id(subschema))
            if key in seen:
                # Pair already visited (possibly still in progress): replay
                # the errors recorded so far instead of descending again.
                for error in list(seen[key]):
                    yield ValidationError(error.message)
            else:
                errors = []
                seen[key] = errors
                for error in validator.descend(
                    instance[property],
                    subschema,
                    path=property,
                    schema_path=property,
                ):
                    errors.append(error)
                    yield error

(We would also need to store that seen variable somewhere we can reliably clear it after each validation.)

Is that a reasonable strategy? In doing so, are we using anything that should not be considered a public API?
