Description
This originates downstream in python-jsonschema/check-jsonschema#376. I was just able to look at the offending schema and trace things out enough to find the true cause. I've stuck with using the CLI for my reproducers, but I could work up some Python for this if it would be useful.
check-jsonschema is using jsonschema+referencing. It keeps the jsonschema behavior of checking a schema against its metaschema. Therefore, the following schema is caught as invalid:
invalid-caught-by-metaschema
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "definitions": {
    "foo": "invalid"
  },
  "properties": {
    "foo": {
      "$ref": "#/definitions/foo"
    }
  }
}
(Note how the definition for "foo" is a string, rather than a schema.)
However, if the invalid definition is nested, as follows, the metaschema check is not sufficient:
invalid-not-caught-by-metaschema
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "definitions": {
    "sub": {
      "foo": "invalid"
    }
  },
  "properties": {
    "foo": {
      "$ref": "#/definitions/sub/foo"
    }
  }
}
As a result, with jsonschema+referencing it is possible for a validator to end up handling this invalid schema. The next step is to try to use it.
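For anyone who prefers Python to the CLI, here is a minimal sketch of the same two metaschema checks. It assumes jsonschema is installed and uses only `Draft7Validator.check_schema`. The names `FLAT`, `NESTED`, and `passes_metaschema` are made up for this illustration and are not part of check-jsonschema.

```python
from jsonschema import Draft7Validator
from jsonschema.exceptions import SchemaError

# The flat case: "foo" sits directly under "definitions", a position where the
# draft-07 metaschema requires a schema (object or boolean).
FLAT = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "definitions": {"foo": "invalid"},
    "properties": {"foo": {"$ref": "#/definitions/foo"}},
}

# The nested case: "sub" is an object, so it is accepted as a schema, and "foo"
# inside it is treated as an unknown keyword that the metaschema does not inspect.
NESTED = {
    "$schema": "http://json-schema.org/draft-07/schema",
    "type": "object",
    "definitions": {"sub": {"foo": "invalid"}},
    "properties": {"foo": {"$ref": "#/definitions/sub/foo"}},
}


def passes_metaschema(schema) -> bool:
    """Return True if the schema validates against the draft-07 metaschema."""
    try:
        Draft7Validator.check_schema(schema)
    except SchemaError:
        return False
    return True


if __name__ == "__main__":
    print("flat passes metaschema:", passes_metaschema(FLAT))
    print("nested passes metaschema:", passes_metaschema(NESTED))
```

The flat schema is rejected and the nested one is accepted, which is why the bad value only surfaces later, when the `$ref` is followed during validation.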
With the check-jsonschema CLI, we get the following trace (trimmed to relevant parts):
full-cli-traceback
$ check-jsonschema --schemafile badrefschema.json <(echo '{"foo": "bar"}')
Traceback (most recent call last):
<<<<<check-jsonschema invocation path shows here>>>>>
File "/home/sirosen/projects/jsonschema/check-jsonschema/src/check_jsonschema/checker.py", line 73, in _build_result
for err in validator.iter_errors(data):
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 368, in iter_errors
for error in errors:
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_keywords.py", line 295, in properties
yield from validator.descend(
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 416, in descend
for error in errors:
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_keywords.py", line 274, in ref
yield from validator._validate_reference(ref=ref, instance=instance)
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 410, in descend
for k, v in applicable_validators(schema):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/_legacy_keywords.py", line 17, in ignore_ref_siblings
ref = schema.get("$ref")
^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'
This shows an error in jsonschema (not quite at referencing yet!). I had to tinker to find the right trigger conditions, but this nasty schema seems to do the trick:
{
  "$schema": "http://json-schema.org/draft-07/schema",
  "type": "object",
  "definitions": {
    "sub": {
      "foo": {
        "type": "object",
        "properties": {
          "bar": "invalid"
        }
      }
    }
  },
  "properties": {
    "foo": {
      "$ref": "#/definitions/sub/foo"
    }
  }
}
Now, trying to use it triggers an attempt to call .get() on a string:
$ check-jsonschema --schemafile badrefschema.json <(echo '{"foo": {"bar": "baz"}}')
(trimmed trace)
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/jsonschema/validators.py", line 408, in descend
resolver = self._resolver.in_subresource(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/_core.py", line 689, in in_subresource
id = subresource.id()
^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/_core.py", line 225, in id
id = self._specification.id_of(self.contents)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/sirosen/projects/jsonschema/check-jsonschema/.venv/lib/python3.11/site-packages/referencing/jsonschema.py", line 50, in _legacy_dollar_id
id = contents.get("$id")
^^^^^^^^^^^^
AttributeError: 'str' object has no attribute 'get'
Proposed Resolution
Obviously, the best resolution is for such a schema to never get written, since it's invalid. 😁
And I'll be trying to advise the upstream provider of the schema against nesting things under definitions, since it weakens the kind of checking which is possible.
But should various referencing internals be more wary that an input may be a dict, a bool, or something unwanted/unexpected? Or should referencing check for bad inputs more aggressively when loading data?
The goal here is to have a better and clearer error in this case, not to ignore the malformed schema.
I think the best approach is for one of the document-loading layers in referencing to check that the value is dict | bool. If we only consider JSON Schema, it could be a validator on Resource.contents. That makes the most sense to me, conceptually, for JSON Schema, but it ties in non-generic details of that spec (vs more general $ref resolution). Perhaps JSONSchemaResource(Resource) is the right thing, which adds said validation?
(NB: I'm reading a lot of the internals pretty quickly and for the first time, so my ideas here may not hold up.)
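To make the dict | bool idea concrete, here is a small standalone sketch of what such a check might look like. It is not referencing's API: `InvalidSchemaContents`, `ensure_schema`, and `resolve_pointer` are hypothetical names, and the pointer walker is deliberately simplified (no percent-decoding, no `$id` handling).

```python
from typing import Any


class InvalidSchemaContents(TypeError):
    """Hypothetical error: a value used as a schema is not an object or boolean."""


def ensure_schema(contents: Any, location: str) -> Any:
    """Return contents unchanged if it could be a JSON Schema, else raise clearly."""
    # JSON Schema (draft 6 and later) allows only objects and booleans as schemas.
    if isinstance(contents, (dict, bool)):
        return contents
    raise InvalidSchemaContents(
        f"{location}: expected a schema (object or boolean), "
        f"got {type(contents).__name__}: {contents!r}"
    )


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Walk a '#/a/b' style JSON Pointer fragment, then check the target."""
    current = document
    for raw in pointer.lstrip("#").split("/")[1:]:
        # Undo JSON Pointer escaping (RFC 6901): "~1" is "/", "~0" is "~".
        token = raw.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return ensure_schema(current, pointer)


if __name__ == "__main__":
    nested = {
        "definitions": {"sub": {"foo": "invalid"}},
        "properties": {"foo": {"$ref": "#/definitions/sub/foo"}},
    }
    try:
        resolve_pointer(nested, "#/definitions/sub/foo")
    except InvalidSchemaContents as err:
        print(err)
```

The same `ensure_schema` call could in principle run when descending into a subschema (the `"bar": "invalid"` case), so that both tracebacks above would become a single, descriptive error instead of an AttributeError.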