Skip to content

Conversation

candleindark
Copy link
Contributor

@candleindark candleindark commented Mar 4, 2025

The stable version of schema.json fails to validate against the stable version of metaschema.json if format validation is enabled.

Specifically, running check-jsonschema --schemafile metaschema.json schema.json produces the following output

Schema validation errors were encountered.
  schema.json::$.objects.formats.participant_relative.pattern: '(?!/)(?!sub-)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.participant_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.bids_uri.pattern: 'bids:[0-9a-zA-Z/#:\\?\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.bids_uri: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.stimuli_relative.pattern: '(?!/)(?!stimuli/)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.stimuli_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.file_relative.pattern: '(?!/)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.file_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.dataset_relative.pattern: '(?!/)[0-9a-zA-Z/\\_\\-\\.]+' is not a 'regex'
  schema.json::$.objects.formats.dataset_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  schema.json::$.objects.formats.hed_version.pattern: '^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\ (?:-(?:(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?\\ (?:\\+(?:[0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$' is not a 'regex'
  schema.json::$.objects.formats.hed_version: Unevaluated properties are not allowed ('pattern' was unexpected)

where check-jsonschema is at 0.31.2, and by default, it validates with format validation enabled.

My guess is that BIDS schemas are intended to be validated against the meta schema with format validation enabled. This PR provides a solution to ensure validations of BIDS schemas are done with format validation enabled. Specifically, this PR includes the following changes.

TODOs:

  • Provide a helper function to create jsonschema validators
  • Provide a helper function to create the jsonschema validator for validating BIDS schemas with format validation enabled. Additionally, this function also caches the validator upon the return of the first execution to improve performance.
  • Use the validator from previous step to validate BIDS schemas
  • Include the format extra in the jsonschema package as a dependency to enable format validation when using the package. (See https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-formats for details)
  • Add test_valid_schema_with_check_jsonschema() to test the validity of the BIDS schema against the metaschema using the check-jsonschema CLI. (The check-jsonschema supports the interpretation of regex patterns in the unicode dialect by default.)
  • 4 @effigies to fix up the schema

@candleindark candleindark force-pushed the jsonschema-validator branch from 7597bb7 to f8043f5 Compare March 4, 2025 06:12
Copy link

codecov bot commented Mar 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.07%. Comparing base (78213dc) to head (fdf4534).
Report is 198 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2077      +/-   ##
==========================================
+ Coverage   82.92%   83.07%   +0.14%     
==========================================
  Files          17       17              
  Lines        1511     1524      +13     
==========================================
+ Hits         1253     1266      +13     
  Misses        258      258              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@candleindark candleindark marked this pull request as ready for review March 4, 2025 06:16
@yarikoptic yarikoptic requested review from effigies and tsalo March 5, 2025 19:34
@yarikoptic
Copy link
Collaborator

ping @effigies and @tsalo -- let's get "kosher" again?

@effigies
Copy link
Collaborator

effigies commented Mar 5, 2025

There aren't any changes to the schema or metaschema being proposed, so I don't see how this addresses the stated problem.

I note that this is having problems with the schema in /stable/. If the problem is addressed in latest, then what's there to do?

@candleindark
Copy link
Contributor Author

There aren't any changes to the schema or metaschema being proposed, so I don't see how this addresses the stated problem.

You are right. This PR doesn't correct the problems in the BIDS schema. It only ensures subsequent validations of the BIDS schemas are done with format validation enabled.

I don't think I am the right person to correct any existing problems in the BIDS schema at this point.

@effigies
Copy link
Collaborator

effigies commented Mar 5, 2025

I ran check-jsonschema locally, and this reproduces now. I believe that these are new errors as of python-jsonschema/check-jsonschema#511.

Given that you're not fixing the regexes, shouldn't the goal be for this PR be for CI to fail until they're fixed?

I have a patch that will fix the regexes, but it doesn't make much sense to apply it before we have a test that fails.

@effigies effigies closed this Mar 5, 2025
@effigies effigies reopened this Mar 5, 2025
@effigies
Copy link
Collaborator

effigies commented Mar 5, 2025

(Clicked the wrong comment button.)

@candleindark
Copy link
Contributor Author

I ran check-jsonschema locally, and this reproduces now. I believe that these are new errors as of python-jsonschema/check-jsonschema#511.

You are right. The errors are result of 'regular expression interpretation in "pattern", "patternProperties", and "format": "regex" usages now uses unicode-mode JS regular expressions by default' in check-jsonschema. The following commands produced no error.

(tmp) ➜  tmp check-jsonschema --regex-variant python --schemafile metaschema.json schema.json
ok -- validation done
(tmp) ➜  tmp check-jsonschema --regex-variant nonunicode --schemafile metaschema.json schema.json
ok -- validation done

Given that you're not fixing the regexes, shouldn't the goal be for this PR be for CI to fail until they're fixed?

I just tried using the jsonschema package to validate the BIDS schema against the metaschema, and I was not able to reproduce the errors output by the check-jsonschema CLI. The validation was successful and didn't produce any error at all. In fact, according to python-jsonschema/check-jsonschema#511, to get the default behavior of the check-jsonschema, one has to build a customer validators using the jsonschema package's extend() API. Since this is just a matter of choice regex variant, I don't think it is a good idea, or add much value, to deviate from the default behavior of jsonschema package.

There is an existing test that tests the validation of the BIDS schema against the metaschema, test_valid_schema. It will fail should the BIDS schema fail to validate (now including format validation with this PR).

Though the observed errors that initiated this PR are no longer applicable. I still think the changes in this PR should be included since without them format validation would not run at all in validation of BIDS schemas.

@effigies
Copy link
Collaborator

effigies commented Mar 6, 2025

Though the observed errors that initiated this PR are no longer applicable.

They are still applicable:

󰅂 uv run bst export | uvx check-jsonschema --schemafile src/metaschema.json -
Schema validation errors were encountered.
  -::$.objects.formats.participant_relative.pattern: '(?!/)(?!sub-)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.participant_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.stimuli_relative.pattern: '(?!/)(?!stimuli/)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.stimuli_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.bids_uri.pattern: 'bids:[0-9a-zA-Z/#:\\?\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.bids_uri: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.dataset_relative.pattern: '(?!/)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.dataset_relative: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.hed_version.pattern: '^(?:[a-zA-Z]+:)?(?:[a-zA-Z]+_)?(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\.(?:0|[1-9]\\d*)\\ (?:-(?:(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?\\ (?:\\+(?:[0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$' is not a 'regex'
  -::$.objects.formats.hed_version: Unevaluated properties are not allowed ('pattern' was unexpected)
  -::$.objects.formats.file_relative.pattern: '(?!/)[0-9a-zA-Z+/\\_\\-\\.]+' is not a 'regex'
  -::$.objects.formats.file_relative: Unevaluated properties are not allowed ('pattern' was unexpected)

That's why I was hoping we could get to the point of having them appear before I fixed them, so we know the check works.

@candleindark
Copy link
Contributor Author

I will give it another shot, possibly sometime next week though.

@effigies effigies added schema-code Updates or changes to the code used to parse, filter, and render the schema. exclude-from-changelog This item will not feature in the automatically generated changelog labels Mar 7, 2025
This func returns a jsonschema validator that
is bound to the meta schema for validating
 BIDS schemas. Format validation is
enabled in this validator. Additionally, the
validator is cached.
This schema validator have format validation
enabled which is needed to correctly validate
a BIDS schema against the meta schema
To make format validation available in the
`jsonschema` package, the `format` extra
must be included.
See https://python-jsonschema.readthedocs.io/en/latest/validate/#validating-formats
for more details.
@candleindark candleindark force-pushed the jsonschema-validator branch 2 times, most recently from 48dd8b6 to 5151873 Compare March 25, 2025 19:19
@candleindark candleindark force-pushed the jsonschema-validator branch from 5151873 to 0c46207 Compare March 25, 2025 19:22
@candleindark
Copy link
Contributor Author

@effigies I added the test that invokes json-checkschema to validate the BIDS schema against the metaschema. Currently, all supported regex variants in check-jsonschema are tested. If you prefer only the default/unicode variant, please let me know, and I will remove the others.

@yarikoptic
Copy link
Collaborator

@effigies -- we are quite red now. Should you add fixes to this PR to make it green again before the merge? (I added TODO in original description) or how would you like to proceed?

Copy link
Collaborator

@effigies effigies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that there are strangely organized fixtures providing constants. I will push the patch to fix the test.

effigies and others added 3 commits April 2, 2025 17:09
Use fixtures are unnecessarily complicated
when the schemas don't change
@effigies
Copy link
Collaborator

effigies commented Apr 8, 2025

Okay, I think this is all set. @candleindark @yarikoptic Do you want to look over my changes before merging?

@candleindark
Copy link
Contributor Author

Okay, I think this is all set. @candleindark @yarikoptic Do you want to look over my changes before merging?

I think it is good to go.

@effigies effigies merged commit 44d8461 into bids-standard:master Apr 9, 2025
27 checks passed
@candleindark candleindark deleted the jsonschema-validator branch April 9, 2025 00:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

exclude-from-changelog This item will not feature in the automatically generated changelog schema-code Updates or changes to the code used to parse, filter, and render the schema.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants