Skip to content

Improve (better arguments validation, avoiding repeated creation of validator objects, etc) _validate_obj_json() in metadata.py and supporting funcs#278

Merged
yarikoptic merged 7 commits intodandi:masterfrom
candleindark:validate-json
Feb 19, 2025
Merged

Conversation

@candleindark
Copy link
Member

@candleindark candleindark commented Jan 29, 2025

This PR make the following improvement relating to validating metadata instances against a DANDI model's JSON schema.

  1. Redefine the interface of _validate_obj_json() so that it take a jsonschema validator instead of a JSON schema as argument to avoid recreation of validators for the same schema.
  2. Define a utility function, `jsonschema_validator(), for creating an appropriate JSON schema validator given a schema.
    1. Ensure validity of the schema is checked to prevent undefined behavior. (A bug fix)
    2. Utilized the "$schema" property in a provide schema to determine the appropriate validator class
  3. Define a utility function, validate_json(), to validate a data instance in general, without the missing_ok parameter.
  4. Define a utility function, dandi_jsonschema_validator, that decides the default/fallback validator class to used based on the "schemaVersion" property of the given schema instead of the "schemaVersion" field of a given instance. (A bug fix)
  5. Define utility functions, _get_jsonschema_validator()and _get_jsonschema_validator_local(), with cache to avoid repeated creation of jsonschema validators for the same schemas. Note: functools.cache() instead of functools.lru_cache() is used, for improved performance, in these functions because valid inputs to these functions have been limited by validation, so there is no worry about the cache growing without bound.
  6. Reimplement _validate_obj_json() using the aforementioned utility functions
  7. Rename schema_map in metadata.py to SCHEMA_MAP for the reference dictionary is a constant.
  8. Add extensive tests to functions covered in this PR.

@codecov
Copy link

codecov bot commented Jan 29, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.24%. Comparing base (87cc55f) to head (e32f32d).
Report is 8 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #278      +/-   ##
==========================================
- Coverage   97.63%   93.24%   -4.40%     
==========================================
  Files          16       16              
  Lines        1775     1983     +208     
==========================================
+ Hits         1733     1849     +116     
- Misses         42      134      +92     
Flag Coverage Δ
unittests 93.24% <100.00%> (-4.40%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@candleindark candleindark force-pushed the validate-json branch 2 times, most recently from d1bebc7 to 6bf8109 Compare January 29, 2025 21:40
@candleindark candleindark marked this pull request as draft January 31, 2025 08:47
@candleindark candleindark added the bug Something isn't working label Jan 31, 2025
@candleindark candleindark marked this pull request as ready for review January 31, 2025 17:02
@candleindark candleindark changed the title Improve _validate_obj_json() in metadata.py and provide utility funcs for validating with JSON schema Improve _validate_obj_json() in metadata.py and provide utility funcs for validating against JSON schemas Jan 31, 2025
@candleindark candleindark force-pushed the validate-json branch 2 times, most recently from 8f6c760 to d4d802f Compare February 6, 2025 21:47
)

# Ensure the schema is valid
validator_cls.check_schema(schema)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't that be called for the same schema over and over again for every instance?

A util func for validating a metadata
instance of a DANDI model against the
JSON schema of the model
Additionally, replace relative import
with absolute import in `test_util.py`
The relative import was referring to
an ancestor. See https://docs.astral.sh/ruff/rules/relative-imports/
Improve type annotations. Make boolean
parameter keyword parameter. Make it
more correct and robust by using
`validate_json()` underneath.
@candleindark candleindark added patch Increment the patch version when merged and removed bug Something isn't working labels Feb 17, 2025
Refactor code to allow caching of jsonschema validators.
Specifically, `_get_jsonschema_validator()` caches
validators for schemas downloaded from the `dandi/schema`
repo, and `_get_jsonschema_validator_local()` caches
validator for schemas generated from locally defined
Pydantic models. In this refactoring, additional
appropriate validations of arguments to involved funcs
are added.
@candleindark candleindark changed the title Improve _validate_obj_json() in metadata.py and provide utility funcs for validating against JSON schemas Improve _validate_obj_json() in metadata.py and supporting funcs Feb 18, 2025
@candleindark
Copy link
Member Author

@yarikoptic I think the code is good now. However, I did quite a bit of validation of function args. I could have used @validate_call from Pydantic to remove some code duplications and boilerplates. Let me know if you want me to do that as well, but I think I spent enough time on this PR already.

@candleindark candleindark added the performance Improve performance of an existing feature label Feb 18, 2025
@yarikoptic yarikoptic changed the title Improve _validate_obj_json() in metadata.py and supporting funcs Improve (better arguments validation, avoiding repeated creation of validator objects, etc) _validate_obj_json() in metadata.py and supporting funcs Feb 19, 2025
@yarikoptic yarikoptic merged commit f0be592 into dandi:master Feb 19, 2025
38 of 39 checks passed
@candleindark candleindark deleted the validate-json branch February 20, 2025 21:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

patch Increment the patch version when merged performance Improve performance of an existing feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants