
Commit c14ec26

yarikoptic, claude, mvandenburgh, candleindark, and jjnesbitt authored
Local runtime schema serialization endpoints (#2386)
* Design document with a diagram of metadata life cycle
  Metadata lifecycle inspired by the one we created for BIDS: see bids-standard/bids-website#626
* Remove duplication with now-present vendorization issue
* Address all questions and detail implementation more
* Implement local schema serialization endpoints
  - Add new API endpoints to serialize JSONSchema at runtime
  - Update info endpoint to use local schema URL instead of GitHub
  - Use TypeAdapter for proper Pydantic schema generation
  - Add tests for schema endpoints
  - Add implementation documentation
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* doc: Add mermaid diagram for updated schema architecture
  - Add visual representation of the new schema flow
  - Highlight key differences from the current approach
  - Show elimination of dependency on external schema repository
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Provide custom name for Django testserver instance
  The default name Django uses is "testserver", so the URL becomes http://testserver, but Django itself does not consider it a valid URL because it lacks a TLD. So whenever we add "schema_url" pointing to our server (in tests just "testserver"), Django starts raising validation errors. See encode/django-rest-framework#9705 for more information/rationale.
* RF: refactor auto-AI-generated tests into parametrized ones to concentrate the testing logic and to emphasize the differences (parameters of the test)
* Add GitHub schema comparison to test_schema_latest
  Extends test to compare local runtime-generated schema against static GitHub schema content. This test is expected to fail, demonstrating that vendorized schemas differ from static schemas due to runtime customizations.
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Use TransitionalGenerateJsonSchema with proper type annotations
  Refactors schema generation to match dandischema approach and adds type safety.
* Fix schema endpoints to use regular Dandiset/Asset models
  Changes from Published models to regular models to match GitHub static schemas:
  - Use Dandiset instead of PublishedDandiset
  - Use Asset instead of PublishedAsset
  - Update test parameterization accordingly
  Test now shows minimal vendorization difference (repository default value), demonstrating the expected runtime customization vs static schema difference.
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Adjust test to handle vendorization differences
  Add logic to normalize vendorization differences before schema comparison:
  - Adjust repository default value from runtime config to match GitHub schema
  - Test now passes, confirming schemas are equivalent after vendorization adjustment
  - Demonstrates successful runtime schema generation with expected customizations
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Add published schema endpoints and refactor to reduce code duplication
  Extends the schema API to support both regular and published models:
  New endpoints:
  - /api/schema/latest/published-dandiset/
  - /api/schema/<version>/published-dandiset/
  - /api/schema/latest/published-asset/
  - /api/schema/<version>/published-asset/
  Refactoring improvements:
  - Extract common logic into _schema_view_impl() helper function
  - Update URL patterns and view exports
  Testing enhancements:
  - Extend parametrized tests to cover published schema endpoints
  🤖 Generated with [Claude Code](https://claude.ai/code)
  Co-Authored-By: Claude <noreply@anthropic.com>
* Fixup to mermaid diagram
  In d41f24e I committed a somewhat rushed tune-up by Claude, and it replaced the "current state" diagram with the actually implemented one, and then for the implemented one it just removed the JSON Schema serializations altogether. Since we did merges into this branch, rebasing is "tricky", so I decided to just push a fixup commit.
* Remove portion of the test checking for identity to released schemas
* chore: remove unused import
  This resolves a complaint from ruff.
* style: use `settings.DANDI_API_URL` to construct schema URL
  There are already consistent uses of `settings.DANDI_API_URL` for building URLs in the codebase. Check out #2386 (comment) for the detailed rationale behind this change.
  Co-authored-by: Mike VanDenburgh <37340715+mvandenburgh@users.noreply.github.com>
* Don't re-implement get_schema_url in tests
* Unify schema endpoints
* Remove version query param from schema endpoint
* Update wording
  Co-authored-by: Isaac To <candleindark@users.noreply.github.com>
* Add test against unsupported model values
* Update schema endpoint swagger docs
* Add endpoint for listing available models
* Fix view import/export
* feat: change endpoints related to serving schemas
* doc: update design docs regarding endpoint changes.
* docs: remove mention of `TypeAdapter` in `vendored-schema-1-implementation.md`
  Pydantic models can generate their own JSONSchema, and we do generate the JSONSchema directly from the models. TypeAdapter in Pydantic has other uses, but we are not using it here.
* docs: consolidate vendor-configurable metadata models design docs into one file

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Mike VanDenburgh <michael.vandenburgh@kitware.com>
Co-authored-by: Isaac To <isaac.chun.to@gmail.com>
Co-authored-by: Isaac To <candleindark@users.noreply.github.com>
Co-authored-by: Mike VanDenburgh <37340715+mvandenburgh@users.noreply.github.com>
Co-authored-by: Jacob Nesbitt <jjnesbitt2@gmail.com>
1 parent 2006da3 commit c14ec26

7 files changed (+281 additions, -11 deletions)

dandiapi/api/tests/test_info.py

Lines changed: 5 additions & 1 deletion
@@ -3,12 +3,16 @@
 from django.conf import settings
 
 from dandiapi import __version__
-from dandiapi.api.views.info import schema_url
+from dandiapi.api.views.info import get_schema_url
 
 
 def test_rest_info(api_client):
     resp = api_client.get('/api/info/')
     assert resp.status_code == 200
+
+    # Get the expected schema URL
+    schema_url = get_schema_url()
+
     assert resp.json() == {
         'schema_version': settings.DANDI_SCHEMA_VERSION,
         'schema_url': schema_url,

dandiapi/api/tests/test_schema.py

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+from __future__ import annotations
+
+from dandischema.models import Asset, CommonModel, Dandiset, PublishedAsset, PublishedDandiset
+from dandischema.utils import TransitionalGenerateJsonSchema
+import pytest
+
+
+@pytest.mark.parametrize(
+    'model',
+    [
+        Dandiset,
+        Asset,
+        PublishedDandiset,
+        PublishedAsset,
+    ],
+)
+def test_schema_latest(api_client, model: CommonModel):
+    """Test that the schema endpoints return valid schemas."""
+    resp = api_client.get('/api/schemas/', {'model': model.__name__})
+    assert resp.status_code == 200
+
+    # Verify that the schema is json and has core properties
+    schema = resp.json()
+    assert isinstance(schema, dict)
+    assert 'properties' in schema
+    assert 'title' in schema
+
+    # Compare with expected schema from pydantic using same generator as dandischema
+    expected_schema = model.model_json_schema(schema_generator=TransitionalGenerateJsonSchema)
+    assert schema == expected_schema
+
+
+def test_schema_unsupported_model(api_client):
+    """Test that the schema endpoint returns an error when passed invalid choice."""
+    resp = api_client.get('/api/schemas/', {'model': 'NotAValidModel'})
+    assert resp.status_code == 400

dandiapi/api/views/__init__.py

Lines changed: 3 additions & 4 deletions
@@ -8,6 +8,7 @@
 from .info import info_view
 from .robots import robots_txt_view
 from .root import root_content_view
+from .schema import schema_list_view, schema_view
 from .stats import stats_view
 from .upload import (
     blob_read_view,
@@ -29,13 +30,11 @@
     'authorize_view',
     'blob_read_view',
     'info_view',
-    'info_view',
     'mailchimp_csv_view',
     'robots_txt_view',
-    'robots_txt_view',
     'root_content_view',
-    'root_content_view',
-    'stats_view',
+    'schema_list_view',
+    'schema_view',
     'stats_view',
     'upload_complete_view',
     'upload_initialize_view',

dandiapi/api/views/info.py

Lines changed: 19 additions & 6 deletions
@@ -1,17 +1,30 @@
 from __future__ import annotations
 
+from urllib.parse import ParseResult, urlencode, urlparse, urlunparse
+
 from django.conf import settings
+from django.urls import reverse
 from drf_yasg.utils import no_body, swagger_auto_schema
 from rest_framework import serializers
 from rest_framework.decorators import api_view
 from rest_framework.response import Response
 
 from dandiapi import __version__
 
-schema_url = (
-    'https://raw.githubusercontent.com/dandi/schema/master/'
-    f'releases/{settings.DANDI_SCHEMA_VERSION}/dandiset.json'
-)
+
+def get_schema_url():
+    """Get the URL for the schema based on current server deployment."""
+    scheme, netloc = urlparse(settings.DANDI_API_URL)[:2]
+    return urlunparse(
+        ParseResult(
+            scheme=scheme,
+            netloc=netloc,
+            path=reverse('schema-view'),
+            query=urlencode({'model': 'Dandiset'}),
+            params='',
+            fragment='',
+        )
+    )
 
 
 class ApiServiceSerializer(serializers.Serializer):
@@ -55,12 +68,12 @@ def __init__(self, *args, **kwargs):
     method='GET',
 )
 @api_view()
-def info_view(self):
+def info_view(request):
     api_url = f'{settings.DANDI_API_URL}/api'
     serializer = ApiInfoSerializer(
         data={
             'schema_version': settings.DANDI_SCHEMA_VERSION,
-            'schema_url': schema_url,
+            'schema_url': get_schema_url(),
             'version': __version__,
             'cli-minimal-version': '0.60.0',
             'cli-bad-versions': [],
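`get_schema_url()` keeps only the scheme and host of `settings.DANDI_API_URL`, takes the path from the `schema-view` route, and adds a `model=Dandiset` query. The following minimal sketch reproduces that composition with the standard library alone so it can run outside Django. `build_schema_url` and its `api_url`/`schema_path` parameters are hypothetical stand-ins for `settings.DANDI_API_URL` and `reverse('schema-view')`; they are not part of dandi-archive.

```python
from urllib.parse import ParseResult, urlencode, urlparse, urlunparse


def build_schema_url(api_url: str, schema_path: str, model: str = 'Dandiset') -> str:
    """Hypothetical Django-free stand-in for get_schema_url().

    api_url plays the role of settings.DANDI_API_URL and schema_path the
    role of reverse('schema-view'); both are passed in explicitly here.
    """
    # Only scheme and netloc are kept, so any path inside api_url is discarded.
    scheme, netloc = urlparse(api_url)[:2]
    return urlunparse(
        ParseResult(
            scheme=scheme,
            netloc=netloc,
            path=schema_path,
            params='',
            query=urlencode({'model': model}),
            fragment='',
        )
    )


if __name__ == '__main__':
    # Prints https://api.example.org/api/schemas/?model=Dandiset
    print(build_schema_url('https://api.example.org', '/api/schemas/'))
```

With the routes registered in `dandiapi/urls.py` in this commit, `schema_path` would be `/api/schemas/`. Because only the first two components of `api_url` are kept, a path inside `DANDI_API_URL` does not appear in the result; the path comes entirely from the route.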

dandiapi/api/views/schema.py

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+from __future__ import annotations
+
+from typing import TYPE_CHECKING
+
+from dandischema.models import Asset, Dandiset, PublishedAsset, PublishedDandiset
+from dandischema.utils import TransitionalGenerateJsonSchema
+from drf_yasg.utils import swagger_auto_schema
+from rest_framework import serializers
+from rest_framework.decorators import api_view
+from rest_framework.response import Response
+
+if TYPE_CHECKING:
+    from rest_framework.request import Request
+
+
+_model_name_mapping = {
+    m.__name__: m
+    for m in [
+        Dandiset,
+        Asset,
+        PublishedDandiset,
+        PublishedAsset,
+    ]
+}
+
+
+class SchemaQuerySerializer(serializers.Serializer):
+    model = serializers.ChoiceField(choices=list(_model_name_mapping))
+
+
+@swagger_auto_schema(method='GET', operation_summary='List schema models')
+@api_view(['GET'])
+def schema_list_view(request: Request) -> Response:
+    """Return the list of models which can be requested via the schema endpoint."""
+    return Response(_model_name_mapping.keys())
+
+
+@swagger_auto_schema(
+    method='GET',
+    operation_summary='Get model schema',
+    operation_description='Returns the JSON Schema of the requested metadata model',
+    query_serializer=SchemaQuerySerializer,
+)
+@api_view(['GET'])
+def schema_view(request: Request) -> Response:
+    """
+    Return the JSON Schema of the requested metadata model.
+
+    This endpoint returns the JSON Schema of the requested metadata model
+    as it is defined in this DANDI archive instance, with instance specific
+    parameters such as instance name and DOI prefix.
+    """
+    serializer = SchemaQuerySerializer(data=request.query_params)
+    serializer.is_valid(raise_exception=True)
+
+    # Generate the JSON schema using the same approach as dandischema
+    model_class = _model_name_mapping[serializer.validated_data['model']]
+    schema = model_class.model_json_schema(schema_generator=TransitionalGenerateJsonSchema)
+
+    return Response(schema)
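`schema_view` validates the `model` query parameter against the keys of `_model_name_mapping` (through a `ChoiceField`, so an unknown name yields HTTP 400) and then calls `model_json_schema()` on the selected class. The sketch below isolates that lookup-then-generate pattern in plain Python so it runs without Django, DRF, or dandischema. The stub classes `Dandiset` and `Asset`, their `model_json_schema` stub, `UnknownModelError`, `available_models`, and `schema_for` are all hypothetical placeholders, not the dandischema or DRF APIs.

```python
from __future__ import annotations

from typing import Any


class _StubModel:
    """Hypothetical stand-in for a Pydantic metadata model."""

    @classmethod
    def model_json_schema(cls) -> dict[str, Any]:
        # The real Pydantic classmethod builds the full schema; this stub
        # returns only a minimal object schema titled after the class.
        return {'title': cls.__name__, 'type': 'object', 'properties': {}}


class Dandiset(_StubModel):
    """Hypothetical stub named like dandischema.models.Dandiset."""


class Asset(_StubModel):
    """Hypothetical stub named like dandischema.models.Asset."""


# Built the same way as _model_name_mapping in schema.py: class name -> class.
MODEL_NAME_MAPPING: dict[str, type[_StubModel]] = {m.__name__: m for m in [Dandiset, Asset]}


class UnknownModelError(ValueError):
    """Hypothetical error; the real view rejects unknown names with HTTP 400 via ChoiceField."""


def available_models() -> list[str]:
    # Same content as schema_list_view returns, as a plain list.
    return list(MODEL_NAME_MAPPING)


def schema_for(model_name: str) -> dict[str, Any]:
    # Validate before lookup, as SchemaQuerySerializer.is_valid(raise_exception=True) does.
    if model_name not in MODEL_NAME_MAPPING:
        raise UnknownModelError(f'{model_name!r} is not one of {available_models()}')
    return MODEL_NAME_MAPPING[model_name].model_json_schema()
```

Deriving the accepted choices from the same dictionary that drives the lookup, as `schema.py` does with `list(_model_name_mapping)`, keeps the validated names and the dispatch table from drifting apart.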

dandiapi/urls.py

Lines changed: 4 additions & 0 deletions
@@ -23,6 +23,8 @@
     mailchimp_csv_view,
     robots_txt_view,
     root_content_view,
+    schema_list_view,
+    schema_view,
     stats_view,
     upload_complete_view,
     upload_initialize_view,
@@ -66,6 +68,8 @@
     path('api/stats/', stats_view),
     path('api/info/', info_view),
     path('api/blobs/digest/', blob_read_view, name='blob-read'),
+    path('api/schemas/available/', schema_list_view, name='schema-list-view'),
+    path('api/schemas/', schema_view, name='schema-view'),
     path('api/uploads/initialize/', upload_initialize_view, name='upload-initialize'),
     re_path(
         r'api/uploads/(?P<upload_id>[0-9a-f\-]{36})/complete/',
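With the two routes registered above, a client can first ask an instance which models it serves and then fetch each schema. The minimal sketch below assumes only those two paths and the JSON shapes returned by `schema_list_view` (a list of model names) and `schema_view` (a JSON Schema object). `fetch_all_schemas`, `urllib_fetch_json`, and the injected `fetch_json` callable are hypothetical helpers, not part of dandi-archive or dandi-cli.

```python
from __future__ import annotations

import json
from typing import Any, Callable
from urllib.parse import urlencode
from urllib.request import Request, urlopen

FetchJson = Callable[[str], Any]


def urllib_fetch_json(url: str) -> Any:
    # Plain GET asking for JSON; no authentication, custom timeout, or retries.
    with urlopen(Request(url, headers={'Accept': 'application/json'})) as resp:
        return json.load(resp)


def fetch_all_schemas(api_root: str, fetch_json: FetchJson = urllib_fetch_json) -> dict[str, Any]:
    """Return {model name: JSON Schema} for every model the instance lists."""
    base = api_root.rstrip('/')
    # api/schemas/available/ -> list of model names (schema_list_view)
    names = fetch_json(f'{base}/api/schemas/available/')
    # api/schemas/?model=<name> -> JSON Schema of that model (schema_view)
    return {name: fetch_json(f'{base}/api/schemas/?{urlencode({"model": name})}') for name in names}
```

Injecting `fetch_json` keeps the request sequence testable without a live server; against a real instance the default transport issues one request for the list plus one per listed model.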
Lines changed: 154 additions & 0 deletions
@@ -0,0 +1,154 @@
+# Vendor-Configurable Metadata Models
+
+This document describes the limitations of the metadata models in the DANDI ecosystem as defined by
+[dandi-schema](https://github.com/dandi/dandi-schema) before version 0.12.0, and it outlines the solution
+to these limitations: vendor-configurable metadata models, implemented across multiple components
+of the ecosystem in coordination with dandi-schema, starting with
+dandi-schema v0.12.0.
+
+## Limitations
+
+The current metadata models and their uses in the DANDI ecosystem have the following limitations:
+
+1. We do not actually support or use multiple versions of the metadata models in dandi-archive.
+2. We use two representations of metadata models (Pydantic models and their respective JSON Schema derivatives) and
+   rely on an external process to generate the JSON Schema representation from the Pydantic models.
+3. We manually trigger updates of web frontend files according to a specific version of the JSON Schema representation of the models.
+4. We hardcode vendor-specific information inside the dandi-archive codebase (backend and frontend).
+5. Any vendor-specific configuration done at runtime in Pydantic models is not reflected in the JSON Schema representation used
+   by the web frontend, because the web frontend uses a static versioned JSON Schema representation stored at
+   [dandi/schema](https://github.com/dandi/schema) that has been generated by the external process mentioned in point 2.
+
+```mermaid
+flowchart TD
+    %% repositories as grouped nodes
+    subgraph dandi_schema_repo["<a href='https://github.com/dandi/dandi-schema/'>dandi/dandi-schema</a>"]
+        Pydantic["Pydantic Models"]
+    end
+
+    subgraph schema_repo["<a href='https://github.com/dandi/schema/'>dandi/schema</a>"]
+        JSONSchema["JSON Schema<br>serializations"]
+
+    end
+
+    subgraph dandi_cli_repo["<a href='https://github.com/dandi/dandi-cli'>dandi-cli</a>"]
+        CLI["CLI & Library<br>validation logic<br/>(Python)"]
+    end
+
+    subgraph dandi_archive_repo["<a href='https://github.com/dandi/dandi-archive/'>dandi-archive</a>"]
+        Meditor["Web UI<br/>Metadata Editor<br/>(meditor; Vue)"]
+        API["Archive API<br/>(Python; Django)"]
+        Storage[("DB (PostgreSQL)")]
+    end
+
+    %% main flow
+    Pydantic -->|"serialize into<br/>(CI)"| JSONSchema
+    Pydantic -->|used to validate| CLI
+    Pydantic -->|used to validate| API
+
+    JSONSchema -->|used to produce| Meditor
+    JSONSchema -->|used to validate| Meditor
+    Meditor -->|submits metadata| API
+
+    CLI -->|used to upload & submit metadata| API
+
+    API <-->|metadata JSON| Storage
+
+    %% styling
+    classDef repo fill:#f9f9f9,stroke:#333,stroke-width:1px;
+    classDef code fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
+    classDef ui fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px;
+    classDef data fill:#fff3e0,stroke:#e65100,stroke-width:1px;
+    JSONSchema@{ shape: docs }
+
+    class dandi_schema_repo,schema_repo,dandi_cli_repo,dandi_archive_repo repo;
+    class Pydantic,CLI,API code;
+    class JSONSchema,Storage data;
+    class Meditor ui;
+```
+> The diagram above depicts how metadata models are defined, represented, and used in the DANDI ecosystem before version 0.12.0 of dandi-schema.
+
+## Solution to Overcome Limitations
+
+The vendor-configurable metadata models solution addresses the limitations outlined above through the following changes:
+
+1. Make the metadata models vendor-configurable in dandi-schema.
+2. Create an API endpoint at `/api/schemas/` in dandi-archive to distribute JSON Schema representations of the metadata models,
+   generated at runtime from the Pydantic models that define them.
+3. Update the `/api/info/` endpoint in dandi-archive
+   1. to point to the endpoint in point 2 for retrieving a JSON Schema representation of the metadata models
+      instead of a static versioned JSON Schema representation stored at [dandi/schema](https://github.com/dandi/schema), and
+   2. to include vendor-specific information.
+4. Remove any hardcoded vendor-specific information in dandi-archive, and use the vendor-specific configuration
+   provided by dandi-schema.
+5. Have clients, such as the web UI (meditor), use the endpoints in points 2 and 3 to retrieve vendor-specific configurations
+   and JSON Schema representations of the metadata models that match the metadata models used by the particular instance of dandi-archive.
+
+
+```mermaid
+flowchart TD
+    %% repositories as grouped nodes
+    subgraph dandi_schema_repo["<a href='https://github.com/dandi/dandi-schema/'>dandi/dandi-schema</a>"]
+        Pydantic["Pydantic Models"]
+    end
+
+    subgraph schema_repo["<a href='https://github.com/dandi/schema/'>dandi/schema</a>"]
+        JSONSchema["JSON Schema<br>serializations"]
+
+    end
+
+    subgraph dandi_archive_repo["<a href='https://github.com/dandi/dandi-archive/'>dandi-archive</a>"]
+        Meditor["Web UI<br/>Metadata Editor<br/>(meditor; Vue)"]
+        API["Archive API<br/>(Python; Django)"]
+        SchemaEndpoint["Schema API Endpoint<br/>(JSONSchema)"]
+        Storage[("DB (PostgreSQL)")]
+    end
+
+    subgraph dandi_cli_repo["<a href='https://github.com/dandi/dandi-cli'>dandi-cli</a>"]
+        CLI["CLI & Library<br>validation logic<br/>(Python)"]
+    end
+
+    %% main flow
+    Pydantic -->|"serialize into<br/>(CI)"| JSONSchema
+    Pydantic -->|used to validate| CLI
+    Pydantic -->|used to validate| API
+
+    API -->|serialize at runtime| SchemaEndpoint
+    SchemaEndpoint -->|used to produce| Meditor
+    SchemaEndpoint -->|used to validate| Meditor
+
+    Meditor -->|submits metadata| API
+    CLI -->|used to upload & submit metadata| API
+    API <-->|metadata JSON| Storage
+
+    %% styling
+    classDef repo fill:#f9f9f9,stroke:#333,stroke-width:1px;
+    classDef code fill:#e1f5fe,stroke:#0277bd,stroke-width:1px;
+    classDef ui fill:#e8f5e9,stroke:#2e7d32,stroke-width:1px;
+    classDef data fill:#fff3e0,stroke:#e65100,stroke-width:1px;
+    JSONSchema@{ shape: docs }
+
+    class dandi_schema_repo,dandi_cli_repo,dandi_archive_repo repo;
+    class Pydantic,CLI,API,SchemaEndpoint code;
+    class JSONSchema,Storage data;
+    class Meditor ui;
+```
+> The diagram above depicts how metadata models are defined, represented, and used in the DANDI ecosystem starting with version 0.12.0 of dandi-schema.
+
+
+### Benefits
+
+This implementation provides several benefits:
+
+1. **Runtime Consistency**: The schema used by the frontend will always match the one used by the backend, including any vendor-specific configurations.
+2. **Simplified Deployment**: No need to manually update JSON Schema files or to manage the [dandi/schema](https://github.com/dandi/schema) repository for storing
+   JSON Schema representations of the metadata models.
+3. **Future-Proofing**: The implementation allows for future support of multiple schema versions if needed.
+4. **Reduced Dependencies**: Removes the dependency on external GitHub URLs for schema definitions.
+
+### Immediate Steps and Longer-Term Opportunities
+
+The immediate implementation serves, through an API endpoint, the JSON Schema representations of only those vendor-configurable metadata models currently used
+by the particular instance of dandi-archive. However, the API endpoint has been structured to allow support for multiple versions of JSON Schema representations in the future if needed.
+
+The JSON-LD context.json could also be generated and served by the backend in the same way if needed in the future.
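Point 5 of the design document has clients such as the meditor follow `/api/info/` to the instance's own schema instead of a fixed GitHub URL. A minimal client-side sketch of that flow follows. It relies only on the `schema_version` and `schema_url` fields that `info_view` returns in this commit; `load_instance_schema` and the injected `fetch_json` callable are hypothetical, and the meditor itself is a Vue application, so this Python version only illustrates the request sequence.

```python
from __future__ import annotations

from typing import Any, Callable


def load_instance_schema(api_url: str, fetch_json: Callable[[str], Any]) -> tuple[str, dict[str, Any]]:
    """Return (schema_version, Dandiset JSON Schema) as served by one archive instance."""
    info = fetch_json(f"{api_url.rstrip('/')}/api/info/")
    # After this commit, schema_url points at this instance's /api/schemas/?model=Dandiset
    # rather than at a static release file in the dandi/schema repository.
    schema = fetch_json(info['schema_url'])
    return info['schema_version'], schema
```

Following `schema_url` instead of building the schema path on the client means the client picks up whatever route and query the instance advertises, which is what lets a vendor-configured deployment hand out its own schema.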
