
Generate jsonschema from pydantic v2 #159

Open
joellabes wants to merge 19 commits into `main` from `generate-jsonschema-from-pydantic-v2`

@joellabes (Collaborator)

Picks up where #150 and #157 left off. The purpose of this PR is to add Pydantic representations of dbt's YAML files, which can then be used to automatically generate JSON Schema files.

This was instigated because the Pydantic representations can be used as inputs for AI-powered codegen experiences. In doing so, we've also found that Pydantic models are easier and less confusing to write than raw JSON Schema, because they're less nested and inheritance is simpler to achieve. (For example, consider how unit tests are represented: a complex if/then/else block in JSON Schema, but three simpler classes and a union in Pydantic.)
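To make the unit-test comparison concrete, here is a minimal sketch, assuming Pydantic v2 and a simplified, hypothetical shape for a unit test's `given` input in which `format` decides whether `rows` is a list of mappings (`dict`) or a string (`csv`, `sql`). The class names are illustrative, not the PR's actual models. The union is wrapped in a `RootModel`, the same pattern the PR uses for `Dimension`.

```python
from typing import Any, Literal, Union

from pydantic import BaseModel, RootModel, ValidationError


# Hypothetical, simplified variants of a unit test "given" input.
class DictInput(BaseModel):
    input: str
    format: Literal["dict"] = "dict"  # assumption: dict is the default format
    rows: list[dict[str, Any]]


class CsvInput(BaseModel):
    input: str
    format: Literal["csv"]
    rows: str  # CSV text


class SqlInput(BaseModel):
    input: str
    format: Literal["sql"]
    rows: str  # a SQL select statement


# Three flat classes plus a union stand in for a nested if/then/else block.
class GivenInput(RootModel[Union[DictInput, CsvInput, SqlInput]]):
    pass


if __name__ == "__main__":
    ok = GivenInput.model_validate(
        {"input": "ref('orders')", "format": "csv", "rows": "id\n1"}
    )
    print(type(ok.root).__name__)  # CsvInput

    # CSV rows must be a string, so a list of mappings matches no branch.
    try:
        GivenInput.model_validate(
            {"input": "ref('orders')", "format": "csv", "rows": [{"id": 1}]}
        )
    except ValidationError:
        print("rejected")

    # Each branch becomes its own definition in the generated schema.
    print(sorted(GivenInput.model_json_schema()["$defs"]))  # ['CsvInput', 'DictInput', 'SqlInput']
```

Pydantic renders the union as an `anyOf` over the three definitions, so each branch stays a flat object rather than a conditional nested inside one large schema.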

As far as I can tell, validation is as good as it was with the original JSON Schema files. Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [object Object] being output in VS Code, which I think is a bug in the YAML extension.

Contributor

Did you mean to move this to `src/generate.py`?

These schemas are generated from [pydantic models](https://docs.pydantic.dev/latest/concepts/json_schema/). To make updates, the process is as follows:

1. Create a virtual environment and install the dependencies: `pip install -r requirements.txt`
2. Make changes to the corresponding pydantic models in `src/latest.py`
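The generation script itself isn't shown in this thread. As a rough sketch of the underlying mechanism, assuming Pydantic v2: `model_json_schema()` returns a model's JSON Schema as a dict, which `json.dumps` can serialize to a file. The models, `render_schema`, and `write_schema` below are hypothetical stand-ins, not the repository's real code.

```python
import json
from pathlib import Path
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


# Hypothetical, heavily simplified models standing in for the real ones.
class ModelEntry(BaseModel):
    model_config = ConfigDict(extra="forbid")  # emits "additionalProperties": false

    name: str = Field(pattern=r"^[a-z][a-z0-9_]*$")
    description: Optional[str] = None


class DbtYml(BaseModel):
    model_config = ConfigDict(extra="forbid")

    version: int = 2
    models: list[ModelEntry] = Field(default_factory=list)


def render_schema(model: type[BaseModel]) -> str:
    """Serialize a model's JSON Schema as pretty-printed JSON text."""
    return json.dumps(model.model_json_schema(), indent=2)


def write_schema(model: type[BaseModel], path: Path) -> None:
    """Write the rendered schema to disk, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_schema(model) + "\n")


if __name__ == "__main__":
    print(render_schema(DbtYml))
```

Nested models end up under `$defs` and are referenced with `$ref`, which is how the generated files can stay flatter than a hand-written schema.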

Contributor

I think you meant to reference a directory, not a file.

Suggested change:
Before: 2. Make changes to the corresponding pydantic models in `src/latest.py`
After: 2. Make changes to the corresponding pydantic models in `src/latest`

@@ -0,0 +1 @@
+datamodel-code-generator
\ No newline at end of file

Contributor

This shouldn't be necessary, right? Wasn't this package only needed for the initial conversion from JSON Schema to Pydantic?

@DevonFulcher (Contributor)

> Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [object Object] being output in VS Code, which I think is a bug in the YAML extension.

How concerning is this? Is this something we should dig deeper into?

schema_: Optional[str] = Field(None, alias="schema")


class SemanticModel(BaseModel):

agg_time_dimension: Optional[str] = Field(
None, pattern="(?!.*__).*^[a-z][a-z0-9_]*[a-z0-9]$"
)
create_metric: Optional[bool] = True

extra="forbid",
)
time_granularity: TimeGranularity
validity_params: Optional[ValidityParams] = ""

This typing doesn't seem correct: an empty string is not `None | ValidityParams`.

Collaborator Author

Yeah, this is to prevent autocomplete from outputting `null`. An alternative is to specify the default in the Field's `json_schema_extra` instead, which would let this default stay as `None`. Is the difference semantically important to Instructor?
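A minimal sketch of that alternative, assuming Pydantic v2, which merges a dict passed as `json_schema_extra` into the generated field schema after the default has been filled in. The model and field names are hypothetical, and a plain `str` stands in for `ValidityParams` to keep the example self-contained.

```python
from typing import Optional

from pydantic import BaseModel, Field


class Example(BaseModel):
    # Hypothetical field: the Python default stays None (type-correct), while
    # the generated JSON Schema advertises "" as the default for editors.
    description: Optional[str] = Field(None, json_schema_extra={"default": ""})


if __name__ == "__main__":
    print(repr(Example().description))  # None
    print(repr(Example.model_json_schema()["properties"]["description"]["default"]))  # ''
```

This keeps runtime behavior and static typing consistent while still steering autocomplete away from `null`.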


I don't know if there are semantic differences, but it causes typing errors:
(Screenshot, 2024-09-06 10:04:05 AM, showing the type errors)
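A likely reason this only surfaced in the editor: Pydantic v2 doesn't validate default values unless `validate_default` is enabled, so the `""` default is accepted at runtime. A minimal sketch, where `ValidityParams` is an empty, hypothetical stand-in for the real model and `Lenient`/`Strict` are illustrative names:

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, ValidationError


class ValidityParams(BaseModel):
    # Hypothetical stand-in; the real model's fields aren't shown in this thread.
    pass


class Lenient(BaseModel):
    # Defaults are not validated by default, so "" slips through at runtime.
    validity_params: Optional[ValidityParams] = ""  # static type checkers flag this


class Strict(BaseModel):
    model_config = ConfigDict(validate_default=True)  # defaults are validated too

    validity_params: Optional[ValidityParams] = ""


if __name__ == "__main__":
    print(repr(Lenient().validity_params))  # ''
    try:
        Strict()
    except ValidationError:
        print("Strict() rejects the empty-string default")
```

Enabling `validate_default` would surface this kind of mismatch in tests, not only in a type checker.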

Collaborator Author

Ahh, got it. I don't have whatever extension that is, so VS Code didn't flag it for me. In that case, if we keep this, we'll have to put everything in `json_schema_extra` instead, which is all good.

type_params: DimensionTypeParams


class Dimension(RootModel[Union[CategoricalDimension, TimeDimension]]):

@joellabes (Collaborator Author) commented Sep 6, 2024

> Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [object Object] being output in VS Code, which I think is a bug in the YAML extension.

> How concerning is this? Is this something we should dig deeper into?

Yeah, I'm holding off on moving this forward, pending feedback on the Loom I made for Drew/Marco.


4 participants