Generate jsonschema from pydantic v2 by joellabes · Pull Request #159 · dbt-labs/dbt-jsonschema

joellabes · 2024-09-02T04:45:13Z

Picks up where #150 and #157 left off. The purpose of this PR is to add Pydantic representations of dbt's YAML files, which can then be used to automatically generate JSON Schema files.

This was prompted by the fact that the Pydantic representations can be used as inputs for AI-powered codegen experiences. In doing so, we've also found that Pydantic models are easier and less confusing to write than raw JSON Schema: they're less nested, and inheritance is simpler to achieve. (For example, consider the representation of unit tests: a complex if/then/else block in JSON Schema, but three simpler classes and a union in Pydantic.)
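To illustrate the "three classes and a union" pattern, here is a minimal sketch. The class and field names (`DictFixture`, `CsvFixture`, `SqlFixture`, `input`, `format`, `rows`) are simplified, hypothetical stand-ins for dbt's unit-test `given` inputs, not the models in this PR. It only shows that a plain `Union` of small models renders to an `anyOf` in the generated JSON Schema, in place of a hand-written if/then/else block.

```python
from typing import Any, Dict, List, Literal, Union

from pydantic import BaseModel, ConfigDict


# Hypothetical, simplified fixture shapes: one small class per `format` value.
class DictFixture(BaseModel):
    model_config = ConfigDict(extra="forbid")
    input: str
    format: Literal["dict"] = "dict"
    rows: List[Dict[str, Any]]


class CsvFixture(BaseModel):
    model_config = ConfigDict(extra="forbid")
    input: str
    format: Literal["csv"]
    rows: str


class SqlFixture(BaseModel):
    model_config = ConfigDict(extra="forbid")
    input: str
    format: Literal["sql"]
    rows: str


class UnitTest(BaseModel):
    name: str
    # A plain Union: Pydantic emits `anyOf` over the three $refs for it.
    given: List[Union[DictFixture, CsvFixture, SqlFixture]]


schema = UnitTest.model_json_schema()
print(schema["properties"]["given"]["items"])

# Validation picks whichever member model the input actually fits.
test = UnitTest.model_validate(
    {"name": "t", "given": [{"input": "ref('orders')", "format": "csv", "rows": "id\n1"}]}
)
print(type(test.given[0]).__name__)
```

Each branch is an ordinary class, so adding a new fixture format means adding one class and one union member rather than another conditional branch.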

As far as I can tell, the validation is as good as the original versions. Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [Object object] being output in VS Code, which I think is a bug in the YAML extension.


DevonFulcher · 2024-09-03T14:10:21Z

generate.py

Did you mean to move this to src/generate.py?

DevonFulcher · 2024-09-03T14:11:15Z

README.md

+These schemas are generated from [pydantic models](https://docs.pydantic.dev/latest/concepts/json_schema/). To make updates, the process is as follows:
+
+1. Create a virtual environment and install the dependencies: `pip install -r requirements.txt`
+2. Make changes to the corresponding pydantic models in `src/latest.py`


I think you meant to reference a directory, not a file

Suggested change

-2. Make changes to the corresponding pydantic models in `src/latest.py`
+2. Make changes to the corresponding pydantic models in `src/latest`
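The README text above says the schemas are generated from Pydantic models. At its core that is a call to Pydantic's `model_json_schema()` followed by a write to disk. A minimal sketch, assuming a hypothetical `ExampleYmlFile` model and a hypothetical output path; this is not the PR's `generate.py`, whose contents aren't shown here.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional, Type

from pydantic import BaseModel, ConfigDict


# Hypothetical model standing in for one of the top-level YAML file models.
class ExampleYmlFile(BaseModel):
    model_config = ConfigDict(extra="forbid")  # renders as additionalProperties: false
    version: Optional[int] = None
    name: str


def write_schema(model: Type[BaseModel], out_path: Path) -> dict:
    """Render a Pydantic model to JSON Schema and write it to out_path."""
    schema = model.model_json_schema()
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(schema, indent=2) + "\n")
    return schema


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout; the real output location is whatever the repo uses.
    target = Path(tmp) / "schemas" / "latest" / "example.json"
    written = write_schema(ExampleYmlFile, target)
    reloaded = json.loads(target.read_text())

print(reloaded["required"])
```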

DevonFulcher · 2024-09-03T14:13:09Z

requirements.txt

@@ -0,0 +1 @@
+datamodel-code-generator


This shouldn't be necessary, right? This package was only needed for the initial transformation from JSON Schema to Pydantic, right?

DevonFulcher · 2024-09-03T14:15:43Z

> Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [Object object] being output in VS Code, which I think is a bug in the YAML extension.

How concerning is this? Is this something we should dig deeper into?

aliceliu · 2024-09-04T18:55:50Z

src/latest/dbt_yml_files.py

+    schema_: Optional[str] = Field(None, alias="schema")
+
+
+class SemanticModel(BaseModel):


Can we follow the same order as https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L163
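The `schema_` field in the hunk above uses an alias, presumably because a field literally named `schema` would shadow Pydantic's deprecated `BaseModel.schema()` method. A minimal sketch, assuming a trimmed-down, hypothetical `Source` model, of how that alias behaves: the YAML/JSON key stays `schema` for validation, dumping with `by_alias=True`, and the generated JSON Schema, while Python code uses `schema_`.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class Source(BaseModel):  # hypothetical, trimmed-down model
    model_config = ConfigDict(extra="forbid")
    name: str
    # Python attribute is `schema_`; the key users write in YAML is `schema`.
    schema_: Optional[str] = Field(None, alias="schema")


src = Source.model_validate({"name": "raw", "schema": "analytics"})
print(src.schema_)
print(src.model_dump(by_alias=True))

# model_json_schema() uses aliases by default, so the property is `schema`.
props = Source.model_json_schema()["properties"]
print(list(props))
```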

aliceliu · 2024-09-04T19:02:23Z

src/latest/dbt_yml_files.py

+    agg_time_dimension: Optional[str] = Field(
+        None, pattern="(?!.*__).*^[a-z][a-z0-9_]*[a-z0-9]$"
+    )
+    create_metric: Optional[bool] = True


I think the default should be None? https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L156
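Why the default matters here: Pydantic copies a field's Python default into the JSON Schema `default` keyword, which some editors use when autocompleting a key. A minimal sketch with two hypothetical single-field models, one using the default from the hunk (`True`) and one using the suggested `None`:

```python
from typing import Optional

from pydantic import BaseModel


class MeasureDefaultTrue(BaseModel):  # hypothetical: default as in the hunk
    create_metric: Optional[bool] = True


class MeasureDefaultNone(BaseModel):  # hypothetical: default as suggested
    create_metric: Optional[bool] = None


for model in (MeasureDefaultTrue, MeasureDefaultNone):
    prop = model.model_json_schema()["properties"]["create_metric"]
    # The generated schema carries the Python default verbatim (true vs null).
    print(model.__name__, prop["default"], model().create_metric)
```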

aliceliu · 2024-09-04T19:03:47Z

src/latest/dbt_yml_files.py

+        extra="forbid",
+    )
+    time_granularity: TimeGranularity
+    validity_params: Optional[ValidityParams] = ""


This typing doesn't seem correct? Empty string is not None | ValidityParams

Yeah, this is to prevent the autocomplete from outputting `null`. An alternative is to specify the default in the `json_schema_extra` of the `Field` instead, which would mean this default could stay as `None`. Is the difference semantically important to Instructor?

I don't know if there are semantic differences, but it causes typing errors:

Ahh, got it. I don't have whatever extension that is, so VS Code didn't flag it to me. In that case, if we keep this, we'll have to put everything in the `json_schema_extra` space instead, which is all good.
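A minimal sketch of the `json_schema_extra` alternative discussed in this thread, assuming trimmed-down, hypothetical models (the `ValidityParams` fields and the `TimeDimensionParams` name are illustrative). The runtime default stays `None`, so the `Optional[ValidityParams]` annotation is accurate for type checkers, while the generated JSON Schema still advertises `""` as the default; the intent is that autocomplete then stops inserting `null`.

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class ValidityParams(BaseModel):  # hypothetical, trimmed-down
    is_start: Optional[bool] = None
    is_end: Optional[bool] = None


class TimeDimensionParams(BaseModel):  # hypothetical name
    model_config = ConfigDict(extra="forbid")
    # Python default is None (type-correct); json_schema_extra is merged into
    # this field's generated schema afterwards and overrides "default" with "".
    validity_params: Optional[ValidityParams] = Field(
        None, json_schema_extra={"default": ""}
    )


prop = TimeDimensionParams.model_json_schema()["properties"]["validity_params"]
print(prop["default"])
print(TimeDimensionParams().validity_params)
```

Whether the empty-string default is what editors should suggest is a separate question; the sketch only shows that the schema default and the Python default can differ.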

aliceliu · 2024-09-04T19:05:35Z

src/latest/dbt_yml_files.py

+    type_params: DimensionTypeParams
+
+
+class Dimension(RootModel[Union[CategoricalDimension, TimeDimension]]):


Can we write this as a union like this https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L66
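For comparison, a sketch of the two shapes under discussion, using hypothetical, simplified dimension models (fields are illustrative, and the linked ai-codegen-api file isn't visible here, so this is not a copy of it). With the `RootModel` wrapper, parsed values are wrapped and the member model sits on `.root`, and the wrapper gets its own schema definition. With a plain `Union` alias, callers get the member model directly and the parent schema inlines an `anyOf`.

```python
from typing import List, Literal, Union

from pydantic import BaseModel, RootModel, TypeAdapter


# Hypothetical, simplified dimension shapes.
class CategoricalDimension(BaseModel):
    name: str
    type: Literal["categorical"]


class TimeDimensionTypeParams(BaseModel):
    time_granularity: str


class TimeDimension(BaseModel):
    name: str
    type: Literal["time"]
    type_params: TimeDimensionTypeParams


# As in the hunk: a RootModel wrapper around the union.
class DimensionRoot(RootModel[Union[CategoricalDimension, TimeDimension]]):
    pass


# Plain alias alternative: no wrapper class at all.
Dimension = Union[CategoricalDimension, TimeDimension]


class SemanticModel(BaseModel):
    name: str
    dimensions: List[Dimension]


raw = {"name": "ordered_at", "type": "time", "type_params": {"time_granularity": "day"}}

wrapped = DimensionRoot.model_validate(raw)  # DimensionRoot; member on .root
plain = TypeAdapter(Dimension).validate_python(raw)  # the member model itself
sm = SemanticModel.model_validate({"name": "orders", "dimensions": [raw]})
sm_schema = SemanticModel.model_json_schema()
print(type(wrapped.root).__name__, type(plain).__name__)
print(sm_schema["properties"]["dimensions"]["items"])
```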

joellabes · 2024-09-06T02:56:32Z

> > Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [Object object] being output in VS Code, which I think is a bug in the YAML extension.
>
> How concerning is this? Is this something we should dig deeper into?

Yeah, holding off on moving this forward pending feedback on the Loom I made for Drew/Marco.

aliceliu and others added 11 commits July 25, 2024 12:55

Generate dbt_yml_files from pydantic classes

6484bab

update script, and generate pydantic for all latest files

8ea4b77

add env to gitignore

12573aa

add autogenerated schemas

b99f338

add a few more tests to tests

3fd5b97

remove pyproject toml in favor of requirements.txt

c684478

Incorporate Devon's custom generator on top of Alice's generated models

51e3409

Remove generation for dbt_project.yml which can't be generated

bce1628

Require a character in jinja strings

cb47ad2

Replace optional strings with ""

df5ded7

Handle remaining incorrect defaults

fe143ae

This was referenced Sep 2, 2024

Added RemoveNullsGenerateJsonSchema #157

Closed

Generate dbt_yml_files from pydantic classes #150

Closed

joellabes added 4 commits September 2, 2024 17:00

Adopt courtney's changes from 155

eef936c

actually run codegen

d9dd764

Merge branch 'main' into generate-jsonschema-from-pydantic-v2

2254ebc

less confusing freshness period name

603b8f7

joellabes requested review from DevonFulcher, aliceliu and dave-connors-3 September 2, 2024 05:02

joellabes added 4 commits September 2, 2024 17:05

$schema not schema

a6b4e46

Merge branch 'generate-jsonschema-from-pydantic-v2' of https://github…

7b00793

….com/dbt-labs/dbt-jsonschema into generate-jsonschema-from-pydantic-v2

timegranualirty optional

97e850a

set default too

2f1b25e

DevonFulcher reviewed Sep 3, 2024

DevonFulcher reviewed Sep 3, 2024

aliceliu reviewed Sep 4, 2024

joellabes wants to merge 19 commits into main from generate-jsonschema-from-pydantic-v2

4 participants