Conversation
….com/dbt-labs/dbt-jsonschema into generate-jsonschema-from-pydantic-v2
There was a problem hiding this comment.
Did you mean to move this to src/generate.py?
| These schemas are generated from [pydantic models](https://docs.pydantic.dev/latest/concepts/json_schema/). To make updates, the process is as follows: | ||
|
|
||
| 1. Create a virtual environment and install the dependencies: `pip install -r requirements.txt` | ||
| 2. Make changes to the corresponding pydantic models in `src/latest.py` |
There was a problem hiding this comment.
I think you meant to reference a directory, not a file
| 2. Make changes to the corresponding pydantic models in `src/latest.py` | |
| 2. Make changes to the corresponding pydantic models in `src/latest` |
| @@ -0,0 +1 @@ | |||
| datamodel-code-generator No newline at end of file | |||
There was a problem hiding this comment.
This shouldn't be necessary, right? This package was only necessary to initially transform json-schema to Pydantic, right?
How concerning is this? Is this something we should dig deeper into? |
| schema_: Optional[str] = Field(None, alias="schema") | ||
|
|
||
|
|
||
| class SemanticModel(BaseModel): |
There was a problem hiding this comment.
Can we follow the same order as https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L163
| agg_time_dimension: Optional[str] = Field( | ||
| None, pattern="(?!.*__).*^[a-z][a-z0-9_]*[a-z0-9]$" | ||
| ) | ||
| create_metric: Optional[bool] = True |
There was a problem hiding this comment.
I think the default should be None? https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L156
| extra="forbid", | ||
| ) | ||
| time_granularity: TimeGranularity | ||
| validity_params: Optional[ValidityParams] = "" |
There was a problem hiding this comment.
This typing doesn't seem correct? Empty string is not None | ValidityParams
There was a problem hiding this comment.
Yeah this is to prevent the autocomplete from outputting null. An alternative is to instead specify the default in the json_schema_extra of the Field, which would mean this default could stay as None. Is the difference semantically important to Instructor?
There was a problem hiding this comment.
Ahh got it, I don't have whatever extension that is so vs code didn't flag it to me. In that case if we keep this we'll have to put everything in the json_schema_extra space instead which is all good
| type_params: DimensionTypeParams | ||
|
|
||
|
|
||
| class Dimension(RootModel[Union[CategoricalDimension, TimeDimension]]): |
There was a problem hiding this comment.
Can we write this as a union like this https://github.com/dbt-labs/ai-codegen-api/blob/main/src/codegen/llm_output/semantic_model.py#L66
Yeah holding off on moving this forward, pending feedback on the loom i made for drew/marco |

Picks up where #150 and #157 left off. The purpose of this PR is to add Pydantic representations of dbt's YAML files, which can then be used to automatically generate JSON Schema files.
This was instigated because the Pydantic representations can be used as inputs for AI-powered codegen experiences, but in doing so we've also seen that it's easier/less confusing to write Pydantic models than straight JSON Schema because it's less nested and you can achieve inheritance more simply. (As an example, consider the representations of unit testing - a complex
if then elseblock in JSON Schema, but three simpler classes and a union in Pydantic.)As far as I can tell, the validation is as good as the original versions. Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing
[Object object]being output in VS Code, which I think is a bug in the YAML extension.