Commit 8020154

add codegen post
1 parent 8d63388 commit 8020154

1 file changed: 135 additions & 0 deletions
@@ -0,0 +1,135 @@
---
title: "Exploring Code Generation with JsonSchema.Net.CodeGeneration"
date: 2023-10-02 09:00:00 +1200
tags: [json-schema, codegen]
toc: true
pin: false
---

About a month ago, my first foray into the world of code generation was published with the extension library JsonSchema.Net.CodeGeneration. For this post, I'd like to dive into the process a little to show how it works. Hopefully, this will give better insight into how to use it as well.

This library currently serves as an exploration platform for the [JSON Schema IDL Vocab](https://github.com/json-schema-org/vocab-idl/issues/47) work, which aims to create a new vocabulary designed to help support translating between code and schemas (both ways).

## Extracting type information

The first step in the code generation process is determining what the schema is trying to model. This library uses a complex set of mini-meta-schemas to identify supported patterns.

> A meta-schema is just a schema that validates another schema.
{: .prompt-tip }

For example, [in most languages](https://github.com/json-schema-org/vocab-idl/issues/43), enumerations are basically just named constants. The ideal JSON Schema representation of this would be a schema with an `enum`. So .Net's `System.DayOfWeek` enum could be modelled like this:

```json
{
  "title": "DayOfWeek",
  "enum": [ "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" ]
}
```

To identify this schema as defining an enumeration, we'd need a meta-schema that looks like this:

```json
{
  "type": "object",
  "title": "MyEnum",
  "properties": {
    "enum": {
      "type": "array"
    }
  },
  "required": [ "enum" ]
}
```

However, in JSON Schema, an `enum` item can be _any_ JSON value, whereas most languages require strings. So, we also want to ensure that the values of that `enum` are strings.

```json
{
  "type": "object",
  "title": "MyEnum",
  "properties": {
    "enum": {
      "items": { "type": "string" }
    }
  },
  "required": [ "enum" ]
}
```

> We don't need to include `type` or `uniqueItems` because we know the data is a schema, and its meta-schema (e.g. Draft 2020-12) already has those constraints. We only need to define constraints _on top of_ what the schema's meta-schema defines.
{: .prompt-info }
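
To make the mini-meta-schema above a little more concrete, here is a hand-written check that expresses the same constraints in plain C# using `System.Text.Json.Nodes`. This is only a sketch: `EnumPatternSketch` and `LooksLikeEnum` are hypothetical names, and this is not how the library does it (the library evaluates the schema against the mini-meta-schemas instead).

```c#
using System.Linq;
using System.Text.Json.Nodes;

public static class EnumPatternSketch
{
    // Hypothetical helper: mirrors the enum mini-meta-schema by hand.
    // Returns true only when the schema is an object whose `enum` keyword
    // is an array containing nothing but strings.
    public static bool LooksLikeEnum(JsonNode? schema)
    {
        // "type": "object"
        if (schema is not JsonObject obj) return false;

        // "required": [ "enum" ]
        if (!obj.TryGetPropertyValue("enum", out var enumNode)) return false;

        // For a valid schema, the draft meta-schema already guarantees an array.
        if (enumNode is not JsonArray items) return false;

        // "items": { "type": "string" }
        return items.All(item =>
            item is JsonValue value && value.TryGetValue<string>(out _));
    }
}
```

With this, the `DayOfWeek` schema from earlier is recognized as an enumeration, while a schema such as `{ "enum": [ 1, 2 ] }` is not.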

Now that we have the idea, we can expand this by defining mini-meta-schemas for all of the patterns we want to support. There are some that are pretty easy, only needing the `type` keyword:

- string
- number
- integer
- boolean

And there are some that are a bit more complex:

- arrays
- dictionaries
- custom objects (inheritable and non-inheritable)

And we also want to be able to handle references.

The actual schemas that were used are listed in the [docs](https://docs.json-everything.net/schema/codegen/mini-meta-schemas/). As with any documentation, I hope to keep these up-to-date, but short of that, you can always look at the [source](https://github.com/gregsdennis/json-everything/blob/master/JsonSchema.CodeGeneration/Model/ModelGenerator.cs).

## Building type models

Now that we have the different kinds of schemas that we want to support, we need to represent them in a sort of type model from which we can generate code.

The idea behind the library was to allow multiple code writers that could support just about any language, so .Net's type system (i.e. `System.Type`) isn't quite the right model.

The type model as it stands has the following:

- `TypeModel` - Serves as a base class for the others while also supporting our simple types. This basically just exposes a type name property.
- `EnumModel` - Each value has a name and an integer value derived from the item's index.
- `ArrayModel` - Exposes a property to track the item type.
- `DictionaryModel` - Exposes properties to track key and item types.
- `ObjectModel` - Handles both open and closed varieties. Each property has a name, a type, and flags for whether it can be read and written.

Whenever we encounter a subschema or a reference, that represents a new type for us to generate.

Lastly, in order to avoid duplication, we set up some equality for type models.

With this, all of the types supported by this library can be modelled. As more patterns are identified, this modelling system can be expanded as needed.
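
As a rough illustration of the shapes described above, the models could be sketched with C# records, which provide value equality for free. Everything here is an assumption made for the example: the member names, `EnumValue`, `PropertyModel`, `FromNames`, and `ModelSketch` are hypothetical, and the library's real classes may well differ.

```c#
using System.Collections.Generic;
using System.Linq;

// Hypothetical shapes for illustration only (requires C# 9 or later).
public record TypeModel(string Name);

public record EnumValue(string Name, int Value);

public record EnumModel(string Name, IReadOnlyList<EnumValue> Values) : TypeModel(Name)
{
    // Each value's integer comes from its index in the schema's `enum` array.
    public static EnumModel FromNames(string name, IEnumerable<string> names) =>
        new(name, names.Select((n, i) => new EnumValue(n, i)).ToList());
}

public record ArrayModel(string Name, TypeModel Items) : TypeModel(Name);

public record DictionaryModel(string Name, TypeModel Keys, TypeModel Items) : TypeModel(Name);

public record PropertyModel(string Name, TypeModel Type, bool CanRead, bool CanWrite);

public record ObjectModel(string Name, IReadOnlyList<PropertyModel> Properties) : TypeModel(Name);

public static class ModelSketch
{
    // Record equality lets a set drop models that describe the same type.
    // Caveat: list-typed members compare by reference, so a fuller version
    // would need custom equality for EnumModel and ObjectModel.
    public static int CountDistinct(IEnumerable<TypeModel> models) =>
        new HashSet<TypeModel>(models).Count;
}
```

In this sketch, two `ArrayModel` instances with the same name and the same item model compare as equal, so only one of them would be kept for generation.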

## Writing code

The final step for code generation is the part everyone cares about: actually writing code.

The library defines `ICodeWriter` which exposes two methods:

- `TransformName()` - Takes a JSON string and transforms it into a language-compatible name.
- `Write()` - Renders a type model into a type declaration in the language.
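
To give a feel for the two methods, here is a minimal sketch of such a writer. The parameter and return types are my assumptions for the example and not necessarily the library's actual signatures; `ICodeWriterSketch`, `ModelStub`, and `PascalCaseWriterSketch` are hypothetical names.

```c#
using System;
using System.Linq;
using System.Text;

// `ModelStub` stands in for the real type model (hypothetical).
public record ModelStub(string Name);

// Assumed shape of the two-method contract described above.
public interface ICodeWriterSketch
{
    string TransformName(string original);
    void Write(StringBuilder builder, ModelStub model);
}

public class PascalCaseWriterSketch : ICodeWriterSketch
{
    // Splits on anything that is not a letter or digit and capitalizes
    // each piece, so "day-of-week" becomes "DayOfWeek".
    public string TransformName(string original)
    {
        var cleaned = new string(original
            .Select(c => char.IsLetterOrDigit(c) ? c : ' ')
            .ToArray());
        var words = cleaned.Split(' ', StringSplitOptions.RemoveEmptyEntries);
        var name = string.Concat(
            words.Select(w => char.ToUpperInvariant(w[0]) + w.Substring(1)));

        // C# identifiers cannot start with a digit, so prefix an underscore.
        return name.Length > 0 && char.IsDigit(name[0]) ? "_" + name : name;
    }

    // Writes a deliberately trivial declaration for the model.
    public void Write(StringBuilder builder, ModelStub model) =>
        builder.AppendLine($"public class {TransformName(model.Name)} {{ }}");
}
```

A real writer would branch on the kind of model in `Write()`; this one only shows where name transformation fits in.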

There's really quite a bit of freedom in how this is implemented. The [built-in C# writer](https://github.com/gregsdennis/json-everything/blob/master/JsonSchema.CodeGeneration/Language/CSharpCodeWriter.cs) branches on the different model types and has private methods to handle each one.

One aspect of writing types that I hadn't thought about when I started writing the library was that there's a difference between writing the usage of a type and writing the declaration of a type. Before, when I thought about code generation, I typically thought it was about writing type declarations: you have a schema, and you generate a class for it. But what I found was that if the properties of an object also use any of the generated types, only the type name needs to be written.

For example, for the `DayOfWeek` enumeration we discussed before, the declaration is:

```c#
public enum DayOfWeek
{
    Sunday,
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday
}
```

But if I have an array of them, I need to generate `DayOfWeek[]`, which only really needs the type name. So my writer needs to be smart enough to write the declaration once and write just the name any time it's used.
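
The declaration-versus-usage split can be sketched roughly like this. The models and the `DeclareOnceWriter` class are hypothetical, trimmed-down stand-ins written for this example; the built-in C# writer is organized differently.

```c#
using System.Collections.Generic;
using System.Text;

// Hypothetical, minimal models; not the library's actual classes.
public abstract record Model(string Name);
public record EnumModelSketch(string Name, IReadOnlyList<string> Values) : Model(Name);
public record ArrayModelSketch(string Name, Model Items) : Model(Name);

public class DeclareOnceWriter
{
    private readonly HashSet<string> _declared = new();
    private readonly StringBuilder _builder = new();

    // Usage: only the type name (or a name built from it) is written.
    public string WriteUsage(Model model) => model switch
    {
        ArrayModelSketch array => WriteUsage(array.Items) + "[]",
        _ => model.Name
    };

    // Declaration: emitted at most once per type name.
    public void WriteDeclaration(Model model)
    {
        switch (model)
        {
            case ArrayModelSketch array:
                // Arrays need no declaration of their own, but their item type might.
                WriteDeclaration(array.Items);
                break;
            case EnumModelSketch e:
                if (!_declared.Add(e.Name)) return; // already written
                _builder.AppendLine($"public enum {e.Name}");
                _builder.AppendLine("{");
                for (var i = 0; i < e.Values.Count; i++)
                    _builder.AppendLine("    " + e.Values[i] + (i < e.Values.Count - 1 ? "," : ""));
                _builder.AppendLine("}");
                break;
        }
    }

    // Returns everything declared so far.
    public override string ToString() => _builder.ToString();
}
```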

There are a couple of other nuanced behaviors that I added in, and I encourage you to read the [docs](https://docs.json-everything.net/schema/codegen/schema-codegen/) on the capabilities.

## Generating a conclusion

Overall, writing this was an enjoyable experience. I found a simple architecture that seems to work well and is also extensible.

My hope is that this library will help inform the IDL Vocab effort back in JSON Schema Land. It's useful having a place to test things.
