---
title: "Exploring Code Generation with JsonSchema.Net.CodeGeneration"
date: 2023-10-02 09:00:00 +1200
tags: [json-schema, codegen]
toc: true
pin: false
---

About a month ago, I published my first foray into the world of code generation: the extension library JsonSchema.Net.CodeGeneration. In this post, I'd like to dive into the process a little to show how it works. Hopefully, this will also give better insight into how to use it.

This library currently serves as an exploration platform for the [JSON Schema IDL Vocab](https://github.com/json-schema-org/vocab-idl/issues/47) work, which aims to create a new vocabulary designed to help support translating between code and schemas (both ways).

## Extracting type information

The first step in the code generation process is determining what the schema is trying to model. This library uses a complex set of mini-meta-schemas to identify supported patterns.

> A meta-schema is just a schema that validates another schema.
{: .prompt-tip }

For example, [in most languages](https://github.com/json-schema-org/vocab-idl/issues/43), enumerations are basically just named constants. The ideal JSON Schema representation of this would be a schema with an `enum`. So .Net's `System.DayOfWeek` enum could be modelled like this:

```json
{
  "title": "DayOfWeek",
  "enum": [ "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday" ]
}
```

To identify this schema as defining an enumeration, we'd need a meta-schema that looks like this:

```json
{
  "type": "object",
  "title": "MyEnum",
  "properties": {
    "enum": {
      "type": "array"
    }
  },
  "required": [ "enum" ]
}
```

However, in JSON Schema, an `enum` item can be _any_ JSON value, whereas most languages require strings. So, we also want to ensure that the values of that `enum` are strings.

```json
{
  "type": "object",
  "title": "MyEnum",
  "properties": {
    "enum": {
      "items": { "type": "string" }
    }
  },
  "required": [ "enum" ]
}
```

> We don't need to include `type` because we know the data is a schema, and its meta-schema (e.g. Draft 2020-12) already requires `enum` to be an array. We only need to define constraints _on top of_ what the schema's meta-schema defines.
{: .prompt-info }
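
To make that concrete, here is a hand-written C# equivalent of the string `enum` mini-meta-schema above. It's only a sketch under a few assumptions: the library evaluates real mini-meta-schemas with JsonSchema.Net, whereas this uses nothing but `System.Text.Json.Nodes`, and `EnumPattern.Matches` is a hypothetical name, not part of the library.

```c#
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical helper: a hand-written stand-in for the string-enum mini-meta-schema.
// Like that mini-meta-schema, it only adds constraints on top of what a valid schema
// already guarantees; the JsonArray check is here only so the items can be iterated.
public static class EnumPattern
{
    public static bool Matches(JsonNode? schema)
    {
        // "type": "object" and "required": [ "enum" ]
        // (boolean schemas such as `true` aren't objects, so they fail here)
        if (schema is not JsonObject obj || !obj.TryGetPropertyValue("enum", out var values))
            return false;

        // "items": { "type": "string" }
        return values is JsonArray array &&
               array.All(item => item is JsonValue value && value.TryGetValue<string>(out _));
    }
}
```

Passing it the parsed `DayOfWeek` schema from earlier returns `true`, while `{ "enum": [ 1, 2 ] }` returns `false` because its values aren't strings.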

Now that we have the idea, we can expand this by defining mini-meta-schemas for all of the patterns we want to support. There are some that are pretty easy, only needing the `type` keyword:

- string
- number
- integer
- boolean

And there are some that are a bit more complex:

- arrays
- dictionaries
- custom objects (inheritable and non-inheritable)

And we also want to be able to handle references.
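
Here's a rough idea of how those patterns could be told apart, again in hand-rolled C# rather than with actual meta-schemas. The rules are simplified guesses for illustration (for example, treating an `object` schema with `additionalProperties` but no `properties` as a dictionary); the real mini-meta-schemas in the library's docs are the authority, and `SchemaKind` and `PatternClassifier` are hypothetical names.

```c#
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical set of pattern kinds, loosely following the lists above.
public enum SchemaKind { Unknown, Reference, Enumeration, String, Number, Integer, Boolean, Array, Dictionary, Object }

// Simplified, illustrative checks; NOT the library's actual mini-meta-schemas.
public static class PatternClassifier
{
    public static SchemaKind Classify(JsonNode? schema)
    {
        if (schema is not JsonObject obj) return SchemaKind.Unknown;

        // References are recognized first.
        if (obj.ContainsKey("$ref")) return SchemaKind.Reference;

        // An enum whose values are all strings.
        if (obj["enum"] is JsonArray values &&
            values.All(v => v is JsonValue s && s.TryGetValue<string>(out _)))
            return SchemaKind.Enumeration;

        // Everything else is keyed off a single string "type".
        string? type = obj["type"] is JsonValue t && t.TryGetValue<string>(out var name) ? name : null;
        return type switch
        {
            "string" => SchemaKind.String,
            "number" => SchemaKind.Number,
            "integer" => SchemaKind.Integer,
            "boolean" => SchemaKind.Boolean,
            "array" when obj.ContainsKey("items") => SchemaKind.Array,
            "object" when obj.ContainsKey("properties") => SchemaKind.Object,
            "object" when obj.ContainsKey("additionalProperties") => SchemaKind.Dictionary,
            _ => SchemaKind.Unknown
        };
    }
}
```

Order matters here: `$ref` and `enum` are checked before `type`, so a schema like `{ "type": "string", "enum": [ "a", "b" ] }` is classified as an enumeration rather than a plain string.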

The actual schemas that were used are listed in the [docs](https://docs.json-everything.net/schema/codegen/mini-meta-schemas/). As with any documentation, I hope to keep these up-to-date, but short of that, you can always look at the [source](https://github.com/gregsdennis/json-everything/blob/master/JsonSchema.CodeGeneration/Model/ModelGenerator.cs).

## Building type models

Now that we have the different kinds of schemas that we want to support, we need to represent them in a type model from which we can generate code.

The idea behind the library was to support multiple code writers so that it could target just about any language. That means .Net's type system (i.e. `System.Type`) isn't quite the right model.

The type model as it stands has the following:

- `TypeModel` - Serves as a base class for the others while also supporting our simple types. This basically just exposes a type name property.
- `EnumModel` - Each value has a name and an integer value derived from the item's index.
- `ArrayModel` - Exposes a property to track the item type.
- `DictionaryModel` - Exposes properties to track key and item types.
- `ObjectModel` - Handles both open and closed varieties. Each property has a name, a type, and whether it can be read and/or written.
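
A minimal sketch of such a model using C# records might look like this. The type names mirror the list above, but every member (including the `IsOpen` flag standing in for the open/closed distinction) is an assumption for illustration, not the library's actual API.

```c#
using System.Collections.Generic;
using System.Linq;

// Hypothetical type model; member names are illustrative, not the library's API.
// The base model also represents simple types, e.g. "string" or "int".
public record TypeModel(string Name);

public record EnumValue(string Name, int Value);

public record EnumModel(string Name, IReadOnlyList<EnumValue> Values) : TypeModel(Name)
{
    // Each member's integer value is its index in the schema's "enum" array.
    public static EnumModel FromStrings(string name, IEnumerable<string> values) =>
        new(name, values.Select((value, index) => new EnumValue(value, index)).ToList());
}

public record ArrayModel(string Name, TypeModel Items) : TypeModel(Name);

public record DictionaryModel(string Name, TypeModel Keys, TypeModel Items) : TypeModel(Name);

public record PropertyModel(string Name, TypeModel Type, bool CanRead, bool CanWrite);

// IsOpen is a hypothetical flag for the open/closed varieties.
public record ObjectModel(string Name, IReadOnlyList<PropertyModel> Properties, bool IsOpen) : TypeModel(Name);
```

Building the `DayOfWeek` model with `EnumModel.FromStrings` and the seven day names would give `Sunday` the value `0` and `Saturday` the value `6`.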

Whenever we encounter a subschema or a reference, that represents a new type for us to generate.
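
To illustrate that walk, this sketch visits the root schema and every subschema reachable through `properties`, treating each one that has a `title` as a new type to generate. Using `title` as the type name and ignoring `$ref`, `items`, and `$defs` are simplifying assumptions; `TypeCollector` is a hypothetical name.

```c#
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical walker: any schema with a "title" is treated as a new named type.
// Real handling of $ref, items and $defs is deliberately left out of this sketch.
public static class TypeCollector
{
    public static List<string> CollectTypeNames(JsonNode? schema)
    {
        var names = new List<string>();
        Walk(schema, names);
        return names;
    }

    private static void Walk(JsonNode? schema, List<string> names)
    {
        if (schema is not JsonObject obj) return;

        if (obj["title"] is JsonValue title && title.TryGetValue<string>(out var name))
            names.Add(name);

        // Depth-first through each property subschema, in document order.
        if (obj["properties"] is JsonObject properties)
            foreach (var (_, subschema) in properties)
                Walk(subschema, names);
    }
}
```

For a `Person` schema with a titled `Address` property, it returns `Person` followed by `Address`.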

Lastly, to avoid duplication, we define equality for type models so that equivalent types are only generated once.
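
C# records provide value-based equality out of the box, which is one way to get this kind of deduplication (not necessarily how the library does it). In this sketch, structurally identical models collapse into a single entry, so only one declaration would be generated. `Model`, `NamedModel`, `ListModel`, and `ModelRegistry` are hypothetical names.

```c#
using System.Collections.Generic;

// Hypothetical minimal models. Records compare by value, member by member
// (including the runtime type), and equal records produce equal hash codes.
public abstract record Model;
public record NamedModel(string Name) : Model;
public record ListModel(Model Items) : Model;

public static class ModelRegistry
{
    // Keeps the first occurrence of each distinct model, preserving order.
    public static List<Model> Distinct(IEnumerable<Model> models)
    {
        var seen = new HashSet<Model>();
        var result = new List<Model>();
        foreach (var model in models)
            if (seen.Add(model))
                result.Add(model);
        return result;
    }
}
```

One caveat with this approach: record equality compares collection members such as `List<T>` by reference, so models that hold lists (enum values, object properties) would need a custom `Equals`.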

With this, all of the types supported by this library can be modelled. As more patterns are identified, this modelling system can be expanded as needed.

## Writing code

The final step for code generation is the part everyone cares about: actually writing code.

The library defines `ICodeWriter`, which exposes two methods:

- `TransformName()` - Takes a JSON string and transforms it into a language-compatible name.
- `Write()` - Renders a type model into a type declaration in the language.
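
For C#, a `TransformName()` implementation might turn JSON names like `"day-of-week"` into PascalCase identifiers. This standalone sketch shows one possible rule; the choice of separators and the `_` prefix for names starting with a digit are assumptions, not the library's behavior, and `CSharpNames` is a hypothetical class.

```c#
using System.Text;

public static class CSharpNames
{
    // Hypothetical rule: drop any character that isn't a letter or digit,
    // capitalize the character after each dropped separator (and the first one),
    // and prefix "_" if the result is empty or starts with a digit.
    public static string TransformName(string original)
    {
        var builder = new StringBuilder();
        var capitalizeNext = true;
        foreach (var c in original)
        {
            if (!char.IsLetterOrDigit(c))
            {
                capitalizeNext = true;
                continue;
            }
            builder.Append(capitalizeNext ? char.ToUpperInvariant(c) : c);
            capitalizeNext = false;
        }

        if (builder.Length == 0 || char.IsDigit(builder[0]))
            builder.Insert(0, '_');
        return builder.ToString();
    }
}
```

With this rule, `CSharpNames.TransformName("day-of-week")` returns `"DayOfWeek"`, and `"2nd"` becomes `"_2nd"`.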

There's really quite a bit of freedom in how this is implemented. The [built-in C# writer](https://github.com/gregsdennis/json-everything/blob/master/JsonSchema.CodeGeneration/Language/CSharpCodeWriter.cs) branches on the different model types and has private methods to handle each one.
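
That kind of branching maps naturally onto a C# switch expression over the model types. Here's a trimmed-down sketch with a few hypothetical model records and a writer that dispatches to one private method per kind; it only mirrors the structure described above and is not the built-in writer's code.

```c#
// Hypothetical models standing in for the library's type model.
public abstract record Model(string Name);
public record SimpleModel(string Name) : Model(Name);
public record EnumModel(string Name, string[] Values) : Model(Name);
public record ArrayModel(string Name, Model Items) : Model(Name);

public class SketchWriter
{
    // Dispatch on the runtime model type, one private method per kind.
    public string Write(Model model) => model switch
    {
        EnumModel e => WriteEnum(e),
        ArrayModel a => WriteArray(a),
        _ => model.Name // simple types are written by name only
    };

    private static string WriteEnum(EnumModel model) =>
        $"public enum {model.Name}\n{{\n    {string.Join(",\n    ", model.Values)}\n}}\n";

    // C# arrays need no declaration; just the item type's name plus "[]".
    private static string WriteArray(ArrayModel model) => $"{model.Items.Name}[]";
}
```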

One aspect of writing types that I hadn't thought about when I started the library is that writing the _usage_ of a type is different from writing its _declaration_. Before, when I thought about code generation, I typically thought it was about writing type declarations: you have a schema, and you generate a class for it. But what I found was that if the properties of an object also use any of the generated types, only the type name needs to be written.

For example, for the `DayOfWeek` enumeration we discussed before, the declaration is:

```c#
public enum DayOfWeek
{
    Sunday,
    Monday,
    Tuesday,
    Wednesday,
    Thursday,
    Friday,
    Saturday
}
```

But if I have an array of them, I need to generate `DayOfWeek[]`, which only really needs the type name. So my writer needs to be smart enough to write the declaration once and write just the name any time it's used.
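
One way to get that behavior is to remember which types have already been declared: the first time a named type is used, its declaration is recorded, and every use (including inside an array) just emits the name. This is a hypothetical sketch with made-up model records and a one-line enum declaration for brevity, not how the built-in writer is structured.

```c#
using System.Collections.Generic;

// Hypothetical models for the sketch.
public abstract record Model;
public record EnumModel(string Name, string[] Values) : Model;
public record ArrayModel(Model Items) : Model;

public class DeclarationTracker
{
    private readonly HashSet<string> _declared = new();

    // Declarations accumulated so far; each named type appears at most once.
    public List<string> Declarations { get; } = new();

    // Returns the text for a usage site, recording the declaration on first use.
    public string Use(Model type)
    {
        switch (type)
        {
            case EnumModel e:
                if (_declared.Add(e.Name))
                    Declarations.Add($"public enum {e.Name} {{ {string.Join(", ", e.Values)} }}");
                return e.Name;
            case ArrayModel a:
                return Use(a.Items) + "[]"; // arrays need no declaration of their own
            default:
                throw new System.ArgumentException("Unsupported model", nameof(type));
        }
    }
}
```

Using `DayOfWeek` first inside an array and then on its own yields `DayOfWeek[]` and `DayOfWeek`, with a single declaration recorded.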

There are a few other nuanced behaviors that I added in, and I encourage you to read the [docs](https://docs.json-everything.net/schema/codegen/schema-codegen/) for the full set of capabilities.

## Generating a conclusion

Overall, writing this was an enjoyable experience. I found a simple architecture that seems to work well and is also extensible.

My hope is that this library will help inform the IDL Vocab effort back in JSON Schema Land. It's useful having a place to test things.