RaoulSchaffranek

Avoid conditionals for defining tagged unions.

This PR simplifies the schema structure of the region and collection listings by replacing a large if-based conditional block with a more concise oneOf declaration. The resulting schemas should be logically equivalent but offer better compatibility with tooling.

Benefits of using oneOf:

  • Easier to read
  • Consistency: This change brings the pointer schemas in line with the existing expression schema, which already uses oneOf.
  • Improved tool support: oneOf has been around longer and is better supported across the ecosystem. Validation and code generation libraries (e.g., Python's datamodel-code-generator) have built-in support for oneOf, but haven't yet caught up with if-blocks.
  • Better compatibility with statically typed languages: Unlike if/then, which often requires complex conditional typing, oneOf maps more directly to union types or class hierarchies.
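
To illustrate the last point, a tagged union written with oneOf corresponds closely to a union of typed records. The following is a minimal sketch in Python, not the output of any code generator; the variant names and the fields `slot`, `offset`, and `length` are assumptions chosen for illustration and are not taken from the actual schemas.

```python
from typing import Literal, TypedDict, Union


class StackRegion(TypedDict):
    # Hypothetical variant, selected when location == "stack"
    location: Literal["stack"]
    slot: int


class MemoryRegion(TypedDict):
    # Hypothetical variant, selected when location == "memory"
    location: Literal["memory"]
    offset: int
    length: int


# A oneOf over the two variants maps onto a plain union type.
Region = Union[StackRegion, MemoryRegion]


def describe(region: Region) -> str:
    # Static type checkers can narrow the union on the literal tag.
    if region["location"] == "stack":
        return f"stack slot {region['slot']}"
    return f"memory [{region['offset']}, +{region['length']}]"
```

An equivalent if/then chain carries the same information, but a generator has to recognize the conditional pattern before it can emit a union like this.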

Background

In our case, we use datamodel-code-generator to generate Python validators from the JSON Schema. This tool supports oneOf but does not yet support if/then, making the refactor necessary for proper code generation.

Additionally, this PR changes an instance of allOf: true to allOf: {}. These are logically equivalent, but the former is a newer convention not yet supported by all tools, including datamodel-code-generator.

Additional considerations

The grouping and expression schemas currently discriminate between their variants based on the presence of specific properties. In contrast, the region schema uses the value of the location field to distinguish its variants. This inconsistency might be worth addressing.
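
The two discrimination styles can be sketched side by side. This is an illustrative Python sketch only; the property names `group` and `list` are hypothetical stand-ins for "specific properties", and only the `location` field name comes from the description above.

```python
from typing import Optional, Tuple

# Hypothetical property names used for presence-based discrimination.
PRESENCE_KEYS: Tuple[str, ...] = ("group", "list")


def variant_by_presence(value: dict) -> Optional[str]:
    # The variant is implied by which distinguishing property is present.
    present = [key for key in PRESENCE_KEYS if key in value]
    # Ambiguous (several keys) or unknown (no key) inputs yield None.
    return present[0] if len(present) == 1 else None


def variant_by_tag(value: dict, tag: str = "location") -> Optional[str]:
    # The variant is named explicitly by the value of a single field.
    found = value.get(tag)
    return found if isinstance(found, str) else None
```

With presence-based discrimination an object can accidentally match zero or several variants, whereas a tag field names exactly one variant or none.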

@gnidan
Member

gnidan commented May 22, 2025

Thanks for putting this together @RaoulSchaffranek! It's funny you should bring this up; originally the schemas were just tagged unions, but error reporting with oneOf is just awful, so we added the if/then business to address that concern.

The trouble with oneOf, you can imagine, is that automated JSON Schema validators reject invalid oneOfs by [essentially] erroring with "none of these schemas were valid". It then becomes a giant mess to figure out why... n - 1 schemas in the list are supposed to be wrong, but 1 schema is supposed to be right! With even a small handful of composed schemas, especially ones that do complex modeling like ethdebug's pointer schema, it becomes a tremendous waste of time to figure out what field is malformed and where.
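
A rough sketch of the difference, using hand-written checks rather than a real JSON Schema validator. Everything here is hypothetical: the two variants, their fields, and the error messages are assumptions made for illustration.

```python
from typing import Callable, Dict, List

Check = Callable[[dict], List[str]]


def check_stack(value: dict) -> List[str]:
    errors = []
    if value.get("location") != "stack":
        errors.append("location must be 'stack'")
    if not isinstance(value.get("slot"), int):
        errors.append("slot must be an integer")
    return errors


def check_memory(value: dict) -> List[str]:
    errors = []
    if value.get("location") != "memory":
        errors.append("location must be 'memory'")
    for key in ("offset", "length"):
        if not isinstance(value.get(key), int):
            errors.append(f"{key} must be an integer")
    return errors


VARIANTS: Dict[str, Check] = {"stack": check_stack, "memory": check_memory}


def validate_one_of(value: dict) -> List[str]:
    # oneOf-style: try every variant; valid only if exactly one passes.
    results = {name: check(value) for name, check in VARIANTS.items()}
    matches = [name for name, errs in results.items() if not errs]
    if len(matches) == 1:
        return []
    if len(matches) > 1:
        return [f"matched more than one variant: {matches}"]
    # On failure, every branch contributes its own errors.
    return [f"{name}: {e}" for name, errs in results.items() for e in errs]


def validate_by_tag(value: dict) -> List[str]:
    # if/then-style: read the tag first, then report only that branch.
    tag = value.get("location")
    check = VARIANTS.get(tag) if isinstance(tag, str) else None
    if check is None:
        return [f"unknown location {tag!r}"]
    return check(value)
```

For an input such as `{"location": "memory", "offset": 0, "length": "32"}`, the oneOf-style check reports failures from both branches, while the tag-first check reports only the malformed `length`.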

But yeah, it's a really big issue... tooling support for JSON Schema is not caught up with draft 2020-12 yet :(. I have high confidence that eventually this will happen... I've watched JSON Schema tooling support lag behind the spec for probably a decade now, and it does slowly chug along.

Anyway, what if there's a different approach? With the way I've done it here in ethdebug/format, I've chosen several different schema design patterns (as you've noted) based on my own perceived ergonomics... but I'll note that my requirement for myself, in doing so, has been to make sure I've formalized the pattern enough for it to be legible on the website. Here's what I mean:

[Image: Screenshot 2025-05-21 at 22 34 26]

For instance, here with the ethdebug/format/pointer/region schema, as you've identified, I use "location" as a polymorphic discriminator. JSON Schema doesn't provide a satisfactory mechanism to serve as a polymorphic discriminator, so I chose to follow a common convention for this instead. Unfortunately, since it's just a convention, the automatic docusaurus-json-schema-plugin (which the website uses for schema display) does nothing to detect or display "polymorphic discriminators", or really any other convention. Fortunately, Docusaurus makes it easy to override display logic in all sorts of ways, so I've managed to detect my own conventions and display them with more semantics than JSON Schema offers natively.

To summarize this point about use of various conventions: good observation... yes, I do make liberal use of various JSON Schema design conventions. I am keeping track of them, but they are not formalized or even enumerated. Sorry for the long story, but my point is this: what if this format used modern JSON Schema and just published a properly formalized list of conventions used, to aid in things like automated code generation? I'm even thinking that such an artifact would be useful in machine translation from JSON Schema draft 2020-12 to draft-7 or whatever your tooling is prepared to handle.
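
As a rough idea of what such a machine translation could look like, here is a minimal Python sketch that rewrites one convention into a oneOf. It assumes a hypothetical input shape, namely an allOf whose entries are all if/then pairs keyed on a `const` value of a single tag property; the function name and that shape are assumptions, not the actual layout of the ethdebug/format schemas.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]


def if_then_to_one_of(schema: Schema, tag: str) -> Schema:
    """Rewrite an allOf of if/then clauses keyed on `tag` into a oneOf."""
    branches: List[Schema] = []
    for clause in schema.get("allOf", []):
        # Assumed shape: {"if": {"properties": {tag: {"const": ...}}}, "then": ...}
        const = clause["if"]["properties"][tag]["const"]
        branches.append(
            {
                "allOf": [
                    # Pin the tag value so that at most one branch can match.
                    {"properties": {tag: {"const": const}}, "required": [tag]},
                    clause["then"],
                ]
            }
        )
    # Keep every other keyword of the original schema untouched.
    rest = {key: val for key, val in schema.items() if key != "allOf"}
    return {**rest, "oneOf": branches}
```

Note that the result is only equivalent when the original schema also restricts the tag to the listed values: an if/then chain accepts a tag that matches none of its conditions, while the oneOf rejects it.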

@gnidan
Member

gnidan commented May 22, 2025

Side note: I think with ethdebug/format, automatic code generation is going to be more trouble than it's worth. I tried this and drove myself mad, and then switched to writing types and type guards by hand (e.g. pointer schema types in TypeScript).

Of course, my telling people "I advise against doing automatic code generation with ethdebug/format schemas" doesn't change the issue here, which I appreciate your bringing to my attention (I've been entirely focused on schema validation quality as the priority).

@gnidan
Member

gnidan commented May 23, 2025

(BTW my guess is that the preview job is probably failing because you're on a fork)

@Escarcega1989

Avoid conditionals for defining tagged unions.

This PR simplifies the schema structure of the region and collection listings by replacing a large if-based conditional block with a more concise oneOf declaration. The resulting schemas should be logically equivalent but offer better compatibility with tooling.

Benefits of using oneOf:

  • Easier to read
  • Consistency: This change brings the pointer schemas in line with the existing expression schema, which already uses oneOf.
  • Improved tool support: oneOf has been around longer and is better supported across the ecosystem. Validation and code generation libraries (e.g., Python's datamodel-code-generator) have built-in support for oneOf, but haven't yet caught up with if-blocks.
  • Better compatibility with statically typed languages: Unlike if/then, which often requires complex conditional typing, oneOf maps more directly to union types or class hierarchies.

Background

In our case, we use datamodel-code-generator to generate Python validators from the JSON Schema. This tool supports oneOf but does not yet support if/then, making the refactor necessary for proper code generation.

Additionally, this PR changes an instance of allOf: true to allOf: {}. These are logically equivalent, but the former is a newer convention not yet supported by all tools, including datamodel-code-generator.

Additional considerations

The grouping and expression schemas currently discriminate between their variants based on the presence of specific properties. In contrast, the region schema uses the value of the location field to distinguish its variants. This inconsistency might be worth addressing.
