From 89940fce50b8b2f0e4683d65b266dd59c18de01c Mon Sep 17 00:00:00 2001 From: Sylvain Wallez Date: Mon, 6 Oct 2025 13:29:26 +0200 Subject: [PATCH 1/2] Add JSON deserialization design doc --- docs/design/0003-json-framework.md | 128 ++++++++++++++++++ .../clients/json/UnionDeserializer.java | 35 +++-- 2 files changed, 154 insertions(+), 9 deletions(-) create mode 100644 docs/design/0003-json-framework.md diff --git a/docs/design/0003-json-framework.md b/docs/design/0003-json-framework.md new file mode 100644 index 000000000..93f62c319 --- /dev/null +++ b/docs/design/0003-json-framework.md @@ -0,0 +1,128 @@ +# JSON framework + +## Context and Problem Statement + +JSON serialization and deserialization are key elements of the Java client's performance (memory and CPU). + +The classic approach in Java used by many libraries is to use reflection to instantiate and populate objects. The major problem of this approach is that reflection is slow. JSON frameworks also need to build complex representations of class structure that can add up in large API surfaces like Elasticsearch's. + +We can leverage the fact that the ES Java client is heavily based on code generation to produce code that avoids reflection all together and uses simpler data structures. + +[Jackson's Afterburner](https://github.com/FasterXML/jackson-modules-base/tree/2.x/afterburner) uses a somehow similar approach, using dynamic bytecode generation at runtime. In our context we know the data structures ahead of time, so we can generate code ahead of time and skip the overhead of runtime bytecode generation. + +## Decision Drivers + +* Limit memory usage +* Avoid costly features like reflection + +## Considered Options + +### Serialization + +Only one option was considered: every object class implements `JsonpSerializable` that has a single `serialize()` method that writes the object to a streaming JSON generator. The code of this method is generated for every class, delegating to common utility classes where needed. + +On top of that, the `JsonpSerializer` interface allows serializing any value, including primitive types and user-provided objects. A `JsonpMapper` can lookup a serializer for any value type. + +The rest of this document will address deserialization. + +### Deserialization + +Deserialization is more involved than serialization, as we must deal with complex JSON that sometimes allows representing a single structure using different forms (e.g. single-element arrays as a single value, property shortcuts, strings for any scalar type, etc.) + +We considered two levels of code generation: + +* generate a custom deserializer function per type that reads and parses the JSON stream +* generate the construction of an object that handles the deserialization and calls setter methods on the object builder, similar to Elasticsearch's `ObjectParser` + +## Decision Outcome + +Generating a custom function per type would be the most performant, since a deserializer object that calls setter methods (as lambda expressions) adds some overhead compared to direct calls. + +However, a code generator is a program that produces a program. Kind of "meta programming". Given the complexity of deserialization and the unknown unknowns at the beginning of this project, we decided to us the second approach (a deserializer object that calls setters) even if it's less performant, in order to speed up development. It's still a lot more performant than using reflection! + +## Detailed design + +### Building blocks + +`JsonpDeserializer` is the common interface for all deserializers it provides two groups of methods: +* methods to know or test if that deserializer accepts a given JSON event. This is useful to disambiguate some variations and of course check that the JSON stream is what is expected. +* methods to deserialize a value, either at the current position in the stream, or from an event that was previously read. + +This interface also provides static deserializers for all scalar types (string, integer, etc.) + +The `ObjectDeserializer` is used to deserialize regular structures: +- it has a map of serializers for every field, with their aliases, +- supports shortcut properties, +- handles `AdditionalProperty` and `AdditionalProperties`, +- handles `SingleKeyDictionary` types that are flattened when represented as Java code. + +### Building an object deserializer + +We'll use `TermQuery` as an illustration as it uses most of `ObjectDeserializer` features. + +The generator produces a `setupDeserializer` method that does the configuration. This separate method is needed when a class has subclasses, as it will also be called to set up deserializers of child classes, as illustrated below by calling the `QueryBase` set up method (and is also why it's `protected`). + +```java +protected static void setupTermQueryDeserializer(ObjectDeserializer op) { + QueryBase.setupQueryBaseDeserializer(op); + op.add(Builder::value, FieldValue._DESERIALIZER, "value"); + op.add(Builder::caseInsensitive, JsonpDeserializer.booleanDeserializer(), "case_insensitive"); + + op.setKey(Builder::field, JsonpDeserializer.stringDeserializer()); + op.shortcutProperty("value", true); +} +``` + +The `setKey` call is the implementation of `SingleKeyDictionary`: it will "lift" the enclosing property name as a value of one the object's field. Like if `{"some-field":{"value":1.0}}` was actually `{"field":"some-field","value":1.0}`. + +The `shortcutProperty` call configures the "value" property as being the shortcut, i.e. `{"some-field":1.0}` is interpreted as `{"some-field":{"value":1.0}}`. + +And then we can create the actual deserializer: + +```java +public static final JsonpDeserializer _DESERIALIZER = + ObjectBuilderDeserializer.lazy( + Builder::new, + TermQuery::setupTermQueryDeserializer + ); +``` + +`ObjectBuilderDeserializer` wraps an `ObjectDeserializer` with the mechanics needed to create a builder, deserialize it, and create the actual object afterward. It is built with the builder's constructor and the setup method. + +### Lazy deserializers + +The `lazy` method builds an implementation of `Deserializer` that will effectively lazily build the `ObjectBuilderDeserializer` the first time it's called. There are two main reasons for this approach. + +#### Circular dependencies and static initializers + +The JVM will run static field initializers as part of the class initialization that happens when a class is first referenced in an application. The static initializers of the parent class and of the classes referenced by the current class's static initializers are called before initializing the class itself. + +While this works for the majority of cases, this can cause issues in the case of recursive dependencies between the static initializers of different classes. This results in fancy things line NPEs or stack overflows at class loading time! + +And there are a number of circular dependencies in the Java client, mainly in queries and aggregations. Limiting class initialization to just creating a lazy wrapper avoids any problem at class loading time. + +#### Request-only classes don't need deserializers? + +API classes in the Java client can be used in requests, in responses, or both. Classes in the first category (requests only) don't need a deserializer. So incurring the cost of creating a deserializer when the class is loaded would just be wasteful. + +We could have decided to _not_ add deserializers to request-only classes, but that would have prevented the implementation of `withJson()`. This method allows users to create request objects from a JSON string. Under the hood it calls the object's deserializer. + +In this scenario, having a lazily initialized deserializer enables some interesting features while paying the price for its creation only if it's actually used. + +### Container deserializers + +Variant containers (i.e. externally tagged types like `Query`) use `ObjectDeserializer` explained above. The container class (that implements `TaggedUnion`) has a "pseudo-property" for each of the variants and regular properties for container-level fields. + +### Internally tagged variants deserialization + +Internally tagged variants (with a `"type"` property) and require peeking inside the JSON object to find out what their actual variant is. This is the role of `JsonpUtils.lookAheadFieldValue()`: it will read JSON events until finding the property that defines the variant, and return that property's value and a JSON parser that will traverse the buffered data. + +### Untagged union deserialization + +Untagged unions (without a discriminant) deserialization is handled by `UnionDeserializer`. It is configured by adding each of the union members with their deserializers. Union members can be of two kinds: + +- objects: their fields will be used to build "member handlers" associated to field names that uniquely identify the member. If member `X` has fields `a` and `b`, and member `Y` has fields `b` and `c`, then finding an `a` property identifies member `X` while `b` doesn't allow disambiguating variants. + +- non-objects, like array or string: the JSON event type will be used to identify the variant. + +Like seen previously with internally tagged variants, `UnionDeserializer` will look a head and buffer JSON events until finding the information needed to identify the variant. The events that were buffered are then replayed to deserialize the selected variant. diff --git a/java-client/src/main/java/co/elastic/clients/json/UnionDeserializer.java b/java-client/src/main/java/co/elastic/clients/json/UnionDeserializer.java index d6ad6c9b2..4e4043940 100644 --- a/java-client/src/main/java/co/elastic/clients/json/UnionDeserializer.java +++ b/java-client/src/main/java/co/elastic/clients/json/UnionDeserializer.java @@ -35,6 +35,13 @@ import java.util.Set; import java.util.function.BiFunction; +/** + * A deserializer for union types that finds the actual variant using structural inspection of the JSON value. + * + * @param The union type we want to deserialize into + * @param The union's discriminant type + * @param The base type of possible member values in the union. + */ public class UnionDeserializer implements JsonpDeserializer { public static class AmbiguousUnionException extends RuntimeException { @@ -48,6 +55,11 @@ private abstract static class EventHandler { abstract EnumSet nativeEvents(); } + /** + * Handler for a single member (kind) of the union. It holds the list of properties that are unique to it + * among all handlers, so that we can unambiguously identify it by looking at the properties that exist + * in a JSON object. + */ private static class SingleMemberHandler extends EventHandler { private final JsonpDeserializer deserializer; private final Kind tag; @@ -109,7 +121,7 @@ public static class Builder implements ObjectBuilder buildFn; - private final List> objectMembers = new ArrayList<>(); + private final List> objectMembers = new ArrayList<>(); private final Map> otherMembers = new HashMap<>(); private final boolean allowAmbiguousPrimitive; @@ -135,7 +147,7 @@ private void addAmbiguousDeserializer(Event e, Kind tag, JsonpDeserializer a.deserializer.acceptedEvents().size())); } - private void addMember(Event e, Kind tag, UnionDeserializer.SingleMemberHandler member) { + private void addMember(Event e, Kind tag, SingleMemberHandler member) { if (otherMembers.containsKey(e)) { if (!allowAmbiguousPrimitive || e == Event.START_OBJECT || e == Event.START_ARRAY) { throw new AmbiguousUnionException("Union member '" + tag + "' conflicts with other members"); @@ -150,26 +162,31 @@ private void addMember(Event e, Kind tag, UnionDeserializer.SingleMemberHandler< } } + /** + * Adds a member to the union deserializer. + */ public Builder addMember(Kind tag, JsonpDeserializer deserializer) { JsonpDeserializer unwrapped = DelegatingDeserializer.unwrap(deserializer); if (unwrapped instanceof ObjectDeserializer) { ObjectDeserializer od = (ObjectDeserializer) unwrapped; Set allFields = od.fieldNames(); - Set fields = new HashSet<>(allFields); // copy to update - for (UnionDeserializer.SingleMemberHandler member: objectMembers) { - // Remove respective fields on both sides to keep specific ones - fields.removeAll(member.fields); + + Set uniqueFields = new HashSet<>(allFields); // copy that we'll update + for (SingleMemberHandler member: objectMembers) { + // Keep fields that are unique to this member + uniqueFields.removeAll(member.fields); + // Remove the new member's fields from the existing member to ensure uniqueness member.fields.removeAll(allFields); } - UnionDeserializer.SingleMemberHandler member = new SingleMemberHandler<>(tag, deserializer, fields); + SingleMemberHandler member = new SingleMemberHandler<>(tag, deserializer, uniqueFields); objectMembers.add(member); if (od.shortcutProperty() != null) { // also add it as a string addMember(Event.VALUE_STRING, tag, member); } } else { - UnionDeserializer.SingleMemberHandler member = new SingleMemberHandler<>(tag, deserializer); + SingleMemberHandler member = new SingleMemberHandler<>(tag, deserializer); for (Event e: deserializer.nativeEvents()) { addMember(e, tag, member); } @@ -181,7 +198,7 @@ public Builder addMember(Kind tag, JsonpDeserializer build() { // Check that no object member had all its fields removed - for (UnionDeserializer.SingleMemberHandler member: objectMembers) { + for (SingleMemberHandler member: objectMembers) { if (member.fields.isEmpty()) { throw new AmbiguousUnionException("All properties of '" + member.tag + "' also exist in other object members"); } From a885844ccd171eb867391b01440675acc9b4503d Mon Sep 17 00:00:00 2001 From: Sylvain Wallez Date: Mon, 6 Oct 2025 15:04:57 +0200 Subject: [PATCH 2/2] Add comment on test --- .../model/SerializationTest.java | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/java-client/src/test/java/co/elastic/clients/elasticsearch/model/SerializationTest.java b/java-client/src/test/java/co/elastic/clients/elasticsearch/model/SerializationTest.java index a38f843bb..cb3b5d950 100644 --- a/java-client/src/test/java/co/elastic/clients/elasticsearch/model/SerializationTest.java +++ b/java-client/src/test/java/co/elastic/clients/elasticsearch/model/SerializationTest.java @@ -41,6 +41,11 @@ public class SerializationTest extends ModelTestCase { + /** + * Loads all {@code _DESERIALIER} fields. Since the actual deserializers are lazily constructed at runtime + * the first time a deserializer is used, we load them all to make sure they can be created and initialized + * successfully. + */ @Test public void loadAllDeserializers() throws Exception { @@ -67,20 +72,6 @@ public void loadAllDeserializers() throws Exception { // Check that all classes that have a _DESERIALIZER field also have the annotation ClassInfoList withDeserializer = scan.getAllClasses().filter((c) -> c.hasDeclaredField("_DESERIALIZER")); assertFalse(withDeserializer.isEmpty(), "No classes with a _DESERIALIZER field"); - -// Disabled for now, empty response classes still need a deserializer object -// e.g. ExistsIndexTemplateResponse, PingResponse, ExistsResponse, ExistsAliasResponse -// -// Set annotationNames = withAnnotation.stream().map(c -> c.getName()).collect(Collectors.toSet()); -// Set withFieldNames = withDeserializer.stream().map(c -> c.getName()).collect(Collectors.toSet()); -// -// withFieldNames.removeAll(annotationNames); -// -// assertFalse( -// withFieldNames.size() + " classes with the field but not the annotation: " + withFieldNames, -// !withFieldNames.isEmpty() -// ); - } @Test