Pull request overview
This PR changes how default null values are serialized for complex field types (LIST/MAP), storing them as JSON strings to ensure consistent deserialization across Java/Scala JSON readers.
Changes:
- Serialize LIST/MAP defaultNullValue as a compact JSON string instead of embedding JSON nodes.
- Add unit tests validating complex default null value round-tripping and textual JSON storage.
- Refactor existing tests to use static TestNG assertions and the FieldSpec.DataType import.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java | Serializes complex default-null values as strings and adds error handling. |
| pinot-spi/src/test/java/org/apache/pinot/spi/data/SchemaSerializationTest.java | Adds/updates tests to validate complex default-null value serialization and round-trips. |
    String serialized = jsonObject.toString();
    assertTrue(serialized.contains("\"defaultNullValue\":\"[1,2,3]\""));
These assertions depend on exact JSON string escaping, compacting, and field rendering, which can be brittle across Jackson versions/config. Since the test already validates defaultNullValueNode.isTextual() and its textValue(), consider removing the serialized.contains(...) checks or rewriting them to assert via parsing (e.g., parse serialized back to a JsonNode and assert defaultNullValue is textual with the expected textValue()).
    serialized = jsonObject.toString();
    assertTrue(serialized.contains("\"defaultNullValue\":\"[\\\"a\\\",\\\"b\\\",\\\"c\\\"]\""));
These assertions depend on exact JSON string escaping, compacting, and field rendering, which can be brittle across Jackson versions/config. Since the test already validates defaultNullValueNode.isTextual() and its textValue(), consider removing the serialized.contains(...) checks or rewriting them to assert via parsing (e.g., parse serialized back to a JsonNode and assert defaultNullValue is textual with the expected textValue()).
    serialized = jsonObject.toString();
    assertTrue(serialized.contains("\"defaultNullValue\":\"{\\\"a\\\":1,\\\"b\\\":2}\""));
These assertions depend on exact JSON string escaping, compacting, and field rendering, which can be brittle across Jackson versions/config. Since the test already validates defaultNullValueNode.isTextual() and its textValue(), consider removing the serialized.contains(...) checks or rewriting them to assert via parsing (e.g., parse serialized back to a JsonNode and assert defaultNullValue is textual with the expected textValue()).
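To make the brittleness concern concrete, here is a minimal sketch using only the JDK (no Jackson). The class name, the helper containsFragment, and both sample payloads are hypothetical and are not taken from the PR. It shows the same JSON content rendered two ways, where an exact-substring check passes for one rendering and fails for the other.

```java
public class SubstringAssertionSketch {
    // Exact fragment a substring-based test would look for.
    static final String FRAGMENT = "\"defaultNullValue\":\"[1,2,3]\"";

    // Mirrors the serialized.contains(...) style of assertion.
    static boolean containsFragment(String serialized) {
        return serialized.contains(FRAGMENT);
    }

    public static void main(String[] args) {
        // Hypothetical compact rendering: no whitespace between tokens.
        String compact = "{\"name\":\"col\",\"defaultNullValue\":\"[1,2,3]\"}";
        // Hypothetical pretty-printed rendering of the same JSON content.
        String pretty = "{\n  \"name\" : \"col\",\n  \"defaultNullValue\" : \"[1,2,3]\"\n}";

        System.out.println(containsFragment(compact)); // true
        System.out.println(containsFragment(pretty));  // false: spaces around ':' break the match
    }
}
```

Asserting on the parsed node (textual, with the expected text value), as the existing test already does, is unaffected by this kind of formatting difference.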
pinot-spi/src/test/java/org/apache/pinot/spi/data/SchemaSerializationTest.java
Codecov Report
❌ Patch coverage is
Additional details and impacted files
Coverage diff, master vs. #17685:
| Metric | master | #17685 | +/- |
|---|---|---|---|
| Coverage | 55.61% | 55.63% | +0.01% |
| Complexity | 721 | 721 | |
| Files | 2479 | 2479 | |
| Lines | 140436 | 140442 | +6 |
| Branches | 22375 | 22378 | +3 |
| Hits | 78110 | 78133 | +23 |
| Misses | 55734 | 55717 | -17 |
| Partials | 6592 | 6592 | |
gortiz left a comment
I don't think this is a good idea, but I can accept it, given that FieldSpec is currently using Jackson in an unconventional way. TBH I would prefer to keep the current code (maybe fixing the setter) and, if needed, provide our own serde for json4s.
When reading an Object from JSON, Java will read it as a Java object, while Scala will read it as a Scala object. To get consistent behavior, store the default null value as a serialized string so that the value is always read as a Java List/Map.
I assume you mean json4s. I don't know who uses that, but I don't think we should change the representation we use just because someone uses a Scala library to parse that. They can:
- Serialize/deserialize using Jackson
- Provide their own custom serializer/deserializer
If we want to support json4s, which is something I think we should discuss, we can provide custom serializers and deserializers for it, but I don't think we should change our serialized JSON format just because there is a tool that is unable to deserialize our standard format. So instead of serializing something as:
{
  "key": [1,2,3, "text"]
}
We serialize it as
{
  "key": "[1, 2, 3, \"text\"]"
}
Which is a horrible design: it breaks compatibility if someone else is already parsing that string (without Jackson), and it is more expensive to parse, makes parsers more difficult to write, and is more difficult to read.
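As a rough illustration of the two representations contrasted above, the following JDK-only sketch builds both forms by hand. The class name and the quote helper are hypothetical, and quote is a deliberately minimal escaper (backslash and double quote only), not what Jackson or Pinot actually uses. It shows that the string-encoded form needs extra escaping on write, which in turn means a consumer has to unescape the value and then parse it a second time to recover the list.

```java
public class StringEncodedJsonSketch {
    // Minimal JSON string quoting: escapes only backslash and double quote.
    // Assumes the input contains no control characters, which holds for the literal below.
    static String quote(String raw) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : raw.toCharArray()) {
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String list = "[1,2,3,\"text\"]";

        // Native form: the list is embedded directly as a JSON array.
        String nativeForm = "{\"key\":" + list + "}";
        // String-encoded form: the list is first rendered to text, then quoted as a JSON string.
        String stringForm = "{\"key\":" + quote(list) + "}";

        System.out.println(nativeForm); // {"key":[1,2,3,"text"]}
        System.out.println(stringForm); // {"key":"[1,2,3,\"text\"]"}
    }
}
```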
    case LIST:
    jsonNode.put(key, JsonUtils.objectToJsonNode(_defaultNullValue));
    try {
    jsonNode.put(key, JsonUtils.objectToString(_defaultNullValue));
If we end up using this code, shouldn't we use getStringValue(object) here?
Technically we can, but this gives a better exception message, and we can short-circuit the object type check
01b8506 to b840ce3
Jackie-Jiang left a comment
I've modified the getStringValue() method so that it can handle JSON list and map
For complex field types, serialize the default null value as a string so that it is safe to read.
When reading an Object from JSON, Java will read it as a Java object, while Scala will read it as a Scala object. To get consistent behavior, store the default null value as a serialized string so that the value is always read as a Java List/Map.
Behavior change
Non-default default null values for complex fields will be stored as serialized strings.