Skip to content

Fix complex field ser de#17685

Open
Jackie-Jiang wants to merge 2 commits intoapache:masterfrom
Jackie-Jiang:fix_complex_field_ser_de
Open

Fix complex field ser de#17685
Jackie-Jiang wants to merge 2 commits intoapache:masterfrom
Jackie-Jiang:fix_complex_field_ser_de

Conversation

@Jackie-Jiang
Copy link
Contributor

For complex field type, serialize the default null value into string format so that it is safe to read

When reading an Object from JSON, java will read it as java object, while scala will read it as scala object. In order to have consistent behavior, store default null value as serialized string to ensure value is always read as java List/Map.

Behavior change

Non-default default null value for complex fields will be stored as serialized string.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes how default null values are serialized for complex field types (LIST/MAP), storing them as JSON strings to ensure consistent deserialization across Java/Scala JSON readers.

Changes:

  • Serialize LIST/MAP defaultNullValue as a compact JSON string instead of embedding JSON nodes.
  • Add unit tests validating complex default null value round-tripping and textual JSON storage.
  • Refactor existing tests to use static TestNG assertions and FieldSpec.DataType import.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java Serializes complex default-null values as strings and adds error handling.
pinot-spi/src/test/java/org/apache/pinot/spi/data/SchemaSerializationTest.java Adds/updates tests to validate complex default-null value serialization and round-trips.

Comment on lines +434 to +435
String serialized = jsonObject.toString();
assertTrue(serialized.contains("\"defaultNullValue\":\"[1,2,3]\""));
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions depend on exact JSON string escaping, compacting, and field rendering, which can be brittle across Jackson versions/config. Since the test already validates defaultNullValueNode.isTextual() and its textValue(), consider removing the serialized.contains(...) checks or rewriting them to assert via parsing (e.g., parse serialized back to a JsonNode and assert defaultNullValue is textual with the expected textValue()).

Copilot uses AI. Check for mistakes.
Comment on lines +449 to +450
serialized = jsonObject.toString();
assertTrue(serialized.contains("\"defaultNullValue\":\"[\\\"a\\\",\\\"b\\\",\\\"c\\\"]\""));
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions depend on exact JSON string escaping, compacting, and field rendering, which can be brittle across Jackson versions/config. Since the test already validates defaultNullValueNode.isTextual() and its textValue(), consider removing the serialized.contains(...) checks or rewriting them to assert via parsing (e.g., parse serialized back to a JsonNode and assert defaultNullValue is textual with the expected textValue()).

Copilot uses AI. Check for mistakes.
Comment on lines +467 to +468
serialized = jsonObject.toString();
assertTrue(serialized.contains("\"defaultNullValue\":\"{\\\"a\\\":1,\\\"b\\\":2}\""));
Copy link

Copilot AI Feb 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions depend on exact JSON string escaping, compacting, and field rendering, which can be brittle across Jackson versions/config. Since the test already validates defaultNullValueNode.isTextual() and its textValue(), consider removing the serialized.contains(...) checks or rewriting them to assert via parsing (e.g., parse serialized back to a JsonNode and assert defaultNullValue is textual with the expected textValue()).

Copilot uses AI. Check for mistakes.
@codecov-commenter
Copy link

codecov-commenter commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 61.53846% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.63%. Comparing base (b4e48d5) to head (b840ce3).

Files with missing lines Patch % Lines
...main/java/org/apache/pinot/spi/data/FieldSpec.java 66.66% 4 Missing ⚠️
...cal/recordtransformer/SanitizationTransformer.java 0.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17685      +/-   ##
============================================
+ Coverage     55.61%   55.63%   +0.01%     
  Complexity      721      721              
============================================
  Files          2479     2479              
  Lines        140436   140442       +6     
  Branches      22375    22378       +3     
============================================
+ Hits          78110    78133      +23     
+ Misses        55734    55717      -17     
  Partials       6592     6592              
Flag Coverage Δ
integration 0.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 55.60% <61.53%> (+0.01%) ⬆️
java-21 55.59% <61.53%> (+<0.01%) ⬆️
temurin 55.63% <61.53%> (+0.01%) ⬆️
unittests 55.63% <61.53%> (+0.01%) ⬆️
unittests1 55.63% <61.53%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@gortiz gortiz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is a good idea, but I can accept it, given that FieldSpec is currently using Jackson in an unconventional way. TBH I would prefer to keep the current code (maybe fixing the setter) and if needed, provide our own serde for json4s

When reading an Object from JSON, java will read it as java object, while scala will read it as scala object. In order to have consistent behavior, store default null value as serialized string to ensure value is always read as java List/Map.

I assume you mean json4s. I don't know who uses that, but I don't think we should change the representation we use just because someone uses a Scala library to parse that. They can:

  • Serialize/deserialize using Jackson
  • Provide their own custom serializer/deserializer

If we want to support json4s, which is something I think we should discuss, we can provide custom serializer and deserializers for it, but I don't think we should change our serialized JSON format just because there is a tool that is unable to deserialize our standard format. So instead of serializing something as:

{
  "key": [1,2,3, "text"]
}

We serialize it as

{
  "key": "[1, 2, 3, \"text\"]"
}

Which is a horrible design, breaks compatibility if someone else is already parsing that string (without Jackson), it is more expensive to parse, more difficult to write parsers and more difficult to read.

case LIST:
jsonNode.put(key, JsonUtils.objectToJsonNode(_defaultNullValue));
try {
jsonNode.put(key, JsonUtils.objectToString(_defaultNullValue));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we end up using this code, shouldn't we use getStringValue(object) here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we can, but this gives a better exception message, and we can short-circuit the object type check

@Jackie-Jiang Jackie-Jiang force-pushed the fix_complex_field_ser_de branch from 01b8506 to b840ce3 Compare February 13, 2026 00:11
Copy link
Contributor Author

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the getStringValue() method so that it can handle JSON list and map

case LIST:
jsonNode.put(key, JsonUtils.objectToJsonNode(_defaultNullValue));
try {
jsonNode.put(key, JsonUtils.objectToString(_defaultNullValue));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically we can, but this gives a better exception message, and we can short-circuit the object type check

@Jackie-Jiang
Copy link
Contributor Author

@gortiz You may refer to the fix attempt in #17553 which explains the problem better.
Serializing the value as string is more robust, but I've made it able to understand JSON list/map when deserialized in java code with jackson.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants