@suddendust
Contributor

Support for SELECT on Nested JSON Fields

Summary

Adds support for selecting nested fields from JSONB columns in PostgreSQL flat collections using a new JsonIdentifierExpression type.

Scope: STRING and STRING_ARRAY types only.

Key Changes

  • New: JsonIdentifierExpression for type-aware JSONB field access with field paths (e.g., props->brand, props->seller->city)
  • Enhanced: FlatPostgresFieldTransformer to translate JSON paths to PostgreSQL -> / ->> operators
  • Updated: PostgresDataAccessorIdentifierExpressionVisitor to handle JSON type to PostgreSQL type conversion
  • Extended: PostgresColTransformer for type-aware casting on JSONB fields
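As a rough illustration of the path translation described in the list above, the following Java sketch turns a column name and a JSON path into a PostgreSQL accessor. The class `JsonAccessorSketch` and its `accessor` method are hypothetical names, not the PR's actual code, and the sketch assumes the column and path segments have already been validated.

```java
import java.util.List;

final class JsonAccessorSketch {

  // Hypothetical helper: intermediate segments use -> (stay JSONB); the last
  // segment uses ->> when text is wanted, or -> to keep the value as JSONB.
  // Assumes column and path segments were validated before this call.
  static String accessor(String column, List<String> path, boolean asText) {
    if (path.isEmpty()) {
      throw new IllegalArgumentException("path must not be empty");
    }
    StringBuilder sql = new StringBuilder(column);
    for (int i = 0; i < path.size(); i++) {
      boolean last = i == path.size() - 1;
      sql.append(last && asText ? "->>" : "->")
          .append("'")
          .append(path.get(i))
          .append("'");
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    // prints props->'seller'->>'city'
    System.out.println(accessor("props", List.of("seller", "city"), true));
    // prints props->'colors'
    System.out.println(accessor("props", List.of("colors"), false));
  }
}
```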

Limitations

  • Only STRING and STRING_ARRAY types are supported for now.

@codecov

codecov bot commented Oct 13, 2025

Codecov Report

❌ Patch coverage is 93.54839% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.08%. Comparing base (e0ada8e) to head (f93da60).
⚠️ Report is 1 commits behind head on main.

Files with missing lines                              | Patch % | Lines
...y/v1/transformer/FlatPostgresFieldTransformer.java | 78.26%  | 4 Missing and 1 partial ⚠️
...tore/expression/impl/JsonIdentifierExpression.java | 95.00%  | 0 Missing and 1 partial ⚠️

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #240      +/-   ##
============================================
+ Coverage     79.88%   80.08%   +0.19%     
- Complexity     1100     1134      +34     
============================================
  Files           213      215       +2     
  Lines          5400     5484      +84     
  Branches        473      481       +8     
============================================
+ Hits           4314     4392      +78     
- Misses          760      764       +4     
- Partials        326      328       +2     
Flag        | Coverage Δ
integration | 80.08% <93.54%> (+0.19%) ⬆️
unit        | 57.34% <87.09%> (+0.45%) ⬆️


/**
 * Expression representing a nested field within a JSONB column in flat Postgres collections.
 *
 * <p>Example: JsonIdentifierExpression.of("customAttr", List.of("myAttribute"), "STRING");
Contributor

For anything more than 2 arguments, especially if 2 of them are of the same type, I'd prefer using a builder (to make the code less-error prone due to the incorrect argument order).
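A minimal sketch of the builder idea, for illustration only: `JsonFieldRef` and its builder are hypothetical stand-ins written by hand, not the library's `JsonIdentifierExpression`, and the field names simply mirror the three arguments of the example above.

```java
import java.util.List;
import java.util.Objects;

final class JsonFieldRef {
  final String columnName;
  final List<String> jsonPath;
  final String jsonType;

  private JsonFieldRef(Builder builder) {
    this.columnName = Objects.requireNonNull(builder.columnName, "columnName");
    this.jsonPath = List.copyOf(Objects.requireNonNull(builder.jsonPath, "jsonPath"));
    this.jsonType = Objects.requireNonNull(builder.jsonType, "jsonType");
  }

  static Builder builder() {
    return new Builder();
  }

  // Each setter names its argument, so two String values cannot be swapped silently.
  static final class Builder {
    private String columnName;
    private List<String> jsonPath;
    private String jsonType;

    Builder columnName(String value) {
      this.columnName = value;
      return this;
    }

    Builder jsonPath(List<String> value) {
      this.jsonPath = value;
      return this;
    }

    Builder jsonType(String value) {
      this.jsonType = value;
      return this;
    }

    JsonFieldRef build() {
      return new JsonFieldRef(this);
    }
  }
}
```

With this shape a call site reads `JsonFieldRef.builder().columnName("customAttr").jsonPath(List.of("myAttribute")).jsonType("STRING").build()`, which makes the role of each string explicit.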

Contributor Author

@suresh-prakash Since we've removed the type arg, it only has 2 arguments now, both of different types (so incorrect arg order is not possible).


String columnName; // e.g., "customAttr" (the actual JSONB column)
List<String> jsonPath; // e.g., ["myAttribute"] (path within the JSONB)
String jsonType; // "STRING" or "STRING_ARRAY"
Contributor

Any reason why this would be required? Can the client parse the JSON to whatever type they want? That way, the clients can also handle complex objects.

If we cannot avoid this, I'd make it an enum, instead of a string.

Contributor Author

@suresh-prakash The jsonType determines which PostgreSQL operator and casting to use. For example:

-- STRING type: Uses ->> (extracts as TEXT)
SELECT props->>'brand' FROM table WHERE props->>'brand' = 'Nike'

-- STRING_ARRAY type: Uses -> (keeps as JSONB)
SELECT props->'colors' FROM table WHERE props->'colors' @> '["red"]'

This keeps the query generation logic for flat collections consistent with what we have today for nested collections. Do you see a better approach here (maybe always extract as JSON rather than text)?
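To make the operator choice above concrete, and to show the enum alternative to a free-form string, here is a small Java sketch. `JsonFieldType` and its `accessor` method are hypothetical names, not part of the library, and the inputs are assumed to be validated already.

```java
// Hypothetical enum: each JSON type carries the PostgreSQL operator it needs.
enum JsonFieldType {
  STRING("->>"), // extracts the value as TEXT
  STRING_ARRAY("->"); // keeps the value as JSONB

  private final String operator;

  JsonFieldType(String operator) {
    this.operator = operator;
  }

  // Builds e.g. props->>'brand'. Assumes column and key were validated upstream.
  String accessor(String column, String key) {
    return column + operator + "'" + key + "'";
  }
}
```

An unknown type name then fails early at `JsonFieldType.valueOf(...)` rather than producing an unexpected query.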

Contributor

I am trying to understand how we achieve it today with nested Postgres table without getting the additional type information from the clients/callers.

Contributor Author

@suresh-prakash As discussed yesterday, the type isn't needed.


public static JsonIdentifierExpression of(
    final String columnName, final List<String> jsonPath, final String jsonType) {
  // Construct full name for compatibility: "customAttr.myAttribute"
Contributor

Since these are column names and JSON paths, they are not fed in via a PreparedStatement. Hence, this could lead to SQL injection attacks. Could you please add sanity checks everywhere? This applies throughout "flat collections", perhaps even to the regular IdentifierExpression. For nested collections, it may not have been an issue since the column name and the table name are hardcoded in the library.

Contributor Author

@suresh-prakash Any strategies you know of around this? Also, shall we create a separate story for this to not balloon the scope of this story too much?

Contributor

@suresh-prakash Oct 17, 2025

Simply ensuring the column names and jsonPaths do not contain any special characters other than an underscore and dot might be a good one to start with.

shall we create a separate story for this to not balloon the scope of this story too much?

When it comes to security aspects, I guess, it's better to act on it as soon as possible. I don't think the work involved would be much anyways.
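One way the suggested check might look, as a sketch only: the class `IdentifierSanityCheck`, its method names, and the exact character set (ASCII letters, digits, underscore, dot) are assumptions made for illustration, not the checks that were actually merged.

```java
import java.util.List;
import java.util.regex.Pattern;

final class IdentifierSanityCheck {

  // Assumed allow-list: ASCII letters, digits, underscore and dot only.
  private static final Pattern SAFE_IDENTIFIER = Pattern.compile("[A-Za-z0-9_.]+");

  // Returns the identifier unchanged if it is safe, otherwise throws.
  static String requireSafe(String identifier) {
    if (identifier == null || !SAFE_IDENTIFIER.matcher(identifier).matches()) {
      throw new IllegalArgumentException("Unsafe identifier: " + identifier);
    }
    return identifier;
  }

  // Validates every segment of a JSON path.
  static void requireSafePath(List<String> jsonPath) {
    jsonPath.forEach(IdentifierSanityCheck::requireSafe);
  }
}
```

Because quotes, semicolons, and whitespace are rejected outright, a value such as `brand'; DROP TABLE t; --` never reaches the generated SQL. An allow-list like this is deliberately strict and may need widening if legitimate keys contain other characters.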

Contributor Author

@suresh-prakash Gotcha!

@EqualsAndHashCode(callSuper = true)
public class JsonIdentifierExpression extends IdentifierExpression {

String columnName; // e.g., "customAttr" (the actual JSONB column)
Contributor

Isn't this already a part of the parent class?

Contributor Author

Yes, it is. Will remove this.

public FieldToPgColumn transform(
    IdentifierExpression expression, Map<String, String> pgColMapping) {
  // Check if this is a JsonIdentifierExpression with explicit metadata
  if (expression instanceof JsonIdentifierExpression) {
Contributor

I'd avoid instanceof as much as possible by using the visitor pattern, for a cleaner code structure. The problem with instanceof is that whenever we change the logic in one place, we might forget to make a similar change in the other places that use instanceof. With a visitor, all relevant changes are nicely packaged together in a single class (high cohesion).
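A compact sketch of the difference, using hypothetical types (`FieldExpr`, `PlainField`, `JsonField`, `SqlColumnVisitor`) rather than the library's real expression and visitor classes: each expression dispatches to the visitor, so the per-type logic lives in one class and no instanceof check is needed.

```java
// Hypothetical expression hierarchy with double dispatch.
interface FieldExpr {
  <T> T accept(FieldExprVisitor<T> visitor);
}

interface FieldExprVisitor<T> {
  T visit(PlainField field);

  T visit(JsonField field);
}

final class PlainField implements FieldExpr {
  final String name;

  PlainField(String name) {
    this.name = name;
  }

  @Override
  public <T> T accept(FieldExprVisitor<T> visitor) {
    return visitor.visit(this);
  }
}

final class JsonField implements FieldExpr {
  final String column;
  final String key;

  JsonField(String column, String key) {
    this.column = column;
    this.key = key;
  }

  @Override
  public <T> T accept(FieldExprVisitor<T> visitor) {
    return visitor.visit(this);
  }
}

// All SQL-rendering rules for every expression type sit together here.
final class SqlColumnVisitor implements FieldExprVisitor<String> {
  @Override
  public String visit(PlainField field) {
    return field.name;
  }

  @Override
  public String visit(JsonField field) {
    return field.column + "->>'" + field.key + "'";
  }
}
```

Adding a new expression type means adding a `visit` overload to the interface, and the compiler then flags every visitor that has not handled it, which is the safety net that scattered instanceof checks lack.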

Contributor Author

Thanks for the detailed comment! I am using visitors now to avoid the instanceof check.


// If this is a JsonIdentifierExpression, use its type instead of the visitor's type
Type typeToUse = this.type;
if (expression instanceof JsonIdentifierExpression) {
Contributor

Same here.
Having the instanceof checks at different places increases coupling across classes, leading to code maintenance overhead in order to keep it error-free.

.buildFieldAccessorWithCast(fieldToPgColumn, typeToUse);
}

private Type convertJsonTypeToPostgresType(String jsonType) {
Contributor

Why not make the Type enum generic to document store (not specific to Postgres) and re-use it?

I'd prefer avoiding the type at all if possible.

Contributor Author

As discussed, we'll get rid of type for now.

@suddendust changed the title from "[PoC] Support for SELECT on nested JSON fields" to "Support for SELECT on nested JSON fields" on Oct 17, 2025
@suddendust changed the title from "Support for SELECT on nested JSON fields" to "Support for SELECT on nested JSON fields in Flat Collections" on Oct 17, 2025
Contributor

@suresh-prakash left a comment

@suddendust
Contributor Author

@suresh-prakash Regarding this, I was thinking of creating a new issue for it, since the changes are at multiple places, blowing up the PR size. Wdyt?

@suddendust
Contributor Author

@suresh-prakash Nvm, I just read the other comment. Will be adding this in this PR itself!

@suddendust
Contributor Author

suddendust commented Oct 17, 2025

@suresh-prakash Can you review 4a01d1f? We can make the different static vars config-driven as well, but as a start, I feel this should be okay?

@suresh-prakash merged commit 1e3c868 into hypertrace:main on Oct 17, 2025
6 checks passed