-
Notifications
You must be signed in to change notification settings - Fork 11
Support for SELECT on nested JSON fields in Flat Collections #240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for SELECT on nested JSON fields in Flat Collections #240
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #240 +/- ##
============================================
+ Coverage 79.88% 80.08% +0.19%
- Complexity 1100 1134 +34
============================================
Files 213 215 +2
Lines 5400 5484 +84
Branches 473 481 +8
============================================
+ Hits 4314 4392 +78
- Misses 760 764 +4
- Partials 326 328 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| /** | ||
| * Expression representing a nested field within a JSONB column in flat Postgres collections. | ||
| * | ||
| * <p>Example: JsonIdentifierExpression.of("customAttr", List.of("myAttribute"), "STRING"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For anything more than 2 arguments, especially if 2 of them are of the same type, I'd prefer using a builder (to make the code less-error prone due to the incorrect argument order).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash Since we've removed the type arg, it only has 2 arguments now, both of different types (so incorrect arg order is not possible).
|
|
||
| String columnName; // e.g., "customAttr" (the actual JSONB column) | ||
| List<String> jsonPath; // e.g., ["myAttribute"] (path within the JSONB) | ||
| String jsonType; // "STRING" or "STRING_ARRAY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why this would be required? Can the client parse the JSON to whatever type they want? That way, the clients can also handle complex objects.
If we cannot avoid this, I'd make it an enum, instead of a string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash The jsonType determines which PostgreSQL operator and casting to use. For example:
-- STRING type: Uses ->> (extracts as TEXT)
SELECT props->>'brand' FROM table WHERE props->>'brand' = 'Nike'
-- STRING_ARRAY type: Uses -> (keeps as JSONB)
SELECT props->'colors' FROM table WHERE props->'colors' @> '["red"]'
This keeps the query generation logic consistent for flat collections consistent with what we have today for nested. Do you see a better approach here (maybe always extract as a json rather than text)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am trying to understand how we achieve it today with nested Postgres table without getting the additional type information from the clients/callers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash As discussed yesterday, the type isn't needed.
|
|
||
| public static JsonIdentifierExpression of( | ||
| final String columnName, final List<String> jsonPath, final String jsonType) { | ||
| // Construct full name for compatibility: "customAttr.myAttribute" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these are column names and JSON paths, they are not fed-in via. PreparedStatement. Hence, this could lead to SQL injection attacks. Could you please add sanity checks everywhere? This is applicable all throughout "flat collections", even for regular IdentifierExpression perhaps. For nested collection, it may not have been an issue since the column name and the table name are hardcoded in the library.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash Any strategies you know of around this? Also, shall we create a separate story for this to not balloon the scope of this store too much?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Simply ensuring the column names and jsonPaths do not contain any special characters other than an underscore and dot might be a good one to start with.
shall we create a separate story for this to not balloon the scope of this store too much?
When it comes to security aspects, I guess, it's better to act on it as soon as possible. I don't think the work involved would be much anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash Gotcha!
| @EqualsAndHashCode(callSuper = true) | ||
| public class JsonIdentifierExpression extends IdentifierExpression { | ||
|
|
||
| String columnName; // e.g., "customAttr" (the actual JSONB column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already a part of the parent class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is. Will remove this.
| public FieldToPgColumn transform( | ||
| IdentifierExpression expression, Map<String, String> pgColMapping) { | ||
| // Check if this is a JsonIdentifierExpression with explicit metadata | ||
| if (expression instanceof JsonIdentifierExpression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd completely avoid the need for instanceof through the visitor pattern as much as possible for a cleaner code structure. The problem with instanceof is whenever we change a logic in one place, we might forget to make a similar change in the other place with instanceof. But, with visitor, all relevant changes are nicely packaged together in a single class (high cohession).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed comment! I am using visitors now to avoid the instanceof check.
|
|
||
| // If this is a JsonIdentifierExpression, use its type instead of the visitor's type | ||
| Type typeToUse = this.type; | ||
| if (expression instanceof JsonIdentifierExpression) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
Having the instanceof checks at different places increases coupling across classes, leading to code maintenance overhead in order to keep it error-free.
| .buildFieldAccessorWithCast(fieldToPgColumn, typeToUse); | ||
| } | ||
|
|
||
| private Type convertJsonTypeToPostgresType(String jsonType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not make the Type enum generic to document store (not specific to Postgres) and re-use it?
I'd prefer avoiding the type at all if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, we'll get rid of type for now.
This reverts commit 1d2dbb8.
…b_nested_selections
suresh-prakash
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything looks good except https://github.com/hypertrace/document-store/pull/240/files#r2428016668
|
@suresh-prakash Regarding this, I was thinking of creating a new issue for it, since the changes are at multiple places, blowing up the PR size. Wdyt? |
|
@suresh-prakash Nvm, I just read the other comment. Will be adding this in this PR itself! |
|
@suresh-prakash Can you review: 4a01d1f. We can make the different static vars config driven as well, but for starting, I feel this should be okay? |
Support for SELECT on Nested JSON Fields
Summary
Adds support for selecting nested fields from JSONB columns in PostgreSQL flat collections using a new
JsonIdentifierExpressiontype.Scope: STRING and STRING_ARRAY types only.
Key Changes
JsonIdentifierExpressionfor type-aware JSONB field access with field paths (e.g.,props->brand,props->seller->city)FlatPostgresFieldTransformerto translate JSON paths to PostgreSQL->/->>operatorsPostgresDataAccessorIdentifierExpressionVisitorto handle JSON type to PostgreSQL type conversionPostgresColTransformerfor type-aware casting on JSONB fieldsLimitations
STRINGandSTRING_ARRAYtypes are supported for now.