@suddendust suddendust commented Oct 6, 2025

Description

This PR adds support for un-nesting top-level array fields in flat PG collections. Currently, it assumes that the column defined in UnnestExpression is an array. Un-nesting JSON arrays is currently not supported in flat collections.

Testing

Have added DocStoreQueryV1Test#testFlatPostgresCollectionUnnestTags to test the behaviour.

Checklist:

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

codecov bot commented Oct 6, 2025

Codecov Report

❌ Patch coverage is 83.54430% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.87%. Comparing base (111e083) to head (e209d88).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...1/vistors/PostgresFilterTypeExpressionVisitor.java 67.50% 4 Missing and 9 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #239      +/-   ##
============================================
+ Coverage     79.73%   79.87%   +0.13%     
  Complexity     1099     1099              
============================================
  Files           213      213              
  Lines          5344     5396      +52     
  Branches        455      473      +18     
============================================
+ Hits           4261     4310      +49     
+ Misses          763      760       -3     
- Partials        320      326       +6     
Flag Coverage Δ
integration 79.87% <83.54%> (+0.13%) ⬆️
unit 56.89% <26.58%> (-0.45%) ⬇️


@suddendust suddendust changed the title from "[Draft] Support for Unnest on Top-Level Array Fields in Flat Postgres Collection" to "Support for Unnest on Top-Level Array Fields in Flat Postgres Collection" Oct 8, 2025
suresh-prakash previously approved these changes Oct 8, 2025
// From looking at the data:
// - "hygiene" appears in docs 1, 5, 8 = 3 times
// - "personal-care" appears in docs 1, 3 = 2 times
// - "grooming" appears in docs 6, 7 = 2 times
Contributor

This is nice. Thanks a lot for these inline comments. 🙂

return String.format(
"EXISTS (SELECT 1 FROM jsonb_array_elements(COALESCE(%s, '[]'::jsonb)) AS \"%s\" WHERE %s)",
parsedLhs, alias, parsedFilter);
if (isFlatCollection) {
Contributor

Apologies if this comment is coming pretty late.
The growing number of these conditions makes me think it would be better to separate the "FlatCollection Visitors" and "NestedCollection Visitors" into different classes backed by a factory, avoiding many of these checks. Beyond removing the checks, it would be much cleaner to maintain and debug.

Having said that, I leave it up to you whether the distinction really makes sense.
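
For illustration only, the separation suggested above could look roughly like the sketch below. Every type name here (`ArrayExistsSqlBuilder`, `NestedArrayExistsSqlBuilder`, `FlatArrayExistsSqlBuilder`, `ArrayExistsSqlBuilderFactory`) is hypothetical and does not exist in the PR. The nested template is the one quoted in the diff context; the flat template, including the `text[]` element type, is an assumption made for the example.

```java
/** Hypothetical strategy interface; all names in this sketch are illustrative, not from the PR. */
interface ArrayExistsSqlBuilder {
  String buildExists(String parsedLhs, String alias, String parsedFilter);
}

/** Nested (JSONB document) collections: reuses the template quoted in the diff context. */
final class NestedArrayExistsSqlBuilder implements ArrayExistsSqlBuilder {
  @Override
  public String buildExists(String parsedLhs, String alias, String parsedFilter) {
    return String.format(
        "EXISTS (SELECT 1 FROM jsonb_array_elements(COALESCE(%s, '[]'::jsonb)) AS \"%s\" WHERE %s)",
        parsedLhs, alias, parsedFilter);
  }
}

/** Flat collections: assumes a native array column whose element type is text (an assumption). */
final class FlatArrayExistsSqlBuilder implements ArrayExistsSqlBuilder {
  @Override
  public String buildExists(String parsedLhs, String alias, String parsedFilter) {
    return String.format(
        "EXISTS (SELECT 1 FROM unnest(COALESCE(%s, ARRAY[]::text[])) AS \"%s\" WHERE %s)",
        parsedLhs, alias, parsedFilter);
  }
}

/** Picks the builder once, so callers no longer branch on isFlatCollection themselves. */
final class ArrayExistsSqlBuilderFactory {
  private ArrayExistsSqlBuilderFactory() {}

  static ArrayExistsSqlBuilder forCollection(boolean isFlatCollection) {
    return isFlatCollection ? new FlatArrayExistsSqlBuilder() : new NestedArrayExistsSqlBuilder();
  }
}
```

A caller would resolve the builder once, for example `ArrayExistsSqlBuilderFactory.forCollection(isFlatCollection).buildExists(parsedLhs, alias, parsedFilter)`, which keeps the flat and nested SQL shapes in separate classes.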

Contributor Author

Sure, this makes sense. Let me see how much of this can be refactored. Thanks!

Apologies if this comment is coming pretty late. The growing number of these conditions makes me think it would be better to separate the "FlatCollection Visitors" and "NestedCollection Visitors" into different classes backed by a factory, avoiding many of these checks. Beyond removing the checks, it would be much cleaner to maintain and debug.

Having said that, I leave it up to you whether the distinction really makes sense.

+1

Contributor Author

@puneet-traceable @suresh-prakash I did a bit of refactoring and the changes are substantial. How about we merge this PR and take that up in a separate one? To keep things cleaner.

Contributor

Sure. That works.

postgresQueryParser.getQuery().getFilter(), postgresQueryParser);

if (StringUtils.isNotEmpty(unnestFilters) && mainFilter.isPresent()) {
return Optional.of(unnestFilters + " AND " + mainFilter.get());
Contributor

Nit: Prefer StringBuilder, StringBuffer or String.format()
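
As a rough sketch of what this nit might look like in practice, the helper below uses `String.format()` for the case shown in the diff (both filters present). The class `FilterJoinSketch`, the method `combine`, and the fallback behaviour when only one filter is present are assumptions for the example, since the rest of the original method is not visible in this thread.

```java
import java.util.Optional;

/** Hypothetical helper, not part of the PR: combines the unnest filters with the main filter. */
final class FilterJoinSketch {
  private FilterJoinSketch() {}

  static Optional<String> combine(String unnestFilters, Optional<String> mainFilter) {
    boolean hasUnnest = unnestFilters != null && !unnestFilters.isEmpty();
    if (hasUnnest && mainFilter.isPresent()) {
      // The filters are passed as arguments, so any '%' inside them is not treated as a format specifier.
      return Optional.of(String.format("%s AND %s", unnestFilters, mainFilter.get()));
    }
    // Assumed fallback: return whichever filter exists, or empty if neither does.
    return hasUnnest ? Optional.of(unnestFilters) : mainFilter;
  }
}
```

With only two operands the readability gain over `+` is modest; the format string mainly makes the shape of the resulting SQL fragment visible at a glance.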

suddendust commented Oct 13, 2025

@suresh-prakash @puneet-traceable For queries like SELECT item, price FROM <implicit_collection> WHERE ANY(numbers) = 10, the parser tries to cast the array to the appropriate type (see SELECT 1 FROM unnest(COALESCE(%s, ARRAY[]%s) in PostgresFilterTypeExpressionVisitor). This type inference fails when:

  1. Filter is a LogicalExpression (AND/OR) rather than RelationalExpression.
  2. Filter compares against an IdentifierExpression instead of a ConstantExpression.

We need a better way to do the type cast, but I don't really see any other way. Your thoughts?

@suresh-prakash suresh-prakash merged commit 5c65ef6 into hypertrace:main Oct 15, 2025
6 checks passed
@suresh-prakash
Contributor

  1. Filter is a LogicalExpression (AND/OR) rather than RelationalExpression.

I guess we probably haven't seen such use cases yet. I wonder if it would make sense to break the logical expression into its component relational expressions before inferring the type for each composed filter. 🤔

  2. Filter compares against an IdentifierExpression instead of a ConstantExpression.

Are there any use cases for this?
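
One way the idea of breaking a logical expression into its relational leaves might be sketched is shown below. The types `FilterExpr`, `RelationalExpr`, and `LogicalExpr` are simplified, hypothetical stand-ins for the expression classes named in the thread, and the Java-to-Postgres type mapping is an assumption for the example. Leaves whose right-hand side is not a constant are skipped, so the identifier case stays unresolved here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, simplified stand-ins for the filter expression types discussed in the thread. */
interface FilterExpr {}

final class RelationalExpr implements FilterExpr {
  final String lhs;
  final String operator;
  final Object rhsConstant; // null models a non-constant RHS such as an identifier

  RelationalExpr(String lhs, String operator, Object rhsConstant) {
    this.lhs = lhs;
    this.operator = operator;
    this.rhsConstant = rhsConstant;
  }
}

final class LogicalExpr implements FilterExpr {
  final String operator; // "AND" or "OR"
  final List<FilterExpr> operands;

  LogicalExpr(String operator, FilterExpr... operands) {
    this.operator = operator;
    this.operands = Arrays.asList(operands);
  }
}

final class ArrayTypeInferenceSketch {
  private ArrayTypeInferenceSketch() {}

  /** Recursively collects the relational leaves of a possibly nested AND/OR tree. */
  static void collectLeaves(FilterExpr expr, List<RelationalExpr> out) {
    if (expr instanceof RelationalExpr) {
      out.add((RelationalExpr) expr);
    } else if (expr instanceof LogicalExpr) {
      for (FilterExpr operand : ((LogicalExpr) expr).operands) {
        collectLeaves(operand, out);
      }
    }
  }

  /** Assumed mapping from a Java constant to a Postgres array type. */
  static Optional<String> pgArrayTypeOf(Object constant) {
    if (constant instanceof Integer || constant instanceof Long) return Optional.of("bigint[]");
    if (constant instanceof Float || constant instanceof Double) return Optional.of("double precision[]");
    if (constant instanceof Boolean) return Optional.of("boolean[]");
    if (constant instanceof String) return Optional.of("text[]");
    return Optional.empty();
  }

  /** Returns a type only when every constant compared against the column maps to the same type. */
  static Optional<String> inferArrayType(FilterExpr filter, String column) {
    List<RelationalExpr> leaves = new ArrayList<>();
    collectLeaves(filter, leaves);
    Set<String> types = new LinkedHashSet<>();
    for (RelationalExpr leaf : leaves) {
      if (leaf.lhs.equals(column) && leaf.rhsConstant != null) {
        pgArrayTypeOf(leaf.rhsConstant).ifPresent(types::add);
      }
    }
    return types.size() == 1 ? Optional.of(types.iterator().next()) : Optional.empty();
  }
}
```

Returning an empty `Optional` when the leaves disagree or no constant is found leaves the caller free to fall back to a default cast or to reject the query, rather than guessing a type.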

3 participants