Skip to content

Conversation

@GurtejSohi
Copy link
Contributor

@GurtejSohi GurtejSohi commented Apr 1, 2025

Description

This PR has changes to add support for self-joins with sub-query (represented by SubQueryJoinExpression) in MongoDB.
Note that postgres impl of FromTypeExpressionVisitor for visiting SubQueryJoinExpression throws an UnsupportedOperationException. Currently, only the mongo implementation supports this operation.

Testing

Added an integration test for testing the mongo impl of self-join with sub-query. It also demonstrates its sample usage

@GurtejSohi GurtejSohi marked this pull request as ready for review April 9, 2025 21:15
@GurtejSohi GurtejSohi changed the title feat: add support for self-joins feat: add support for self-joins with sub-query Apr 9, 2025
{"date":"2014-03-01T09:00:00Z","item":"Mirror","quantity":1},
{"date":"2014-04-04T11:21:39.736Z","item":"Shampoo","quantity":20},
{"date":"2016-02-06T20:20:13Z","item":"Soap","quantity":5}
] No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Empty line at the end of this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

[
{"date":"2015-09-10T08:43:00Z","item":"Comb","quantity":10},
{"date":"2014-03-01T09:00:00Z","item":"Mirror","quantity":1},
{"date":"2014-04-04T11:21:39.736Z","item":"Shampoo","quantity":20},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we format this file as JSON?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

*/
@Value
public class AliasedIdentifierExpression extends IdentifierExpression {
String contextAlias;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just alias?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alias sounds like we are giving the field name as alias (like in SelectionSpec. If you look at the example above, this is not the alias for the field, but the alias for the context in which we are referring the field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.


@Override
public String toString() {
return String.format("JOIN (%s) AS %s ON %s", subQuery, subQueryAlias, joinCondition);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit:

Suggested change
return String.format("JOIN (%s) AS %s ON %s", subQuery, subQueryAlias, joinCondition);
return String.format("JOIN (%s) AS %s ON (%s)", subQuery, subQueryAlias, joinCondition);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

List.of(
query -> singleton(getFilterClause(query, Query::getFilter)),
MongoFromTypeExpressionParser::getFromClauses,
query -> new MongoFromTypeExpressionParser(this).getFromClauses(query),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any way to avoid passing down the this pointer? It seems to be going away from the standard way.

public BasicDBObject visit(LogicalExpression expression) {
BasicDBObject letClause = new BasicDBObject();
for (FilterTypeExpression operand : expression.getOperands()) {
letClause.putAll((Map<String, Object>) operand.accept(this));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We seem to be casting BasicDBObject into a Map. This could be confusing (probably compiler might be generating a warning).

Can we either convert everything to BasicDBObject or everything to Map?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed the return type to map


@Override
public BasicDBObject visit(ConstantExpression expression) {
return new BasicDBObject();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this deliberate?
If this is not expected here, why not extend the MongoSelectTypeExpressionParser?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, this is deliberate as this visitor is expected to go through SelectTypeExpressions and only return the fields that have to be declared in the let clause. So any other field other than AliasedIdentifierExpression has to be ignored but it should not throw an error in that case and for the case of AliasedIdentifierExpression it should return the field name that we need to add to the let clause.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. This may be a common use-case. So, can we create a common base class that returns default (empty map/BasicDBObject) for every implementation and then override it?

That way

  1. The default returning shared base-class can be re-used
  2. Implementation here would look concise

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

AliasedIdentifierExpression aliasedExpression, String subQueryAlias) {
BasicDBObject letClause = new BasicDBObject();
if (aliasedExpression.getContextAlias().equals(subQueryAlias)) {
letClause.put(aliasedExpression.getName(), "$" + aliasedExpression.getName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the AliasedExpression contains a nested field like "a.b".
Does it need escaping/encoding?

It would be beneficial to add an integration for the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The field name of the aliasedExpression is given by the user and if we can make an assumption that the user won't give a dot in the field name then we should be fine as we are only using the field name here and not the context alias.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused. Does it allow me to pick the value of "a.b" and alias it as "c"?

In other words, are you saying that the alias can't contain a dot, but the identifier name can?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad, I misunderstood your previous comment. I went through it again and it made sense this time. I've updated the implementation to encode the variable name so that it can handle nested fields also. I've also added an integration test for it.

* Subquery join expression is not supported by default. Override this method to support it.
*/
default <T> T visit(SubQueryJoinExpression subQueryJoinExpression) {
throw new UnsupportedOperationException("This operation is not supported");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically, we should force all the implementors of visitor interface to implement all the methods.

Whether to throw exception or not should be the implementor's responsibility.

We probably created the base class
MongoSelectTypeExpressionParser for a similar purpose (without having a default implementation in the interface).

The reason being, it may make sense to implement this in other implementations (other than MongoDB).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

<T> T visit(final IdentifierExpression expression);

default <T> T visit(final AliasedIdentifierExpression expression) {
throw new UnsupportedOperationException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here. Not a typical visitor flow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated it

*/
@Value
public class AliasedIdentifierExpression extends IdentifierExpression {
String contextAlias;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense.

return key.replace("\\u002e", FIELD_SEPARATOR).replace("\\u0024", PREFIX).replace("\\\\", "\\");
}

public static String encodeVariableName(final String variableName) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use the existing "encode" method in MongoUtils?

Copy link
Contributor Author

@GurtejSohi GurtejSohi Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Encode method in MongoUtils also adds backslashes. Backslashes are allowed in keys but not in variable names, as mentioned here - https://www.mongodb.com/docs/manual/reference/aggregation-variables/.
So I had to introduce a new method to encode variable names.

Comment on lines +57 to +67
public Map<String, Object> visit(KeyExpression expression) {
return Collections.emptyMap();
}

@Override
public Map<String, Object> visit(ArrayRelationalFilterExpression expression) {
return Collections.emptyMap();
}

@Override
public Map<String, Object> visit(DocumentArrayFilterExpression expression) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to override these, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that it might be confusing because I put the inner class in between the outer class methods.
Did you think that these are methods of the inner class? If so, please note that these are outer class methods and since it implements filter type expression it has to implement these methods as well.

To avoid confusion, I've moved the inner class below the outer class methods

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah. That was the confusion. Makes sense to keep the inner classes towards the end.

@codecov
Copy link

codecov bot commented Apr 11, 2025

Codecov Report

Attention: Patch coverage is 79.41176% with 42 lines in your changes missing coverage. Please review.

Project coverage is 79.61%. Comparing base (577aaac) to head (f7a0118).
Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...mongo/query/MongoAggregationPipelineConverter.java 87.87% 1 Missing and 7 partials ⚠️
...e/expression/impl/AliasedIdentifierExpression.java 66.66% 2 Missing and 4 partials ⚠️
...go/query/parser/MongoEmptySelectionTypeParser.java 28.57% 5 Missing ⚠️
...tore/mongo/query/parser/MongoLetClauseBuilder.java 81.81% 3 Missing and 1 partial ⚠️
...tstore/expression/impl/SubQueryJoinExpression.java 75.00% 2 Missing ⚠️
...ypertrace/core/documentstore/mongo/MongoUtils.java 33.33% 1 Missing and 1 partial ⚠️
...transformer/PostgresSelectionQueryTransformer.java 0.00% 2 Missing ⚠️
...v1/transformer/PostgresUnnestQueryTransformer.java 0.00% 2 Missing ⚠️
.../documentstore/mongo/query/MongoQueryExecutor.java 88.88% 1 Missing ⚠️
.../documentstore/mongo/query/parser/AliasParser.java 0.00% 1 Missing ⚠️
... and 9 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #220      +/-   ##
============================================
- Coverage     79.73%   79.61%   -0.12%     
- Complexity     1043     1048       +5     
============================================
  Files           199      204       +5     
  Lines          4945     5083     +138     
  Branches        410      416       +6     
============================================
+ Hits           3943     4047     +104     
- Misses          711      740      +29     
- Partials        291      296       +5     
Flag Coverage Δ
integration 79.61% <79.41%> (-0.12%) ⬇️
unit 56.32% <33.33%> (-1.35%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@github-actions
Copy link

Test Results

 40 files  ±0   40 suites  ±0   31s ⏱️ -1s
246 tests ±0  246 ✅ ±0  0 💤 ±0  0 ❌ ±0 
502 runs  +2  502 ✅ +2  0 💤 ±0  0 ❌ ±0 

Results for commit f7a0118. ± Comparison against base commit 577aaac.

@GurtejSohi GurtejSohi merged commit 015ee5c into main Apr 11, 2025
5 of 7 checks passed
@GurtejSohi GurtejSohi deleted the add-support-for-self-joins branch April 11, 2025 07:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants