-
Notifications
You must be signed in to change notification settings - Fork 11
Refactor Postgres Field Transformers and Collections #236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #236 +/- ##
============================================
+ Coverage 79.35% 79.69% +0.33%
- Complexity 1078 1097 +19
============================================
Files 208 212 +4
Lines 5276 5328 +52
Branches 453 450 -3
============================================
+ Hits 4187 4246 +59
+ Misses 770 762 -8
- Partials 319 320 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@suddendust Please ensure that the test coverage is there. |
| flatStructureCollection != null | ||
| && flatStructureCollection.equals(context.getTableIdentifier().getTableName()); | ||
|
|
||
| context.getPgColTransformer() instanceof FlatPostgresFieldTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how is the context getting set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I'd prefer avoiding "instanceof" checks as much as possible (except the object is an instance of the Object class itself). Would it be possible to do it from the enum?
"instanceof" seems to violate the Liskov's Substitution Principle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, the same can be achieve using overriding if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@puneet-traceable The context is set by the callers. For example, check PostgresFilterTypeExpressionVisitor#visit.
| import org.hypertrace.core.documentstore.postgres.query.v1.parser.filter.nonjson.field.PostgresInRelationalFilterParserNonJsonField; | ||
| import org.hypertrace.core.documentstore.postgres.query.v1.transformer.FlatPostgresFieldTransformer; | ||
|
|
||
| public class PostgresRelationalFilterParserFactoryImpl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we create a similar factory for Flat collection? and use that in flat collection flow?
| } | ||
|
|
||
| private PostgresQueryParser createParser(Query query) { | ||
| return new PostgresQueryParser(tableIdentifier, PostgresQueryExecutor.transformAndLog(query)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to create thos object every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, parser maintains state on a per-query level. This is how it is today as well.
| * compatibility) | ||
| * @return the corresponding collection impl | ||
| */ | ||
| default Collection getCollectionForType(String collectionName, DocumentType documentType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add comment on other methods as well.
for eg: what would be the behavior of getCollection(), createCollection()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see a corresponding implementation change in MongoDatastore.
If we upgrade document-store in entity-service, is the expectation that we would getCollection for mongodb collection and getCollectionForType for postgres collection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
How is this supposed to be used by the clients/consumers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.is the expectation that we would getCollection for mongodb collection and getCollectionForType for postgres collection?
That's the expectation right now, yes. Now that I think of it, a better approach would be to perhaps, for Mongo, to return an instance of MongoCollection for type NESTED and an IllegalArgumentException otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash This API is intended to be primarily used for SQL datastores like PG in which we can store both flat and nested documents. For Mongo, since we only support nested, it'll throw an IllegalArgumentException for FLAT doc types (as per this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, I am a bit confused here still. The consumers are the ones who are choosing which of the Postgres FLAT/NESTED to use. So, why would the clients again ask the library which one is being used. I mean, the clients would already have this information with them, right?
|
|
||
| // Verify total count matches expected (each document contributes its tag count) | ||
| int totalTags = tagCounts.values().stream().mapToInt(Integer::intValue).sum(); | ||
| assertTrue(totalTags > 0, "Total tag count should be greater than 0"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Perhaps a personal coding style. Messages are generally helpful if it can't be easily inferred from the assertion failure. In all these places, I feel the test failure is good enough to capture the message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@suresh-prakash I personally prefer having clear messages, but would rather stick to what norms we follow today. What do you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, not a big deal either ways. I just found it redundant since we can easily infer the failure from the line numbers.
| * compatibility) | ||
| * @return the corresponding collection impl | ||
| */ | ||
| default Collection getCollectionForType(String collectionName, DocumentType documentType) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
How is this supposed to be used by the clients/consumers?
| if (fromClause.isPresent()) { | ||
| startIndexOfSelection = fromClause.get().length(); | ||
| sqlBuilder.append(fromClause.get()); | ||
| // Update transformer with unnest column mappings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a TODO comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment isn't relevant anymore, removed.
| flatStructureCollection != null | ||
| && flatStructureCollection.equals(context.getTableIdentifier().getTableName()); | ||
|
|
||
| context.getPgColTransformer() instanceof FlatPostgresFieldTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I'd prefer avoiding "instanceof" checks as much as possible (except the object is an instance of the Object class itself). Would it be possible to do it from the enum?
"instanceof" seems to violate the Liskov's Substitution Principle.
| boolean isFirstClassField = | ||
| flatStructureCollection != null | ||
| && flatStructureCollection.equals(context.getTableIdentifier().getTableName()); | ||
| boolean isFirstClassField = context.getPgColTransformer() instanceof FlatPostgresCollection; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha! Added a getDocumentType to PostgresPGTransformer to determine what kind of documents that transformer is handling (flat vs nested).
| flatStructureCollection != null | ||
| && flatStructureCollection.equals(context.getTableIdentifier().getTableName()); | ||
|
|
||
| context.getPgColTransformer() instanceof FlatPostgresFieldTransformer; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, the same can be achieve using overriding if possible.
|
|
||
| @ParameterizedTest | ||
| @ArgumentsSource(PostgresProvider.class) | ||
| void testFlatPostgresCollectionFindAll(String dataStoreName) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If all these tests are specific to Postgres, worth nesting it under a nested class with @Nested annotation. That way, all related tests would be grouped together.
Also, it would be better to move all the tests (classes and methods) to the beginning of the class (before all the private methods).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion! I have moved all flat PG tests to the nested class FlatPostgresCollectionTest.
|
@suresh-prakash @puneet-traceable Requesting a re-review, thanks! |
Overview
This pull request refactors the Postgres field transformers to improve modularity and maintainability. It also introduces a clearer distinction between two different kinds of Postgres collections (
FlatvsNested), each with its own field transformation strategies.Key Changes
Currently,
FieldToPGTransformerhandles transformation both for flat and nested PG collections in a single block of code. This responsibility is now divided b/w two classes:FlatPostgresFieldTransformer: Handles flat PG collections. Fields names are simply returned as-is.NestedPostgresColTransformer: Handles the case in which we store the entire doc inside thedocumentcolumn. Field transformations look likedocument -> attributes ->....Distinct Postgres Collection Types
In line with the similar reasoning as above, we now have two Postgres collections,
FlatPostgresCollectionandNestedPostgresCollection, each handling their own query methods.Changes to PostgresQueryParser:
PostgresQueryParsernow accepts the transformer directly in its constructor. If no transformer is supplied, it falls back to theNestedPostgresColTransformer.API Changes
We now have a new
Datastore#getCollectionForTypemethod that can be used to return the PG collection of a particular type.Checklist