Skip to content

Conversation

@suddendust
Copy link
Contributor

@suddendust suddendust commented Sep 24, 2025

Overview

This pull request refactors the Postgres field transformers to improve modularity and maintainability. It also introduces a clearer distinction between two different kinds of Postgres collections (Flat vs Nested), each with its own field transformation strategies.

Key Changes

  1. Refactored Transformer Logic

Currently, FieldToPGTransformer handles transformation both for flat and nested PG collections in a single block of code. This responsibility is now divided b/w two classes:

  • FlatPostgresFieldTransformer: Handles flat PG collections. Fields names are simply returned as-is.
  • NestedPostgresColTransformer: Handles the case in which we store the entire doc inside the document column. Field transformations look like document -> attributes ->....
  1. Distinct Postgres Collection Types
    In line with the similar reasoning as above, we now have two Postgres collections, FlatPostgresCollection and NestedPostgresCollection, each handling their own query methods.

  2. Changes to PostgresQueryParser: PostgresQueryParser now accepts the transformer directly in its constructor. If no transformer is supplied, it falls back to the NestedPostgresColTransformer.

API Changes

We now have a new Datastore#getCollectionForType method that can be used to return the PG collection of a particular type.

Checklist

  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • Any dependent changes have been merged and published in downstream modules

@codecov
Copy link

codecov bot commented Sep 24, 2025

Codecov Report

❌ Patch coverage is 83.08824% with 23 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.69%. Comparing base (3cc020b) to head (4f5e40d).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...ore/documentstore/postgres/PostgresCollection.java 77.55% 8 Missing and 3 partials ⚠️
...trace/core/documentstore/mongo/MongoDatastore.java 0.00% 3 Missing ⚠️
...documentstore/postgres/FlatPostgresCollection.java 81.81% 2 Missing ⚠️
...core/documentstore/postgres/PostgresDatastore.java 60.00% 1 Missing and 1 partial ⚠️
...y/v1/transformer/NestedPostgresColTransformer.java 91.66% 1 Missing and 1 partial ⚠️
...a/org/hypertrace/core/documentstore/Datastore.java 0.00% 1 Missing ⚠️
...ter/PostgresNotContainsRelationalFilterParser.java 0.00% 0 Missing and 1 partial ⚠️
...er/filter/PostgresNotInRelationalFilterParser.java 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #236      +/-   ##
============================================
+ Coverage     79.35%   79.69%   +0.33%     
- Complexity     1078     1097      +19     
============================================
  Files           208      212       +4     
  Lines          5276     5328      +52     
  Branches        453      450       -3     
============================================
+ Hits           4187     4246      +59     
+ Misses          770      762       -8     
- Partials        319      320       +1     
Flag Coverage Δ
integration 79.69% <83.08%> (+0.33%) ⬆️
unit 57.45% <59.55%> (+0.57%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@suddendust suddendust changed the title [Draft] Refactor Postgres Field Transformers Refactor Postgres Field Transformers and Collections Sep 24, 2025
@puneet-traceable
Copy link

@suddendust Please ensure that the test coverage is there.

flatStructureCollection != null
&& flatStructureCollection.equals(context.getTableIdentifier().getTableName());

context.getPgColTransformer() instanceof FlatPostgresFieldTransformer;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how is the context getting set?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'd prefer avoiding "instanceof" checks as much as possible (except the object is an instance of the Object class itself). Would it be possible to do it from the enum?
"instanceof" seems to violate the Liskov's Substitution Principle.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, the same can be achieve using overriding if possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@puneet-traceable The context is set by the callers. For example, check PostgresFilterTypeExpressionVisitor#visit.

import org.hypertrace.core.documentstore.postgres.query.v1.parser.filter.nonjson.field.PostgresInRelationalFilterParserNonJsonField;
import org.hypertrace.core.documentstore.postgres.query.v1.transformer.FlatPostgresFieldTransformer;

public class PostgresRelationalFilterParserFactoryImpl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we create a similar factory for Flat collection? and use that in flat collection flow?

}

private PostgresQueryParser createParser(Query query) {
return new PostgresQueryParser(tableIdentifier, PostgresQueryExecutor.transformAndLog(query));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to create thos object every time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, parser maintains state on a per-query level. This is how it is today as well.

* compatibility)
* @return the corresponding collection impl
*/
default Collection getCollectionForType(String collectionName, DocumentType documentType) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please add comment on other methods as well.
for eg: what would be the behavior of getCollection(), createCollection()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a corresponding implementation change in MongoDatastore.
If we upgrade document-store in entity-service, is the expectation that we would getCollection for mongodb collection and getCollectionForType for postgres collection?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
How is this supposed to be used by the clients/consumers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.is the expectation that we would getCollection for mongodb collection and getCollectionForType for postgres collection?

That's the expectation right now, yes. Now that I think of it, a better approach would be to perhaps, for Mongo, to return an instance of MongoCollection for type NESTED and an IllegalArgumentException otherwise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suresh-prakash This API is intended to be primarily used for SQL datastores like PG in which we can store both flat and nested documents. For Mongo, since we only support nested, it'll throw an IllegalArgumentException for FLAT doc types (as per this PR).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I am a bit confused here still. The consumers are the ones who are choosing which of the Postgres FLAT/NESTED to use. So, why would the clients again ask the library which one is being used. I mean, the clients would already have this information with them, right?


// Verify total count matches expected (each document contributes its tag count)
int totalTags = tagCounts.values().stream().mapToInt(Integer::intValue).sum();
assertTrue(totalTags > 0, "Total tag count should be greater than 0");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Perhaps a personal coding style. Messages are generally helpful if it can't be easily inferred from the assertion failure. In all these places, I feel the test failure is good enough to capture the message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@suresh-prakash I personally prefer having clear messages, but would rather stick to what norms we follow today. What do you suggest?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, not a big deal either ways. I just found it redundant since we can easily infer the failure from the line numbers.

* compatibility)
* @return the corresponding collection impl
*/
default Collection getCollectionForType(String collectionName, DocumentType documentType) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
How is this supposed to be used by the clients/consumers?

if (fromClause.isPresent()) {
startIndexOfSelection = fromClause.get().length();
sqlBuilder.append(fromClause.get());
// Update transformer with unnest column mappings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be a TODO comment?

Copy link
Contributor Author

@suddendust suddendust Oct 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment isn't relevant anymore, removed.

flatStructureCollection != null
&& flatStructureCollection.equals(context.getTableIdentifier().getTableName());

context.getPgColTransformer() instanceof FlatPostgresFieldTransformer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I'd prefer avoiding "instanceof" checks as much as possible (except the object is an instance of the Object class itself). Would it be possible to do it from the enum?
"instanceof" seems to violate the Liskov's Substitution Principle.

boolean isFirstClassField =
flatStructureCollection != null
&& flatStructureCollection.equals(context.getTableIdentifier().getTableName());
boolean isFirstClassField = context.getPgColTransformer() instanceof FlatPostgresCollection;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! Added a getDocumentType to PostgresPGTransformer to determine what kind of documents that transformer is handling (flat vs nested).

flatStructureCollection != null
&& flatStructureCollection.equals(context.getTableIdentifier().getTableName());

context.getPgColTransformer() instanceof FlatPostgresFieldTransformer;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, the same can be achieve using overriding if possible.


@ParameterizedTest
@ArgumentsSource(PostgresProvider.class)
void testFlatPostgresCollectionFindAll(String dataStoreName) throws IOException {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all these tests are specific to Postgres, worth nesting it under a nested class with @Nested annotation. That way, all related tests would be grouped together.

Also, it would be better to move all the tests (classes and methods) to the beginning of the class (before all the private methods).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion! I have moved all flat PG tests to the nested class FlatPostgresCollectionTest.

@suddendust
Copy link
Contributor Author

@suresh-prakash @puneet-traceable Requesting a re-review, thanks!

@suresh-prakash suresh-prakash merged commit f0e2bef into hypertrace:main Oct 6, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants