Skip to content

feat: add countdistinct() aggregate function #4957

Draft
datavaultbuilder wants to merge 4 commits into
PostgREST:mainfrom
datavaultbuilder:feat/countdistinct-aggregate
Draft

feat: add countdistinct() aggregate function #4957
datavaultbuilder wants to merge 4 commits into
PostgREST:mainfrom
datavaultbuilder:feat/countdistinct-aggregate

Conversation

@datavaultbuilder

Copy link
Copy Markdown

Use Case: If I want to know the customers from the order table I need to do a count distinct

Solution: adding new count(distinct field) as countdistinct() and not as count(distinct). I took decision not to need to import params in the aggregates. But if you think count(distinct) would have been better: it would be about 10-15 lines more code and possible. But as I'm not familiar with the code the change looked more risky with count(distinct) as I couldn't foresee if I could potentially create some performance or memory issue.

Tested to work, that is is gated if aggs are off. tested against collisions with tables and columns named countdistincts.

Returns based on postgres rules name count(). (i.e. not countdistinct)

Added tests for the different cases if the amounts are correct.

Docs Updated.

This is my first PR so I'm open for feedback.

added count distinct as countdistinct agg. tested to work, that is is gated if aggs are off. tested against collisions with tables and columns named countdistincts. Returns based on postgres rules name count().
@steve-chavez

Copy link
Copy Markdown
Member

Motivation on #915 (comment). Please check if you've followed https://github.com/PostgREST/postgrest/blob/main/CONTRIBUTING.md#ai-policy.

Added count(distinct) as
  countdistinct aggregate. Tested to work, and that it is gated when aggregates are off. Tested against collisions with
  tables and columns named countdistinct. Returns type based on Postgres rules for count(). Extended test coverage to paths including spread path.
Added count(distinct) as countdistinct aggregate. Tested to work, and that it is gated when aggregates are off. Tested against collisions with tables and columns named countdistinct. Returns type based on Postgres rules for count().
@datavaultbuilder datavaultbuilder force-pushed the feat/countdistinct-aggregate branch from 233a4f5 to 3142d4c Compare May 27, 2026 17:58
@wolfgangwalther

Copy link
Copy Markdown
Member

adding new count(distinct field) as countdistinct()

So, the reason we are limited to the few aggregates as avg, min, max, sum and count right now is, I believe, not that we'd never want to allow others - but more that we haven't found a good way to solve some of the challenges around these, yet.

In the general case, the distinct modifier is useful for other aggregates than count, too. With that in mind, I think making this part of the aggregate "name" is the wrong way to do it.

I'm not sure whether it would work to implement this as a "modifier" as count.distinct or count().distinct or so?

@datavaultbuilder

Copy link
Copy Markdown
Author

Hi Wolfgang

Thank you for your feedback.

I'm really open what you think is usefull to rewrite the code. That is fast done. Just tell me what fits better in the general architecture.
Even I have never found a use case for sum distinct and avg distinct in some database they were implemented - including Postgres. So could be that somebody is interested in future to implement them but I would do that only if somebody explains what it is used in the real world for so I can add as well a meaningful test case.

Just a gut fealing about: count.distinct or count().distinct => they would be still different with the sql count(distinct ). For the first one would it be: count.distinct() ? For the second: count().distinct => just personal preference: we would then have 2 cases: one with modifier and one without - not impossible but would opt probably for count.distinct().

Just tell me what the prefered way is and I update my PR.

Regards
Petr

@steve-chavez

Copy link
Copy Markdown
Member

I'm not sure whether it would work to implement this as a "modifier" as count.distinct or count().distinct or so?
but would opt probably for count.distinct().

We already have a syntax for operators modifiers which unfortunately does not seem usable for aggregates modifiers. IMO count.distinct() seems like the better of choice of the above but that would be inconsistent.

Another idea could be using count~ for count distinct, we're still not using the url safe tilde ~. Before I thought about using it as a shorthand for negated operators (~eq instead of not.eq) but maybe it can fit for this case too?

@wolfgangwalther

Copy link
Copy Markdown
Member

Another idea could be using count~ for count distinct, we're still not using the url safe tilde ~. Before I thought about using it as a shorthand for negated operators (~eq instead of not.eq) but maybe it can fit for this case too?

I'm opposed to that. A tilde is really just something, but imho not semantically connected to the distinct operation. It will be hard to understand - and additionally it's limiting, in case we'll want to support other modifiers.

We already have a syntax for operators modifiers which unfortunately does not seem usable for aggregates modifiers. IMO count.distinct() seems like the better of choice of the above but that would be inconsistent.

I think a better analogy for the distinct operation is actually !inner. Here's why: We have long thought about aggregated function calls and resource embeddings as a very similar thing. We made a huge step forward with computed relationships. (note: embeddings, computed or not, always have an aggregation step as well...).

What !inner does is bad - because it also modifies the top-level rows. We had discussed elsewhere that we should have ideally done it differently.

But could we make count!distinct() work?

@steve-chavez

Copy link
Copy Markdown
Member

But could we make count!distinct() work?

Agree with count!distinct(), now it's consistent.

What !inner does is bad - because it also modifies the top-level rows. We had discussed elsewhere that we should have ideally done it differently.

The idea was to replace !inner with null filtering on embeds, but IIRC there were some incompatibilities.

@wolfgangwalther

Copy link
Copy Markdown
Member

Marking as draft until the feedback has been addressed and/or it has been confirmed that the count!distinct() syntax will actually work.

@wolfgangwalther wolfgangwalther marked this pull request as draft June 26, 2026 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants