Skip to content

Missing Use Case: DISTINCT / DISTINCT ON #425

@pikaju

Description

@pikaju

Overview

DISTINCT eliminates duplicate rows from query results. PostgreSQL extends this with DISTINCT ON (columns), which returns one row per unique combination of the specified columns — a concise alternative to window functions for "first per group" queries. Toasty has no support for either form.

Use Cases

1. Unique Values for Filter Dropdowns

Populating a UI dropdown with the set of distinct values present in a column.

SELECT DISTINCT country FROM users ORDER BY country;
SELECT DISTINCT category FROM products WHERE active = true;

Without DISTINCT, the application must fetch all rows and deduplicate in memory — wasteful when a table has millions of rows but only dozens of distinct values.

2. Distinct Tags / Labels Across Records

Collecting unique tags from a join table or array column.

SELECT DISTINCT tag
FROM post_tags
WHERE post_id IN (SELECT id FROM posts WHERE author_id = 5)
ORDER BY tag;

3. First Row Per Group with DISTINCT ON (PostgreSQL)

DISTINCT ON keeps only the first row for each distinct value of the specified columns, according to the ORDER BY.

-- Most recent order per customer
SELECT DISTINCT ON (customer_id)
    customer_id, id AS order_id, total, created_at
FROM orders
ORDER BY customer_id, created_at DESC;

-- Latest sensor reading per device
SELECT DISTINCT ON (device_id)
    device_id, value, recorded_at
FROM readings
ORDER BY device_id, recorded_at DESC;

This is equivalent to a ROW_NUMBER() ... WHERE rn = 1 window function query but far more concise.

4. Counting Distinct Values

COUNT(DISTINCT ...) is a common aggregation pattern (this also ties into the aggregation gap).

-- How many unique customers ordered this month
SELECT COUNT(DISTINCT customer_id)
FROM orders
WHERE created_at >= DATE_TRUNC('month', NOW());

Current Limitation in Toasty

  • No DISTINCT modifier on select statements in the AST.
  • No DISTINCT ON support.
  • No COUNT(DISTINCT ...) expression.
  • The query builder always returns all matching rows with no deduplication option.

Possible Directions

  • Add a .distinct() modifier on query builders that sets a flag on the generated SELECT.
  • For PostgreSQL, support .distinct_on(field) that emits DISTINCT ON (column).
  • Allow .count_distinct(field) as a scalar aggregation method.
  • For DynamoDB, DISTINCT has no server-side equivalent — the engine would need to deduplicate client-side, or reject with a clear error for large result sets.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions