-
Notifications
You must be signed in to change notification settings - Fork 67
Description
Overview
DISTINCT eliminates duplicate rows from query results. PostgreSQL extends this with DISTINCT ON (columns), which returns one row per unique combination of the specified columns — a concise alternative to window functions for "first per group" queries. Toasty has no support for either form.
Use Cases
1. Unique Values for Filter Dropdowns
Populating a UI dropdown with the set of distinct values present in a column.
SELECT DISTINCT country FROM users ORDER BY country;
SELECT DISTINCT category FROM products WHERE active = true;Without DISTINCT, the application must fetch all rows and deduplicate in memory — wasteful when a table has millions of rows but only dozens of distinct values.
2. Distinct Tags / Labels Across Records
Collecting unique tags from a join table or array column.
SELECT DISTINCT tag
FROM post_tags
WHERE post_id IN (SELECT id FROM posts WHERE author_id = 5)
ORDER BY tag;3. First Row Per Group with DISTINCT ON (PostgreSQL)
DISTINCT ON keeps only the first row for each distinct value of the specified columns, according to the ORDER BY.
-- Most recent order per customer
SELECT DISTINCT ON (customer_id)
customer_id, id AS order_id, total, created_at
FROM orders
ORDER BY customer_id, created_at DESC;
-- Latest sensor reading per device
SELECT DISTINCT ON (device_id)
device_id, value, recorded_at
FROM readings
ORDER BY device_id, recorded_at DESC;This is equivalent to a ROW_NUMBER() ... WHERE rn = 1 window function query but far more concise.
4. Counting Distinct Values
COUNT(DISTINCT ...) is a common aggregation pattern (this also ties into the aggregation gap).
-- How many unique customers ordered this month
SELECT COUNT(DISTINCT customer_id)
FROM orders
WHERE created_at >= DATE_TRUNC('month', NOW());Current Limitation in Toasty
- No
DISTINCTmodifier on select statements in the AST. - No
DISTINCT ONsupport. - No
COUNT(DISTINCT ...)expression. - The query builder always returns all matching rows with no deduplication option.
Possible Directions
- Add a
.distinct()modifier on query builders that sets a flag on the generatedSELECT. - For PostgreSQL, support
.distinct_on(field)that emitsDISTINCT ON (column). - Allow
.count_distinct(field)as a scalar aggregation method. - For DynamoDB,
DISTINCThas no server-side equivalent — the engine would need to deduplicate client-side, or reject with a clear error for large result sets.