Investigate using approx_top_count for other high-cardinality queries

## Background

In PR #116 (lambda enhancements), we solved an OOM issue on high-cardinality breakdown refinement queries by using ClickHouse's `approx_top_count` function. The technique uses bounded memory to find top-N candidates, then computes exact counts via an `IN` filter, with a `UNION ALL` to get full-dataset totals.

The query pattern:

```sql
WITH top_dims AS (
  SELECT tupleElement(pair, 1) AS dim
  FROM (
    SELECT arrayJoin(approx_top_count(N)(col)) AS pair
    FROM table WHERE ...
  )
)
SELECT col AS dim, count() AS cnt, ...
FROM table WHERE ... AND col IN (SELECT dim FROM top_dims)
GROUP BY dim ORDER BY cnt DESC LIMIT N
UNION ALL
SELECT '' AS dim, count() AS cnt, ...
FROM table WHERE ...
```

This ran in ~4s on 244M rows with 83M unique `message` values, vs the regular `GROUP BY` which OOMed at the 4GB memory limit.

## Investigation Areas

- **Main CDN dashboard high-cardinality breakdowns**: The main dashboard has high-cardinality facets (hosts, URLs, user agents, client IPs, referers, forwarded hosts, redirect locations) that currently use sampling for both initial and refinement passes. Could `approx_top_count` replace the refinement sampling for these too, giving exact counts without OOM risk?

- **Facet search**: `facet-search-initial.sql` and `facet-search-pattern.sql` do `GROUP BY` queries that could benefit from bounded-memory top-N for very high-cardinality columns.

- **Log investigation queries**: `investigate-facet.sql` does `GROUP BY` on arbitrary facets — high-cardinality ones could benefit.

- **Preview breakdowns**: Time range selection preview queries currently use sampling. Could `approx_top_count` provide better accuracy for short preview windows?

## Current Implementation

- SQL template: `sql/queries/breakdown-approx-top.sql`
- Routing logic: `isApproxTopRefinement()` in `js/breakdowns/index.js`
- Totals extraction: `extractApproxTopTotals()` in `js/breakdowns/index.js`
- Tests: `js/breakdowns/approx-top.test.js`

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Investigate using approx_top_count for other high-cardinality queries #127

Background

Investigation Areas

Current Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Investigate using approx_top_count for other high-cardinality queries #127

Description

Background

Investigation Areas

Current Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions