Commit 6e28280
perf(eap): Add arrayExists filter for attribute name queries (#7832)
Add arrayExists conditions to the WHERE clause when filtering trace item
attributes by substring match. This allows ClickHouse to filter rows in
the PREWHERE stage before loading large attribute arrays, significantly
reducing memory usage and preventing OOM issues.
## Background
The `endpoint_trace_item_attribute_names` endpoint currently filters
attributes using `arrayFilter` in the SELECT clause. This approach
causes ClickHouse to load and process all rows (including large
attribute arrays) before filtering, which leads to OOM issues when many
rows don't contain matching attributes.
## Changes
This PR adds a helper function `_add_substring_match_optimization()`
that adds `arrayExists` conditions to the WHERE clause based on the
attribute type:
- **TYPE_STRING** → `arrayExists(x -> like(x, pattern), attributes_string)`
- **TYPE_FLOAT/DOUBLE/INT** → `arrayExists(x -> like(x, pattern), attributes_float)`
- **TYPE_BOOLEAN** → `arrayExists(x -> like(x, pattern), attributes_bool)`
- **TYPE_UNSPECIFIED** → OR condition checking all three arrays
The optimization is applied after the existing `hasAll` bloom filter
optimization and before the `item_type` check, allowing both filters to
work together.
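The type-to-condition mapping above can be sketched in plain Python. This is only an illustration that builds SQL strings: the `AttrType` enum, the `substring_match_condition` name, and the string-based output are assumptions made for this sketch, not the actual `_add_substring_match_optimization()` code, which presumably builds query expression objects instead.

```python
from enum import Enum
from typing import Optional


class AttrType(Enum):
    """Hypothetical stand-in for the attribute type enum used by the endpoint."""

    TYPE_UNSPECIFIED = 0
    TYPE_STRING = 1
    TYPE_BOOLEAN = 2
    TYPE_INT = 3
    TYPE_FLOAT = 4
    TYPE_DOUBLE = 5


# Array names as written in the commit message.
COLUMN_FOR_TYPE = {
    AttrType.TYPE_STRING: "attributes_string",
    AttrType.TYPE_INT: "attributes_float",
    AttrType.TYPE_FLOAT: "attributes_float",
    AttrType.TYPE_DOUBLE: "attributes_float",
    AttrType.TYPE_BOOLEAN: "attributes_bool",
}
ALL_COLUMNS = ("attributes_string", "attributes_float", "attributes_bool")


def substring_match_condition(
    attr_type: AttrType, value_substring_match: str
) -> Optional[str]:
    """Return an arrayExists condition as SQL text, or None if no filter applies."""
    # Guard clause: an empty substring adds no condition at all.
    if not value_substring_match:
        return None

    # Escape backslashes and single quotes for the string literal.
    # LIKE wildcards (% and _) in the input are left untouched in this sketch.
    escaped = value_substring_match.replace("\\", "\\\\").replace("'", "\\'")
    pattern = f"'%{escaped}%'"

    def exists(column: str) -> str:
        return f"arrayExists(x -> like(x, {pattern}), {column})"

    # TYPE_UNSPECIFIED checks all three arrays, joined with OR.
    if attr_type is AttrType.TYPE_UNSPECIFIED:
        return "(" + " OR ".join(exists(c) for c in ALL_COLUMNS) + ")"
    return exists(COLUMN_FOR_TYPE[attr_type])
```

For example, `substring_match_condition(AttrType.TYPE_STRING, "span.")` returns `arrayExists(x -> like(x, '%span.%'), attributes_string)`, which is the condition shown in the "After" query.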
## How It Works
**Before:**
```sql
SELECT DISTINCT(arrayJoin(arrayFilter(attr -> ... AND like(attr.2, '%span.%'), ...)))
FROM table WHERE project_id = 1 AND date >= ...
-- ❌ Loads ALL rows and their full attribute arrays, then filters in SELECT
```
**After:**
```sql
SELECT DISTINCT(arrayJoin(arrayFilter(attr -> ... AND like(attr.2, '%span.%'), ...)))
FROM table
WHERE project_id = 1 AND date >= ...
AND arrayExists(x -> like(x, '%span.%'), attributes_string) -- ✅ NEW!
-- ✅ ClickHouse filters in PREWHERE before loading arrays
```
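The two queries should return the same attribute names, because any row rejected by the new `arrayExists` condition would have produced an empty `arrayFilter` result anyway. A small Python simulation can illustrate that equivalence. It assumes rows are plain dictionaries with an `attributes_string` list of names and models `like(x, '%…%')` as a substring test; the function names are hypothetical, and the sketch does not model ClickHouse's memory behavior.

```python
from typing import Dict, List, Set

Row = Dict[str, List[str]]


def matching_names(row: Row, substring: str) -> List[str]:
    """Model of arrayFilter: keep only names containing the substring."""
    return [name for name in row["attributes_string"] if substring in name]


def names_without_prefilter(rows: List[Row], substring: str) -> Set[str]:
    """'Before' query: every row is processed, even if nothing matches."""
    result: Set[str] = set()
    for row in rows:
        result.update(matching_names(row, substring))
    return result


def names_with_prefilter(rows: List[Row], substring: str) -> Set[str]:
    """'After' query: rows failing the arrayExists-style check are skipped."""
    result: Set[str] = set()
    for row in rows:
        # Model of arrayExists: a cheap existence test before any extraction.
        if not any(substring in name for name in row["attributes_string"]):
            continue
        result.update(matching_names(row, substring))
    return result


rows = [
    {"attributes_string": ["span.op", "http.method"]},
    {"attributes_string": ["db.system"]},  # skipped by the prefilter
    {"attributes_string": ["span.duration", "span.op"]},
]
assert names_without_prefilter(rows, "span.") == names_with_prefilter(rows, "span.")
```

In the simulation the second row contributes nothing either way; the prefilter only changes how much work is done on it, which mirrors the intent of the added WHERE condition.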
## Performance Impact
- Reduces memory usage by filtering rows before loading large attribute
arrays
- Leverages ClickHouse's automatic PREWHERE optimization
- Only processes rows that actually contain matching attributes
- Follows the same pattern as the existing `hasAll` optimization (lines
177-184)
- No impact when `value_substring_match` is empty (a guard clause skips
the optimization)
## Testing
All existing tests in
`tests/web/rpc/v1/test_endpoint_trace_item_attribute_names.py` pass
without modification, validating backward compatibility for:
- String, float, double, and boolean attribute types
- Substring matching across all types
- Empty results handling
- Co-occurring attributes with intersection filters
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

1 parent 1d09b82 commit 6e28280
1 file changed, +43 -3 lines changed