
Commit 6e28280

volokluev and claude authored
perf(eap): Add arrayExists filter for attribute name queries (#7832)
Add `arrayExists` conditions to the WHERE clause when filtering trace item attributes by substring match. This allows ClickHouse to filter rows in the PREWHERE stage before loading large attribute arrays, significantly reducing memory usage and preventing OOM issues.

## Background

The `endpoint_trace_item_attribute_names` endpoint currently filters attributes using `arrayFilter` in the SELECT clause. This approach causes ClickHouse to load and process all rows (including their large attribute arrays) before filtering, which leads to OOM issues when many rows don't contain matching attributes.

## Changes

This PR adds a helper function, `_add_substring_match_optimization()`, that adds `arrayExists` conditions to the WHERE clause based on the attribute type:

- **TYPE_STRING** → `arrayExists(x -> like(x, pattern), attributes_string)`
- **TYPE_FLOAT/DOUBLE/INT** → `arrayExists(x -> like(x, pattern), attributes_float)`
- **TYPE_BOOLEAN** → `arrayExists(x -> like(x, pattern), attributes_bool)`
- **TYPE_UNSPECIFIED** → an OR condition checking all three arrays

The optimization is applied after the existing `hasAll` bloom filter optimization and before the `item_type` check, so both filters work together.

## How It Works

**Before:**

```sql
SELECT DISTINCT(arrayJoin(arrayFilter(attr -> ... AND like(attr.2, '%span.%'), ...)))
FROM table
WHERE project_id = 1
  AND date >= ...
-- ❌ Loads ALL rows and their full attribute arrays, then filters in SELECT
```

**After:**

```sql
SELECT DISTINCT(arrayJoin(arrayFilter(attr -> ... AND like(attr.2, '%span.%'), ...)))
FROM table
WHERE project_id = 1
  AND date >= ...
  AND arrayExists(x -> like(x, '%span.%'), attributes_string)  -- ✅ NEW!
-- ✅ ClickHouse filters in PREWHERE before loading arrays
```

## Performance Impact

- Reduces memory usage by filtering rows before loading large attribute arrays
- Leverages ClickHouse's automatic PREWHERE optimization
- Only processes rows that actually contain matching attributes
- Follows the same pattern as the existing `hasAll` optimization (lines 177-184)
- No impact when `value_substring_match` is empty (a guard clause skips the optimization)

## Testing

All existing tests in `tests/web/rpc/v1/test_endpoint_trace_item_attribute_names.py` pass without modification, validating backward compatibility for:

- String, float, double, and boolean attribute types
- Substring matching across all types
- Empty results handling
- Co-occurring attributes with intersection filters

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
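The type dispatch in the Changes list can be sketched in plain Python. This is a hypothetical illustration, not Snuba code: `AttrType` stands in for `AttributeKey.Type`, and `substring_prefilter_sql` renders the extra condition as SQL text, whereas the real helper builds Snuba expression objects (`Lambda`, `f.arrayExists`, `or_cond`) and never produces a string.

```python
from enum import Enum, auto
from typing import Optional


class AttrType(Enum):
    """Hypothetical stand-in for AttributeKey.Type (not the real protobuf enum)."""

    STRING = auto()
    FLOAT = auto()
    DOUBLE = auto()
    INT = auto()
    BOOLEAN = auto()
    UNSPECIFIED = auto()


# Attribute-name array searched for each type; the numeric types share one array.
_ARRAYS_FOR_TYPE = {
    AttrType.STRING: ["attributes_string"],
    AttrType.FLOAT: ["attributes_float"],
    AttrType.DOUBLE: ["attributes_float"],
    AttrType.INT: ["attributes_float"],
    AttrType.BOOLEAN: ["attributes_bool"],
}
# Fallback for UNSPECIFIED (or any other type): search every array.
_ALL_ARRAYS = ["attributes_string", "attributes_float", "attributes_bool"]


def substring_prefilter_sql(attr_type: AttrType, substring: str) -> Optional[str]:
    """Render the extra WHERE condition as SQL text, or None if there is nothing to match."""
    if not substring:
        # Guard clause: an empty substring adds no condition.
        return None
    # Naive quoting for illustration only; the real helper passes the pattern as a literal.
    pattern = "'%" + substring.replace("'", "\\'") + "%'"
    arrays = _ARRAYS_FOR_TYPE.get(attr_type, _ALL_ARRAYS)
    checks = [f"arrayExists(x -> like(x, {pattern}), {arr})" for arr in arrays]
    # One array needs no OR; several arrays are OR-ed together.
    return checks[0] if len(checks) == 1 else "(" + " OR ".join(checks) + ")"
```

For example, `substring_prefilter_sql(AttrType.STRING, "span.")` returns `arrayExists(x -> like(x, '%span.%'), attributes_string)`, the condition shown in the After query. Using a dictionary with an all-arrays default mirrors the helper's final `else` branch, which handles `TYPE_UNSPECIFIED` and any type not matched by the earlier branches.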
1 parent 1d09b82 commit 6e28280


1 file changed: +43 / -3 lines


snuba/web/rpc/v1/endpoint_trace_item_attribute_names.py

Lines changed: 43 additions & 3 deletions
@@ -21,8 +21,8 @@
 from snuba.query import OrderBy, OrderByDirection, SelectedExpression
 from snuba.query.data_source.simple import Storage
 from snuba.query.dsl import Functions as f
-from snuba.query.dsl import and_cond, column, in_cond, not_cond
-from snuba.query.expressions import Expression, Lambda
+from snuba.query.dsl import and_cond, column, in_cond, not_cond, or_cond
+from snuba.query.expressions import Argument, Expression, FunctionCall, Lambda
 from snuba.query.logical import Query
 from snuba.query.query_settings import HTTPQuerySettings
 from snuba.reader import Row
@@ -105,6 +105,43 @@ def get_co_occurring_attributes_date_condition(
     )
 
 
+def _add_substring_match_optimization(
+    request: TraceItemAttributeNamesRequest,
+    condition: Expression,
+) -> FunctionCall | Expression:
+    """Add arrayExists to WHERE clause to filter rows before loading arrays.
+
+    This reduces memory usage by only processing rows with matching attributes.
+    Similar to the hasAll optimization, this allows ClickHouse to use PREWHERE
+    and filter rows before loading large attribute arrays.
+    """
+    if not request.value_substring_match:
+        return condition
+
+    pattern = f"%{request.value_substring_match}%"
+    like_lambda = Lambda(None, ("x",), f.like(Argument(None, "x"), pattern))
+
+    if request.type == AttributeKey.Type.TYPE_STRING:
+        return and_cond(condition, f.arrayExists(like_lambda, column("attributes_string")))
+    elif request.type in (
+        AttributeKey.Type.TYPE_FLOAT,
+        AttributeKey.Type.TYPE_DOUBLE,
+        AttributeKey.Type.TYPE_INT,
+    ):
+        return and_cond(condition, f.arrayExists(like_lambda, column("attributes_float")))
+    elif request.type == AttributeKey.Type.TYPE_BOOLEAN:
+        return and_cond(condition, f.arrayExists(like_lambda, column("attributes_bool")))
+    else:  # TYPE_UNSPECIFIED - check all arrays with OR
+        return and_cond(
+            condition,
+            or_cond(
+                f.arrayExists(like_lambda, column("attributes_string")),
+                f.arrayExists(like_lambda, column("attributes_float")),
+                f.arrayExists(like_lambda, column("attributes_bool")),
+            ),
+        )
+
+
 def get_co_occurring_attributes(
     request: TraceItemAttributeNamesRequest,
 ) -> SnubaRequest:
@@ -169,7 +206,7 @@ def get_co_occurring_attributes(
         sample=None,
     )
 
-    condition = and_cond(
+    condition: Expression = and_cond(
         project_id_and_org_conditions(request.meta),
         get_co_occurring_attributes_date_condition(request),
     )
@@ -183,6 +220,9 @@ def get_co_occurring_attributes(
             ),
         )
 
+    # Optimization: Add arrayExists to WHERE clause to filter rows before loading arrays
+    condition = _add_substring_match_optimization(request, condition)
+
     if request.meta.trace_item_type != TraceItemType.TRACE_ITEM_TYPE_UNSPECIFIED:
         condition = and_cond(f.equals(column("item_type"), request.meta.trace_item_type), condition)

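The call site above ANDs the new `arrayExists` condition into `condition` alongside the existing filters. Because a row with no matching attribute name contributes nothing to the SELECT-side `arrayFilter`, dropping it earlier should not change the endpoint's output, which is consistent with the existing tests passing unmodified. The following is a minimal pure-Python model of that argument, not Snuba or ClickHouse code: `like`, `distinct_matching_names`, and the sample rows are hypothetical, `like` ignores LIKE escape sequences, and the model only counts rows that reach the SELECT-side filter; it does not measure ClickHouse memory use or PREWHERE behavior.

```python
import re
from typing import Iterable, List, Set, Tuple


def like(value: str, pattern: str) -> bool:
    """Simplified SQL LIKE: '%' matches any run of characters, '_' matches one; no escapes."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def distinct_matching_names(
    rows: Iterable[List[str]], pattern: str, prefilter: bool
) -> Tuple[Set[str], int]:
    """Model SELECT DISTINCT(arrayJoin(arrayFilter(x -> like(x, p), names))).

    Returns the distinct matching names and how many rows reached the
    SELECT-side arrayFilter. With prefilter=True, rows are first dropped by
    the equivalent of WHERE arrayExists(x -> like(x, p), names).
    """
    result: Set[str] = set()
    rows_filtered_in_select = 0
    for names in rows:
        if prefilter and not any(like(n, pattern) for n in names):
            continue  # row rejected by the WHERE-clause arrayExists check
        rows_filtered_in_select += 1
        result.update(n for n in names if like(n, pattern))
    return result, rows_filtered_in_select


rows = [
    ["span.op", "http.method"],
    ["db.system"],
    ["span.description", "span.op"],
    ["user.id"],
]
before, scanned_before = distinct_matching_names(rows, "%span.%", prefilter=False)
after, scanned_after = distinct_matching_names(rows, "%span.%", prefilter=True)
print(sorted(before) == sorted(after), scanned_before, scanned_after)  # True 4 2
```

Both runs return the same two names, but with the pre-filter only the two rows that contain a match reach the `arrayFilter` step. In this toy model the saving grows with the fraction of rows that have no matching attribute, which is the case the Background section describes.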