Use JSONB containment for GIN-friendly EQ filters #3646
Open
sambhav wants to merge 4 commits into Kinto:main
Rewrite _format_conditions to emit `data @> '{"field": value}'` instead
of `data->'field' = 'value'::jsonb` for equality filters on scalar data
fields (str, int, float, bool, None). This is semantically equivalent
for scalars but enables GIN index acceleration when a
`gin(data jsonb_path_ops)` index exists on the objects table.
Array and object EQ values still use the arrow extraction path to
preserve exact equality semantics (containment uses superset matching
for non-scalars).
Also normalizes _format_sorting JSONB accessor expressions to match
_format_conditions format (removing redundant parentheses around
placeholders), ensuring expression indexes work for both filter and
sort queries.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
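The scalar versus non-scalar split described in this commit can be sketched roughly as follows. This is an illustration only, not Kinto's actual `_format_conditions`: the helper name `format_eq_condition`, the `:doc`/`:field`/`:value` placeholder names, and the explicit `::jsonb` casts are all assumptions made for the example.

```python
import json

# Types treated as JSON scalars, per the commit message:
# str, int, float, bool, None.
SCALAR_TYPES = (str, int, float, bool, type(None))


def format_eq_condition(field, value):
    """Return (sql_fragment, params) for an equality filter on a data field.

    Hypothetical helper: scalars use top-level containment, which a GIN
    index on `data` can serve; arrays and objects keep the arrow form so
    that equality stays exact instead of becoming superset matching.
    """
    if isinstance(value, SCALAR_TYPES):
        sql = "data @> (:doc)::jsonb"
        params = {"doc": json.dumps({field: value})}
    else:
        sql = "data->:field = (:value)::jsonb"
        params = {"field": field, "value": json.dumps(value)}
    return sql, params


if __name__ == "__main__":
    print(format_eq_condition("status", "active"))
    print(format_eq_condition("tags", ["a", "b"]))
```

Falsy scalars such as `None`, `""`, `0` and `False` take the containment branch because the check is on type, not truthiness.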
Rewrite CONTAINS filters to use `data @> '{"field": [values]}'` instead
of `data->'field' @> '[values]'`. Top-level containment allows a GIN
index on the data column to accelerate these queries. The jsonb_typeof
guard is no longer needed since containment already returns false when
the field is not an array.
Add documentation to the Storage class docstring describing the
recommended GIN index, what it accelerates, what it doesn't, and
approximate sizing.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Expand the GIN index documentation with three options:
1. Recommended: partial index with WHERE resource_name = 'record'
   (smallest, scoped to actual records, works with psycopg2)
2. Basic: partial index with WHERE NOT deleted only
   (driver-independent fallback)
3. Composite: btree_gin extension with parent_id + resource_name in
   the GIN index (single index scan, no BitmapAnd needed)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
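As a rough sketch, the three options could correspond to DDL along these lines. The statements are reconstructed from the descriptions in this commit message, not copied from the documentation it adds: the index names are hypothetical, and the exact predicates and column order in the real docs may differ.

```python
# Hypothetical DDL for the three documented options. Table and column
# names (objects, data, resource_name, deleted, parent_id) come from the
# commit messages; the index names are invented for this example.
INDEX_OPTIONS = {
    # 1. Recommended: partial index scoped to actual records.
    "recommended": """
        CREATE INDEX IF NOT EXISTS idx_objects_data_gin_records
            ON objects USING gin (data jsonb_path_ops)
            WHERE resource_name = 'record';
    """,
    # 2. Basic: partial index that only excludes deleted rows.
    "basic": """
        CREATE INDEX IF NOT EXISTS idx_objects_data_gin
            ON objects USING gin (data jsonb_path_ops)
            WHERE NOT deleted;
    """,
    # 3. Composite: btree_gin lets plain columns join the GIN index,
    #    so one index scan can cover parent, resource and data filters.
    "composite": """
        CREATE EXTENSION IF NOT EXISTS btree_gin;
        CREATE INDEX IF NOT EXISTS idx_objects_parent_resource_data_gin
            ON objects USING gin (parent_id, resource_name, data jsonb_path_ops);
    """,
}

if __name__ == "__main__":
    for name, ddl in INDEX_OPTIONS.items():
        print(f"-- {name}{ddl}")
```

On a large, live table it may be preferable to build such an index with `CREATE INDEX CONCURRENTLY`, which avoids blocking writes at the cost of a slower build.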
Add tests covering:
- CONTAINS fallback for id/modified fields (covers dead-code branch)
- EQ with falsy scalars: None, empty string, 0, False
- EQ with deeply nested fields (a.b.c)
- EQ with empty arrays/objects (must NOT use containment)
- CONTAINS with numeric arrays and object elements
- CONTAINS with nested fields
These tests verify that the @> rewrite produces identical behavior to
the old arrow extraction form across all data types and edge cases.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
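To see why EQ on empty arrays and objects must not use containment, it helps to model `@>` in plain Python. The function below is a simplified approximation of PostgreSQL's JSONB containment for values nested inside a document; it ignores the special case where a top-level array may contain a bare primitive, and it is not taken from Kinto's test suite.

```python
def jsonb_contains(left, right):
    """Approximate `left @> right` for JSON values held in Python types."""
    if isinstance(right, dict):
        # Every key in `right` must exist in `left` with a contained value.
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], val)
            for key, val in right.items()
        )
    if isinstance(right, list):
        # Every element of `right` must be contained in some element of
        # `left`; order and duplicates do not matter.
        return isinstance(left, list) and all(
            any(jsonb_contains(item, wanted) for item in left)
            for wanted in right
        )
    if isinstance(left, (dict, list)):
        return False  # a scalar on the right never matches a container here
    if isinstance(left, bool) or isinstance(right, bool):
        # JSON true/false are distinct from the numbers 1/0.
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    return left == right


record = {"tags": ["a", "b"], "meta": {"x": 1}, "count": 0}

# Containment is superset matching: an empty array or object is contained
# in any array or object, so it cannot stand in for exact equality.
assert jsonb_contains(record, {"tags": []}) and record["tags"] != []
assert jsonb_contains(record, {"meta": {}}) and record["meta"] != {}

# For scalars, containment and equality agree, including falsy values.
assert jsonb_contains(record, {"count": 0})
assert not jsonb_contains(record, {"count": False})
```

The same model shows why the `jsonb_typeof` guard can be dropped for CONTAINS: `jsonb_contains({"tags": "a"}, {"tags": ["a"]})` is false, because an array on the right only matches an array on the left.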
Summary
- Rewrite `_format_conditions` to emit `data @> '{"field": value}'` (JSONB containment) instead of `data->'field' = 'value'::jsonb` for EQ filters on scalar data fields (str, int, float, bool, None)
- Rewrite CONTAINS filters to use top-level containment (`data @> '{"field": [values]}'`) instead of sub-expression containment (`data->'field' @> '[values]'`), removing the now-redundant `jsonb_typeof` guard
- Keep the arrow extraction path for array and object EQ values to preserve exact equality semantics (`@>` uses superset matching for non-scalars)
- Normalize `_format_sorting` JSONB accessor expressions to match `_format_conditions` format (removes redundant parentheses)

Why this matters
Of the operators these filters use, only `@>` containment applied to the whole `data` column can be served by a GIN index on `data`. Previously, EQ used `data->'field' = value` and CONTAINS used `data->'field' @> value`; neither form can use such an index. By rewriting both to top-level `data @> '{"field": value}'`, a single GIN index (for example `gin(data jsonb_path_ops)` on the objects table) can accelerate equality and array-contains queries across all collections and fields.

Without this index, the query rewrites are not expected to change performance, since they are semantically equivalent to the old form. The index is intentionally not auto-created; it is documented as an optional optimization for large deployments.
What the GIN index accelerates:
- `?status=active` → `data @> '{"status": "active"}'`
- `?person.name=Alice` → `data @> '{"person": {"name": "Alice"}}'`
- `?contains_colors=red` → `data @> '{"colors": ["red"]}'`

What it does NOT accelerate:
- Range filters (`min_`, `max_`, `gt_`, `lt_`)
- `contains_any_` (uses `&&` array overlap)

Test plan
🤖 Generated with Claude Code
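The mapping from query-string filters to containment documents listed under "What the GIN index accelerates" can be sketched as below. This is a simplified illustration, not Kinto's request parsing: `nest` and `containment_docs` are hypothetical helpers, and every value is kept as a plain string, whereas the real code would parse values into their native types first.

```python
import json
from urllib.parse import parse_qsl

# Prefixes the PR lists as not accelerated by the GIN index.
UNACCELERATED_PREFIXES = ("contains_any_", "min_", "max_", "gt_", "lt_")


def nest(path, value):
    """Turn a dotted path into nested objects: 'a.b' -> {"a": {"b": value}}."""
    doc = value
    for key in reversed(path.split(".")):
        doc = {key: doc}
    return doc


def containment_docs(query_string):
    """Return the JSON documents that would appear on the right of `data @>`."""
    docs = []
    for name, raw in parse_qsl(query_string):
        if name.startswith(UNACCELERATED_PREFIXES):
            continue  # range and overlap filters keep their existing SQL form
        if name.startswith("contains_"):
            docs.append(nest(name[len("contains_"):], [raw]))
        else:
            docs.append(nest(name, raw))
    return [json.dumps(doc) for doc in docs]


if __name__ == "__main__":
    for doc in containment_docs(
        "status=active&person.name=Alice&contains_colors=red&min_age=3"
    ):
        print(f"data @> '{doc}'")
```

Note that `contains_any_` is tested before `contains_`, since the shorter prefix would otherwise match both.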