Use JSONB containment for GIN-friendly EQ filters #3646

Open
sambhav wants to merge 4 commits into Kinto:main from sambhav:gin-containment-optimization

@sambhav sambhav commented Feb 15, 2026

Summary

  • Rewrites _format_conditions to emit data @> '{"field": value}' (JSONB containment) instead of data->'field' = 'value'::jsonb for EQ filters on scalar data fields (str, int, float, bool, None)
  • Rewrites CONTAINS filters to use top-level containment (data @> '{"field": [values]}') instead of sub-expression containment (data->'field' @> '[values]'), removing the now-redundant jsonb_typeof guard
  • Array/object EQ values keep using arrow extraction to preserve exact equality semantics (@> uses superset matching for non-scalars)
  • Normalizes _format_sorting JSONB accessor expressions to match _format_conditions format (removes redundant parentheses)
  • Documents the recommended GIN index in the Storage class docstring
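
The scalar/non-scalar split above follows from how @> compares containers. The sketch below is a simplified pure-Python model of JSONB containment, written for illustration only: it is not PostgreSQL's implementation, the name `jsonb_contains` is hypothetical, and it ignores the top-level special case where an array may contain a bare primitive.

```python
def jsonb_contains(left, right):
    """Approximate `left @> right` for already-parsed JSON values."""
    if isinstance(right, dict):
        # Every key/value pair on the right must be contained on the left.
        return isinstance(left, dict) and all(
            key in left and jsonb_contains(left[key], value)
            for key, value in right.items()
        )
    if isinstance(right, list):
        # Every right-hand element must be contained in some left-hand element.
        return isinstance(left, list) and all(
            any(jsonb_contains(candidate, item) for candidate in left)
            for item in right
        )
    # Scalars: a container never matches a scalar below the top level.
    if isinstance(left, (dict, list)):
        return False
    # Keep JSON booleans apart from numbers (Python treats True == 1).
    if isinstance(left, bool) or isinstance(right, bool):
        return isinstance(left, bool) and isinstance(right, bool) and left == right
    return left == right


record = {"status": "active", "tags": ["a", "b"]}

# Scalar field: containment and equality agree.
print(jsonb_contains(record, {"status": "active"}))  # True

# Array field: containment is a superset match, equality is not.
print(jsonb_contains(record, {"tags": ["a"]}))  # True
print(record["tags"] == ["a"])  # False

# An empty array is contained in any array.
print(jsonb_contains(record, {"tags": []}))  # True
```

In this model an exact-equality filter such as tags=["a"] would match the record under containment even though the stored array differs, which is the case the arrow-extraction path avoids. The same model returns false when the stored field is a scalar and the right-hand side is an array, consistent with dropping the jsonb_typeof guard.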

Why this matters

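Of the operators these filters emit, only top-level @> containment can use a GIN index on data. (A jsonb_path_ops index supports @> and the jsonpath operators @? and @@, but not the key-existence operators.) Previously, EQ used data->'field' = value and CONTAINS used data->'field' @> value. Both forms operate on the extracted expression data->'field' rather than on the data column itself, so neither can use a GIN index on data. By rewriting both to top-level data @> '{"field": value}', a single GIN index accelerates scalar equality and array-contains queries across all collections and fields: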

CREATE INDEX CONCURRENTLY idx_objects_data_gin
    ON objects USING gin (data jsonb_path_ops)
    WHERE NOT deleted;

Without this index, the query rewrites are not expected to change performance: they are semantically equivalent to the old form and only become faster once a matching index exists. The index is intentionally not auto-created; it's documented as an optional optimization for large deployments.

What the GIN index accelerates:

  • ?status=active → data @> '{"status": "active"}'
  • ?person.name=Alice → data @> '{"person": {"name": "Alice"}}'
  • ?contains_colors=red → data @> '{"colors": ["red"]}'
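
The mappings above can be sketched as a small payload builder. This is a minimal illustration, not Kinto's actual `_format_conditions` code: the helper names (`nest`, `eq_uses_containment`, `eq_payload`, `contains_payload`) are hypothetical, and it assumes dotted field names denote nested object keys.

```python
import json

# JSON scalar types for which the PR switches EQ to containment.
SCALAR_TYPES = (str, int, float, bool, type(None))


def nest(field, value):
    """Wrap `value` in one object per segment of a dotted field path."""
    for key in reversed(field.split(".")):
        value = {key: value}
    return value


def eq_uses_containment(value):
    """Scalars can use @>; arrays and objects keep exact equality."""
    return isinstance(value, SCALAR_TYPES)


def eq_payload(field, value):
    """Right-hand side of `data @> ...` for a scalar EQ filter."""
    return json.dumps(nest(field, value))


def contains_payload(field, values):
    """Right-hand side of `data @> ...` for a CONTAINS filter."""
    return json.dumps(nest(field, list(values)))


print(eq_payload("status", "active"))       # {"status": "active"}
print(eq_payload("person.name", "Alice"))   # {"person": {"name": "Alice"}}
print(contains_payload("colors", ["red"]))  # {"colors": ["red"]}
```

In a real query the resulting JSON string would be passed as a bound parameter and cast to jsonb, rather than interpolated into the SQL text.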

What it does NOT accelerate:

  • Range filters (min_, max_, gt_, lt_)
  • LIKE/text search
  • contains_any_ (uses && array overlap)
  • Sorting on JSONB fields

Test plan

  • 19 new unit tests for SQL generation (EQ scalars, EQ arrays/objects, CONTAINS, CONTAINS_ANY, nested fields, non-EQ operators, id/modified exclusion, sorting normalization)
  • 2 new integration tests for array/object exact equality
  • All 183 existing non-PostgreSQL storage tests pass
  • All 64 filter/sort resource tests pass

🤖 Generated with Claude Code

sambhav and others added 4 commits February 15, 2026 20:57
Rewrite _format_conditions to emit `data @> '{"field": value}'` instead
of `data->'field' = 'value'::jsonb` for equality filters on scalar data
fields (str, int, float, bool, None). This is semantically equivalent
for scalars but enables GIN index acceleration when a
`gin(data jsonb_path_ops)` index exists on the objects table.

Array and object EQ values still use the arrow extraction path to
preserve exact equality semantics (containment uses superset matching
for non-scalars).

Also normalizes _format_sorting JSONB accessor expressions to match
_format_conditions format (removing redundant parentheses around
placeholders), ensuring expression indexes work for both filter and
sort queries.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Rewrite CONTAINS filters to use `data @> '{"field": [values]}'` instead
of `data->'field' @> '[values]'`. Top-level containment allows a GIN
index on the data column to accelerate these queries. The jsonb_typeof
guard is no longer needed since containment already returns false when
the field is not an array.

Add documentation to the Storage class docstring describing the
recommended GIN index, what it accelerates, what it doesn't, and
approximate sizing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Expand the GIN index documentation with three options:

1. Recommended: partial index with WHERE resource_name = 'record'
   (smallest, scoped to actual records, works with psycopg2)
2. Basic: partial index with WHERE NOT deleted only
   (driver-independent fallback)
3. Composite: btree_gin extension with parent_id + resource_name
   in the GIN index (single index scan, no BitmapAnd needed)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add tests covering:
- CONTAINS fallback for id/modified fields (covers dead-code branch)
- EQ with falsy scalars: None, empty string, 0, False
- EQ with deeply nested fields (a.b.c)
- EQ with empty arrays/objects (must NOT use containment)
- CONTAINS with numeric arrays and object elements
- CONTAINS with nested fields

These tests verify that the @> rewrite produces identical behavior
to the old arrow extraction form across all data types and edge cases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>