Fix purge_deleted() to only delete tombstones on PostgreSQL backend #3644

Open
sambhav wants to merge 1 commit into Kinto:main from sambhav:fix-purge-deleted-tombstones-only
sambhav commented Feb 15, 2026

Problem

The purge_deleted() method's base class contract and docstring specify that it should delete "all deleted object tombstones" (line 271 in kinto/core/storage/__init__.py). The memory backend correctly operates only on tombstone storage (self._cemetery), never touching live objects in self._store.

However, the PostgreSQL backend's DELETE queries lacked a deleted = TRUE filter in both execution paths:

Before Path

The DELETE query filtered only by parent_id, resource_name, and timestamp:

DELETE FROM objects
WHERE parent_id = :parent_id
      AND resource_name = :resource_name
      AND as_epoch(last_modified) < :before

This deletes ALL objects (live AND tombstones) older than before.
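
The effect of the missing filter can be reproduced in a minimal sketch. It uses an in-memory SQLite table rather than PostgreSQL, and everything in it is an assumption made for illustration: the simplified `objects` schema, the sample rows, and a plain integer `last_modified` column standing in for `as_epoch(last_modified)`.

```python
import sqlite3

# Hypothetical, simplified schema: only the columns the query touches.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objects ("
    " id TEXT, parent_id TEXT, resource_name TEXT,"
    " last_modified INTEGER, deleted BOOLEAN)"
)
rows = [
    ("live-old", "/buckets/b", "record", 100, False),  # live, older than `before`
    ("tomb-old", "/buckets/b", "record", 110, True),   # tombstone, older than `before`
    ("live-new", "/buckets/b", "record", 900, False),  # live, newer than `before`
]
conn.executemany("INSERT INTO objects VALUES (?, ?, ?, ?, ?)", rows)

# Same shape as the pre-fix query: no condition on `deleted`.
conn.execute(
    "DELETE FROM objects"
    " WHERE parent_id = :parent_id"
    "   AND resource_name = :resource_name"
    "   AND last_modified < :before",
    {"parent_id": "/buckets/b", "resource_name": "record", "before": 500},
)

remaining = [r[0] for r in conn.execute("SELECT id FROM objects ORDER BY id")]
print(remaining)  # 'live-old' is gone even though it was never deleted
```

In this sketch the old live row is removed together with the tombstone, which is the behaviour the query above describes.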

Max Retained Path

The ROW_NUMBER window function ranked ALL objects together:

WITH ranked AS (
    SELECT id AS objid, parent_id, resource_name,
           ROW_NUMBER() OVER (
               PARTITION BY parent_id, resource_name
               ORDER BY last_modified DESC
           ) AS rn
    FROM objects   -- NO deleted filter
)
DELETE FROM objects WHERE id IN (...)

If max_retained = 100 and you have 80 live + 50 tombstones, it keeps the 100 most recent objects regardless of type and deletes the 30 oldest, any of which may be live records.
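
The arithmetic can be checked with a small pure-Python model of the ranking. This is only a sketch: the `ranked_for_deletion` helper and the timestamps are hypothetical, and the live records are deliberately made older than the tombstones to show the worst case.

```python
def ranked_for_deletion(objects, max_retained):
    """Mimic ROW_NUMBER() ... ORDER BY last_modified DESC with rn > max_retained."""
    newest_first = sorted(objects, key=lambda o: o["last_modified"], reverse=True)
    return newest_first[max_retained:]


# Assumed data: 80 live records (timestamps 0..79), 50 newer tombstones (80..129).
live = [{"id": f"live-{i}", "last_modified": i, "deleted": False} for i in range(80)]
tombstones = [
    {"id": f"tomb-{i}", "last_modified": 80 + i, "deleted": True} for i in range(50)
]

# Pre-fix behaviour: live records and tombstones are ranked together.
doomed = ranked_for_deletion(live + tombstones, max_retained=100)
live_lost = [o for o in doomed if not o["deleted"]]

print(len(doomed), len(live_lost))
```

With these assumed timestamps all 30 deleted rows are live records, since the tombstones occupy the 50 most recent ranks.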

This means calling purge_deleted() on the PostgreSQL backend could silently destroy live data. This is a correctness bug that can result in data loss when running the purge_deleted maintenance script.

Solution

Added a deleted = TRUE filter to both execution paths (AND deleted = TRUE in the before path, WHERE deleted = TRUE in the max_retained CTE) so that only tombstones are deleted:

Before Path

DELETE FROM objects
WHERE parent_id = :parent_id
      AND resource_name = :resource_name
      AND deleted = TRUE          -- NEW
      AND as_epoch(last_modified) < :before

Max Retained Path

WITH ranked AS (
    SELECT id AS objid, parent_id, resource_name,
           ROW_NUMBER() OVER (...) AS rn
    FROM objects
    WHERE deleted = TRUE           -- NEW: Only rank tombstones
)
DELETE FROM objects WHERE id IN (...)

This aligns the PostgreSQL backend with:

  • The base class contract: "Delete all deleted object tombstones"
  • The memory backend implementation (only operates on self._cemetery)
  • The documented behavior and user expectations
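
As a rough illustration of the max_retained path once only tombstones are ranked, the following pure-Python sketch filters on the `deleted` flag before ranking. The `tombstones_to_purge` helper and the sample data are hypothetical and only model the SQL, they are not the backend code.

```python
def tombstones_to_purge(objects, max_retained):
    """Rank tombstones only (newest first) and return those beyond max_retained."""
    tombstones = [o for o in objects if o["deleted"]]  # models WHERE deleted = TRUE
    newest_first = sorted(tombstones, key=lambda o: o["last_modified"], reverse=True)
    return newest_first[max_retained:]


# Assumed data: 80 live records and 50 tombstones in the same collection.
objects = [{"id": f"live-{i}", "last_modified": i, "deleted": False} for i in range(80)]
objects += [
    {"id": f"tomb-{i}", "last_modified": 80 + i, "deleted": True} for i in range(50)
]

# 50 tombstones fit under max_retained=100, so nothing is purged.
purged_100 = tombstones_to_purge(objects, max_retained=100)

# With max_retained=20, the 30 oldest tombstones are purged; live rows are never ranked.
purged_20 = tombstones_to_purge(objects, max_retained=20)

print(len(purged_100), len(purged_20))
```

Because live records never enter the ranking, no value of max_retained can select them for deletion in this model.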

Testing

Added three comprehensive test cases to BaseTestStorage that verify the fix:

  1. test_purge_deleted_with_before_only_deletes_tombstones

    • Creates 2 live records and 2 tombstones
    • Calls purge_deleted() with a before timestamp
    • Verifies only tombstones are deleted and all live records remain intact
  2. test_purge_deleted_with_max_retained_only_affects_tombstones

    • Creates 3 live records and 5 tombstones
    • Calls purge_deleted() with max_retained=2
    • Verifies only tombstones are purged (3 removed, 2 retained)
    • Verifies all 3 live records remain untouched
  3. test_purge_deleted_without_before_only_deletes_tombstones

    • Creates interleaved mix of 2 live records and 2 tombstones
    • Calls purge_deleted() without parameters to purge all tombstones
    • Verifies only the 2 tombstones are deleted and both live records remain

These tests exercise both code paths and verify that live records are never affected by purge_deleted() operations.
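
The shape of such a test can be sketched against a toy in-memory store. `ToyStorage` is a hypothetical stand-in written for this example (it borrows the `_store` / `_cemetery` split mentioned above, but it is not Kinto's memory backend or its API), and treating `before` and `max_retained` as mutually exclusive is an assumption of the sketch.

```python
import unittest


class ToyStorage:
    """Hypothetical stand-in for a storage backend; not Kinto's implementation."""

    def __init__(self):
        self._store = {}     # live objects: id -> last_modified
        self._cemetery = {}  # tombstones: id -> last_modified
        self._clock = 0

    def _tick(self):
        self._clock += 1
        return self._clock

    def create(self, object_id):
        self._store[object_id] = self._tick()

    def delete(self, object_id):
        del self._store[object_id]
        self._cemetery[object_id] = self._tick()

    def purge_deleted(self, before=None, max_retained=None):
        # Assumption of this sketch: the two arguments are exclusive.
        if before is not None and max_retained is not None:
            raise ValueError("pick either `before` or `max_retained`")
        newest_first = sorted(self._cemetery, key=self._cemetery.get, reverse=True)
        if max_retained is not None:
            doomed = newest_first[max_retained:]
        elif before is not None:
            doomed = [i for i in newest_first if self._cemetery[i] < before]
        else:
            doomed = newest_first
        for object_id in doomed:
            del self._cemetery[object_id]  # only the cemetery is ever touched
        return len(doomed)


class PurgeDeletedTest(unittest.TestCase):
    def setUp(self):
        self.storage = ToyStorage()
        for i in range(3):  # 3 live records
            self.storage.create(f"live-{i}")
        for i in range(5):  # 5 tombstones
            self.storage.create(f"tomb-{i}")
            self.storage.delete(f"tomb-{i}")

    def test_max_retained_only_affects_tombstones(self):
        self.assertEqual(self.storage.purge_deleted(max_retained=2), 3)
        self.assertEqual(sorted(self.storage._cemetery), ["tomb-3", "tomb-4"])
        self.assertEqual(len(self.storage._store), 3)

    def test_before_only_deletes_tombstones(self):
        self.assertEqual(self.storage.purge_deleted(before=10), 3)
        self.assertEqual(len(self.storage._store), 3)

    def test_without_arguments_purges_all_tombstones(self):
        self.assertEqual(self.storage.purge_deleted(), 5)
        self.assertEqual(self.storage._cemetery, {})
        self.assertEqual(len(self.storage._store), 3)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PurgeDeletedTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test asserts on both sides of the split: the expected number of tombstones is removed, and the count of live records is unchanged.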

Impact

Risk: Very low

  • This aligns the PostgreSQL backend with the documented behavior
  • The memory backend already implements this correctly
  • Existing tests only call purge_deleted() on actual tombstones, so they continue to pass
  • No schema migration required (SQL-only change)

Effect: Prevents silent data loss in production environments when running maintenance scripts

Files Changed

  • kinto/core/storage/postgresql/__init__.py - Added deleted = TRUE filters to both code paths
  • kinto/core/storage/testing.py - Added 3 regression test cases

🤖 Generated with Claude Code

@sambhav sambhav force-pushed the fix-purge-deleted-tombstones-only branch from 0c50549 to 3a46d98 Compare February 15, 2026 16:24
The purge_deleted() method's base class contract and docstring specify
that it deletes "tombstones" (soft-deleted objects). The memory backend
correctly operates only on tombstone storage (self._cemetery). However,
the PostgreSQL backend's DELETE queries lacked a deleted = TRUE filter,
meaning purge_deleted with a before timestamp or max_retained would
delete both live records and tombstones.

This can result in silent data loss when running the purge_deleted
maintenance script.

What changed:
- Added AND deleted = TRUE to the before-path DELETE query
- Added WHERE deleted = TRUE to the max_retained-path CTE so the
  ranking window only considers tombstones
- Added three test cases that create live records + tombstones, call
  purge_deleted, and verify only tombstones are removed

Test coverage:
- test_purge_deleted_with_before_only_deletes_tombstones: Tests the
  before parameter path with mixed live/deleted objects
- test_purge_deleted_with_max_retained_only_affects_tombstones: Tests
  the max_retained parameter with 3 live + 5 tombstones
- test_purge_deleted_without_before_only_deletes_tombstones: Tests
  purging all tombstones while preserving live records

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@sambhav sambhav force-pushed the fix-purge-deleted-tombstones-only branch from 3a46d98 to aa52ced Compare February 15, 2026 16:32