Skip to content

Detect asset-level deletion in backfill staleness checks#603

Open
rakhiagr wants to merge 9 commits intolinkedin:masterfrom
rakhiagr:rakhi/fix-gap4-asset-level-deletion
Open

Detect asset-level deletion in backfill staleness checks#603
rakhiagr wants to merge 9 commits intolinkedin:masterfrom
rakhiagr:rakhi/fix-gap4-asset-level-deletion

Conversation

@rakhiagr
Copy link
Copy Markdown
Contributor

Summary

  • When an entity is deleted via deleteAll()/softDeleteAsset(), deleted_ts = NOW() is set on the entity row. batchGetUnion() always filters by AND deleted_ts IS NULL, making asset-deleted entities invisible — indistinguishable from "never existed" — so shouldBackfill() unconditionally allows stale backfills, resurrecting deleted entities.
  • Fix: when batchGetUnion() returns empty in the new-schema path, perform a fallback query to check deleted_ts. If the entity is asset-deleted, return a synthetic soft-deleted EbeanMetadataAspect with the deletion timestamp, so shouldBackfill() correctly compares emitTime > deletionTimestamp and rejects stale events.

RFC: https://docs.google.com/document/d/1zOKrxbm_zGxTpMwJ7Wer7owY-YcI0nXDvY6DWRGXj0M/edit?tab=t.0#heading=h.nwcowzxl6in9

Depends on: #602 (Gap 1 + Gap 2 fix)

Backward Compatibility

  • The fallback query only runs when batchGetUnion() returns empty in the NEW_SCHEMA_ONLY path — no impact on existing read flows
  • Entities that are not asset-deleted are unaffected (fallback returns null, existing behavior preserved)

Testing Done

Unit tests added:

  • testGetAssetDeletionTimestampSql — SQL generation for normal mode
  • testGetAssetDeletionTimestampSqlTestMode — SQL generation for test mode
  • testGetAssetDeletionTimestampReturnsNullForActiveEntity — active entity returns null
  • testGetAssetDeletionTimestampReturnsNullForNonExistentEntity - non-existent entity returns null
  • testGetAssetDeletionTimestampReturnsTimestampForDeletedEntity — asset-deleted entity returns timestamp
  • testGetAssetDeletionTimestampInvisibleToBatchGetUnion — confirms batchGetUnion returns empty but getAssetDeletionTimestamp finds the deleted entity

Checklist

rakhiagr and others added 9 commits March 5, 2026 18:14
Tests cover:
- Stale emitTime (50 < deletion time 100) → no-op
- Newer emitTime (200 >= deletion time 100) → write succeeds (resurrection)
- Missing emitTime with soft-deleted aspect → no-op

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The else-if already implies isSoftDeleted is true since the first
branch checked !isSoftDeleted. Simplifies to just `else if (oldValue == null)`.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…boundary test

- Change >= to > in soft-deleted emitTime comparison for consistency with
  the non-deleted branch. Events at the exact same ms as deletion are treated
  as stale (likely in-flight when delete occurred).
- Add log.warn when a backfill has valid emitTime but the soft-deleted entry
  has no deletion timestamp — makes silent no-ops diagnosable.
- Add boundary test: emitTime == deletionTimestamp → no-op.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…JSON

Previously, soft-deleted aspects stored only {"gma_deleted":true} with no
deletion metadata. The shouldBackfill() logic (Gap 1 fix) compared emitTime
against the entity-level lastmodifiedon column, which gets corrupted by
unrelated aspect updates on the same entity.

Changes (new schema only):
- SoftDeletedAspect.pdl: Add optional deleted_timestamp and deleted_by fields
- EbeanLocalAccess: Write enriched JSON with deletion timestamp and actor
- EBeanDAOUtils.readSqlRow: Extract per-aspect deleted_timestamp from JSON,
  fall back to entity-level lastmodifiedon for legacy rows
- EBeanDAOUtils.isSoftDeletedAspect: Use parsing-based detection instead of
  exact string equals, so enriched JSON is correctly identified
- Tests for buildDeletedValue, isSoftDeletedMetadata, enriched write path

Old schema and dual schema paths are unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When an entity is deleted via deleteAll()/softDeleteAsset(), the entity
row's deleted_ts is set to NOW(). batchGetUnion() filters by
deleted_ts IS NULL, making these rows invisible and indistinguishable
from "never existed" — allowing stale backfills to resurrect them.

Fix: When batchGetUnion() returns empty in the new-schema path, query
deleted_ts for the URN. If the entity is asset-deleted, return a
synthetic soft-deleted EbeanMetadataAspect with the deletion timestamp
so shouldBackfill() can correctly compare emitTime > deletionTimestamp.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant