Skip to content

Updating method for create(), update() and delete() to handle soft delete by setting deleted_ts#549

Merged
kotkar-pallavi merged 4 commits intolinkedin:masterfrom
kotkar-pallavi:pkotkar/deletedTsInCreate
Jul 8, 2025
Merged

Updating method for create(), update() and delete() to handle soft delete by setting deleted_ts#549
kotkar-pallavi merged 4 commits intolinkedin:masterfrom
kotkar-pallavi:pkotkar/deletedTsInCreate

Conversation

@kotkar-pallavi
Copy link
Copy Markdown
Contributor

@kotkar-pallavi kotkar-pallavi commented Jun 19, 2025

This is PR#3 in a series of PRs which will contain code changes to support usage of deleted_ts column in DAO methods for read and write paths.
Prev:
PR#1 - #539
PR#2 - #544

Summary

  1. Updated create implementation
    • Update the SQL for create to follow a pattern like so:
INSERT INTO metadata_entity_foo (urn, a_urn, lastmodifiedon, lastmodifiedby,a_aspectfoo) 
VALUES (:urn, :a_urn, :lastmodifiedon, :lastmodifiedby,:aspect0) 
ON DUPLICATE KEY UPDATE a_aspectfoo= :aspect0,  deleted_ts = IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), deleted_ts);
  1. Updated update implementation
    • If this asset was previously marked for deletion and still not purged, updating the new aspect values and update delete_ts will be set to NULL. ref: updated SQL
  2. Updated delete implementation

Testing Done

  • Added unit tests
  • Modified unit tests
  • ./gradlew build

Checklist

* @return a collection of the deleted aspects (their value before deletion), each wrapped in an instance of
* {@link ASPECT_UNION}
*/
protected Collection<ASPECT_UNION> deleteCommon(@Nonnull URN urn,
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deleting this, because reusing the delete() implementation for deleting all aspects (this)
instead of doing Hard Asset deletion (deleting entire row)

// Given: changeLog is disabled
assertFalse(_enableChangeLog);
// When: updateWithOptimisticLocking is called
try {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed these tests for New Schema, since now we will overwrite the aspect value during create even if it was marked as soft deleted using $gma_deleted.

ref: https://github.com/linkedin/datahub-gma/pull/549/files#diff-56020705a401d66f63fe0a2b1afd740f431e55a0ecc2a08fbb4c1ed65da3f559R312

Copy link
Copy Markdown
Contributor

@jphui jphui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed about half the files, super busy today, will take a look at the rest in a sec

If this asset was previously marked for deletion and still not purged, updating the new aspect values and update delete_ts will be set to NULL. ref: updated SQL

Does this mean that "if this record was deleted but not removed from the db ("not purged") and we do update(), then mark the record as not deleted"?

  • Does this go through the upsert check as well?

Comment on lines +1059 to +1061
} catch (NullPointerException e) {
log.warn("Aspect {} for urn {} does not exist", aspectClass.getName(), urn);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this throw on both:

  1. aspect class is not a supported aspect of the asset type?
  2. aspect class is empty in the db?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. i think aspect class not supported aspect of the asset type will throw a ModelValidationException from here: SQLSchemaUtils.java#L225

  2. it will handle aspect class empty in db.
    this scenario is possible for delete asset - user does not specify any aspect names, at service level we populate all aspect class names for delete call.

results.forEach((key, value) -> {
// Check if aspect value present to avoid null pointer exception
DeleteResult deleteResult = new DeleteResult(value, key);
deletedAspects.add(unwrapDeleteResult(urn, deleteResult, auditStamp, trackingContext, ChangeType.DELETE_ALL));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think that TODO is more for ChangeType.DELETE_ALL vs DELETE. This was related to previous Delete impl when we were doing a hard delete. I will remove that TODO, it is not relevant anymore.

  2. We are using unwrapDeleteResult to send events to HS to delete entire document. So i think with the ChangeType.DELETE_ALL we do not need to compare old and new value. please let me know if I am missing something.

Comment on lines +234 to +238
deletedTsCheckForCreate.append(" = :aspect").append(i);
if (i != classNames.size() - 1) {
deletedTsCheckForCreate.append(", ");
}
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for loop tested in unit testing at all? (is it easy to test?)

Copy link
Copy Markdown
Contributor Author

@kotkar-pallavi kotkar-pallavi Jun 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good callout. this code will be used when more than asset with more than 1 aspect is being in created.
Updated test testCreateAfterAssetMarkedDeleted to create asset with 2 aspects and verify both aspect values with get.

sample vale from unit test:

INSERT INTO metadata_entity_foo (urn, a_urn, lastmodifiedon, lastmodifiedby,a_aspectfoo, a_aspectbar) 
VALUES (:urn, :a_urn, :lastmodifiedon, :lastmodifiedby,:aspect0, :aspect1) 
ON DUPLICATE KEY UPDATE a_aspectfoo = :aspect0, a_aspectbar = :aspect1, deleted_ts = 
    IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL);

@kotkar-pallavi
Copy link
Copy Markdown
Contributor Author

Reviewed about half the files, super busy today, will take a look at the rest in a sec

If this asset was previously marked for deletion and still not purged, updating the new aspect values and update delete_ts will be set to NULL. ref: updated SQL

Does this mean that "if this record was deleted but not removed from the db ("not purged") and we do update(), then mark the record as not deleted"?

Yes, deleted_ts will be set to NULL and only the aspect being updated/created will be updated(!) If there were other aspects previously deleted they will continue to be marked deleted.

  • Does this go through the upsert check as well?

@jphui do you mean the upsert check that is done at the service level based on annotations?

@kotkar-pallavi kotkar-pallavi requested a review from jphui June 20, 2025 23:48
@jphui
Copy link
Copy Markdown
Contributor

jphui commented Jul 7, 2025

Does this go through the upsert check as well?
@jphui do you mean the upsert check that is done at the service level based on annotations?

Yeah more of a sanity check than anything but for cases where upsert is disabled (?) this soft-deletion impl will work properly?

Comment on lines +1345 to +1348
* @return a map of the deleted aspects (their value before deletion), each wrapped in an instance of
* {@link ASPECT_UNION}
*/
protected abstract Map<Class<? extends RecordTemplate>, Optional<? extends RecordTemplate>> permanentDelete(@Nonnull URN urn,
@Nonnull Set<Class<? extends RecordTemplate>> aspectClasses, @Nullable AuditStamp auditStamp,
int maxTransactionRetry, @Nullable IngestionTrackingContext trackingContext, boolean isTestMode);
protected abstract int permanentDelete(@Nonnull URN urn, boolean isTestMode);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update documentation here

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the reasoning behind changing the return type from object to int? what does this help achieve?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now, in permanent delete we will only be marking the asset as deleted by setting deleted_ts=NOW(). actual aspect level (sofl) deletion happens in BaseLocalDao.deleteAll() and the map of aspect objects will be returned from there.

here int is for retuning how many db rows were updated (techincally will olways be 1 since 1row=1asset record).

Comment on lines +664 to +666
// do a get to ensure the urn does not already exist
// if exists and deletedTs is null, then throw an exception
// if exists and deletedTs is not null, then update the deletedTs to null and create records
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the inline doc! is this just explaining what the next line "create" will do?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, describing high level behavior for create.
i added some comments closer to impl in EBeanLocalAccess.create

* @param oldTimestamp old time stamp for optimistic lock checking
* @param ingestionTrackingContext the ingestionTrackingContext of the MCE responsible for calling this update
* @param isTestMode whether the test mode is enabled or not
* @param softDeleteOverwrite whether to overwrite soft deleted aspects marked with $gma_deleted
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this is set to false? What is the expected behavior at the db level?

Could you add some documentation here clarifying this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing out. i've updated documentation.
i am using softDeleteOverwrite to main same old behavior for old schema. softDeleteOverwrite=false for old schema since we dont have deleted_ts on old schema.

all soft deletion related code is in new schema only since we have added the deleted_ts column only in new schema tables.

Comment on lines +83 to +86
// deleted_ts check on create Statement SQL. This is used to set the deleted_ts to a non-null value
// If a record that is NOT marked for deletion is attempted to be created again, an exception will be thrown
public static final String DELETED_TS_DUPLICATE_KEY_CHECK = "ON DUPLICATE KEY UPDATE ";
public static final String DELETED_TS_CONDITIONAL_VALUE_SET = ", deleted_ts = IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL);";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah this might be the answer to my earlier question about the interaction with upsert?


private static final String SQL_UPDATE_ASPECT_TEMPLATE_WITH_SOFT_DELETE_OVERWRITE =
"UPDATE %s SET %s = :metadata, lastmodifiedon = :lastmodifiedon, lastmodifiedby = :lastmodifiedby "
+ "WHERE urn = :urn ;";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need single quotes for the urn like before? or is this different because of the :?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

single quotes not needed for this parameterized sql string. we do a sqlUpdate.setParameter("urn", urn.toString())

* @param aspectClass aspect class
* @param isTestMode whether the test mode is enabled or not
* Create Update with optimistic locking SQL statement. The SQL UPDATE use old_timestamp as a compareAndSet to check
* if the current update is made on an unchange record. For example: UPDATE table WHERE modifiedon = :oldTimestamp.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the current update is made on an unchange record

What does this mean? That the update will not be made if the record is unchanged?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

umm not sure. the comment is not changed by me.

Comment on lines +311 to +317
if (softDeleteOverwrite) {
return String.format(urnExtraction ? SQL_UPDATE_ASPECT_WITH_URN_TEMPLATE_WITH_SOFT_DELETE_OVERWRITE
: SQL_UPDATE_ASPECT_TEMPLATE_WITH_SOFT_DELETE_OVERWRITE, tableName, columnName, columnName, columnName);
} else {
return String.format(urnExtraction ? SQL_UPDATE_ASPECT_WITH_URN_TEMPLATE : SQL_UPDATE_ASPECT_TEMPLATE, tableName,
columnName, columnName, columnName);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on the differences, SOFT_DELETE does not set deleted_ts to NULL.

Is this basically saying that if softDeleteOverwrite is enabled, it will "not un-delete" something by NOT modifying the deleted_ts column?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am using softDeleteOverwrite to differentiate between updates in old and new schema.
i have added deleted_ts column in all new schema tables.

since we have migrated away from old schema i am keeping the previous impl as is for old schema. old schema does not have deleted_ts because it was like a versioned log. we can clean up this conditional when we remove all the old schema code.

// deleted_ts check on create Statement SQL. This is used to set the deleted_ts to a non-null value
// If a record that is NOT marked for deletion is attempted to be created again, an exception will be thrown
public static final String DELETED_TS_DUPLICATE_KEY_CHECK = "ON DUPLICATE KEY UPDATE ";
public static final String DELETED_TS_CONDITIONAL_VALUE_SET = ", deleted_ts = IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL);";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does CAST('DuplicateKeyException' AS UNSIGNED) do?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am using it to throw an exception from the IF block.
IF deleted_ts in not null (asset not deleted) and we try to Create asset with same urn, then throw an exception.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test for test Create before asset marked deleted to test the "throws error if create() is run on something that has deleted_ts = null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor

@jphui jphui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing comments and clarifying!

This is a relatively major change, make sure to give a heads up to the team for what commits to metadata-models contain this :)

Will do my best to take into account how this will affect (if at all) my soft-delete impl (#534)

@kotkar-pallavi kotkar-pallavi merged commit 8f17ced into linkedin:master Jul 8, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants