Updating method for create(), update() and delete() to handle soft delete by setting deleted_ts#549
Conversation
…lete by setting deleted_ts
| * @return a collection of the deleted aspects (their value before deletion), each wrapped in an instance of | ||
| * {@link ASPECT_UNION} | ||
| */ | ||
| protected Collection<ASPECT_UNION> deleteCommon(@Nonnull URN urn, |
There was a problem hiding this comment.
deleting this, because reusing the delete() implementation for deleting all aspects (this)
instead of doing Hard Asset deletion (deleting entire row)
| // Given: changeLog is disabled | ||
| assertFalse(_enableChangeLog); | ||
| // When: updateWithOptimisticLocking is called | ||
| try { |
There was a problem hiding this comment.
removed these tests for New Schema, since now we will overwrite the aspect value during create even if it was marked as soft deleted using $gma_deleted.
jphui
left a comment
There was a problem hiding this comment.
Reviewed about half the files, super busy today, will take a look at the rest in a sec
If this asset was previously marked for deletion and still not purged, updating the new aspect values and update delete_ts will be set to NULL. ref: updated SQL
Does this mean that "if this record was deleted but not removed from the db ("not purged") and we do update(), then mark the record as not deleted"?
- Does this go through the upsert check as well?
| } catch (NullPointerException e) { | ||
| log.warn("Aspect {} for urn {} does not exist", aspectClass.getName(), urn); | ||
| } |
There was a problem hiding this comment.
Will this throw on both:
- aspect class is not a supported aspect of the asset type?
- aspect class is empty in the db?
There was a problem hiding this comment.
-
i think aspect class not supported aspect of the asset type will throw a ModelValidationException from here: SQLSchemaUtils.java#L225
-
it will handle aspect class empty in db.
this scenario is possible for delete asset - user does not specify any aspect names, at service level we populate all aspect class names for delete call.
| results.forEach((key, value) -> { | ||
| // Check if aspect value present to avoid null pointer exception | ||
| DeleteResult deleteResult = new DeleteResult(value, key); | ||
| deletedAspects.add(unwrapDeleteResult(urn, deleteResult, auditStamp, trackingContext, ChangeType.DELETE_ALL)); |
There was a problem hiding this comment.
Few questions / checks:
- seems there's a TODO related to this: https://jarvis.corp.linkedin.com/codesearch/result/?name=KafkaMetadataEventProducerUtil.java&path=metadata-models%2Fmetadata-dao-impl%2Fkafka-producer%2Fsrc%2Fmain%2Fjava%2Fcom%2Flinkedin%2Fmetadata%2Fdao%2Fproducer&reponame=linkedin-multiproduct%2Fmetadata-models#99
- are there any remaining AI's here that could affect this impl?
- I notice the implementation for
unwrapDeleteResult()is different for the emission: https://jarvis.corp.linkedin.com/codesearch/result/?name=BaseLocalDAO.java&path=datahub-gma%2Fdao-api%2Fsrc%2Fmain%2Fjava%2Fcom%2Flinkedin%2Fmetadata%2Fdao&reponame=linkedin%2Fdatahub-gma#819
- is it OK there is no equality check?
There was a problem hiding this comment.
-
I think that TODO is more for ChangeType.DELETE_ALL vs DELETE. This was related to previous Delete impl when we were doing a hard delete. I will remove that TODO, it is not relevant anymore.
-
We are using
unwrapDeleteResultto send events to HS to delete entire document. So i think with theChangeType.DELETE_ALLwe do not need to compare old and new value. please let me know if I am missing something.
| deletedTsCheckForCreate.append(" = :aspect").append(i); | ||
| if (i != classNames.size() - 1) { | ||
| deletedTsCheckForCreate.append(", "); | ||
| } | ||
| } |
There was a problem hiding this comment.
Is this for loop tested in unit testing at all? (is it easy to test?)
There was a problem hiding this comment.
good callout. this code will be used when more than asset with more than 1 aspect is being in created.
Updated test testCreateAfterAssetMarkedDeleted to create asset with 2 aspects and verify both aspect values with get.
sample vale from unit test:
INSERT INTO metadata_entity_foo (urn, a_urn, lastmodifiedon, lastmodifiedby,a_aspectfoo, a_aspectbar)
VALUES (:urn, :a_urn, :lastmodifiedon, :lastmodifiedby,:aspect0, :aspect1)
ON DUPLICATE KEY UPDATE a_aspectfoo = :aspect0, a_aspectbar = :aspect1, deleted_ts =
IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL);
Yes, deleted_ts will be set to NULL and only the aspect being updated/created will be updated(!) If there were other aspects previously deleted they will continue to be marked deleted.
@jphui do you mean the upsert check that is done at the service level based on annotations? |
Yeah more of a sanity check than anything but for cases where upsert is disabled (?) this soft-deletion impl will work properly? |
| * @return a map of the deleted aspects (their value before deletion), each wrapped in an instance of | ||
| * {@link ASPECT_UNION} | ||
| */ | ||
| protected abstract Map<Class<? extends RecordTemplate>, Optional<? extends RecordTemplate>> permanentDelete(@Nonnull URN urn, | ||
| @Nonnull Set<Class<? extends RecordTemplate>> aspectClasses, @Nullable AuditStamp auditStamp, | ||
| int maxTransactionRetry, @Nullable IngestionTrackingContext trackingContext, boolean isTestMode); | ||
| protected abstract int permanentDelete(@Nonnull URN urn, boolean isTestMode); |
There was a problem hiding this comment.
what's the reasoning behind changing the return type from object to int? what does this help achieve?
There was a problem hiding this comment.
now, in permanent delete we will only be marking the asset as deleted by setting deleted_ts=NOW(). actual aspect level (sofl) deletion happens in BaseLocalDao.deleteAll() and the map of aspect objects will be returned from there.
here int is for retuning how many db rows were updated (techincally will olways be 1 since 1row=1asset record).
| // do a get to ensure the urn does not already exist | ||
| // if exists and deletedTs is null, then throw an exception | ||
| // if exists and deletedTs is not null, then update the deletedTs to null and create records |
There was a problem hiding this comment.
thanks for the inline doc! is this just explaining what the next line "create" will do?
There was a problem hiding this comment.
yes, describing high level behavior for create.
i added some comments closer to impl in EBeanLocalAccess.create
| * @param oldTimestamp old time stamp for optimistic lock checking | ||
| * @param ingestionTrackingContext the ingestionTrackingContext of the MCE responsible for calling this update | ||
| * @param isTestMode whether the test mode is enabled or not | ||
| * @param softDeleteOverwrite whether to overwrite soft deleted aspects marked with $gma_deleted |
There was a problem hiding this comment.
What happens if this is set to false? What is the expected behavior at the db level?
Could you add some documentation here clarifying this?
There was a problem hiding this comment.
thanks for pointing out. i've updated documentation.
i am using softDeleteOverwrite to main same old behavior for old schema. softDeleteOverwrite=false for old schema since we dont have deleted_ts on old schema.
all soft deletion related code is in new schema only since we have added the deleted_ts column only in new schema tables.
| // deleted_ts check on create Statement SQL. This is used to set the deleted_ts to a non-null value | ||
| // If a record that is NOT marked for deletion is attempted to be created again, an exception will be thrown | ||
| public static final String DELETED_TS_DUPLICATE_KEY_CHECK = "ON DUPLICATE KEY UPDATE "; | ||
| public static final String DELETED_TS_CONDITIONAL_VALUE_SET = ", deleted_ts = IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL);"; |
There was a problem hiding this comment.
ah this might be the answer to my earlier question about the interaction with upsert?
|
|
||
| private static final String SQL_UPDATE_ASPECT_TEMPLATE_WITH_SOFT_DELETE_OVERWRITE = | ||
| "UPDATE %s SET %s = :metadata, lastmodifiedon = :lastmodifiedon, lastmodifiedby = :lastmodifiedby " | ||
| + "WHERE urn = :urn ;"; |
There was a problem hiding this comment.
do you need single quotes for the urn like before? or is this different because of the :?
There was a problem hiding this comment.
single quotes not needed for this parameterized sql string. we do a sqlUpdate.setParameter("urn", urn.toString())
| * @param aspectClass aspect class | ||
| * @param isTestMode whether the test mode is enabled or not | ||
| * Create Update with optimistic locking SQL statement. The SQL UPDATE use old_timestamp as a compareAndSet to check | ||
| * if the current update is made on an unchange record. For example: UPDATE table WHERE modifiedon = :oldTimestamp. |
There was a problem hiding this comment.
if the current update is made on an unchange record
What does this mean? That the update will not be made if the record is unchanged?
There was a problem hiding this comment.
umm not sure. the comment is not changed by me.
| if (softDeleteOverwrite) { | ||
| return String.format(urnExtraction ? SQL_UPDATE_ASPECT_WITH_URN_TEMPLATE_WITH_SOFT_DELETE_OVERWRITE | ||
| : SQL_UPDATE_ASPECT_TEMPLATE_WITH_SOFT_DELETE_OVERWRITE, tableName, columnName, columnName, columnName); | ||
| } else { | ||
| return String.format(urnExtraction ? SQL_UPDATE_ASPECT_WITH_URN_TEMPLATE : SQL_UPDATE_ASPECT_TEMPLATE, tableName, | ||
| columnName, columnName, columnName); | ||
| } |
There was a problem hiding this comment.
Based on the differences, SOFT_DELETE does not set deleted_ts to NULL.
Is this basically saying that if softDeleteOverwrite is enabled, it will "not un-delete" something by NOT modifying the deleted_ts column?
There was a problem hiding this comment.
i am using softDeleteOverwrite to differentiate between updates in old and new schema.
i have added deleted_ts column in all new schema tables.
since we have migrated away from old schema i am keeping the previous impl as is for old schema. old schema does not have deleted_ts because it was like a versioned log. we can clean up this conditional when we remove all the old schema code.
| // deleted_ts check on create Statement SQL. This is used to set the deleted_ts to a non-null value | ||
| // If a record that is NOT marked for deletion is attempted to be created again, an exception will be thrown | ||
| public static final String DELETED_TS_DUPLICATE_KEY_CHECK = "ON DUPLICATE KEY UPDATE "; | ||
| public static final String DELETED_TS_CONDITIONAL_VALUE_SET = ", deleted_ts = IF(deleted_ts IS NULL, CAST('DuplicateKeyException' AS UNSIGNED), NULL);"; |
There was a problem hiding this comment.
What does CAST('DuplicateKeyException' AS UNSIGNED) do?
There was a problem hiding this comment.
i am using it to throw an exception from the IF block.
IF deleted_ts in not null (asset not deleted) and we try to Create asset with same urn, then throw an exception.
There was a problem hiding this comment.
Is there a test for test Create before asset marked deleted to test the "throws error if create() is run on something that has deleted_ts = null?
There was a problem hiding this comment.
I have updated the test testCreateDuplicateAsset in EBeanLocalAccessTest : https://github.com/linkedin/datahub-gma/pull/549/files#diff-63658517b46ef7b2ad5215d35c53107f0938dbe4aaf4379d18209496bebc356cR476
jphui
left a comment
There was a problem hiding this comment.
Thanks for addressing comments and clarifying!
This is a relatively major change, make sure to give a heads up to the team for what commits to metadata-models contain this :)
Will do my best to take into account how this will affect (if at all) my soft-delete impl (#534)
This is PR#3 in a series of PRs which will contain code changes to support usage of deleted_ts column in DAO methods for read and write paths.
Prev:
PR#1 - #539
PR#2 - #544
Summary
Testing Done
Checklist