-
Notifications
You must be signed in to change notification settings - Fork 22
topological order for deletes #439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
topological order for deletes #439
Conversation
heyronhay
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of suggestions!
datadog_sync/utils/resource_utils.py
Outdated
| sorter.prepare() | ||
| # If prepare() succeeds, no cycles | ||
| return None | ||
| except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like an overly broad catch, is there a more specific error we can catch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call, fixed to pull in specifically the CycleError
| await r_class._send_action_metrics("delete", _id, Status.FAILURE.value) | ||
| self.config.logger.error(f"error deleting resource {resource_type} with id {_id}: {str(e)}") | ||
| finally: | ||
| # Mark as done in cleanup sorter if it exists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to test if it was actually deleted successfully first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally yes, but I think that's a bigger restructure. Calling done(q_item) frees up the sorter so it doesn't hang forever. If we have a dashboard->monitor->slo dependency chain and the dashboard fails to get deleted then the calls to delete the monitor and slo will still happen (and will both fail). That's no different from today. To actively skip those other delete calls we need to start tracking every dependency chain I think. It would be better of course, just a bigger change.
What does this PR do?
This fix is to delete items in the correct order. We've always created them in the correct order, for example if a Dashboard depends on an SLO being there we create the SLO before creating the Dashboard. The opposite wasn't true though, we didn't always delete the Dashboard before deleting the SLO.
Description of the Change
Topological sort for deleting things.