feat: task logs for database operations#36
Conversation
mmols
left a comment
There was a problem hiding this comment.
This is working well for me - just a few questions.
| return nil, fmt.Errorf("failed to list tasks: %w", err) | ||
| } | ||
| for _, t := range tasks { | ||
| // We want to leave the delete task in place so that API clients can |
There was a problem hiding this comment.
I wonder if our default behavior should be to leave these tasks in the system, and handle task cleanup through a separate function / story where a retention is configurable.
If the user deletes a database, the current situation is that there is little context / logs other than the main Control Plane log.
Is there a technical reason to not keep older tasks around?
There was a problem hiding this comment.
I did this because it felt like a strange user experience to see all of the old tasks if you recreate a database with the same identifier - especially since there's no other mechanism to remove them. That said, I might be biased because I reuse identifiers in order to make development easier. I don't know if that will be common for end-users.
If the user deletes a database, the current situation is that there is little context / logs other than the main Control Plane log.
The way this works is that the delete task is left in place, but all other tasks are removed. You're still able to list and fetch tasks for the database, but you'll only see the delete task.
If you then recreate the database with the same ID, the delete task is removed before the database is created.
Is there a technical reason to not keep older tasks around?
The only technical reason to clean these up is that they consume Etcd capacity, which is limited. But, these don't take up much space relative to the workflow records, so I mainly did this for user experience.
There was a problem hiding this comment.
I thought more about this, and I've changed it so that tasks are not deleted when the database is deleted. They are still deleted on create, so it will clear out old tasks if you reuse a database identifier.
There was a problem hiding this comment.
That seems like a good compromise - I tested this and it meshes with what I'd expect. I think re-using the identifier won't be common for most users (compared to omitting it and having it generated).
| return a.TaskSvc.AddLogLine(ctx, databaseID, taskID, msg) | ||
| } | ||
|
|
||
| msg := fmt.Sprintf("%s %s on host %s", verb, resource.Identifier(), a.Config.HostID) |
There was a problem hiding this comment.
I think we can retrieve the resource ID once and reuse it in multiple places within the function, instead of calling resource.Identifier() repeatedly. What do you think?"
There was a problem hiding this comment.
Thanks! I'm refactoring this a little bit for the structured logging change, so I'll make sure to include this.
There was a problem hiding this comment.
I did this as part of the change to structured logging in ea72b00.
| func (a *Activities) LogTaskEvent(ctx context.Context, input *LogTaskEventInput) (*LogTaskEventOutput, error) { | ||
| logger := activity.Logger(ctx).With("database_id", input.DatabaseID.String()) | ||
| logger.Info("updating database state") | ||
|
|
There was a problem hiding this comment.
We could add an early return to skip processing if len(input.Messages) == 0 to avoid unnecessary work. What do you think?
There was a problem hiding this comment.
Sure! That makes sense.
There was a problem hiding this comment.
I've added this check to the logTaskEvent function that invokes this activity in 1950037.
Also renames `initiate-database-backup` to `backup-database-node` to be consistent with other operations and to leave room for a potential `backup-database` operation. PLAT-33
Modifies database operation workflows to take a task ID and update the task status during processing. PLAT-33
Adds some already-existing fields to the task APIs and an `include_timestamp` option to the task log API. PLAT-33
Our task logs have a separate `timestamp` field, so disabling these timestamps will make the task logs easier to consume consistently. PLAT-33
92172f9 to
bba8a1b
Compare
Summary
With this PR, each database operation has an associated task that can be tracked via the API. I've modified each operation endpoint to return a task alongside their normal response.
I've divided it up into more granular commits that can (hopefully!) be reviewed individually.
Details
In order to make this behavior consistent for deletes, the delete task is not deleted when a database is deleted, and it can still be fetched via API after the database is deleted. This way, API clients wont' get a 404 if they're slow to request the delete task, and they can track the delete process through to completion. The delete task will, however, get deleted if you recreate a new database with the same ID.
The backup and restore operations already had tasks associated with them, so I've introduced the concept of "parent tasks". Those existing tasks now have a
parent_idthat's set to the main operation's task ID.I've also added optional timestamps to the task log API. You can enable them with a new
include_timestampsparameter.Since I was already making so many API changes (including adding a response type for the backup operation) I renamed
initiate-database-backuptobackup-database-nodeto be consistent with the other endpoints. This leaves room for a potentialbackup-databaseendpoint that backs up all nodes.Remember to run
restish api syncfor each control plane server to get the latest API changes in your CLI.PLAT-33