@scotwells
Contributor

Summary

This PR further improves platform-wide query performance, changes the ClickHouse audit log schema to use the request's received timestamp as the audit log timestamp, adds support for filtering by the user's UID, and changes the user-scoped projection to use the user's UID instead of the username.

Details

  • Filter by User's UID - Filtering by UID narrows results to a specific user through a stable identifier rather than an email address, which the user can change. Only users of the platform have UIDs; internal components that authenticate with certificates do not. This gives us a clean way of filtering internal components out of audit logs.
  • Request Received Timestamp - The audit log's timestamp now comes from the .requestReceivedTimestamp field, since that is when the apiserver received the request. The collection pipeline uses .stageTimestamp to calculate delays in the pipeline, because that timestamp indicates when the apiserver generated the audit log.
  • User UID for user scope - The user scope now uses the user's UID as the filtering / sorting column when querying the audit log system, since the UID is the stable identifier for the user and is the value that's provided in the user's extra information.
  • Hourly timestamp buckets - Updated all projections to use the same hourly time bucketing introduced in fix: optimize performance for platform-wide querying #23.
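
As a rough illustration of the UID filtering described above, the sketch below works on audit events represented as plain dictionaries with a `user.uid` field. The helper names and the sample events are hypothetical and are not taken from this PR; the actual filtering happens in the query layer.

```python
from typing import Iterable, List, Optional


def user_uid(event: dict) -> Optional[str]:
    """Return the UID recorded on the event's user, or None when it is absent or empty."""
    uid = event.get("user", {}).get("uid")
    return uid or None


def filter_by_uid(events: Iterable[dict], uid: str) -> List[dict]:
    """Keep only the events made by the user with the given UID."""
    return [e for e in events if user_uid(e) == uid]


def platform_users_only(events: Iterable[dict]) -> List[dict]:
    """Drop events from callers without a UID (assumed here to be internal components)."""
    return [e for e in events if user_uid(e) is not None]


# Hypothetical sample data: one platform user and one certificate-authenticated component.
events = [
    {"auditID": "a1", "user": {"username": "jane@example.com", "uid": "uid-123"}},
    {"auditID": "a2", "user": {"username": "system:internal-component"}},
]

jane_events = filter_by_uid(events, "uid-123")
user_events = platform_users_only(events)
```

Because the match is on the UID, the first event would still be found if the user later changed their email address.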

Relates to datum-cloud/enhancements#536

When a user requests their own audit logs using a user-scoped
query, the system should search audit logs by the user's UID
instead of the username, because the username can change over time.

This change introduces a new projection that takes advantage of a new
`user_uid` field that was introduced in the table schema to support
performant queries that filter for a specific user ID.

I've also adjusted the projections to use the same hourly bucketing
approach used on the main table to see if we can get some additional
performance benefits.
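
To make the hourly bucketing idea concrete, here is a minimal sketch that truncates a timezone-aware timestamp to the start of its hour in UTC and groups records by that bucket. It only illustrates the concept; the function names are hypothetical, and the actual bucketing expression used in the table and projections is defined in the schema, not here.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def hour_bucket(ts: datetime) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its hour, in UTC."""
    return ts.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def group_by_hour(timestamps: Iterable[datetime]) -> Dict[datetime, List[datetime]]:
    """Group timestamps by hourly bucket so rows from the same hour sit together."""
    buckets: Dict[datetime, List[datetime]] = defaultdict(list)
    for ts in timestamps:
        buckets[hour_bucket(ts)].append(ts)
    return dict(buckets)


# Hypothetical timestamps: two in the 21:00 hour, one in the 22:00 hour.
samples = [
    datetime(2026, 1, 15, 21, 8, 42, tzinfo=timezone.utc),
    datetime(2026, 1, 15, 21, 59, 59, tzinfo=timezone.utc),
    datetime(2026, 1, 15, 22, 0, 0, tzinfo=timezone.utc),
]
grouped = group_by_hour(samples)
```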

This change allows users to filter audit log results by a user's UID
with the `user.uid` field.

The `requestReceivedTimestamp` indicates when the request was received
by the apiserver. This timestamp is what we should use as the audit log
timestamp since it indicates when the request was made by the client.

The `stageTimestamp` indicates when the stage of the request generated
the audit log entry which happens after the request is received. We
still use the `stageTimestamp` to measure delays in the collection
pipeline so we know how long it takes for a record to be collected from
the apiserver and sent to ClickHouse.
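
The split between the two timestamps can be sketched as follows. This assumes the timestamps are RFC 3339 strings and that the pipeline records its own ingestion time; `ingested_at`, the helper names, and the sample event are hypothetical and not part of this change.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an RFC 3339 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def audit_timestamp(event: dict) -> datetime:
    """Timestamp of the audit log record: when the apiserver received the request."""
    return parse_ts(event["requestReceivedTimestamp"])


def collection_delay(event: dict, ingested_at: datetime) -> timedelta:
    """Pipeline delay: time from audit entry generation to ingestion."""
    return ingested_at - parse_ts(event["stageTimestamp"])


# Hypothetical event: the entry is generated 250 ms after the request is received.
event = {
    "requestReceivedTimestamp": "2026-01-15T21:08:00.000000Z",
    "stageTimestamp": "2026-01-15T21:08:00.250000Z",
}
ingested_at = datetime(2026, 1, 15, 21, 8, 2, 250000, tzinfo=timezone.utc)

record_time = audit_timestamp(event)
delay = collection_delay(event, ingested_at)
```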
@scotwells scotwells merged commit 67ce864 into main Jan 15, 2026
4 checks passed
@scotwells scotwells deleted the feat/scope-user-by-id branch January 15, 2026 21:08