
Conversation

@Mantisus
Collaborator

@Mantisus Mantisus commented Nov 11, 2025

Description

  • This PR adds new buffer tables to improve the handling of metadata records. The key change is that metadata updates are now accumulated in a buffer and applied only when get_metadata is called. With the old behavior, metadata records were updated instantly within a transaction, which led to waiting for locks to be released in high-concurrency situations.
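The buffering idea can be sketched as follows. This is a minimal illustration using sqlite3, not the PR's actual SQLAlchemy models; the table and column names (dataset_metadata, metadata_buffer, item_count, delta) are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_metadata (id TEXT PRIMARY KEY, item_count INTEGER)")
conn.execute("CREATE TABLE metadata_buffer (dataset_id TEXT, delta INTEGER)")
conn.execute("INSERT INTO dataset_metadata VALUES ('ds1', 0)")

def record_item(dataset_id: str) -> None:
    # Fast append-only insert; no contention on the metadata row itself.
    conn.execute("INSERT INTO metadata_buffer VALUES (?, 1)", (dataset_id,))

def get_metadata(dataset_id: str) -> int:
    # Compact the accumulated deltas into the metadata row only on read.
    conn.execute(
        "UPDATE dataset_metadata SET item_count = item_count + "
        "(SELECT COALESCE(SUM(delta), 0) FROM metadata_buffer WHERE dataset_id = ?) "
        "WHERE id = ?",
        (dataset_id, dataset_id),
    )
    conn.execute("DELETE FROM metadata_buffer WHERE dataset_id = ?", (dataset_id,))
    conn.commit()
    return conn.execute(
        "SELECT item_count FROM dataset_metadata WHERE id = ?", (dataset_id,)
    ).fetchone()[0]

for _ in range(5):
    record_item("ds1")
print(get_metadata("ds1"))  # → 5
```

Writers only ever append to the buffer table, so they never queue up behind a lock on the single metadata row.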

Issues

@Mantisus Mantisus self-assigned this Nov 11, 2025
Collaborator

@vdusek vdusek left a comment


Please use a PR type without an exclamation mark

@Mantisus Mantisus changed the title perf!: Optimize metadata records processing in 'SqlStorageClient` perf: Optimize metadata records processing in 'SqlStorageClient` Nov 11, 2025
@Mantisus Mantisus marked this pull request as ready for review November 17, 2025 14:03
@Mantisus Mantisus requested review from janbuchar and vdusek November 17, 2025 14:03
@janbuchar
Collaborator

Interesting! I'd imagine that transactions consisting of e.g., an insertion to the dataset_items table and an update to dataset metadata wouldn't lock the metadata table for that long - you can commit right after the update to metadata.

Also, the buffering approach is faster because the buffer table gets a row for each increment and those get compacted later on, correct?

@Mantisus
Collaborator Author

update to dataset metadata wouldn't lock the metadata table for that long

They will create many short-lived locks, and with a large number of clients inserting new records under high concurrency, this effect accumulates.
This is exactly what @ericvg97 pointed out - #1533 (comment)

Although, of course, the strongest impact is on RequestQueue.

Yes, insert operations into the buffer table are quite fast. We can then simply apply the aggregated result to update the metadata record.
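The compaction step amounts to one aggregate query over the buffered rows instead of many per-row updates on the metadata record. A hypothetical illustration (the schema and field names are assumptions, not the PR's actual tables; counters are summed, timestamps take the latest value):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rq_buffer (queue_id TEXT, handled_delta INTEGER, accessed_at TEXT)"
)
# Three buffered increments for the same request queue.
rows = [
    ("q1", 1, "2025-11-11T10:00"),
    ("q1", 1, "2025-11-11T10:05"),
    ("q1", 0, "2025-11-11T10:02"),
]
conn.executemany("INSERT INTO rq_buffer VALUES (?, ?, ?)", rows)

# One aggregation replaces three separate UPDATEs of the metadata row.
total, latest = conn.execute(
    "SELECT SUM(handled_delta), MAX(accessed_at) FROM rq_buffer WHERE queue_id = ?",
    ("q1",),
).fetchone()
print(total, latest)  # → 2 2025-11-11T10:05
```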

@Mantisus Mantisus changed the title perf: Optimize metadata records processing in 'SqlStorageClient` perf: Optimize metadata records processing in SqlStorageClient Nov 18, 2025
@janbuchar
Collaborator

update to dataset metadata wouldn't lock the metadata table for that long

They will create many short-lived locks, and with a large number of clients inserting new records under high concurrency, this effect accumulates. This is exactly what @ericvg97 pointed out - #1533 (comment)

Although, of course, the strongest impact is on RequestQueue.

I see, thanks. And is there any chance that the lock is held for too long because of how we work with SQLAlchemy? In other words, would it be better if we just executed SQL such as insert ...; update ...; commit in one go? If yes, it might be worth trying before adding three new tables to the whole thing.
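The alternative being suggested can be sketched as keeping the existing schema but committing immediately after the metadata update, so the metadata-row lock is held as briefly as possible. A minimal sqlite3 sketch with hypothetical table names, not the project's real code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset_items (dataset_id TEXT, data TEXT)")
conn.execute("CREATE TABLE dataset_metadata (id TEXT PRIMARY KEY, item_count INTEGER)")
conn.execute("INSERT INTO dataset_metadata VALUES ('ds1', 0)")
conn.commit()

def push_item(dataset_id: str, data: str) -> None:
    # insert ...; update ...; commit as one short transaction.
    conn.execute("INSERT INTO dataset_items VALUES (?, ?)", (dataset_id, data))
    conn.execute(
        "UPDATE dataset_metadata SET item_count = item_count + 1 WHERE id = ?",
        (dataset_id,),
    )
    conn.commit()  # releases the metadata-row lock right away

push_item("ds1", "a")
push_item("ds1", "b")
print(
    conn.execute("SELECT item_count FROM dataset_metadata WHERE id = 'ds1'").fetchone()[0]
)  # → 2
```

Even with an immediate commit, every writer still serializes on the single metadata row, which is the contention the buffer-table approach avoids.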

@Mantisus
Collaborator Author

it might be worth trying before adding three new tables to the whole thing.

I will test this approach.



Development

Successfully merging this pull request may close these issues.

Updating request queue metadata performs full table scan in SQL storage
