Skip to content

Replace LOCK TABLE statements with data-modifying CTEs in the stored procedures used by the PostgreSQL storage backend#3757

Closed
robstradling wants to merge 1 commit intogoogle:masterfrom
robstradling:replace_table_locks_with_data_modifying_ctes
Closed

Replace LOCK TABLE statements with data-modifying CTEs in the stored procedures used by the PostgreSQL storage backend#3757
robstradling wants to merge 1 commit intogoogle:masterfrom
robstradling:replace_table_locks_with_data_modifying_ctes

Conversation

@robstradling
Copy link
Contributor

The LOCK TABLE statement pairs in the two PostgreSQL functions at https://github.com/google/trillian/blob/master/storage/postgresql/schema/storage.sql#L163 are intended to ensure that each function call is an atomic operation, which is necessary for the duplicates detection in both functions to be guaranteed to work correctly. However, with our new logs that use the Trillian PostgreSQL storage backend, we've encountered a number of situations where database sessions seem to be deadlocking due to these LOCK TABLE statements. This is presumably because two consecutive LOCK TABLE statements are not an atomic operation.

In search of a better solution, TIL about data-modifying CTEs. The PR rewrites each of the two functions as a single SQL statement with data-modifying CTEs. Since each SQL statement is an atomic operation, this approach achieves the desired result without the same timing-related risk of locking other sessions.

Checklist

…procedures used by the PostgreSQL storage backend
@roger2hk
Copy link
Contributor

/gcbrun

@roger2hk
Copy link
Contributor

As this change involves both the database stored procedures and temp table create queries, is there any deployment procedure? Should the database stored procedures be updated before the application deployment?

@robstradling
Copy link
Contributor Author

I'm abandoning this PR. I was wrong that "database sessions seem to be deadlocking". Further research found that the problem was "dangling transactions" - i.e., where a transaction is left open on the DB even after the client has gone away. PR #3758 (also abandoned) describes my efforts to deal with that problem.

Also, although data-modifying CTEs are atomic in the sense of holding a consistent view of the database throughout each step of the CTE, I found that they don't force a consistent view of the relevant tables onto other sessions. This is the purpose of the explicit LOCK TABLE statements in the two postgres functions, and this table locking is necessary to ensure that the duplicates detection works correctly.

For my use case (CT), I realised that adding an optimisation for single-leaf batches would really help performance (see PR #3769).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants