Description
Currently each commit involves a write of transaction entries to the transaction log(s). This is accomplished with a write mutex surrounding a writev call to the transaction log file. Based on my performance testing, I would estimate that the writev calls are taking about 5μs each. Because these calls are made while holding the mutex, they must be performed sequentially (regardless of how many threads we are using), which limits our transaction log throughput to roughly 200K writes / second (1 second / 5μs per write). There are some applications with write requirements closer to a million writes / second, so eventually improving this performance would be beneficial.
One possible optimization is to move the writev call outside the write mutex, and only "stage" the writes in the mutex. We would need to eliminate the O_APPEND flag and do writes with exact offsets so that multiple writev calls could be done in parallel. However, I would guess this is simply going to move the contention into the kernel, and may not significantly improve performance (although possibly worth trying).
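As a rough illustration of the "stage in the mutex, write outside it" idea, here is a minimal C++ sketch. The names `StagedLog`, `stage`, `writeAt`, and `openLogWithoutAppend` are hypothetical and not taken from the existing code, and it assumes a platform that provides `pwritev` (Linux and the BSDs do). The file is opened without O_APPEND because on Linux a positional write to an O_APPEND file still appends to the end, ignoring the requested offset.

```cpp
#include <fcntl.h>     // open
#include <sys/uio.h>   // pwritev, struct iovec
#include <unistd.h>    // off_t, ssize_t, pread, close
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical helper: open the log for positional writes (note: no O_APPEND).
inline int openLogWithoutAppend(const char* path) {
    return open(path, O_RDWR | O_CREAT, 0644);
}

// Hypothetical sketch: only the offset reservation is serialized.
class StagedLog {
public:
    explicit StagedLog(int fd, off_t start = 0) : fd_(fd), next_(start) {
        assert(fd_ >= 0);
    }

    // Stage: reserve a byte range under the mutex. No I/O happens here.
    off_t stage(size_t size) {
        std::lock_guard<std::mutex> lock(mutex_);
        off_t offset = next_;
        next_ += static_cast<off_t>(size);
        return offset;
    }

    // Write the entries at a previously reserved offset, outside the mutex.
    // Returns true only if every byte was written in a single call; a real
    // implementation would need to retry short writes and respect IOV_MAX.
    bool writeAt(off_t offset, const std::vector<std::string>& entries) const {
        std::vector<iovec> iov;
        size_t total = 0;
        for (const auto& e : entries) {
            iov.push_back({const_cast<char*>(e.data()), e.size()});
            total += e.size();
        }
        ssize_t n = pwritev(fd_, iov.data(), static_cast<int>(iov.size()), offset);
        return n == static_cast<ssize_t>(total);
    }

private:
    int fd_;
    std::mutex mutex_;
    off_t next_;
};
```

Each committing thread would call `stage` with the total size of its entries and then `writeAt` with the returned offset. Whether this helps in practice depends on how the kernel serializes concurrent writes to the same file, which is the concern raised above.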
I believe it is more likely that higher performance could be achieved by batching parallel transaction writes. Here is a rough outline of a proposed commit process with transaction log write batching:
- At the start of the commit, add all of the transaction entries to be written to a queue in TransactionLogStore (using a mutex for this queue). Record the position in the transaction log at which these entries will end once they are written (computed from the size of these entries and of the entries already queued).
- Perform the RocksDB commit.
- Check to see if the transaction log has written this commit's queued entries. If so, commit is done.
- If not, acquire write mutex.
- Again, check whether this commit's queued entries have been written; if so, the commit is done. Otherwise, construct an iovec array of all queued entries (from this commit and any other commits that have entries in the store queue), and remove the transferred entries from the queue. Then perform the writev call for all the queued entries.
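
The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual TransactionLogStore: the names `BatchedLogSketch`, `enqueue`, `isWritten`, `ensureWritten`, `commitWithBatching`, and the `writevCalls` counter are all hypothetical, the RocksDB commit is represented by a caller-supplied callback, and error handling is reduced to a boolean.

```cpp
#include <fcntl.h>     // open (callers open the log file)
#include <sys/uio.h>   // writev, struct iovec
#include <unistd.h>    // ssize_t, pread, close
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

class BatchedLogSketch {
public:
    explicit BatchedLogSketch(int fd) : fd_(fd) { assert(fd_ >= 0); }

    // Step 1: queue the entries and return the log position at which they
    // will end once everything queued so far has been written.
    uint64_t enqueue(std::vector<std::string> entries) {
        std::lock_guard<std::mutex> lock(queueMutex_);
        for (auto& e : entries) {
            queuedEnd_ += e.size();
            queue_.push_back(std::move(e));
        }
        return queuedEnd_;
    }

    bool isWritten(uint64_t end) const {
        return writtenEnd_.load(std::memory_order_acquire) >= end;
    }

    // Steps 3-5: check, take the write mutex, re-check, then write every
    // queued entry (from any commit) with a single writev call.
    bool ensureWritten(uint64_t end) {
        if (isWritten(end)) return true;              // another commit wrote it
        std::lock_guard<std::mutex> writeLock(writeMutex_);
        if (isWritten(end)) return true;              // written while we waited
        std::vector<std::string> batch;
        {
            std::lock_guard<std::mutex> lock(queueMutex_);
            batch.swap(queue_);                       // drain the whole queue
        }
        std::vector<iovec> iov;
        size_t total = 0;
        for (auto& e : batch) {
            iov.push_back({const_cast<char*>(e.data()), e.size()});
            total += e.size();
        }
        // Not handled here: short writes, EINTR, and batches above IOV_MAX.
        ssize_t n = writev(fd_, iov.data(), static_cast<int>(iov.size()));
        ++writevCalls;
        if (n != static_cast<ssize_t>(total)) return false;
        writtenEnd_.fetch_add(total, std::memory_order_release);
        return true;
    }

    int writevCalls = 0;  // illustration only; updated under the write mutex

private:
    int fd_;
    std::mutex queueMutex_;
    std::vector<std::string> queue_;
    uint64_t queuedEnd_ = 0;
    std::mutex writeMutex_;
    std::atomic<uint64_t> writtenEnd_{0};
};

// The commit flow: queue, run the (stand-in) RocksDB commit, then make sure
// the queued entries have reached the log.
template <typename CommitFn>
bool commitWithBatching(BatchedLogSketch& log, std::vector<std::string> entries,
                        CommitFn rocksdbCommit) {
    uint64_t end = log.enqueue(std::move(entries));
    rocksdbCommit();
    return log.ensureWritten(end);
}
```

In this sketch, a thread that reaches `ensureWritten` while another thread holds the write mutex will usually find its entries already written by the time it acquires the mutex, so the number of writev calls can be much smaller than the number of commits. Only the holder of the write mutex drains the queue, which keeps the on-disk order identical to the queue order.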