
[server][improve] Add WAL cache to optimize replication. #794

Open
dao-jun wants to merge 8 commits into oxia-db:main from dao-jun:dev/add_wal_cache

Conversation

@dao-jun (Contributor) commented Oct 27, 2025

Add a WAL LogEntry cache to improve replication.
Bypass the page cache and eliminate deserialization overhead for tailing reads of the WAL.

Under ideal conditions, the oxia_server_wal_read_latency_milliseconds_sum can be 0.
(metrics screenshot)
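To illustrate the idea (a minimal sketch only, with hypothetical names such as entryCache, not the PR's actual code): the WAL keeps a small in-memory cache of recently appended, already-decoded entries keyed by offset, so a tailing reader can serve the entry straight from memory instead of going through the page cache and decoding it again.

```go
// Minimal sketch (hypothetical names, not the PR's code): a small FIFO cache
// of recently appended log entries, keyed by offset. The writer populates it
// on append; a tailing reader checks it before falling back to the segment
// file on disk.
package wal

import "sync"

type LogEntry struct {
	Offset int64
	Value  []byte
}

type entryCache struct {
	mu      sync.RWMutex
	entries map[int64]*LogEntry // offset -> decoded entry
	order   []int64             // insertion order, used for FIFO eviction
	maxSize int
}

func newEntryCache(maxSize int) *entryCache {
	return &entryCache{entries: make(map[int64]*LogEntry), maxSize: maxSize}
}

// Put stores a freshly appended entry, evicting the oldest one when full.
func (c *entryCache) Put(e *LogEntry) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if len(c.order) > 0 && len(c.order) >= c.maxSize {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.entries, oldest)
	}
	c.entries[e.Offset] = e
	c.order = append(c.order, e.Offset)
}

// Get returns the cached entry for the offset, or nil when the reader has
// fallen behind the cached window and must read the segment file instead.
func (c *entryCache) Get(offset int64) *LogEntry {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.entries[offset]
}
```

A follower that tails the log right behind the writer should almost always hit such a cache, which is what drives the read-latency metric toward zero.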

Testing the WAL via wal-perf:
before: (benchmark screenshot)
after: (benchmark screenshot)

The read/write throughput increases by about 32%.

Signed-off-by: dao-jun <daojun@apache.org>
@merlimat (Collaborator) left a comment:


That's a cool addition, just a couple of comments.

)

const (
	logEntryCacheSize int = 32
@merlimat (Collaborator) commented on this diff:

It would be better if we could set a weigher and use a maximum number of bytes instead of a number of entries here.

@dao-jun (Contributor, Author) replied:

Resolved. But I don't understand what "set a weigher" means; could you please explain in detail?
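For readers following the thread: a "weigher" is the common cache-library term (e.g. Guava and Caffeine call it a weigher, ristretto calls it cost) for a function that assigns each entry a size, so the cache is bounded by total bytes rather than by entry count. A minimal hand-rolled sketch of the idea, not the library call actually used in the PR:

```go
// Sketch of byte-weighted eviction (hypothetical code, not the PR's cache):
// the "weight" of each cached entry is the size of its payload, and eviction
// removes the oldest entries until the total stays under a byte budget.
package wal

import "sync"

type weightedEntryCache struct {
	mu         sync.Mutex
	entries    map[int64][]byte // offset -> serialized entry
	order      []int64          // insertion order, for FIFO eviction
	totalBytes int64
	maxBytes   int64
}

func newWeightedEntryCache(maxBytes int64) *weightedEntryCache {
	return &weightedEntryCache{entries: make(map[int64][]byte), maxBytes: maxBytes}
}

func (c *weightedEntryCache) Put(offset int64, payload []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[offset] = payload
	c.order = append(c.order, offset)
	c.totalBytes += int64(len(payload))
	// Evict from the head until the cache fits the configured byte budget.
	for c.totalBytes > c.maxBytes && len(c.order) > 0 {
		oldest := c.order[0]
		c.order = c.order[1:]
		if old, ok := c.entries[oldest]; ok {
			c.totalBytes -= int64(len(old))
			delete(c.entries, oldest)
		}
	}
}
```

With this scheme a cap like 2 MiB costs the same memory whether entries are large or small, which is harder to guarantee with a fixed entry count such as 32.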

Signed-off-by: dao-jun <daojun@apache.org>
@dao-jun (Contributor, Author) commented Oct 28, 2025

(wal-perf benchmark screenshot)

Perf test after addressing the review comments; the performance is still OK.

@mattisonchao (Member) left a comment:


LGTM +1

It would be better if you could consider this.

Oxia is a sharded system, so we need to think more about cache memory control. Currently every shard has its own WAL, and I am not sure if or when we will go for a sharding WAL (IMO we should, to avoid the mmap cost of many open segments), but either way we should pay attention to the cost.

  • We need a global cache to avoid using shards_num * 2MB (100 shards = 200 MiB, 1,000 shards ~ 2 GiB). This would still be very useful when we migrate to a sharding WAL.
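One way to picture the suggestion (a hypothetical sketch, not code from this PR): a single process-wide cache with one global byte budget, keyed by (shardID, offset), so the memory cost stays fixed no matter how many shards the server hosts.

```go
// Hypothetical sketch of a server-wide cache shared by all shards' WALs,
// keyed by (shardID, offset). Byte-budget eviction would work exactly like
// the weighted sketch earlier in this thread and is omitted here.
package wal

import "sync"

type shardOffset struct {
	ShardID int64
	Offset  int64
}

type SharedEntryCache struct {
	mu      sync.RWMutex
	entries map[shardOffset][]byte
}

var (
	sharedOnce  sync.Once
	sharedCache *SharedEntryCache
)

// Shared returns the single process-wide instance that every shard's WAL
// writer and tailing reader would use, instead of one cache per shard.
func Shared() *SharedEntryCache {
	sharedOnce.Do(func() {
		sharedCache = &SharedEntryCache{entries: make(map[shardOffset][]byte)}
	})
	return sharedCache
}

func (c *SharedEntryCache) Get(shardID, offset int64) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.entries[shardOffset{ShardID: shardID, Offset: offset}]
	return v, ok
}
```

The arithmetic in the comment above is the motivation: with per-shard caches, memory grows linearly with the shard count, while a shared cache keeps it at one fixed budget.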

@dao-jun (Contributor, Author) commented Oct 28, 2025 (quoting the comment above):

LGTM +1

It would be better if you could consider this.

Oxia is a sharded system, so we need to think more about cache memory control. Currently every shard has its own WAL, and I am not sure if or when we will go for a sharding WAL (IMO we should, to avoid the mmap cost of many open segments), but either way we should pay attention to the cost.

  • We need a global cache to avoid using shards_num * 2MB (100 shards = 200 MiB, 1,000 shards ~ 2 GiB). This would still be very useful when we migrate to a sharding WAL.

I've considered this. Even with a future single-WAL instance, we would still strive to distribute memory evenly among the shards; otherwise there will still be cache penetration, which is not much different from the current implementation.

@mattisonchao (Member) commented Oct 28, 2025 (quoting the reply above):

I've considered this. Even with a future single-WAL instance, we would still strive to distribute memory evenly among the shards; otherwise there will still be cache penetration, which is not much different from the current implementation.

Well... after thinking about it more, I don't think the write cache can help us very much in this case, because reads always happen after the data sync. If the write traffic is very large, the 2 MiB buffer will never work as expected.

Indeed, your benchmark shows some improvement, but that logic is different from Oxia's. Let me change some of the logic to make it match Oxia's implementation. Plus, you could also use your implementation in cluster benchmarking to see if there is any improvement in oxia_server_wal_read_latency_milliseconds_sum.
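To put rough, illustrative numbers on that concern (assumed figures, not from the PR): at a sustained write rate of 100 MB/s, a 2 MiB buffer holds only about 20 ms worth of entries, so any reader that lags the writer by more than roughly 20 ms misses the cache and falls back to the segment files.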
