- 
                Notifications
    You must be signed in to change notification settings 
- Fork 12k
RIP 20 Streaming Tiered Storage Optimize
Current State: Proposed
Shepherds: [email protected]
Mailing List Discussion: [email protected]
Pull Request: #PR_NUMBER
Released: <released_version>
Correlations: RIP-18 Metadata management architecture upgrade,RIP-11 Evolution of The Next Decade Architecture for RocketMQ, Thought of The Evolution of The Next Decade Architecture for RocketMQ
As user business increase ten times at peek times, such as breaking event happens/shopping festival, and decrease after the peek time to regular, which may lead to some issues
- We have to expand brokers resource in case of storage capacity bottleneck.
- We have to expand storage capacity in case of broker cpu bottleneck.
- After expand brokers/storage, we cant shrink these resources, which lead 9 times resource waste.
- Benchmark test dledger broker-cluster, when leader reach cpu bottleneck, follower only utilize 10% of cpu resource.
- If store one topic 1 year messages, all topics store 1 year
- If store one topic 1 year, local storage is expensive
- If 1 queue consume 100 times, lead to the leader node bottleneck, while other node idle
- When create new queues, it may choose the busiest broker
Implement the next generation rocketmq architechture, make the boundaries of compute and storage layer specific, support stream and tiered storage.
We will not directly implement multi-raft protocol on RocketMQ.
Nothing specific.
New architecture graph:
- Make clear compute and storage layer interface, focus on compute logic, such as transaction, delay, filter
- Handle produce/consume/admin(rocketmq_protocol) api
- Stateless
- Use the state-of-the-art Angelia rpc call storage read/write api
- Support dynamic add and shrink broker node
- Support read replica customized, crack hot queue read imblance problem
- Implement storage api for compute node
- Multi-raft implement
- Support hdfs/S3/GCP
- Support dynamic add and shrink storage node
- Storage support multi-tenant, like 100w queue
- Support topic-level retention time
- Hash(queue) to a assigned raft group, which broker determine to call the queue-leader node
- Detect storage node failure, notify to broker update queueTostorageNode map
- Implement a intellgent module, which use machine learning to decide new queue assign to which node, and detect unhealthy node.
- Support multi communication protocol such as tcp, http2
- Support multi serilization protocol such as json, Avro
compute layer
SendMessageProcessor#asyncSendMessage
    this.brokerController.getMessageStore().asyncPutMessage(msgInner);
storage layer
DefaultMessageStore#putMessage
    1)commitLog storage
	CommitLog#asyncPutMessage
		mappedFile.appendMessage(msg, this.appendMessageCallback);
    2)Dledger storage
    DLedgerCommitLog#asyncPutMessage
        #io.openmessaging.storage.dledger.DLedgerServer
        dLedgerServer.handleAppend(request);         
            dLedgerStore.appendAsLeader(dLedgerEntry);
            return dLedgerEntryPusher.waitAck(resEntry, false);
    
    3)5.0 seperate storage
    SeperateStorage#asyncPutMessage
        client:appendMessage()  #wait for leader and follower write half nodes
Public interface MessageStore {
#basic api
putMessage(final MessageExtBrokerInner msg)
flush()
getMessage(final long offset, final int size)
...
#startup and cache for storage node
load()
...
#failover
recover(long maxPhyOffsetOfConsumeQueue)
...
#manage api
getMaxOffset()
...
for si
- 
Are backward and forward compatibility taken into consideration? the old producer/consumer/admin protocol not change, remains backward compatible 
- 
Are there deprecated APIs? Remove deprecated pull consumer apis. 
- How does alternatives solve the issue you proposed?
Use bookkeeper as stream storage engine
- Pros and Cons of alternatives
The advantages and disadvantages of using bookkeeper are as follows
Advantage: Simple implementation and short development cycle
Disadvantages: bookkeeper use zookkeeper store ledger metadata, The introduction of third-party components requires maintenance of two systems, add maintaince complexity
The introduction of third-party components is always avoided by RocketMQ, so the proposal of using bookkeeper(which depends zookkeeper) as metadata storage is not adopted.
Copyright © 2016~2022 The Apache Software Foundation.
- Home
- RocketMQ Improvement Proposal
- User Guide
- Community
