Add interface to add client data rpc #846
Conversation
(force-pushed bf220c9 to b069b44)
```cpp
// becoming a leader.
RD_LOGD(NO_TRACE_ID, "become_leader_cb: setting traffic_ready_lsn from {} to {}", current_gate, new_gate);
m_listener->on_become_leader(m_group_id);
```
Just curious about the purpose of on_become_xxx. Should we move it to the beginning of become_xxx_cb so that the upper-layer callback is called first, similar to how handle_commit does?
on_become_leader and on_become_follower will be used for homeobject scrubbing. While scrubbing is ongoing, if a leader switch happens, the old leader becomes a follower, so on_become_follower is called on that node, where we stop the scrubbing thread that requests scrub results from the other two members. One of the followers becomes the leader, so on_become_leader is called on that node, where we read the scrub superblk and start the scrubbing thread to request scrub results from the other two members.
> Should we move it to the beginning
No need to do this now. What we want is to make sure we are notified of the leader-change event; it does not matter whether the callback is invoked at the beginning or the end.
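
For illustration, a minimal sketch of a homeobject-side listener consuming these hooks. Only the on_become_leader/on_become_follower callbacks come from this PR; ScrubManager, its start/stop methods, and the exact signatures are assumed for the sketch:

```cpp
// Hypothetical sketch of how homeobject could react to the role-change hooks.
// ScrubManager, start_scrubbing and stop_scrubbing are illustrative names.
class ScrubAwareListener : public homestore::ReplDevListener {
public:
    // Called on the node that just won the election: read the scrub superblk
    // and start the thread that requests scrub results from the other members.
    void on_become_leader(homestore::group_id_t const& group_id) override {
        m_scrub_mgr.start_scrubbing(group_id);
    }

    // Called on the old leader after it steps down: stop the scrubbing
    // thread so it no longer requests results from peers.
    void on_become_follower(homestore::group_id_t const& group_id) override {
        m_scrub_mgr.stop_scrubbing(group_id);
    }

private:
    ScrubManager m_scrub_mgr;
};
```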
Can we do a POC regarding the required communication primitives? We can do the POC of HO in an HS UT (which makes it easier to implement ReplDevListener); that would give the team a better understanding of the requirements.
I am now working on the POC code with this change to see whether it is workable.
(force-pushed b069b44 to 2ee228f)
Codecov Report ❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #846      +/-  ##
==========================================
- Coverage   56.51%   48.48%    -8.03%
==========================================
  Files         108      110        +2
  Lines       10300    12772     +2472
  Branches     1402     6137     +4735
==========================================
+ Hits         5821     6193      +372
+ Misses       3894     2493     -1401
- Partials      585     4086     +3501
```

☔ View full report in Codecov by Sentry.
(force-pushed d8fa192 to 0915f3e)

(force-pushed 0915f3e to 5ec0d54)
This PR adds three APIs to repl_dev (exposing the nuraft_mesg data service to the upper layer). I have written a POC to test this in the homeobject layer, where the deserialized VSMs are transferred through the data channel and then compared with the leader's copy. Please see: you can build homestore using this PR to create a local homestore conan repo, then use the following branch to build homeobject, and then run homestore_scrubber_test to see how it works. @xiaoxichen @Besroy any suggestions for adding these three APIs?
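
A hedged sketch of the sending side of that POC, assuming NullAsyncResult behaves like a folly::SemiFuture; the request name and buffer are made-up values, and the interface itself is quoted in the diff below:

```cpp
// Illustrative only: "scrub_result" and cli_buf are hypothetical POC values,
// and the chaining style assumes NullAsyncResult is folly-future-like.
repl_dev->data_request_unidirectional(dest, "scrub_result", cli_buf)
    .deferValue([](auto&& result) {
        // On failure, result carries the (currently nuraft-typed) error
        // discussed later in this thread; on success the VSM was shipped
        // over the data channel for comparison against the leader's copy.
    });
```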
```cpp
                              data_service_request_handler_t const& request_handler) = 0;

// send a unidirectional data rpc to dest with request_name and cli_buf
virtual nuraft_mesg::NullAsyncResult data_request_unidirectional(nuraft_mesg::destination_t const& dest,
```
As a generic interface we probably cannot use nuraft_mesg::XXXX; we can use either replica_member_info or replica_id_t.
```cpp
using destination_t = std::variant< peer_id_t, role_regex, svr_id_t >;
```

destination_t supports several types of input params, which is more adaptive; maybe another upper layer (not homeobject) wants to use svr_id_t or role_regex.
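
A self-contained sketch of how such a variant destination dispatches; the three aliases below are stand-ins re-declared for illustration, not the real nuraft_mesg definitions:

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <type_traits>
#include <variant>

// Stand-in aliases; the real definitions live in nuraft_mesg.
using peer_id_t = std::string;                    // e.g. a member uuid
using svr_id_t = int32_t;                         // raft server id
enum class role_regex { LEADER, FOLLOWER, ALL };  // role-based routing

using destination_t = std::variant< peer_id_t, role_regex, svr_id_t >;

// Resolve whichever alternative the caller supplied.
void route(destination_t const& dest) {
    std::visit(
        [](auto const& d) {
            using T = std::decay_t< decltype(d) >;
            if constexpr (std::is_same_v< T, peer_id_t >) {
                std::cout << "route to peer " << d << "\n";
            } else if constexpr (std::is_same_v< T, role_regex >) {
                std::cout << "route by role\n";
            } else {
                std::cout << "route to server id " << d << "\n";
            }
        },
        dest);
}

int main() {
    route(peer_id_t{"member-uuid"});
    route(role_regex::LEADER);
    route(svr_id_t{2});
    return 0;
}
```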
We can re-encapsulate or redefine a type, that is fine... but not expose a nuraft type; it breaks the boundary.
HO doesn't need to care whether HS uses nuraft_mesg for data_request_unidirectional, or its own grpc, or http, tcp, ftp, whatever.
Agreed, and actually I have already tried to expose a clean API, but it is a little complicated.

Coming to (http, tcp, ftp, whatever): when a data transfer fails, we need to tell the upper layer the detailed failure info (why it failed). So IMO there is no need for such a high-level re-encapsulation. Since we actually use grpc to transfer data in the bottom layer, when a grpc error happens we should return this error to the upper layer so that it has a clear idea of why grpc failed.

However, in nuraft_mesg, data_service_request_bidirectional never returns a grpc error; it always converts a grpc error to a nuraft error with grpc_status_to_nuraft_code, and I am confused about the necessity of that. As a result, at the homestore level, when an error happens we can only get a nuraft-typed error from data_service_request_bidirectional. So if we do encapsulation here, we have three choices:

1. Do not re-encapsulate; return the nuraft error directly as it is.

2. Add a new function to convert the nuraft error back to a grpc error, but this will lose some detailed information. For example:

```cpp
case ::grpc::StatusCode::INVALID_ARGUMENT:
case ::grpc::StatusCode::UNIMPLEMENTED:
case ::grpc::StatusCode::UNAUTHENTICATED:
case ::grpc::StatusCode::PERMISSION_DENIED:
case ::grpc::StatusCode::RESOURCE_EXHAUSTED:
case ::grpc::StatusCode::OUT_OF_RANGE:
    return nuraft::cmd_result_code::BAD_REQUEST;
```

If we then want to convert nuraft::cmd_result_code::BAD_REQUEST back to a grpc error, which grpc status should we choose?

3. Add a coarse conversion that lumps all the nuraft error types under a single umbrella error type with no detailed info; then the upper layer can only know the transfer failed, but has no idea why.

For now I chose option 1, although it is not ideal.
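
To make the option-2 ambiguity concrete, a hypothetical reverse converter (not part of nuraft_mesg); since six grpc statuses collapse into BAD_REQUEST above, the reverse direction has to pick one arbitrarily:

```cpp
// Hypothetical reverse mapping, illustrating the information loss of option 2.
::grpc::StatusCode nuraft_code_to_grpc_status(nuraft::cmd_result_code code) {
    switch (code) {
    case nuraft::cmd_result_code::BAD_REQUEST:
        // INVALID_ARGUMENT, UNIMPLEMENTED, UNAUTHENTICATED, PERMISSION_DENIED,
        // RESOURCE_EXHAUSTED and OUT_OF_RANGE were all folded into BAD_REQUEST,
        // so any single choice here is a guess; the original detail is gone.
        return ::grpc::StatusCode::INVALID_ARGUMENT;
    default:
        return ::grpc::StatusCode::UNKNOWN;
    }
}
```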
Well, each layer is expected to lose some information. For example, when you call an HTTP service and get a 500, how do you expect to know the detailed error behind the 500? Is it that the GW cannot talk to Nudata, or that the GW cannot reach DM?
Another example: you get EIO when your disk IO fails; the specific device error code is hidden.
When an underlying error is important, then it is time to create a specific error mapping for that error. We can add a log line in the error converter if needed.
It is even fine if we use the same type and directly return what upstream returns; just don't directly expose the upstream type.
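
A sketch of that "same information, our own type" approach; the homestore-side enum and converter are hypothetical names, and the LOGWARN macro is assumed to be the usual sisl-style logger:

```cpp
// Hypothetical homestore-owned error type: callers see only this enum and
// never include a nuraft_mesg header, keeping the layer boundary intact.
namespace homestore {
enum class data_rpc_err { OK, BAD_REQUEST, TIMEOUT, CANCELLED, UNKNOWN };

inline data_rpc_err to_data_rpc_err(nuraft::cmd_result_code code) {
    switch (code) {
    case nuraft::cmd_result_code::OK:          return data_rpc_err::OK;
    case nuraft::cmd_result_code::BAD_REQUEST: return data_rpc_err::BAD_REQUEST;
    case nuraft::cmd_result_code::TIMEOUT:     return data_rpc_err::TIMEOUT;
    case nuraft::cmd_result_code::CANCELLED:   return data_rpc_err::CANCELLED;
    default:
        // Log the raw upstream code so the detail is not lost entirely.
        LOGWARN("data rpc failed with upstream code {}", static_cast< int >(code));
        return data_rpc_err::UNKNOWN;
    }
}
} // namespace homestore
```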
```cpp
bool RaftReplDev::add_data_rpc_service(std::string const& request_name,
                                       data_service_request_handler_t const& request_handler) {
    return m_msg_mgr.bind_data_service_request(request_name, m_group_id, request_handler);
```
Not sure; I am kind of concerned this will be a problem during shutdown: HO will be destructed first and then the handler becomes invalid.
It is not major atm.
This is a good point, but actually we call homestore::shutdown in HSHomeObject::shutdown(), which means HomeObject is not destructed until homestore shutdown returns.
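
A minimal sketch of that ordering, with illustrative bodies and an assumed HomeStore singleton accessor:

```cpp
// Sketch: HomeObject members (and thus any state captured by a registered
// rpc handler) stay alive until homestore::shutdown() returns, because the
// shutdown is sequenced inside HSHomeObject::shutdown() before destruction.
void HSHomeObject::shutdown() {
    // Blocks until homestore quiesces; after this no data-service request
    // can reach a handler registered via add_data_rpc_service.
    homestore::HomeStore::instance()->shutdown();
    // Only now may HSHomeObject itself be destructed safely.
}
```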
Expose nuraft_mesg to the upper layer so that it can register its own rpc calls by calling bind_data_service_request.
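
A hedged usage sketch of that registration path through the new repl_dev interface; the request name and handler body are hypothetical, and the handler signature is assumed to match data_service_request_handler_t:

```cpp
// Illustrative registration of an upper-layer rpc through the new interface.
// "scrub_result" and the lambda body are made up for this sketch.
bool ok = repl_dev->add_data_rpc_service(
    "scrub_result",
    [](sisl::io_blob const& incoming_buf, auto& rpc_data) {
        // Deserialize incoming_buf and act on it; rpc_data carries the
        // server-side rpc context (signature assumed from nuraft_mesg).
    });
```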