[ReplicatedLoglet] Implement remote sequencer find tail #2017
Changes from 1 commit
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -20,15 +20,17 @@ use tokio::sync::{mpsc, Mutex, OwnedSemaphorePermit, Semaphore}; | |
|
||
use restate_core::{ | ||
network::{ | ||
- rpc_router::{RpcRouter, RpcToken}, | ||
+ rpc_router::{RpcError, RpcRouter, RpcToken}, | ||
NetworkError, NetworkSendError, Networking, Outgoing, TransportConnect, WeakConnection, | ||
}, | ||
task_center, ShutdownError, TaskKind, | ||
}; | ||
use restate_types::{ | ||
config::Configuration, | ||
- logs::{metadata::SegmentIndex, LogId, LogletOffset, Record}, | ||
- net::replicated_loglet::{Append, Appended, CommonRequestHeader, SequencerStatus}, | ||
+ logs::{metadata::SegmentIndex, LogId, LogletOffset, Record, SequenceNumber, TailState}, | ||
+ net::replicated_loglet::{ | ||
+ Append, Appended, CommonRequestHeader, GetSequencerInfo, SequencerStatus, | ||
+ }, | ||
replicated_loglet::ReplicatedLogletParams, | ||
GenerationalNodeId, | ||
}; | ||
|
@@ -205,6 +207,76 @@ where | |
|
||
Ok(connection) | ||
} | ||
|
||
/// Attempts to find the tail. | ||
/// | ||
/// This first tries to find the tail by synchronizing with the sequencer. If that fails | ||
/// due to the sequencer not being reachable, it immediately tries to find the tail by | ||
/// querying an f-majority of log servers. | ||
pub async fn find_tail(&self) -> Result<TailState<LogletOffset>, OperationError> { | ||
// try to sync with sequencer | ||
if self.sync_sequencer_tail().await.is_ok() { | ||
return Ok(*self.known_global_tail.get()); | ||
} | ||
|
||
// otherwise we need to try to fetch this from the log servers. | ||
self.sync_log_servers_tail().await?; | ||
Ok(*self.known_global_tail.get()) | ||
} | ||
|
||
/// Synchronize known_global_tail with the sequencer | ||
async fn sync_sequencer_tail(&self) -> Result<(), NetworkError> { | ||
let result = self | ||
.sequencers_rpc | ||
.info | ||
.call( | ||
&self.networking, | ||
self.params.sequencer, | ||
GetSequencerInfo { | ||
header: CommonRequestHeader { | ||
log_id: self.log_id, | ||
loglet_id: self.params.loglet_id, | ||
segment_index: self.segment_index, | ||
}, | ||
}, | ||
) | ||
.await | ||
Comment on lines +232 to +243
What happens if the request or response gets lost? Would
Yeah, that's definitely something that needs to be handled. I have paused working on this draft PR in favour of #2019 and wasn't sure if I should continue on this one. |
||
.map(|incoming| incoming.into_body()); | ||
|
||
let info = match result { | ||
Ok(info) => info, | ||
Err(RpcError::Shutdown(shutdown)) => return Err(NetworkError::Shutdown(shutdown)), | ||
Err(RpcError::SendError(err)) => return Err(err.source), | ||
}; | ||
|
||
match info.header.status { | ||
SequencerStatus::Ok => { | ||
// update header info | ||
if let Some(offset) = info.header.known_global_tail { | ||
self.known_global_tail.notify_offset_update(offset); | ||
} | ||
} | ||
SequencerStatus::Sealed => { | ||
self.known_global_tail.notify( | ||
true, | ||
info.header | ||
.known_global_tail | ||
.unwrap_or(LogletOffset::INVALID), | ||
); | ||
} | ||
_ => { | ||
unreachable!() | ||
Maybe state why this case is unreachable. Maybe also don't use the wildcard. That way we'll see the places where a newly
Good point. Thank you :) |
||
} | ||
}; | ||
|
||
Ok(()) | ||
} | ||
|
||
/// Fallback for when the sequencer is not available: | ||
/// tries to sync known_global_tail with an f-majority of log servers. | ||
async fn sync_log_servers_tail(&self) -> Result<(), OperationError> { | ||
todo!() | ||
} | ||
Comment on lines +277 to +279
That's where #2019 comes into play, right? |
||
} | ||
|
||
/// RemoteSequencerConnection represents a single open connection | ||
|
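The review suggestion on the `SequencerStatus` match (explain why the remaining case cannot happen, and avoid the wildcard) could be sketched roughly as follows. This is a standalone illustration, not the PR's code: `Status`, its `Gone` and `Malformed` variants, `TailUpdate`, `SyncError`, `INVALID_OFFSET`, and `interpret` are all hypothetical stand-ins, and the sketch returns an error for unexpected statuses rather than calling `unreachable!()`, which is a different choice from the one in the diff.

```rust
/// Hypothetical stand-in for `SequencerStatus`. Only `Ok` and `Sealed`
/// appear in the diff; `Gone` and `Malformed` are invented for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Ok,
    Sealed,
    Gone,
    Malformed,
}

/// Hypothetical description of what the caller should do with the tail.
#[derive(Debug, PartialEq, Eq)]
enum TailUpdate {
    /// Loglet is open; update the offset if the sequencer reported one.
    Open(Option<u32>),
    /// Loglet is sealed at the given offset.
    Sealed(u32),
}

#[derive(Debug, PartialEq, Eq)]
enum SyncError {
    UnexpectedStatus(Status),
}

/// Hypothetical placeholder for `LogletOffset::INVALID`.
const INVALID_OFFSET: u32 = 0;

fn interpret(status: Status, known_global_tail: Option<u32>) -> Result<TailUpdate, SyncError> {
    match status {
        Status::Ok => Ok(TailUpdate::Open(known_global_tail)),
        Status::Sealed => Ok(TailUpdate::Sealed(
            known_global_tail.unwrap_or(INVALID_OFFSET),
        )),
        // Every remaining variant is listed explicitly instead of using `_`,
        // so adding a new variant to `Status` makes this match fail to
        // compile until the new case is handled here.
        Status::Gone | Status::Malformed => Err(SyncError::UnexpectedStatus(status)),
    }
}
```

With the variants spelled out, the compiler points at this match whenever the status enum grows, which is the benefit the reviewer describes.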
Is there value in racing these two variants?
I am not sure I get your question here.
Whether we should run both variants for obtaining the known global tail in parallel/concurrently?
Thank you for the clarification. Yeah, that's definitely a good idea.
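To make the racing idea from this thread concrete, here is a minimal sketch that runs two lookups concurrently and takes the first one that succeeds. It is an assumption-laden illustration only: the PR's code is async and would more likely race futures on its runtime, whereas this sketch uses plain `std` threads and a channel so that it stays self-contained. `race_find_tail` and its closure parameters are hypothetical names that do not appear in the diff.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `primary` and `fallback` concurrently and returns the first
/// successful result. If both fail, returns whichever error arrived last.
fn race_find_tail<F, G, T, E>(primary: F, fallback: G) -> Result<T, E>
where
    F: FnOnce() -> Result<T, E> + Send + 'static,
    G: FnOnce() -> Result<T, E> + Send + 'static,
    T: Send + 'static,
    E: Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    let tx_fallback = tx.clone();

    // A send fails only if the receiver is gone, which happens when the
    // other variant already won the race; that failure is safe to ignore.
    thread::spawn(move || {
        let _ = tx.send(primary());
    });
    thread::spawn(move || {
        let _ = tx_fallback.send(fallback());
    });

    let mut last_err = None;
    // The channel closes once both senders are dropped, so this loop
    // observes at most two results.
    for result in rx {
        match result {
            Ok(value) => return Ok(value),
            Err(err) => last_err = Some(err),
        }
    }
    Err(last_err.expect("both workers exited without sending a result"))
}
```

In this sketch a failure of one variant does not end the race; the caller only sees an error when both variants fail. The losing thread is not cancelled and simply runs to completion, which an async version could avoid by dropping the losing future.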