feat: implement retryable state and preserve message context on failure #161
base: master
📝 Walkthrough
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed
✅ Coverage: 88.74% → 89.99% (+1.25%)
- Preserve message_event_id, epoch, and mls_group_id when transitioning to Failed state
- Allow reprocessing of messages in Retryable state
- Update message processing logic to handle retry scenarios
- Fix message state persistence in storage implementations
fd4fc49 to 067ca95
✅ Coverage: 88.74% → 90.68% (+1.94%)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/mdk-core/src/messages.rs (1)
1787-1804: Preserve `epoch`/`mls_group_id` from the existing record when transitioning to Failed.
Right now only `message_event_id` is preserved; `epoch` and `mls_group_id` are overwritten with the current group values. For wrong-epoch/mismatch paths this records the wrong context and defeats the preservation goal. Consider reusing existing values with fallback to the current group.
💡 Suggested pattern (apply to the failure branches above)
- let message_event_id = existing_record.as_ref().and_then(|r| r.message_event_id);
+ let message_event_id = existing_record.as_ref().and_then(|r| r.message_event_id);
+ let mls_group_id = existing_record
+     .as_ref()
+     .and_then(|r| r.mls_group_id.clone())
+     .or_else(|| Some(group.mls_group_id.clone()));
+ let epoch = existing_record
+     .as_ref()
+     .and_then(|r| r.epoch)
+     .or_else(|| Some(group.epoch));
  ...
- epoch: Some(group.epoch),
- mls_group_id: Some(group.mls_group_id.clone()),
+ epoch,
+ mls_group_id,
Also applies to: 1839-1856, 1868-1885, 1897-1914, 1939-1956
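The fallback pattern suggested above can be tried in isolation. The following is a minimal sketch, not the code in `crates/mdk-core`: `GroupId`, `Group`, `ProcessedMessage`, `FailureContext`, and `failure_context` are hypothetical stand-ins, and the integer ID types are assumptions made only to keep the example self-contained.

```rust
// Hypothetical stand-ins for the real MDK types; only the field names
// (message_event_id, epoch, mls_group_id) come from the review comment.
#[derive(Clone, Debug, PartialEq)]
struct GroupId(Vec<u8>);

#[derive(Clone, Debug)]
struct ProcessedMessage {
    message_event_id: Option<u64>, // assumption: the real field holds an event ID type
    epoch: Option<u64>,
    mls_group_id: Option<GroupId>,
}

struct Group {
    epoch: u64,
    mls_group_id: GroupId,
}

// Context that would be written into a Failed record.
#[derive(Debug, PartialEq)]
struct FailureContext {
    message_event_id: Option<u64>,
    epoch: Option<u64>,
    mls_group_id: Option<GroupId>,
}

// Prefer values from the existing record; fall back to the current group
// only when the record is missing or the field was never set.
fn failure_context(existing: Option<&ProcessedMessage>, group: &Group) -> FailureContext {
    let message_event_id = existing.and_then(|r| r.message_event_id);
    let epoch = existing.and_then(|r| r.epoch).or(Some(group.epoch));
    let mls_group_id = existing
        .and_then(|r| r.mls_group_id.clone())
        .or_else(|| Some(group.mls_group_id.clone()));
    FailureContext {
        message_event_id,
        epoch,
        mls_group_id,
    }
}
```

With an existing record from an older epoch, `failure_context` keeps that epoch and group ID; with no record, it records the current group's values and leaves `message_event_id` empty.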
🤖 Fix all issues with AI agents
In `@crates/mdk-core/src/messages.rs`:
- Around line 1615-1656: When marking a Retryable processed_message as
MessageState::Processed (in the retry branch around
processed_message.message_event_id handling), also clear/overwrite its prior
failure metadata: set processed_message.failure_reason to None (or empty) and
update processed_message.processed_at to the current timestamp before calling
self.storage().save_processed_message(processed_message.clone()). Likewise
ensure the corresponding stored message state is updated to
MessageState::Processed (already done via message.state) and persisted with
save_message; use the same timestamp source you use elsewhere (e.g., Utc::now())
so downstream consumers do not see stale failure_reason/processed_at when
ProcessedMessageState::Processed is stored.
🧹 Nitpick comments (1)
crates/mdk-memory-storage/src/messages.rs (1)
751-847: Move `use` statements to module scope (guideline).
In-function imports at Lines 753, 800, and 814 violate the “use statements at top of scope” rule. Consider hoisting them to the test module level.
As per coding guidelines, keep `use` statements at the top of their scope.
♻️ Proposed refactor
 #[cfg(test)]
 mod tests {
     use std::collections::BTreeSet;
     use mdk_storage_traits::groups::GroupStorage;
     use mdk_storage_traits::groups::types::{Group, GroupState};
+    use mdk_storage_traits::messages::error::MessageError;
+    use mdk_storage_traits::messages::types::ProcessedMessage;
     use nostr::Keys;
     use super::*;
@@
     #[test]
     fn test_mark_processed_message_retryable() {
-        use mdk_storage_traits::messages::types::ProcessedMessage;
-
         let storage = MdkMemoryStorage::new();
@@
     #[test]
     fn test_mark_nonexistent_message_retryable_fails() {
-        use mdk_storage_traits::messages::error::MessageError;
-
         let storage = MdkMemoryStorage::new();
@@
     #[test]
     fn test_mark_non_failed_message_retryable_fails() {
-        use mdk_storage_traits::messages::error::MessageError;
-        use mdk_storage_traits::messages::types::ProcessedMessage;
-
         let storage = MdkMemoryStorage::new();
message_types::ProcessedMessageState::Retryable => {
    // Retryable messages are ones that previously failed due to wrong epoch keys
    // but have been marked for retry after a rollback. For our own messages,
    // we should have cached content - try to retrieve and return it.
    tracing::debug!(target: "mdk_core::messages::process_message", "Retrying own message after rollback");

    if let Some(message_event_id) = processed_message.message_event_id
        && let Ok(Some(mut message)) =
            self.get_message(&group.mls_group_id, &message_event_id)
    {
        // Update states to mark as successfully processed
        message.state = message_types::MessageState::Processed;
        self.storage()
            .save_message(message)
            .map_err(|e| Error::Message(e.to_string()))?;

        processed_message.state =
            message_types::ProcessedMessageState::Processed;
        self.storage()
            .save_processed_message(processed_message.clone())
            .map_err(|e| Error::Message(e.to_string()))?;

        tracing::info!(
            target: "mdk_core::messages::process_message",
            "Successfully retried own cached message after rollback"
        );
        let message = self
            .get_message(&group.mls_group_id, &message_event_id)?
            .ok_or(Error::MessageNotFound)?;
        return Ok(MessageProcessingResult::ApplicationMessage(message));
    }

    // No cached content available - this shouldn't happen for our own messages,
    // but if it does, we can't recover
    tracing::warn!(
        target: "mdk_core::messages::process_message",
        "Retryable own message has no cached content - cannot recover"
    );
    Ok(MessageProcessingResult::Unprocessable {
        mls_group_id: group.mls_group_id.clone(),
    })
}
Clear failure metadata when a retry succeeds.
When a Retryable message is marked Processed, the prior failure_reason is left intact (and processed_at stays stale). That leaves contradictory state for downstream consumers.
💡 Suggested fix
- processed_message.state =
- message_types::ProcessedMessageState::Processed;
+ processed_message.state =
+ message_types::ProcessedMessageState::Processed;
+ processed_message.failure_reason = None;
+ processed_message.processed_at = Timestamp::now();
🤖 Prompt for AI Agents
In `@crates/mdk-core/src/messages.rs` around lines 1615 - 1656, When marking a
Retryable processed_message as MessageState::Processed (in the retry branch
around processed_message.message_event_id handling), also clear/overwrite its
prior failure metadata: set processed_message.failure_reason to None (or empty)
and update processed_message.processed_at to the current timestamp before
calling self.storage().save_processed_message(processed_message.clone()).
Likewise ensure the corresponding stored message state is updated to
MessageState::Processed (already done via message.state) and persisted with
save_message; use the same timestamp source you use elsewhere (e.g., Utc::now())
so downstream consumers do not see stale failure_reason/processed_at when
ProcessedMessageState::Processed is stored.
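The cleanup requested in this comment can be sketched outside the codebase. This is an illustrative sketch under stated assumptions: `ProcessedMessage` here is a reduced hypothetical struct, `processed_at` is modeled as plain Unix seconds rather than the project's timestamp type, and `complete_retry` and `now_secs` are made-up helper names.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

#[derive(Clone, Copy, Debug, PartialEq)]
enum ProcessedMessageState {
    Failed,
    Retryable,
    Processed,
}

// Reduced, hypothetical version of the stored record.
#[derive(Clone, Debug)]
struct ProcessedMessage {
    state: ProcessedMessageState,
    failure_reason: Option<String>,
    processed_at: u64, // assumption: Unix seconds stand in for the real timestamp type
}

// Current time in Unix seconds (0 if the system clock is before the epoch).
fn now_secs() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0)
}

// Marks a Retryable record as Processed and drops the stale failure metadata.
// Returns false and leaves the record untouched if it was not Retryable.
fn complete_retry(record: &mut ProcessedMessage, now: u64) -> bool {
    if record.state != ProcessedMessageState::Retryable {
        return false;
    }
    record.state = ProcessedMessageState::Processed;
    record.failure_reason = None;
    record.processed_at = now;
    true
}
```

Passing the timestamp in as a parameter (for example `complete_retry(&mut record, now_secs())`) keeps the transition deterministic in tests; the real code would use whichever timestamp source the crate already uses elsewhere.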
Summary
This PR implements better handling for retryable message states and ensures message context (like `message_event_id`) is preserved when messages fail processing.
Changes
- `process_message` in `mdk-core` now looks up existing records and preserves `message_event_id`, `epoch`, and `mls_group_id` when creating a `Failed` state record.
- `Retryable`.
Reasoning
Previously, failing messages might lose context about their origin, making it difficult to retry them correctly or correlate them with the original send event. This change ensures that even if processing fails, we keep the link to the original message event.
This PR implements message context preservation for failed message processing and introduces a new `Retryable` message state to the MDK protocol. When message processing fails, the code now retains the original `message_event_id`, `epoch`, and `mls_group_id` so that retries and correlation with original send events remain possible. This enables the system to reprocess messages marked as `Retryable` without losing critical metadata.
What changed:
- `Retryable` variant added to `ProcessedMessageState` enum with corresponding string serialization support (mdk-storage-traits).
- `mark_processed_message_retryable()` method to transition failed messages to Retryable state while preserving failure context.
- `NotFound` error variant added to `MessageError` enum for cases where a message does not exist or is not in the expected state (mdk-storage-traits).
API surface:
- `ProcessedMessageState::Retryable` (mdk-storage-traits).
- `MessageStorage::mark_processed_message_retryable(&self, event_id: &EventId) -> Result<(), MessageError>` (mdk-storage-traits).
- `MessageError::NotFound` for missing or incorrectly-state messages (mdk-storage-traits).
Testing:
- `message_event_id`, `epoch`, and `mls_group_id`.
- `mark_processed_message_retryable()` succeeds only for messages in Failed state and returns `NotFound` for missing or non-Failed messages.
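The contract described above (only `Failed` records may become `Retryable`; anything else yields `NotFound`) can be illustrated with a small in-memory sketch. It is not the `mdk-memory-storage` implementation: `MemoryStore` is a hypothetical type, event IDs are plain strings, the method takes `&mut self` instead of `&self`, and the serialized names `"processed"`, `"failed"`, and `"retryable"` are assumptions.

```rust
use std::collections::HashMap;
use std::str::FromStr;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ProcessedMessageState {
    Processed,
    Failed,
    Retryable,
}

impl ProcessedMessageState {
    // Assumed string forms; the real crate may serialize differently.
    fn as_str(&self) -> &'static str {
        match self {
            Self::Processed => "processed",
            Self::Failed => "failed",
            Self::Retryable => "retryable",
        }
    }
}

impl FromStr for ProcessedMessageState {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "processed" => Ok(Self::Processed),
            "failed" => Ok(Self::Failed),
            "retryable" => Ok(Self::Retryable),
            other => Err(format!("unknown state: {other}")),
        }
    }
}

#[derive(Debug, PartialEq)]
enum MessageError {
    NotFound,
}

#[derive(Clone, Debug)]
struct ProcessedMessage {
    state: ProcessedMessageState,
    failure_reason: Option<String>,
}

// Hypothetical in-memory store keyed by event ID (a String here for simplicity).
#[derive(Default)]
struct MemoryStore {
    records: HashMap<String, ProcessedMessage>,
}

impl MemoryStore {
    // Failed -> Retryable, keeping failure_reason and other context intact.
    // Missing records and records in any other state both map to NotFound.
    fn mark_processed_message_retryable(&mut self, event_id: &str) -> Result<(), MessageError> {
        let record = self
            .records
            .get_mut(event_id)
            .ok_or(MessageError::NotFound)?;
        if record.state != ProcessedMessageState::Failed {
            return Err(MessageError::NotFound);
        }
        record.state = ProcessedMessageState::Retryable;
        Ok(())
    }
}
```

Mapping both "missing" and "wrong state" to a single `NotFound` variant mirrors the behavior described in the summary; a caller that needs to distinguish the two cases would have to look the record up separately.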