Merged
Conversation
Untested though Also make the rpc_client generation makefile code easier to run
Open
Open
ChaosManager at Actor TBD Co-authored-by: Brian Sajeev <Brian-1402@users.noreply.github.com>
Co-authored-by: Brian Sajeev <Brian-1402@users.noreply.github.com>
satyamjay-iitd
previously requested changes
Aug 29, 2025
| Json(actor_info): Json<RemoteActorInfo>, | ||
| ) -> impl IntoResponse { | ||
| state | ||
| .clone() |
Owner
There was a problem hiding this comment.
is cloning required here?
node/src/rpc.rs
Outdated
| post, | ||
| path = "/stop_actor", | ||
| responses( | ||
| (status = 200, description = "Actor stop initiated") |
Owner
There was a problem hiding this comment.
Also return 404 for actor not found
Separate Chaos Endpoints for cleaner chaos request types Probability & Factor Toml validation Co-authored-by: Brian Sajeev <Brian-1402@users.noreply.github.com>
Chaos Refactored into apply/revert types
added delay chaos
Also formatting and clippy satisfaction
* added docs workflow * Update docs.yml * Update docs.yml #skipci * Update docs.yml [skip ci] * Update docs.yml [skip ci] * added browser based job controller * updated ui * formatting * Update api (#49) * updated generated api clients * clippy * Added #actor macro to register the actor (#50) * updated generated api clients * clippy * updated api to get list of registerd apis * wip ui * added how_to.md * added register actor macro
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR for chaos features in #41. Core logic done. Review and small fixes remain.
Progress:
General fixes:
404 Actor Not FoundQOF fixes:
Some notes/questions to resolve:
job_manager::NodeHandle. is that necessary? A clone ofNodeHandleis being made for each actor crashing async task, can this be avoided?stop_job()is executed before background running chaos timers are complete, what happens?stop_actornode endpoint takesRemoteActorInfoas input, but only actor name is really required to crash actor.Design thoughts for actor restarts and timed starts
ChaosEvent's of the respective actor from chaos schedule of the node -> Does the actor rx being closed handle this implicitly? -> Is it better to not remove theChaosEvents and have the node ignore them if the actor is not found?ChaosEvents. Receive chaos endpoint 404 and ignore.RemoteActorInfoisn't removed fromNodeHandle.actorsduring crashes, so we can reuse that info.SpawnArgsinsideNodeHandlesomewhere, mapped byactor_name.start_actorendpoint of the node? or do we callplacefunction of theNodeHandle?start_actor. Remaining code inplace()is for setting up chaos etc. which doesnt change with crashes.job_manager::NodeHandle.actors, will need for restart maybe.node::actor_control_loop()::local_actorsupdation for normal spawning of actors, there's no checking of another existing actor of same name. it just overrides that hashmap entry. This in general could be bad coz undefined behavior and wrong input not being handled. But could be useful if we want user to specify restarting of an actor but with different payload/settings. So should decide what the behavior is for crash and restart.notify_remote_actor_addedendpoint? Would we have to create anotify_remote_actor_removedendpoint as well when actor is stopped?notify_remote_actor_removedendpoint: It seems like it would be better to not have this endpoint and for that matter, not update any nodes state about remote actors being crashed. Let nodes have stale info about the remote actors.ActorUnreachable(Will allow for testing FT, network partitions etc).notify_remote_actor_addedendpoint here eitherChaosEventagain for the restarted actor? (If we had not removed theChaosEventof the actor then it should be there already)ChaosEventduring a crash.JobControllerstarts?