* instance_manager: Refactor process_block to receive context by mutable reference
* instance_manager: Change 'first_run' variable name to 'should_try_unfail'
* instance_manager: Fix unfail mechanism based on determinism
Before the "PoI for failed subgraphs" feature, where we advance the
deployment head a block if the error is deterministic, we would do
the `unfail` logic after a block is successfully processed, like this:
```rust
match process_block() {
    Ok(needs_restart) => {
        if first_run {
            store.unfail();
        }
        // ...
    }
    // ...
}
```
With this feature the `unfail` mechanism had to change. We changed two
things about it:
- It now runs before `process_block`
- It reverts to the parent block if the error was deterministic
This worked fine for deterministic errors, but it caused an issue for
non-deterministic ones.
For those, we need to run `unfail` after `process_block` succeeds
(returns `Ok`). This is necessary because we can only unfail the
deployment if the subgraph actually advanced past the error block range.
Example with `unfail` as it worked before this commit:
- Subgraph state:
  - failed
  - deployment head at 1399
  - non-deterministic error happened at 1400
- Once the index-node gets restarted, `unfail` would run, but nothing
  would happen since the deployment head only advances after a
  successful `process_block` execution.
- Subgraph would continue to advance its pointer but `unfail` wouldn't
  run anymore (only once at start). This would leave the subgraph in
  the `failed` state while it advanced its pointer normally.
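The stuck state described above can be reproduced with a small toy model.
This is only a sketch: `Deployment`, `Health`, `error_block` and
`run_old_flow` are hypothetical names made up for illustration, and the
condition inside `unfail` is an assumption based on the description
above, not graph-node's real store code.

```rust
// Hypothetical toy model of the pre-commit flow; not graph-node's real types.
#[derive(Debug, PartialEq)]
enum Health {
    Healthy,
    Failed,
}

struct Deployment {
    health: Health,
    head: u64,
    // Block at which the non-deterministic error happened (assumed field).
    error_block: u64,
}

impl Deployment {
    // Assumed rule: the failure can only be cleared once the head has
    // reached the block where the error happened.
    fn unfail(&mut self) {
        if self.head >= self.error_block {
            self.health = Health::Healthy;
        }
    }
}

// Old ordering: `unfail` runs once at startup, before any block is processed.
fn run_old_flow(dep: &mut Deployment, last_block: u64) {
    dep.unfail(); // head is still behind the error block, so this is a NOOP
    while dep.head < last_block {
        dep.head += 1; // stand-in for a successful `process_block`
    }
}
```

Starting from a failed deployment with its head at 1399 and the error at
1400, `run_old_flow` advances the head but never clears the failure.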
To fix the issue, we need to run `unfail` for non-deterministic errors
**after** `process_block` succeeds:
- Subgraph state:
  - failed
  - deployment head at 1399
  - non-deterministic error happened at 1400
- Once the index-node gets restarted, `unfail_determinstic_error` would
  run and be a NOOP, then the next block (1400) would be processed.
  Because the error is non-deterministic, it may no longer occur when
  `process_block` runs and returns `Ok`; then
  `unfail_non_determinstic_error` executes and finally **unfails** the
  subgraph 🙌
- Subgraph continues to advance its pointer and the status is correct
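A minimal sketch of the two-phase ordering described above. Everything
here is a hypothetical toy model: the `Deployment`, `Health` and
`FatalError` types, the method names, the exact conditions inside the
two unfail methods, and resetting `should_try_unfail` after the first
successful block are all assumptions for illustration, not the real
graph-node API.

```rust
// Hypothetical toy model; none of these types mirror graph-node's store API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Health {
    Healthy,
    Failed,
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct FatalError {
    block: u64,
    deterministic: bool,
}

#[derive(Debug)]
struct Deployment {
    health: Health,
    head: u64,
    fatal_error: Option<FatalError>,
}

impl Deployment {
    // Runs before `process_block`. Assumed rule: a deterministic error sits
    // at the current head, so revert to the parent block and clear it.
    // For any other state this is a NOOP.
    fn unfail_deterministic(&mut self) {
        if let Some(err) = self.fatal_error {
            if err.deterministic && err.block == self.head {
                self.head -= 1;
                self.health = Health::Healthy;
                self.fatal_error = None;
            }
        }
    }

    // Runs after `process_block` returned `Ok`. Assumed rule: clear a
    // non-deterministic error once the head has reached its block.
    fn unfail_non_deterministic(&mut self) {
        if let Some(err) = self.fatal_error {
            if !err.deterministic && self.head >= err.block {
                self.health = Health::Healthy;
                self.fatal_error = None;
            }
        }
    }

    // Stand-in for real block processing; in this toy it always succeeds.
    fn process_block(&mut self, block: u64) -> Result<(), String> {
        self.head = block;
        Ok(())
    }
}

fn run(dep: &mut Deployment, last_block: u64) {
    dep.unfail_deterministic();
    let mut should_try_unfail = true;
    while dep.head < last_block {
        let next = dep.head + 1;
        match dep.process_block(next) {
            Ok(()) => {
                if should_try_unfail {
                    dep.unfail_non_deterministic();
                    should_try_unfail = false;
                }
            }
            Err(_) => break,
        }
    }
}
```

With the state from the example (failed, head at 1399, non-deterministic
error at 1400), `run` processes block 1400, clears the failure right
after that first `Ok`, and then keeps advancing with a correct status.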
* deployment_store: Abstract common logic into a single function
* store: Add proper testing for all unfailing situations
* instance_manager: Improve comment for unfailing deterministic errors
* deployment_store: Remove unfail common function/abstraction
* store: Combine deployment::get_fatal_error_id with detail::error