@avilagaston9 avilagaston9 commented Nov 4, 2024

Add User Error Metrics

Warning

This PR requires deploying new metrics/dashboards.

Description

This PR adds a new panel to our batcher Grafana dashboard.

  • User Error Count.

User Error Count

Tracks how many times each type of user error occurs in the batcher. The error types included are:

  • InvalidChainId
  • InvalidSignature
  • ProofTooLarge
  • DisabledVerifier
  • RejectedProof
  • InsufficientBalance
  • InvalidNonce
  • EthRpcError
  • InvalidMaxFee
  • AddToBatchError
  • InvalidReplacementMessage
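
As a rough illustration of what a per-error-type counter amounts to, here is a minimal, stdlib-only sketch. The `UserError` enum, the `label` method, the snake_case label values, and `UserErrorCounter` are all hypothetical stand-ins for illustration; the batcher's actual types and its metrics library are not shown in this description.

```rust
use std::collections::HashMap;

/// The user error kinds listed above.
/// Hypothetical stand-in, not the batcher's real error type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum UserError {
    InvalidChainId,
    InvalidSignature,
    ProofTooLarge,
    DisabledVerifier,
    RejectedProof,
    InsufficientBalance,
    InvalidNonce,
    EthRpcError,
    InvalidMaxFee,
    AddToBatchError,
    InvalidReplacementMessage,
}

impl UserError {
    /// Label value used to tell error types apart in the panel.
    /// The snake_case naming is an assumption.
    pub fn label(self) -> &'static str {
        match self {
            UserError::InvalidChainId => "invalid_chain_id",
            UserError::InvalidSignature => "invalid_signature",
            UserError::ProofTooLarge => "proof_too_large",
            UserError::DisabledVerifier => "disabled_verifier",
            UserError::RejectedProof => "rejected_proof",
            UserError::InsufficientBalance => "insufficient_balance",
            UserError::InvalidNonce => "invalid_nonce",
            UserError::EthRpcError => "eth_rpc_error",
            UserError::InvalidMaxFee => "invalid_max_fee",
            UserError::AddToBatchError => "add_to_batch_error",
            UserError::InvalidReplacementMessage => "invalid_replacement_message",
        }
    }
}

/// In-memory stand-in for a counter metric with one series per label.
#[derive(Default)]
pub struct UserErrorCounter {
    counts: HashMap<&'static str, u64>,
}

impl UserErrorCounter {
    /// Adds one occurrence of the given error type.
    pub fn inc(&mut self, err: UserError) {
        *self.counts.entry(err.label()).or_insert(0) += 1;
    }

    /// Returns `None` for an error type that has never been recorded.
    pub fn get(&self, err: UserError) -> Option<u64> {
        self.counts.get(err.label()).copied()
    }
}
```

In this sketch a series only exists once its error has been recorded at least once, which is consistent with a panel that stays empty until the first error arrives.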

How to test

Run:

make anvil_start_with_block_time
make batcher_start_local
make run_metrics

Go to localhost:3000 and you should see the new panel in the batcher dashboard.

Then you can modify the task sender in batcher/aligned/src/main.rs to trigger the different errors in the batcher. For example, you can set chain_id to an invalid value:

//let chain_id = get_chain_id(eth_rpc_url.as_str()).await?;
let chain_id: u64 = 4;

Or set pub_input to None to get a rejected_proof error:

VerificationData {
    proving_system,
    proof,
    // pub_input,
    pub_input: None,
    verification_key,
    vm_program_code,
    proof_generator_addr,
}

and run:

make batcher_send_risc0_burst
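
To see why the hardcoded chain_id above should surface as an InvalidChainId error, the check on the receiving side can be pictured roughly as follows. This is a hypothetical sketch: `check_chain_id` and the one-variant `UserError` enum are made up for illustration, and the batcher's real validation code is not part of this description. The value 31337 is Anvil's default chain ID; the local setup may configure a different one.

```rust
/// Hypothetical subset of the user errors, enough for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum UserError {
    InvalidChainId,
}

/// Rejects a message whose chain ID differs from the one the receiver expects.
/// Hypothetical helper, not the batcher's actual function.
pub fn check_chain_id(message_chain_id: u64, expected_chain_id: u64) -> Result<(), UserError> {
    if message_chain_id == expected_chain_id {
        Ok(())
    } else {
        Err(UserError::InvalidChainId)
    }
}
```

Under these assumptions, `check_chain_id(4, 31337)` returns `Err(UserError::InvalidChainId)`, which is the kind of event the new panel is meant to count.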

Type of change

  • New feature

Checklist

  • “Hotfix” to testnet, everything else to staging
  • Linked to Github Issue
  • This change depends on code or research by an external entity
    • Acknowledgements were updated to give credit
  • Unit tests added
  • This change requires new documentation.
    • Documentation has been added/updated.
  • This change is an Optimization
    • Benchmarks added/run
  • Has a known issue
  • If your PR changes the Operator compatibility (Ex: Upgrade prover versions)
    • This PR adds operator compatibility for both versions and does not change batcher/docs/examples
    • This PR updates batcher and docs/examples to the newer version. This requires that operators are already updated to be compatible

@avilagaston9 avilagaston9 marked this pull request as ready for review November 5, 2024 14:01
@avilagaston9 avilagaston9 self-assigned this Nov 5, 2024

@JulianVentura JulianVentura left a comment

Tested it on my machine and it worked very well.
I'll leave just a few notes:

  • If no error has happened, the Grafana panel shows "No data", and I can't tell whether there is a problem with the metric or there simply aren't any errors. Can we initialize the totals to zero so the panel shows data right from the start?
  • I tried to trigger some other user errors by modifying the data provided to the batcher and spotted a bug, which I opened an issue for. Maybe you want to address it later.
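
For reference, the zero-initialization idea in the first note could look roughly like the sketch below. It is stdlib-only and hypothetical: the snake_case label values, `zero_initialized_counts`, and `record` are assumptions for illustration, not the batcher's code or its metrics library.

```rust
use std::collections::BTreeMap;

/// Label values assumed for the user error metric (hypothetical naming),
/// one per error type listed in the PR description.
const USER_ERROR_LABELS: [&str; 11] = [
    "invalid_chain_id",
    "invalid_signature",
    "proof_too_large",
    "disabled_verifier",
    "rejected_proof",
    "insufficient_balance",
    "invalid_nonce",
    "eth_rpc_error",
    "invalid_max_fee",
    "add_to_batch_error",
    "invalid_replacement_message",
];

/// Builds a counter table in which every known label starts at zero,
/// so each series exists before the first error is recorded.
pub fn zero_initialized_counts() -> BTreeMap<&'static str, u64> {
    USER_ERROR_LABELS.iter().map(|label| (*label, 0)).collect()
}

/// Adds one occurrence for the given label.
pub fn record(counts: &mut BTreeMap<&'static str, u64>, label: &'static str) {
    *counts.entry(label).or_insert(0) += 1;
}
```

The trade-off is the one raised in this thread: the list of labels has to be enumerated up front and kept in sync with the error types, in exchange for a panel that shows zeros instead of "No data".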

avilagaston9 commented Nov 5, 2024

If no error has happened, the Grafana panel shows "No data", and I can't tell whether there is a problem with the metric or there simply aren't any errors. Can we initialize the totals to zero so the panel shows data right from the start?

IMO initializing every value to zero doesn’t seem like a clean approach in the code. Also, it is not that bad to show "No data" until the errors start to appear; that's how Grafana behaves in that case. Anyway, I’m open to changing it if that’s what you think is best!

I tried to trigger some other user errors by modifying the data provided to the batcher and spotted a bug, which I reported in #1376. Maybe you want to address it later.

Nice catch! I agree this should be handled in a separate PR.

@avilagaston9 avilagaston9 linked an issue Nov 5, 2024 that may be closed by this pull request
@JulianVentura

IMO initializing every value to zero doesn’t seem like a clean approach in the code. Also, it is not that bad to show "No data" until the errors start to appear; that's how Grafana behaves in that case. Anyway, I’m open to changing it if that’s what you think is best!

Yes, it's OK, I just wanted to point it out in case you hadn't noticed. We can leave it like that if that's how Grafana usually behaves in these scenarios.

@MauroToscano MauroToscano left a comment

Remove the dedicated disabled verifier error panel and the proving system panel if the new panel covers both.

@avilagaston9

Remove the dedicated disabled verifier error panel and the proving system panel if the new panel covers both.

Done!

@MauroToscano MauroToscano added this pull request to the merge queue Nov 7, 2024
Merged via the queue into staging with commit d589ea5 Nov 7, 2024
@MauroToscano MauroToscano deleted the user-errors-telemetry branch November 7, 2024 17:09
Development

Successfully merging this pull request may close these issues.

Add metrics of errored messages received in batcher