Commit 6c77551

QuentinI authored and shenkeyao committed
Document on interesting log lines (#190)
1 parent 78d978e commit 6c77551

3 files changed: +145 -0 lines changed

README_ESPRESSO.md

Lines changed: 3 additions & 0 deletions
@@ -321,6 +321,9 @@ docker volume prune -a
 the genesis file. Replace the corresponding `hash` field in `rollup-devnet.json`, then rerun the
 failed command.
 
+### Log monitoring
+For a selection of important metrics to monitor and their corresponding log lines, see `espresso/docs/metrics.md`.
+
 ## Continuous Integration environment
 
 ### Running enclave tests in EC2

espresso/docs/metrics.md

Lines changed: 129 additions & 0 deletions
@@ -0,0 +1,129 @@
# Metrics

This document outlines the monitoring framework for our system components. Indicators are organized into the following categories:

- **Key Metrics**: Metrics that belong on the dashboard for operational visibility
- **Recoverable Errors**: Events to monitor and alert on if they occur often; they do not necessarily lead to liveness or safety violations
- **Critical Errors**: Events that must raise urgent alerts because they indicate a full chain stall or a stall of a particular service
- **Potential Issue Indicators**: Non-errors that can indicate preconditions for a problem to occur

Where a log line is listed, it is the log event to monitor for that indicator.

## Batcher

### Key Metrics

Metrics that belong on the dashboard:

- Blocks enqueued for batching to L1/AltDA
  `"Added L2 block to channel manager"`
- Espresso batch submissions
  `"Submitted transaction to Espresso"`
- L1 batch submissions
  `"Transaction confirmed"`
- Espresso transaction queue size
  `"Espresso transaction submitter queue status"`
- AltDA submissions
  `"Sent txdata to altda layer and received commitment"`
- Espresso batches fetched
  `"Inserting accepted batch"`

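One way to turn the log lines above into dashboard counters is to count how many log lines contain each message. The following is a minimal sketch, not part of this commit: the function name `countKeyMessages` and the sample log format are hypothetical, and it assumes each message appears verbatim within a single log line. Matching is case-insensitive as a precaution against capitalization differences between the documented and emitted messages.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// keyMessages are the batcher "Key Metrics" log messages listed above.
var keyMessages = []string{
	"Added L2 block to channel manager",
	"Submitted transaction to Espresso",
	"Transaction confirmed",
	"Espresso transaction submitter queue status",
	"Sent txdata to altda layer and received commitment",
	"Inserting accepted batch",
}

// countKeyMessages scans log output line by line and counts the lines that
// contain each key message (hypothetical helper, case-insensitive substring match).
func countKeyMessages(logs string) map[string]int {
	counts := make(map[string]int, len(keyMessages))
	scanner := bufio.NewScanner(strings.NewReader(logs))
	for scanner.Scan() {
		line := strings.ToLower(scanner.Text())
		for _, msg := range keyMessages {
			if strings.Contains(line, strings.ToLower(msg)) {
				counts[msg]++
			}
		}
	}
	return counts
}

func main() {
	// Hypothetical log excerpt; the real line format depends on the logger configuration.
	sample := `lvl=info msg="Added L2 block to channel manager" block=101
lvl=info msg="Transaction confirmed" effectiveGasPrice=1000000000
lvl=info msg="Added L2 block to channel manager" block=102`

	counts := countKeyMessages(sample)
	for _, msg := range keyMessages {
		fmt.Printf("%-52s %d\n", msg, counts[msg])
	}
}
```

In practice a log aggregation system would likely do this matching; the sketch only illustrates the mapping from log message to counter.
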
### Recoverable Errors

Events to monitor, with alerts raised if they occur often:

- State reset (even once is suspicious)
  `"Clearing state"`
- Espresso transaction creation failed
  `"Failed to derive batch from block"`
- L1 submission failed
  `"Transaction failed to send"`
- AltDA submission failed
  `"DA request failed"`
- L2 reorg detected
  `"Found L2 reorg"`

### Critical Errors

- L1 finalized height not increasing
- L2 unsafe height not increasing
- L2 safe height not increasing

### Potential Issue Indicators

Non-errors that can indicate preconditions for a problem to occur:

- Gas price too high
  `effectiveGasPrice` field of the `"Transaction confirmed"` log
- Espresso transaction backlog is growing
  Can be derived from the `"Espresso transaction submitter queue status"` log listed under Key Metrics

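Backlog growth can be judged from consecutive queue-size readings, such as the `submitJobQueue` value reported in each queue status log. The sketch below is one possible heuristic, not the project's definition of "growing": the function name `backlogGrowing` and the choice of requiring strictly increasing readings are assumptions.

```go
package main

import "fmt"

// backlogGrowing reports whether the last n samples are strictly increasing.
// It returns false when fewer than n samples are available or n < 2.
func backlogGrowing(samples []int, n int) bool {
	if n < 2 || len(samples) < n {
		return false
	}
	recent := samples[len(samples)-n:]
	for i := 1; i < len(recent); i++ {
		if recent[i] <= recent[i-1] {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical submitJobQueue readings from successive status logs.
	readings := []int{3, 5, 8, 13}
	fmt.Println("backlog growing:", backlogGrowing(readings, 3))
}
```

Requiring several consecutive increases avoids alerting on a single busy interval; a different criterion (for example an absolute queue size) may suit some deployments better.
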
## Caff Validator Node

### Key Metrics

- Espresso batches fetched
  `"Inserting accepted batch"`
- New L1 safe blocks
  `"New L1 safe block"`
- New L2 unsafe blocks
  `"Inserted new L2 unsafe block"`
- New L2 safe blocks
  `"Derivation complete: reached L2 block as safe"`

### Recoverable Errors

- Pipeline errors
  `"Derivation process error"`
- Malformed batch
  `"Dropping batch"`, `"Failed to parse frames"`

### Critical Errors

Events that must raise urgent alerts because they indicate a full chain stall:

- L1 finalized height not increasing
- L2 unsafe height not increasing
- L2 safe height not increasing

## Non-caff Validator Node

### Key Metrics

- New L1 safe blocks
  `"New L1 safe block"`
- New L2 unsafe blocks
  `"Inserted new L2 unsafe block"`
- New L2 safe blocks
  `"Derivation complete: reached L2 block as safe"`

### Recoverable Errors

- Pipeline errors
  `"Derivation process error"`
- Malformed batch
  `"Dropping batch"`, `"Failed to parse frames"`

### Critical Errors

Events that must raise urgent alerts because they indicate a full chain stall:

- L1 finalized height not increasing
- L2 unsafe height not increasing
- L2 safe height not increasing

## Sequencer

All events of the Non-caff Validator Node, and:

### Key Metrics

- Blocks produced
  `"Sequencer sealed block"`

### Recoverable Errors

- Engine failure
  `"Engine failed temporarily, backing off sequencer"`
- Engine reset
  `"Engine reset confirmed, sequencer may continue"`

op-batcher/batcher/espresso.go

Lines changed: 13 additions & 0 deletions
@@ -265,13 +265,23 @@ func evaluateSubmission(jobResp espressoSubmitTransactionJobResponse) JobEvaluat
 // submitted, it will then submit a job to the verify receipt job queue to
 // verify the receipt of the transaction.
 func (s *espressoTransactionSubmitter) handleTransactionSubmitJobResponse() {
+	ticker := time.NewTicker(10 * time.Minute)
+	defer ticker.Stop()
+
 	for {
 		var jobResp espressoSubmitTransactionJobResponse
 		var ok bool
 
 		select {
 		case <-s.ctx.Done():
 			return
+		case <-ticker.C:
+			log.Info("Espresso transaction submitter queue status",
+				"submitJobQueue", len(s.submitJobQueue),
+				"submitRespQueue", len(s.submitRespQueue),
+				"verifyReceiptJobQueue", len(s.verifyReceiptJobQueue),
+				"verifyReceiptRespQueue", len(s.verifyReceiptRespQueue))
+			continue
 		case jobResp, ok = <-s.submitRespQueue:
 			if !ok {
 				// Our channel is closed, and we are done
@@ -528,6 +538,9 @@ func espressoSubmitTransactionWorker(
 
 		// Submit the transaction to Espresso
 		hash, err := cli.SubmitTransaction(ctx, *jobAttempt.job.transaction)
+		if err == nil {
+			log.Info("Submitted transaction to Espresso", "hash", hash)
+		}
 
 		jobAttempt.job.attempts++
 		resp := espressoSubmitTransactionJobResponse{
