Commit 91a734f

Update Logbook
1 parent b22dc91 commit 91a734f

1 file changed, +55 -0 lines changed

Logbook.md

Lines changed: 55 additions & 0 deletions
@@ -1,5 +1,60 @@
# Log Book

## 2025-04-16

### Antithesis meeting

* We are able to find the network p2p bug -> the node crashes after 200s
* 2 bugs are involved -> an exception is raised + a change in exception policy means that what should only have killed the connection killed the node
* Why should it take 1 hour to detect? There's an hourly scheduled churn for hosts
* there are probably different ways of triggering the bug
* working on the eventually.sh script -> this might lead to triggering more interesting bugs
* its purpose is to cause the chain to diverge
* we are all trying to make properties easier to write
* Q: what are the priorities?
* block fetch bug (Brown M&Ms)? -> this is an old network bug (triggered by CPU load) where the node takes too long to demote a peer
* consensus bug -> we need to wipe the DB and restart
* There was a bug in the converge.sh script that would rewrite the return code to 0
* we need to change the way we handle SIGTERM -> exit code will be 1
* There is a property that checks container exit codes; the problem is that an exit w/ 1 is not distinguishable from any other exit
* AT could check the SIGTERM in the tester
* Q: what about logs?
* logs in files make it possible to leverage the SDK
* we are investigating how to write a program that asserts "sometimes it forks" and uses a shared volume
* AT will write a fault injector that wipes out a volume and restarts a container
* will also write a test composer that can spin up a new container
* delay startup of a node after some time?
* wrap cardano-node to be able to start/stop/delay?
* AT doesn't have disk faults for now
* this is something that's been asked for by other people

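The exit-code discussion above can be illustrated with a small sketch. It is not the change agreed in the meeting (which maps SIGTERM to exit code 1); it only shows one way a tester could tell a SIGTERM apart from other exits, assuming the common POSIX convention where a shell reports `128 + signal` and Python's `subprocess` reports the negated signal number. The names `classify_exit` and `run_and_terminate` are hypothetical.

```python
import signal
import subprocess
import sys


def classify_exit(returncode: int) -> str:
    """Label a child's exit status so SIGTERM is distinguishable from failures."""
    if returncode == 0:
        return "clean"
    # subprocess reports death-by-signal as a negative signal number;
    # shells report the same event as 128 + signal number (143 for SIGTERM).
    if returncode in (-signal.SIGTERM, 128 + signal.SIGTERM):
        return "terminated"
    # Anything else, including a plain exit 1, is treated as a failure.
    return "failed"


def run_and_terminate(argv: list[str]) -> int:
    """Start a child process, send it SIGTERM, and return its raw return code."""
    child = subprocess.Popen(argv)
    child.terminate()  # sends SIGTERM on POSIX systems
    return child.wait()


if __name__ == "__main__":
    # Stand-in for a long-running node: a Python process that just sleeps.
    rc = run_and_terminate([sys.executable, "-c", "import time; time.sleep(30)"])
    print(rc, classify_exit(rc))
```

If the node's SIGTERM handler exits with 1 instead, this classification collapses into "failed", which is the ambiguity noted above.
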
#### TODO:

* (AB) reach out to core team to test UTxO HD
* discuss w/ Javier about fault simulation
* (AT) schedule multiverse debugging session
* (AT) start/stop containers wiping out dir
* (KK) block fetch bug
* (JL) write sometimes property using cardano-tracer
* (AT) check SIGTERM exit
* (AT) share docs of all the faults

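As a starting point for the "sometimes it forks" property, here is a minimal sketch of the check itself. It assumes that tip observations have already been extracted from the nodes' logs into records of node name, block number and block hash; the `TipRecord` type, its field names and `forked_heights` are hypothetical and not part of cardano-tracer or the AT SDK. A fork is observed when two different hashes are reported for the same block number.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class TipRecord(NamedTuple):
    """One observed chain tip (hypothetical shape, not a real trace format)."""
    node: str
    block_no: int
    block_hash: str


def forked_heights(records: Iterable[TipRecord]) -> dict[int, set[str]]:
    """Return the block numbers at which more than one hash was observed."""
    hashes_at: dict[int, set[str]] = defaultdict(set)
    for rec in records:
        hashes_at[rec.block_no].add(rec.block_hash)
    # Keep only the heights where nodes disagreed on the block hash.
    return {height: hashes for height, hashes in hashes_at.items() if len(hashes) > 1}


if __name__ == "__main__":
    observed = [
        TipRecord("p1", 10, "aa"),
        TipRecord("p2", 10, "aa"),
        TipRecord("p1", 11, "bb"),
        TipRecord("p2", 11, "cc"),
    ]
    print(forked_heights(observed))
```

The boolean `bool(forked_heights(records))` is the kind of condition one would hand to a "sometimes" assertion; how it gets wired into the SDK is left open here.
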
## 2025-04-13

### Adding cardano-tracer

* One of the issues with our initial setup was logging: the Antithesis platform limits the amount of log output one can produce, a limit that is even checked by a property, currently set at 200MB/core/CPU hour
* The [cardano-tracer](https://github.com/IntersectMBO/cardano-node/blob/master/cardano-tracer) is the new recommended tracing infrastructure for cardano-node. It provides a protocol and an agent to forward logs to, which allows logging and tracing across a cluster of nodes to be aggregated, something that should prove useful for defining properties
* We have added the needed machinery in the [compose](compose) infrastructure:
  1. build a Docker image to run the cardano-tracer executable, as it's not available in pre-compiled form by default
  2. provide tracer configuration to expose Prometheus metrics and enable connection from any number of other nodes
  3. modify the node's configuration to enable tracing and logging, which was turned off by default
  4. run the tracer container as part of the compose stack along with cardano-node and "sidecar"
* Some minor roadblocks we hit on this journey:
  * managing users and r/w rights across shared volumes can be tricky. All services are run with a non-privileged user `cardano` but volumes are mounted with owner `root` by default (we could not find a way to designate a different user in the [compose](https://docs.docker.com/reference/compose-file/volumes/) documentation). We resorted to the usual technique of wrapping the `cardano-tracer` service in a script that modifies owner and rights on the fly upon startup
  * forwarding traces from cardano-node requires specific configuration, even though it's enabled by default since 10.2. If the `TraceOptions` key is not present, the node won't start
* While a first step would be to just read or follow the logs that cardano-tracer writes to files, the trace-forwarding protocol could be leveraged to write tests and properties in a more direct manner, e.g. by writing a service that ingests logs and traces directly and uses the default AT SDK to generate tests

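To get a feel for the log limit mentioned above, the sketch below computes a byte budget and compares it with what has been written under a log directory. It assumes the limit reads as 200 MB per core per hour and that MB means decimal megabytes; both are our interpretation, not something confirmed by the platform documentation. The function names and the directory layout are hypothetical.

```python
from pathlib import Path

MB = 1_000_000  # assumption: decimal megabytes


def log_budget_bytes(cores: int, hours: float, mb_per_core_hour: int = 200) -> int:
    """Bytes of log output allowed for a run of the given size and duration."""
    return int(mb_per_core_hour * MB * cores * hours)


def log_bytes_written(log_dir: Path) -> int:
    """Total size of all regular files found under log_dir, recursively."""
    return sum(p.stat().st_size for p in log_dir.rglob("*") if p.is_file())


def within_budget(log_dir: Path, cores: int, hours: float) -> bool:
    """True when the logs on disk fit within the assumed budget."""
    return log_bytes_written(log_dir) <= log_budget_bytes(cores, hours)
```

For example, under these assumptions a 4-core run lasting half an hour would have a budget of 400 MB.
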
## 2025-04-09

### Meeting summary
### Meeting summary
