Commit 13fd979

committed
update logbook
1 parent 1695ede commit 13fd979

1 file changed

+36
-0
lines changed

Logbook.md

Lines changed: 36 additions & 0 deletions
@@ -1,5 +1,41 @@
11
# Log Book
22

3+
## 2025-04-09
4+
5+
### Meeting summary
6+
7+
* We had our weekly touchpoint with the AT team and the CF Task Force
8+
* Over the past week we refined our understanding of the platform, and the CF team is working on a branch with several different setups, including one "large" cluster test where P2P networking can [lead to a crash](https://github.com/IntersectMBO/ouroboros-network/issues/5058) under certain circumstances
9+
* We discussed our difficulties chasing that particular network bug, which requires more runtime than is customary for AT runs
10+
* We can reproduce the bug using a local docker compose setup so we know it's possible to do it
11+
* We could not find a way to "miniaturise" the bug, e.g. by reducing network size, parameters, slot duration, $k$, etc.
12+
* A discussion about the kind of faults we are looking at ensued:
13+
* network related faults, like the one above, which should be related to handling of adversarial conditions in the system
14+
* consensus faults, which are related to the diffusion logic and chain selection; they also depend on resources (e.g. selection and diffusion need to happen quickly) but are mostly related to the correct implementation of the Ouroboros Praos protocol
15+
* mempool faults, which are related to the diffusion of txs and can have an adversarial effect on the system because of excessive use of resources, race conditions, competition with other parts, etc.
16+
* ledger faults, which relate to block/transaction evaluation
17+
* AT work should focus (at least for now?) on the first three, as the ledger is a pure function, although it could be the case that an error in the ledger ripples to other layers (e.g. diverging computations, unexpected errors, etc.)
18+
* we note that all layers have an extensive set of property tests
19+
* it's still unclear how best to use the tools. The AT engine uses different computing characteristics for different workloads/SUTs
20+
* It's possible to calibrate those performance characteristics in order to better replicate the environment, but this is not open to customers
21+
* One problem we had was the log output limit. Its purpose is to help reproduce things faster, since more output means more resources for each run
22+
* While there are certainly bugs that can be triggered through fuzzing I/Os, we need to have a way to run "adversaries" within the system, to inject blocks/transactions to load the system and possibly trigger issues in consensus, mempool, or ledger
23+
* We don't currently deploy anything like that in our stack, but we should build something
24+
* We discussed how AT does fault injection
25+
* it's completely rerandomized on purpose, in order to remove human biases as much as possible
26+
* with more information about the baseline guarantees expected from the system, some of the issues found could be filtered out, e.g. issues triggering errors that are beyond the operational limits of the system
27+
* there's no API to control the various parameters for fault injection
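
The idea of filtering out issues that fall beyond the system's operational limits could look roughly like this. It is only a sketch, assuming each finding records the fault conditions under which it was triggered. The `Finding` type, the condition names, and the limit values are all hypothetical and do not correspond to any AT API.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical record of one issue reported by a test run."""
    description: str
    # Fault name -> magnitude observed when the issue was triggered.
    conditions: dict = field(default_factory=dict)


def within_limits(finding, limits):
    """Return True if every recorded condition is within its stated limit.

    Conditions that have no stated limit are treated as in scope.
    """
    return all(
        value <= limits.get(name, float("inf"))
        for name, value in finding.conditions.items()
    )


def filter_findings(findings, limits):
    """Keep only the findings triggered inside the operational limits."""
    return [f for f in findings if within_limits(f, limits)]
```

With a hypothetical limit such as `{"network_delay_s": 5.0}`, a finding triggered at a 60 second delay would be dropped, while one triggered at 2 seconds would be kept.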
28+
29+
TODOs:
30+
31+
1. extend test execution time to reproduce network bug, and explore how AT analysis can help to understand the bug
32+
1. try to miniaturise the setup to reproduce the bug
33+
2. try to reproduce a consensus bug that requires injection of data
34+
a. implies we need to build some tool to inject blocks/data in the network
35+
3. define and refine general security properties
36+
4. integrate [cardano-tracer](https://developers.cardano.org/docs/get-started/cardano-node/new-tracing-system/cardano-tracer/) into compose stack to be able to collect logs
37+
5. express assertions on the logs collected with the tracer
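
As a starting point for the last item, an assertion on collected logs could be sketched as follows. This assumes the logs are available as one JSON object per line; the `sev` and `ns` field names and the severity values are assumptions about the log format and should be checked against the actual tracer output.

```python
import json


def parse_log_lines(lines):
    """Yield one dict per well-formed JSON line; skip blank or non-JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict):
            yield entry


def violations(lines, forbidden_severities=("Error", "Critical")):
    """Return the entries whose (assumed) `sev` field is a forbidden severity."""
    return [
        entry
        for entry in parse_log_lines(lines)
        if entry.get("sev") in forbidden_severities
    ]
```

A test run would then pass only if `violations(log_lines)` is empty. Stricter properties (for example, "no entry from a given namespace") can be expressed the same way by changing the predicate.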
38+
339
## 2025-04-02
440

541
### Official kick-off meeting
