
Commit 4c0f880

Author: Denys Fakhritdinov
Message: revert notes.md deletion
1 parent 5ccc229

1 file changed: notes.md (49 additions, 0 deletions)

Kafka Journal
=============

# Requirements:

* The journal should remain operational even when replication is offline, ideally for up to the topic retention period; let's say 24h
* The journal operates on huge streams of events; it is impossible to fit them all in memory
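
For reference, the retention window in the first requirement is the topic-level `retention.ms` setting in Kafka; a sketch of setting a 24h retention via the AdminClient (the topic name `journal` and broker address are illustrative, not the project's actual config):

```scala
// Set retention.ms = 24h on the journal topic; these are real Kafka
// AdminClient calls, but the topic/broker names are placeholders.
import org.apache.kafka.clients.admin.{AdminClient, AlterConfigOp, ConfigEntry}
import org.apache.kafka.common.config.ConfigResource
import scala.jdk.CollectionConverters._

val admin = AdminClient.create(Map[String, AnyRef]("bootstrap.servers" -> "localhost:9092").asJava)
val topic = new ConfigResource(ConfigResource.Type.TOPIC, "journal")
val setRetention = new AlterConfigOp(
  new ConfigEntry("retention.ms", (24L * 60 * 60 * 1000).toString),
  AlterConfigOp.OpType.SET
)
val ops: java.util.Collection[AlterConfigOp] = List(setRetention).asJavaCollection
admin.incrementalAlterConfigs(Map(topic -> ops).asJava).all().get()
admin.close()
```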

# The challenge

Stream data from two storages when one is not yet consistent and the other has lost its tail.

You might naively think that `kafka-journal` stores events in Kafka. That is not really true. It stores actions.

# Actions

Actions:

* **Append**: a Kafka record that contains a list of events saved atomically
* **Mark**: the `Mark` action lets us stop consuming Kafka once the marker is found in the topic, rather than consuming indefinitely
* **Delete**: indicates an attempt to delete all existing events up to the passed `seqNr`; it will not delete future events
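
A minimal sketch of how these actions could be modelled, following the descriptions above; the payload and field types are assumptions, not kafka-journal's actual definitions:

```scala
// Hypothetical model of the three journal actions described above;
// the payload and field types are illustrative assumptions.
object actions {
  type SeqNr = Long

  sealed trait Action
  object Action {
    // a batch of events persisted atomically as a single Kafka record
    final case class Append(events: List[Array[Byte]]) extends Action
    // marker bounding consumption: stop once this id is seen in the topic
    final case class Mark(id: String) extends Action
    // delete all existing events with seqNr <= to; future events untouched
    final case class Delete(to: SeqNr) extends Action
  }
}
```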

# Reading flow

1. `Action.Mark` is pushed to the topic, in parallel with querying Cassandra for the processed topic offsets
2. A streaming query to Cassandra is initiated, in parallel with a consumer starting from those offsets, just to
   make sure there are no deletions
3. Start streaming data from Cassandra, filtering out deleted records
4. When finished reading from Cassandra, we might need to start consuming data from Kafka, in case replication is
   slow or is not happening at the moment
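
A rough sketch of this flow in cats-effect style; every name here (`Event`, `produceMark`, `replicatedOffsets`, `consumeHead`, `streamCassandra`) is hypothetical, not kafka-journal's real API:

```scala
// Sketch of the four reading steps above; the ??? bodies stand in for the
// real Kafka/Cassandra interactions.
import cats.effect.IO
import cats.syntax.all._

final case class Event(seqNr: Long, payload: String)

// offset of the freshly produced Action.Mark record
def produceMark(key: String): IO[Long] = ???
// highest offset already replicated to Cassandra
def replicatedOffsets(key: String): IO[Long] = ???
// scan the unreplicated head up to the marker: (deleteTo, appended events)
def consumeHead(from: Long, until: Long): IO[(Option[Long], List[Event])] = ???
// stream replicated events, skipping those at or below deleteTo
def streamCassandra(key: String, deleteTo: Option[Long]): IO[List[Event]] = ???

def read(key: String): IO[List[Event]] =
  for {
    // 1. push the marker in parallel with reading the processed offsets
    (mark, replicated) <- (produceMark(key), replicatedOffsets(key)).parTupled
    // 2. consume the not-yet-replicated head up to the marker, mainly to
    //    discover pending Action.Delete records
    (deleteTo, head)   <- consumeHead(from = replicated, until = mark)
    // 3. stream replicated events from Cassandra, filtering out deleted ones
    stored             <- streamCassandra(key, deleteTo)
    // 4. append head events that Cassandra has not replicated yet
  } yield stored ++ head
```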

# Performance optimisations

1. If `Action.Mark`'s offset is <= the topic offsets from the eventual storage, we don't need to consume Kafka at all
   (impossible best-case scenario, but hope dies last); see the sketch after this list
2. No need to wait for the `Action.Mark` produce to complete before starting to query Cassandra
3. While reading Kafka looking for `Action.Delete`, we can buffer part of the head to avoid initiating a second
   read (most common scenario, the replicating app is on track)
4. We don't need to initiate a second consumer if we managed to replicate everything within the reading duration
5. Use a pool of consumers and cache either the head or the deletions
6. Subscribe to a particular partition rather than the whole topic; however, this limits caching capabilities
7. Don't deserialise unrelated Kafka records
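
A sketch of the first optimisation, assuming offsets are compared per partition; the function name and parameters are illustrative:

```scala
// Optimisation 1: skip Kafka entirely when replication has caught up.
// `markOffset` is the offset of our freshly produced Action.Mark record;
// `replicatedOffset` is the highest offset for the same partition that the
// eventual storage (Cassandra) has already processed.
def needKafkaRead(markOffset: Long, replicatedOffset: Long): Boolean =
  // if everything up to and including our marker is already replicated,
  // the unreplicated head is empty and Cassandra alone is sufficient
  markOffset > replicatedOffset
```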

# Corner cases

* We cannot stream data to the client before making sure there are no `Action.Delete` records in the not yet replicated Kafka head

# TODO

* Measure the three important operations of a Kafka consumer: init, seek, poll, so we can optimise the journal even better
* More corner cases to come in order to support re-partitioning [_]
* Decide when to clean/trim the head caches
