Kafka Journal
=============

# Requirements

* The journal should stay operational even when replication is offline, ideally for up to the topic retention period, say 24h
* The journal operates on huge streams of events; it is impossible to fit them all in memory

# The challenge

Stream data from two storages when one is not yet consistent and the other has lost its tail.

You might naively think that `kafka-journal` stores events in Kafka. That is not really true: it stores actions.

# Actions

There are three kinds of actions:
* **Append**: a Kafka record that contains a list of events saved atomically
* **Mark**: lets us avoid consuming Kafka indefinitely; we stop once the marker is found in the topic
* **Delete**: indicates an attempt to delete all existing events up to the passed `seqNr`; it does not delete future events
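
For illustration, the actions could be modelled roughly like this (a minimal sketch; the payload and `seqNr` representations are assumptions, not the library's actual types):

```scala
// Minimal sketch of the action model; payload and seqNr representations
// are assumptions, not kafka-journal's actual types.
sealed trait Action

object Action {
  // a batch of events written atomically as a single Kafka record
  final case class Append(events: List[Array[Byte]]) extends Action
  // a marker that lets a reader stop instead of consuming the topic forever
  final case class Mark(id: String) extends Action
  // deletes all existing events with seqNr <= to; future events are untouched
  final case class Delete(to: Long) extends Action
}
```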

# Reading flow

1. `Action.Mark` is pushed to the topic, in parallel with querying Cassandra for the already replicated topic offsets
2. A streaming query to Cassandra is initiated, in parallel with starting a consumer from those offsets, just to
   make sure there are no deletions
3. Start streaming data from Cassandra, filtering out deleted records
4. When finished reading from Cassandra, we might need to start consuming data from Kafka, in case replication is slow
   or not happening at the moment
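
Put together, the flow could look roughly like the sketch below. Every type and helper here (`Event`, `Head`, `produceMark`, `readOffsets`, `consumeHead`, `streamFromCassandra`) is an assumption for illustration, not kafka-journal's actual API:

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical shape of the read path; all members are assumptions.
trait JournalRead {
  type Event
  type Head // buffered, not-yet-replicated actions from Kafka

  def produceMark(id: String): Future[Unit]
  def readOffsets(id: String): Future[Long]
  def consumeHead(id: String, from: Long): Future[Head]
  def streamFromCassandra(id: String, deletedUpTo: Option[Long]): Future[List[Event]]
  def deletedUpTo(head: Head): Option[Long]
  def appendsIn(head: Head): List[Event]

  def read(id: String)(implicit ec: ExecutionContext): Future[List[Event]] = {
    // step 1: fire the mark and the offsets query in parallel (futures start eagerly)
    val marked  = produceMark(id)
    val offsets = readOffsets(id)
    for {
      offset <- offsets
      _      <- marked
      head   <- consumeHead(id, offset)                    // step 2: read the head up to the mark
      stored <- streamFromCassandra(id, deletedUpTo(head)) // step 3: replicated events, deletions applied
    } yield stored ++ appendsIn(head)                      // step 4: top up with the unreplicated tail
  }
}
```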

# Performance optimisations

1. If the `Action.Mark` offset is <= the topic offset stored in the eventual storage, we don't need to consume Kafka at
   all (a nearly impossible best-case scenario, but hope dies last)
2. There is no need to wait for the `Action.Mark` produce to complete before starting to query Cassandra
3. While reading Kafka in search of `Action.Delete`, we can buffer part of the head to avoid initiating a second read
   (the most common scenario: the replicating app is on track)
4. We don't need to initiate a second consumer if everything was replicated within the reading duration
5. Use a pool of consumers and cache either the head or the deletions
6. Subscribe to a particular partition rather than the whole topic; however, this limits caching capabilities (see the
   sketch after this list)
7. Don't deserialise unrelated Kafka records
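
A sketch of optimisations 6 and 7 using the plain Kafka Java client (topic name, partition, offset, and journal key below are examples, not values used by the library):

```scala
import java.time.Duration
import java.util.Properties
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.serialization.ByteArrayDeserializer
import scala.jdk.CollectionConverters._

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")

// 6. assign a single partition explicitly instead of subscribing to the whole topic
val consumer  = new KafkaConsumer(props, new ByteArrayDeserializer, new ByteArrayDeserializer)
val partition = new TopicPartition("journal", 0)
consumer.assign(List(partition).asJava)
consumer.seek(partition, 42L) // continue from the last replicated offset (42L is an example)

// 7. inspect the raw key first; deserialise the value only for records we care about
val id      = "journal-id" // example journal key
val records = consumer.poll(Duration.ofMillis(100)).asScala
val related = records.filter(r => r.key != null && new String(r.key, "UTF-8") == id)
```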

# Corner cases

* We cannot stream data to the client before making sure there are no `Action.Delete` actions in the not yet replicated
  Kafka head; otherwise we might emit events that a pending deletion has already invalidated (see the sketch below)
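
Reusing the `Action` sketch from above, the deletion bound can be computed from the buffered head before anything is emitted (again, an illustration rather than the library's actual code):

```scala
// Highest seqNr covered by a Delete in the buffered head, if any.
def deletedUpTo(head: List[Action]): Option[Long] =
  head.collect { case Action.Delete(to) => to }.maxOption
```

Events from Cassandra can then be streamed with every event whose `seqNr` is at or below this bound filtered out.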

# TODO

* Measure the three important Kafka consumer operations: init, seek and poll, so we can optimise the journal even
  better (see the sketch below)
* More corner cases to come in order to support re-partitioning [^_^]
* Decide when to clean/cut the head caches
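
A trivial harness for the measurement item, reusing `props` and `partition` from the consumer sketch above (`timed` is just a `System.nanoTime` wrapper, not library code):

```scala
def timed[A](name: String)(f: => A): A = {
  val start  = System.nanoTime()
  val result = f
  println(s"$name: ${(System.nanoTime() - start) / 1000000} ms")
  result
}

val c = timed("init")(new KafkaConsumer(props, new ByteArrayDeserializer, new ByteArrayDeserializer))
timed("seek") { c.assign(List(partition).asJava); c.seek(partition, 0L) }
timed("poll")(c.poll(Duration.ofMillis(100)))
```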