# 9. Kafka streaming

## Status

Current

## Context
Many facilities stream bluesky documents to an event bus for consumption by out-of-process listeners.
Event buses used for this purpose at other facilities include ZeroMQ, RabbitMQ, Kafka, Redis, NATS, and
others.

This allows callbacks to be run in different processes or on other computers, without holding up or
interfering with the local `RunEngine`. Other groups at ISIS have expressed some interest in being able
to subscribe to bluesky documents.

## Decision

- We will stream our messages to Kafka, as opposed to some other message bus, because we already
  have Kafka infrastructure available for other purposes (e.g. event data & sample-environment data).
- At the time of writing, we will not **depend** on Kafka for anything critical, because the central
  Kafka instance is not currently considered "reliable" in an experiment-controls context. However,
  streaming the documents will allow testing to be done. Kafka will eventually be deployed in a
  "reliable" way accessible to each instrument.
- We will encode messages from bluesky using `msgpack` (with the `msgpack-numpy` extension), because:
  - It is the default encoder used by the upstream `bluesky-kafka` integration.
  - It is a schema-less encoder, so we do not have to write or maintain fixed schemas for all the
    documents allowed by `event-model`.
  - It has reasonable performance in terms of both encoding speed and message size.
  - `msgpack` is very widely supported across a range of programming languages.
- Kafka brokers will be configurable via an environment variable, `IBEX_BLUESKY_CORE_KAFKA_BROKER`.

```{note}
Wherever Kafka is mentioned above, the actual implementation may be a Kafka-like system (e.g. RedPanda).
```
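The decisions above can be sketched as a minimal document publisher. This is an illustrative stand-in, not the production implementation: the `DocumentPublisher` class, topic name, and `localhost:9092` fallback are hypothetical (a real integration would hand documents to an actual Kafka producer, e.g. via `bluesky-kafka`), and plain `msgpack` is used here where the real encoder would add the `msgpack-numpy` extension for array payloads.

```python
import os

import msgpack  # with msgpack-numpy, arrays would be handled via msgpack_numpy.encode/decode

# Broker address comes from the environment variable named in the decision above,
# with a hypothetical fallback for local development.
BROKER = os.environ.get("IBEX_BLUESKY_CORE_KAFKA_BROKER", "localhost:9092")


class DocumentPublisher:
    """Illustrative stand-in for a Kafka-backed callback: encodes each bluesky
    document with msgpack, where a real implementation would send it to a topic
    on BROKER."""

    def __init__(self, topic: str):
        self.topic = topic
        self.sent: list[bytes] = []  # stands in for producing to Kafka

    def __call__(self, name: str, doc: dict) -> None:
        # RunEngine callbacks receive (name, doc) pairs; encoding both together
        # lets a consumer distinguish start/descriptor/event/stop documents.
        self.sent.append(msgpack.packb((name, doc)))


# Subscribing this callable to a RunEngine (RE.subscribe(publisher)) would
# stream every emitted document; here we invoke it directly for illustration.
publisher = DocumentPublisher("bluesky-documents")
publisher("start", {"uid": "abc123", "time": 1700000000.0})
name, doc = msgpack.unpackb(publisher.sent[0])
print(name, doc["uid"])
```

Because the publisher is just an ordinary document callback, it runs in-process here, but the same `(name, doc)` interface is what allows the real Kafka consumer to live in a different process or on another machine.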

### Alternatives considered

Encoding bluesky documents into JSON and then wrapping them in the
[`json_json.fbs` flatbuffers schema](https://github.com/ess-dmsc/streaming-data-types/blob/58793c3dfa060f60b4a933bc085f831744e43f17/schemas/json_json.fbs)
was considered.

We chose `msgpack` instead of JSON strings wrapped in flatbuffers because:
- It is more standard in the bluesky community (e.g. it is the default serializer used by `bluesky-kafka`).
- Bluesky documents will be streamed to a dedicated topic, which is unlikely to be confused with data
  using any other schema.

Performance and storage impacts are unlikely to be noticeable for bluesky documents, but nonetheless:
- `msgpack`-encoded documents are 30-40% smaller than `json` + flatbuffers for a typical bluesky document.
- `msgpack` encoding is ~5x faster than `json` + flatbuffers encoding for a typical bluesky document.
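A rough way to see where the size difference comes from is to compare raw `msgpack` output against the JSON string that the `json_json.fbs` schema would wrap. This sketch omits the flatbuffers wrapper itself (which only adds further overhead on top of the JSON bytes), and the document below is a small, hypothetical event-style document, not a real measurement from the benchmark quoted above.

```python
import json

import msgpack

# A small, hypothetical event-style document (real documents are larger).
doc = {
    "uid": "0123456789abcdef",
    "time": 1700000000.123456,
    "data": {"det": 1.234, "motor": 5.678},
    "timestamps": {"det": 1700000000.1, "motor": 1700000000.1},
    "seq_num": 1,
    "descriptor": "fedcba9876543210",
}

json_size = len(json.dumps(doc).encode())       # bytes that json_json.fbs would wrap
msgpack_size = len(msgpack.packb(doc))          # bytes produced by msgpack

print(f"json: {json_size} bytes, msgpack: {msgpack_size} bytes")
```

`msgpack` wins mainly on numbers: a double serialises to 9 bytes rather than a decimal string, and keys and strings carry a length prefix instead of quotes and delimiters.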

## Justification & Consequences

We will stream bluesky documents to Kafka, encoded using `msgpack-numpy`.

At the time of writing this is purely to enable testing, and will not be used for "production" workflows.