## BatchMap Interface
The BatchMap interface allows developers to
process multiple data items together in a single UDF handler.

### What is BatchMap?
BatchMap is an interface that allows developers to process multiple data items
in a single UDF call, rather than processing each item in a separate call.

The BatchMap interface can be helpful in scenarios
where operating on a group of items at once is more efficient than handling them one at a time,
for example when each call to an external service has a fixed overhead that can be shared across the batch.

### Understanding the Handler Interface
The BatchMap interface requires developers to implement a handler with a specific signature.
Here is the signature of the BatchMap handler:

```python
from collections.abc import AsyncIterable
from pynumaflow.batchmapper import BatchResponses, Datum

async def handler(datums: AsyncIterable[Datum]) -> BatchResponses: ...
```
The handler takes an asynchronous iterable of `Datum` objects and returns a
`BatchResponses` object.
The `BatchResponses` object is a list of the *same length* as the input
datums, with each item being the response to one request datum.

To clarify, let's say we have three data items:

```python
data_1 = {"name": "John", "age": 25}
data_2 = {"name": "Jane", "age": 30}
data_3 = {"name": "Bob", "age": 45}
```

Numaflow groups these data items together, wraps each one in a `Datum`, and
passes them to the handler as an asynchronous iterable:

```python
# Conceptual only: Numaflow builds the Datums and invokes the handler for you.
result = await handler(datums)  # datums yields the Datums for data_1, data_2, data_3
```

The result is a `BatchResponses` object: a list holding one response for each input data item.
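
The following minimal sketch simulates this flow end to end without a running Numaflow pipeline. `Datum`, `BatchResponse`, `BatchResponses`, and `as_async_iterable` are hypothetical, simplified stand-ins written for this example. The SDK's own classes carry more fields, and their responses hold message objects rather than raw bytes. The `req-N` IDs are also made up. The handler upper-cases each `name` and emits exactly one response per datum, each tagged with that datum's ID.

```python
import asyncio
import json
from collections.abc import AsyncIterable, AsyncIterator
from dataclasses import dataclass, field


# Hypothetical stand-ins for the SDK types, reduced to what this example needs.
@dataclass
class Datum:
    id: str
    value: bytes


@dataclass
class BatchResponse:
    id: str
    messages: list[bytes] = field(default_factory=list)

    @classmethod
    def from_id(cls, id_: str) -> "BatchResponse":
        return cls(id=id_)

    def append(self, message: bytes) -> None:
        self.messages.append(message)


class BatchResponses(list):
    """A plain list of BatchResponse objects, one per input datum."""


async def handler(datums: AsyncIterable[Datum]) -> BatchResponses:
    batch_responses = BatchResponses()
    async for datum in datums:
        record = json.loads(datum.value)
        record["name"] = record["name"].upper()
        # Tag the response with the ID of the datum it answers.
        response = BatchResponse.from_id(datum.id)
        response.append(json.dumps(record).encode())
        batch_responses.append(response)
    return batch_responses


async def as_async_iterable(items: list[Datum]) -> AsyncIterator[Datum]:
    # Hypothetical helper: yield items one at a time, like a streamed batch.
    for item in items:
        yield item


data = [
    {"name": "John", "age": 25},
    {"name": "Jane", "age": 30},
    {"name": "Bob", "age": 45},
]
datums = [Datum(id=f"req-{i}", value=json.dumps(d).encode()) for i, d in enumerate(data, 1)]
result = asyncio.run(handler(as_async_iterable(datums)))

for r in result:
    print(r.id, r.messages[0].decode())
```

Three datums go in and three responses come out, and each response keeps the ID of the datum it came from.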

### Important Considerations
When using BatchMap, there are a few important considerations to keep in mind:

- Ensure that each `BatchResponse` in the `BatchResponses` object is tagged with the *correct request ID*.
  Each `Datum` has a unique ID, which Numaflow uses to match every response to its request.

  ```python
  batch_responses = BatchResponses()
  async for datum in datums:
      # Tag each response with the ID of the datum it answers
      batch_response = BatchResponse.from_id(datum.id)
      batch_responses.append(batch_response)
  ```

- Ensure that the length of the `BatchResponses`
  list equals the number of requests received: for every input data item, there
  must be exactly one corresponding response in the `BatchResponses` list.
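
Both requirements can be checked before a handler returns. The sketch below uses a hypothetical helper, `validate_batch`, which is not part of Numaflow. It compares the incoming request IDs with the IDs on the outgoing responses and raises if the counts differ or if any ID is duplicated, missing, or unexpected. To keep the example self-contained, responses are modeled as plain `(id, payload)` tuples.

```python
from collections import Counter


def validate_batch(request_ids: list[str], responses: list[tuple[str, bytes]]) -> None:
    """Raise ValueError unless there is exactly one response per request ID.

    Hypothetical helper for illustration; responses are (id, payload) tuples.
    """
    if len(responses) != len(request_ids):
        raise ValueError(f"expected {len(request_ids)} responses, got {len(responses)}")
    counts = Counter(resp_id for resp_id, _ in responses)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    missing = sorted(set(request_ids) - set(counts))
    unexpected = sorted(set(counts) - set(request_ids))
    if duplicates or missing or unexpected:
        raise ValueError(f"duplicates={duplicates} missing={missing} unexpected={unexpected}")


request_ids = ["req-1", "req-2", "req-3"]
# One response per request, correctly tagged: passes silently.
validate_batch(request_ids, [("req-1", b"a"), ("req-2", b"b"), ("req-3", b"c")])

# Right length, but one request answered twice and another not at all.
try:
    validate_batch(request_ids, [("req-1", b"a"), ("req-1", b"b"), ("req-3", b"c")])
except ValueError as err:
    print(err)
```

The second call shows why a length check alone is not enough. The batch has the right number of responses, but `req-2` was never answered.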

Use batch processing only when it makes sense. In some
scenarios, batch processing may not be the most
efficient approach, and processing data items one by one
could be a better option.
Because the whole batch is handed to a single handler call, any concurrent
processing of the items within it is the responsibility of the UDF implementation.
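
One way a UDF can take on that responsibility is sketched below. The handler collects the streamed datums, then processes them concurrently with `asyncio.gather`, which returns results in the order its awaitables were passed, so each response stays paired with its datum's ID. `Datum`, `BatchResponse`, `process_one`, and `stream` are hypothetical stand-ins. The sleep in `process_one` is only a placeholder for real per-item I/O.

```python
import asyncio
from collections.abc import AsyncIterable, AsyncIterator
from dataclasses import dataclass, field


# Hypothetical stand-ins for the SDK types, reduced to what this example needs.
@dataclass
class Datum:
    id: str
    value: bytes


@dataclass
class BatchResponse:
    id: str
    messages: list[bytes] = field(default_factory=list)


async def process_one(datum: Datum) -> BatchResponse:
    # Hypothetical per-item work; the sleep stands in for network or model latency.
    await asyncio.sleep(0.01)
    return BatchResponse(id=datum.id, messages=[datum.value.upper()])


async def handler(datums: AsyncIterable[Datum]) -> list[BatchResponse]:
    # Drain the stream first, then process every datum concurrently.
    batch = [datum async for datum in datums]
    # gather returns results in argument order, so IDs stay aligned with inputs.
    return await asyncio.gather(*(process_one(d) for d in batch))


async def stream(items: list[Datum]) -> AsyncIterator[Datum]:
    # Hypothetical helper: yield items one at a time, like a streamed batch.
    for item in items:
        yield item


names = ["john", "jane", "bob"]
datums = [Datum(id=f"req-{i}", value=n.encode()) for i, n in enumerate(names, 1)]
responses = asyncio.run(handler(stream(datums)))
print([(r.id, r.messages) for r in responses])
```

This sketch starts every item at once. For large batches you may want to cap the number of in-flight calls, for example with an `asyncio.Semaphore`.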