You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: packages/pynumaflow/README.md
+9-9Lines changed: 9 additions & 9 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -23,7 +23,7 @@ To build the package locally, run the following command from the root of the pro
23
23
24
24
```bash
25
25
make setup
26
-
````
26
+
```
27
27
28
28
To run unit tests:
29
29
```bash
@@ -57,22 +57,21 @@ There are different types of gRPC server mechanisms which can be used to serve t
57
57
These have different functionalities and are used for different use cases.
58
58
59
59
Currently we support the following server types:
60
+
60
61
- Sync Server
61
62
- Asyncronous Server
62
63
- MultiProcessing Server
63
64
64
65
Not all of the above are supported for all UDFs, UDSource and UDSinks.
65
66
66
67
For each of the UDFs, UDSource and UDSinks, there are seperate classes for each of the server types.
67
-
This helps in keeping the interface simple and easy to use, and the user can start the specific server type based
68
-
on the use case.
68
+
This helps in keeping the interface simple and easy to use, and the user can start the specific server type based on the use case.
69
69
70
70
71
71
#### SyncServer
72
72
73
73
Syncronous Server is the simplest server type. It is a multithreaded threaded server which can be used for simple UDFs and UDSinks.
74
-
Here the server will invoke the handler functionfor each message. The messaging is synchronous and the server will wait
75
-
for the handler to return before processing the next message.
74
+
Here the server will invoke the handler function for each message. The messaging is synchronous and the server will wait for the handler to return before processing the next message.
Asyncronous Server is a multi threaded server which can be used for UDFs which are asyncronous. Here we utilize the asyncronous capabilities of Python to process multiple messages in parallel. The server will invoke the handler function for each message. The messaging is asyncronous and the server will not wait for the handler to return before processing the next message. Thus this server type is useful for UDFs which are asyncronous.
84
83
The handler function for such a server should be an async function.
85
84
86
-
```
85
+
```py
87
86
grpc_server = MapAsyncServer(handler)
88
87
```
89
88
90
89
#### MultiProcessServer
91
90
92
-
MultiProcess Server is a multi process server which can be used for UDFs which are CPU intensive. Here we utilize the multi process capabilities of Python to process multiple messages in parallel by forking multiple servers in different processes.
91
+
MultiProcess Server is a multi process server which can be used for UDFs which are CPU intensive. Here we utilize the multi process capabilities of Python to process multiple messages in parallel by forking multiple servers in different processes.
93
92
The server will invoke the handler function for each message. Individually at the server level the messaging is synchronous and the server will wait for the handler to return before processing the next message. But since we have multiple servers running in parallel, the overall messaging also executes in parallel.
94
93
95
94
This could be an alternative to creating multiple replicas of the same UDF container as here we are using the multi processing capabilities of the system to process multiple messages in parallel but within the same container.
@@ -140,7 +139,8 @@ should follow the same signature.
140
139
141
140
For using the class based handlers the user can inherit from the base handler class for each of the functionalities and implement the handler function.
142
141
The base handler class for each of the functionalities has the same signature as the handler function for the respective server type.
143
-
The list of base handler classes for each of the functionalities is given below -
142
+
The list of base handler classes for each of the functionalities is given below:
143
+
144
144
- UDFs
145
145
- Map
146
146
- Mapper
@@ -159,5 +159,5 @@ The list of base handler classes for each of the functionalities is given below
159
159
- SideInput
160
160
- SideInput
161
161
162
-
More details about the signature of the handler function for each of the server types is given in the
162
+
More details about the signature of the handler function for each of the server types is given in the
The Batch Mapper module offers tools for building BatchMap UDFs, allowing you to process multiple messages simultaneously. This enables more efficient handling of workloads such as bulk API requests or batch database operations by grouping messages and processing them together in a single operation.
The Map Streamer module provides classes and functions for implementing MapStream UDFs that stream results as they're produced.
4
+
Unlike regular Map which returns all messages at once, Map Stream yields messages one at a time as they're ready, reducing latency for downstream consumers.
The Reduce Streamer module provides classes and functions for implementing ReduceStream UDFs that emit results incrementally during reduction.
4
+
Unlike regular Reduce which outputs only when the window closes, Reduce Stream emits results as they're computed. This is useful for early alerts or real-time dashboards.
Side Input allows you to inject external data into your UDFs. This is useful for configuration, lookup tables, or any data that UDFs need but isn't part of the main data stream.
0 commit comments